GPU crunching question
Freddy:
Tested with the 8800 GTS 640 MB version (GPU and memory clocks left at stock):
min_n = 4
max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
10095.2 8976.7 9132.39 8718.98 8906.92
8904.71 8715.21 8833.48 8783.14 8836.1
8674.97 8913.12 8764.64 8645.37 8741.8
8818.75 9024.37 8807.76 8826.81 8911.87
9002.08 9067.97 8945.69 8910.78 8722.34
8785.37 8814.4 8836.28 8834.39 8795.27
8778.69 8968.62 8747 8943.26 9291.43
8890.32 8932.17 8860.98 8739.06 8734.42
8871.18 8755.89 8868.9 9068.03 8763.38
9002.55 8814.57 8864.37 8823.38 8856.53
8831.87 8614.2 8851.8 8697.95 8952.61
8711.42 8683.05 8912.46 8763.43 8755.46
8718.52 9060.99 8932.78 8812.21 8834.16
8825.66 8653.1 8801.54 8859.38 8665.22
8906.53 8957.47 8860.75 8777.11 8759.25
8845.62 9030.77 8915.02 8858.34 8676.31
8819.07 9009.46 8837.26 8762.6 8834.04
7046.69 8719.74 8610.55 8890.17 8839.04
9646.3 8775.46 8739.86 8720.51 9064.7
8947.07 8705.96 8704.77 8867.14 8880.16
Average execution time: 8842.67us
Normalized execution time (T/N): 552.667us/sample
Normalized by complexity (T/N lg N): 138.167
Mflops (5 N lg N/T): 0.0361882
Average execution time: 8842.67us
Minimum execution time: 7046.69us
Normalized average execution time (T/N): 552.667us/sample
Normalized minimum execution time (T/N): 440.418us/sample
Average time normalized by complexity (T/N lg N): 138.167
Minimum time normalized by complexity (T/N lg N): 110.105
Average Mflops (5 N lg N/T): 0.0361882
Peak Mflops (5 N lg N/T): 0.0454114
---
Warming up...
Run timings, GPU-local (in us):
8263.18 8381.39 8462.2 8356.22 8373.54
8503.47 8716.67 8385.77 8394.17 8419.64
8659.13 8294.88 8407.95 8567.22 8493.25
8384.13 8477.74 8508.42 8552.66 8398.76
8761.34 8573.63 8430.25 8437 8615.68
8464.32 8483.02 8540.84 8564.65 8566.38
8503.04 8614.77 8437.5 8545.99 8401.69
8442.15 8832.88 8638.04 8456.14 8492.51
8693.16 8371.29 8350.92 8427.35 8414.12
8851.89 8438.03 8443.12 8503.04 8665.21
8719.99 8375.58 8501.07 8526.01 8325.1
8614.5 8433.29 8432.5 8532.22 8529.62
8481.02 8251.49 8543.71 8523.21 8422.35
8640.62 8603.52 8661.46 8479.36 8548.6
8649.6 8542.74 8373.39 8379.29 8413.56
8598.13 8549.43 8460.99 8544.15 8515.79
8576.4 8485.85 8558.77 8380.95 8520.18
8764.88 8403.96 8483.77 8752.86 7361.6
8661.36 8332.67 8480.45 8310.8 8649.39
8708.75 8560.87 8488.33 8491.4 8473.15
Average execution time: 8495.79us
Minimum execution time: 7361.6us
Normalized average execution time (T/N): 530.987us/sample
Normalized minimum execution time (T/N): 460.1us/sample
Average time normalized by complexity (T/N lg N): 132.747
Minimum time normalized by complexity (T/N lg N): 115.025
BenchFFT average Mflops (5 N lg N/T): 0.0376657
BenchFFT peak Mflops (5 N lg N/T): 0.0434688
Residuals (compare with inverse):
Average absolute: 1.26059e-008
Maximum absolute: 5.96046e-008
Average relative: -1.#IND
Maximum relative: 1.#INF
-----------------------------------------------
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 13.7757ms
Overall average execution time: 13.7762ms
Minimum execution time: 13.2051ms
Average Mflops: 380.589
Peak Mflops: 397.035
Run timings, GPU-local (in ms):
Average execution time: 12.1273ms
Overall average execution time: 12.1279ms
Minimum execution time: 11.7326ms
Average Mflops: 432.32
Peak Mflops: 446.865
Both tests end with a memory read error.
The OS is Windows XP Pro 32-bit; .NET 2.0 is not installed.
I'll look into the errors later, once work is over.
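The derived 1D figures follow from the formulas printed in the output labels. Here N = 16, so lg N = 4, and T is in microseconds, which makes 5 N lg N / T come out directly in Mflops (flops per microsecond). The Python sketch below recomputes them from Freddy's reported average and minimum "to and from host" times. The helper name `fft_metrics` is hypothetical and is not part of the RapidMind benchmark.

```python
import math

def fft_metrics(n, t_us):
    """Recompute the benchmark's derived 1D FFT figures.

    n:    transform length (a power of two)
    t_us: execution time in microseconds
    """
    lg_n = math.log2(n)
    return {
        "T/N": t_us / n,                # microseconds per sample
        "T/N lg N": t_us / (n * lg_n),  # time normalized by complexity
        "Mflops": 5 * n * lg_n / t_us,  # flops per microsecond == Mflops
    }

# Freddy's 8800 GTS, "to and from host" run, N = 16
avg = fft_metrics(16, 8842.67)
best = fft_metrics(16, 7046.69)

for label, metrics in (("average", avg), ("minimum", best)):
    print(label, {k: round(v, 7) for k, v in metrics.items()})
```

With these inputs the sketch reproduces the 552.667 us/sample, 138.167 and 0.0361882 Mflops printed for the average. At N = 16 a transform is only 5 × 16 × 4 = 320 flops, so the roughly 9 ms per run presumably reflects fixed per-call overhead rather than FFT arithmetic.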
Devaster:
For the G80 a CUDA version would be better. I can search my home computer for some apps by Hans Dorn; he built some test apps based on CUDA.
WR-HW95:
With an 8800 GTX at 612/975 (core/memory clocks in MHz):
--- Code: ---
C:\Release-vc8>fft.exe
min_n = 4
max_n = 4
RapidMind FFT Benchmark
-----------------------------------------------
Length: 16 = 2^4
Warming up...
Run timings, to and from host (in us):
11561.3 10482.5 8229.39 12829.6 8740.71
9539.26 9745.74 10875.1 11149.2 9760.27
12356 8845.49 11541.2 8558.26 9808.89
9916.74 9238.06 9773.12 8477.23 7909.47
11607.7 10333.6 7918.13 11377.5 7920.09
10473.6 8454.32 9801.9 10972.9 10767
9267.11 11145.3 9876.5 9839.62 13427.2
8664.71 10973.7 11119.3 9176.86 9062.31
9811.68 8923.72 7202.85 9036.6 9994.13
8747.42 10002.8 10443.1 9761.39 9866.44
10177.1 10808.3 8371.89 10052 9621.96
10266 11904.4 9640.12 9375.24 8899.69
9294.78 10726.2 6828.72 12483.1 9911.99
12466.6 8385.58 7925.68 10416.3 9766.97
9917.02 11196.4 9642.64 10324.1 11035.8
9518.3 8512.15 10829 9727.86 12404.3
10707.5 10192.5 10868.4 7899.13 9340.32
8048.62 7750.77 11226.9 8889.35 9273.54
7777.87 7842.69 7471.92 8830.4 10697.4
11466.3 8701.59 8419.39 7942.44 9761.11
Average execution time: 9788.45us
Normalized execution time (T/N): 611.778us/sample
Normalized by complexity (T/N lg N): 152.945
Mflops (5 N lg N/T): 0.0326916
Average execution time: 9788.45us
Minimum execution time: 6828.72us
Normalized average execution time (T/N): 611.778us/sample
Normalized minimum execution time (T/N): 426.795us/sample
Average time normalized by complexity (T/N lg N): 152.945
Minimum time normalized by complexity (T/N lg N): 106.699
Average Mflops (5 N lg N/T): 0.0326916
Peak Mflops (5 N lg N/T): 0.0468609
---
Warming up...
Run timings, GPU-local (in us):
10815.9 11730.4 7816.99 7627.83 9804.42
9321.6 9801.34 9725.06 7585.92 9003.07
9982.68 6766.42 10917.9 8505.45 7894.38
10349.5 8926.79 11731.8 7668.62 8905.56
11206.2 9771.44 11598.2 8679.8 9933.78
9116.51 8855.83 9696 9815.87 8695.17
12109.5 9716.4 8787.65 8662.48 8444.54
7717.24 8718.36 9792.96 10747.7 9169.6
11555.5 8955.85 9709.7 6659.12 10377.2
9286.95 10160.9 11761.7 8587.87 12249.8
8761.67 10833.5 9495.95 7892.71 9270.47
9678.68 10709.1 9684.55 7819.5 10225.5
8822.58 12600.2 8660.8 8996.09 11010.3
6783.74 10320.5 10069.9 9703.83 10450.1
7650.74 10810.8 10639.8 9755.24 11815.3
8054.21 7740.15 10277.5 10128.5 10209.3
6895.78 7671.42 9653.26 9822.86 12298.4
10547.4 7820.62 7712.77 6761.39 8859.18
7419.95 8623.08 7702.71 8842.41 9383.91
9820.06 7636.21 8563.29 9718.36 8473.6
Average execution time: 9385.19us
Minimum execution time: 6659.12us
Normalized average execution time (T/N): 586.574us/sample
Normalized minimum execution time (T/N): 416.195us/sample
Average time normalized by complexity (T/N lg N): 146.644
Minimum time normalized by complexity (T/N lg N): 104.049
BenchFFT average Mflops (5 N lg N/T): 0.0340963
BenchFFT peak Mflops (5 N lg N/T): 0.0480544
Residuals (compare with inverse):
Average absolute: 1.26059e-008
Maximum absolute: 5.96046e-008
Average relative: -1.#IND
Maximum relative: 1.#INF
-----------------------------------------------
--- End code ---
--- Code: ---
C:\Release-vc8>fft2d.exe
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 15.6239ms
Overall average execution time: 15.6285ms
Minimum execution time: 13.4389ms
Average Mflops: 335.568
Peak Mflops: 390.126
Run timings, GPU-local (in ms):
Average execution time: 13.8474ms
Overall average execution time: 13.851ms
Minimum execution time: 10.7656ms
Average Mflops: 378.619
Peak Mflops: 487.004
--- End code ---
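The 2D operation count matches the same 5 N lg N convention applied to all N = 256 × 256 = 65536 points: 5 × 65536 × 16 = 5,242,880, which is the 5.24288e+006 printed above. Dividing that count by the time in microseconds again gives Mflops. The sketch below assumes exactly that convention, and the helpers `fft2d_flops` and `mflops` are hypothetical. It checks the convention against WR-HW95's first fft2d run.

```python
import math

def fft2d_flops(rows, cols):
    """Nominal flop count for a rows x cols FFT: 5 N lg N with N = rows * cols."""
    n = rows * cols
    return 5 * n * math.log2(n)

def mflops(flops, t_ms):
    """Convert a time in milliseconds to Mflops (flops per microsecond)."""
    return flops / (t_ms * 1000.0)

flops = fft2d_flops(256, 256)
print(f"total flops: {flops:.6g}")

# WR-HW95's fft2d run with BOINC active (times in ms)
runs = {
    "host average": 15.6239,
    "host minimum": 13.4389,
    "GPU-local average": 13.8474,
    "GPU-local minimum": 10.7656,
}
for label, t_ms in runs.items():
    print(f"{label}: {mflops(flops, t_ms):.3f} Mflops")
```

Because lg 65536 = 16 = lg 256 + lg 256, counting the whole array at once gives the same total as counting the row and column passes separately.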
It looks like this depends quite a bit on CPU speed too. The runs above were done with two Rosetta tasks running on an Opteron 175 at 3.05 GHz.
I suspended BOINC and ran fft2d again:
--- Code: ---
C:\Release-vc8>fft2d.exe
RapidMind 2D FFT Benchmark
===============================================
Size: 256 x 256 = 2^8 x 2^8
Radix: 4 = 2^2
Total number of floating point operations: 5.24288e+006
Run timings, to and from host (in ms):
Average execution time: 14.0743ms
Overall average execution time: 14.0783ms
Minimum execution time: 13.1137ms
Average Mflops: 372.515
Peak Mflops: 399.801
Run timings, GPU-local (in ms):
Average execution time: 12.3266ms
Overall average execution time: 12.3304ms
Minimum execution time: 10.2948ms
Average Mflops: 425.332
Peak Mflops: 509.276
--- End code ---
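One way to put a number on the CPU-load effect is to compare the two fft2d runs directly. The sketch below computes the percentage change in the average times and the gap between the "to and from host" and "GPU-local" averages. It treats that gap as a rough estimate of host-transfer overhead, which assumes the two modes differ only by the host transfers. The helper `pct_change` is hypothetical.

```python
def pct_change(before, after):
    """Percentage change from before to after (negative means faster)."""
    return 100.0 * (after - before) / before

# Average fft2d times in ms, from WR-HW95's two runs
with_boinc = {"host": 15.6239, "gpu_local": 13.8474}
boinc_suspended = {"host": 14.0743, "gpu_local": 12.3266}

for key in with_boinc:
    change = pct_change(with_boinc[key], boinc_suspended[key])
    print(f"{key}: {with_boinc[key]} ms -> {boinc_suspended[key]} ms ({change:+.1f}%)")

# Host round trip minus GPU-local time: a rough estimate of transfer overhead
for name, run in (("with BOINC", with_boinc), ("BOINC suspended", boinc_suspended)):
    print(f"{name}: host minus GPU-local = {run['host'] - run['gpu_local']:.4f} ms")
```

On these numbers both averages drop by roughly 10 to 11 percent with BOINC suspended, while the host-minus-GPU-local gap stays near 1.75 to 1.78 ms. Two runs are thin evidence, so this is only a rough indication.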
pepperammi:
--- Quote from: Devaster on 21 Feb 2007, 05:36:35 am ---
for G80 is better a CUDA version , i may search on my home computer some apps by Hans Dorn - he had builded some test apps based on CUDA ...
--- End quote ---
I hear the 8900 series will have 25% more shaders or so, still on the G80 chip; apparently they were there all along. Would that mean anything for all this?
I wonder whether we'll be able to unlock them, as I think was possible with some older ATI cards at some point.
Devaster:
As I wrote, for older cards BrookGPU or RapidMind is the better choice;
for new cards, CUDA (NVIDIA) or CTM (ATI) is better.