[Split] PowerSpectrum Unit Test
Claggy:
My 128 MB 8400M GS on Vista32, and Merry Christmas:
Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 1.2 GFlops 2.1 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 1.4 GFlops 1.8 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 1.4 GFlops 1.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 2.3 GFlops 2.2 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 3.6 GFlops 2.9 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 4.7 GFlops 3.4 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 5.6 GFlops 3.7 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 5.5 GFlops 3.2 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 5.5 GFlops 3.0 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 5.3 GFlops 2.6 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 4.7 GFlops 2.2 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 4.4 GFlops 1.9 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 5.0 GFlops 2.0 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 5.2 GFlops 2.0 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 5.5 GFlops 2.0 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 1.6 GFlops 2.8 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 1.9 GFlops 2.6 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 2.3 GFlops 2.5 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 3.1 GFlops 2.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 3.6 GFlops 3.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 4.8 GFlops 3.5 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 5.8 GFlops 3.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 5.6 GFlops 3.3 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 5.7 GFlops 3.1 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 5.3 GFlops 2.7 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 4.8 GFlops 2.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 4.4 GFlops 1.9 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 5.0 GFlops 2.0 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 5.2 GFlops 2.0 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 5.5 GFlops 2.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
and 9800GTX+ on Win 7 x64:
Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 8.1 GFlops 14.3 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 12.6 GFlops 17.2 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 16.6 GFlops 18.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 28.7 GFlops 27.1 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 42.1 GFlops 34.6 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 55.5 GFlops 40.4 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 68.2 GFlops 44.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 72.3 GFlops 42.9 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 70.7 GFlops 38.4 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 66.1 GFlops 33.1 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 64.2 GFlops 29.8 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 60.7 GFlops 26.2 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 56.1 GFlops 22.7 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 62.0 GFlops 23.6 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 63.2 GFlops 22.7 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 11.1 GFlops 19.6 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 19.4 GFlops 26.4 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 27.5 GFlops 30.7 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 40.8 GFlops 38.6 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 48.9 GFlops 40.2 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 64.2 GFlops 46.8 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 79.3 GFlops 51.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 82.7 GFlops 49.0 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 79.9 GFlops 43.3 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 74.3 GFlops 37.2 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 71.6 GFlops 33.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 66.9 GFlops 28.9 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 61.4 GFlops 24.9 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 68.0 GFlops 25.9 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 69.3 GFlops 24.9 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Claggy
glennaxl:
-device 0
--- Code: ---
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 17.3 GFlops 30.4 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 23.2 GFlops 31.7 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 27.2 GFlops 30.4 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 43.8 GFlops 41.5 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 60.7 GFlops 49.9 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 75.6 GFlops 55.1 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 91.6 GFlops 59.9 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 92.1 GFlops 54.6 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 96.9 GFlops 52.6 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 93.1 GFlops 46.6 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 98.7 GFlops 45.8 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 96.1 GFlops 41.6 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 96.5 GFlops 39.1 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536) 88.2 GFlops 33.6 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072) 94.4 GFlops 33.9 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM( 8) 25.0 GFlops 44.0 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 37.1 GFlops 50.6 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 49.8 GFlops 55.6 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 68.5 GFlops 64.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 81.4 GFlops 67.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 94.6 GFlops 68.9 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 115.9 GFlops 75.7 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 122.4 GFlops 72.6 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 124.9 GFlops 67.7 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 113.9 GFlops 57.0 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 120.5 GFlops 55.9 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 121.6 GFlops 52.6 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 120.1 GFlops 48.7 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536) 103.7 GFlops 39.5 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 111.2 GFlops 40.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
--- End code ---
-device 1
--- Code: ---
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 16.3 GFlops 28.7 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 22.9 GFlops 31.3 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 26.3 GFlops 29.3 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 42.1 GFlops 39.8 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 63.2 GFlops 52.0 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 75.0 GFlops 54.6 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 89.7 GFlops 58.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 92.9 GFlops 55.1 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 96.6 GFlops 52.4 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 87.3 GFlops 43.7 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 49.6 GFlops 23.0 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 98.6 GFlops 42.6 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 97.1 GFlops 39.3 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536) 85.5 GFlops 32.6 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072) 91.4 GFlops 32.9 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM( 8) 24.5 GFlops 43.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 36.4 GFlops 49.7 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 48.8 GFlops 54.5 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 67.0 GFlops 63.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 79.6 GFlops 65.5 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 92.7 GFlops 67.5 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 113.9 GFlops 74.4 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 118.9 GFlops 70.5 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 122.9 GFlops 66.7 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 111.8 GFlops 55.9 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 117.7 GFlops 54.6 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 118.7 GFlops 51.3 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 117.7 GFlops 47.7 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536) 101.2 GFlops 38.5 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 108.6 GFlops 39.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
--- End code ---
-device 2
--- Code: ---
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 16.5 GFlops 29.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 23.1 GFlops 31.5 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 25.3 GFlops 28.3 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 41.3 GFlops 39.1 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 61.6 GFlops 50.7 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 72.0 GFlops 52.5 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 87.7 GFlops 57.3 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 94.5 GFlops 56.0 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 96.7 GFlops 52.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 90.5 GFlops 45.2 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 95.0 GFlops 44.1 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 95.0 GFlops 41.1 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 91.2 GFlops 36.9 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536) 83.6 GFlops 31.8 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072) 90.6 GFlops 32.6 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM( 8) 24.1 GFlops 42.4 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 35.3 GFlops 48.2 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 47.1 GFlops 52.6 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 64.9 GFlops 61.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 77.0 GFlops 63.3 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 89.2 GFlops 65.0 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 110.0 GFlops 71.9 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 118.1 GFlops 70.0 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 118.8 GFlops 64.5 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 110.6 GFlops 55.3 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 116.2 GFlops 53.9 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 116.1 GFlops 50.2 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 108.7 GFlops 44.0 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536) 97.8 GFlops 37.2 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 108.3 GFlops 38.9 GB/s ulps(fft 3.6,ps 6590.4) [OK]
--- End code ---
Jason G:
oooh, now my eyes have gone funny ;D
@All: Thanks very much and Merry Christmas!
Summary of what I can see:
- The newer and bigger the card, the more performance we seem to be able to extract.
- The Opt1 FFT (worst case) pipeline is not slower than stock at any FFT size on any GPU so far (even the small GPUs).
- Seems stable, [OK] on all.
- The 200 series is holding in there.
- The Fermi peak is starting to push unexpectedly high this early (but still ~50% of theoretical; will need to try streaming next, as planned).
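For anyone who wants to check the "not slower than stock" point against their own log, here is a minimal sketch that parses result lines and prints the Opt1/Stock GFlops ratio per FFT size. The regular expression and the helper names (`parse_section`, `speedups`) are assumptions made for illustration and are not part of the test program; the sample lines are copied from the GTX 295 (-device 0) run in this thread.

```python
import re

# Matches one result line of the unit-test output, for example:
# "FFT+PS+SM( 512) 91.6 GFlops 59.9 GB/s ulps(fft 2.1,ps 5128.1) [OK]"
# The pattern is an assumption based on the logs posted in this thread.
LINE_RE = re.compile(
    r"FFT\+PS\+SM\(\s*(\d+)\)\s+([\d.]+) GFlops\s+([\d.]+) GB/s"
)


def parse_section(lines):
    """Return {fft_size: gflops} for the result lines of one section."""
    results = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            results[int(match.group(1))] = float(match.group(2))
    return results


def speedups(stock_lines, opt1_lines):
    """Return {fft_size: opt1_gflops / stock_gflops} for sizes in both."""
    stock = parse_section(stock_lines)
    opt1 = parse_section(opt1_lines)
    return {n: opt1[n] / stock[n] for n in sorted(stock) if n in opt1}


# Three sizes from the GTX 295 (-device 0) log in this thread.
stock = [
    "FFT+PS+SM( 8) 17.3 GFlops 30.4 GB/s ulps(fft 1.3,ps 4775.9) [OK]",
    "FFT+PS+SM( 1024) 92.1 GFlops 54.6 GB/s ulps(fft 2.5,ps 5552.5) [OK]",
    "FFT+PS+SM(131072) 94.4 GFlops 33.9 GB/s ulps(fft 3.6,ps 6694.2) [OK]",
]
opt1 = [
    "FFT+PS+SM( 8) 25.0 GFlops 44.0 GB/s ulps(fft 1.3,ps 4637.5) [OK]",
    "FFT+PS+SM( 1024) 122.4 GFlops 72.6 GB/s ulps(fft 2.5,ps 5438.0) [OK]",
    "FFT+PS+SM(131072) 111.2 GFlops 40.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]",
]

ratios = speedups(stock, opt1)
for size, ratio in ratios.items():
    print(f"{size:>6}: {ratio:.2f}x")
```

A ratio of 1.0 or more at every size means Opt1 is at least as fast as stock for that card. To check a full run, pass all the lines under "Stock:" and all the lines under "Opt1" instead of the three-line samples.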
I reckon we're getting a good start toward optimising multibeam now. With the FFT, power spectrum and summax reductions covered, we account for roughly 40-50% of processing (depending on angle range). With a few more refinements in this area (mainly streaming, and findspikes itself, still to try) we should be ready to tackle the more challenging areas that remain (and dominate).
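To put the 40-50% figure in perspective, here is a rough sketch using Amdahl's law, which is brought in here for illustration and is not something the post itself works through. The function name `overall_speedup` and the 1.3x section speedup are assumed values, not measurements of the full application.

```python
def overall_speedup(fraction, section_speedup):
    """Amdahl's law: whole-task speedup when `fraction` of the runtime
    is accelerated by `section_speedup` and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if section_speedup <= 0.0:
        raise ValueError("section_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / section_speedup)


# fraction: the 40-50% share of processing quoted above.
# 1.3x: an illustrative speedup for that share, not a measured value.
for fraction in (0.40, 0.50):
    print(f"{fraction:.0%} covered at 1.3x: "
          f"{overall_speedup(fraction, 1.3):.3f}x overall")
```

Even an unlimited speedup of a 50% share caps the overall gain at 2x, which is why the remaining areas still matter.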
Long road still to travel, but I reckon we've managed to nail a few key techniques that will help dramatically with certain problem areas down the road.
Cheers, I'm off to take a short Christmas break before going through all that with a fine-tooth comb.
Jason
_heinz:
Merry Christmas!
Thank you for the Christmas 2010 edition ;)
PowerSpectrumTest9.exe -device 0
Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 10.5 GFlops 18.6 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 16.6 GFlops 22.6 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 21.6 GFlops 24.2 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 36.0 GFlops 34.1 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 52.7 GFlops 43.3 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 69.5 GFlops 50.6 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 94.6 GFlops 61.8 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 107.8 GFlops 63.9 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 118.0 GFlops 64.0 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 125.2 GFlops 62.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 131.7 GFlops 61.1 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 113.8 GFlops 49.2 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 121.3 GFlops 49.1 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 121.6 GFlops 46.3 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 100.4 GFlops 36.1 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 21.7 GFlops 38.3 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 37.7 GFlops 51.4 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 55.7 GFlops 62.1 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 73.3 GFlops 69.4 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 75.4 GFlops 62.0 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 106.5 GFlops 77.6 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 132.7 GFlops 86.7 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 163.9 GFlops 97.2 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 179.4 GFlops 97.3 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 183.0 GFlops 91.5 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 179.3 GFlops 83.2 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 161.0 GFlops 69.6 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 163.6 GFlops 66.3 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 165.4 GFlops 63.0 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 146.7 GFlops 52.7 GB/s ulps(fft 2.7,ps 5392.8) [OK]
PowerSpectrumTest9.exe -device 1
Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 11.7 GFlops 20.6 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 19.0 GFlops 26.0 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 21.7 GFlops 24.2 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 36.1 GFlops 34.2 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 52.7 GFlops 43.3 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 69.7 GFlops 50.8 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 90.4 GFlops 59.1 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 99.8 GFlops 59.2 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 109.7 GFlops 59.5 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 117.8 GFlops 58.9 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 126.7 GFlops 58.8 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 113.9 GFlops 49.2 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 121.2 GFlops 49.1 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 121.5 GFlops 46.3 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 99.9 GFlops 35.9 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 21.8 GFlops 38.5 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 37.8 GFlops 51.6 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 55.9 GFlops 62.4 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 73.6 GFlops 69.7 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 75.7 GFlops 62.3 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 107.0 GFlops 77.9 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 133.3 GFlops 87.1 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 164.6 GFlops 97.6 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 180.0 GFlops 97.6 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 183.0 GFlops 91.5 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 179.7 GFlops 83.3 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 162.1 GFlops 70.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 164.3 GFlops 66.6 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 165.7 GFlops 63.1 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 147.5 GFlops 53.0 GB/s ulps(fft 2.7,ps 5392.8) [OK]
.
Done
PowerSpectrumTest9.exe -device 0
Device: ION, 1161 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 1.2 GFlops 2.0 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 1.6 GFlops 2.2 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 1.6 GFlops 1.8 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 2.7 GFlops 2.6 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 3.9 GFlops 3.2 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 5.1 GFlops 3.7 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 6.1 GFlops 4.0 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 5.9 GFlops 3.5 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 6.2 GFlops 3.4 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 5.2 GFlops 2.6 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 5.1 GFlops 2.4 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 4.9 GFlops 2.1 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 5.1 GFlops 2.1 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 5.3 GFlops 2.0 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 5.6 GFlops 2.0 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 1.9 GFlops 3.3 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 2.4 GFlops 3.2 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 2.8 GFlops 3.1 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 3.8 GFlops 3.6 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 4.2 GFlops 3.5 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 5.4 GFlops 3.9 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 6.6 GFlops 4.3 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 6.3 GFlops 3.7 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 6.6 GFlops 3.6 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 5.6 GFlops 2.8 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 5.4 GFlops 2.5 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 5.2 GFlops 2.2 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 5.4 GFlops 2.2 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 5.6 GFlops 2.1 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 5.8 GFlops 2.1 GB/s ulps(fft 3.6,ps 6590.4) [OK]
.
Done
perryjay:
Here's mine, Merry Christmas
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\perry>cd\test
C:\test>powerspectrumtest9.exe
Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 1.2 GFlops 2.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 1.3 GFlops 1.8 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 2.0 GFlops 2.2 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 2.7 GFlops 2.5 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 3.8 GFlops 3.2 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 5.2 GFlops 3.8 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 3.1 GFlops 2.0 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 5.7 GFlops 3.3 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 6.5 GFlops 3.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 5.5 GFlops 2.8 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 5.9 GFlops 2.7 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 4.7 GFlops 2.0 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 6.1 GFlops 2.5 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 5.8 GFlops 2.2 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 7.0 GFlops 2.5 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 3.5 GFlops 6.1 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 5.4 GFlops 7.4 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 6.1 GFlops 6.8 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 8.9 GFlops 8.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 10.2 GFlops 8.4 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 12.2 GFlops 8.9 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 15.5 GFlops 10.2 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 17.0 GFlops 10.1 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 18.1 GFlops 9.8 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 12.9 GFlops 6.5 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 14.3 GFlops 6.7 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 14.6 GFlops 6.3 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 12.4 GFlops 5.0 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 13.6 GFlops 5.2 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 13.9 GFlops 5.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
C:\test>