Phew! Cool, thanks. Not much headroom on that chip either, but I'll be happy with that small fractional improvement on the oldest cards for now. Moving on to test #9 soon: I'll add in the FFTs, then stream the test kernels after that, just to see what that does... Progress at last.
You're quite welcome. What exactly do you mean by 'not much headroom on that chip'?
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  12.7 GFlops  22.4 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM(    16)  20.6 GFlops  28.1 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM(    32)  25.1 GFlops  28.0 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM(    64)  43.1 GFlops  40.8 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM(   128)  63.7 GFlops  52.4 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM(   256)  85.6 GFlops  62.4 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM(   512) 114.2 GFlops  74.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM(  1024) 136.7 GFlops  81.0 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM(  2048) 149.3 GFlops  81.0 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM(  4096) 154.1 GFlops  77.1 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM(  8192) 156.2 GFlops  72.4 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 149.2 GFlops  64.5 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 155.5 GFlops  63.0 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 152.0 GFlops  57.9 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 134.7 GFlops  48.4 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM(     8)  19.2 GFlops  33.8 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM(    16)  37.0 GFlops  50.5 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM(    32)  61.1 GFlops  68.2 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM(    64)  86.9 GFlops  82.2 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM(   128)  93.4 GFlops  76.8 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM(   256) 137.0 GFlops  99.8 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM(   512) 174.8 GFlops 114.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM(  1024) 218.7 GFlops 129.6 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM(  2048) 231.2 GFlops 125.4 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM(  4096) 236.8 GFlops 118.4 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM(  8192) 229.0 GFlops 106.2 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 223.9 GFlops  96.8 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 216.0 GFlops  87.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 214.0 GFlops  81.5 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 205.0 GFlops  73.7 GB/s ulps(fft 2.7,ps 5392.8) [OK]
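For anyone wanting to compare Stock against Opt1 without eyeballing the columns, here is a rough Python sketch that pulls the `FFT+PS+SM(...)` rows out of a pasted log and prints the Opt1/Stock GFlops ratio per FFT size. The function names (`parse_rows`, `speedups`) are made up for this illustration and are not part of the test program; the sample rows are copied from the GTX 480 run above.

```python
import re

# Matches one result row, e.g. "FFT+PS+SM( 1024) 136.7 GFlops 81.0 GB/s".
ROW = re.compile(r"FFT\+PS\+SM\(\s*(\d+)\s*\)\s+([\d.]+) GFlops\s+([\d.]+) GB/s")

def parse_rows(text):
    """Return {fft_size: (gflops, gb_per_s)} for every row found in text."""
    return {int(n): (float(gf), float(gb)) for n, gf, gb in ROW.findall(text)}

def speedups(log):
    """Split a log at the 'Opt1' header and return {fft_size: opt1/stock GFlops}."""
    stock_text, _, opt1_text = log.partition("Opt1")
    stock, opt1 = parse_rows(stock_text), parse_rows(opt1_text)
    return {n: opt1[n][0] / stock[n][0] for n in stock if n in opt1}

# Three rows per section, copied from the GTX 480 results above.
sample = (
    "Stock: FFT+PS+SM( 8) 12.7 GFlops 22.4 GB/s ulps(fft 1.2,ps 4389.0) [OK] "
    "FFT+PS+SM( 1024) 136.7 GFlops 81.0 GB/s ulps(fft 2.1,ps 4725.7) [OK] "
    "FFT+PS+SM(131072) 134.7 GFlops 48.4 GB/s ulps(fft 2.7,ps 5392.0) [OK]"
    "Opt1 (worst case): 256 thrds/block "
    "FFT+PS+SM( 8) 19.2 GFlops 33.8 GB/s ulps(fft 1.2,ps 4324.2) [OK] "
    "FFT+PS+SM( 1024) 218.7 GFlops 129.6 GB/s ulps(fft 2.1,ps 4727.6) [OK] "
    "FFT+PS+SM(131072) 205.0 GFlops 73.7 GB/s ulps(fft 2.7,ps 5392.8) [OK]"
)

for n, s in sorted(speedups(sample).items()):
    print(f"{n:>6}: {s:.2f}x")
```

The regex tolerates both the padded and the whitespace-collapsed form of the rows, so it should work on logs pasted straight from the forum as well as on raw console output.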
Update: Powerspectrum Test #9 (Xmas edition)
- Full FFT processing added
- Tightened peak/average tolerances to 0.001%
- Worst case Opt1 only

Temporary download location: ftp://temp:temp@sinbadsvn.dyndns.org:31469/Jason_PowerSpectrum_Test/PowerSpectrumTest9.7z
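To make the 0.001% figure concrete, here is a minimal Python sketch of a relative-tolerance check. It is only an assumption about what such a comparison could look like: the post does not say how the test actually compares peak and average values, and `within_tolerance` and `REL_TOL` are hypothetical names.

```python
REL_TOL = 0.001 / 100.0  # 0.001 % expressed as a fraction (1e-5)

def within_tolerance(reference, measured, rel_tol=REL_TOL):
    """True if measured differs from reference by at most rel_tol, relatively."""
    if reference == 0.0:
        # Relative error is undefined at zero; this sketch requires an exact match.
        return measured == 0.0
    return abs(measured - reference) <= rel_tol * abs(reference)

# A reference of 1000.0 allows an absolute deviation of about 0.01.
print(within_tolerance(1000.0, 1000.005))  # inside the band
print(within_tolerance(1000.0, 1000.02))   # outside the band
```

At this tolerance a value near 1000 may be off by roughly 0.01 before the check fails, which gives a feel for how tight the setting is in single precision.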
Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  10.6 GFlops  18.7 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM(    16)  16.5 GFlops  22.5 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM(    32)  16.9 GFlops  18.9 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM(    64)  29.0 GFlops  27.4 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM(   128)  43.4 GFlops  35.7 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM(   256)  57.8 GFlops  42.1 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM(   512)  77.4 GFlops  50.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM(  1024)  92.9 GFlops  55.1 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM(  2048)  99.7 GFlops  54.1 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM(  4096) 101.1 GFlops  50.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM(  8192) 103.9 GFlops  48.2 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 103.1 GFlops  44.6 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 104.6 GFlops  42.4 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 102.4 GFlops  39.0 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072)  93.8 GFlops  33.7 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM(     8)  20.5 GFlops  36.2 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM(    16)  33.7 GFlops  45.9 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM(    32)  47.3 GFlops  52.8 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM(    64)  60.0 GFlops  56.8 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM(   128)  59.0 GFlops  48.5 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM(   256)  85.8 GFlops  62.5 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM(   512) 109.0 GFlops  71.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM(  1024) 133.7 GFlops  79.3 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM(  2048) 136.9 GFlops  74.3 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM(  4096) 141.5 GFlops  70.7 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM(  8192) 136.7 GFlops  63.4 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 141.3 GFlops  61.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 134.9 GFlops  54.6 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 132.6 GFlops  50.5 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 130.5 GFlops  46.9 GB/s ulps(fft 2.7,ps 5392.8) [OK]
Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   9.5 GFlops  16.7 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM(    16)  14.4 GFlops  19.7 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM(    32)  13.8 GFlops  15.4 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM(    64)  24.2 GFlops  22.9 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM(   128)  36.9 GFlops  30.4 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM(   256)  49.9 GFlops  36.3 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM(   512)  70.7 GFlops  46.2 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM(  1024)  90.4 GFlops  53.6 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM(  2048) 102.7 GFlops  55.7 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM(  4096) 111.2 GFlops  55.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM(  8192)  97.5 GFlops  45.2 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384)  93.4 GFlops  40.4 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 100.6 GFlops  40.7 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 106.9 GFlops  40.7 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072)  86.9 GFlops  31.3 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM(     8)  16.5 GFlops  29.1 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM(    16)  27.2 GFlops  37.1 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM(    32)  38.4 GFlops  42.9 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM(    64)  49.9 GFlops  47.2 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM(   128)  45.0 GFlops  37.0 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM(   256)  64.5 GFlops  47.0 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM(   512)  82.9 GFlops  54.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM(  1024) 108.0 GFlops  64.0 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM(  2048) 123.3 GFlops  66.9 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM(  4096) 132.9 GFlops  66.4 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM(  8192) 111.0 GFlops  51.5 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 107.2 GFlops  46.3 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 111.4 GFlops  45.1 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 117.4 GFlops  44.7 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072)  95.6 GFlops  34.4 GB/s ulps(fft 2.7,ps 5392.8) [OK]
And the 460-768...
Rehosting of the test on a faster connection.
http://www.arkayn.us/seti/PowerSpectrumTest9.7z
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  21.2 GFlops  37.3 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM(    16)  30.5 GFlops  41.6 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM(    32)  30.7 GFlops  34.2 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM(    64)  50.3 GFlops  47.6 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM(   128)  73.0 GFlops  60.0 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM(   256)  92.7 GFlops  67.5 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM(   512) 125.8 GFlops  82.2 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM(  1024) 149.6 GFlops  88.7 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM(  2048) 163.0 GFlops  88.4 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM(  4096) 168.5 GFlops  84.2 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM(  8192) 170.0 GFlops  78.8 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 157.2 GFlops  68.0 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 167.4 GFlops  67.8 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 164.6 GFlops  62.7 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 141.9 GFlops  51.0 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM(     8)  37.4 GFlops  65.9 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM(    16)  58.9 GFlops  80.4 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM(    32)  81.7 GFlops  91.2 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM(    64) 102.4 GFlops  96.9 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM(   128) 100.5 GFlops  82.7 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM(   256) 142.2 GFlops 103.6 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM(   512) 177.3 GFlops 115.9 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM(  1024) 218.1 GFlops 129.3 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM(  2048) 233.4 GFlops 126.6 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM(  4096) 238.4 GFlops 119.2 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM(  8192) 229.6 GFlops 106.5 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 217.5 GFlops  94.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 213.6 GFlops  86.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 213.2 GFlops  81.2 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 198.0 GFlops  71.2 GB/s ulps(fft 2.7,ps 5392.8) [OK]
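On the `ulps(fft ..., ps ...)` columns: these report error in units in the last place. The thread doesn't say how the test derives its figures (the fractional values hint at some kind of averaging, but that is a guess), so the following is only a generic Python sketch of measuring the ULP distance between two single-precision values. `float32_bits` and `ulp_distance` are illustrative names, not functions from the test.

```python
import struct

def float32_bits(x):
    """Round x to float32 and map its bit pattern to an integer whose
    ordering matches the ordering of the floats themselves."""
    (i,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; flip them so that larger
    # magnitudes map to more negative integers. Both zeros map to 0.
    return i if i >= 0 else -(i & 0x7FFFFFFF)

def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(float32_bits(a) - float32_bits(b))

# 1.0 and the next float32 above it are exactly one ULP apart.
print(ulp_distance(1.0, 1.0 + 2.0 ** -23))
```

Averaging such distances between a GPU result and a reference over all output bins would be one way to arrive at a single number per FFT size, but again, that is an assumption about the test rather than something stated here.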
Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   6.9 GFlops  12.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  11.8 GFlops  16.1 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  15.6 GFlops  17.4 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  26.2 GFlops  24.8 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  36.6 GFlops  30.1 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  48.7 GFlops  35.5 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  57.8 GFlops  37.8 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  62.9 GFlops  37.3 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  61.7 GFlops  33.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  57.6 GFlops  28.8 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  56.7 GFlops  26.3 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  52.5 GFlops  22.7 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  50.3 GFlops  20.4 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536)  55.3 GFlops  21.1 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072)  56.9 GFlops  20.5 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM(     8)  14.9 GFlops  26.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  23.3 GFlops  31.8 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  30.5 GFlops  34.0 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  43.2 GFlops  40.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  49.8 GFlops  41.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  64.9 GFlops  47.3 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512)  79.3 GFlops  51.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024)  81.9 GFlops  48.6 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048)  78.1 GFlops  42.4 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096)  73.3 GFlops  36.7 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192)  70.5 GFlops  32.7 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384)  65.7 GFlops  28.4 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768)  60.7 GFlops  24.6 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536)  67.0 GFlops  25.5 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072)  68.5 GFlops  24.6 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   6.6 GFlops  11.6 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  10.5 GFlops  14.3 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  13.0 GFlops  14.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  22.4 GFlops  21.2 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  33.8 GFlops  27.8 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  45.2 GFlops  32.9 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  56.0 GFlops  36.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  57.6 GFlops  34.1 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  57.4 GFlops  31.1 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  50.4 GFlops  25.2 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  48.9 GFlops  22.7 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  46.8 GFlops  20.3 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  42.4 GFlops  17.2 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536)  47.8 GFlops  18.2 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072)  50.5 GFlops  18.1 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM(     8)   9.7 GFlops  17.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  16.0 GFlops  21.9 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  21.5 GFlops  24.0 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  31.1 GFlops  29.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  36.3 GFlops  29.9 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  47.7 GFlops  34.8 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512)  58.6 GFlops  38.3 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024)  59.7 GFlops  35.4 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048)  59.0 GFlops  32.0 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096)  51.9 GFlops  26.0 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192)  50.0 GFlops  23.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384)  47.7 GFlops  20.6 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768)  43.2 GFlops  17.5 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536)  48.7 GFlops  18.6 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072)  51.6 GFlops  18.6 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce GTX 470, 1215 MHz clock, 1280 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   7.9 GFlops  14.0 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM(    16)  14.0 GFlops  19.1 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM(    32)  17.7 GFlops  19.7 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM(    64)  32.4 GFlops  30.7 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM(   128)  51.7 GFlops  42.6 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM(   256)  72.0 GFlops  52.5 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM(   512) 100.4 GFlops  65.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM(  1024) 124.9 GFlops  74.1 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM(  2048) 136.6 GFlops  74.1 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM(  4096) 139.1 GFlops  69.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM(  8192) 141.0 GFlops  65.4 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 132.7 GFlops  57.4 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 137.9 GFlops  55.9 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 134.5 GFlops  51.2 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 116.0 GFlops  41.7 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM(     8)  14.2 GFlops  25.1 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM(    16)  27.2 GFlops  37.1 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM(    32)  43.9 GFlops  49.0 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM(    64)  61.3 GFlops  58.0 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM(   128)  65.6 GFlops  54.0 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM(   256)  95.7 GFlops  69.7 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM(   512) 121.1 GFlops  79.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM(  1024) 153.4 GFlops  91.0 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM(  2048) 161.9 GFlops  87.8 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM(  4096) 168.3 GFlops  84.2 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM(  8192) 157.7 GFlops  73.1 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 155.1 GFlops  67.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 151.9 GFlops  61.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 150.7 GFlops  57.4 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 137.2 GFlops  49.3 GB/s ulps(fft 2.7,ps 5392.8) [OK]
Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   1.2 GFlops   2.1 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)   1.4 GFlops   1.8 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)   1.4 GFlops   1.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)   2.3 GFlops   2.2 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)   3.6 GFlops   2.9 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)   4.7 GFlops   3.4 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)   5.6 GFlops   3.7 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)   5.5 GFlops   3.2 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)   5.5 GFlops   3.0 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)   5.3 GFlops   2.6 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)   4.7 GFlops   2.2 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)   4.4 GFlops   1.9 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)   5.0 GFlops   2.0 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536)   5.2 GFlops   2.0 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072)   5.5 GFlops   2.0 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM(     8)   1.6 GFlops   2.8 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)   1.9 GFlops   2.6 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)   2.3 GFlops   2.5 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)   3.1 GFlops   2.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)   3.6 GFlops   3.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)   4.8 GFlops   3.5 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512)   5.8 GFlops   3.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024)   5.6 GFlops   3.3 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048)   5.7 GFlops   3.1 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096)   5.3 GFlops   2.7 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192)   4.8 GFlops   2.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384)   4.4 GFlops   1.9 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768)   5.0 GFlops   2.0 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536)   5.2 GFlops   2.0 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072)   5.5 GFlops   2.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)   8.1 GFlops  14.3 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  12.6 GFlops  17.2 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  16.6 GFlops  18.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  28.7 GFlops  27.1 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  42.1 GFlops  34.6 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  55.5 GFlops  40.4 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  68.2 GFlops  44.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  72.3 GFlops  42.9 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  70.7 GFlops  38.4 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  66.1 GFlops  33.1 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  64.2 GFlops  29.8 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  60.7 GFlops  26.2 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  56.1 GFlops  22.7 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536)  62.0 GFlops  23.6 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072)  63.2 GFlops  22.7 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM(     8)  11.1 GFlops  19.6 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  19.4 GFlops  26.4 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  27.5 GFlops  30.7 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  40.8 GFlops  38.6 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  48.9 GFlops  40.2 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  64.2 GFlops  46.8 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512)  79.3 GFlops  51.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024)  82.7 GFlops  49.0 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048)  79.9 GFlops  43.3 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096)  74.3 GFlops  37.2 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192)  71.6 GFlops  33.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384)  66.9 GFlops  28.9 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768)  61.4 GFlops  24.9 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536)  68.0 GFlops  25.9 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072)  69.3 GFlops  24.9 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  17.3 GFlops  30.4 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  23.2 GFlops  31.7 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  27.2 GFlops  30.4 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  43.8 GFlops  41.5 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  60.7 GFlops  49.9 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  75.6 GFlops  55.1 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  91.6 GFlops  59.9 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  92.1 GFlops  54.6 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  96.9 GFlops  52.6 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  93.1 GFlops  46.6 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  98.7 GFlops  45.8 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  96.1 GFlops  41.6 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  96.5 GFlops  39.1 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536)  88.2 GFlops  33.6 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072)  94.4 GFlops  33.9 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM(     8)  25.0 GFlops  44.0 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  37.1 GFlops  50.6 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  49.8 GFlops  55.6 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  68.5 GFlops  64.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  81.4 GFlops  67.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  94.6 GFlops  68.9 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512) 115.9 GFlops  75.7 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024) 122.4 GFlops  72.6 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048) 124.9 GFlops  67.7 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096) 113.9 GFlops  57.0 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192) 120.5 GFlops  55.9 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 121.6 GFlops  52.6 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 120.1 GFlops  48.7 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536) 103.7 GFlops  39.5 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 111.2 GFlops  40.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  16.3 GFlops  28.7 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  22.9 GFlops  31.3 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  26.3 GFlops  29.3 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  42.1 GFlops  39.8 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  63.2 GFlops  52.0 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  75.0 GFlops  54.6 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  89.7 GFlops  58.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  92.9 GFlops  55.1 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  96.6 GFlops  52.4 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  87.3 GFlops  43.7 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  49.6 GFlops  23.0 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  98.6 GFlops  42.6 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  97.1 GFlops  39.3 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536)  85.5 GFlops  32.6 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072)  91.4 GFlops  32.9 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM(     8)  24.5 GFlops  43.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  36.4 GFlops  49.7 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  48.8 GFlops  54.5 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  67.0 GFlops  63.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  79.6 GFlops  65.5 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  92.7 GFlops  67.5 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512) 113.9 GFlops  74.4 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024) 118.9 GFlops  70.5 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048) 122.9 GFlops  66.7 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096) 111.8 GFlops  55.9 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192) 117.7 GFlops  54.6 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 118.7 GFlops  51.3 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 117.7 GFlops  47.7 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536) 101.2 GFlops  38.5 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 108.6 GFlops  39.0 GB/s ulps(fft 3.6,ps 6590.4) [OK]
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline) Christmas 2010 edition.
Stock:
FFT+PS+SM(     8)  16.5 GFlops  29.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM(    16)  23.1 GFlops  31.5 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM(    32)  25.3 GFlops  28.3 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM(    64)  41.3 GFlops  39.1 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM(   128)  61.6 GFlops  50.7 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM(   256)  72.0 GFlops  52.5 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM(   512)  87.7 GFlops  57.3 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM(  1024)  94.5 GFlops  56.0 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM(  2048)  96.7 GFlops  52.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM(  4096)  90.5 GFlops  45.2 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM(  8192)  95.0 GFlops  44.1 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384)  95.0 GFlops  41.1 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768)  91.2 GFlops  36.9 GB/s ulps(fft 3.1,ps 6152.4) [OK]
FFT+PS+SM( 65536)  83.6 GFlops  31.8 GB/s ulps(fft 2.8,ps 5899.2) [OK]
FFT+PS+SM(131072)  90.6 GFlops  32.6 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 128 thrds/block
FFT+PS+SM(     8)  24.1 GFlops  42.4 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM(    16)  35.3 GFlops  48.2 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM(    32)  47.1 GFlops  52.6 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM(    64)  64.9 GFlops  61.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM(   128)  77.0 GFlops  63.3 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM(   256)  89.2 GFlops  65.0 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM(   512) 110.0 GFlops  71.9 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM(  1024) 118.1 GFlops  70.0 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM(  2048) 118.8 GFlops  64.5 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM(  4096) 110.6 GFlops  55.3 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM(  8192) 116.2 GFlops  53.9 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 116.1 GFlops  50.2 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 108.7 GFlops  44.0 GB/s ulps(fft 3.1,ps 6041.9) [OK]
FFT+PS+SM( 65536)  97.8 GFlops  37.2 GB/s ulps(fft 2.8,ps 5782.9) [OK]
FFT+PS+SM(131072) 108.3 GFlops  38.9 GB/s ulps(fft 3.6,ps 6590.4) [OK]