[Split] PowerSpectrum Unit Test
Jason G:
Thanks, that's a crazy speedup there too (1.3-2x). Will be checking thoroughly before moving on ;)
arkayn:
And the 460-768
--- Code: ---Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 9.5 GFlops 16.7 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 14.4 GFlops 19.7 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 13.8 GFlops 15.4 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 24.2 GFlops 22.9 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 36.9 GFlops 30.4 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 49.9 GFlops 36.3 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 70.7 GFlops 46.2 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 90.4 GFlops 53.6 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 102.7 GFlops 55.7 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 111.2 GFlops 55.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 97.5 GFlops 45.2 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 93.4 GFlops 40.4 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 100.6 GFlops 40.7 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 106.9 GFlops 40.7 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 86.9 GFlops 31.3 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 16.5 GFlops 29.1 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 27.2 GFlops 37.1 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 38.4 GFlops 42.9 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 49.9 GFlops 47.2 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 45.0 GFlops 37.0 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 64.5 GFlops 47.0 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 82.9 GFlops 54.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 108.0 GFlops 64.0 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 123.3 GFlops 66.9 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 132.9 GFlops 66.4 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 111.0 GFlops 51.5 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 107.2 GFlops 46.3 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 111.4 GFlops 45.1 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 117.4 GFlops 44.7 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 95.6 GFlops 34.4 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End code ---
Rehosting of the test on a faster connection.
http://www.arkayn.us/seti/PowerSpectrumTest9.7z
Jason G:
--- Quote from: arkayn on 24 Dec 2010, 12:49:33 pm ---And the 460-768...
--- End quote ---
We're pushing that narrower memory bus, I guess ;). The totally different spread is interesting.
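For anyone who wants to put numbers on that spread, here is a rough sketch (not part of the test package) that parses result lines in the format posted above and prints the Opt1/Stock GFlops ratio per FFT size. The regex, the function names `parse` and `speedups`, and the choice of three sample sizes are my own assumptions for illustration; the figures themselves are copied from the GTX 460 (768 MB) log above.

```python
import re

# Pattern for one result line of the unit test output, e.g.
# "FFT+PS+SM( 1024) 90.4 GFlops 53.6 GB/s ulps(fft 2.1,ps 4725.7) [OK]"
# (hypothetical parser; the log format is inferred from the posts above)
LINE = re.compile(r"FFT\+PS\+SM\(\s*(\d+)\)\s+([\d.]+) GFlops\s+([\d.]+) GB/s")

def parse(block):
    """Map FFT size -> (GFlops, GB/s) for every result line in a log block."""
    return {int(m.group(1)): (float(m.group(2)), float(m.group(3)))
            for m in LINE.finditer(block)}

def speedups(stock, opt):
    """Opt1/Stock GFlops ratio for each FFT size present in both runs."""
    return {n: opt[n][0] / stock[n][0] for n in sorted(stock) if n in opt}

# Three sizes copied from the GTX 460 (768 MB) log above.
stock_log = """
FFT+PS+SM( 8) 9.5 GFlops 16.7 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 32) 13.8 GFlops 15.4 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 4096) 111.2 GFlops 55.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
"""
opt1_log = """
FFT+PS+SM( 8) 16.5 GFlops 29.1 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 32) 38.4 GFlops 42.9 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 4096) 132.9 GFlops 66.4 GB/s ulps(fft 2.2,ps 4764.3) [OK]
"""

ratios = speedups(parse(stock_log), parse(opt1_log))
for n, r in ratios.items():
    print(f"{n:>6}: {r:.2f}x")
# prints:
#      8: 1.74x
#     32: 2.78x
#   4096: 1.20x
```

Pasting the full Stock and Opt1 blocks from any of the posts into `stock_log` and `opt1_log` gives the ratio for every size, which makes it easier to compare the spread between cards.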
--- Quote ---Rehosting of the test on a faster connection.
http://www.arkayn.us/seti/PowerSpectrumTest9.7z
--- End quote ---
Cheers! (Adding link to first post... [done])
SciManStev:
This is fun!
--- Quote ---Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 21.2 GFlops 37.3 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 30.5 GFlops 41.6 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 30.7 GFlops 34.2 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 50.3 GFlops 47.6 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 73.0 GFlops 60.0 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 92.7 GFlops 67.5 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 125.8 GFlops 82.2 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 149.6 GFlops 88.7 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 163.0 GFlops 88.4 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 168.5 GFlops 84.2 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 170.0 GFlops 78.8 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 157.2 GFlops 68.0 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 167.4 GFlops 67.8 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 164.6 GFlops 62.7 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 141.9 GFlops 51.0 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 37.4 GFlops 65.9 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 58.9 GFlops 80.4 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 81.7 GFlops 91.2 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 102.4 GFlops 96.9 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 100.5 GFlops 82.7 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 142.2 GFlops 103.6 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 177.3 GFlops 115.9 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 218.1 GFlops 129.3 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 233.4 GFlops 126.6 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 238.4 GFlops 119.2 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 229.6 GFlops 106.5 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 217.5 GFlops 94.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 213.6 GFlops 86.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 213.2 GFlops 81.2 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 198.0 GFlops 71.2 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End quote ---
Steve
Richard Haselgrove:
The usual three:
9800GTX+, Windows 7/32
--- Code: ---Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 6.9 GFlops 12.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 11.8 GFlops 16.1 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 15.6 GFlops 17.4 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 26.2 GFlops 24.8 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 36.6 GFlops 30.1 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 48.7 GFlops 35.5 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 57.8 GFlops 37.8 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 62.9 GFlops 37.3 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 61.7 GFlops 33.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 57.6 GFlops 28.8 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 56.7 GFlops 26.3 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 52.5 GFlops 22.7 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 50.3 GFlops 20.4 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 55.3 GFlops 21.1 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 56.9 GFlops 20.5 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 14.9 GFlops 26.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 23.3 GFlops 31.8 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 30.5 GFlops 34.0 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 43.2 GFlops 40.9 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 49.8 GFlops 41.0 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 64.9 GFlops 47.3 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 79.3 GFlops 51.8 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 81.9 GFlops 48.6 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 78.1 GFlops 42.4 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 73.3 GFlops 36.7 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 70.5 GFlops 32.7 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 65.7 GFlops 28.4 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 60.7 GFlops 24.6 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 67.0 GFlops 25.5 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 68.5 GFlops 24.6 GB/s ulps(fft 3.6,ps 6590.4) [OK]
--- End code ---
9800GT, Windows XP/32
--- Code: ---Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 6.6 GFlops 11.6 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 10.5 GFlops 14.3 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 13.0 GFlops 14.5 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 22.4 GFlops 21.2 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 33.8 GFlops 27.8 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 45.2 GFlops 32.9 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 56.0 GFlops 36.6 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 57.6 GFlops 34.1 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 57.4 GFlops 31.1 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 50.4 GFlops 25.2 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 48.9 GFlops 22.7 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 46.8 GFlops 20.3 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 42.4 GFlops 17.2 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 47.8 GFlops 18.2 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 50.5 GFlops 18.1 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 9.7 GFlops 17.2 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 16.0 GFlops 21.9 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 21.5 GFlops 24.0 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 31.1 GFlops 29.4 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 36.3 GFlops 29.9 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 47.7 GFlops 34.8 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 58.6 GFlops 38.3 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 59.7 GFlops 35.4 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 59.0 GFlops 32.0 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 51.9 GFlops 26.0 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 50.0 GFlops 23.2 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 47.7 GFlops 20.6 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 43.2 GFlops 17.5 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 48.7 GFlops 18.6 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 51.6 GFlops 18.6 GB/s ulps(fft 3.6,ps 6590.4) [OK]
--- End code ---
GTX 470, Windows XP/32
--- Code: ---Device: GeForce GTX 470, 1215 MHz clock, 1280 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 7.9 GFlops 14.0 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 14.0 GFlops 19.1 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 17.7 GFlops 19.7 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 32.4 GFlops 30.7 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 51.7 GFlops 42.6 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 72.0 GFlops 52.5 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 100.4 GFlops 65.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 124.9 GFlops 74.1 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 136.6 GFlops 74.1 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 139.1 GFlops 69.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 141.0 GFlops 65.4 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 132.7 GFlops 57.4 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 137.9 GFlops 55.9 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 134.5 GFlops 51.2 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 116.0 GFlops 41.7 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 14.2 GFlops 25.1 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 27.2 GFlops 37.1 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 43.9 GFlops 49.0 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 61.3 GFlops 58.0 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 65.6 GFlops 54.0 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 95.7 GFlops 69.7 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 121.1 GFlops 79.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 153.4 GFlops 91.0 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 161.9 GFlops 87.8 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 168.3 GFlops 84.2 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 157.7 GFlops 73.1 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 155.1 GFlops 67.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 151.9 GFlops 61.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 150.7 GFlops 57.4 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 137.2 GFlops 49.3 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End code ---