[Split] PowerSpectrum Unit Test
PatrickV2:
--- Quote from: Jason G on 24 Dec 2010, 06:47:56 am ---Phew! cool, thanks ;D
Not much headroom on that chip either, but I'll be happy with that small fraction improvement on the oldest cards for now.
Moving onto test #9 soon, will add in the FFTs, then will stream the test kernels after that, just to see what that does... Progress at last ;D
--- End quote ---
You're quite welcome. What exactly do you mean with 'not much headroom on that chip'?
Looking forward to the next test-programs. ;)
Oh, and a Merry Christmas!
Regards,
Patrick.
Jason G:
--- Quote from: PatrickV2 on 24 Dec 2010, 07:56:28 am ---You're quite welcome. What exactly do you mean with 'not much headroom on that chip'?
--- End quote ---
Only that it seems the best and worst case Opt1 aren't as far apart as on the newer/bigger GPUs, which means we're getting close to the limits of the smaller chips as to which optimisations can be useful on those (with this part of the code anyway).
Onto combining the FFTs into the pipeline now, which will change the picture a lot. Back later.
Jason
Jason G:
Darn, the next test won't fit! Arggh! ... I'll post it once I can get it hosted somewhere. Net progress so far looks something like this for ~40-60% of multibeam processing:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 12.7 GFlops 22.4 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 20.6 GFlops 28.1 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 25.1 GFlops 28.0 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 43.1 GFlops 40.8 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 63.7 GFlops 52.4 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 85.6 GFlops 62.4 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 114.2 GFlops 74.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 136.7 GFlops 81.0 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 149.3 GFlops 81.0 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 154.1 GFlops 77.1 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 156.2 GFlops 72.4 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 149.2 GFlops 64.5 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 155.5 GFlops 63.0 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 152.0 GFlops 57.9 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 134.7 GFlops 48.4 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 19.2 GFlops 33.8 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 37.0 GFlops 50.5 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 61.1 GFlops 68.2 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 86.9 GFlops 82.2 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 93.4 GFlops 76.8 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 137.0 GFlops 99.8 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 174.8 GFlops 114.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 218.7 GFlops 129.6 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 231.2 GFlops 125.4 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 236.8 GFlops 118.4 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 229.0 GFlops 106.2 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 223.9 GFlops 96.8 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 216.0 GFlops 87.5 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 214.0 GFlops 81.5 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 205.0 GFlops 73.7 GB/s ulps(fft 2.7,ps 5392.8) [OK]
Figuring out how to get it uploaded ...
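For a rough sense of the gain, the Opt1-over-stock speedup can be worked out from the GFlops columns above. This is only a minimal Python sketch: the three rows are copied from the GTX 480 log, and the helper name `speedup` is made up for illustration (it is not part of the test program).

```python
# (FFT size, stock GFlops, Opt1 worst-case GFlops), copied from the GTX 480 log above.
GTX480_ROWS = [
    (8, 12.7, 19.2),
    (1024, 136.7, 218.7),
    (131072, 134.7, 205.0),
]


def speedup(stock_gflops, opt_gflops):
    """Ratio of optimised to stock throughput (hypothetical helper)."""
    return opt_gflops / stock_gflops


if __name__ == "__main__":
    for size, stock, opt in GTX480_ROWS:
        print(f"FFT size {size:>6}: {speedup(stock, opt):.2f}x over stock")
```

The same ratio can be taken on the GB/s column; since both columns come from the same timing, the two ratios should land close to each other.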
Jason G:
Unable to upload here. Please try:
ftp://temp:temp@sinbadsvn.dyndns.org:31469/Jason_PowerSpectrum_Test/PowerSpectrumTest9.7z
Updated first post:
--- Quote ---Update: Powerspectrum Test #9 (Xmas edition)
- full FFT processing added
- Tightened peak/average tolerances to 0.001%
- worst case Opt1 only
Temporary download location:
ftp://temp:temp@sinbadsvn.dyndns.org:31469/Jason_PowerSpectrum_Test/PowerSpectrumTest9.7z
--- End quote ---
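To give an idea of what a 0.001% tolerance means in practice, here is a minimal Python sketch of a relative-tolerance comparison. How the test program actually compares peak/average values isn't shown in this thread, so the function name `within_tolerance` and the choice to measure relative to the reference value are assumptions.

```python
REL_TOL = 0.001 / 100.0  # 0.001% expressed as a fraction (1e-5)


def within_tolerance(reference, measured, rel_tol=REL_TOL):
    """True if measured differs from reference by at most rel_tol, relative to reference.

    Hypothetical helper: the real test may define its tolerance differently.
    """
    if reference == 0.0:
        # No meaningful relative error against zero; require an exact match.
        return measured == 0.0
    return abs(measured - reference) <= rel_tol * abs(reference)


if __name__ == "__main__":
    print(within_tolerance(1000.0, 1000.005))  # 0.0005% off
    print(within_tolerance(1000.0, 1000.02))   # 0.002% off
```

At this tolerance a reference value of 1000.0 allows a deviation of only 0.01 either way.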
Ghost0210:
GTX 465 results:
--- Quote ---Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 10.6 GFlops 18.7 GB/s ulps(fft 1.2,ps 4389.0) [OK]
FFT+PS+SM( 16) 16.5 GFlops 22.5 GB/s ulps(fft 1.6,ps 4518.6) [OK]
FFT+PS+SM( 32) 16.9 GFlops 18.9 GB/s ulps(fft 1.3,ps 3977.6) [OK]
FFT+PS+SM( 64) 29.0 GFlops 27.4 GB/s ulps(fft 1.5,ps 4206.9) [OK]
FFT+PS+SM( 128) 43.4 GFlops 35.7 GB/s ulps(fft 1.7,ps 4351.9) [OK]
FFT+PS+SM( 256) 57.8 GFlops 42.1 GB/s ulps(fft 1.7,ps 4254.8) [OK]
FFT+PS+SM( 512) 77.4 GFlops 50.6 GB/s ulps(fft 1.8,ps 4305.7) [OK]
FFT+PS+SM( 1024) 92.9 GFlops 55.1 GB/s ulps(fft 2.1,ps 4725.7) [OK]
FFT+PS+SM( 2048) 99.7 GFlops 54.1 GB/s ulps(fft 2.2,ps 4918.4) [OK]
FFT+PS+SM( 4096) 101.1 GFlops 50.6 GB/s ulps(fft 2.2,ps 4762.0) [OK]
FFT+PS+SM( 8192) 103.9 GFlops 48.2 GB/s ulps(fft 2.6,ps 5275.5) [OK]
FFT+PS+SM( 16384) 103.1 GFlops 44.6 GB/s ulps(fft 2.6,ps 5355.0) [OK]
FFT+PS+SM( 32768) 104.6 GFlops 42.4 GB/s ulps(fft 2.3,ps 4987.7) [OK]
FFT+PS+SM( 65536) 102.4 GFlops 39.0 GB/s ulps(fft 2.0,ps 4601.3) [OK]
FFT+PS+SM(131072) 93.8 GFlops 33.7 GB/s ulps(fft 2.7,ps 5392.0) [OK]
Opt1 (worst case): 256 thrds/block
FFT+PS+SM( 8) 20.5 GFlops 36.2 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 33.7 GFlops 45.9 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 47.3 GFlops 52.8 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 60.0 GFlops 56.8 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 59.0 GFlops 48.5 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 85.8 GFlops 62.5 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 109.0 GFlops 71.2 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 133.7 GFlops 79.3 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 136.9 GFlops 74.3 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 141.5 GFlops 70.7 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 136.7 GFlops 63.4 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 141.3 GFlops 61.1 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 134.9 GFlops 54.6 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 132.6 GFlops 50.5 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 130.5 GFlops 46.9 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End quote ---
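For anyone wondering about the ulps columns in these logs: a ULP ("unit in the last place") distance counts how many representable floating-point values lie between two results. Below is a minimal Python sketch for single-precision values. It assumes the test works in float32, and since the logged figures are fractional they are presumably averaged or otherwise aggregated; the thread doesn't say how. The names `float32_ordered` and `ulp_distance` are made up for illustration.

```python
import struct


def float32_ordered(x):
    """Map x (rounded to float32) to an integer whose ordering matches float ordering."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; flip them so that the
    # integers decrease as the floats decrease, and -0.0 maps to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of float32 steps between a and b (hypothetical helper)."""
    return abs(float32_ordered(a) - float32_ordered(b))


if __name__ == "__main__":
    next_after_one = 1.0 + 2.0 ** -23  # smallest float32 above 1.0
    print(ulp_distance(1.0, next_after_one))
    print(ulp_distance(-0.0, 0.0))
```

Comparing bit patterns rather than absolute differences keeps the measure meaningful across the wide range of magnitudes in a power spectrum.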