[Split] PowerSpectrum Unit Test

PatrickV2:

--- Quote from: Jason G on 24 Dec 2010, 06:47:56 am ---Phew!  cool, thanks  ;D

Not much headroom on that chip either, but I'll be happy with that small fraction improvement on the oldest cards for now. 

Moving onto test #9 soon, will add in the FFTs, then will stream the test kernels after that, just to see what that does... Progress at last  ;D

--- End quote ---

You're quite welcome. What exactly do you mean by 'not much headroom on that chip'?

Looking forward to the next test-programs. ;)

Oh, and a Merry Christmas!

Regards,

Patrick.

Jason G:

--- Quote from: PatrickV2 on 24 Dec 2010, 07:56:28 am ---You're quite welcome. What exactly do you mean by 'not much headroom on that chip'?
--- End quote ---

Only that the best and worst case Opt1 results don't seem to be as far apart as on the newer/bigger GPUs, which means we're getting close to the limits of the smaller chips as to which optimisations can be useful on those (with this part of the code, anyway).

Onto combining FFTs into the pipeline now, which will change the picture a lot. Back later.

Jason

Jason G:
Darn, the next test won't fit! Arggh! I'll post it when I can get it hosted somewhere. Net progress so far looks something like this for ~40-60% of multibeam processing:

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #9 (FFT pipeline)
                                Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)   12.7 GFlops   22.4 GB/s  ulps(fft  1.2,ps 4389.0) [OK]
 FFT+PS+SM(    16)   20.6 GFlops   28.1 GB/s  ulps(fft  1.6,ps 4518.6) [OK]
 FFT+PS+SM(    32)   25.1 GFlops   28.0 GB/s  ulps(fft  1.3,ps 3977.6) [OK]
 FFT+PS+SM(    64)   43.1 GFlops   40.8 GB/s  ulps(fft  1.5,ps 4206.9) [OK]
 FFT+PS+SM(   128)   63.7 GFlops   52.4 GB/s  ulps(fft  1.7,ps 4351.9) [OK]
 FFT+PS+SM(   256)   85.6 GFlops   62.4 GB/s  ulps(fft  1.7,ps 4254.8) [OK]
 FFT+PS+SM(   512)  114.2 GFlops   74.6 GB/s  ulps(fft  1.8,ps 4305.7) [OK]
 FFT+PS+SM(  1024)  136.7 GFlops   81.0 GB/s  ulps(fft  2.1,ps 4725.7) [OK]
 FFT+PS+SM(  2048)  149.3 GFlops   81.0 GB/s  ulps(fft  2.2,ps 4918.4) [OK]
 FFT+PS+SM(  4096)  154.1 GFlops   77.1 GB/s  ulps(fft  2.2,ps 4762.0) [OK]
 FFT+PS+SM(  8192)  156.2 GFlops   72.4 GB/s  ulps(fft  2.6,ps 5275.5) [OK]
 FFT+PS+SM( 16384)  149.2 GFlops   64.5 GB/s  ulps(fft  2.6,ps 5355.0) [OK]
 FFT+PS+SM( 32768)  155.5 GFlops   63.0 GB/s  ulps(fft  2.3,ps 4987.7) [OK]
 FFT+PS+SM( 65536)  152.0 GFlops   57.9 GB/s  ulps(fft  2.0,ps 4601.3) [OK]
 FFT+PS+SM(131072)  134.7 GFlops   48.4 GB/s  ulps(fft  2.7,ps 5392.0) [OK]


Opt1 (worst case): 256 thrds/block
 FFT+PS+SM(     8)   19.2 GFlops   33.8 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   37.0 GFlops   50.5 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   61.1 GFlops   68.2 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)   86.9 GFlops   82.2 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   93.4 GFlops   76.8 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)  137.0 GFlops   99.8 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  174.8 GFlops  114.2 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  218.7 GFlops  129.6 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  231.2 GFlops  125.4 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  236.8 GFlops  118.4 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  229.0 GFlops  106.2 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  223.9 GFlops   96.8 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  216.0 GFlops   87.5 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  214.0 GFlops   81.5 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  205.0 GFlops   73.7 GB/s  ulps(fft  2.7,ps 5392.8) [OK]

Figuring out how to get it uploaded ...
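
For readers who want the gain as a single ratio, here is a minimal sketch that divides the Opt1 (worst case) GFlops by the Stock GFlops for a few of the FFT sizes in the GTX 480 output above. The figures are copied from that output; the dictionary layout and the `speedup` helper are illustrative names only and are not part of the test program.

```python
# GFlops figures copied from the GTX 480 output above (FFT size -> GFlops).
# Only a handful of sizes are included to keep the sketch short.
stock = {8: 12.7, 1024: 136.7, 4096: 154.1, 131072: 134.7}
opt1 = {8: 19.2, 1024: 218.7, 4096: 236.8, 131072: 205.0}


def speedup(size):
    """Ratio of Opt1 (worst case) to Stock throughput at one FFT size."""
    return opt1[size] / stock[size]


for size in sorted(stock):
    print(f"FFT size {size:>6}: {speedup(size):.2f}x")
```

For these four sizes the ratio sits between roughly 1.5x and 1.6x.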

Jason G:
Unable to upload here. Please try:

ftp://temp:temp@sinbadsvn.dyndns.org:31469/Jason_PowerSpectrum_Test/PowerSpectrumTest9.7z

Updated first post:


--- Quote ---Update: Powerspectrum Test #9 (Xmas edition)
- full FFT processing added
- Tightened peak/average tolerances to 0.001%
- worst case Opt1 only

Temporary download location:
ftp://temp:temp@sinbadsvn.dyndns.org:31469/Jason_PowerSpectrum_Test/PowerSpectrumTest9.7z


--- End quote ---
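
As a rough illustration of what a 0.001% tolerance means in practice, the sketch below treats it as a relative tolerance of 1e-5 between a reference value and a candidate value. How the test program actually compares its peak and average values is not stated in this thread, so the `within_tolerance` helper and the choice of `math.isclose` are assumptions made purely for illustration.

```python
import math

# 0.001 % expressed as a fraction (about 1e-5).
REL_TOL = 0.001 / 100.0


def within_tolerance(reference, candidate, rel_tol=REL_TOL):
    """True if the two values agree to within rel_tol, relative to the larger one.

    Assumption: a purely relative comparison with no absolute floor.
    """
    return math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=0.0)


print(within_tolerance(1000.0, 1000.005))  # difference of 0.0005 %
print(within_tolerance(1000.0, 1000.02))   # difference of 0.002 %
```

With a value near 1000, this tolerance allows a difference of about 0.01, so the first comparison passes and the second does not.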

Ghost0210:
GTX 465 results:


--- Quote ---Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #9 (FFT pipeline)
                                Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)   10.6 GFlops   18.7 GB/s  ulps(fft  1.2,ps 4389.0) [OK]
 FFT+PS+SM(    16)   16.5 GFlops   22.5 GB/s  ulps(fft  1.6,ps 4518.6) [OK]
 FFT+PS+SM(    32)   16.9 GFlops   18.9 GB/s  ulps(fft  1.3,ps 3977.6) [OK]
 FFT+PS+SM(    64)   29.0 GFlops   27.4 GB/s  ulps(fft  1.5,ps 4206.9) [OK]
 FFT+PS+SM(   128)   43.4 GFlops   35.7 GB/s  ulps(fft  1.7,ps 4351.9) [OK]
 FFT+PS+SM(   256)   57.8 GFlops   42.1 GB/s  ulps(fft  1.7,ps 4254.8) [OK]
 FFT+PS+SM(   512)   77.4 GFlops   50.6 GB/s  ulps(fft  1.8,ps 4305.7) [OK]
 FFT+PS+SM(  1024)   92.9 GFlops   55.1 GB/s  ulps(fft  2.1,ps 4725.7) [OK]
 FFT+PS+SM(  2048)   99.7 GFlops   54.1 GB/s  ulps(fft  2.2,ps 4918.4) [OK]
 FFT+PS+SM(  4096)  101.1 GFlops   50.6 GB/s  ulps(fft  2.2,ps 4762.0) [OK]
 FFT+PS+SM(  8192)  103.9 GFlops   48.2 GB/s  ulps(fft  2.6,ps 5275.5) [OK]
 FFT+PS+SM( 16384)  103.1 GFlops   44.6 GB/s  ulps(fft  2.6,ps 5355.0) [OK]
 FFT+PS+SM( 32768)  104.6 GFlops   42.4 GB/s  ulps(fft  2.3,ps 4987.7) [OK]
 FFT+PS+SM( 65536)  102.4 GFlops   39.0 GB/s  ulps(fft  2.0,ps 4601.3) [OK]
 FFT+PS+SM(131072)   93.8 GFlops   33.7 GB/s  ulps(fft  2.7,ps 5392.0) [OK]


Opt1 (worst case): 256 thrds/block
 FFT+PS+SM(     8)   20.5 GFlops   36.2 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   33.7 GFlops   45.9 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   47.3 GFlops   52.8 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)   60.0 GFlops   56.8 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   59.0 GFlops   48.5 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)   85.8 GFlops   62.5 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  109.0 GFlops   71.2 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  133.7 GFlops   79.3 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  136.9 GFlops   74.3 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  141.5 GFlops   70.7 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  136.7 GFlops   63.4 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  141.3 GFlops   61.1 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  134.9 GFlops   54.6 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  132.6 GFlops   50.5 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  130.5 GFlops   46.9 GB/s  ulps(fft  2.7,ps 5392.8) [OK]

--- End quote ---
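
The `ulps(...)` columns in the results above presumably report error in units in the last place. The thread does not say how the test program derives those figures, so the following is only a sketch of one common way to measure the ULP distance between two single-precision values, by comparing their bit patterns as integers. The helper names `float32_bits` and `ulp_distance` are made up for this example; fractional figures like those in the output could come from averaging over many samples, but that is an assumption.

```python
import struct


def float32_bits(x):
    """Round x to float32 and map its bit pattern to an integer that orders like the float."""
    (i,) = struct.unpack("<i", struct.pack("<f", x))
    # IEEE 754 floats are sign-magnitude: remap negatives so the integers are monotonic
    # and +0.0 and -0.0 both map to 0.
    return i if i >= 0 else -(i & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 steps between a and b (finite inputs only)."""
    return abs(float32_bits(a) - float32_bits(b))


# 1.0 and the next representable float32 above it are exactly one ULP apart.
print(ulp_distance(1.0, 1.0 + 2.0 ** -23))
```

The sketch ignores NaN and infinity, which a real accuracy check would need to handle separately.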
