[Split] PowerSpectrum Unit Test

_heinz:
Vista64
~~~~
Stopping Boinc...
PowerSpectrumTest6.exe -device 0

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   20.4 GFlops   81.6 GB/s   0.0ulps

 SumMax (    64)    1.4 GFlops    6.0 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.6 GFlops   18.7 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       30.0 GFlops  119.9 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
         7.1 GFlops   28.8 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        11.1 GFlops   45.1 GB/s 121.7ulps


PowerSpectrumTest6.exe -device 1

Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   20.4 GFlops   81.8 GB/s   0.0ulps

 SumMax (    64)    1.4 GFlops    5.9 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.6 GFlops   18.5 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       30.1 GFlops  120.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
         7.3 GFlops   29.7 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        11.2 GFlops   45.2 GB/s 121.7ulps


.
Done
Restarting Boinc...

Jason G:
Thanks Richard, perryjay & Heinz.

All fit with the models so far.

The compute capability 1.1 devices (Richard's & perryjay's) are, IMO, doing their memory-bound best with the power spectrum, roughly matching the stock 'PwrSpec' speed there, then 'magically' lifting with the reductions (summax) for the Opt1 worst case. I believe that must be purely a result of memory transfer hiding, since the compute density of the reduction hasn't changed from O(log n).
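
A rough way to sanity-check the "memory bound" reading is to compare flops per byte. The sketch below is only a back-of-envelope model: the per-point cost (3 flops, 8 bytes read, 4 bytes written) is an assumption about what the kernel does, not something the test output states. The measured pairs are copied from Heinz's GTX 470 (device 0) run.

```python
# Back-of-envelope arithmetic intensity for a power spectrum kernel.
# Assumption (not stated in the test output): each point reads one complex
# float (8 bytes), writes one float (4 bytes), and costs 3 flops
# (re*re + im*im = 2 multiplies + 1 add).
FLOPS_PER_POINT = 3
BYTES_PER_POINT = 8 + 4


def intensity(flops, nbytes):
    """Flops performed per byte of memory traffic."""
    return flops / nbytes


model = intensity(FLOPS_PER_POINT, BYTES_PER_POINT)

# (GFlops, GB/s) pairs copied from the GTX 470 device 0 log.
reported = {
    "stock PwrSpec": (20.4, 81.6),
    "Opt1 256 threads": (30.0, 119.9),
}
measured = {name: intensity(g, b) for name, (g, b) in reported.items()}

print(f"model: {model:.3f} flop/byte")
for name, value in measured.items():
    print(f"{name}: {value:.3f} flop/byte")
```

If that model is right, the kernel does about a quarter of a flop per byte moved, which would fit a kernel limited by memory bandwidth rather than by the core.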

@Heinz, glad to see your numbers back up to where they should be.  I reckon that's scaling well against my OC'd 480:
Stock (PS+Summax):   5.9 GFlops, 23.7 GB/s
worst (opt1):       10.0 GFlops, 40.4 GB/s
best  (opt1):       16.0 GFlops, 64.8 GB/s

SciManStev:

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   28.1 GFlops  112.5 GB/s   0.0ulps

 SumMax (    64)    2.3 GFlops    9.6 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    7.2 GFlops   29.2 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       41.4 GFlops  165.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
        12.7 GFlops   51.5 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        16.1 GFlops   65.3 GB/s 121.7ulps


Steve

Jason G:
Ouch! 27% more throughput on worst-case Opt1 than mine (12.7 vs 10.0 GFlops) ;D despite a slower power spectrum (memory). That can't be the core (near-identical 'best' case, 16.0 vs 16.1) ... PCIe bus overclocked? (Ahh, faster host memory too, I suppose.)
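
For anyone checking the arithmetic, here is a small sketch of that comparison. The figures are the Opt1 numbers posted in this thread (Steve's GTX 480 vs Jason's overclocked 480); the helper name `percent_faster` is just made up for illustration.

```python
def percent_faster(a, b):
    """How much faster a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Opt1 GFlops from the thread: Steve's GTX 480 vs Jason's OC'd 480.
worst_case = percent_faster(12.7, 10.0)  # full summax copy
best_case = percent_faster(16.1, 16.0)   # nothing to update

print(f"worst case: {worst_case:.1f}% faster")
print(f"best case:  {best_case:.1f}% faster")
```

The best case differs by well under 1%, while the worst case (which includes the full summax copy) differs by roughly 27%, so the gap appears only when the transfer is involved.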

SciManStev:
My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. I adjusted my GPU RAM to 1900 MHz. There is still room for more. I am on my last GPU WU for Einstein. There aren't any available at the moment. Piggy hit the #5 spot for the top rigs at Einstein with a RAC of over 14,000. There is nothing slow about Piggy. It does a fantastic job at running Starry Night Pro Plus astronomy software. I can't wait to get back to SETI crunching!

Steve
