[Split] PowerSpectrum Unit Test
_heinz:
Vista64
~~~~
Stopping Boinc...
PowerSpectrumTest6.exe -device 0
Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 20.4 GFlops 81.6 GB/s 0.0ulps
SumMax ( 64) 1.4 GFlops 6.0 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.6 GFlops 18.7 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
256 threads: 30.0 GFlops 119.9 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
256 threads, fftlen 64: (worst case: full summax copy)
7.1 GFlops 28.8 GB/s 121.7ulps
Every ifft average & peak OK
256 threads, fftlen 64: (best case, nothing to update)
11.1 GFlops 45.1 GB/s 121.7ulps
PowerSpectrumTest6.exe -device 1
Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 20.4 GFlops 81.8 GB/s 0.0ulps
SumMax ( 64) 1.4 GFlops 5.9 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.6 GFlops 18.5 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
256 threads: 30.1 GFlops 120.6 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
256 threads, fftlen 64: (worst case: full summax copy)
7.3 GFlops 29.7 GB/s 121.7ulps
Every ifft average & peak OK
256 threads, fftlen 64: (best case, nothing to update)
11.2 GFlops 45.2 GB/s 121.7ulps
.
Done
Restarting Boinc...
Jason G:
Thanks Richard, perryjay & Heinz.
All fit with the models so far.
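A quick check on the posted logs: the PwrSpec GFlops and GB/s figures keep a ratio of almost exactly 0.25 flop per byte. That is what you would expect if the test counts 3 flops per point (two multiplies and an add for re*re + im*im) against 12 bytes of traffic (an 8-byte complex float read, a 4-byte float written). Those per-point counts are an assumption here; the test output does not state them. A minimal Python sketch using only figures reported in this thread:

```python
# Flop-per-byte ratio from the PwrSpec figures reported in this thread.
# ASSUMPTION: 3 flops and 12 bytes per point; the test's real accounting may differ.
ASSUMED_FLOPS_PER_POINT = 3        # re*re, im*im, one add
ASSUMED_BYTES_PER_POINT = 8 + 4    # read one complex float, write one float


def flops_per_byte(gflops, gbytes_per_s):
    """Ratio of reported compute rate to reported memory rate."""
    return gflops / gbytes_per_s


# (GFlops, GB/s) pairs copied from the logs above.
reported = {
    "GTX 470 stock PwrSpec": (20.4, 81.6),
    "GTX 470 Opt1 256 threads": (30.0, 119.9),
    "GTX 480 stock PwrSpec": (28.1, 112.5),
    "GTX 480 Opt1 256 threads": (41.4, 165.6),
}

model = ASSUMED_FLOPS_PER_POINT / ASSUMED_BYTES_PER_POINT

for name, (gflops, gbytes) in reported.items():
    ratio = flops_per_byte(gflops, gbytes)
    print(f"{name}: {ratio:.3f} flop/byte (assumed model {model:.3f})")
```

A ratio this low and this constant is consistent with the kernel being limited by memory bandwidth rather than arithmetic.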
The compute capability 1.1 devices (Richard's and perryjay's) are, IMO, doing their memory-bound best with the power spectrum, roughly matching stock 'PwrSpec' speed there, then 'magically' lifting with the reductions (summax) in the Opt1 worst case. I believe that must be purely a result of hiding the memory transfers, since the compute density of the reduction hasn't changed from O(log n).
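For reference, the O(log n) shape of a combined sum/max reduction can be sketched in plain Python. This is a generic tree reduction, not the actual summax kernel from the test; the function name and the power-of-two restriction are assumptions made for illustration.

```python
def tree_sum_max(values):
    """Generic pairwise reduction returning (sum, max, steps).

    Hypothetical helper, not code from the unit test. Each pass halves the
    number of active elements, so n inputs need log2(n) passes.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    sums = list(values)
    maxes = list(values)
    steps = 0
    stride = n // 2
    while stride >= 1:
        # On a GPU, each i in this loop would be handled by its own thread.
        for i in range(stride):
            sums[i] += sums[i + stride]
            maxes[i] = max(maxes[i], maxes[i + stride])
        stride //= 2
        steps += 1
    return sums[0], maxes[0], steps


total, peak, steps = tree_sum_max(list(range(64)))
print(total, peak, steps)
```

With an fftlen of 64, as in the logs, the loop runs 6 passes. The work per pass is fixed by the algorithm, which is why a speedup in the combined figure points at memory traffic rather than the reduction itself.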
@Heinz, glad to see your numbers back up to where they should be. I reckon that's scaling well against my OC'd 480:
Stock (PS+Summax): 5.9 GFlops, 23.7 GB/s
worst (opt1): 10.0 GFlops, 40.4 GB/s
best (opt1): 16.0 GFlops, 64.8 GB/s
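To put a number on "scaling well", the GFlops figures from Heinz's GTX 470 (device 0) can be divided by the overclocked GTX 480 figures quoted here. A minimal sketch; the dictionary and function names are made up for this example, and the values are copied from the posts in this thread.

```python
# GFlops for the combined power spectrum + summax cases, from this thread.
heinz_gtx470 = {"stock": 4.6, "worst": 7.1, "best": 11.1}
jason_gtx480_oc = {"stock": 5.9, "worst": 10.0, "best": 16.0}


def scaling(slower, faster):
    """Per-case ratio of one card's throughput to another's."""
    return {case: slower[case] / faster[case] for case in slower}


for case, ratio in scaling(heinz_gtx470, jason_gtx480_oc).items():
    print(f"{case}: GTX 470 reaches {ratio:.0%} of the OC'd GTX 480")
```

On these figures the GTX 470 lands at roughly 70 to 80 percent of the overclocked 480 in all three cases.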
SciManStev:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 28.1 GFlops 112.5 GB/s 0.0ulps
SumMax ( 64) 2.3 GFlops 9.6 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 7.2 GFlops 29.2 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
256 threads: 41.4 GFlops 165.6 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
256 threads, fftlen 64: (worst case: full summax copy)
12.7 GFlops 51.5 GB/s 121.7ulps
Every ifft average & peak OK
256 threads, fftlen 64: (best case, nothing to update)
16.1 GFlops 65.3 GB/s 121.7ulps
Steve
Jason G:
Ouch! 27% more throughput on worst-case Opt1 than mine (12.7 vs 10.0 GFlops) ;D despite a slower powerspectrum (memory). That can't be the core (same 'best' case at about 16.1) ... PCIe bus overclocked? (Ahh, faster host memory too, I suppose.)
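The comparison can be reproduced from the posted GFlops figures. The sketch below also computes the worst-to-best ratio for each card, which is one rough way to see how much the full summax copy costs relative to the no-update case; treating that ratio as a proxy for transfer overhead is an assumption of this example, and the function names are hypothetical.

```python
def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


def worst_to_best(worst, best):
    """Fraction of best-case throughput retained in the worst case."""
    return worst / best


# Opt1 GFlops from this thread: (worst case, best case).
steve = (12.7, 16.1)
jason = (10.0, 16.0)

print(f"worst case: Steve is {percent_more(steve[0], jason[0]):.0f}% ahead")
print(f"best case:  Steve is {percent_more(steve[1], jason[1]):.1f}% ahead")
print(f"Steve worst/best: {worst_to_best(*steve):.2f}")
print(f"Jason worst/best: {worst_to_best(*jason):.2f}")
```

The best cases are nearly identical while the worst cases differ by about 27%, which fits the guess that the gap comes from the host side (PCIe or host memory) rather than the GPU core.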
SciManStev:
My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. I adjusted my GPU RAM to 1900 MHz. There is still room for more. I am on my last GPU WU for Einstein. There aren't any available at the moment. Piggy hit the #5 spot for the top rigs at Einstein with a RAC of over 14,000. There is nothing slow about Piggy. It does a fantastic job at running Starry Night Pro Plus astronomy software. I can't wait to get back to SETI crunching!
Steve