[Split] PowerSpectrum Unit Test
Miep:
For sake of completeness:
Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 4.5 GFlops 17.8 GB/s 1183.3ulps
SumMax ( 64) 0.2 GFlops 1.0 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 0.4 GFlops 1.7 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 4.4 GFlops 17.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
1.3 GFlops 5.3 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
1.4 GFlops 5.8 GB/s 121.7ulps
Some 10% difference between the two bottom ones.
Jason G:
Cheers,
Analysing...
Average, peak calcs, thread-count heuristic: OK
worst case speedup: (1.3-0.4)/0.4 ~225% (3.25x).. Winner! ;D
best case speedup: (1.4-0.4)/0.4 ~250% (3.5x)
Double checking those ridiculous numbers: (mistakes always possible ;) )
1.3 GFlops (optimised) / 0.4 GFlops (Stock) definitely = 3.25x (325% of stock throughput)
The percentage of optimised throughput that is speedup is then 0.9 GFlops / 1.3 GFlops ~= 69 percent, i.e. about 69% of Opt throughput is bonus. The speedup component is 225% of the stock throughput.
#Stock is doing something that GPU doesn't like :-\
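For anyone who wants to redo the arithmetic above, here is a minimal sketch in Python. The function name `speedup` and its return layout are made up for illustration; the inputs are the PS+SuMx stock figure (0.4 GFlops) and the two Opt1 figures (1.3 and 1.4 GFlops) from Miep's run.

```python
def speedup(stock_gflops, opt_gflops):
    """Return (ratio, gain as % of stock, gain as % of optimised throughput).

    Hypothetical helper, not part of the unit test itself.
    """
    gain = opt_gflops - stock_gflops
    ratio = opt_gflops / stock_gflops
    gain_pct_of_stock = 100.0 * gain / stock_gflops
    gain_pct_of_opt = 100.0 * gain / opt_gflops
    return ratio, gain_pct_of_stock, gain_pct_of_opt


stock = 0.4  # PS+SuMx( 64), stock, GFlops
for label, opt in (("worst case", 1.3), ("best case", 1.4)):
    ratio, of_stock, of_opt = speedup(stock, opt)
    print(f"{label}: {ratio:.2f}x, gain is {of_stock:.0f}% of stock, "
          f"{of_opt:.0f}% of optimised throughput")
```

Keep in mind the inputs only carry one decimal place, so the resulting ratios are rough.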
Miep:
I reran a few times, getting 0.8-0.9, 1.3 and 1.4-1.5 now,
i.e. higher baseline, optimization values stable. Can do some statistics tomorrow.
edit: that 0.4 seems to have been exceptionally low (and no, I didn't have the GPU crunching by accident :P )
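A rough sketch of what the reruns imply for the speedup, assuming the three figures are, in order, the stock PS+SuMx result, the Opt1 worst case and the Opt1 best case (that reading is an assumption, as is the helper name `speedup_range`):

```python
def speedup_range(stock_range, opt_range):
    """Lowest and highest opt/stock ratio for (low, high) GFlops ranges.

    Hypothetical helper: the lowest ratio pairs the slowest optimised run
    with the fastest stock run, the highest ratio does the opposite.
    """
    stock_lo, stock_hi = stock_range
    opt_lo, opt_hi = opt_range
    return opt_lo / stock_hi, opt_hi / stock_lo


stock = (0.8, 0.9)   # rerun baseline, GFlops
worst = (1.3, 1.3)   # Opt1, full summax copy
best = (1.4, 1.5)    # Opt1, nothing to update

for label, opt in (("worst case", worst), ("best case", best)):
    lo, hi = speedup_range(stock, opt)
    print(f"{label}: {lo:.2f}x to {hi:.2f}x")
```

With the higher baseline the gain is far smaller than the 3.25x computed from the 0.4 figure, but still well above 1x.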
Jason G:
OK, non-critical unless I make computation mistakes (I was mostly concerned here to not make code slower...). Stock / x32f code there is doing something your GPU doesn't like IMO.
Was that Quadro 'integrated' and using some portion of system memory, or does it use dedicated memory?
SciManStev:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 28.4 GFlops 113.7 GB/s 0.0ulps
SumMax ( 64) 2.3 GFlops 9.7 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 7.4 GFlops 29.9 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
256 threads: 41.4 GFlops 165.5 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
256 threads, fftlen 64: (worst case: full summax copy)
10.9 GFlops 44.0 GB/s 121.7ulps
Every ifft average & peak OK
256 threads, fftlen 64: (best case, nothing to update)
16.2 GFlops 65.4 GB/s 121.7ulps
This was much easier than typing it out. Thanks, Richard.
Steve
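Since the results are pasted as plain text, the same comparison can be pulled straight out of the output lines. A minimal sketch, assuming the "N GFlops M GB/s" layout seen above; the regex and the name `parse_rates` are illustrative and not part of the test program:

```python
import re

# Matches the throughput pair as it appears in the pasted output,
# e.g. "PS+SuMx( 64) 7.4 GFlops 29.9 GB/s".
RATE = re.compile(r"([\d.]+)\s*GFlops\s+([\d.]+)\s*GB/s")


def parse_rates(line):
    """Return (gflops, gb_per_s) from one output line, or None if absent."""
    m = RATE.search(line)
    return (float(m.group(1)), float(m.group(2))) if m else None


# Lines copied from the GTX 480 run above.
stock = parse_rates("PS+SuMx( 64) 7.4 GFlops 29.9 GB/s")
worst = parse_rates("10.9 GFlops 44.0 GB/s 121.7ulps")
best = parse_rates("16.2 GFlops 65.4 GB/s 121.7ulps")

print(f"worst case: {worst[0] / stock[0]:.2f}x of stock")
print(f"best case: {best[0] / stock[0]:.2f}x of stock")
```

Lines without a throughput pair (such as "Every ifft average & peak OK") return None, so a whole pasted report can be fed through line by line.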