[Split] PowerSpectrum Unit Test
perryjay:
Hey guys, I did something right for a change!!! :) ::) Looking forward to the next test. This time I'll know to turn it off!
PatrickV2:
--- Quote from: Jason G on 25 Dec 2010, 08:53:02 am ---
--- Quote from: PatrickV2 on 25 Dec 2010, 05:21:25 am ---Ran test #9 on my Q6600/8GB/8800GTX, under both WinXP-32 as well as Win7-64.
--- End quote ---
Excellent, not broken on the 8800. Last hurdle for that code area cleared & can move on :D
--- End quote ---
Wonderful to hear that. As always, looking forward to the next bit of execution-magic. ;)
Regards, Patrick.
Jason G:
It will take me some time to cook up the next test while I work out this streaming stuff.
Mixed results with kernel streaming so far: it appears to benefit my smaller, highly optimised kernels more than the stock-ish larger sizes (don't know why yet, and dividing further into additional streams seems to slow it down again ... tricky!):
As with test #9 (single stream):
--- Quote ---Opt1 (worst case): 256 thrds/block, 1 x 1048576 element streams
FFT+PS+SM( 8) 19.2 GFlops 33.8 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 36.8 GFlops 50.3 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 60.7 GFlops 67.8 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 86.2 GFlops 81.6 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 92.5 GFlops 76.1 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 135.0 GFlops 98.3 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 172.0 GFlops 112.4 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 214.7 GFlops 127.3 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 225.9 GFlops 122.6 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 232.3 GFlops 116.2 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 226.0 GFlops 104.8 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 221.5 GFlops 95.8 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 213.1 GFlops 86.3 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 210.5 GFlops 80.2 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 202.6 GFlops 72.8 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End quote ---
2x streams:
--- Quote ---Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
FFT+PS+SM( 8) 26.7 GFlops 47.2 GB/s ulps(fft 1.2,ps 4324.2) [OK]
FFT+PS+SM( 16) 66.9 GFlops 91.3 GB/s ulps(fft 1.6,ps 4326.2) [OK]
FFT+PS+SM( 32) 90.9 GFlops 101.5 GB/s ulps(fft 1.3,ps 4003.6) [OK]
FFT+PS+SM( 64) 105.0 GFlops 99.4 GB/s ulps(fft 1.5,ps 4270.2) [OK]
FFT+PS+SM( 128) 94.0 GFlops 77.3 GB/s ulps(fft 1.7,ps 4347.9) [OK]
FFT+PS+SM( 256) 135.9 GFlops 98.9 GB/s ulps(fft 1.7,ps 4261.8) [OK]
FFT+PS+SM( 512) 167.9 GFlops 109.7 GB/s ulps(fft 1.8,ps 4327.4) [OK]
FFT+PS+SM( 1024) 198.4 GFlops 117.6 GB/s ulps(fft 2.1,ps 4727.6) [OK]
FFT+PS+SM( 2048) 209.1 GFlops 113.4 GB/s ulps(fft 2.2,ps 4921.2) [OK]
FFT+PS+SM( 4096) 209.9 GFlops 105.0 GB/s ulps(fft 2.2,ps 4764.3) [OK]
FFT+PS+SM( 8192) 204.8 GFlops 95.0 GB/s ulps(fft 2.6,ps 5278.8) [OK]
FFT+PS+SM( 16384) 205.0 GFlops 88.6 GB/s ulps(fft 2.6,ps 5357.5) [OK]
FFT+PS+SM( 32768) 187.5 GFlops 75.9 GB/s ulps(fft 2.3,ps 4992.8) [OK]
FFT+PS+SM( 65536) 195.2 GFlops 74.4 GB/s ulps(fft 2.0,ps 4604.3) [OK]
FFT+PS+SM(131072) 172.5 GFlops 62.0 GB/s ulps(fft 2.7,ps 5392.8) [OK]
--- End quote ---
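To put a number on "benefits the smaller sizes more", here is a rough sketch that compares the two tables above size by size. The GFlops values are copied from the single-stream and 2x-stream results in this post; the helper names (`stream_ratios`, `first_size_where_single_wins`) are made up for illustration and are not part of the test program.

```python
# GFlops figures copied from the two result tables above, in FFT-size order.
SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 2048,
         4096, 8192, 16384, 32768, 65536, 131072]
ONE_STREAM = [19.2, 36.8, 60.7, 86.2, 92.5, 135.0, 172.0, 214.7, 225.9,
              232.3, 226.0, 221.5, 213.1, 210.5, 202.6]
TWO_STREAMS = [26.7, 66.9, 90.9, 105.0, 94.0, 135.9, 167.9, 198.4, 209.1,
               209.9, 204.8, 205.0, 187.5, 195.2, 172.5]


def stream_ratios(sizes, single, dual):
    """Map each FFT size to (2-stream GFlops / 1-stream GFlops)."""
    return {n: d / s for n, s, d in zip(sizes, single, dual)}


def first_size_where_single_wins(ratios):
    """Smallest FFT size at which the single stream is faster, or None."""
    for n in sorted(ratios):
        if ratios[n] < 1.0:
            return n
    return None


ratios = stream_ratios(SIZES, ONE_STREAM, TWO_STREAMS)
crossover = first_size_where_single_wins(ratios)

for n, r in ratios.items():
    print(f"FFT size {n:>6}: 2 streams = {r:.2f}x single stream")
print("single stream is faster from size", crossover)
```

On this one card's numbers, two streams are ahead up to size 256 and behind from size 512 onward. Other GPUs may well cross over elsewhere, so treat the threshold as specific to this run.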
Jason G:
Updated first post:
--- Quote ---Update: PowerSpectrum Test #10 (attached)
- summary performance of FFT pipeline improvements against stock, for assessing overall progress
- results can vary, so a few runs may be needed to check the stability of the result
- Please use the DLLs provided with Test #9
--- End quote ---
arkayn:
--- Code: ---Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
Processing... Done!
Compute Thoughput GFlops Avg( 67.27) Peak( 111.28) Min( 9.42) [OK]
Memory thoughput GB/s Avg( 36.72) Peak( 55.70) Min( 15.41)
Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
revert to single stream from size 512
Processing... Done!
Compute thoughput [GFlops] -
Avg( 84.36, 1.25x) Peak( 131.47, 1.18x) Min( 31.13, 3.30x) [OK]
Memory thoughput [GB/s] -
Avg( 51.22, 1.39x) Peak( 66.16, 1.19x) Min( 34.18, 2.22x)
--- End code ---
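For anyone comparing their own runs, the "x" figures in the Opt1 section look like plain ratios of the Opt1 value to the matching Stock value. That is an assumption read off the output, not something confirmed from the test's source. A quick sketch that recomputes them from the numbers in the log above (the dictionary layout and the `speedup` helper are only for illustration):

```python
# Values copied from the GTX 460 log above.
STOCK = {
    "gflops": {"avg": 67.27, "peak": 111.28, "min": 9.42},
    "gbps": {"avg": 36.72, "peak": 55.70, "min": 15.41},
}
OPT1 = {
    "gflops": {"avg": 84.36, "peak": 131.47, "min": 31.13},
    "gbps": {"avg": 51.22, "peak": 66.16, "min": 34.18},
}
# Multipliers as printed by the test program.
REPORTED = {
    "gflops": {"avg": 1.25, "peak": 1.18, "min": 3.30},
    "gbps": {"avg": 1.39, "peak": 1.19, "min": 2.22},
}


def speedup(opt, base):
    """Opt1 / Stock, rounded to two decimals like the printed figures."""
    return round(opt / base, 2)


computed = {
    metric: {k: speedup(OPT1[metric][k], STOCK[metric][k]) for k in STOCK[metric]}
    for metric in STOCK
}

for metric, values in computed.items():
    for k, v in values.items():
        print(f"{metric:>6} {k:>4}: computed {v:.2f}x, reported {REPORTED[metric][k]:.2f}x")
```

The recomputed ratios agree with the printed ones for this log, so dividing your own Opt1 figures by your Stock figures should give a comparable multiplier.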