[Split] PowerSpectrum Unit Test
Miep:
--- Quote from: Jason G on 21 Dec 2010, 12:46:29 pm ---Best case requires few memory transfers back to the host CPU (only one best spike and no detections) ;)
[Edit:] Worst case would be a best signal plus numdatapoints/fftlen detections, i.e. not really possible, since we're limited to 30 detections, so I wouldn't bother transferring more than the first 30 (... unlike stock ...)
--- End quote ---
Now he tells us ::) ;)
So normal data would perform somewhere in between - any info on the distribution between the two endpoints?
Jason G:
--- Quote from: Miep on 21 Dec 2010, 01:04:35 pm ---So normal data would perform somewhere in between - any info on the distribution between the two endpoints?
--- End quote ---
Yes. Actual performance will fall somewhere between the best and worst cases... :P Initially, though, I'll be using 'worst case' code to make rapid improvements to working prototypes (size 64 is already in field testing in x33); best case code is a glass ceiling to aim for with 'advanced coding'.
[Edit:] Size 64 (worst case implementation) gives a ~3% performance improvement on 'shorties' on a GTX 480.
[Edit2:] Oh, that was the 'old' worst case code, never mind ::)
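To make the best/worst case distinction concrete, here is a minimal CPU reference sketch in Python. It assumes the kernel squares each complex bin, reduces every FFT of length fftlen to a sum and a peak, scores the peak against the mean, and reports one best spike plus any FFTs above a threshold. The function name, the threshold, and the peak/mean scoring rule are illustrative assumptions, not the x33 or stock code; MAX_DETECTIONS = 30 follows the limit Jason mentions.

```python
# Hypothetical cap mirroring the "limited to 30 detections" remark.
MAX_DETECTIONS = 30


def powerspectrum_summax(data, fftlen, threshold):
    """CPU reference sketch: power spectrum, then a per-FFT sum/max reduction.

    data is a flat list of complex samples covering len(data) // fftlen FFTs
    of length fftlen. Returns (best, detections); each spike is a tuple
    (fft_index, bin, score) with score = peak power / mean power (assumed rule).
    """
    num_ffts = len(data) // fftlen
    best = None
    detections = []
    for f in range(num_ffts):
        chunk = data[f * fftlen:(f + 1) * fftlen]
        # Power spectrum: |X|^2 = re^2 + im^2 for each bin.
        power = [z.real * z.real + z.imag * z.imag for z in chunk]
        # "summax": one pass over the FFT yields both the sum (for the mean) and the peak.
        peak_bin = max(range(fftlen), key=power.__getitem__)
        mean = sum(power) / fftlen
        score = power[peak_bin] / mean if mean > 0 else 0.0
        spike = (f, peak_bin, score)
        if best is None or score > best[2]:
            best = spike
        # At most one detection per FFT, so the worst case is num_ffts
        # detections; anything past the cap is never copied back to the host.
        if score > threshold and len(detections) < MAX_DETECTIONS:
            detections.append(spike)
    return best, detections


# Usage: 64 samples split into 8 FFTs of length 8, one strong bin in FFT 3.
samples = [1 + 0j] * 64
samples[3 * 8 + 5] = 4 + 0j
best, dets = powerspectrum_summax(samples, fftlen=8, threshold=3.0)
print(best, len(dets))
```

In the best case only the single best tuple has to travel back to the host; in the worst case every FFT crosses the threshold, and the cap keeps the copy at 30 entries no matter how many FFTs there are.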
PatrickV2:
I re-ran the tests on my rig (Q6600/8GB/8800GTX) under both Win7-64 and WinXP32.
First, WinXP32:
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
PS+SuMx( 8) [OK] 2.2 GFlops 9.6 GB/s
PS+SuMx( 16) [OK] 2.6 GFlops 11.1 GB/s
PS+SuMx( 32) [OK] 2.6 GFlops 10.5 GB/s
PS+SuMx( 64) [OK] 4.3 GFlops 17.5 GB/s
Opt1: 64 thrds/block
              worst case                 best case
              GFlps   GB/s   ulps        GFlps   GB/s   ulps
PS+SuMx( 8)     3.6   15.8  121.7  [OK]    6.2   27.2  121.7
PS+SuMx( 16)    4.5   18.8  121.7  [OK]    6.1   25.5  121.7
PS+SuMx( 32)    4.9   20.1  121.7  [OK]    5.8   23.8  121.7
PS+SuMx( 64)    6.6   26.5  121.7  [OK]    7.4   30.0  121.7
Then Win7-64:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
PS+SuMx( 8) [OK] 2.1 GFlops 9.0 GB/s
PS+SuMx( 16) [OK] 2.4 GFlops 10.2 GB/s
PS+SuMx( 32) [OK] 2.4 GFlops 9.8 GB/s
PS+SuMx( 64) [OK] 3.9 GFlops 15.6 GB/s
Opt1: 64 thrds/block
              worst case                 best case
              GFlps   GB/s   ulps        GFlps   GB/s   ulps
PS+SuMx( 8)     3.4   14.9  121.7  [OK]    6.1   26.8  121.7
PS+SuMx( 16)    4.2   17.5  121.7  [OK]    6.0   25.3  121.7
PS+SuMx( 32)    4.6   18.7  121.7  [OK]    5.8   23.7  121.7
PS+SuMx( 64)    5.9   24.0  121.7  [OK]    7.4   29.8  121.7
As always, hope it helps. ;)
Regards, Patrick.
EDIT: Turned smilies off for this post, since the '8)' in the test results kept rendering as the 'cool' smiley.
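For a quick read of Patrick's numbers, this illustrative Python snippet divides each Opt1 GFlops figure by the matching stock figure from the two 8800 GTX runs above. The dictionary layout and the speedups() helper are just for this sketch; the values are copied from the posted output.

```python
# GFlops figures copied from the two 8800 GTX runs above (CUDA 3020).
# Each entry: size -> (stock, Opt1 worst case, Opt1 best case)
RESULTS = {
    "WinXP32": {8: (2.2, 3.6, 6.2), 16: (2.6, 4.5, 6.1),
                32: (2.6, 4.9, 5.8), 64: (4.3, 6.6, 7.4)},
    "Win7-64": {8: (2.1, 3.4, 6.1), 16: (2.4, 4.2, 6.0),
                32: (2.4, 4.6, 5.8), 64: (3.9, 5.9, 7.4)},
}


def speedups(table):
    """Return size -> (worst/stock, best/stock) GFlops ratios."""
    return {n: (worst / stock, best / stock)
            for n, (stock, worst, best) in table.items()}


for os_name, table in RESULTS.items():
    for n, (w, b) in speedups(table).items():
        print(f"{os_name} PS+SuMx({n:2d}): worst x{w:.2f}, best x{b:.2f}")
```

On both installs every worst-case ratio lands between about 1.5x and 1.9x, while the best-case advantage shrinks from about 2.8x to 2.9x at size 8 to about 1.7x to 1.9x at size 64.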
Ghost0210:
Win7x64 results:
Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
PS+SuMx( 8) [OK] 2.4 GFlops 10.7 GB/s
PS+SuMx( 16) [OK] 3.1 GFlops 13.0 GB/s
PS+SuMx( 32) [OK] 2.6 GFlops 10.6 GB/s
PS+SuMx( 64) [OK] 4.0 GFlops 16.1 GB/s
Opt1: 256 thrds/block
              worst case                 best case
              GFlps   GB/s   ulps        GFlps   GB/s   ulps
PS+SuMx( 8)     4.9   21.4  121.7  [OK]   13.1   57.4  121.7
PS+SuMx( 16)    6.5   27.2  121.7  [OK]   12.3   51.4  121.7
PS+SuMx( 32)    7.8   31.8  121.7  [OK]   11.9   48.7  121.7
PS+SuMx( 64)    8.6   34.8  121.7  [OK]   11.6   47.0  121.7
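Every Opt1 row in this thread reports 121.7 in the ulps column, i.e. an error expressed in units in the last place. The posts don't say how the test computes or aggregates that figure, so the following is only an assumption-labeled illustration: a common way to get the ulp distance between two float32 results is to reinterpret their bits as integers that sort in the same order as the floats. The helper names are hypothetical.

```python
import struct


def float32_bits(x):
    """Round x to float32 and reinterpret its bits as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def ordered(i):
    """Map a float32 bit pattern to an integer that orders like the float.

    Positive floats keep their bits; negative floats (sign bit set) map to
    minus their magnitude bits, so -0.0 and +0.0 both become 0.
    """
    return i if i >= 0 else -(i & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 values between a and b (finite inputs)."""
    return abs(ordered(float32_bits(a)) - ordered(float32_bits(b)))


# Usage: compare a hypothetical GPU result against a CPU reference value.
print(ulp_distance(1.0, 1.0000001))
```

A test harness would typically take the maximum or the mean of this distance over all outputs; which one (if either) this unit test uses is not stated.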
M_M:
GTX460 1GB OC Core=880MHz Mem=2000MHz Win7-64bit
C:\Test>powerspectrumtest7
Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
PS+SuMx( 8) [OK] 3.4 GFlops 14.7 GB/s
PS+SuMx( 16) [OK] 3.5 GFlops 14.7 GB/s
PS+SuMx( 32) [OK] 2.3 GFlops 9.6 GB/s
PS+SuMx( 64) [OK] 3.5 GFlops 14.3 GB/s
Opt1: 256 thrds/block
              worst case                 best case
              GFlps   GB/s   ulps        GFlps   GB/s   ulps
PS+SuMx( 8)     6.5   28.4  121.7  [OK]   13.5   59.1  121.7
PS+SuMx( 16)    7.7   32.3  121.7  [OK]   12.6   52.8  121.7
PS+SuMx( 32)    8.5   34.8  121.7  [OK]   12.2   49.8  121.7
PS+SuMx( 64)    9.0   36.3  121.7  [OK]   12.3   49.6  121.7