[Split] PowerSpectrum Unit Test

Miep:

--- Quote from: Jason G on 21 Dec 2010, 12:46:29 pm ---Best case requires few memory transfers back to the host CPU ( only one best spike & no detections)  ;)

[Edit:] Worst case would be a best signal plus numdatapoints/fftlen detections. That's not really possible, since we're limited to 30 detections, so I wouldn't bother transferring more than the first 30 ( ... unlike stock...)

--- End quote ---

Now he tells us ::) ;)
So normal data would perform somewhere in between - any info on the distribution between the two endpoints?

Jason G:

--- Quote from: Miep on 21 Dec 2010, 01:04:35 pm ---So normal data would perform somewhere in between - any info on the distribution between the two endpoints?

--- End quote ---

Yes. Actual performance will fall somewhere in between the best & worst cases ...  :P ... Initially, though, I'll be using 'worst case' code to get rapid improvements into working prototypes (size 64 is already in field testing in x33); best case code is a glass ceiling to aim for with 'advanced coding'.

[Edit:] size 64 (worst case implementation) provides ~3% performance improvement to 'shorties' on GTX 480

[Edit2:] Oh, that was 'old' worst case code, never mind  ::)
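
To make the best and worst cases above concrete, here is a minimal Python sketch of how many result records would need copying back to the host. The function name, the one-record-per-result model, and the 1,048,576-point input size are assumptions for illustration, not taken from the actual x33 code; only the 30-detection cap and the numdatapoints/fftlen bound come from the posts above.

```python
# Cap on reported detections, as stated in the thread.
MAX_DETECTIONS = 30


def results_to_transfer(num_datapoints: int, fft_len: int, detections_found: int) -> int:
    """Count result records copied back to the host (hypothetical model).

    One record is always sent for the best spike; detections are limited
    both by the number of spectra (num_datapoints // fft_len) and by the cap.
    """
    possible = num_datapoints // fft_len
    detections = min(detections_found, possible, MAX_DETECTIONS)
    return 1 + detections


# Assumed input size, for illustration only.
NUM_DATAPOINTS = 1_048_576

best_case = results_to_transfer(NUM_DATAPOINTS, 64, 0)
worst_case = results_to_transfer(NUM_DATAPOINTS, 64, NUM_DATAPOINTS // 64)
print(best_case, worst_case)
```

Under this model the cap, not numdatapoints/fftlen, sets the worst-case transfer count for any realistic input size.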

PatrickV2:
I re-ran the tests on my rig (Q6600/8GB/8800GTX) under both Win7-64 and WinXP32.

First WinXP32:

Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.6 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   11.1 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    4.3 GFlops   17.5 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.6   15.8 121.7 [OK]    6.2   27.2 121.7
 PS+SuMx(    16)    4.5   18.8 121.7 [OK]    6.1   25.5 121.7
 PS+SuMx(    32)    4.9   20.1 121.7 [OK]    5.8   23.8 121.7
 PS+SuMx(    64)    6.6   26.5 121.7 [OK]    7.4   30.0 121.7


Then Win7-64:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.1 GFlops    9.0 GB/s
 PS+SuMx(    16) [OK]    2.4 GFlops   10.2 GB/s
 PS+SuMx(    32) [OK]    2.4 GFlops    9.8 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.6 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.4   14.9 121.7 [OK]    6.1   26.8 121.7
 PS+SuMx(    16)    4.2   17.5 121.7 [OK]    6.0   25.3 121.7
 PS+SuMx(    32)    4.6   18.7 121.7 [OK]    5.8   23.7 121.7
 PS+SuMx(    64)    5.9   24.0 121.7 [OK]    7.4   29.8 121.7

As always, hope it helps. ;)

Regards, Patrick.

EDIT: Modified to use no smilies due to the 'cool' smilies in the test-results.
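
For anyone comparing runs, the gain of Opt1 over stock can be worked out directly from the GFlops columns. The short Python sketch below does that for the WinXP32 figures on the 8800 GTX; the variable and function names are made up for illustration, and the numbers are copied from the output above.

```python
# GFlops figures copied from the WinXP32 run on the 8800 GTX above,
# keyed by the size shown in PS+SuMx(...).
stock = {8: 2.2, 16: 2.6, 32: 2.6, 64: 4.3}
opt1_worst = {8: 3.6, 16: 4.5, 32: 4.9, 64: 6.6}
opt1_best = {8: 6.2, 16: 6.1, 32: 5.8, 64: 7.4}


def speedups(baseline, optimized):
    """Return optimized/baseline throughput ratio for each size."""
    return {size: optimized[size] / baseline[size] for size in baseline}


worst_gain = speedups(stock, opt1_worst)
best_gain = speedups(stock, opt1_best)

for size in stock:
    print(f"PS+SuMx({size:>2}): worst x{worst_gain[size]:.2f}, best x{best_gain[size]:.2f}")
```

On these numbers the worst case code runs at roughly 1.5x to 1.9x stock, and the best case code at roughly 1.7x to 2.8x.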

Ghost0210:
Win7x64 results:


Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    2.4 GFlops   10.7 GB/s
 PS+SuMx(    16) [OK]    3.1 GFlops   13.0 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.6 GB/s
 PS+SuMx(    64) [OK]    4.0 GFlops   16.1 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    4.9   21.4 121.7 [OK]   13.1   57.4 121.7
 PS+SuMx(    16)    6.5   27.2 121.7 [OK]   12.3   51.4 121.7
 PS+SuMx(    32)    7.8   31.8 121.7 [OK]   11.9   48.7 121.7
 PS+SuMx(    64)    8.6   34.8 121.7 [OK]   11.6   47.0 121.7

M_M:
GTX460 1GB OC Core=880MHz Mem=2000MHz Win7-64bit

C:\Test>powerspectrumtest7

Device: GeForce GTX 460, 810 MHz clock, 993 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #7 (Faster reductions)
Stock:
 PS+SuMx(     8) [OK]    3.4 GFlops   14.7 GB/s
 PS+SuMx(    16) [OK]    3.5 GFlops   14.7 GB/s
 PS+SuMx(    32) [OK]    2.3 GFlops    9.6 GB/s
 PS+SuMx(    64) [OK]    3.5 GFlops   14.3 GB/s


Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    6.5   28.4 121.7 [OK]   13.5   59.1 121.7
 PS+SuMx(    16)    7.7   32.3 121.7 [OK]   12.6   52.8 121.7
 PS+SuMx(    32)    8.5   34.8 121.7 [OK]   12.2   49.8 121.7
 PS+SuMx(    64)    9.0   36.3 121.7 [OK]   12.3   49.6 121.7
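
A side note on reading the two throughput columns: dividing GB/s by GFlops gives the bytes moved per floating-point operation as the test counts them. The Python sketch below does this for the GTX 460 figures above. The names are invented for illustration, and the thread does not say how the test counts flops or bytes, so treat the ratio as a rough consistency check only.

```python
# (GFlops, GB/s) pairs copied from the GTX 460 output above,
# keyed by the size shown in PS+SuMx(...).
runs = {
    "stock":      {8: (3.4, 14.7), 16: (3.5, 14.7), 32: (2.3, 9.6), 64: (3.5, 14.3)},
    "opt1 worst": {8: (6.5, 28.4), 16: (7.7, 32.3), 32: (8.5, 34.8), 64: (9.0, 36.3)},
    "opt1 best":  {8: (13.5, 59.1), 16: (12.6, 52.8), 32: (12.2, 49.8), 64: (12.3, 49.6)},
}


def bytes_per_flop(gflops, gb_per_s):
    """Ratio of reported memory throughput to reported compute throughput."""
    return gb_per_s / gflops


ratios = {
    name: {size: bytes_per_flop(*pair) for size, pair in sizes.items()}
    for name, sizes in runs.items()
}

for name, by_size in ratios.items():
    line = ", ".join(f"{size}: {r:.2f}" for size, r in by_size.items())
    print(f"{name:<10} {line}")
```

Every ratio here lands between 4.0 and 4.4, so the GFlops and GB/s columns track each other closely and either one can be used to compare runs.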
