[Split] PowerSpectrum Unit Test

Miep:
All OK. Worst case is 0-0.4 faster than stock; best case is another 0.1-0.4 faster than worst.
That's over about 5 runs.

arkayn:
Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.


--- Code: ---PS+SuMx( 65536) [OK]   12.4 GFlops   49.6 GB/s
--- End code ---


--- Code: ---PS+SuMx( 65536)   16.6   66.4 121.7 [OK]   17.7   70.7 121.7
--- End code ---
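
For anyone comparing their own numbers, here is a minimal sketch of how the two lines above could be turned into speedup figures. It assumes the first line is the stock result and the second is the Opt1 result in the worst-case / best-case layout (GFlops, GB/s, ulps on each side of the [OK]) used in the full logs later in this thread. The regular expressions and the `speedups` helper are hypothetical, written only for this log format; they are not part of the test tool.

```python
import re

# Hypothetical patterns for the two line layouts seen in this thread.
STOCK_RE = re.compile(
    r"PS\+SuMx\(\s*(\d+)\)\s+\[OK\]\s+([\d.]+) GFlops\s+([\d.]+) GB/s"
)
OPT_RE = re.compile(
    r"PS\+SuMx\(\s*(\d+)\)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
    r"\s+\[OK\]\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def speedups(stock_line, opt_line):
    """Return (worst, best) GFlops speedup of the Opt1 line over the stock line."""
    stock = STOCK_RE.search(stock_line)
    opt = OPT_RE.search(opt_line)
    if stock is None or opt is None:
        raise ValueError("line does not match the expected log layout")
    if stock.group(1) != opt.group(1):
        raise ValueError("lines report different transform sizes")
    stock_gflops = float(stock.group(2))
    worst_gflops = float(opt.group(2))   # first GFlops column: worst case
    best_gflops = float(opt.group(5))    # second GFlops column: best case
    return worst_gflops / stock_gflops, best_gflops / stock_gflops


# The GTX 460 lines posted above.
stock_line = "PS+SuMx( 65536) [OK]   12.4 GFlops   49.6 GB/s"
opt_line = "PS+SuMx( 65536)   16.6   66.4 121.7 [OK]   17.7   70.7 121.7"

worst, best = speedups(stock_line, opt_line)
print(f"worst case {worst:.2f}x, best case {best:.2f}x over stock")
# prints: worst case 1.34x, best case 1.43x over stock
```

On that reading, the GTX 460 gains roughly a third over stock at size 65536.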

Jason G:

--- Quote from: PatrickV2 on 23 Dec 2010, 04:06:32 pm --- PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456
--- End quote ---

Wow Patrick, clearly something I'm doing at size 64 has changed (and it only shows up on cc 1.0  :o). Will check. We're going to need to fix that before moving on.

[Later:] @Patrick: when you can, please reboot and try the attached fix attempt (for compute capability 1.0)... If it's OK on that card, I'll be able to avoid breaking that again...

[Removed attachment]

PatrickV2:

--- Quote from: Jason G on 24 Dec 2010, 12:47:51 am ---
--- Quote from: PatrickV2 on 23 Dec 2010, 04:06:32 pm --- PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456
--- End quote ---

Wow Patrick, clearly something I'm doing at size 64 has changed (and it only shows up on cc 1.0  :o). Will check. We're going to need to fix that before moving on.

[Later:] @Patrick: when you can, please reboot and try the attached fix attempt (for compute capability 1.0)... If it's OK on that card, I'll be able to avoid breaking that again...

--- End quote ---

It looks like you fixed it. Full logs for completeness' sake:

WinXP-32:


--- Code: ---Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.7 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   11.1 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    4.3 GFlops   17.6 GB/s
 PS+SuMx(   128) [OK]    6.7 GFlops   26.9 GB/s
 PS+SuMx(   256) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(   512) [OK]   11.2 GFlops   44.7 GB/s
 PS+SuMx(  1024) [OK]   11.8 GFlops   47.4 GB/s
 PS+SuMx(  2048) [OK]   13.5 GFlops   53.9 GB/s
 PS+SuMx(  4096) [OK]   13.2 GFlops   52.6 GB/s
 PS+SuMx(  8192) [OK]   14.4 GFlops   57.5 GB/s
 PS+SuMx( 16384) [OK]   14.1 GFlops   56.5 GB/s
 PS+SuMx( 32768) [OK]   14.9 GFlops   59.5 GB/s
 PS+SuMx( 65536) [OK]   15.3 GFlops   61.2 GB/s
 PS+SuMx(131072) [OK]   12.0 GFlops   47.8 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.6   15.8 121.7 [OK]    6.2   27.2 121.7
 PS+SuMx(    16)    4.5   18.8 121.7 [OK]    6.1   25.5 121.7
 PS+SuMx(    32)    4.9   20.1 121.7 [OK]    5.8   23.8 121.7
 PS+SuMx(    64)    6.5   26.5 121.7 [OK]    7.4   30.0 121.7
 PS+SuMx(   128)    7.2   28.8 121.7 [OK]    7.8   31.3 121.7
 PS+SuMx(   256)    9.4   37.8 121.7 [OK]   10.2   40.7 121.7
 PS+SuMx(   512)   11.6   46.3 121.7 [OK]   12.4   49.7 121.7
 PS+SuMx(  1024)   12.1   48.5 121.7 [OK]   12.9   51.6 121.7
 PS+SuMx(  2048)   13.7   54.9 121.7 [OK]   14.6   58.5 121.7
 PS+SuMx(  4096)   13.4   53.5 121.7 [OK]   14.2   56.8 121.7
 PS+SuMx(  8192)   14.5   58.2 121.7 [OK]   15.5   62.0 121.7
 PS+SuMx( 16384)   14.3   57.1 121.7 [OK]   15.2   60.9 121.7
 PS+SuMx( 32768)   15.1   60.3 121.7 [OK]   16.1   64.4 121.7
 PS+SuMx( 65536)   15.5   62.0 121.7 [OK]   16.5   66.2 121.7
 PS+SuMx(131072)   12.1   48.2 121.7 [OK]   12.7   50.8 121.7

--- End code ---
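
A side note on reading the two stock columns in the WinXP-32 log above: from size 64 upward the GB/s figure is almost exactly 4x the GFlops figure (for example 15.3 GFlops against 61.2 GB/s at 65536). That would be consistent with a power spectrum that reads one complex float (8 bytes), writes one float (4 bytes) and spends 3 flops per point, but that cost model is an inference from the numbers, not something stated in this thread. The sketch below is a plain-Python reference under that assumption; `power_spectrum`, `sum_and_max` and the two constants are hypothetical names, not the test's actual code.

```python
# Assumed cost model (not confirmed by the thread):
# read one complex float (8 bytes), write one float (4 bytes) per point,
# and spend two multiplies plus one add per point.
BYTES_PER_POINT = 8 + 4
FLOPS_PER_POINT = 3


def power_spectrum(samples):
    """Return re*re + im*im for each complex sample."""
    return [z.real * z.real + z.imag * z.imag for z in samples]


def sum_and_max(power):
    """Return the sum and the maximum of a list of power values."""
    return sum(power), max(power)


def bytes_per_flop(gb_per_s, gflops):
    """Ratio of reported memory throughput to reported arithmetic throughput."""
    return gb_per_s / gflops


model_ratio = BYTES_PER_POINT / FLOPS_PER_POINT        # 4.0 under the assumption
large_ratio = bytes_per_flop(61.2, 15.3)               # stock row, size 65536
small_ratio = bytes_per_flop(9.7, 2.2)                 # stock row, size 8

print(f"model {model_ratio:.2f}, size 65536 {large_ratio:.2f}, size 8 {small_ratio:.2f}")
# prints: model 4.00, size 65536 4.00, size 8 4.41
```

The higher ratio at the smallest sizes could reflect the extra sum/max results written per short spectrum, though that too is a guess.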

Win7-64:


--- Code: ---Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.0 GFlops    8.7 GB/s
 PS+SuMx(    16) [OK]    2.4 GFlops   10.2 GB/s
 PS+SuMx(    32) [OK]    2.4 GFlops    9.7 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.8 GB/s
 PS+SuMx(   128) [OK]    5.6 GFlops   22.7 GB/s
 PS+SuMx(   256) [OK]    7.2 GFlops   29.0 GB/s
 PS+SuMx(   512) [OK]    8.7 GFlops   34.7 GB/s
 PS+SuMx(  1024) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(  2048) [OK]   10.0 GFlops   40.1 GB/s
 PS+SuMx(  4096) [OK]    9.8 GFlops   39.0 GB/s
 PS+SuMx(  8192) [OK]   10.4 GFlops   41.6 GB/s
 PS+SuMx( 16384) [OK]   10.2 GFlops   40.7 GB/s
 PS+SuMx( 32768) [OK]   10.8 GFlops   43.2 GB/s
 PS+SuMx( 65536) [OK]   10.9 GFlops   43.6 GB/s
 PS+SuMx(131072) [OK]    9.0 GFlops   36.1 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.4   14.9 121.7 [OK]    6.1   26.8 121.7
 PS+SuMx(    16)    4.2   17.6 121.7 [OK]    6.1   25.4 121.7
 PS+SuMx(    32)    4.6   18.7 121.7 [OK]    5.8   23.7 121.7
 PS+SuMx(    64)    6.0   24.2 121.7 [OK]    7.3   29.4 121.7
 PS+SuMx(   128)    6.5   26.0 121.7 [OK]    7.7   31.1 121.7
 PS+SuMx(   256)    8.3   33.3 121.7 [OK]   10.1   40.4 121.7
 PS+SuMx(   512)    9.9   39.8 121.7 [OK]   12.3   49.4 121.7
 PS+SuMx(  1024)   10.2   40.8 121.7 [OK]   12.8   51.3 121.7
 PS+SuMx(  2048)   11.3   45.2 121.7 [OK]   14.5   58.2 121.7
 PS+SuMx(  4096)   11.2   44.6 121.7 [OK]   14.1   56.3 121.7
 PS+SuMx(  8192)   12.1   48.3 121.7 [OK]   15.4   61.5 121.7
 PS+SuMx( 16384)   11.7   46.8 121.7 [OK]   15.1   60.4 121.7
 PS+SuMx( 32768)   12.2   48.8 121.7 [OK]   16.0   63.8 121.7
 PS+SuMx( 65536)   12.5   50.0 121.7 [OK]   16.4   65.8 121.7
 PS+SuMx(131072)   10.1   40.5 121.7 [OK]   12.6   50.5 121.7

--- End code ---

Regards, Patrick.
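
On the ulps column in these logs: one common way to express the error between a GPU result and a reference value is the distance in units in the last place between the two float32 values. The sketch below shows that measure with the standard library only. How the unit test actually arrives at 121.7 is not stated in this thread; the fractional value suggests an average or a ratio-based measure, so treat `ulp_distance` and `mean_ulp_error` as hypothetical illustrations rather than the test's method.

```python
import struct


def float32_key(x):
    """Map a value, rounded to float32, to an integer that preserves float ordering."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; flip them so the keys are monotonic
    # and -0.0 and +0.0 both map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 steps between a and b."""
    return abs(float32_key(a) - float32_key(b))


def mean_ulp_error(results, reference):
    """Average ulp distance over paired result/reference values."""
    if len(results) != len(reference) or not results:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(ulp_distance(r, ref) for r, ref in zip(results, reference))
    return total / len(results)


one = 1.0
next_after_one = 1.0 + 2.0 ** -23   # the next float32 above 1.0

print(ulp_distance(one, next_after_one))                              # prints: 1
print(mean_ulp_error([one, next_after_one], [one, one]))              # prints: 0.5
```

An averaged measure like `mean_ulp_error` is one way a non-integer figure could come out of integer ulp distances.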

Jason G:
Phew! Cool, thanks  ;D

Not much headroom on that chip either, but I'll be happy with that small fractional improvement on the oldest cards for now.

Moving on to test #9 soon: I'll add in the FFTs, then stream the test kernels after that, just to see what that does... Progress at last  ;D
