
[Split] PowerSpectrum Unit Test


Jason G:
I calculate a theoretical max of 122.24 GB/s (matching the GPU-Z listing), so 98.2 GB/s seems pretty good. I'll look at what that size is doing and see if I can spread some of that performance around to the sizes near it.

[Edit:] I get the impression we might be best off seeing what streaming those kernels will do sometime soon  :-\  too many newfangled features in this stuff  ;)

Claggy:
OK here on a 9800GTX+ (5 runs). GPU usage went up from ~80% with Stock to ~95% with Opt1:

Best Stock result:

--- Quote ---PS+SuMx( 65536) [OK]   11.6 GFlops   46.5 GB/s
--- End quote ---

Opt1 Best Result:

--- Quote ---Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx( 65536)   13.0   52.1 121.7 [OK]   15.6   62.5 121.7
--- End quote ---

And OK on a 128 MB 8400M GS (5 runs):



--- Quote ---Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    0.3 GFlops    1.3 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.2 GB/s
 PS+SuMx(    32) [OK]    0.2 GFlops    0.9 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.5 GB/s
 PS+SuMx(   128) [OK]    0.5 GFlops    2.2 GB/s
 PS+SuMx(   256) [OK]    0.7 GFlops    2.8 GB/s
 PS+SuMx(   512) [OK]    0.8 GFlops    3.4 GB/s
 PS+SuMx(  1024) [OK]    0.9 GFlops    3.5 GB/s
 PS+SuMx(  2048) [OK]    1.0 GFlops    4.0 GB/s
 PS+SuMx(  4096) [OK]    0.9 GFlops    3.7 GB/s
 PS+SuMx(  8192) [OK]    1.0 GFlops    4.0 GB/s
 PS+SuMx( 16384) [OK]    1.0 GFlops    3.9 GB/s
 PS+SuMx( 32768) [OK]    1.0 GFlops    4.1 GB/s
 PS+SuMx( 65536) [OK]    1.1 GFlops    4.2 GB/s
 PS+SuMx(131072) [OK]    1.1 GFlops    4.3 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.4    1.9 121.7 [OK]    0.5    2.1 121.7
 PS+SuMx(    16)    0.4    1.8 121.7 [OK]    0.5    1.9 121.7
 PS+SuMx(    32)    0.4    1.7 121.7 [OK]    0.4    1.7 121.7
 PS+SuMx(    64)    0.5    2.1 121.7 [OK]    0.5    2.2 121.7
 PS+SuMx(   128)    0.6    2.2 121.7 [OK]    0.6    2.3 121.7
 PS+SuMx(   256)    0.7    2.9 121.7 [OK]    0.7    3.0 121.7
 PS+SuMx(   512)    0.9    3.5 121.7 [OK]    0.9    3.6 121.7
 PS+SuMx(  1024)    0.9    3.5 121.7 [OK]    0.9    3.7 121.7
 PS+SuMx(  2048)    1.0    4.0 121.7 [OK]    1.0    4.2 121.7
 PS+SuMx(  4096)    0.9    3.8 121.7 [OK]    1.0    3.9 121.7
 PS+SuMx(  8192)    1.0    4.0 121.7 [OK]    1.0    4.2 121.7
 PS+SuMx( 16384)    1.0    4.0 121.7 [OK]    1.0    4.1 121.7
 PS+SuMx( 32768)    1.1    4.2 121.7 [OK]    1.1    4.3 121.7
 PS+SuMx( 65536)    1.1    4.3 121.7 [OK]    1.1    4.5 121.7
 PS+SuMx(131072)    1.1    4.4 121.7 [OK]    1.1    4.5 121.7
--- End quote ---

Claggy

PatrickV2:
I ran this on my usual rig (Q6600/8GB/8800GTX), but version 8 added something new: an error. Under WinXP it just shows the error, but under Win7-64 the screen turns black and I get a "driver stopped responding" error. Running driver 260.99.

First the WinXP-32 log:

Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.7 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   11.1 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    4.3 GFlops   17.6 GB/s
 PS+SuMx(   128) [OK]    6.7 GFlops   26.9 GB/s
 PS+SuMx(   256) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(   512) [OK]   11.2 GFlops   44.7 GB/s
 PS+SuMx(  1024) [OK]   11.8 GFlops   47.4 GB/s
 PS+SuMx(  2048) [OK]   13.5 GFlops   53.9 GB/s
 PS+SuMx(  4096) [OK]   13.2 GFlops   52.6 GB/s
 PS+SuMx(  8192) [OK]   14.4 GFlops   57.4 GB/s
 PS+SuMx( 16384) [OK]   14.1 GFlops   56.4 GB/s
 PS+SuMx( 32768) [OK]   14.9 GFlops   59.5 GB/s
 PS+SuMx( 65536) [OK]   15.3 GFlops   61.1 GB/s
 PS+SuMx(131072) [OK]   11.9 GFlops   47.7 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.6   15.8 121.7 [OK]    6.2   27.2 121.7
 PS+SuMx(    16)    4.5   18.8 121.7 [OK]    6.1   25.5 121.7
 PS+SuMx(    32)    4.9   20.1 121.7 [OK]    5.8   23.8 121.7
 PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Then the Win7-64 log:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.0 GFlops    9.0 GB/s
 PS+SuMx(    16) [OK]    2.4 GFlops   10.2 GB/s
 PS+SuMx(    32) [OK]    2.4 GFlops    9.8 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.6 GB/s
 PS+SuMx(   128) [OK]    5.7 GFlops   22.8 GB/s
 PS+SuMx(   256) [OK]    7.2 GFlops   28.8 GB/s
 PS+SuMx(   512) [OK]    8.5 GFlops   34.1 GB/s
 PS+SuMx(  1024) [OK]    8.9 GFlops   35.8 GB/s
 PS+SuMx(  2048) [OK]    9.8 GFlops   39.3 GB/s
 PS+SuMx(  4096) [OK]    9.7 GFlops   38.8 GB/s
 PS+SuMx(  8192) [OK]   10.3 GFlops   41.3 GB/s
 PS+SuMx( 16384) [OK]   10.1 GFlops   40.5 GB/s
 PS+SuMx( 32768) [OK]   10.6 GFlops   42.2 GB/s
 PS+SuMx( 65536) [OK]   10.7 GFlops   43.0 GB/s
 PS+SuMx(131072) [OK]    9.0 GFlops   36.0 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.4   14.8 121.7 [OK]    6.1   26.8 121.7
 PS+SuMx(    16)    4.2   17.4 121.7 [OK]    6.0   25.3 121.7
 PS+SuMx(    32)    4.6   18.7 121.7 [OK]    5.8   23.7 121.7
 PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Regards, Patrick.

SciManStev:
All OK here (5 runs), with GPU memory clocked at 1975 MHz.

Best Stock result

--- Quote --- PS+SuMx( 32768) [OK]   18.7 GFlops   75.0 GB/s
--- End quote ---

Best Opt. 1 result

--- Quote --- PS+SuMx( 32768)   26.8  107.4 121.7 [OK]   37.0  148.1 121.7
--- End quote ---

Steve

_heinz:
Very interesting: test #8 shows 32768 as the best size for the GTX 470/480 cards, but with low-end cards 131072 is best.
