[Split] PowerSpectrum Unit Test
Jason G:
I calculate a theoretical max of 122.24 GB/s (matching the GPU-Z listing), so 98.2 seems pretty good. I'll look at what that size is doing and see if I can spread some of that performance around in that area.
[Edit:] I get the impression we might be best off seeing what streaming those kernels will do sometime soon :-\ Too many new-fangled features in this stuff ;)
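For anyone wanting to redo the arithmetic behind the 122.24 GB/s figure, here is a minimal Python sketch of the usual peak-bandwidth formula (memory clock times transfers per clock times bus width in bytes). The post does not say which card or clocks were used, so the 955 MHz, quad data rate, 256-bit inputs below are only an assumed combination that happens to reproduce 122.24 GB/s, and the function names are made up for illustration. It also assumes the 98.2 is a measured GB/s figure.

```python
def theoretical_bandwidth_gbs(mem_clock_mhz, transfers_per_clock, bus_width_bits):
    """Peak memory bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    # MHz * transfers/clock * bytes/transfer gives MB/s; divide by 1000 for GB/s.
    return mem_clock_mhz * transfers_per_clock * bytes_per_transfer / 1000


def efficiency(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that a measured rate reaches."""
    return measured_gbs / peak_gbs


# Assumed inputs, NOT taken from the post: 955 MHz memory clock,
# 4 transfers per clock, 256-bit bus.
peak = theoretical_bandwidth_gbs(955, 4, 256)
print(f"peak: {peak:.2f} GB/s")
print(f"efficiency: {efficiency(98.2, peak):.1%}")
```

Under those assumptions, 98.2 against 122.24 works out to roughly 80% of the theoretical peak.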
Claggy:
OK here on a 9800GTX+ (5 runs); GPU usage is up from stock's ~80% to ~95% on Opt1:
Best Stock result:
--- Quote ---PS+SuMx( 65536) [OK] 11.6 GFlops 46.5 GB/s
--- End quote ---
Opt1 Best Result:
--- Quote ---Opt1: 64 thrds/block
                worst case                best case
                GFlps   GB/s   ulps       GFlps   GB/s   ulps
PS+SuMx( 65536)  13.0   52.1  121.7  [OK]  15.6   62.5  121.7
--- End quote ---
And OK on a 128 MB 8400M GS (5 runs):
--- Quote ---Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx( 8) [OK] 0.3 GFlops 1.3 GB/s
PS+SuMx( 16) [OK] 0.3 GFlops 1.2 GB/s
PS+SuMx( 32) [OK] 0.2 GFlops 0.9 GB/s
PS+SuMx( 64) [OK] 0.4 GFlops 1.5 GB/s
PS+SuMx( 128) [OK] 0.5 GFlops 2.2 GB/s
PS+SuMx( 256) [OK] 0.7 GFlops 2.8 GB/s
PS+SuMx( 512) [OK] 0.8 GFlops 3.4 GB/s
PS+SuMx( 1024) [OK] 0.9 GFlops 3.5 GB/s
PS+SuMx( 2048) [OK] 1.0 GFlops 4.0 GB/s
PS+SuMx( 4096) [OK] 0.9 GFlops 3.7 GB/s
PS+SuMx( 8192) [OK] 1.0 GFlops 4.0 GB/s
PS+SuMx( 16384) [OK] 1.0 GFlops 3.9 GB/s
PS+SuMx( 32768) [OK] 1.0 GFlops 4.1 GB/s
PS+SuMx( 65536) [OK] 1.1 GFlops 4.2 GB/s
PS+SuMx(131072) [OK] 1.1 GFlops 4.3 GB/s
Opt1: 64 thrds/block
                worst case                best case
                GFlps   GB/s   ulps       GFlps   GB/s   ulps
PS+SuMx( 8)       0.4    1.9  121.7  [OK]   0.5    2.1  121.7
PS+SuMx( 16)      0.4    1.8  121.7  [OK]   0.5    1.9  121.7
PS+SuMx( 32)      0.4    1.7  121.7  [OK]   0.4    1.7  121.7
PS+SuMx( 64)      0.5    2.1  121.7  [OK]   0.5    2.2  121.7
PS+SuMx( 128)     0.6    2.2  121.7  [OK]   0.6    2.3  121.7
PS+SuMx( 256)     0.7    2.9  121.7  [OK]   0.7    3.0  121.7
PS+SuMx( 512)     0.9    3.5  121.7  [OK]   0.9    3.6  121.7
PS+SuMx( 1024)    0.9    3.5  121.7  [OK]   0.9    3.7  121.7
PS+SuMx( 2048)    1.0    4.0  121.7  [OK]   1.0    4.2  121.7
PS+SuMx( 4096)    0.9    3.8  121.7  [OK]   1.0    3.9  121.7
PS+SuMx( 8192)    1.0    4.0  121.7  [OK]   1.0    4.2  121.7
PS+SuMx( 16384)   1.0    4.0  121.7  [OK]   1.0    4.1  121.7
PS+SuMx( 32768)   1.1    4.2  121.7  [OK]   1.1    4.3  121.7
PS+SuMx( 65536)   1.1    4.3  121.7  [OK]   1.1    4.5  121.7
PS+SuMx(131072)   1.1    4.4  121.7  [OK]   1.1    4.5  121.7
--- End quote ---
Claggy
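To put a number on the Opt1 gain in the results above, here is a small Python sketch that divides the Opt1 GB/s figures by the stock figure at size 65536 for both cards. The dictionary layout and the `speedup` helper are just illustrative names. Note that the stock test reports a single figure per size, so comparing it against Opt1's worst and best case is only a rough guide.

```python
def speedup(opt_gbs, stock_gbs):
    """Ratio of optimised to stock throughput (above 1.0 means Opt1 is faster)."""
    return opt_gbs / stock_gbs


# GB/s at size 65536, copied from the logs quoted above.
results = {
    "9800GTX+": {"stock": 46.5, "opt1_worst": 52.1, "opt1_best": 62.5},
    "8400M GS": {"stock": 4.2, "opt1_worst": 4.3, "opt1_best": 4.5},
}

for card, r in results.items():
    worst = speedup(r["opt1_worst"], r["stock"])
    best = speedup(r["opt1_best"], r["stock"])
    print(f"{card}: {worst:.2f}x (worst case) to {best:.2f}x (best case)")
```

On these numbers the 9800GTX+ gains far more from Opt1 than the 8400M GS does at the same size.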
PatrickV2:
I ran this on my usual rig (Q6600/8GB/8800GTX), but version 8 added something new: an error. Under WinXP it just shows the error, but under Win7-64 the screen turns black and I get a "driver stopped responding" error. Running driver 260.99.
First the WinXP-32 log:
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx( 8) [OK] 2.2 GFlops 9.7 GB/s
PS+SuMx( 16) [OK] 2.6 GFlops 11.1 GB/s
PS+SuMx( 32) [OK] 2.6 GFlops 10.5 GB/s
PS+SuMx( 64) [OK] 4.3 GFlops 17.6 GB/s
PS+SuMx( 128) [OK] 6.7 GFlops 26.9 GB/s
PS+SuMx( 256) [OK] 9.0 GFlops 36.0 GB/s
PS+SuMx( 512) [OK] 11.2 GFlops 44.7 GB/s
PS+SuMx( 1024) [OK] 11.8 GFlops 47.4 GB/s
PS+SuMx( 2048) [OK] 13.5 GFlops 53.9 GB/s
PS+SuMx( 4096) [OK] 13.2 GFlops 52.6 GB/s
PS+SuMx( 8192) [OK] 14.4 GFlops 57.4 GB/s
PS+SuMx( 16384) [OK] 14.1 GFlops 56.4 GB/s
PS+SuMx( 32768) [OK] 14.9 GFlops 59.5 GB/s
PS+SuMx( 65536) [OK] 15.3 GFlops 61.1 GB/s
PS+SuMx(131072) [OK] 11.9 GFlops 47.7 GB/s
Opt1: 64 thrds/block
                worst case                best case
                GFlps   GB/s   ulps       GFlps   GB/s   ulps
PS+SuMx( 8)       3.6   15.8  121.7  [OK]   6.2   27.2  121.7
PS+SuMx( 16)      4.5   18.8  121.7  [OK]   6.1   25.5  121.7
PS+SuMx( 32)      4.9   20.1  121.7  [OK]   5.8   23.8  121.7
PS+SuMx( 64)
FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456
Then the Win7-64 log:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx( 8) [OK] 2.0 GFlops 9.0 GB/s
PS+SuMx( 16) [OK] 2.4 GFlops 10.2 GB/s
PS+SuMx( 32) [OK] 2.4 GFlops 9.8 GB/s
PS+SuMx( 64) [OK] 3.9 GFlops 15.6 GB/s
PS+SuMx( 128) [OK] 5.7 GFlops 22.8 GB/s
PS+SuMx( 256) [OK] 7.2 GFlops 28.8 GB/s
PS+SuMx( 512) [OK] 8.5 GFlops 34.1 GB/s
PS+SuMx( 1024) [OK] 8.9 GFlops 35.8 GB/s
PS+SuMx( 2048) [OK] 9.8 GFlops 39.3 GB/s
PS+SuMx( 4096) [OK] 9.7 GFlops 38.8 GB/s
PS+SuMx( 8192) [OK] 10.3 GFlops 41.3 GB/s
PS+SuMx( 16384) [OK] 10.1 GFlops 40.5 GB/s
PS+SuMx( 32768) [OK] 10.6 GFlops 42.2 GB/s
PS+SuMx( 65536) [OK] 10.7 GFlops 43.0 GB/s
PS+SuMx(131072) [OK] 9.0 GFlops 36.0 GB/s
Opt1: 64 thrds/block
                worst case                best case
                GFlps   GB/s   ulps       GFlps   GB/s   ulps
PS+SuMx( 8)       3.4   14.8  121.7  [OK]   6.1   26.8  121.7
PS+SuMx( 16)      4.2   17.4  121.7  [OK]   6.0   25.3  121.7
PS+SuMx( 32)      4.6   18.7  121.7  [OK]   5.8   23.7  121.7
PS+SuMx( 64)
FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456
Regards, Patrick.
SciManStev:
All OK here with GPU RAM at 1975 MHz (5 runs).
Best Stock result
--- Quote --- PS+SuMx( 32768) [OK] 18.7 GFlops 75.0 GB/s
--- End quote ---
Best Opt. 1 result
--- Quote --- PS+SuMx( 32768) 26.8 107.4 121.7 [OK] 37.0 148.1 121.7
--- End quote ---
Steve
_heinz:
Very interesting: test 8 shows 32768 as the best size for the GTX470/480 cards.
But on low-end cards 131072 is best.
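Picking the best size out of a log by eye gets tedious, so here is a minimal Python sketch that pulls the size with the highest GB/s out of the stock lines. The regular expression is only my reading of the line format shown in this thread, and `best_size` is a made-up helper name, not part of the test tool. The sample lines are the last four stock lines of the 8400M GS log above.

```python
import re

# Matches stock lines such as "PS+SuMx( 65536) [OK] 1.1 GFlops 4.2 GB/s".
STOCK_LINE = re.compile(
    r"PS\+SuMx\(\s*(\d+)\)\s+\[OK\]\s+([\d.]+)\s+GFlops\s+([\d.]+)\s+GB/s"
)


def best_size(log_text):
    """Return (size, gb_per_s) for the stock line with the highest GB/s, or None."""
    rows = [(int(m.group(1)), float(m.group(3))) for m in STOCK_LINE.finditer(log_text)]
    return max(rows, key=lambda row: row[1]) if rows else None


log_8400m_gs = """\
PS+SuMx( 16384) [OK] 1.0 GFlops 3.9 GB/s
PS+SuMx( 32768) [OK] 1.0 GFlops 4.1 GB/s
PS+SuMx( 65536) [OK] 1.1 GFlops 4.2 GB/s
PS+SuMx(131072) [OK] 1.1 GFlops 4.3 GB/s
"""

print(best_size(log_8400m_gs))
```

For the 8400M GS sample this picks 131072, in line with the low-end card observation. The Opt1 lines have a different layout (worst case and best case columns) and would need their own pattern.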