Okay, here's test 8. I figured it would be better to post the output than to try to explain what I don't understand.
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\perry>cd\test
C:\test> powerspectrumtest8.exe
Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx(        [OK] 0.7 GFlops 3.1 GB/s
PS+SuMx(    16) [OK] 0.8 GFlops 3.2 GB/s
PS+SuMx(    32) [OK] 0.7 GFlops 3.0 GB/s
PS+SuMx(    64) [OK] 1.0 GFlops 4.2 GB/s
PS+SuMx(   128) [OK] 0.8 GFlops 3.4 GB/s
PS+SuMx(   256) [OK] 1.6 GFlops 6.6 GB/s
PS+SuMx(   512) [OK] 2.0 GFlops 7.8 GB/s
PS+SuMx(  1024) [OK] 2.1 GFlops 8.2 GB/s
PS+SuMx(  2048) [OK] 2.1 GFlops 8.2 GB/s
PS+SuMx(  4096) [OK] 2.0 GFlops 8.1 GB/s
PS+SuMx(  8192) [OK] 2.1 GFlops 8.4 GB/s
PS+SuMx( 16384) [OK] 2.1 GFlops 8.4 GB/s
PS+SuMx( 32768) [OK] 0.5 GFlops 1.9 GB/s
PS+SuMx( 65536) [OK] 0.4 GFlops 1.5 GB/s
PS+SuMx(131072) [OK] 2.1 GFlops 8.5 GB/s
Opt1: 64 thrds/block
                 worst case                best case
                 GFlps  GB/s   ulps        GFlps  GB/s   ulps
PS+SuMx(           1.1   4.8  121.7  [OK]    1.5   6.8  121.7
PS+SuMx(    16)    1.2   5.0  121.7  [OK]    1.7   6.9  121.7
PS+SuMx(    32)    1.2   5.0  121.7  [OK]    1.5   6.1  121.7
PS+SuMx(    64)    0.5   1.9  121.7  [OK]    1.7   7.1  121.7
PS+SuMx(   128)    0.6   2.5  121.7  [OK]    1.8   7.2  121.7
PS+SuMx(   256)    0.6   2.3  121.7  [OK]    2.1   8.3  121.7
PS+SuMx(   512)    2.0   8.1  121.7  [OK]    2.5  10.1  121.7
PS+SuMx(  1024)    1.9   7.8  121.7  [OK]    2.6  10.3  121.7
PS+SuMx(  2048)    2.1   8.6  121.7  [OK]    2.6  10.3  121.7
PS+SuMx(  4096)    0.5   2.1  121.7  [OK]    2.5  10.0  121.7
PS+SuMx(  8192)    2.2   8.7  121.7  [OK]    2.8  11.1  121.7
PS+SuMx( 16384)    2.1   8.2  121.7  [OK]    2.7  10.9  121.7
PS+SuMx( 32768)    2.2   8.8  121.7  [OK]    2.8  11.1  121.7
PS+SuMx( 65536)    2.2   8.9  121.7  [OK]    2.8  11.2  121.7
PS+SuMx(131072)    2.3   9.2  121.7  [OK]    2.8  11.3  121.7
C:\test>
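
In case the "ulps" column is part of what's confusing: that abbreviation normally means "units in the last place", a way of counting how many representable floating-point values lie between a computed result and a reference. I'm assuming that is what the test means by it; the log doesn't say how the 121.7 figure is aggregated. Here is a rough Python sketch of the idea for single-precision values. The function names are my own, not anything from the test program.

```python
import struct

def float32_to_ordered_int(x: float) -> int:
    """Map a float32 value to an integer so that adjacent floats differ by 1."""
    # Reinterpret the 32-bit float pattern as a signed 32-bit integer.
    bits = struct.unpack("<i", struct.pack("<f", x))[0]
    # Floats are sign-magnitude; mirror negatives so the mapping is monotonic.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 steps between a and b (hypothetical helper)."""
    return abs(float32_to_ordered_int(a) - float32_to_ordered_int(b))

if __name__ == "__main__":
    reference = 1.0
    computed = 1.0 + 121 * 2.0 ** -23  # 121 float32 steps above 1.0
    print(ulp_distance(reference, computed))
```

If that reading is right, a value around 121 would mean the optimized kernel's result sits roughly that many single-precision steps away from the reference, which the test still marks as [OK].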