PS+SuMx( 32768) [OK] 12.2 GFlops 48.7 GB/s
Opt1: 256 thrds/block
                    worst case               best case
                GFlps   GB/s   ulps      GFlps   GB/s   ulps
PS+SuMx( 32768)  17.7   71.0  121.7 [OK]  24.6   98.2  121.7
Hey Ghost, what's the memory bus width & memory clock on that 465?
Quote from: Ghost on 23 Dec 2010, 01:23:33 pm
PS+SuMx( 32768) 17.7 71.0 121.7 [OK] 24.6 98.2 121.7
Hmm, this *could* be near the max theoretical then... checking.
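For checking a figure like the 98.2 GB/s best case against the card's limit, theoretical peak bandwidth is the bus width in bytes times the effective memory data rate. A minimal sketch, assuming GB means 1e9 bytes (if the test divides by 2^30, scale accordingly); the function names are hypothetical and the 256-bit / 3200 MT/s inputs are placeholders, not quoted specs for the 465:

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    effective_rate_mts is the effective data rate in mega-transfers per
    second, i.e. the memory clock already multiplied by the transfers per
    clock for the memory type (assumed to be known by the caller).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mts * 1e6 / 1e9


def fraction_of_peak(measured_gbs: float, peak_gbs: float) -> float:
    """Share of the theoretical peak that a measured bandwidth reaches."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    # Hypothetical card: 256-bit bus, 3200 MT/s effective rate (placeholders).
    # 98.2 GB/s is the best-case Opt1 figure posted above.
    peak = peak_bandwidth_gbs(256, 3200)
    print(f"peak {peak:.1f} GB/s, Opt1 best case at {fraction_of_peak(98.2, peak):.0%}")
```

Plugging in the real bus width and memory clock gives the ceiling; a measured figure within a few percent of it leaves little room for further bandwidth gains.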
PS+SuMx( 65536) [OK] 11.6 GFlops 46.5 GB/s
Opt1: 64 thrds/block
                    worst case               best case
                GFlps   GB/s   ulps      GFlps   GB/s   ulps
PS+SuMx( 65536)  13.0   52.1  121.7 [OK]  15.6   62.5  121.7
Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx(     8) [OK]   0.3 GFlops    1.3 GB/s
PS+SuMx(    16) [OK]   0.3 GFlops    1.2 GB/s
PS+SuMx(    32) [OK]   0.2 GFlops    0.9 GB/s
PS+SuMx(    64) [OK]   0.4 GFlops    1.5 GB/s
PS+SuMx(   128) [OK]   0.5 GFlops    2.2 GB/s
PS+SuMx(   256) [OK]   0.7 GFlops    2.8 GB/s
PS+SuMx(   512) [OK]   0.8 GFlops    3.4 GB/s
PS+SuMx(  1024) [OK]   0.9 GFlops    3.5 GB/s
PS+SuMx(  2048) [OK]   1.0 GFlops    4.0 GB/s
PS+SuMx(  4096) [OK]   0.9 GFlops    3.7 GB/s
PS+SuMx(  8192) [OK]   1.0 GFlops    4.0 GB/s
PS+SuMx( 16384) [OK]   1.0 GFlops    3.9 GB/s
PS+SuMx( 32768) [OK]   1.0 GFlops    4.1 GB/s
PS+SuMx( 65536) [OK]   1.1 GFlops    4.2 GB/s
PS+SuMx(131072) [OK]   1.1 GFlops    4.3 GB/s
Opt1: 64 thrds/block
                    worst case               best case
                GFlps   GB/s   ulps      GFlps   GB/s   ulps
PS+SuMx(     8)   0.4    1.9  121.7 [OK]   0.5    2.1  121.7
PS+SuMx(    16)   0.4    1.8  121.7 [OK]   0.5    1.9  121.7
PS+SuMx(    32)   0.4    1.7  121.7 [OK]   0.4    1.7  121.7
PS+SuMx(    64)   0.5    2.1  121.7 [OK]   0.5    2.2  121.7
PS+SuMx(   128)   0.6    2.2  121.7 [OK]   0.6    2.3  121.7
PS+SuMx(   256)   0.7    2.9  121.7 [OK]   0.7    3.0  121.7
PS+SuMx(   512)   0.9    3.5  121.7 [OK]   0.9    3.6  121.7
PS+SuMx(  1024)   0.9    3.5  121.7 [OK]   0.9    3.7  121.7
PS+SuMx(  2048)   1.0    4.0  121.7 [OK]   1.0    4.2  121.7
PS+SuMx(  4096)   0.9    3.8  121.7 [OK]   1.0    3.9  121.7
PS+SuMx(  8192)   1.0    4.0  121.7 [OK]   1.0    4.2  121.7
PS+SuMx( 16384)   1.0    4.0  121.7 [OK]   1.0    4.1  121.7
PS+SuMx( 32768)   1.1    4.2  121.7 [OK]   1.1    4.3  121.7
PS+SuMx( 65536)   1.1    4.3  121.7 [OK]   1.1    4.5  121.7
PS+SuMx(131072)   1.1    4.4  121.7 [OK]   1.1    4.5  121.7
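Across the results posted in this thread, the GB/s figure is consistently about four times the GFlops figure (e.g. 12.2 GFlops / 48.7 GB/s), so the test appears to charge a fixed ~4 bytes of memory traffic per flop. The sketch below checks that ratio on logged pairs and shows one accounting that would give exactly 4; that traffic model is an assumption (it ignores the summax reduction), not something taken from the test's source:

```python
# Assumed traffic model (not taken from the unit test source): per complex
# float32 sample, power = re*re + im*im costs 2 multiplies + 1 add and moves
# a float2 in (8 bytes) plus one float out (4 bytes).
FLOPS_PER_SAMPLE = 3
BYTES_PER_SAMPLE = 8 + 4


def model_bytes_per_flop() -> float:
    """Bytes moved per flop under the assumed model above."""
    return BYTES_PER_SAMPLE / FLOPS_PER_SAMPLE


# (GFlops, GB/s) pairs copied from stock results posted in this thread.
LOGGED = [(12.2, 48.7), (11.6, 46.5), (18.7, 75.0), (15.3, 61.2), (10.9, 43.6)]


def observed_ratios(pairs):
    """GB/s divided by GFlops for each logged result."""
    return [gbs / gflops for gflops, gbs in pairs]


if __name__ == "__main__":
    print("model bytes/flop:", model_bytes_per_flop())
    for (gflops, gbs), r in zip(LOGGED, observed_ratios(LOGGED)):
        print(f"{gflops:5.1f} GFlops, {gbs:5.1f} GB/s -> {r:.2f} bytes/flop")
```

If the ratio is fixed like this, the GFlops column carries no information beyond the GB/s column, and for a kernel this light the GB/s figure is the one to compare against the card's bandwidth limit.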
PS+SuMx( 32768) [OK] 18.7 GFlops 75.0 GB/s
PS+SuMx( 32768) 26.8 107.4 121.7 [OK] 37.0 148.1 121.7
PS+SuMx( 65536) [OK] 12.4 GFlops 49.6 GB/s
PS+SuMx( 65536) 16.6 66.4 121.7 [OK] 17.7 70.7 121.7
Quote from: PatrickV2 on 23 Dec 2010, 04:06:32 pm
PS+SuMx( 64) FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456
Wow Patrick, clearly something I'm doing at size 64 has changed (and it only shows up on cc1.0); will check. We're going to need to fix that before moving on.
[Later:] @Patrick: when you can, please reboot & try the attached fix attempt (for compute cap 1.0)... If it's OK on that card, I'll be able to avoid breaking that again...
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx(     8) [OK]   2.2 GFlops    9.7 GB/s
PS+SuMx(    16) [OK]   2.6 GFlops   11.1 GB/s
PS+SuMx(    32) [OK]   2.6 GFlops   10.5 GB/s
PS+SuMx(    64) [OK]   4.3 GFlops   17.6 GB/s
PS+SuMx(   128) [OK]   6.7 GFlops   26.9 GB/s
PS+SuMx(   256) [OK]   9.0 GFlops   36.0 GB/s
PS+SuMx(   512) [OK]  11.2 GFlops   44.7 GB/s
PS+SuMx(  1024) [OK]  11.8 GFlops   47.4 GB/s
PS+SuMx(  2048) [OK]  13.5 GFlops   53.9 GB/s
PS+SuMx(  4096) [OK]  13.2 GFlops   52.6 GB/s
PS+SuMx(  8192) [OK]  14.4 GFlops   57.5 GB/s
PS+SuMx( 16384) [OK]  14.1 GFlops   56.5 GB/s
PS+SuMx( 32768) [OK]  14.9 GFlops   59.5 GB/s
PS+SuMx( 65536) [OK]  15.3 GFlops   61.2 GB/s
PS+SuMx(131072) [OK]  12.0 GFlops   47.8 GB/s
Opt1: 64 thrds/block
                    worst case               best case
                GFlps   GB/s   ulps      GFlps   GB/s   ulps
PS+SuMx(     8)   3.6   15.8  121.7 [OK]   6.2   27.2  121.7
PS+SuMx(    16)   4.5   18.8  121.7 [OK]   6.1   25.5  121.7
PS+SuMx(    32)   4.9   20.1  121.7 [OK]   5.8   23.8  121.7
PS+SuMx(    64)   6.5   26.5  121.7 [OK]   7.4   30.0  121.7
PS+SuMx(   128)   7.2   28.8  121.7 [OK]   7.8   31.3  121.7
PS+SuMx(   256)   9.4   37.8  121.7 [OK]  10.2   40.7  121.7
PS+SuMx(   512)  11.6   46.3  121.7 [OK]  12.4   49.7  121.7
PS+SuMx(  1024)  12.1   48.5  121.7 [OK]  12.9   51.6  121.7
PS+SuMx(  2048)  13.7   54.9  121.7 [OK]  14.6   58.5  121.7
PS+SuMx(  4096)  13.4   53.5  121.7 [OK]  14.2   56.8  121.7
PS+SuMx(  8192)  14.5   58.2  121.7 [OK]  15.5   62.0  121.7
PS+SuMx( 16384)  14.3   57.1  121.7 [OK]  15.2   60.9  121.7
PS+SuMx( 32768)  15.1   60.3  121.7 [OK]  16.1   64.4  121.7
PS+SuMx( 65536)  15.5   62.0  121.7 [OK]  16.5   66.2  121.7
PS+SuMx(131072)  12.1   48.2  121.7 [OK]  12.7   50.8  121.7
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
PS+SuMx(     8) [OK]   2.0 GFlops    8.7 GB/s
PS+SuMx(    16) [OK]   2.4 GFlops   10.2 GB/s
PS+SuMx(    32) [OK]   2.4 GFlops    9.7 GB/s
PS+SuMx(    64) [OK]   3.9 GFlops   15.8 GB/s
PS+SuMx(   128) [OK]   5.6 GFlops   22.7 GB/s
PS+SuMx(   256) [OK]   7.2 GFlops   29.0 GB/s
PS+SuMx(   512) [OK]   8.7 GFlops   34.7 GB/s
PS+SuMx(  1024) [OK]   9.0 GFlops   36.0 GB/s
PS+SuMx(  2048) [OK]  10.0 GFlops   40.1 GB/s
PS+SuMx(  4096) [OK]   9.8 GFlops   39.0 GB/s
PS+SuMx(  8192) [OK]  10.4 GFlops   41.6 GB/s
PS+SuMx( 16384) [OK]  10.2 GFlops   40.7 GB/s
PS+SuMx( 32768) [OK]  10.8 GFlops   43.2 GB/s
PS+SuMx( 65536) [OK]  10.9 GFlops   43.6 GB/s
PS+SuMx(131072) [OK]   9.0 GFlops   36.1 GB/s
Opt1: 64 thrds/block
                    worst case               best case
                GFlps   GB/s   ulps      GFlps   GB/s   ulps
PS+SuMx(     8)   3.4   14.9  121.7 [OK]   6.1   26.8  121.7
PS+SuMx(    16)   4.2   17.6  121.7 [OK]   6.1   25.4  121.7
PS+SuMx(    32)   4.6   18.7  121.7 [OK]   5.8   23.7  121.7
PS+SuMx(    64)   6.0   24.2  121.7 [OK]   7.3   29.4  121.7
PS+SuMx(   128)   6.5   26.0  121.7 [OK]   7.7   31.1  121.7
PS+SuMx(   256)   8.3   33.3  121.7 [OK]  10.1   40.4  121.7
PS+SuMx(   512)   9.9   39.8  121.7 [OK]  12.3   49.4  121.7
PS+SuMx(  1024)  10.2   40.8  121.7 [OK]  12.8   51.3  121.7
PS+SuMx(  2048)  11.3   45.2  121.7 [OK]  14.5   58.2  121.7
PS+SuMx(  4096)  11.2   44.6  121.7 [OK]  14.1   56.3  121.7
PS+SuMx(  8192)  12.1   48.3  121.7 [OK]  15.4   61.5  121.7
PS+SuMx( 16384)  11.7   46.8  121.7 [OK]  15.1   60.4  121.7
PS+SuMx( 32768)  12.2   48.8  121.7 [OK]  16.0   63.8  121.7
PS+SuMx( 65536)  12.5   50.0  121.7 [OK]  16.4   65.8  121.7
PS+SuMx(131072)  10.1   40.5  121.7 [OK]  12.6   50.5  121.7
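The posts don't say how the ulps column is computed. Every run reports the same 121.7, and a fractional value suggests it is not a plain count of representable floats between two float32 results. One common definition, assumed here, measures a float32 result against a double-precision reference in units of the float32 spacing (ulp) at the reference. A minimal sketch; all function names are hypothetical:

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp_f32(x: float) -> float:
    """Gap between |x| rounded to float32 and the next larger float32.

    Sketch only: does not handle inf/NaN or the largest finite float32.
    """
    a = abs(to_f32(x))
    bits = struct.unpack("<I", struct.pack("<f", a))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - a


def error_in_ulps(gpu_value: float, reference: float) -> float:
    """Error of a float32 result vs. a higher-precision reference, in float32 ulps."""
    return abs(to_f32(gpu_value) - reference) / ulp_f32(reference)


if __name__ == "__main__":
    # Power of one complex sample: float32 arithmetic emulated step by step
    # (round after each operation) versus a double-precision reference.
    re, im = 0.1, 0.2
    reference = re * re + im * im
    re32, im32 = to_f32(re), to_f32(im)
    gpu_like = to_f32(to_f32(re32 * re32) + to_f32(im32 * im32))
    print(f"error: {error_in_ulps(gpu_like, reference):.2f} ulps")
```

Because the reference stays in double precision, the error is generally not a whole number of ulps, which is consistent with (though not proof of) a fractional figure like 121.7.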