It was worth doing this just to get an XP machine up and running again, although I'm struggling to remember where anything is...
Update: PowerSpectrum Test 6, pinned memory. Does it improve the 'worst case' optimisation on WDDM versus XPDM, or does it improve both OSes equally? (Or neither; Test 5 remains for comparison.)
Device: GeForce GTX 465, 1215 MHz clock, 1024 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 15.8 GFlops 63.3 GB/s 0.0ulps
 SumMax ( 64) 1.4 GFlops 5.7 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 4.3 GFlops 17.5 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
 256 threads: 23.1 GFlops 92.4 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 256 threads, fftlen 64: (worst case: full summax copy)
 7.6 GFlops 30.6 GB/s 121.7ulps
Every ifft average & peak OK
 256 threads, fftlen 64: (best case, nothing to update)
 8.7 GFlops 35.3 GB/s 121.7ulps

Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 17.3 GFlops 69.2 GB/s 0.0ulps
 SumMax ( 64) 1.2 GFlops 5.2 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 4.0 GFlops 16.3 GB/s
GetPowerSpectrum() choice for Opt1: 256 thrds/block
 256 threads: 27.5 GFlops 110.0 GB/s 121.7ulps
Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 256 threads, fftlen 64: (worst case: full summax copy)
 7.2 GFlops 29.2 GB/s 121.7ulps
Every ifft average & peak OK
 256 threads, fftlen 64: (best case, nothing to update)
 9.2 GFlops 37.3 GB/s 121.7ulps

Stock case: (17.7-16.5)/16.5 = ~7.27% advantage to XP
Worst case: (27.2-24.2)/24.2 = ~12.4% advantage to XP
Best case: (35.3-35.4)/35.4 = ~0.3% advantage to Win7

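For anyone wanting to redo the comparison with their own numbers, here is a minimal sketch of the same arithmetic in Python. The function name `relative_advantage` and the dictionary layout are made up for illustration; the throughput pairs are the ones quoted in the three comparison lines above, with the XP figure first and the Win7 figure second.

```python
def relative_advantage(xp, win7):
    """Percentage by which the XP figure exceeds the Win7 figure.

    Positive means XP is ahead, negative means Win7 is ahead.
    (Hypothetical helper, not part of the unit test itself.)
    """
    return (xp - win7) / win7 * 100.0


# (XP, Win7) pairs as quoted in the comparison above.
cases = {
    "Stock": (17.7, 16.5),
    "Worst case": (27.2, 24.2),
    "Best case": (35.3, 35.4),
}

for name, (xp, win7) in cases.items():
    pct = relative_advantage(xp, win7)
    leader = "XP" if pct > 0 else "Win7"
    print(f"{name}: {abs(pct):.2f}% advantage to {leader}")
```

The sign convention is the only design choice here: dividing by the Win7 figure matches the way the percentages were worked out in the post, so a negative result simply means Win7 came out ahead.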
Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 16.1 GFlops 64.6 GB/s 1183.3ulps
 SumMax ( 64) 1.4 GFlops 6.0 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 4.5 GFlops 18.3 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
 64 threads: 16.2 GFlops 64.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 64 threads, fftlen 64: (worst case: full summax copy)
 7.1 GFlops 28.7 GB/s 121.7ulps
Every ifft average & peak OK
 64 threads, fftlen 64: (best case, nothing to update)
 9.9 GFlops 40.0 GB/s 121.7ulps

Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 16.1 GFlops 64.3 GB/s 1183.3ulps
 SumMax ( 64) 1.4 GFlops 5.8 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 4.4 GFlops 17.8 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
 64 threads: 16.2 GFlops 64.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 64 threads, fftlen 64: (worst case: full summax copy)
 6.9 GFlops 27.8 GB/s 121.7ulps
Every ifft average & peak OK
 64 threads, fftlen 64: (best case, nothing to update)
 9.9 GFlops 39.9 GB/s 121.7ulps

Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 1.2 GFlops 4.8 GB/s 1183.3ulps
 SumMax ( 64) 0.1 GFlops 0.5 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 0.4 GFlops 1.5 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
 64 threads: 1.2 GFlops 4.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 64 threads, fftlen 64: (worst case: full summax copy)
 0.6 GFlops 2.5 GB/s 121.7ulps
Every ifft average & peak OK
 64 threads, fftlen 64: (best case, nothing to update)
 0.6 GFlops 2.6 GB/s 121.7ulps

Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec< 64> 15.8 GFlops 63.4 GB/s 1183.3ulps
 SumMax ( 64) 1.3 GFlops 5.3 GB/s
Every ifft average & peak OK
 PS+SuMx( 64) 4.1 GFlops 16.5 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
 64 threads: 15.9 GFlops 63.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
 64 threads, fftlen 64: (worst case: full summax copy)
 6.9 GFlops 28.1 GB/s 121.7ulps
Every ifft average & peak OK
 64 threads, fftlen 64: (best case, nothing to update)
 9.8 GFlops 39.5 GB/s 121.7ulps