[Split] PowerSpectrum Unit Test
Jason G:
Hehe, those (worst case Opt1) are up a bit (apart from the 8400M, unsurprisingly, I suppose). Looks like we found a WDDM display driver limitation, and we should be able to work around it, with lots of effort.
Claggy:
I also added PowerSpectrum5 results for my 9800GTX+ on Vista 64-bit, on page eight.
Claggy
Jason G:
Cheers, yep, I was looking back there. It definitely confirms that using pinned memory helped Opt1, a bit more than I expected, too.
On the XPDM vs. WDDM issue, I've had further confirmation on an 8800GTS from a non-crunching friend: the test #5 Opt1 worst case is faster on XPDM than on Win7, but roughly the same speed in test #6 (using pinned memory). The 'best case' is also faster on Win7, so the numbers seem to match up. Make the code a bit more sophisticated, and Win7 performance ends up roughly equal to, or a bit faster than, XP.
I'll be stewing on the additional aspects we've worked out here for a little while, then applying that knowledge to expanded tests with more FFT sizes around the end of the week. If that pans out, it'll be time to start levering these small improvements into the X series codebase. Once the powerspectrum+reduction is integrated, the next step will probably be refining and expanding the 'freaky powerspectrum' (custom FFT) kernels using the same knowledge.
All this, of course, is working towards 'fixing' the problematic pulsefinding down the road, and having enough strategies to do so effectively.
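For anyone following along, a minimal CPU reference of what a powerspectrum+reduction step computes might look like the sketch below. It assumes the input is a flat list of complex FFT outputs laid end to end in chunks of length `fftlen`, that the power of each bin is re^2 + im^2, and that the reduction reports an average and a peak per FFT (the quantities behind the "Every ifft average & peak OK" check in the test output). The function name `power_spectrum_summax` is hypothetical; this is an illustration, not the stock or X series kernel code.

```python
from typing import List, Tuple


def power_spectrum_summax(
    bins: List[complex], fftlen: int
) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Hypothetical CPU reference: power spectrum plus per-FFT average/peak.

    `bins` holds consecutive FFTs of length `fftlen` laid end to end.
    Returns the power of every bin and one (average, peak) pair per FFT.
    """
    if fftlen <= 0 or len(bins) % fftlen != 0:
        raise ValueError("len(bins) must be a positive multiple of fftlen")

    # Power of each bin: |X|^2 = re^2 + im^2 (two multiplies and one add).
    power = [z.real * z.real + z.imag * z.imag for z in bins]

    # Reduction: one (average, peak) pair for each FFT of length fftlen.
    summary = []
    for start in range(0, len(power), fftlen):
        chunk = power[start:start + fftlen]
        summary.append((sum(chunk) / fftlen, max(chunk)))
    return power, summary


if __name__ == "__main__":
    data = [1 + 1j, 2 + 0j, 0 + 3j, 1 - 1j]  # two FFTs of length 2
    power, summary = power_spectrum_summax(data, fftlen=2)
    print(power)    # [2.0, 4.0, 9.0, 2.0]
    print(summary)  # [(3.0, 4.0), (5.5, 9.0)]
```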
(Can't wait for the time when I can ask Berkeley to send VLARs back out to GPUs again :P)
Jason
Richard Haselgrove:
9800GTX+, Windows 7/32
--- Code: ---
Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 15.8 GFlops 63.4 GB/s 1183.3ulps
SumMax ( 64) 1.3 GFlops 5.3 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.1 GFlops 16.5 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 15.9 GFlops 63.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
64 threads, fftlen 64: (worst case: full summax copy)
6.9 GFlops 28.1 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
9.8 GFlops 39.5 GB/s 121.7ulps
--- End code ---
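The GFlops and GB/s columns move in lockstep: in every line of the results above, the GB/s figure is about four times the GFlops figure (63.4 / 15.8, 28.1 / 6.9, 39.5 / 9.8, and so on). That ratio is what you would expect if each complex bin is counted as 3 floating-point operations (two multiplies and an add) and 12 bytes of memory traffic (an 8-byte single-precision complex value read, a 4-byte float power written). This accounting is an inference from the logged numbers, not something the test program states, and `throughput` is a hypothetical helper.

```python
from typing import Tuple

# Assumed per-bin accounting (inferred, not taken from the test source):
FLOPS_PER_BIN = 3   # re*re + im*im
BYTES_PER_BIN = 12  # 8-byte float2 read + 4-byte float written


def throughput(n_bins: int, seconds: float) -> Tuple[float, float]:
    """Return (GFlops, GB/s) for one power-spectrum pass, using decimal giga (1e9)."""
    gflops = n_bins * FLOPS_PER_BIN / seconds / 1e9
    gbytes = n_bins * BYTES_PER_BIN / seconds / 1e9
    return gflops, gbytes


# (GFlops, GB/s) pairs copied from the 9800GTX+ (Windows 7/32) results above.
logged = {
    "PwrSpec stock": (15.8, 63.4),
    "GetPowerSpectrum 64 threads": (15.9, 63.7),
    "Opt1 worst case": (6.9, 28.1),
    "Opt1 best case": (9.8, 39.5),
}

if __name__ == "__main__":
    print(throughput(n_bins=1 << 20, seconds=1e-3))
    for name, (gflops, gbs) in logged.items():
        print(f"{name}: {gbs / gflops:.2f} bytes per flop")
```

Under this model, a kernel that is memory-bound can only raise its GFlops figure by moving bytes faster, which fits the thread's focus on transfers and pinned memory.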
perryjay:
Took a couple of tries, but I think I got it right...
--- Code: ---
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.
C:\Users\perry>cd \test
C:\test>powerspectrum6.exe >results.txt
'powerspectrum6.exe' is not recognized as an internal or external command,
operable program or batch file.
C:\test>powerspectrumtest6.exe
Device: GeForce 9500 GT, 1840 MHz clock, 1008 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 2.8 GFlops 11.3 GB/s 1183.3ulps
SumMax ( 64) 0.4 GFlops 1.9 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 1.2 GFlops 4.9 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 2.8 GFlops 11.4 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
64 threads, fftlen 64: (worst case: full summax copy)
1.9 GFlops 7.6 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
2.0 GFlops 8.2 GB/s 121.7ulps
C:\test>
--- End code ---
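The ulps column reports numerical error in units in the last place, so the stock kernel's 1183.3ulps against Opt1's 121.7ulps suggests the optimised path stays closer to the reference. The test's exact metric (which reference it compares against, and whether the figure is a mean, as the fractional value hints) isn't shown in the output, so the sketch below only illustrates one common way to measure ULP distance between single-precision floats: reinterpret the bit patterns as ordered integers and subtract. `ulp_distance` and `mean_ulps` are hypothetical helpers, not part of the test program.

```python
import struct
from typing import Sequence


def _ordered_bits(x: float) -> int:
    """Map x (rounded to float32) to an integer so adjacent floats differ by 1."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; map them below zero so the
    # ordering is monotonic and +0.0 and -0.0 both map to 0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b, after rounding both to float32."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


def mean_ulps(computed: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical summary metric: average ULP error over matching elements."""
    if len(computed) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(ulp_distance(c, r) for c, r in zip(computed, reference))
    return total / len(reference)


if __name__ == "__main__":
    print(ulp_distance(1.0, 1.0 + 2**-23))                # 1
    print(mean_ulps([1.0, 1.0 + 2**-22], [1.0, 1.0]))     # 1.0
```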