[Split] PowerSpectrum Unit Test

Jason G:
Hehe, those (worst case Opt1) are up a bit (apart from the 8400M, I suppose unsurprisingly). Looks like we found a WDDM display driver limitation, and should be able to work around it, with lots of effort.

Claggy:
I also added PowerSpectrum5 results for my 9800GTX+ on Vista 64-bit, on page eight.

Claggy

Jason G:
Cheers, yep, was looking back there. It definitely confirms that using pinned memory helped Opt1, a bit more than I expected too.

On the XPDM vs WDDM issue, I've had further confirmation on an 8800GTS, from a non-crunching friend, that the test #5 Opt1 worst case is faster on XPDM than on Win7, but roughly the same speed in test #6 (using pinned memory). The 'best case' is also faster on Win7, so the numbers seem to match up. Make the code a bit more sophisticated and Win7 performance ranges from roughly equal to a bit faster than XP.

I'll be stewing on these additional aspects we've worked out here for a little while, and will apply the knowledge to expanded tests with more FFT sizes around the end of the week. If that pans out well, it'll be time to start levering these small improvements into the X series codebase. After the powerspectrum+reduction is integrated, next will probably be refinement and expansion of the 'freaky powerspectrum' (custom FFT) kernels using the same knowledge.

All this, of course, is working towards 'fixing' the problematic pulsefinding down the road, and having enough strategies to do so effectively.
(Can't wait for the time when I can ask Berkeley to send VLARs back out to GPUs again  :P)

Jason

Richard Haselgrove:
9800GTX+, Windows 7/32


--- Code: ---
Device: GeForce 9800 GTX/9800 GTX+, 1890 MHz clock, 498 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   15.8 GFlops   63.4 GB/s 1183.3ulps

 SumMax (    64)    1.3 GFlops    5.3 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.1 GFlops   16.5 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       15.9 GFlops   63.7 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         6.9 GFlops   28.1 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         9.8 GFlops   39.5 GB/s 121.7ulps
--- End code ---

perryjay:
Took a couple of tries but I think I got it right....


Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd \test

C:\test>powerspectrum6.exe >results.txt
'powerspectrum6.exe' is not recognized as an internal or external command,
operable program or batch file.

C:\test>powerspectrumtest6.exe

Device: GeForce 9500 GT, 1840 MHz clock, 1008 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    2.8 GFlops   11.3 GB/s 1183.3ulps

 SumMax (    64)    0.4 GFlops    1.9 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    1.2 GFlops    4.9 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        2.8 GFlops   11.4 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.9 GFlops    7.6 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         2.0 GFlops    8.2 GB/s 121.7ulps



C:\test>
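
For anyone who wants to put the two test #6 runs above side by side without eyeballing the columns, here is a small Python sketch. The GFlops figures are copied straight from the posted output; treating the stock PS+SuMx line as the baseline for Opt1 is an assumption, not something stated in the thread, and the names `RESULTS`, `speedup` and `summarise` are purely illustrative.

```python
# GFlops figures copied from the two "Unit test #6 (pinned mem)" outputs above.
# Assumption: the stock "PS+SuMx" line is the fair baseline for Opt1 (PSmod3+SM).
RESULTS = {
    "9800 GTX+": {"stock": 4.1, "opt1_worst": 6.9, "opt1_best": 9.8},
    "9500 GT": {"stock": 1.2, "opt1_worst": 1.9, "opt1_best": 2.0},
}


def speedup(stock_gflops, opt_gflops):
    """Return how many times faster the optimised path is than stock."""
    return opt_gflops / stock_gflops


def summarise(results):
    """Map each card to its (worst case, best case) Opt1 speedup, rounded to 2 places."""
    return {
        card: (
            round(speedup(r["stock"], r["opt1_worst"]), 2),
            round(speedup(r["stock"], r["opt1_best"]), 2),
        )
        for card, r in results.items()
    }


if __name__ == "__main__":
    for card, (worst, best) in summarise(RESULTS).items():
        print(f"{card}: worst case {worst}x, best case {best}x over stock")
```

On these figures the 9800 GTX+ comes out at roughly 1.7x (worst case) to 2.4x (best case) over stock, and the 9500 GT at roughly 1.6x to 1.7x.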