
[Split] PowerSpectrum Unit Test


Jason G:
Extra drives on order here... hopefully I'll be able to find a floppy disk for the XP RAID driver install before they arrive ::).

If you're able to verify, before I get set up, that the XPDM advantage grows with the heavily optimised kernels (beyond the ~10% advantage between driver models seen with stock), I'll report the increased XPDM<->WDDM speed discrepancy with highly optimised kernels, since they may not have factored a performance difference as large as 30% into their decisions (related to TCC mode).

Ghost0210:
I've managed to scavenge an old drive from an old machine for this test, so have now got a dual-boot machine for a short time ;)
Just downloading and installing the standard drivers to get a baseline for the test -

Stock results on XP Pro x32 260.99 drivers:

Device: GeForce GTX 465, 1215 MHz clock, 1024 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #5
Stock:
 PwrSpec<    64>   16.0 GFlops   63.8 GB/s   0.0ulps

 SumMax (    64)    1.4 GFlops    5.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.4 GFlops   17.7 GB/s

GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       23.0 GFlops   91.9 GB/s 121.7ulps

Opt1 (PSmod3+SM): 256 thrds/block
  256 threads, fftlen 64: (worst case: full summax copy)
         6.7 GFlops   27.2 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
         8.7 GFlops   35.3 GB/s 121.7ulps

_heinz:
PowerSpectrumxe2011Test5.exe -device 0

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #5)
Stock:
 PwrSpec<    64>   11.9 GFlops   47.6 GB/s   0.0ulps

 SumMax (    64)    0.4 GFlops    1.7 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    1.4 GFlops    5.8 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       18.5 GFlops   73.8 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
  256 threads, fftlen 64: (worst case: full summax copy)
         2.1 GFlops    8.3 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
         2.4 GFlops    9.6 GB/s 121.7ulps


PowerSpectrumxe2011Test5.exe -device 1

Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #5)
Stock:
 PwrSpec<    64>   11.9 GFlops   47.6 GB/s   0.0ulps

 SumMax (    64)    0.4 GFlops    1.7 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    1.4 GFlops    5.8 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       18.3 GFlops   73.3 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
  256 threads, fftlen 64: (worst case: full summax copy)
         2.1 GFlops    8.4 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
         2.4 GFlops    9.6 GB/s 121.7ulps


.
Done

Remark: compiled with XE2011

Edit:
something must have changed; the last line of Test5 above now shows
11.2 GFlops   45.3 GB/s 121.7ulps

Jason G:

--- Quote from: _heinz on 04 Dec 2010, 02:18:56 pm ---
something must be changed, last Test5 above shows
11.2 GFlops   45.3 GB/s 121.7ulps

--- End quote ---

Yeah, 11.2 is more like what that card should be doing, heinz.

Jason G:

--- Quote from: Ghost on 04 Dec 2010, 02:09:09 pm ---
Stock results on XP Pro x32 260.99 drivers:
...
 PS+SuMx(    64)    4.4 GFlops   17.7 GB/s
...
  256 threads, fftlen 64: (worst case: full summax copy)
         6.7 GFlops   27.2 GB/s 121.7ulps
...
  256 threads, fftlen 64: (best case, nothing to update)
         8.7 GFlops   35.3 GB/s 121.7ulps

--- End quote ---

OK, so far, compared against your previous Win7 x64 results (assuming all else is equal), we're back to our roughly 10% performance advantage to XP:

(XP32-Win7x64)/Win7x64
Stock case: (4.4-4.1)/4.1 = ~7.3 % advantage to XP (expected, not too annoying)
Worst case: (6.7-6.0)/6.0 = ~11.7% advantage to XP (I can *almost* live with that)
Best case:  (8.7-8.7)/8.7 = ~0.0% advantage to XP (fine)

So there appears to be a greater advantage to XP in the worst case (lots of memory transfers), though not as great as feared... Phew! ;D

Since the memory throughput numbers have more significant digits, and the worst-case advantage points to a memory issue of some sort, I'll compare the throughput figures too:
Stock case: (17.7-16.5)/16.5 = ~7.27% advantage to XP
Worst case: (27.2-24.2)/24.2 = ~12.4% advantage to XP
Best case:  (35.3-35.4)/35.4 = ~0.3% advantage to Win7
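Both comparisons above use the same relative-difference formula, (XP32 - Win7x64)/Win7x64. A minimal Python sketch that recomputes them from the figures quoted in this post (the Win7x64 values are the earlier results as they appear in the formulas above; `xp_advantage_pct` is just an illustrative name, not part of the unit test):

```python
# Relative advantage of XP32 over Win7x64, as a percentage:
#   (XP32 - Win7x64) / Win7x64 * 100
# Positive values favour XP, negative values favour Win7.

def xp_advantage_pct(xp32: float, win7x64: float) -> float:
    """Return the percentage by which the XP32 figure exceeds the Win7x64 figure."""
    return (xp32 - win7x64) / win7x64 * 100.0


# (XP32, Win7x64) pairs copied from the comparisons in this post.
gflops = {
    "Stock": (4.4, 4.1),
    "Worst case": (6.7, 6.0),
    "Best case": (8.7, 8.7),
}
throughput_gbs = {
    "Stock": (17.7, 16.5),
    "Worst case": (27.2, 24.2),
    "Best case": (35.3, 35.4),
}

if __name__ == "__main__":
    for label, table in (("GFlops", gflops), ("GB/s", throughput_gbs)):
        for case, (xp, win7) in table.items():
            print(f"{label:6} {case:10} {xp_advantage_pct(xp, win7):+6.2f} %")
```

Run as-is it prints one line per case; the worst-case throughput row comes out at about +12.4 % and the best-case throughput row at about -0.3 %, matching the hand calculations.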

Tentative analysis based on the above: raw compute speed is roughly the same between the two OS/driver models (the best case has no memory transfer of results); however, WDDM's memory paging schemes increase overheads in the worst case by roughly 12.4% in run time on that system (for a fixed amount of data, run time scales as 1/throughput, so the time ratio is 27.2/24.2 ≈ 1.124).

So memory transfers will have to be minimised in critical kernels.  I can enable a pinned memory optimisation I implemented for integrated GPUs, which might just help the situation.  At least we're not looking at the ~30% difference that had me petrified.

Jason
