[Split] PowerSpectrum Unit Test


Miep:

--- Quote from: Jason G on 06 Dec 2010, 09:57:58 am ---Thanks for the tolerances. Being largely memory bound, the FLOPS tolerances are more than enough, and indicate +/- 10% variation of the worst case there. I presume that's driving a display, so that's reasonable.

--- End quote ---

You're welcome - now what exactly makes you think the mobile GPU of a laptop might be driving a display? ;D
No bluescreens with the latest driver yet - touch wood...

I'll do statistics on all the numbers next time round then.

PatrickV2:
OK, I ran version 6 of the tool on my system (Q6600/8GB/8800GTX) under both WinXP32 and Win7-64. If you want me to (re-)run other versions of the tool, let me know. ;)

Both logs are below, one after the other; the older one first, from WinXP32:

Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   18.3 GFlops   73.1 GB/s 1183.3ulps

 SumMax (    64)    1.3 GFlops    5.5 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.3 GFlops   17.6 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.3 GFlops   73.1 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         6.4 GFlops   26.1 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         8.1 GFlops   32.7 GB/s 121.7ulps



Then Win7-64:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   18.1 GFlops   72.5 GB/s 1183.3ulps

 SumMax (    64)    1.1 GFlops    4.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.8 GFlops   15.4 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.1 GFlops   72.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         5.4 GFlops   21.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         6.6 GFlops   26.8 GB/s 121.7ulps


Regards, Patrick.

Jason G:
Ahhh, hi Patrick. Looks like your card should still be able to use pinned host memory, but isn't :(. It indeed doesn't support mapped memory (a different kind of feature), but the pinned memory improvement didn't engage because I need to change how I detect that. It looks like I'm checking the wrong feature flag... oops ::)

Will make a #7 end of week, and pay special attention to making sure that engages properly on compute capability 1.0 cards (that don't support mapped memory).

Cheers for finding the problem  ;)
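
For reference, a minimal sketch of the distinction being discussed - this is plain CUDA runtime C for illustration, not the unit-test code itself, and the buffer name hostBuf is made up. Pinned (page-locked) host memory via cudaHostAlloc works on every CUDA device, while mapped (zero-copy) memory is the feature gated by the canMapHostMemory device flag; gating the pinned-memory path on that flag would wrongly disable it on a compute capability 1.0 card like the 8800 GTX.

--- Code: ---
// Sketch only: distinguish "mapped (zero-copy) memory supported" from
// "pinned host memory usable". The latter needs no capability flag.
#include <cuda_runtime.h>
#include <cstdio>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Mapped (zero-copy) host memory: only if the device reports support.
    bool useMapped = (prop.canMapHostMemory != 0);

    // Pinned (page-locked) host memory for fast DMA copies: available on all
    // CUDA devices, so don't tie this decision to canMapHostMemory.
    float *hostBuf = NULL;  // illustrative name
    bool usePinned = (cudaHostAlloc((void **)&hostBuf, 1 << 20,
                                    cudaHostAllocDefault) == cudaSuccess);

    printf("mapped (zero-copy) memory: %s\n", useMapped ? "yes" : "no");
    printf("pinned host memory:        %s\n", usePinned ? "yes" : "no");

    if (usePinned)
        cudaFreeHost(hostBuf);
    return 0;
}
--- End code ---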

PatrickV2:

--- Quote from: Jason G on 06 Dec 2010, 06:19:49 pm ---Ahhh, hi Patrick. Looks like your card should still be able to use pinned host memory, but isn't :(. It indeed doesn't support mapped memory (a different kind of feature), but the pinned memory improvement didn't engage because I need to change how I detect that. It looks like I'm checking the wrong feature flag... oops ::)

Will make a #7 end of week, and pay special attention to making sure that engages properly on compute capability 1.0 cards (that don't support mapped memory).

Cheers for finding the problem  ;)

--- End quote ---

I have no idea what I did, but you're quite welcome. ;)

Regards, Patrick.

Jason G:
Thanks,

    It's what you (the test #6 anyway) didn't do  :D

This line's missing:

--- Quote ---Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.5 GFlops    5.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.6 GFlops    6.7 GB/s 121.7ulps
--- End quote ---

When operational, that feature seems to add a touch of throughput on both XP & Vista/Win7, and seems to close the performance difference (that we've been so worried about). You should get a boost when I fix that.

Jason
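
A rough sketch, for illustration only, of why pinning the summax result array helps (the names copy_results_pinned, d_results and h_results are made up, not taken from the tool): a device-to-host copy out of page-locked memory can go straight over DMA and can be issued asynchronously on a stream, whereas a copy into pageable memory is staged and blocking, which is where the XP vs. Vista/Win7 gap tends to show up.

--- Code: ---
// Sketch: pinned host buffer for the result array plus an async copy.
#include <cuda_runtime.h>

int copy_results_pinned(const float *d_results, size_t nElems)
{
    float *h_results = NULL;        // page-locked host copy of the results
    cudaStream_t stream;

    // Page-locked allocation; bail out if it fails.
    if (cudaHostAlloc((void **)&h_results, nElems * sizeof(float),
                      cudaHostAllocDefault) != cudaSuccess)
        return -1;

    cudaStreamCreate(&stream);

    // cudaMemcpyAsync is only truly asynchronous from/to pinned memory;
    // with pageable memory the runtime falls back to a staged, blocking copy.
    cudaMemcpyAsync(h_results, d_results, nElems * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    // ... further kernel launches for the next chunk could be queued here ...

    cudaStreamSynchronize(stream);  // wait for the copy before reading h_results

    cudaStreamDestroy(stream);
    cudaFreeHost(h_results);
    return 0;
}
--- End code ---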
