[Split] PowerSpectrum Unit Test
Miep:
--- Quote from: Jason G on 06 Dec 2010, 09:57:58 am ---Thanks for the tolerances. Being largely memory bound, the FLops tolerances are more than enough, and indicate +/- 10% variation of worst case on that. I presume that's driving a display, so that's reasonable.
--- End quote ---
You're welcome - now what exactly makes you think the mobile GPU of a laptop might be driving a display? ;D
No bluescreens with the latest driver yet - touch wood...
I'll do statistics on all the numbers next time round then.
PatrickV2:
OK, I ran version 6 of the tool on my system (Q6600/8GB/8800GTX) under both WinXP32 and Win7-64. If you want me to (re-)run other versions of the tool, let me know. ;)
Both logs are below, one after the other, starting with the older OS, WinXP32:
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 18.3 GFlops 73.1 GB/s 1183.3ulps
SumMax ( 64) 1.3 GFlops 5.5 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.3 GFlops 17.6 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 18.3 GFlops 73.1 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
6.4 GFlops 26.1 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
8.1 GFlops 32.7 GB/s 121.7ulps
Then Win7-64:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 18.1 GFlops 72.5 GB/s 1183.3ulps
SumMax ( 64) 1.1 GFlops 4.8 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 3.8 GFlops 15.4 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 18.1 GFlops 72.6 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
5.4 GFlops 21.9 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
6.6 GFlops 26.8 GB/s 121.7ulps
Regards, Patrick.
Jason G:
Ahhh, hi Patrick. Looks like your card should still be able to use pinned host memory, but isn't :( . It indeed doesn't support mapped memory (a different kind), but didn't engage the pinned memory improvement because I need to change how I detect that feature. I'm checking the wrong feature flags it looks like.... ooops ::)
Will make a #7 end of week, and pay special attention to making sure that engages properly on compute capability 1.0 cards (that don't support mapped memory).
Cheers for finding the problem ;)
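A minimal sketch of the detection logic described above, written in Python rather than the tool's own CUDA code (which isn't shown in this thread). `DeviceCaps`, `host_buffer_features` and `host_buffer_features_buggy` are hypothetical names, and the "buggy" variant is only a guess at what checking the wrong feature flag might look like. The point it illustrates: pinned (page-locked) host memory doesn't depend on mapped-memory support, so a compute capability 1.0 card should still get the pinned buffer.

```python
from dataclasses import dataclass


@dataclass
class DeviceCaps:
    """Hypothetical stand-in for the properties a CUDA app queries per device."""
    name: str
    compute_major: int
    compute_minor: int
    can_map_host_memory: bool  # mapped (zero-copy) host memory support


def host_buffer_features_buggy(caps: DeviceCaps) -> dict:
    # Assumed bug: pinned memory is gated on the mapped-memory flag,
    # so cards without mapped memory silently fall back to pageable memory.
    return {
        "pinned": caps.can_map_host_memory,
        "mapped": caps.can_map_host_memory,
    }


def host_buffer_features(caps: DeviceCaps) -> dict:
    # Intended behaviour: pinned memory is enabled regardless of the
    # mapped-memory flag; only mapped memory depends on that flag.
    return {
        "pinned": True,
        "mapped": caps.can_map_host_memory,
    }


if __name__ == "__main__":
    gtx = DeviceCaps("GeForce 8800 GTX", 1, 0, can_map_host_memory=False)
    print("buggy:", host_buffer_features_buggy(gtx))
    print("fixed:", host_buffer_features(gtx))
```

With the 8800 GTX entry, the buggy variant reports no pinned memory, which would match the missing "pinned in host memory" line in the log, while the fixed variant enables it.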
PatrickV2:
--- Quote from: Jason G on 06 Dec 2010, 06:19:49 pm ---Ahhh, hi Patrick. Looks like your card should still be able to use pinned host memory, but isn't :( . It indeed doesn't support mapped memory (a different kind), but didn't engage the pinned memory improvement because I need to change how I detect that feature. I'm checking the wrong feature flags it looks like.... ooops ::)
Will make a #7 end of week, and pay special attention to making sure that engages properly on compute capability 1.0 cards (that don't support mapped memory).
Cheers for finding the problem ;)
--- End quote ---
I have no idea what I did, but you're quite welcome. ;)
Regards, Patrick.
Jason G:
Thanks,
It's what you (the test #6 anyway) didn't do :D
This line's missing:
--- Quote ---Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
64 threads, fftlen 64: (worst case: full summax copy)
1.5 GFlops 5.9 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
1.6 GFlops 6.7 GB/s 121.7ulps
--- End quote ---
When operational, that feature seems to add a touch of throughput on both XP & Vista/Win7, and seems to close the performance difference (the one we've been so worried about). You should get a boost when I fix that.
Jason
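To put a number on the XP vs Win7 difference mentioned above, here is a small Python sketch that works only from the GFlops figures in Patrick's two test #6 logs. The names `RESULTS`, `win7_deficit_percent` and `summarize` are made up for illustration, and this is just one way of expressing the gap (relative to the XP figure).

```python
# GFlops figures copied from the two test #6 logs posted earlier in the thread:
# (WinXP32, Win7-64) on the same Q6600 / 8800 GTX system.
RESULTS = {
    "PwrSpec stock": (18.3, 18.1),
    "SumMax stock": (1.3, 1.1),
    "PS+SuMx stock": (4.3, 3.8),
    "Opt1 worst case": (6.4, 5.4),
    "Opt1 best case": (8.1, 6.6),
}


def win7_deficit_percent(xp_gflops: float, win7_gflops: float) -> float:
    """How much slower Win7 is than XP, as a percentage of the XP figure."""
    return 100.0 * (xp_gflops - win7_gflops) / xp_gflops


def summarize(results: dict) -> dict:
    """Map each test name to the Win7 deficit, rounded to one decimal place."""
    return {
        name: round(win7_deficit_percent(xp, win7), 1)
        for name, (xp, win7) in results.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(RESULTS).items():
        print(f"{name:16s} Win7 is {pct:4.1f}% slower than XP")
```

On these numbers, PwrSpec on its own differs by about 1% between the two systems, while the SumMax and combined figures differ by roughly 12 to 19%.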