[Split] PowerSpectrum Unit Test
Jason G:
--- Quote from: Claggy on 29 Nov 2010, 01:43:41 pm ---...and from my 128Mb 8400M GS:
--- End quote ---
Analysing both ;)
9800GTX+
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~33%
best case speedup: ~75%
8400M GS
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~50% <-- nice
best case speedup: ~50% <-- nice
PatrickV2:
--- Quote from: Jason G on 29 Nov 2010, 01:49:17 pm ---
--- Quote from: PatrickV2 on 29 Nov 2010, 01:44:54 pm ---....Are you also interested in a run under WinXP? ...
--- End quote ---
Sure! It'll be interesting to see if I'm closing the gap, or making it wider ;).
Analysing your first result....
8800GTX
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~38%
best case speedup: ~69%
--- End quote ---
As requested (Q6600/8GB/8800GTX/WinXP32):
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 18.3 GFlops 73.1 GB/s 1183.3ulps
SumMax ( 64) 1.3 GFlops 5.5 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 4.3 GFlops 17.5 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 18.3 GFlops 73.1 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
6.4 GFlops 25.8 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
7.9 GFlops 32.2 GB/s 121.7ulps
Regards, Patrick.
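For readers following along: the speedup percentages quoted in this thread appear to be the Opt1 GFlops figure divided by the stock PS+SuMx figure from the same run. That reading is inferred from the numbers, not stated by the author. A minimal Python sketch of the arithmetic, using lines pasted from the XP32 output above; `gflops_after` and `speedup_pct` are hypothetical helper names, not part of the unit test:

```python
import re

# Lines copied from the XP32 run above (8800 GTX, fftlen 64).
OUTPUT = """\
PS+SuMx( 64) 4.3 GFlops 17.5 GB/s
64 threads, fftlen 64: (worst case: full summax copy)
6.4 GFlops 25.8 GB/s 121.7ulps
64 threads, fftlen 64: (best case, nothing to update)
7.9 GFlops 32.2 GB/s 121.7ulps
"""

def gflops_after(label, text):
    """Return the first GFlops value found at or after the given label."""
    tail = text[text.index(label):]
    return float(re.search(r"([\d.]+)\s*GFlops", tail).group(1))

def speedup_pct(new, old):
    """Percentage gain of `new` over `old`."""
    return (new / old - 1.0) * 100.0

stock = gflops_after("PS+SuMx", OUTPUT)     # stock combined kernel
worst = gflops_after("worst case", OUTPUT)  # Opt1, full summax copy
best = gflops_after("best case", OUTPUT)    # Opt1, nothing to update

print(f"worst case speedup: {speedup_pct(worst, stock):.1f}%")
print(f"best case speedup: {speedup_pct(best, stock):.1f}%")
```

Because the printed GFlops values are rounded to one decimal, the results land within about a point of the ~48% and ~83% figures quoted for this card under XP32.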
Jason G:
--- Quote from: PatrickV2 on 29 Nov 2010, 02:13:32 pm ---As requested (Q6600/8GB/8800GTX/WinXP32):
--- End quote ---
8800GTX earlier Win7x64 result:
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~38% --> 5.4 GFlops
best case speedup: ~69% --> 6.6 GFlops
8800GTX XP32 result
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~48% --> 6.4 GFlops
best case speedup: ~83% --> 7.9 GFlops
Tentative conclusion: in both best and worst cases, with that particular card and these specific hard-coded kernels (not overly driver/CUDA library dependent), XP32 performance is higher by some 18-19%.
That's a lot of difference (more than I expected). Could you let me know both driver versions involved, whether your Win7 has Aero active, and any other possible differences besides the OS?
(looks like I might end up widening the gap, rather than narrowing it ::))
Jason
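The 18-19% figure can be checked directly from the four Opt1 GFlops values quoted in this post. A small Python sketch; the dictionary layout and the `gain_pct` helper are illustrative assumptions, and only the numbers come from the thread:

```python
# Opt1 GFlops figures quoted in the post above (8800 GTX).
win7x64 = {"worst": 5.4, "best": 6.6}
xp32 = {"worst": 6.4, "best": 7.9}

def gain_pct(faster, slower):
    """Percentage by which `faster` exceeds `slower`."""
    return (faster / slower - 1.0) * 100.0

for case in ("worst", "best"):
    gap = gain_pct(xp32[case], win7x64[case])
    print(f"{case} case: XP32 ahead by {gap:.1f}%")
```

With these rounded inputs the gap comes out at roughly 18.5% in the worst case and just under 20% in the best case, consistent with the tentative conclusion.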
PatrickV2:
--- Quote from: Jason G on 29 Nov 2010, 02:25:06 pm ---
--- Quote from: PatrickV2 on 29 Nov 2010, 02:13:32 pm ---As requested (Q6600/8GB/8800GTX/WinXP32):
--- End quote ---
8800GTX earlier Win7x64 result:
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~38% --> 5.4 GFlops
best case speedup: ~69% --> 6.6 GFlops
8800GTX XP32 result
Average, peak calcs, thread-count heuristic: OK
worst case speedup: ~48% --> 6.4 GFlops
best case speedup: ~83% --> 7.9 GFlops
Tentative conclusion: in both best and worst cases, with that particular card and these specific hard-coded kernels (not overly driver/CUDA library dependent), XP32 performance is higher by some 18-19%.
That's a lot of difference (more than I expected). Could you let me know both driver versions involved, whether your Win7 has Aero active, and any other possible differences besides the OS?
Jason
--- End quote ---
Ah, both OSes have the 260.99 driver installed. Aero was active on Win7-64. There was also a VMWare virtual machine idling on the Win7-machine.
Since I suppose you'd like me to re-run the test on the Win7 machine without Aero and without the VM active, I did :):
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #5
Stock:
PwrSpec< 64> 18.1 GFlops 72.4 GB/s 1183.3ulps
SumMax ( 64) 1.2 GFlops 4.8 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 3.9 GFlops 15.6 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 18.2 GFlops 72.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
64 threads, fftlen 64: (worst case: full summax copy)
5.4 GFlops 21.9 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
6.6 GFlops 26.6 GB/s 121.7ulps
Hope this provides some insight.
Regards, Patrick.
Jason G:
--- Quote from: PatrickV2 on 29 Nov 2010, 02:34:11 pm ---Hope this provides some insight.
--- End quote ---
Thanks, it does :). Neither Aero nor the idling VM appears to have noticeably altered the performance numbers there... so we Win7 adopters appear to be paying the price for our shiny new WDDM driver model ;).
The stock code numbers are interesting too. XP32 @ 4.3 GFlops, and Win7x64 @ 3.9-4.1 GFlops ..... looks like the more familiar ~10% advantage to XP that we've heard reported before.
Nice that my tweaking works even faster on XP, but I'm starting to hope MS include some sort of video subsystem fixes in SP1 for Win7x64 :D
[Edit:] Later in the week, I'll look into whether the 64-bitness of the OS is a factor now, though it hasn't been shown to be significant before. The WoW64 layer could possibly be slowing things down there somehow, but it's best to know for sure.
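As a rough check on the stock-code comparison, the XP32 advantage can be bracketed using both ends of the quoted Win7x64 range. A short Python sketch; the variable names are illustrative, and the 4.3, 3.9 and 4.1 GFlops figures are the ones quoted in this post:

```python
# Stock PS+SuMx GFlops figures quoted above (8800 GTX).
xp32_stock = 4.3
win7_stock_runs = (3.9, 4.1)  # low and high end of the Win7x64 range

# XP32 advantage relative to each Win7x64 figure, in percent.
gaps = [(xp32_stock / w - 1.0) * 100.0 for w in win7_stock_runs]

print(f"XP32 stock advantage: {min(gaps):.1f}% to {max(gaps):.1f}%")
```

Against the 3.9 GFlops run the advantage is about 10%; against the 4.1 GFlops figure it is closer to 5%, so the ~10% estimate sits at the upper end of what these rounded numbers support.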