[Split] PowerSpectrum Unit Test

Jason G:

--- Quote from: Claggy on 29 Nov 2010, 01:43:41 pm ---...and from my 128Mb 8400M GS:
--- End quote ---

Analysing both ;)


9800GTX+
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~33%
    best case speedup: ~75%

8400M GS
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~50%  <-- nice
    best case speedup:  ~50%   <-- nice



PatrickV2:

--- Quote from: Jason G on 29 Nov 2010, 01:49:17 pm ---
--- Quote from: PatrickV2 on 29 Nov 2010, 01:44:54 pm ---....Are you also interested in a run under WinXP? ...
--- End quote ---

Sure! It'll be interesting to see if I'm closing the gap, or making it wider ;).

Analysing your first result....

8800GTX
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~38%
    best case speedup: ~69%

--- End quote ---

As requested (Q6600/8GB/8800GTX/WinXP32):


Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #5
Stock:
 PwrSpec<    64>   18.3 GFlops   73.1 GB/s 1183.3ulps

 SumMax (    64)    1.3 GFlops    5.5 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.3 GFlops   17.5 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.3 GFlops   73.1 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         6.4 GFlops   25.8 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         7.9 GFlops   32.2 GB/s 121.7ulps


Regards, Patrick.

Jason G:

--- Quote from: PatrickV2 on 29 Nov 2010, 02:13:32 pm ---As requested (Q6600/8GB/8800GTX/WinXP32):

--- End quote ---

8800GTX earlier Win7x64 result:
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~38% --> 5.4 GFlops
    best case speedup: ~69%  --> 6.6 GFlops

8800GTX XP32 result
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~48%   --> 6.4 GFlops
    best case speedup: ~83%    --> 7.9 GFlops


Tentative conclusion: in both best and worst cases, with that particular card and these specific hard-coded kernels (not overly driver/CUDA library dependent), XP32 performance is higher by some 18-19%.
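
For anyone who wants to check that arithmetic, here is a minimal Python sketch using only the four Opt1 GFlops figures listed above. The helper name `pct_gain` and the dictionary layout are illustrative choices, not part of the unit test itself.

```python
# GFlops figures for the Opt1 kernel on the 8800GTX, copied from the results above.
xp32 = {"worst": 6.4, "best": 7.9}
win7x64 = {"worst": 5.4, "best": 6.6}

def pct_gain(new, old):
    """Percentage by which throughput `new` exceeds throughput `old`."""
    return (new / old - 1.0) * 100.0

# Relative advantage of XP32 over Win7x64 for each case.
os_gap = {case: pct_gain(xp32[case], win7x64[case]) for case in ("worst", "best")}

for case, gap in os_gap.items():
    print(f"{case} case: XP32 ahead by {gap:.1f}%")
```

Since the inputs are rounded to one decimal place, the result is only good to about a percentage point, but both cases land in the high teens, in line with the rough figure above.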

That's a lot of difference (more than I expected). Could you let me know both driver versions involved, whether your Win7 has Aero active, and any other possible differences besides the OS?

(looks like I might end up widening the gap, rather than narrowing it  ::))
Jason

PatrickV2:

--- Quote from: Jason G on 29 Nov 2010, 02:25:06 pm ---
--- Quote from: PatrickV2 on 29 Nov 2010, 02:13:32 pm ---As requested (Q6600/8GB/8800GTX/WinXP32):

--- End quote ---

8800GTX earlier Win7x64 result:
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~38% --> 5.4 GFlops
    best case speedup: ~69%  --> 6.6 GFlops

8800GTX XP32 result
    Average, peak calcs, thread-count heuristic: OK
    worst case speedup: ~48%   --> 6.4 GFlops
    best case speedup: ~83%    --> 7.9 GFlops


Tentative conclusion: in both best and worst cases, with that particular card and these specific hard-coded kernels (not overly driver/CUDA library dependent), XP32 performance is higher by some 18-19%.

That's a lot of difference (more than I expected). Could you let me know both driver versions involved, whether your Win7 has Aero active, and any other possible differences besides the OS?

Jason

--- End quote ---

Ah, both OSes have the 260.99 driver installed. Aero was active on Win7-64. There was also a VMware virtual machine idling on the Win7 machine.

Since I suppose you'd like me to re-run the test on the Win7 machine without Aero and without the VM active, I did :):


Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #5
Stock:
 PwrSpec<    64>   18.1 GFlops   72.4 GB/s 1183.3ulps

 SumMax (    64)    1.2 GFlops    4.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.9 GFlops   15.6 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.2 GFlops   72.7 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         5.4 GFlops   21.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         6.6 GFlops   26.6 GB/s 121.7ulps


Hope this provides some insight.

Regards, Patrick.

Jason G:

--- Quote from: PatrickV2 on 29 Nov 2010, 02:34:11 pm ---Hope this provides some insight.

--- End quote ---

Thanks, it does :). Neither Aero nor the idling VM appears to have noticeably altered the performance numbers there... so us Win7 adopters appear to be paying the price for our shiny new WDDM driver model ;).

The stock code numbers are interesting too. XP32 @ 4.3 GFlops, and Win7x64 @ 3.9-4.1 GFlops... looks like the more familiar ~10% advantage to XP that we've heard reported before.
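
A quick sketch of that stock comparison, using only the figures quoted in this post (the 4.1 GFlops end of the Win7x64 range is taken as given here; the run it comes from is not shown on this page, and the variable names are just illustrative):

```python
# Stock PS+SuMx throughput in GFlops, as quoted in this post.
xp32_stock = 4.3
win7_stock_range = (3.9, 4.1)  # low and high ends of the Win7x64 range

def pct_gain(new, old):
    """Percentage by which throughput `new` exceeds throughput `old`."""
    return (new / old - 1.0) * 100.0

# XP32 advantage measured against each end of the Win7x64 range.
stock_gaps = [pct_gain(xp32_stock, w) for w in win7_stock_range]

for w, gap in zip(win7_stock_range, stock_gaps):
    print(f"XP32 vs Win7x64 at {w} GFlops: {gap:.1f}% faster")
```

The ~10% figure corresponds to the 3.9 GFlops end of the range; against 4.1 GFlops the gap is about half that.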

Nice that my tweaking works even faster on XP, but I'm starting to hope MS include some sort of video subsystem fixes in SP1 for Win7x64 :D

[Edit:] Later in the week, I'll look into whether the 64-bitness of the OS is a factor now, though it hasn't been shown to be significant before. The WoW64 layer could possibly be slowing things down there somehow, but best to know for sure.
