[Split] PowerSpectrum Unit Test

PatrickV2:
Busy thread, with a lot happening here. My respect. I re-ran the version 4 benchmark on:

Win7-64/8GB/8800GTX/260.99 drivers:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       17.8 GFlops    7.1 GB/s 1183.3ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       14.0 GFlops    5.6 GB/s 121.7ulps
     64 threads:       17.8 GFlops    7.1 GB/s 121.7ulps
    128 threads:       17.8 GFlops    7.1 GB/s 121.7ulps
    256 threads:       17.6 GFlops    7.0 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps
     64 threads:        2.9 GFlops    1.2 GB/s 1183.3ulps
    128 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps
    256 threads:        2.9 GFlops    1.1 GB/s 1183.3ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       14.6 GFlops    5.8 GB/s 121.7ulps
     64 threads:       17.9 GFlops    7.2 GB/s 121.7ulps
    128 threads:       17.7 GFlops    7.1 GB/s 121.7ulps
    256 threads:       17.5 GFlops    7.0 GB/s 121.7ulps
    512 threads:       16.1 GFlops    6.4 GB/s 121.7ulps
   1024 threads: N/A


EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?

Regards, Patrick.

Jason G:

--- Quote from: PatrickV2 on 20 Nov 2010, 07:27:55 am ---EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?
--- End quote ---

Yes please. The difference picked up earlier (thanks, Frizz) between XP32 and XP64 was interesting (with stock, around a 10% advantage to XP32, reduced to ~5% with Mod3). I have little doubt that XP32 has a similar advantage over Win7 x64, due to the simpler driver model, but it would be nice to confirm whether the mods close that gap a bit too.

Jason G:

--- Quote from: MarkJ on 20 Nov 2010, 06:38:13 am ---I ran on all the different cards on the farm:

1st up the GT240 (Win7 x64) has 3 cards, the DDR5 variety. Device 0 is slightly slower than 1 and 2, although they are all the same brand/model. Output is from device 0.

Device: GeForce GT 240, 1340 MHz clock, 475 MB memory....
--- End quote ---

Nice to be edging out stock on that stubborn card.  With the rest of your results it's starting to paint a picture that might be easy to handle:

By compute capability:
  2.0 & 2.1: Mod3 with 256 threads wins (significant boost)
  1.3: Mod3 with 128 threads (very small boost)
  1.0-1.2: Mod3 with 64 threads (edges out stock by a slim margin sometimes, but seems consistent)

Rules like this should be fairly straightforward to follow for other, more important kernels, so I'll make sure I fully understand this behaviour and build kernels with that in mind.
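
For anyone who wants to see the rules of thumb above in code form, here is a minimal sketch of a host-side lookup. It is only an illustration of the idea, not code from the test tool: the function name `pickPowerSpectrumThreads` is hypothetical, and treating every compute capability of 2.0 or higher like Fermi is an assumption, since only 2.0 and 2.1 were measured in this thread.

```cpp
// Hypothetical helper (not part of the unit test): maps a device's compute
// capability to the Mod3 thread count suggested by the results so far.
int pickPowerSpectrumThreads(int major, int minor) {
    // 2.0 & 2.1 (Fermi): 256 threads gave a significant boost.
    // Assumption: anything newer than 2.1 is handled the same way (untested).
    if (major >= 2) {
        return 256;
    }
    // 1.3: 128 threads gave a very small boost.
    if (major == 1 && minor >= 3) {
        return 128;
    }
    // 1.0-1.2: 64 threads edges out stock by a slim but consistent margin.
    return 64;
}
```

In a real CUDA host program the two arguments would typically come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties`. So an 8800 GTX (1.0) would get 64 threads and a GTX 480 (2.0) would get 256.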

SciManStev:
Test 4 on Win7-64 with 260.99 drivers:

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
     64 threads:       27.6 GFlops  11.0 GB/s      0.0ulps

GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       17.4 GFlops   7.0 GB/s    121.7ulps
     64 threads:       27.5 GFlops  11.0 GB/s    121.7ulps
    128 threads:       36.4 GFlops  14.5 GB/s    121.7ulps
    256 threads:       39.6 GFlops  15.8 GB/s    121.7ulps

GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       18.9 GFlops   7.6 GB/s      0.0ulps
     64 threads:       23.1 GFlops   9.2 GB/s      0.0ulps
    128 threads:       24.1 GFlops   9.6 GB/s      0.0ulps
    256 threads:       22.7 GFlops   9.1 GB/s      0.0ulps

GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       17.5 GFlops   7.0 GB/s    121.7ulps
     64 threads:       27.6 GFlops  11.0 GB/s    121.7ulps
    128 threads:       36.3 GFlops  14.5 GB/s    121.7ulps
    256 threads:       39.7 GFlops  15.9 GB/s    121.7ulps
    512 threads:       39.2 GFlops  15.7 GB/s    121.7ulps
   1024 threads:       34.7 GFlops  13.9 GB/s    121.7ulps

Steve

perryjay:
Me and my little 9500GT reporting for duty, sir, but it's time for a little hand-holding. I downloaded the package from the first post and got a DLL and the executable. Where do I put the DLL before I open the EXE?
