[Split] PowerSpectrum Unit Test
PatrickV2:
Busy thread, and a lot happening here. My respect. I re-ran the version 4 benchmark on:
Win7-64/8GB/8800GTX/260.99 drivers:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 17.8 GFlops 7.1 GB/s 1183.3ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 14.0 GFlops 5.6 GB/s 121.7ulps
64 threads: 17.8 GFlops 7.1 GB/s 121.7ulps
128 threads: 17.8 GFlops 7.1 GB/s 121.7ulps
256 threads: 17.6 GFlops 7.0 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 2.9 GFlops 1.1 GB/s 1183.3ulps
64 threads: 2.9 GFlops 1.2 GB/s 1183.3ulps
128 threads: 2.9 GFlops 1.1 GB/s 1183.3ulps
256 threads: 2.9 GFlops 1.1 GB/s 1183.3ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 14.6 GFlops 5.8 GB/s 121.7ulps
64 threads: 17.9 GFlops 7.2 GB/s 121.7ulps
128 threads: 17.7 GFlops 7.1 GB/s 121.7ulps
256 threads: 17.5 GFlops 7.0 GB/s 121.7ulps
512 threads: 16.1 GFlops 6.4 GB/s 121.7ulps
1024 threads: N/A
EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?
Regards, Patrick.
Jason G:
--- Quote from: PatrickV2 on 20 Nov 2010, 07:27:55 am ---EDIT: I still have WinXP32 installed on another HD of this machine; are you interested in a run of your tool under that OS?
--- End quote ---
Yes please. The difference picked up earlier (thanks, Frizz) between XP32 and XP64 was interesting (with stock, around a 10% advantage to XP32, reduced to ~5% with Mod3). I've little doubt XP32 has a similar advantage over Win7 x64, due to the simpler driver model, but it'd be nice to confirm whether the mods close that gap a bit too.
Jason G:
--- Quote from: MarkJ on 20 Nov 2010, 06:38:13 am ---I ran on all the different cards on the farm:
1st up the GT240 (Win7 x64) has 3 cards, the DDR5 variety. Device 0 is slightly slower than 1 and 2, although they are all the same brand/model. Output is from device 0.
Device: GeForce GT 240, 1340 MHz clock, 475 MB memory....
--- End quote ---
Nice to be edging out stock on that stubborn card. With the rest of your results, it's starting to paint a picture that might be easy to handle:
By compute capability:
2.0 & 2.1: Mod3 with 256 threads wins (significant boost)
1.3: Mod3 with 128 threads (very small boost)
1.0-1.2: Mod3 with 64 threads (sometimes edges out stock by only a slim margin, but seems consistent)
Rules like this should be fairly straightforward to follow for other, more important kernels, so I'll make sure I fully understand this behaviour and build kernels with that in mind.
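The compute capability rules above could be written down as a small host-side lookup. This is only a minimal sketch and not the code from the test build: the function name `pickMod3Threads` and the check helper are hypothetical, and the fallback for anything newer than 2.1 (reuse 256 threads) is an assumption, since no such cards were benchmarked in this thread.

```cpp
#include <cassert>

// Hypothetical helper: choose threads per block for the Mod3 kernel
// from the device's compute capability (major.minor), following the
// rules of thumb listed in this thread.
constexpr int pickMod3Threads(int major, int minor) {
    // 2.0 & 2.1 (Fermi): 256 threads gave a significant boost.
    // ASSUMPTION: anything newer than 2.1 reuses the Fermi setting.
    if (major >= 2) return 256;
    // 1.3: 128 threads gave a very small boost.
    if (major == 1 && minor >= 3) return 128;
    // 1.0-1.2: 64 threads edged out stock by a slim margin.
    return 64;
}

// Hypothetical check that the lookup matches the table in the post.
inline void checkMod3Rules() {
    assert(pickMod3Threads(1, 0) == 64);   // e.g. 8800 GTX
    assert(pickMod3Threads(1, 2) == 64);   // e.g. GT 240
    assert(pickMod3Threads(1, 3) == 128);
    assert(pickMod3Threads(2, 0) == 256);  // e.g. GTX 480
    assert(pickMod3Threads(2, 1) == 256);
}
```

In a real build, the major and minor values would presumably come from the `major` and `minor` fields of `cudaDeviceProp` after a `cudaGetDeviceProperties` call, with the result used as the block size at kernel launch.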
SciManStev:
Test 4 Win 7 64 260.99
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
PowerSpectrum Unit Test #4
Stock GetPowerSpectrum():
64 threads: 27.6 GFlops 11.0 GB/s 0.0ulps
GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 17.4 GFlops 7.0 GB/s 121.7ulps
64 threads: 27.5 GFlops 11.0 GB/s 121.7ulps
128 threads: 36.4 GFlops 14.5 GB/s 121.7ulps
256 threads: 39.6 GFlops 15.8 GB/s 121.7ulps
GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 18.9 GFlops 7.6 GB/s 0.0ulps
64 threads: 23.1 GFlops 9.2 GB/s 0.0ulps
128 threads: 24.1 GFlops 9.6 GB/s 0.0ulps
256 threads: 22.7 GFlops 9.1 GB/s 0.0ulps
GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 17.5 GFlops 7.0 GB/s 121.7ulps
64 threads: 27.6 GFlops 11.0 GB/s 121.7ulps
128 threads: 36.3 GFlops 14.5 GB/s 121.7ulps
256 threads: 39.7 GFlops 15.9 GB/s 121.7ulps
512 threads: 39.2 GFlops 15.7 GB/s 121.7ulps
1024 threads: 34.7 GFlops 13.9 GB/s 121.7ulps
Steve
perryjay:
Me and my little 9500GT reporting for duty, sir, but it's time for a little hand-holding. I downloaded the package from the first post and got a DLL and the executable. Where do I put the DLL before I open the EXE?