[Split] PowerSpectrum Unit Test
_heinz:
run it twice on the ION
~~~~~~~~~~~~~~~~~
starting PowerSpectrum2
.
Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
64 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 1:
32 threads: 1.3 GFlops 0.5 GB/s 1183.3ulps
64 threads: 1.9 GFlops 0.7 GB/s 1183.3ulps
128 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
256 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 2:
32 threads: 1.0 GFlops 0.4 GB/s 1183.3ulps
64 threads: 1.0 GFlops 0.4 GB/s 1183.3ulps
128 threads: 0.9 GFlops 0.4 GB/s 1183.3ulps
256 threads: 0.8 GFlops 0.3 GB/s 1183.3ulps
Device: ION, 1100 MHz clock, 242 MB memory.
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
64 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 1:
32 threads: 1.3 GFlops 0.5 GB/s 1183.3ulps
64 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
128 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
256 threads: 1.9 GFlops 0.8 GB/s 1183.3ulps
GetPowerSpectrum() mod 2:
32 threads: 1.0 GFlops 0.4 GB/s 1183.3ulps
64 threads: 1.0 GFlops 0.4 GB/s 1183.3ulps
128 threads: 0.9 GFlops 0.4 GB/s 1183.3ulps
256 threads: 0.8 GFlops 0.3 GB/s 1183.3ulps
.
Done
SciManStev:
This is what I got on my 480's with 260.99
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory
Compiled with CUDA 3020
Stock GetPowerSpectrum<>:
64 threads: 27.6 GFlops 11.1 GB/s 0.0ulps
GetPowerSpectrum<> mod 1:
32 threads: 17.5 GFlops 7.0 GB/s 0.0ulps
64 threads: 27.5 GFlops 11.0 GB/s 0.0ulps
128 threads: 36.4 GFlops 14.6 GB/s 0.0ulps
256 threads: 39.6 GFlops 15.8 GB/s 0.0ulps
GetPowerSpectrum<> mod 2:
32 threads: 20.2 GFlops 8.1 GB/s 0.0ulps
64 threads: 39.7 GFlops 15.9 GB/s 0.0ulps
128 threads: 64.1 GFlops 25.6 GB/s 0.0ulps
256 threads: 64.3 GFlops 25.7 GB/s 0.0ulps
Steve
I edited the data, as the first time I ran it I was crunching.
Jason G:
--- Quote from: _heinz on 18 Nov 2010, 03:37:36 pm ---modify:
@Jason, wondering why you get 20 GFlops more with 256 threads than my GTX470.
Do you have source for me to compile with the 2011XE compiler?
--- End quote ---
GTX480 has a wider memory bus IIRC. Also, they're GPU kernels, Heinz, so the CPU host side won't make any difference here (unless Intel started messing with CUDA binaries ;) ). After some work, this will lead to a set of optimisation strategies for other kernels throughout, rather than one specific piece of useful code.
I'm looking at this (almost purely) memory-bound computation (powerspectrum) as a way to see which optimisation strategies work on different cards for that type of operation. This way I can learn to make kernels that choose the best memory access strategy internally by compute capability.
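For anyone following along, here is a rough sketch of why a power spectrum is close to purely memory bound. It assumes (this is not stated in the thread) that each bin is one complex float32 pair read from memory and one float32 power value written back. The function names are made up for illustration, and the benchmark's own GFlops and GB/s accounting may well count things differently.

```python
# Minimal sketch: per-bin power of complex FFT output.
# Assumptions (not from the thread): float32 (re, im) pairs as input,
# one float32 power value written per bin.

def power_spectrum(samples):
    """Return re*re + im*im for each (re, im) pair."""
    return [re * re + im * im for re, im in samples]

# Rough cost per bin under the assumptions above.
FLOPS_PER_BIN = 3        # two multiplies and one add
BYTES_PER_BIN = 8 + 4    # read one complex float32, write one float32

def arithmetic_intensity():
    """Floating-point operations per byte of memory traffic."""
    return FLOPS_PER_BIN / BYTES_PER_BIN
```

With so little arithmetic per byte moved, how the loads and stores are laid out should matter far more than the math itself, which is why the access pattern is the thing to tune.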
So far it looks like Mod2 is winning on Fermi (apart from whatever is causing Arkayn's problems). The prior-gen 200 series seems to like Mod1 better, so I suspect there is some memory access pattern issue for me to look at in Mod2 with respect to prior-gen cards. Earlier G80-G92 cards could be even more constrained by the memory subsystem, or need even more special treatment of access patterns, by the looks of things.
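As a sketch only, selection by compute capability could look something like the following. The variant names and the function are hypothetical, and the mapping just mirrors the early results in this thread rather than any tuned table. In a real CUDA host program the major and minor numbers would come from the device properties (the `major` and `minor` fields filled in by `cudaGetDeviceProperties`).

```python
# Hypothetical variant names; the mapping only mirrors the early results
# reported in this thread and is not a tuned or final table.

def pick_variant(major, minor):
    """Choose a GetPowerSpectrum variant from a compute capability."""
    if major >= 2:
        # Fermi (e.g. GTX 480 is 2.0): mod 2 was fastest in these runs.
        # 2.1 parts (GTX 460) may end up needing their own entry.
        return "mod2"
    if (major, minor) == (1, 3):
        # GT200-based 200 series: mod 1 looked better so far.
        return "mod1"
    # Older or untested parts: fall back to the stock kernel.
    return "stock"
```

Falling back to the stock kernel for anything unmeasured keeps older cards on known-good code until there is data for them.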
@Arkayn, not sure what would cause that, but on my 480 that's where things start to get 'a bit warm'... Is there a possibility of temperature issues? Try cranking the fan perhaps. [Edit:] Probably pushing the 2.1 (GTX 460) architecture limits in Mod2. I'll look into that for mod3.
Steve's WINNING! (Just ;) )
Plenty of data for me to chew on. Will be thinking about mod3.
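One possible way to chew through pasted results, as a sketch: the function name and regex below are assumptions based only on the line format seen in this thread, and the sample is abridged from the GTX 480 numbers posted above.

```python
import re

# Matches lines such as "256 threads: 64.3 GFlops 25.7 GB/s 0.0ulps".
LINE = re.compile(
    r"(\d+)\s+threads:\s+([\d.]+)\s+GFlops\s+([\d.]+)\s+GB/s\s+([\d.]+)\s*ulps"
)

def best_config(report):
    """Return (variant, threads, gflops) for the highest GFlops line."""
    best = None
    variant = None
    for line in report.splitlines():
        line = line.strip()
        # Header lines name the kernel variant for the results that follow.
        if line.endswith(":") and "GetPowerSpectrum" in line:
            variant = line.rstrip(":")
            continue
        m = LINE.search(line)
        if m:
            threads, gflops = int(m.group(1)), float(m.group(2))
            if best is None or gflops > best[2]:
                best = (variant, threads, gflops)
    return best

# Abridged from the GTX 480 results posted in this thread.
SAMPLE = """\
GetPowerSpectrum<> mod 1:
128 threads: 36.4 GFlops 14.6 GB/s 0.0ulps
256 threads: 39.6 GFlops 15.8 GB/s 0.0ulps
GetPowerSpectrum<> mod 2:
128 threads: 64.1 GFlops 25.6 GB/s 0.0ulps
256 threads: 64.3 GFlops 25.7 GB/s 0.0ulps
"""
```

Calling `best_config(SAMPLE)` picks out the mod 2, 256 thread line for that card.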
Jason
Jason G:
--- Quote from: Claggy on 18 Nov 2010, 06:11:25 pm ---Here's my 128MB 8400M GS's result; while it's not got enough RAM for Seti, it at least gives you some figures for very slow GPUs:
--- End quote ---
Nice! Another stubborn GPU :D
Jason G:
--- Quote from: PatrickV2 on 18 Nov 2010, 02:36:34 pm ---Not sure if you're looking for this, but below are my results on my 8800GTX, 260.99 drivers:
--- End quote ---
Exactly what I'm looking for, thanks.