[Split] PowerSpectrum Unit Test

Ghost0210:
And on the 465:

Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
64 threads: 16.0 GFlops 6.4 GB/s 0.0ulps

GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 9.8 GFlops 3.9 GB/s 121.7ulps
64 threads: 15.8 GFlops 6.3 GB/s 121.7ulps
128 threads: 20.8 GFlops 8.3 GB/s 121.7ulps
256 threads: 23.1 GFlops 9.2 GB/s 121.7ulps

GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 10.8 GFlops 4.3 GB/s 0.0ulps
64 threads: 13.2 GFlops 5.3 GB/s 0.0ulps
128 threads: 13.3 GFlops 5.3 GB/s 0.0ulps
256 threads: 12.1 GFlops 4.9 GB/s 0.0ulps

GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 9.4 GFlops 3.7 GB/s 121.7ulps
64 threads: 15.3 GFlops 6.1 GB/s 121.7ulps
128 threads: 20.8 GFlops 8.3 GB/s 121.7ulps
256 threads: 20.6 GFlops 8.3 GB/s 121.7ulps
512 threads: 20.6 GFlops 8.2 GB/s 121.7ulps
1024 threads: 18.6 GFlops 7.4 GB/s 121.7ulps
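
A note on the last column of these results: a ULP (unit in the last place) error counts how many representable single-precision values lie between a result and a reference value. Below is a minimal C++ sketch of one way such a figure could be computed. It is not the unit test's actual code: the names `ulp_distance`, `reference_power` and `mean_ulp_error` are hypothetical, the power formula re*re + im*im is assumed, and the thread does not say whether the tool reports a mean or a maximum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float's bit pattern onto an ordered integer line, so that
// adjacent representable floats differ by exactly 1 (and -0 == +0).
inline std::int64_t ordered_bits(float x) {
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);  // bit-exact copy, avoids aliasing issues
    return (i < 0) ? static_cast<std::int64_t>(INT32_MIN) - i
                   : static_cast<std::int64_t>(i);
}

// Distance between two finite floats, counted in representable values.
inline std::int64_t ulp_distance(float a, float b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

// Hypothetical reference: power of one complex sample, computed in
// double precision and rounded once to float. (Assumed formula.)
inline float reference_power(float re, float im) {
    const double r = re, m = im;
    return static_cast<float>(r * r + m * m);
}

// Mean ULP distance between a kernel's output (copied back to the host)
// and the reference. A maximum could be tracked the same way.
inline double mean_ulp_error(const std::vector<float>& re,
                             const std::vector<float>& im,
                             const std::vector<float>& gpu_power) {
    assert(!re.empty() && re.size() == im.size() &&
           re.size() == gpu_power.size());
    double total = 0.0;
    for (std::size_t i = 0; i < re.size(); ++i) {
        total += static_cast<double>(
            ulp_distance(gpu_power[i], reference_power(re[i], im[i])));
    }
    return total / static_cast<double>(re.size());
}
```

With this kind of measure, 0.0ulps means every output matched the reference bit for bit, while a figure such as 121.7ulps means the outputs sat, on whatever statistic the tool uses, that many representable values away from it.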

Jason G:
Cheers,
Will have to test kernel concurrency next (launching 2 to 16 power spectrums at the same time). No idea how much overall speed improvement, if any, might be achievable with that, but it needs testing. I'll keep stock & all 3 mods in play for that, since one may 'pack' better than the others (smaller thread counts might overtake the larger ones in performance if several kernels execute on the same multiprocessor).
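
One rough way to reason about the 'packing' point is to count how many blocks of a given size can be resident on one multiprocessor. The C++ sketch below uses the documented compute capability 2.0 limits (1536 resident threads and 8 resident blocks per multiprocessor, 1024 threads per block). It is only an upper bound: it assumes block sizes that are multiples of the 32-thread warp and ignores register and shared-memory usage, which are not given in the thread. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits for compute capability 2.0 (Fermi).
constexpr int kMaxThreadsPerSM = 1536;
constexpr int kMaxBlocksPerSM = 8;
constexpr int kMaxThreadsPerBlock = 1024;

// Upper bound on resident blocks per multiprocessor, considering only
// the thread and block limits (registers and shared memory are ignored).
inline int resident_blocks(int threads_per_block) {
    assert(threads_per_block > 0 &&
           threads_per_block <= kMaxThreadsPerBlock);
    return std::min(kMaxBlocksPerSM, kMaxThreadsPerSM / threads_per_block);
}

// Resident threads per multiprocessor implied by that block count.
inline int resident_threads(int threads_per_block) {
    return resident_blocks(threads_per_block) * threads_per_block;
}
```

Under these limits alone, 32, 64 and 128 threads per block hit the 8-block cap first (256, 512 and 1024 resident threads), 256 and 512 threads per block reach the full 1536, and a 1024-thread block fits only once. Kernels running concurrently share the same per-multiprocessor limits, so how well a given mod packs still has to be measured.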

M_M:
@Ghost: Didn't you post about 50% higher results for the GTX 465 yesterday? Why the difference? Did you change something? Drivers?

Or are there 2 different versions of PowerSpectrum floating around?

Jason G:

--- Quote from: M_M on 19 Nov 2010, 02:09:58 pm ---Or are there 2 different versions of PowerSpectrum floating around?

--- End quote ---

Check the first post, for the updated build & notes. The Mod2 kernel was doing suspect things, so I've knobbled it (for now).

[I see you used the newer build yourself, so yes, mod2 numbers will be lower than yesterday.]

SciManStev:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory
Compute capability 2.0
Compiled with CUDA 3020
Stock GetPowerSpectrum():
64 threads: 27.7 GFlops 11.1 GB/s 0.0ulps

GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
32 threads: 17.4 GFlops 7.0 GB/s 121.7ulps
64 threads: 27.5 GFlops 11.0 GB/s 121.7ulps
128 threads: 36.4 GFlops 14.5 GB/s 121.7ulps
256 threads: 39.6 GFlops 15.8 GB/s 121.7ulps

GetPowerSpectrum() mod 2 (fixed, but slow):
32 threads: 18.9 GFlops 7.6 GB/s 0.0ulps
64 threads: 23.1 GFlops 9.2 GB/s 0.0ulps
128 threads: 24.1 GFlops 9.6 GB/s 0.0ulps
256 threads: 22.7 GFlops 9.1 GB/s 0.0ulps

GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
32 threads: 16.7 GFlops 6.7 GB/s 121.7ulps
64 threads: 26.9 GFlops 10.8 GB/s 121.7ulps
128 threads: 36.0 GFlops 14.4 GB/s 121.7ulps
256 threads: 34.9 GFlops 13.9 GB/s 121.7ulps
512 threads: 34.7 GFlops 13.9 GB/s 121.7ulps
1024 threads: 33.5 GFlops 13.4 GB/s 121.7ulps

Steve
