[Split] PowerSpectrum Unit Test

Ghost0210:
And on the 465:

Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
Stock GetPowerSpectrum():
     64 threads:       16.0 GFlops    6.4 GB/s   0.0ulps


GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:        9.8 GFlops    3.9 GB/s 121.7ulps
     64 threads:       15.8 GFlops    6.3 GB/s 121.7ulps
    128 threads:       20.8 GFlops    8.3 GB/s 121.7ulps
    256 threads:       23.1 GFlops    9.2 GB/s 121.7ulps


GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       10.8 GFlops    4.3 GB/s   0.0ulps
     64 threads:       13.2 GFlops    5.3 GB/s   0.0ulps
    128 threads:       13.3 GFlops    5.3 GB/s   0.0ulps
    256 threads:       12.1 GFlops    4.9 GB/s   0.0ulps


GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:        9.4 GFlops    3.7 GB/s 121.7ulps
     64 threads:       15.3 GFlops    6.1 GB/s 121.7ulps
    128 threads:       20.8 GFlops    8.3 GB/s 121.7ulps
    256 threads:       20.6 GFlops    8.3 GB/s 121.7ulps
    512 threads:       20.6 GFlops    8.2 GB/s 121.7ulps
   1024 threads:       18.6 GFlops    7.4 GB/s 121.7ulps
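
The thread never spells out what the ulps column measures. Assuming it is an average error in units in the last place (ULPs) of the single-precision output against a reference result, a minimal sketch of that kind of measurement could look like the following. The helper names (`ordered_bits`, `ulp_distance`, `mean_ulp_error`) are made up for illustration and are not taken from the test program.

```python
import struct

def ordered_bits(x):
    """Round x to float32 and map its bit pattern to an integer that
    increases monotonically with the float's value (hypothetical helper)."""
    (i,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; mirror them below zero.
    return i if i >= 0 else -(i & 0x7FFFFFFF)

def ulp_distance(a, b):
    """Number of representable float32 steps between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))

def mean_ulp_error(results, reference):
    """Average ULP distance over paired values (assumed meaning of 'ulps')."""
    pairs = list(zip(results, reference))
    return sum(ulp_distance(r, e) for r, e in pairs) / len(pairs)

one = 1.0
next_up = 1.0 + 2.0 ** -23          # the float32 value right after 1.0
print(ulp_distance(one, next_up))   # 1
print(mean_ulp_error([one, 1.0 + 3 * 2.0 ** -23], [one, one]))  # 1.5
```

Read this way, 0.0ulps means the kernel output is bit-identical to the reference, while 121.7ulps is an average drift of roughly 122 float32 steps.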

Jason G:
Cheers,
   Will have to test kernel concurrency next (launching 2 to 16 power spectra at the same time). No idea how much overall speed improvement, if any, is achievable with that, but it needs testing. I'll keep stock & all 3 mods in play for that, since one may 'pack' better than the others (smaller thread counts might overtake the larger ones in performance when several kernels execute on the same multiprocessor).
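
As a rough illustration of the 'packing' idea, the sketch below estimates how many threads a single kernel can keep resident on one multiprocessor at each block size. It uses only the compute capability 2.0 limits from the CUDA programming guide (1536 resident threads and 8 resident blocks per multiprocessor). It ignores register and shared-memory limits, and it says nothing about whether blocks from different kernels actually end up sharing a multiprocessor, which is what the concurrency test would have to show. The function name is hypothetical.

```python
# Per-multiprocessor limits for compute capability 2.0 (Fermi).
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8

def resident_threads(threads_per_block):
    """Threads one kernel can keep resident on a multiprocessor,
    counting only the block and thread limits (assumption: registers
    and shared memory are not the bottleneck)."""
    blocks = min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)
    return blocks * threads_per_block

for n in (32, 64, 128, 256, 512, 1024):
    used = resident_threads(n)
    share = 100 * used / MAX_THREADS_PER_SM
    print(f"{n:5d} threads/block: {used:5d} resident threads ({share:.0f}% of limit)")
```

Under these assumptions a 64-thread block size tops out at 512 resident threads, a third of the limit, so the small configurations are the ones that leave thread capacity unused.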

M_M:
@Ghost: Didn't you post results about 50% higher for the GTX 465 yesterday? Why the difference? Did you change something? Drivers?

Or are there 2 different versions of PowerSpectrum floating around?

Jason G:

--- Quote from: M_M on 19 Nov 2010, 02:09:58 pm ---Or are there 2 different versions of PowerSpectrum floating around?

--- End quote ---

Check the first post for the updated build & notes.  The Mod2 kernel was doing suspect things, so I've knobbled it (for now).

[I see you used the newer build yourself, so yes, mod2 numbers will be lower than yesterday's]

SciManStev:
Device:  GeForce GTX 480, 810 MHz clock,  1503 MB memory
Compute capability 2.0
Compiled with CUDA 3020
Stock GetPowerSpectrum():
     64 threads:       27.7 GFlops  11.1 GB/s    0.0ulps

GetPowerSpectrum() mod 1: (made Fermi & Pre-Fermi match in accuracy.)
     32 threads:       17.4 GFlops   7.0 GB/s    121.7ulps
     64 threads:       27.5 GFlops  11.0 GB/s    121.7ulps
    128 threads:       36.4 GFlops  14.5 GB/s    121.7ulps
    256 threads:       39.6 GFlops  15.8 GB/s    121.7ulps

GetPowerSpectrum() mod 2 (fixed, but slow):
     32 threads:       18.9 GFlops   7.6 GB/s      0.0ulps
     64 threads:       23.1 GFlops   9.2 GB/s      0.0ulps
    128 threads:       24.1 GFlops   9.6 GB/s      0.0ulps
    256 threads:       22.7 GFlops   9.1 GB/s      0.0ulps

GetPowerSpectrum() mod 3: (As with mod1, +threads & split loads)
     32 threads:       16.7 GFlops   6.7 GB/s    121.7ulps
     64 threads:       26.9 GFlops  10.8 GB/s    121.7ulps
    128 threads:       36.0 GFlops  14.4 GB/s    121.7ulps
    256 threads:       34.9 GFlops  13.9 GB/s    121.7ulps
    512 threads:       34.7 GFlops  13.9 GB/s    121.7ulps
   1024 threads:       33.5 GFlops  13.4 GB/s    121.7ulps


Steve
