[Split] PowerSpectrum Unit Test

Miep:
For sake of completeness:

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #5
Stock:
 PwrSpec<    64>    4.5 GFlops   17.8 GB/s 1183.3ulps

 SumMax (    64)    0.2 GFlops    1.0 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    0.4 GFlops    1.7 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        4.4 GFlops   17.8 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         1.3 GFlops    5.3 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.4 GFlops    5.8 GB/s 121.7ulps


Some 10% difference between the two bottom ones.

Jason G:
Cheers,
  Analysing...
   
Average, peak calcs, thread-count heuristic: OK
    worst case speedup: (1.3-0.4)/0.4   ~225%  (3.25x).. Winner!  ;D
    best case speedup:  (1.4-0.4)/0.4    ~250%  (3.5x)


Double checking those ridiculous numbers: (mistakes always possible  ;) )

1.3 GFlops (optimised) / 0.4 GFlops (Stock) definitely = 3.25x  (325% of stock throughput)
The percentage of optimised throughput that is speedup is then 0.9 GFlops / 1.3 GFlops ~= 69 percent, i.e. about 69% of Opt throughput is bonus.  The speedup component is 225% of the stock throughput.
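
For anyone wanting to repeat the double check on their own numbers, here is a minimal sketch of the same arithmetic in Python. The function name `speedup` and its return layout are made up for illustration and are not part of the unit test; the inputs are the stock PS+SuMx figure (0.4 GFlops) and the two Opt1 figures (1.3 and 1.4 GFlops) from Miep's run above.

```python
def speedup(stock_gflops, opt_gflops):
    """Compare optimised throughput against stock throughput.

    Hypothetical helper for illustration only.
    Returns (ratio, gain_pct, bonus_share_pct):
      ratio           - opt / stock, e.g. 3.25 means "3.25x"
      gain_pct        - (opt - stock) / stock, as a percentage of stock
      bonus_share_pct - (opt - stock) / opt, as a percentage of opt
    """
    gain = opt_gflops - stock_gflops
    ratio = opt_gflops / stock_gflops
    gain_pct = gain / stock_gflops * 100.0
    bonus_share_pct = gain / opt_gflops * 100.0
    return ratio, gain_pct, bonus_share_pct


if __name__ == "__main__":
    stock = 0.4  # PS+SuMx( 64), stock, Quadro FX 570M
    for label, opt in (("worst case", 1.3), ("best case", 1.4)):
        ratio, gain_pct, share_pct = speedup(stock, opt)
        print(f"{label}: {ratio:.2f}x, +{gain_pct:.0f}% of stock, "
              f"{share_pct:.0f}% of Opt throughput is bonus")
```

The three values are kept separate because "3.25x", "225% speedup" and "69% bonus" all describe the same pair of measurements and are easy to mix up.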

#Stock is doing something that GPU doesn't like  :-\

Miep:
I reran a few times, getting 0.8-0.9, 1.3 and 1.4-1.5 now (stock, worst case, best case).
i.e. higher baseline, optimization values stable. I can do some statistics tomorrow.

edit: that 0.4 seems to have been exceptionally low (and no, I didn't have the GPU crunching by accident :P )

Jason G:
OK, non-critical unless I make computation mistakes (I was mostly concerned here not to make the code slower...).  The Stock / x32f code there is doing something your GPU doesn't like, IMO.

Is that Quadro integrated and using some portion of system memory, or does it use dedicated memory?

SciManStev:

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #5
Stock:
 PwrSpec<    64>   28.4 GFlops  113.7 GB/s   0.0ulps

 SumMax (    64)    2.3 GFlops    9.7 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    7.4 GFlops   29.9 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       41.4 GFlops  165.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
  256 threads, fftlen 64: (worst case: full summax copy)
        10.9 GFlops   44.0 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        16.2 GFlops   65.4 GB/s 121.7ulps


This was much easier than typing it out. Thanks, Richard.

Steve