
[Split] PowerSpectrum Unit Test


Claggy:
My 9800GTX+ on Win 7 x64:

Device: GeForce 9800 GTX/9800 GTX+, 1900 MHz clock, 496 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   49.81) Peak(   71.73) Min(    8.11) [OK]
   Memory thoughput GB/s   Avg(   29.08) Peak(   44.80) Min(   14.31)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   57.66, 1.16x) Peak(   80.19, 1.12x) Min(   18.07, 2.23x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   35.80, 1.23x) Peak(   50.46, 1.13x) Min(   24.47, 1.71x)

Claggy

glennaxl:
Device: GeForce GTX 260, 1441 MHz clock, 869 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   47.55) Peak(   65.20) Min(   10.16) [OK]
   Memory thoughput GB/s   Avg(   28.09) Peak(   37.12) Min(   17.92)


Opt1 (worst case): 128 thrds/block, 2 x 524288 element streams
  revert to single stream from size 256
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   84.83, 1.78x) Peak(  111.50, 1.71x) Min(   31.57, 3.11x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   52.63, 1.87x) Peak(   67.26, 1.81x) Min(   36.24, 2.02x)

Jason G:
Thanks both!

@glennaxl: that's an impressive speedup on the GTX 260. I'll have to look at that carefully on mine when I get a chance.

@Claggy: an average at roughly 3/4 of peak seems pretty good, but I think we can probably get some more.
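For anyone wanting to check the ratios by hand, here is a minimal sketch of the arithmetic behind the "x" figures and the average-to-peak comparison. It is not part of the unit test itself; the names `results`, `speedups` and `avg_to_peak` are made up for illustration, and the numbers are the compute throughput figures (GFlops) copied from the two reports above.

```python
# Compute throughput in GFlops as (avg, peak, min), copied from the reports above.
# The dictionary layout and function names are illustrative assumptions only.
results = {
    "9800 GTX+": {"stock": (49.81, 71.73, 8.11), "opt1": (57.66, 80.19, 18.07)},
    "GTX 260": {"stock": (47.55, 65.20, 10.16), "opt1": (84.83, 111.50, 31.57)},
}


def speedups(stock, opt):
    """Return opt/stock for avg, peak and min, rounded to two decimals."""
    return tuple(round(o / s, 2) for s, o in zip(stock, opt))


def avg_to_peak(avg, peak):
    """Return how close the average throughput sits to the peak (0..1)."""
    return avg / peak


for device, runs in results.items():
    avg, peak, _ = runs["opt1"]
    print(device, speedups(runs["stock"], runs["opt1"]), round(avg_to_peak(avg, peak), 2))
```

With the Opt1 numbers above, the average-to-peak ratio comes out at about 0.72 for the 9800 GTX+ and about 0.76 for the GTX 260, and the rounded speedups match the "x" values printed by the test.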

@ALL, Thanks! I'm closing this test for now.  It's been an extremely valuable contribution from you all that has had a huge impact on the pace & quality of our progress (mine in particular). 

FYI: Some urgent issues may have come to light from Raistmer's OpenCL development when combined with the refinements here.  Those will need some fairly close attention for a short while, to get some information back to Berkeley, but stay tuned as there are more tests to come  :)

[Locking thread, Please stay tuned for further Unit Tests!]
Jason

Jason G:
@All:
   Just a note that the concerns that arose and distracted me from testing & development along this line have now been at least partially resolved, and don't require any immediate action on our part. I'm back to ruggedising & integrating what we've accomplished here into the X-builds, and plan to start devising tests for PoT (Power over Time) processing refinement soon, in similar fashion to this thread. PoT processing covers Gaussian searches and Triplet & Pulse finding, for which all CUDA releases have known issues to address, so there'll be plenty of tests to devise & collect data for yet.

Cheers once again!  :)
Jason
