[Split] PowerSpectrum Unit Test

perryjay:
Hey guys, I did something right for a change!!!  :)  ::)  Looking forward to the next test. This time I'll know to turn it off!

PatrickV2:

--- Quote from: Jason G on 25 Dec 2010, 08:53:02 am ---
--- Quote from: PatrickV2 on 25 Dec 2010, 05:21:25 am ---Ran test #9 on my Q6600/8GB/8800GTX, under both WinXP-32 as well as Win7-64.
--- End quote ---

Excellent, not broken on the 8800.  Last hurdle for that code area cleared & can move on  :D

--- End quote ---

Wonderful to hear that. As always, looking forward to the next bit of execution-magic. ;)

Regards, Patrick.

Jason G:
It will take me some time to cook up the next test while I work out this streaming stuff.
Mixed results with kernel streaming so far: it appears to benefit my smaller, highly optimised kernels more than the stock-ish larger sizes. I don't know why yet, and dividing further into additional streams seems to slow it down again ... tricky!

As with test #9 (single stream):

--- Quote ---Opt1 (worst case): 256 thrds/block, 1 x 1048576 element streams
 FFT+PS+SM(     8)   19.2 GFlops   33.8 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   36.8 GFlops   50.3 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   60.7 GFlops   67.8 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)   86.2 GFlops   81.6 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   92.5 GFlops   76.1 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)  135.0 GFlops   98.3 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  172.0 GFlops  112.4 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  214.7 GFlops  127.3 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  225.9 GFlops  122.6 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  232.3 GFlops  116.2 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  226.0 GFlops  104.8 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  221.5 GFlops   95.8 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  213.1 GFlops   86.3 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  210.5 GFlops   80.2 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  202.6 GFlops   72.8 GB/s  ulps(fft  2.7,ps 5392.8) [OK]
--- End quote ---

2x streams:

--- Quote ---Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
 FFT+PS+SM(     8)   26.7 GFlops   47.2 GB/s  ulps(fft  1.2,ps 4324.2) [OK]
 FFT+PS+SM(    16)   66.9 GFlops   91.3 GB/s  ulps(fft  1.6,ps 4326.2) [OK]
 FFT+PS+SM(    32)   90.9 GFlops  101.5 GB/s  ulps(fft  1.3,ps 4003.6) [OK]
 FFT+PS+SM(    64)  105.0 GFlops   99.4 GB/s  ulps(fft  1.5,ps 4270.2) [OK]
 FFT+PS+SM(   128)   94.0 GFlops   77.3 GB/s  ulps(fft  1.7,ps 4347.9) [OK]
 FFT+PS+SM(   256)  135.9 GFlops   98.9 GB/s  ulps(fft  1.7,ps 4261.8) [OK]
 FFT+PS+SM(   512)  167.9 GFlops  109.7 GB/s  ulps(fft  1.8,ps 4327.4) [OK]
 FFT+PS+SM(  1024)  198.4 GFlops  117.6 GB/s  ulps(fft  2.1,ps 4727.6) [OK]
 FFT+PS+SM(  2048)  209.1 GFlops  113.4 GB/s  ulps(fft  2.2,ps 4921.2) [OK]
 FFT+PS+SM(  4096)  209.9 GFlops  105.0 GB/s  ulps(fft  2.2,ps 4764.3) [OK]
 FFT+PS+SM(  8192)  204.8 GFlops   95.0 GB/s  ulps(fft  2.6,ps 5278.8) [OK]
 FFT+PS+SM( 16384)  205.0 GFlops   88.6 GB/s  ulps(fft  2.6,ps 5357.5) [OK]
 FFT+PS+SM( 32768)  187.5 GFlops   75.9 GB/s  ulps(fft  2.3,ps 4992.8) [OK]
 FFT+PS+SM( 65536)  195.2 GFlops   74.4 GB/s  ulps(fft  2.0,ps 4604.3) [OK]
 FFT+PS+SM(131072)  172.5 GFlops   62.0 GB/s  ulps(fft  2.7,ps 5392.8) [OK]
--- End quote ---
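To make the comparison between the two runs easier to read, here is a rough Python sketch that divides the 2-stream GFlops by the single-stream GFlops for each FFT size and reports the first size where two streams comes out slower. The numbers are copied from the two quotes above; the helper names (`stream_speedups`, `first_regression`) are made up for illustration and are not part of the test program.

```python
# GFlops figures copied from the two quoted runs above (FFT+PS+SM rows).
SIZES = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096,
         8192, 16384, 32768, 65536, 131072]
ONE_STREAM = [19.2, 36.8, 60.7, 86.2, 92.5, 135.0, 172.0, 214.7,
              225.9, 232.3, 226.0, 221.5, 213.1, 210.5, 202.6]
TWO_STREAMS = [26.7, 66.9, 90.9, 105.0, 94.0, 135.9, 167.9, 198.4,
               209.1, 209.9, 204.8, 205.0, 187.5, 195.2, 172.5]


def stream_speedups(sizes, single, dual):
    """Return {fft_size: dual_gflops / single_gflops} (hypothetical helper)."""
    return {n: d / s for n, s, d in zip(sizes, single, dual)}


def first_regression(speedups):
    """Smallest FFT size where two streams is slower than one, or None."""
    losers = [n for n, ratio in sorted(speedups.items()) if ratio < 1.0]
    return losers[0] if losers else None


speedups = stream_speedups(SIZES, ONE_STREAM, TWO_STREAMS)
crossover = first_regression(speedups)

if __name__ == "__main__":
    for n in SIZES:
        print(f"{n:>6}: {speedups[n]:.2f}x")
    print("two streams first loses at size", crossover)
```

On these particular figures the ratio stays above 1 up to size 256 and drops below 1 from size 512 onwards. This is a single run on one card, so treat the crossover as indicative only.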

Jason G:
Updated first Post:

--- Quote ---Update: PowerSpectrum Test #10 (attached)
- summary performance of FFT pipeline improvements against stock, for assessing overall progress
- can vary, so may need a few runs, just to check stability of result
- Please use DLLs provided with Test#9
--- End quote ---

arkayn:

--- Code: ---Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   67.27) Peak(  111.28) Min(    9.42) [OK]
   Memory thoughput GB/s   Avg(   36.72) Peak(   55.70) Min(   15.41)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   84.36, 1.25x) Peak(  131.47, 1.18x) Min(   31.13, 3.30x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   51.22, 1.39x) Peak(   66.16, 1.19x) Min(   34.18, 2.22x)

--- End code ---
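For anyone wanting to sanity-check the "x" factors in the output above, they look like plain Opt1/Stock ratios rounded to two decimals. A minimal Python sketch, assuming that is how they are computed (the test program's source is not shown here, and the dictionary layout and `ratios` helper are invented for illustration):

```python
# Figures copied from the GTX 460 test #10 output above.
STOCK = {
    "gflops": {"avg": 67.27, "peak": 111.28, "min": 9.42},
    "gbps": {"avg": 36.72, "peak": 55.70, "min": 15.41},
}
OPT1 = {
    "gflops": {"avg": 84.36, "peak": 131.47, "min": 31.13},
    "gbps": {"avg": 51.22, "peak": 66.16, "min": 34.18},
}


def ratios(stock, opt):
    """Opt/stock ratio per metric and statistic, rounded to 2 decimals.

    Assumption: the reported factors are simple quotients of this kind.
    """
    return {
        metric: {
            stat: round(opt[metric][stat] / stock[metric][stat], 2)
            for stat in stock[metric]
        }
        for metric in stock
    }


factors = ratios(STOCK, OPT1)

if __name__ == "__main__":
    for metric, stats in factors.items():
        print(metric, stats)
```

The recomputed factors agree with the ones printed in the log for this card, for both the compute and memory lines.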
