
[Split] PowerSpectrum Unit Test


Miep:
It's not necessary to stop BOINC completely, but the GPU should at least be snoozed.
You can't test GPU computing or memory transfers while you are crunching with it.
Otherwise the test will report reduced values.

perryjay:
Okay, let's see how much of a difference this makes....
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test>powerspectrumtest9.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #9 (FFT pipeline)
                                Christmas 2010 edition.
Stock:
 FFT+PS+SM(     8)    3.0 GFlops    5.2 GB/s  ulps(fft  1.3,ps 4775.9) [OK]
 FFT+PS+SM(    16)    4.0 GFlops    5.5 GB/s  ulps(fft  1.6,ps 4817.4) [OK]
 FFT+PS+SM(    32)    4.4 GFlops    5.0 GB/s  ulps(fft  1.6,ps 4628.1) [OK]
 FFT+PS+SM(    64)    7.1 GFlops    6.7 GB/s  ulps(fft  1.6,ps 4557.6) [OK]
 FFT+PS+SM(   128)    9.8 GFlops    8.1 GB/s  ulps(fft  2.0,ps 4942.0) [OK]
 FFT+PS+SM(   256)   11.9 GFlops    8.6 GB/s  ulps(fft  2.0,ps 4967.8) [OK]
 FFT+PS+SM(   512)   15.0 GFlops    9.8 GB/s  ulps(fft  2.1,ps 5128.1) [OK]
 FFT+PS+SM(  1024)   16.2 GFlops    9.6 GB/s  ulps(fft  2.5,ps 5552.5) [OK]
 FFT+PS+SM(  2048)   17.5 GFlops    9.5 GB/s  ulps(fft  2.7,ps 5770.3) [OK]
 FFT+PS+SM(  4096)   13.4 GFlops    6.7 GB/s  ulps(fft  2.4,ps 5313.7) [OK]
 FFT+PS+SM(  8192)   14.2 GFlops    6.6 GB/s  ulps(fft  2.8,ps 5881.1) [OK]
 FFT+PS+SM( 16384)   13.7 GFlops    5.9 GB/s  ulps(fft  3.3,ps 6399.1) [OK]
 FFT+PS+SM( 32768)   12.1 GFlops    4.9 GB/s  ulps(fft  3.3,ps 6380.1) [OK]
 FFT+PS+SM( 65536)   13.0 GFlops    5.0 GB/s  ulps(fft  3.4,ps 6534.8) [OK]
 FFT+PS+SM(131072)   13.9 GFlops    5.0 GB/s  ulps(fft  3.6,ps 6694.2) [OK]


Opt1 (worst case): 64 thrds/block
 FFT+PS+SM(     8)    4.1 GFlops    7.3 GB/s  ulps(fft  1.3,ps 4637.5) [OK]
 FFT+PS+SM(    16)    5.7 GFlops    7.7 GB/s  ulps(fft  1.6,ps 4589.2) [OK]
 FFT+PS+SM(    32)    7.0 GFlops    7.8 GB/s  ulps(fft  1.6,ps 4535.6) [OK]
 FFT+PS+SM(    64)    9.2 GFlops    8.7 GB/s  ulps(fft  1.6,ps 4426.7) [OK]
 FFT+PS+SM(   128)   10.5 GFlops    8.6 GB/s  ulps(fft  2.0,ps 4818.1) [OK]
 FFT+PS+SM(   256)   12.7 GFlops    9.2 GB/s  ulps(fft  2.0,ps 4831.0) [OK]
 FFT+PS+SM(   512)   16.0 GFlops   10.5 GB/s  ulps(fft  2.1,ps 4987.2) [OK]
 FFT+PS+SM(  1024)   17.3 GFlops   10.2 GB/s  ulps(fft  2.5,ps 5438.0) [OK]
 FFT+PS+SM(  2048)   18.5 GFlops   10.0 GB/s  ulps(fft  2.7,ps 5674.7) [OK]
 FFT+PS+SM(  4096)   13.7 GFlops    6.9 GB/s  ulps(fft  2.4,ps 5202.4) [OK]
 FFT+PS+SM(  8192)   14.9 GFlops    6.9 GB/s  ulps(fft  2.8,ps 5765.4) [OK]
 FFT+PS+SM( 16384)   15.4 GFlops    6.6 GB/s  ulps(fft  3.3,ps 6291.8) [OK]
 FFT+PS+SM( 32768)   13.1 GFlops    5.3 GB/s  ulps(fft  3.3,ps 6275.5) [OK]
 FFT+PS+SM( 65536)   13.8 GFlops    5.3 GB/s  ulps(fft  3.4,ps 6429.1) [OK]
 FFT+PS+SM(131072)   14.5 GFlops    5.2 GB/s  ulps(fft  3.6,ps 6590.4) [OK]



C:\test>
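
For a quick read of the run above, the per-size ratio of Opt1 to Stock can be worked out directly from the posted GFlops columns. The following is only a small sketch: the numbers are copied from the output above, and the variable names (`stock`, `opt1`, `speedups`) are made up for illustration and are not part of the test program.

```python
# GFlops figures copied from the posted run (GPU snoozed), one entry per FFT size.
sizes = [2 ** k for k in range(3, 18)]  # 8, 16, ..., 131072
stock = [3.0, 4.0, 4.4, 7.1, 9.8, 11.9, 15.0, 16.2, 17.5,
         13.4, 14.2, 13.7, 12.1, 13.0, 13.9]
opt1 = [4.1, 5.7, 7.0, 9.2, 10.5, 12.7, 16.0, 17.3, 18.5,
        13.7, 14.9, 15.4, 13.1, 13.8, 14.5]

# Ratio of Opt1 throughput to Stock throughput for each FFT size.
speedups = {n: o / s for n, s, o in zip(sizes, stock, opt1)}

# Sizes where the optimised path helps most and least.
best_size = max(speedups, key=speedups.get)
worst_size = min(speedups, key=speedups.get)

for n in sizes:
    print(f"FFT size {n:>6}: {speedups[n]:.2f}x")
print(f"largest ratio at size {best_size}, smallest at size {worst_size}")
```

On these figures Opt1 is ahead at every size, from roughly 1.6x at size 32 down to roughly 1.02x at size 4096.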

Jason G:
Aha! That explains the inflated speedup on the previous run :). In essence, some of the optimisations I'm trying out (namely, asynchronous transfers) should be less susceptible to slowdowns under load than the stock code (synchronous transfers).

I wasn't looking to test/refine that aspect yet, but you managed to prove it already works... Thanks!  ;D

(Overlapped execution/transfers on pre-Fermi, and concurrent kernels on Fermi, are next ...)
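
As a rough illustration of why overlapped transfers can hide a slowdown that synchronous transfers expose, here is a toy timing model. It is only a sketch under loud assumptions: the function names and the millisecond figures are invented, only one copy direction is modelled, and it says nothing about how the actual test code is written.

```python
def synchronous_time(chunks, copy_ms, kernel_ms):
    """Each chunk is copied and then processed; nothing overlaps."""
    return chunks * (copy_ms + kernel_ms)


def overlapped_time(chunks, copy_ms, kernel_ms):
    """Two-stage pipeline: the copy of chunk i+1 runs during the kernel of chunk i.

    The first copy and the last kernel cannot be hidden; in between, each
    step costs whichever of the two stages is slower.
    """
    return copy_ms + (chunks - 1) * max(copy_ms, kernel_ms) + kernel_ms


# Hypothetical figures: 8 chunks, 3 ms kernel, copy taking 2 ms when idle
# and 3 ms when something else is competing for the transfer.
for label, copy_ms in (("idle", 2), ("under load", 3)):
    sync = synchronous_time(8, copy_ms, 3)
    overlap = overlapped_time(8, copy_ms, 3)
    print(f"{label:>10}: synchronous {sync} ms, overlapped {overlap} ms")
```

In this made-up case the synchronous total grows from 40 ms to 48 ms when the copy slows down, while the overlapped total only moves from 26 ms to 27 ms, because most of the extra copy time is hidden behind the kernels.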

perryjay:
Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.

Jason G:

--- Quote from: perryjay on 25 Dec 2010, 03:39:09 pm ---Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.

--- End quote ---
Not at all messed up, it just had me wondering how the 9500 GT was managing to get 3x throughput at some sizes, and now we know it was under load ;). That unexpected benefit does indeed give me some more things to consider for the next stage, and it looks like we might be able to push a bit harder than I thought.
