[Split] PowerSpectrum Unit Test
Miep:
It's not necessary to stop BOINC completely, but the GPU should at least be snoozed.
You can't test GPU computing or memory transfers while you are crunching with it.
Otherwise the test will report reduced values.
perryjay:
Okay, let's see how much of a difference this makes....
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\perry>cd\test
C:\test>powerspectrumtest9.exe
Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #9 (FFT pipeline)
Christmas 2010 edition.
Stock:
FFT+PS+SM( 8) 3.0 GFlops 5.2 GB/s ulps(fft 1.3,ps 4775.9) [OK]
FFT+PS+SM( 16) 4.0 GFlops 5.5 GB/s ulps(fft 1.6,ps 4817.4) [OK]
FFT+PS+SM( 32) 4.4 GFlops 5.0 GB/s ulps(fft 1.6,ps 4628.1) [OK]
FFT+PS+SM( 64) 7.1 GFlops 6.7 GB/s ulps(fft 1.6,ps 4557.6) [OK]
FFT+PS+SM( 128) 9.8 GFlops 8.1 GB/s ulps(fft 2.0,ps 4942.0) [OK]
FFT+PS+SM( 256) 11.9 GFlops 8.6 GB/s ulps(fft 2.0,ps 4967.8) [OK]
FFT+PS+SM( 512) 15.0 GFlops 9.8 GB/s ulps(fft 2.1,ps 5128.1) [OK]
FFT+PS+SM( 1024) 16.2 GFlops 9.6 GB/s ulps(fft 2.5,ps 5552.5) [OK]
FFT+PS+SM( 2048) 17.5 GFlops 9.5 GB/s ulps(fft 2.7,ps 5770.3) [OK]
FFT+PS+SM( 4096) 13.4 GFlops 6.7 GB/s ulps(fft 2.4,ps 5313.7) [OK]
FFT+PS+SM( 8192) 14.2 GFlops 6.6 GB/s ulps(fft 2.8,ps 5881.1) [OK]
FFT+PS+SM( 16384) 13.7 GFlops 5.9 GB/s ulps(fft 3.3,ps 6399.1) [OK]
FFT+PS+SM( 32768) 12.1 GFlops 4.9 GB/s ulps(fft 3.3,ps 6380.1) [OK]
FFT+PS+SM( 65536) 13.0 GFlops 5.0 GB/s ulps(fft 3.4,ps 6534.8) [OK]
FFT+PS+SM(131072) 13.9 GFlops 5.0 GB/s ulps(fft 3.6,ps 6694.2) [OK]
Opt1 (worst case): 64 thrds/block
FFT+PS+SM( 8) 4.1 GFlops 7.3 GB/s ulps(fft 1.3,ps 4637.5) [OK]
FFT+PS+SM( 16) 5.7 GFlops 7.7 GB/s ulps(fft 1.6,ps 4589.2) [OK]
FFT+PS+SM( 32) 7.0 GFlops 7.8 GB/s ulps(fft 1.6,ps 4535.6) [OK]
FFT+PS+SM( 64) 9.2 GFlops 8.7 GB/s ulps(fft 1.6,ps 4426.7) [OK]
FFT+PS+SM( 128) 10.5 GFlops 8.6 GB/s ulps(fft 2.0,ps 4818.1) [OK]
FFT+PS+SM( 256) 12.7 GFlops 9.2 GB/s ulps(fft 2.0,ps 4831.0) [OK]
FFT+PS+SM( 512) 16.0 GFlops 10.5 GB/s ulps(fft 2.1,ps 4987.2) [OK]
FFT+PS+SM( 1024) 17.3 GFlops 10.2 GB/s ulps(fft 2.5,ps 5438.0) [OK]
FFT+PS+SM( 2048) 18.5 GFlops 10.0 GB/s ulps(fft 2.7,ps 5674.7) [OK]
FFT+PS+SM( 4096) 13.7 GFlops 6.9 GB/s ulps(fft 2.4,ps 5202.4) [OK]
FFT+PS+SM( 8192) 14.9 GFlops 6.9 GB/s ulps(fft 2.8,ps 5765.4) [OK]
FFT+PS+SM( 16384) 15.4 GFlops 6.6 GB/s ulps(fft 3.3,ps 6291.8) [OK]
FFT+PS+SM( 32768) 13.1 GFlops 5.3 GB/s ulps(fft 3.3,ps 6275.5) [OK]
FFT+PS+SM( 65536) 13.8 GFlops 5.3 GB/s ulps(fft 3.4,ps 6429.1) [OK]
FFT+PS+SM(131072) 14.5 GFlops 5.2 GB/s ulps(fft 3.6,ps 6590.4) [OK]
C:\test>
Jason G:
Aha! That explains the inflated speedup on the previous run :). In essence, (some of) the optimisations I'm trying out (namely, asynchronous transfers) should be less susceptible to slowdowns under load than the stock code (synchronous transfers)....
I wasn't looking to test/refine that aspect yet, but you managed to prove it already works... Thanks! ;D
(Overlapped execution/transfers on pre-Fermi, and concurrent kernels on Fermi, are next....)
perryjay:
Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.
Jason G:
--- Quote from: perryjay on 25 Dec 2010, 03:39:09 pm ---Sorry bout that... hope I didn't mess you up too much. Glad it gave you some extra to think about.
--- End quote ---
Not at all messed up, it just had me wondering how the 9500 GT was managing to get 3x throughput at some sizes, and now we know it was under load ;). That unexpected benefit does indeed give me some more things to consider for the next stage, and it looks like we might be able to push a bit harder than I thought.
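For comparison with that 3x figure, here is a rough sketch that works out the Opt1-over-Stock ratio at each FFT size for the snoozed run posted above. The GFlops values are copied straight from that output. The variable and function names are made up for illustration and are not part of the test program. On these numbers the largest gain comes out at roughly 1.6x (size 32) and most sizes land under 1.1x, so with the GPU idle the speedup is much more modest.

```python
# GFlops figures copied from the snoozed-GPU run above (GeForce 9500 GT).
# Names here are illustrative only; they do not come from the unit test.
sizes = [2 ** k for k in range(3, 18)]  # FFT sizes 8 .. 131072

stock_gflops = [3.0, 4.0, 4.4, 7.1, 9.8, 11.9, 15.0, 16.2,
                17.5, 13.4, 14.2, 13.7, 12.1, 13.0, 13.9]
opt1_gflops = [4.1, 5.7, 7.0, 9.2, 10.5, 12.7, 16.0, 17.3,
               18.5, 13.7, 14.9, 15.4, 13.1, 13.8, 14.5]


def speedups(baseline, optimised):
    """Return the optimised/baseline throughput ratio for each size."""
    return [o / b for b, o in zip(baseline, optimised)]


ratios = speedups(stock_gflops, opt1_gflops)

# Pick out the sizes with the largest and smallest gain.
best_size, best_ratio = max(zip(sizes, ratios), key=lambda p: p[1])
worst_size, worst_ratio = min(zip(sizes, ratios), key=lambda p: p[1])

for size, ratio in zip(sizes, ratios):
    print(f"FFT size {size:>6}: {ratio:.2f}x")

print(f"largest gain:  {best_ratio:.2f}x at size {best_size}")
print(f"smallest gain: {worst_ratio:.2f}x at size {worst_size}")
```

The same comparison could be run on the loaded-GPU output from the previous run to see where the 3x came from, but those numbers are not in this part of the thread.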