[Split] PowerSpectrum Unit Test
Jason G:
--- Quote from: SciManStev on 05 Dec 2010, 05:06:18 pm ---My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. ..
--- End quote ---
Whew! That's a relief. My host is only running dual-channel DDR2 memory (Corsair stuff, though), so I'm due for some upgrades on the host if it's limiting the 480. Will see if I can hold out until the Sandy Bridge release and get a decent CPU/RAM/mobo to drive it :-\.
Richard Haselgrove:
9800GT, Windows XP/32
--- Code: ---Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 12.1 GFlops 48.5 GB/s 1183.3ulps
SumMax ( 64) 1.1 GFlops 4.8 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 3.5 GFlops 14.2 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 12.1 GFlops 48.4 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
64 threads, fftlen 64: (worst case: full summax copy)
5.8 GFlops 23.4 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
7.0 GFlops 28.4 GB/s 121.7ulps
--- End code ---
glennaxl:
Win7 x64
*********
-device 0
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 26.5 GFlops 105.8 GB/s 1183.3ulps
SumMax ( 64) 2.2 GFlops 9.3 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 6.8 GFlops 27.3 GB/s
GetPowerSpectrum() choice for Opt1: 128 thrds/block
128 threads: 26.7 GFlops 106.9 GB/s 121.7ulps
Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
128 threads, fftlen 64: (worst case: full summax copy)
11.4 GFlops 46.1 GB/s 121.7ulps
Every ifft average & peak OK
128 threads, fftlen 64: (best case, nothing to update)
15.5 GFlops 62.8 GB/s 121.7ulps
-device 1
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 26.1 GFlops 104.3 GB/s 1183.3ulps
SumMax ( 64) 2.2 GFlops 9.2 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 6.9 GFlops 28.0 GB/s
GetPowerSpectrum() choice for Opt1: 128 thrds/block
128 threads: 26.4 GFlops 105.5 GB/s 121.7ulps
Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
128 threads, fftlen 64: (worst case: full summax copy)
11.3 GFlops 45.9 GB/s 121.7ulps
Every ifft average & peak OK
128 threads, fftlen 64: (best case, nothing to update)
15.4 GFlops 62.2 GB/s 121.7ulps
-device 2
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 25.5 GFlops 101.9 GB/s 1183.3ulps
SumMax ( 64) 2.1 GFlops 8.7 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 6.6 GFlops 26.7 GB/s
GetPowerSpectrum() choice for Opt1: 128 thrds/block
128 threads: 25.9 GFlops 103.7 GB/s 121.7ulps
Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
128 threads, fftlen 64: (worst case: full summax copy)
10.8 GFlops 43.5 GB/s 121.7ulps
Every ifft average & peak OK
128 threads, fftlen 64: (best case, nothing to update)
14.4 GFlops 58.2 GB/s 121.7ulps
Jason G:
Ahah, I wondered how the 200 series would respond (haven't had a chance to test on the 260 in the other room yet). Looks like they appreciate the lifting of memory constraints as well. That means we'll probably all start going up in GFlops as we pack in more computation (chirps, FFTs, findspikes, etc.). This latest test appears to be capping out at host memory and PCIe bus speeds, so while it is faster, its ceiling is an artificial one imposed by the current code design and its communication costs (memory and bus bound) rather than by GPU compute performance.
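A quick back-of-envelope check of the "memory and bus bound" point, using only the figures already posted in this thread. The sketch below divides the reported GFlops by the reported GB/s for the stock PwrSpec kernel, and compares the Opt1 worst/best cases against the stock PS+SuMx line. The helper names (`flops_per_byte`, `speedup`) are made up for illustration and are not part of the unit test. The ratio comes out at roughly 0.25 flops per byte on every card, which would fit a kernel that reads one complex float pair (8 bytes), writes one float (4 bytes) and does 3 floating-point operations per point. That accounting is an assumption on my part and is not stated in the test output.

```python
# Figures copied from the unit test #6 output posted earlier in this thread.
# Tuple layout (all hypothetical naming, chosen for this sketch):
#   (stock PwrSpec GFlops, stock PwrSpec GB/s,
#    stock PS+SuMx GFlops, Opt1 worst-case GFlops, Opt1 best-case GFlops)
results = {
    "9800 GT":         (12.1,  48.5, 3.5,  5.8,  7.0),
    "GTX 295 (dev 0)": (26.5, 105.8, 6.8, 11.4, 15.5),
    "GTX 295 (dev 1)": (26.1, 104.3, 6.9, 11.3, 15.4),
    "GTX 260":         (25.5, 101.9, 6.6, 10.8, 14.4),
}


def flops_per_byte(gflops, gbytes_per_s):
    """Arithmetic intensity implied by the reported throughput figures."""
    return gflops / gbytes_per_s


def speedup(stock_gflops, opt_gflops):
    """How many times faster the optimised path is than the stock path."""
    return opt_gflops / stock_gflops


for name, (ps_gf, ps_gb, stock_gf, worst_gf, best_gf) in results.items():
    intensity = flops_per_byte(ps_gf, ps_gb)
    lo = speedup(stock_gf, worst_gf)
    hi = speedup(stock_gf, best_gf)
    print(f"{name}: {intensity:.2f} flops/byte, Opt1 speedup {lo:.2f}x to {hi:.2f}x")
```

With so few flops per byte moved, throughput tracks memory bandwidth rather than shader speed, which is why packing more computation into each pass should raise the GFlops figure.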
Miep:
And one small mobile GPU ;) :
Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
PwrSpec< 64> 4.5 GFlops 17.8 GB/s 1183.3ulps
SumMax ( 64) 0.2 GFlops 1.0 GB/s
Every ifft average & peak OK
PS+SuMx( 64) 0.9 GFlops 3.4 GB/s
GetPowerSpectrum() choice for Opt1: 64 thrds/block
64 threads: 4.5 GFlops 17.8 GB/s 121.7ulps
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
64 threads, fftlen 64: (worst case: full summax copy)
1.5 GFlops 5.9 GB/s 121.7ulps
Every ifft average & peak OK
64 threads, fftlen 64: (best case, nothing to update)
1.6 GFlops 6.7 GB/s 121.7ulps