[Split] PowerSpectrum Unit Test

Jason G:

--- Quote from: SciManStev on 05 Dec 2010, 05:06:18 pm ---My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. ..
--- End quote ---

Whew! That's a relief. My host is only running dual-channel DDR2 memory (Corsair stuff, though), so I'm due for some upgrades on the host if it's limiting the 480. Will see if I can hold out 'til the Sandy Bridge release and get a decent CPU/RAM/mobo to drive it :-\.

Richard Haselgrove:
9800GT, Windows XP/32


--- Code: ---Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   12.1 GFlops   48.5 GB/s 1183.3ulps

 SumMax (    64)    1.1 GFlops    4.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.5 GFlops   14.2 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       12.1 GFlops   48.4 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         5.8 GFlops   23.4 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         7.0 GFlops   28.4 GB/s 121.7ulps
--- End code ---

glennaxl:
Win7 x64
*********
-device 0
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   26.5 GFlops  105.8 GB/s 1183.3ulps

 SumMax (    64)    2.2 GFlops    9.3 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.8 GFlops   27.3 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       26.7 GFlops  106.9 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        11.4 GFlops   46.1 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        15.5 GFlops   62.8 GB/s 121.7ulps

-device 1
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   26.1 GFlops  104.3 GB/s 1183.3ulps

 SumMax (    64)    2.2 GFlops    9.2 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.9 GFlops   28.0 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       26.4 GFlops  105.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        11.3 GFlops   45.9 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        15.4 GFlops   62.2 GB/s 121.7ulps

-device 2
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   25.5 GFlops  101.9 GB/s 1183.3ulps

 SumMax (    64)    2.1 GFlops    8.7 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.6 GFlops   26.7 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       25.9 GFlops  103.7 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        10.8 GFlops   43.5 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        14.4 GFlops   58.2 GB/s 121.7ulps

Jason G:
Aha, I wondered how the 200 series would respond (I haven't had a chance to test on the 260 in the other room yet). Looks like they appreciate the lifting of memory constraints as well. That means we'll probably all start going up in GFlops as we pack in more computation (chirps, FFTs, findspikes, etc.). This latest test appears to be capping out at host memory and PCIe bus speeds, so while it is faster, its ceiling is an artificial one imposed by the current code design and its communication costs (memory and bus bound), rather than by GPU compute performance.
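
A rough way to see the memory-bound behaviour in the numbers posted so far: divide the reported GFlops by the reported GB/s for the stock PwrSpec line, and compare Opt1 against the stock PS+SuMx figure. The sketch below does only that arithmetic on values copied from the outputs in this thread. The names `RESULTS`, `flops_per_byte`, `speedup` and `summarize` are made up for illustration and are not part of the unit test. The ratio also depends on however the test counts flops and bytes, which isn't shown here, so treat it as an indication rather than a measurement.

```python
# Figures copied from the unit test outputs posted in this thread.
# Columns: device, stock PwrSpec GFlops, stock PwrSpec GB/s,
#          stock PS+SuMx GFlops, Opt1 worst-case GFlops, Opt1 best-case GFlops
RESULTS = [
    ("GeForce 9800 GT",    12.1,  48.5, 3.5,  5.8,  7.0),
    ("GTX 295 (device 0)", 26.5, 105.8, 6.8, 11.4, 15.5),
    ("GTX 295 (device 1)", 26.1, 104.3, 6.9, 11.3, 15.4),
    ("GTX 260 (device 2)", 25.5, 101.9, 6.6, 10.8, 14.4),
    ("Quadro FX 570M",      4.5,  17.8, 0.9,  1.5,  1.6),
]


def flops_per_byte(gflops, gbytes_per_s):
    """Reported arithmetic intensity: flops done per byte of memory traffic."""
    return gflops / gbytes_per_s


def speedup(new_gflops, old_gflops):
    """How many times faster the new figure is than the old one."""
    return new_gflops / old_gflops


def summarize(results):
    """Return (device, flops/byte, worst-case speedup, best-case speedup) rows."""
    rows = []
    for name, ps_gf, ps_gb, stock_gf, worst_gf, best_gf in results:
        rows.append((
            name,
            round(flops_per_byte(ps_gf, ps_gb), 2),
            round(speedup(worst_gf, stock_gf), 2),
            round(speedup(best_gf, stock_gf), 2),
        ))
    return rows


if __name__ == "__main__":
    for name, intensity, worst, best in summarize(RESULTS):
        print(f"{name:20s} {intensity:.2f} flops/byte  "
              f"Opt1 vs stock PS+SuMx: {worst:.2f}x worst, {best:.2f}x best")
```

Every device comes out at roughly 0.25 flops per byte on the stock PwrSpec line, i.e. very little arithmetic per byte moved, which fits the picture of throughput following memory and bus speed rather than compute. Packing more computation into each pass over the data is what would raise that ratio.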

Miep:
And one small mobile GPU ;) :

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    4.5 GFlops   17.8 GB/s 1183.3ulps

 SumMax (    64)    0.2 GFlops    1.0 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    0.9 GFlops    3.4 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        4.5 GFlops   17.8 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.5 GFlops    5.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.6 GFlops    6.7 GB/s 121.7ulps
