Topic: [Split] PowerSpectrum Unit Test  (Read 162825 times)

_heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #165 on: 05 Dec 2010, 11:31:33 am »
Vista64
~~~~
Stopping Boinc...
PowerSpectrumTest6.exe -device 0

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   20.4 GFlops   81.6 GB/s   0.0ulps

 SumMax (    64)    1.4 GFlops    6.0 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.6 GFlops   18.7 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       30.0 GFlops  119.9 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
         7.1 GFlops   28.8 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        11.1 GFlops   45.1 GB/s 121.7ulps


PowerSpectrumTest6.exe -device 1

Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   20.4 GFlops   81.8 GB/s   0.0ulps

 SumMax (    64)    1.4 GFlops    5.9 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.6 GFlops   18.5 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       30.1 GFlops  120.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
         7.3 GFlops   29.7 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        11.2 GFlops   45.2 GB/s 121.7ulps


.
Done
Restarting Boinc...

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #166 on: 05 Dec 2010, 12:40:39 pm »
Thanks Richard, perryjay & Heinz.

All fit with the models so far.

The compute capability 1.1 devices (Richard's & perryjay's) are, IMO, doing their memory-bound best with the powerspectrum, roughly matching the stock 'PwrSpec' speed there, then 'magically' lifting with the reductions (summax) in the Opt1 worst case. I believe that must be purely a result of hiding the memory transfers, since the compute density of the reduction hasn't changed from O(log n).
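For readers following along, here is a minimal CPU reference sketch of what the two stages appear to compute. It assumes, from the test's "Every ifft average & peak OK" check, that summax reduces each fftlen-sized block of powers to an average and a peak. The names `power_spectrum`, `tree_sum_max` and `summax` are hypothetical, not the unit test's actual kernels. The pairwise loop mimics a shared-memory tree reduction, which is where the O(log n) step count comes from: the work per element stays tiny, so the reduction's compute density doesn't change.

```python
def power_spectrum(freq_data):
    """Power of each complex FFT bin: re*re + im*im (2 multiplies + 1 add)."""
    return [z.real * z.real + z.imag * z.imag for z in freq_data]


def tree_sum_max(block):
    """Pairwise (tree) reduction of one block of powers.

    Each pass halves the number of partial results, the way a GPU
    shared-memory reduction does, so a power-of-two block of length n
    needs log2(n) passes. Returns (total, peak, peak_index, passes).
    """
    n = len(block)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    sums = list(block)
    peaks = [(value, index) for index, value in enumerate(block)]
    passes = 0
    while len(sums) > 1:
        half = len(sums) // 2
        sums = [sums[i] + sums[i + half] for i in range(half)]
        # Compare on the power value only; on ties keep the first operand.
        peaks = [max(peaks[i], peaks[i + half], key=lambda p: p[0])
                 for i in range(half)]
        passes += 1
    return sums[0], peaks[0][0], peaks[0][1], passes


def summax(powers, fftlen):
    """Split the spectrum into fftlen-sized blocks (one per ifft) and return
    (average, peak, peak_index) per block. Assumes len(powers) is a
    multiple of fftlen and fftlen is a power of two."""
    results = []
    for start in range(0, len(powers), fftlen):
        total, peak, idx, _ = tree_sum_max(powers[start:start + fftlen])
        results.append((total / fftlen, peak, start + idx))
    return results


if __name__ == "__main__":
    spectrum = power_spectrum([complex(i % 5, 1) for i in range(128)])
    print(summax(spectrum, 64))           # two blocks of fftlen 64
    print(tree_sum_max([1.0] * 64)[3])    # 6 passes, i.e. log2(64)
```

On a GPU each pass is one step of a block-wide loop with a barrier between passes; the sketch only reproduces the arithmetic and the pass count, not the memory behaviour the thread is measuring.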

@Heinz, glad to see your numbers back up to where they should be.  I reckon that's scaling well against my OC'd 480:
Stock (PS+Summax):  5.9 GFlops, 23.7 GB/s
worst (opt1):      10.0 GFlops, 40.4 GB/s
best  (opt1):      16.0 GFlops, 64.8 GB/s


SciManStev

  • Alpha Tester
  • Knight Templar
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #167 on: 05 Dec 2010, 02:43:36 pm »

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   28.1 GFlops  112.5 GB/s   0.0ulps

 SumMax (    64)    2.3 GFlops    9.6 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    7.2 GFlops   29.2 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       41.4 GFlops  165.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
        12.7 GFlops   51.5 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        16.1 GFlops   65.3 GB/s 121.7ulps


Steve

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #168 on: 05 Dec 2010, 04:14:37 pm »
Ouch! 27% more throughput than mine on the worst-case Opt1 (12.7 vs 10 GFlops) ;D, despite the slower powerspectrum (memory). That can't be the core (same 'best' case @16.1)... PCIe bus overclocked? (Ahh, faster host memory too, I suppose.)
« Last Edit: 05 Dec 2010, 04:23:04 pm by Jason G »
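The comparison in reply #168 is simple throughput arithmetic. A small sketch using only the Opt1 GFlops figures posted in this thread for the two GTX 480s; `relative_gain` and `extra_time_fraction` are hypothetical helpers. Treating the gap between the worst and best cases as time spent on the full summax copy is an assumption, but it is the reasoning behind "can't be core": the best cases match, so the difference has to sit in the part that only the worst case pays for.

```python
def relative_gain(mine, theirs):
    """Fractional throughput advantage of `theirs` over `mine` (0.27 means 27%)."""
    return theirs / mine - 1.0


def extra_time_fraction(worst_gflops, best_gflops):
    """Share of worst-case time not present in the best case.

    Time per unit of work scales as 1 / GFlops, so this is 1 - worst/best.
    Attributing that gap to the full summax copy is an assumption.
    """
    return 1.0 - worst_gflops / best_gflops


# Opt1 GFlops as posted in this thread (both GTX 480s).
jason_480 = {"worst": 10.0, "best": 16.0}
steve_480 = {"worst": 12.7, "best": 16.1}

for case in ("worst", "best"):
    gain = relative_gain(jason_480[case], steve_480[case])
    print(f"{case:>5} case: Steve vs Jason {gain:+.1%}")

for name, host in (("Jason", jason_480), ("Steve", steve_480)):
    share = extra_time_fraction(host["worst"], host["best"])
    print(f"{name}: {share:.0%} of worst-case time lies beyond the best case")
```

The worst case comes out 27% ahead and the best case under 1% ahead; the worst-to-best gap is 37.5% of Jason's worst-case time versus about 21% of Steve's, which fits a faster host/PCIe path on Steve's machine.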

SciManStev

  • Alpha Tester
  • Knight Templar
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #169 on: 05 Dec 2010, 05:06:18 pm »
My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. I adjusted my GPU RAM to 1900 MHz; there is still room for more. I am on my last GPU WU for Einstein, and there aren't any available at the moment. Piggy hit the #5 spot for the top rigs at Einstein with a RAC of over 14,000. There is nothing slow about Piggy. It does a fantastic job running Starry Night Pro Plus astronomy software. I can't wait to get back to SETI crunching!

Steve

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #170 on: 05 Dec 2010, 05:13:01 pm »
Quote from: SciManStev
    My CPU memory is at 1774 MHz. My PCIe bus is slightly overclocked. ..

Whew! That's a relief. My host is only running dual-channel DDR2 memory (Corsair stuff, though), so I'm due for some host upgrades if it's limiting the 480. I'll see if I can hold out till the Sandy Bridge release & get a decent CPU/RAM/mobo to drive it :-\.

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: [Split] PowerSpectrum Unit Test
« Reply #171 on: 05 Dec 2010, 06:25:39 pm »
9800GT, Windows XP/32

Device: GeForce 9800 GT, 1500 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   12.1 GFlops   48.5 GB/s 1183.3ulps

 SumMax (    64)    1.1 GFlops    4.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.5 GFlops   14.2 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       12.1 GFlops   48.4 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         5.8 GFlops   23.4 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         7.0 GFlops   28.4 GB/s 121.7ulps

glennaxl

  • Knight o' The Realm
  • Posts: 86
Re: [Split] PowerSpectrum Unit Test
« Reply #172 on: 05 Dec 2010, 09:16:32 pm »
Win7 x64
*********
-device 0
Device: GeForce GTX 295, 1476 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   26.5 GFlops  105.8 GB/s 1183.3ulps

 SumMax (    64)    2.2 GFlops    9.3 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.8 GFlops   27.3 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       26.7 GFlops  106.9 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        11.4 GFlops   46.1 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        15.5 GFlops   62.8 GB/s 121.7ulps

-device 1
Device: GeForce GTX 295, 1476 MHz clock, 873 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   26.1 GFlops  104.3 GB/s 1183.3ulps

 SumMax (    64)    2.2 GFlops    9.2 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.9 GFlops   28.0 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       26.4 GFlops  105.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        11.3 GFlops   45.9 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        15.4 GFlops   62.2 GB/s 121.7ulps

-device 2
Device: GeForce GTX 260, 1487 MHz clock, 874 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   25.5 GFlops  101.9 GB/s 1183.3ulps

 SumMax (    64)    2.1 GFlops    8.7 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.6 GFlops   26.7 GB/s


GetPowerSpectrum() choice for Opt1: 128 thrds/block
    128 threads:       25.9 GFlops  103.7 GB/s 121.7ulps


Opt1 (PSmod3+SM): 128 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  128 threads, fftlen 64: (worst case: full summax copy)
        10.8 GFlops   43.5 GB/s 121.7ulps
Every ifft average & peak OK
  128 threads, fftlen 64: (best case, nothing to update)
        14.4 GFlops   58.2 GB/s 121.7ulps

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #173 on: 05 Dec 2010, 10:04:13 pm »
Ahah, I wondered how the 200 series would respond (I haven't had a chance to test on the 260 in the other room yet). Looks like they appreciate the lifting of memory constraints as well. That means we'll probably all start going up in GFlops as we pack in more computation (chirps, FFTs, findspikes, etc.). This latest test appears to be capping out at host memory & PCIe bus speeds, so while faster, it has an artificial ceiling imposed by the current code designs & their communication costs (memory & bus bound), rather than by GPU compute performance.
« Last Edit: 05 Dec 2010, 10:10:30 pm by Jason G »
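The "artificial ceiling" in reply #173 is essentially a roofline bound: attainable throughput is the lesser of the device's compute peak and arithmetic intensity times the bandwidth of whatever link is feeding the kernel. A minimal sketch, assuming the test counts 3 flops (re*re + im*im) per 12 bytes moved (an 8-byte complex float in, a 4-byte float out). That accounting is an assumption, but it is consistent with the GB/s column being about 4x the GFlops column throughout this thread. The peak value below is a placeholder, not a card spec.

```python
FLOPS_PER_BIN = 3        # re*re + im*im: two multiplies and one add (assumed count)
BYTES_PER_BIN = 8 + 4    # complex float input + float power output (assumed traffic)
INTENSITY = FLOPS_PER_BIN / BYTES_PER_BIN  # 0.25 flop/byte


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity=INTENSITY):
    """Roofline bound: limited either by compute or by bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def implied_bytes_per_flop(gflops, gbs):
    """Ratio of reported GB/s to GFlops; ~4 if the 12-byte/3-flop accounting holds."""
    return gbs / gflops


# Reported (GFlops, GB/s) pairs copied from this thread.
reported = {
    "GTX 470 stock PwrSpec": (20.4, 81.6),
    "GTX 480 Opt1 PS": (41.4, 165.6),
    "9800 GT stock PwrSpec": (12.1, 48.5),
}
for name, (gflops, gbs) in reported.items():
    print(f"{name}: {implied_bytes_per_flop(gflops, gbs):.2f} bytes/flop")

# Placeholder peak set well above the measured figures, so the
# bandwidth term is the one that binds at 0.25 flop/byte.
HYPOTHETICAL_PEAK_GFLOPS = 1000.0
print(attainable_gflops(HYPOTHETICAL_PEAK_GFLOPS, 120.0))  # 120 GB/s * 0.25
```

Substituting host-memory or PCIe bandwidth for device bandwidth gives the lower ceiling described above; raising the intensity (more flops per byte moved, e.g. by packing more of the pipeline into the same pass) is the way past it.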

Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #174 on: 06 Dec 2010, 05:36:57 am »
and one small mobile GPU ;) :

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    4.5 GFlops   17.8 GB/s 1183.3ulps

 SumMax (    64)    0.2 GFlops    1.0 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    0.9 GFlops    3.4 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        4.5 GFlops   17.8 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.5 GFlops    5.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.6 GFlops    6.7 GB/s 121.7ulps

The road to hell is paved with good intentions

Vyper

  • Alpha Tester
  • Knight Templar
  • Posts: 376
Re: [Split] PowerSpectrum Unit Test
« Reply #175 on: 06 Dec 2010, 07:38:30 am »
Well, here is one of my slightly overclocked GTX 460s.

Running Win7 x64 & the 260.99 driver.

Kind regards Vyper

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #176 on: 06 Dec 2010, 08:09:17 am »
Quote from: Miep
    and one small mobile GPU ;) :
The worst-case reduction is faster while the powerspectrum stays the same speed, great ;D

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #177 on: 06 Dec 2010, 08:11:29 am »
Quote from: Vyper
    Well, here is one of my slightly overclocked GTX 460s.

Thanks! We Fermi users are going to need more computation packed in there to bring those GFlops up.

Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #178 on: 06 Dec 2010, 09:30:38 am »
OK, a bit of statistics then: average +- std dev over 15 runs.

Device: Quadro FX 570M, 950 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>    4.4 GFlops   17.5 GB/s 1183.3ulps

 SumMax (    64)    0.3 GFlops    1.1 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    0.82 +- 0.086 GFlops    3.5 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:        4.37 +- 0.046 GFlops   17.5 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.37 +- 0.149 GFlops    6.0 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.61 +- 0.026 GFlops    6.6 GB/s 121.7ulps


Now if only the pink were better distinguishable from the white ::)
Would you like that for the GB/s as well?

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #179 on: 06 Dec 2010, 09:57:58 am »
Quote from: Miep
    Now if only the pink were better distinguishable from the white ::)
    Would you like that for the GB/s as well?

Thanks for the tolerances. Since the test is largely memory bound, the GFlops tolerances are more than enough; they indicate about +/- 10% variation on the worst case. I presume that GPU is driving a display, so that's reasonable.
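The "+/- 10%" reading comes from dividing the standard deviation by the mean (the coefficient of variation). A sketch using Python's `statistics` module; the raw 15-run samples weren't posted, so `relative_spread` is applied to the reported mean +- std pairs, and the hypothetical `summarise` helper shows how they would be computed from a list of per-run GFlops values. `statistics.stdev` is the sample (n - 1) standard deviation; whether Miep used the sample or population form isn't stated.

```python
import statistics


def summarise(samples):
    """Mean and sample standard deviation of repeated benchmark runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def relative_spread(mean, std):
    """Coefficient of variation: standard deviation as a fraction of the mean."""
    return std / mean


# mean +- std (GFlops) as reported for the Quadro FX 570M over 15 runs.
reported = {
    "PS+SuMx": (0.82, 0.086),
    "GetPowerSpectrum": (4.37, 0.046),
    "Opt1 worst": (1.37, 0.149),
    "Opt1 best": (1.61, 0.026),
}
for name, (mean, std) in reported.items():
    print(f"{name:>16}: +-{relative_spread(mean, std):.1%}")
```

Stock PS+SuMx and the Opt1 worst case both land near +-10 to 11%, while GetPowerSpectrum and the Opt1 best case stay under 2%.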
