Author Topic: [Split] PowerSpectrum Unit Test  (Read 138428 times)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #285 on: 26 Dec 2010, 12:39:51 pm »
Cheers,
  BTW: Avg roughly represents the overall improvement, Peak represents the speed change in the fastest kernels, and Min is the speed change in the slowest kernels. So I regard 'Avg' & 'Min' as most important, with Peak being mostly just a possible indicator of remaining headroom.

[Edit:] Similarish looking deal with the 480
Code:
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(  104.34) Peak(  157.97) Min(   12.79) [OK]
   Memory thoughput GB/s   Avg(   57.34) Peak(   82.25) Min(   22.55)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  162.15, 1.55x) Peak(  232.02, 1.47x) Min(   26.47, 2.07x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   95.38, 1.66x) Peak(  127.32, 1.55x) Min(   46.67, 2.07x)
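
For anyone wanting to check the 'x' figures: they appear to be simply the Opt1 value divided by the matching Stock value. A minimal sketch, assuming that is how the test derives them (the `speedup` helper is made up for illustration and is not part of the test program), using the GTX 480 compute numbers above:

```python
def speedup(stock: float, opt: float) -> float:
    """Ratio of optimised throughput to stock throughput."""
    return opt / stock


# GTX 480 compute throughput (GFlops), copied from the output above.
stock = {"Avg": 104.34, "Peak": 157.97, "Min": 12.79}
opt1 = {"Avg": 162.15, "Peak": 232.02, "Min": 26.47}

for key in ("Avg", "Peak", "Min"):
    # Prints the ratio in the same "1.55x" style as the test output.
    print(f"{key}: {speedup(stock[key], opt1[key]):.2f}x")
```

That reproduces the 1.55x / 1.47x / 2.07x shown in the Opt1 compute line.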
« Last Edit: 26 Dec 2010, 12:43:36 pm by Jason G »

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #286 on: 26 Dec 2010, 01:03:59 pm »
Hi Jason,
new results from Test10
~~~~~~~~~~~~~~~
PowerSpectrumTest10.exe -device 0

Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   82.93) Peak(  130.76) Min(   12.00) [OK]
   Memory thoughput GB/s   Avg(   46.20) Peak(   64.10) Min(   21.16)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  125.13, 1.51x) Peak(  178.98, 1.37x) Min(   37.50, 3.12x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   75.48, 1.63x) Peak(   95.64, 1.49x) Min(   52.23, 2.47x)


PowerSpectrumTest10.exe -device 1

Device: GeForce GTX 470, 810 MHz clock, 1249 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   80.74) Peak(  126.77) Min(   11.69) [OK]
   Memory thoughput GB/s   Avg(   44.99) Peak(   59.75) Min(   20.61)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  125.57, 1.56x) Peak(  179.89, 1.42x) Min(   37.72, 3.23x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   75.75, 1.68x) Peak(   95.76, 1.60x) Min(   52.48, 2.55x)


.
Done
PowerSpectrumTest10.exe -device 0

Device: ION, 1161 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(    4.38) Peak(    6.24) Min(    1.31) [OK]
   Memory thoughput GB/s   Avg(    2.66) Peak(    3.97) Min(    1.80)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(    4.86, 1.11x) Peak(    6.64, 1.06x) Min(    1.86, 1.41x) [OK]
   Memory thoughput [GB/s]   -
      Avg(    3.08, 1.16x) Peak(    4.29, 1.08x) Min(    2.10, 1.17x)


.
Done

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #287 on: 26 Dec 2010, 01:07:00 pm »
Works on ION, YaY!  :)

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: [Split] PowerSpectrum Unit Test
« Reply #288 on: 26 Dec 2010, 01:20:11 pm »
On my 128Mb 8400M GS:

Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(    4.07) Peak(    5.64) Min(    1.19) [OK]
   Memory thoughput GB/s   Avg(    2.44) Peak(    3.69) Min(    1.51)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(    4.30, 1.06x) Peak(    5.78, 1.03x) Min(    1.68, 1.41x) [OK]
   Memory thoughput [GB/s]   -
      Avg(    2.70, 1.11x) Peak(    3.78, 1.03x) Min(    1.90, 1.26x)


Claggy

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #289 on: 26 Dec 2010, 01:25:11 pm »
Quote
On my 128Mb 8400M GS:

Works on that too  :D,  looks like we've managed to max that one out  ;)

Ghost0210

  • Guest
Re: [Split] PowerSpectrum Unit Test
« Reply #290 on: 26 Dec 2010, 01:25:30 pm »
And My 465:

Device: GeForce GTX 465, 1215 MHz clock, 994 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   69.41) Peak(  104.12) Min(   10.56) [OK]
   Memory thoughput GB/s   Avg(   38.49) Peak(   54.71) Min(   18.61)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  101.54, 1.46x) Peak(  140.32, 1.35x) Min(   36.67, 3.47x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   61.36, 1.59x) Peak(   78.16, 1.43x) Min(   46.65, 2.51x)

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #291 on: 26 Dec 2010, 01:26:47 pm »
Okay, I remembered to stop BOINC this time....
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test>powerspectrumtest10.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   11.40) Peak(   17.48) Min(    2.91) [OK]
   Memory thoughput GB/s   Avg(    6.85) Peak(    9.86) Min(    4.95)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   12.35, 1.08x) Peak(   18.33, 1.05x) Min(    4.45, 1.53x) [OK]
   Memory thoughput [GB/s]   -
      Avg(    7.76, 1.13x) Peak(   10.33, 1.05x) Min(    5.14, 1.04x)



C:\test>

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #292 on: 26 Dec 2010, 01:34:40 pm »
Quote
And My 465:

and
Quote
Okay, I remembered to stop BOINC this time....
...
Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
...

Thanks both! Still some breathing room between avg & peak on those.
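
One rough way to put a number on that breathing room is Avg as a fraction of Peak for the Opt1 run. This is only an illustrative ratio, not something the test itself reports, and the helper name is an assumption:

```python
def avg_to_peak(avg: float, peak: float) -> float:
    """Average throughput as a fraction of peak throughput."""
    return avg / peak


# Opt1 compute throughput (GFlops) from the two posts quoted above.
cards = {
    "GTX 465": (101.54, 140.32),
    "9500 GT": (12.35, 18.33),
}

for name, (avg, peak) in cards.items():
    # A lower percentage means a wider gap between Avg and Peak.
    print(f"{name}: Avg is {avg_to_peak(avg, peak):.0%} of Peak")
```

The further Avg sits below Peak, the more room there may be to pull the slower kernels up, though the ratio on its own says nothing about whether that is achievable.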

Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #293 on: 26 Dec 2010, 02:11:17 pm »

Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(  114.30) Peak(  169.79) Min(   21.35) [OK]
   Memory thoughput GB/s   Avg(   64.38) Peak(   89.45) Min(   34.20)


Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  165.56, 1.45x) Peak(  234.17, 1.38x) Min(   61.06, 2.86x) [OK]
   Memory thoughput [GB/s]   -
      Avg(  100.82, 1.57x) Peak(  126.77, 1.42x) Min(   70.89, 2.07x)

Steve

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #294 on: 26 Dec 2010, 02:21:14 pm »
Quote
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
...
  Compute thoughput [GFlops] -
      Avg(  165.56, 1.45x) Peak(  234.17, 1.38x) Min(   61.06, 2.86x) [OK]

Winning! (just ;))  Glad you're on water cooling with those; my fan cranks up with that and creates a vortex in my room :D.

It made me think '1.21 GigaWatts!'.   I'll be researching water cooling for the 480 here sometime in the new year, starting with the basics from guides like This one, & doing my homework.

Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #295 on: 26 Dec 2010, 03:22:39 pm »
Quote
Device: GeForce GTX 480, 810 MHz clock, 1503 MB memory.
...
  Compute thoughput [GFlops] -
      Avg(  165.56, 1.45x) Peak(  234.17, 1.38x) Min(   61.06, 2.86x) [OK]

Winning! (just ;))  Glad you're on water cooling with those; my fan cranks up with that and creates a vortex in my room :D.

It made me think '1.21 GigaWatts!'.   I'll be researching water cooling for the 480 here sometime in the new year, starting with the basics from guides like This one, & doing my homework.


With all the help you have given others, I would be happy to offer any assistance I can should you choose to go with water cooling. There is a lot in my system tuning thread in NC that you might find interesting: System Tuning

Steve

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #296 on: 26 Dec 2010, 06:22:13 pm »
Q6600/8GB/8800GTX.

One remark though: if you want to run a test multiple times, why not do that in the downloadable executable? I don't mind if a benchmark of yours runs for several minutes on my rig, so just do a few test runs, determine the max/min and standard deviation or something, and output that?

I have in any case run the benchmark 3 times on both OS versions, before running a 4th one redirected to a text file (and compared that one too). Results and speed-ups looked stable to my 'naked' eye.
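
The kind of summary suggested here could look roughly like the sketch below. It is written in Python rather than inside the test executable, the `summarise` helper is hypothetical, and the run values are made-up placeholders, not measurements:

```python
import statistics


def summarise(runs):
    """Return min, max, mean and sample standard deviation of repeated runs."""
    return {
        "min": min(runs),
        "max": max(runs),
        "mean": statistics.mean(runs),
        "stdev": statistics.stdev(runs),  # needs at least two runs
    }


# Placeholder Avg GFlops readings from four repeated runs (NOT real data).
runs = [55.0, 54.8, 55.2, 55.0]

for name, value in summarise(runs).items():
    print(f"{name:>5}: {value:.2f}")
```

A small standard deviation relative to the mean would back up the 'naked eye' impression that the results are stable.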

WinXP-32:

Code:
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   51.45) Peak(   72.63) Min(    9.33) [OK]
   Memory thoughput GB/s   Avg(   30.07) Peak(   47.47) Min(   16.45)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   55.01, 1.07x) Peak(   75.98, 1.05x) Min(   13.89, 1.49x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   33.46, 1.11x) Peak(   49.65, 1.05x) Min(   24.23, 1.47x)

Win7-64:

Code:
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   45.04) Peak(   62.72) Min(    8.62) [OK]
   Memory thoughput GB/s   Avg(   26.39) Peak(   40.07) Min(   15.21)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   54.49, 1.21x) Peak(   75.17, 1.20x) Min(   13.75, 1.59x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   33.12, 1.26x) Peak(   49.13, 1.23x) Min(   24.07, 1.58x)

Regards, Patrick.

Offline MarkJ

  • Knight o' The Realm
  • **
  • Posts: 96
Re: [Split] PowerSpectrum Unit Test
« Reply #297 on: 27 Dec 2010, 12:17:59 am »
Did a few runs for test #10 on different cards/machines...

Cheers,
MarkJ

-------------------------------------------------

Device: GeForce GT 240, 1340 MHz clock, 475 MB memory.
Compute capability 1.2
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   32.78) Peak(   48.81) Min(    8.49) [OK]
   Memory thoughput GB/s   Avg(   19.49) Peak(   28.94) Min(   12.38)

Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   35.66, 1.09x) Peak(   51.41, 1.05x) Min(   12.84, 1.51x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   22.13, 1.14x) Peak(   30.48, 1.05x) Min(   15.22, 1.23x)

------------------------------------------------------------

Device: GeForce GTX 460, 1350 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   62.95) Peak(  102.88) Min(    8.18) [OK]
   Memory thoughput GB/s   Avg(   34.05) Peak(   52.16) Min(   13.33)

Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   79.87, 1.27x) Peak(  121.17, 1.18x) Min(   23.84, 2.91x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   47.79, 1.40x) Peak(   63.10, 1.21x) Min(   33.50, 2.51x)

-----------------------------------------------------------

Device: GeForce GTX 570, 1464 MHz clock, 1248 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(  101.46) Peak(  151.95) Min(   20.02) [OK]
   Memory thoughput GB/s   Avg(   57.48) Peak(   79.89) Min(   30.85)

Opt1 (worst case): 256 thrds/block, 2 x 524288 element streams
  revert to single stream from size 512
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(  139.93, 1.38x) Peak(  199.62, 1.31x) Min(   51.29, 2.56x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   85.24, 1.48x) Peak(  106.89, 1.34x) Min(   58.81, 1.91x)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #298 on: 27 Dec 2010, 02:32:59 am »
Quote
Q6600/8GB/8800GTX.

One remark though: if you want to run a test multiple times, why not do that in the downloadable executable? I don't mind if a benchmark of yours runs for several minutes on my rig, so just do a few test runs, determine the max/min and standard deviation or something, and output that?

I have in any case run the benchmark 3 times on both OS versions, before running a 4th one redirected to a text file (and compared that one too). Results and speed-ups looked stable to my 'naked' eye.


Cheers & No worries Patrick,
     Just wasn't sure extending the test was going to be needed.  Naked eye judgement is plenty for the purposes of testing scientific repeatability here, and running multiple times in the same exe would make it one large test rather than several small ones for comparison (if that makes any sense).  I'm happy that the 8800 seems to have some headroom left, and the 'Min' numbers indicate the slowest kernels have received a nice boost.

Win7(WDDM) & XP(XPDM) driver model performance difference is 'gone'  ;D
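
To put rough numbers on that, here is a small sketch using the compute Avg figures from Patrick's 8800 GTX runs (WinXP-32 vs Win7-64) posted earlier in the thread. The `relative_gap` helper is just for illustration:

```python
def relative_gap(xp: float, win7: float) -> float:
    """Fraction by which the Win7 throughput trails the XP throughput."""
    return (xp - win7) / xp


# Compute Avg (GFlops) on the 8800 GTX, from Patrick's post.
stock_gap = relative_gap(51.45, 45.04)  # Stock: XP vs Win7
opt1_gap = relative_gap(55.01, 54.49)   # Opt1:  XP vs Win7

print(f"Stock: Win7 trails XP by {stock_gap:.1%}")
print(f"Opt1:  Win7 trails XP by {opt1_gap:.1%}")
```

On those figures the gap drops from about 12.5% with Stock to under 1% with Opt1.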

Secondary confirmation from a friend's 8800GTS:
XP32
Code:
Device: GeForce 8800 GTS 512, 1625 MHz clock, 512 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   44.40) Peak(   66.68) Min(    7.85) [OK]
   Memory thoughput GB/s   Avg(   26.26) Peak(   41.19) Min(   13.83)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   47.57, 1.07x) Peak(   67.80, 1.02x) Min(   17.37, 2.21x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   30.04, 1.14x) Peak(   41.89, 1.02x) Min(   19.00, 1.37x)
Win7-32
Code:
Device: GeForce 8800 GTS 512, 1625 MHz clock, 500 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   40.57) Peak(   57.91) Min(    7.32) [OK]
   Memory thoughput GB/s   Avg(   23.86) Peak(   35.82) Min(   12.91)


Opt1 (worst case): 64 thrds/block, 2 x 524288 element streams
  revert to single stream from size 128
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   48.43, 1.19x) Peak(   66.67, 1.15x) Min(   15.87, 2.17x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   30.30, 1.27x) Peak(   41.94, 1.17x) Min(   20.41, 1.58x)
« Last Edit: 27 Dec 2010, 02:42:27 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #299 on: 27 Dec 2010, 02:45:28 am »
Quote
Did a few runs for test #10 on different cards/machines...

Cheers,
MarkJ

Thanks Mark! Starting to make a dent with the stubborn 240, and the Fermi boosts are looking healthy.
I will need to get to checking the 260 in the other room soon, then we should have 'the full set'.

[Later:] Here 'tis
Quote
Device: GeForce GTX 260, 1242 MHz clock, 896 MB memory.
Compute capability 1.3
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #10 (FFT pipeline throughput)
Stock:
  Processing... Done!
  Compute Thoughput GFlops Avg(   62.64) Peak(   93.36) Min(    4.48) [OK]
   Memory thoughput GB/s   Avg(   34.47) Peak(   52.71) Min(    7.89)


Opt1 (worst case): 128 thrds/block, 2 x 524288 element streams
  revert to single stream from size 256
  Processing... Done!
  Compute thoughput [GFlops] -
      Avg(   67.78, 1.08x) Peak(   95.96, 1.03x) Min(    5.69, 1.27x) [OK]
   Memory thoughput [GB/s]   -
      Avg(   38.80, 1.13x) Peak(   55.48, 1.05x) Min(   10.03, 1.27x)
Maybe still some headroom on 200 series as well.
« Last Edit: 27 Dec 2010, 03:44:06 am by Jason G »

 
