Author Topic: [Split] PowerSpectrum Unit Test  (Read 162570 times)

Ghost0210

  • Guest
Re: [Split] PowerSpectrum Unit Test
« Reply #240 on: 23 Dec 2010, 01:23:33 pm »
All OK (5 runs) on GTX 465:

Stock Best Result:
Quote
PS+SuMx( 32768) [OK]   12.2 GFlops   48.7 GB/s
Opt1 Best Result:
Quote
Opt1: 256 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx( 32768)   17.7   71.0 121.7 [OK]   24.6   98.2 121.7

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #241 on: 23 Dec 2010, 01:29:52 pm »
Hey Ghost, what's the memory bus width & memory clock on that 465 ?

Ghost0210

  • Guest
Re: [Split] PowerSpectrum Unit Test
« Reply #242 on: 23 Dec 2010, 01:48:51 pm »
Hey Ghost, what's the memory bus width & memory clock on that 465 ?

Here's a GPU-Z image for the card

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #243 on: 23 Dec 2010, 01:53:55 pm »
PS+SuMx( 32768)   17.7   71.0 121.7 [OK]   24.6   98.2 121.7

Hmm this *could* be near max theoretical then ... checking
« Last Edit: 23 Dec 2010, 02:02:07 pm by Jason G »

Ghost0210

  • Guest
Re: [Split] PowerSpectrum Unit Test
« Reply #244 on: 23 Dec 2010, 01:59:50 pm »
PS+SuMx( 32768)   17.7   71.0 121.7 [OK]   24.6   98.2 121.7

Hmm this *could* be near max theoretical then ... checking

That's good  :D
Was getting a nice capacitor whine when running the tests, so knew it was being pushed hard!

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #245 on: 23 Dec 2010, 02:02:14 pm »
I calculate 122.24 GB/s theoretical max (matching GPU-z listing), so 98.2 seems pretty good.  I'll look at what that size is doing & see if I can spread some performance around up in that area.
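For anyone following along, the theoretical figure is just bus width times effective transfer rate. A minimal sketch of that arithmetic follows. The 256-bit bus, 955 MHz memory clock, and 4 transfers per clock are assumptions picked because they reproduce the 122.24 GB/s quoted above; the GPU-Z image itself is not reproduced here, and `peak_bandwidth_gbs` is a hypothetical helper name, not part of the test tool.

```python
def peak_bandwidth_gbs(bus_width_bits, mem_clock_mhz, transfers_per_clock):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed inputs (not stated in the thread): 256-bit bus, 955 MHz GDDR5,
# 4 data transfers per memory clock.
peak = peak_bandwidth_gbs(256, 955, 4)

# Measured best-case Opt1 figure from the GTX 465 result above.
measured = 98.2
efficiency = measured / peak

print(f"peak {peak:.2f} GB/s, measured {measured} GB/s, {efficiency:.0%} of peak")
```

Under those assumptions the best-case Opt1 run reaches roughly 80% of the theoretical peak, which is why 98.2 GB/s looks close to the practical ceiling.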

[Edit:] I get the impression we might be best seeing what streaming those kernels will do sometime soon  :-\  too many new fan-dangled features in this stuff  ;)
« Last Edit: 23 Dec 2010, 02:08:24 pm by Jason G »

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: [Split] PowerSpectrum Unit Test
« Reply #246 on: 23 Dec 2010, 02:47:43 pm »
OK here on 9800GTX+ (5 runs) GPU usage up from stock's ~80% to ~95% on Opt1:

Best Stock result:
Quote
PS+SuMx( 65536) [OK]   11.6 GFlops   46.5 GB/s

Opt1 Best Result:
Quote
Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx( 65536)   13.0   52.1 121.7 [OK]   15.6   62.5 121.7

And OK on the 128 MB 8400M GS (5 runs):


Quote
Device: GeForce 8400M GS, 800 MHz clock, 114 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    0.3 GFlops    1.3 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.2 GB/s
 PS+SuMx(    32) [OK]    0.2 GFlops    0.9 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.5 GB/s
 PS+SuMx(   128) [OK]    0.5 GFlops    2.2 GB/s
 PS+SuMx(   256) [OK]    0.7 GFlops    2.8 GB/s
 PS+SuMx(   512) [OK]    0.8 GFlops    3.4 GB/s
 PS+SuMx(  1024) [OK]    0.9 GFlops    3.5 GB/s
 PS+SuMx(  2048) [OK]    1.0 GFlops    4.0 GB/s
 PS+SuMx(  4096) [OK]    0.9 GFlops    3.7 GB/s
 PS+SuMx(  8192) [OK]    1.0 GFlops    4.0 GB/s
 PS+SuMx( 16384) [OK]    1.0 GFlops    3.9 GB/s
 PS+SuMx( 32768) [OK]    1.0 GFlops    4.1 GB/s
 PS+SuMx( 65536) [OK]    1.1 GFlops    4.2 GB/s
 PS+SuMx(131072) [OK]    1.1 GFlops    4.3 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.4    1.9 121.7 [OK]    0.5    2.1 121.7
 PS+SuMx(    16)    0.4    1.8 121.7 [OK]    0.5    1.9 121.7
 PS+SuMx(    32)    0.4    1.7 121.7 [OK]    0.4    1.7 121.7
 PS+SuMx(    64)    0.5    2.1 121.7 [OK]    0.5    2.2 121.7
 PS+SuMx(   128)    0.6    2.2 121.7 [OK]    0.6    2.3 121.7
 PS+SuMx(   256)    0.7    2.9 121.7 [OK]    0.7    3.0 121.7
 PS+SuMx(   512)    0.9    3.5 121.7 [OK]    0.9    3.6 121.7
 PS+SuMx(  1024)    0.9    3.5 121.7 [OK]    0.9    3.7 121.7
 PS+SuMx(  2048)    1.0    4.0 121.7 [OK]    1.0    4.2 121.7
 PS+SuMx(  4096)    0.9    3.8 121.7 [OK]    1.0    3.9 121.7
 PS+SuMx(  8192)    1.0    4.0 121.7 [OK]    1.0    4.2 121.7
 PS+SuMx( 16384)    1.0    4.0 121.7 [OK]    1.0    4.1 121.7
 PS+SuMx( 32768)    1.1    4.2 121.7 [OK]    1.1    4.3 121.7
 PS+SuMx( 65536)    1.1    4.3 121.7 [OK]    1.1    4.5 121.7
 PS+SuMx(131072)    1.1    4.4 121.7 [OK]    1.1    4.5 121.7

Claggy
« Last Edit: 23 Dec 2010, 05:03:16 pm by Claggy »

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #247 on: 23 Dec 2010, 04:06:32 pm »
I ran this on my usual rig (Q6600/8GB/8800GTX) but version 8 added something new, an error. Under WinXP it just shows the error, but under Win7-64 the screen turns black and I get a "driver stopped responding error". Running 260.99.

First the WinXP-32 log:

Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.7 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   11.1 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    4.3 GFlops   17.6 GB/s
 PS+SuMx(   128) [OK]    6.7 GFlops   26.9 GB/s
 PS+SuMx(   256) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(   512) [OK]   11.2 GFlops   44.7 GB/s
 PS+SuMx(  1024) [OK]   11.8 GFlops   47.4 GB/s
 PS+SuMx(  2048) [OK]   13.5 GFlops   53.9 GB/s
 PS+SuMx(  4096) [OK]   13.2 GFlops   52.6 GB/s
 PS+SuMx(  8192) [OK]   14.4 GFlops   57.4 GB/s
 PS+SuMx( 16384) [OK]   14.1 GFlops   56.4 GB/s
 PS+SuMx( 32768) [OK]   14.9 GFlops   59.5 GB/s
 PS+SuMx( 65536) [OK]   15.3 GFlops   61.1 GB/s
 PS+SuMx(131072) [OK]   11.9 GFlops   47.7 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.6   15.8 121.7 [OK]    6.2   27.2 121.7
 PS+SuMx(    16)    4.5   18.8 121.7 [OK]    6.1   25.5 121.7
 PS+SuMx(    32)    4.9   20.1 121.7 [OK]    5.8   23.8 121.7
 PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Then the Win7-64 log:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.0 GFlops    9.0 GB/s
 PS+SuMx(    16) [OK]    2.4 GFlops   10.2 GB/s
 PS+SuMx(    32) [OK]    2.4 GFlops    9.8 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.6 GB/s
 PS+SuMx(   128) [OK]    5.7 GFlops   22.8 GB/s
 PS+SuMx(   256) [OK]    7.2 GFlops   28.8 GB/s
 PS+SuMx(   512) [OK]    8.5 GFlops   34.1 GB/s
 PS+SuMx(  1024) [OK]    8.9 GFlops   35.8 GB/s
 PS+SuMx(  2048) [OK]    9.8 GFlops   39.3 GB/s
 PS+SuMx(  4096) [OK]    9.7 GFlops   38.8 GB/s
 PS+SuMx(  8192) [OK]   10.3 GFlops   41.3 GB/s
 PS+SuMx( 16384) [OK]   10.1 GFlops   40.5 GB/s
 PS+SuMx( 32768) [OK]   10.6 GFlops   42.2 GB/s
 PS+SuMx( 65536) [OK]   10.7 GFlops   43.0 GB/s
 PS+SuMx(131072) [OK]    9.0 GFlops   36.0 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.4   14.8 121.7 [OK]    6.1   26.8 121.7
 PS+SuMx(    16)    4.2   17.4 121.7 [OK]    6.0   25.3 121.7
 PS+SuMx(    32)    4.6   18.7 121.7 [OK]    5.8   23.7 121.7
 PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Regards, Patrick.

Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #248 on: 23 Dec 2010, 04:36:20 pm »
All OK here with GPU RAM at 1975 MHz with 5 runs

Best Stock result
Quote
PS+SuMx( 32768) [OK]   18.7 GFlops   75.0 GB/s

Best Opt. 1 result
Quote
PS+SuMx( 32768)   26.8  107.4 121.7 [OK]   37.0  148.1 121.7

Steve

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #249 on: 23 Dec 2010, 06:00:09 pm »
Very interesting: test 8 shows 32768 as the best size on the GTX 470/480 cards, but on low-end cards 131072 is best.

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #250 on: 23 Dec 2010, 06:37:37 pm »
All OK. Worst case is 0-0.4 faster than stock; best case is another 0.1-0.4 faster than worst.
About 5 runs.
« Last Edit: 23 Dec 2010, 06:40:59 pm by Miep »
The road to hell is paved with good intentions

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Re: [Split] PowerSpectrum Unit Test
« Reply #251 on: 23 Dec 2010, 07:19:45 pm »
Device: GeForce GTX 460, 1600 MHz clock, 768 MB memory.
Compute capability 2.1
Compiled with CUDA 3020.

Code: [Select]
PS+SuMx( 65536) [OK]   12.4 GFlops   49.6 GB/s
Code: [Select]
PS+SuMx( 65536)   16.6   66.4 121.7 [OK]   17.7   70.7 121.7

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #252 on: 24 Dec 2010, 12:47:51 am »
PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Wow Patrick, clearly something I'm doing in size 64 has changed (and only appears on cc1.0  :o), will check.  we're going to need to fix that before moving on.

[Later:] @Patrick: when you can, please reboot & try the attached fix attempt ( for compute cap 1.0)... If OK on that card I'll be able to avoid breaking that again...

[Removed attachment]
« Last Edit: 24 Dec 2010, 11:57:11 am by Jason G »

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #253 on: 24 Dec 2010, 05:03:00 am »
PS+SuMx(    64)
 FAILURE in c:/[Projects]/LunaticsUnited/Tools/Tests/PowerSpectrum/main.cpp, line 456

Wow Patrick, clearly something I'm doing in size 64 has changed (and only appears on cc1.0  :o), will check.  we're going to need to fix that before moving on.

[Later:] @Patrick: when you can, please reboot & try the attached fix attempt ( for compute cap 1.0)... If OK on that card I'll be able to avoid breaking that again...

It looks like you fixed it. Full logs for completeness' sake:

WinXP-32:

Code: [Select]
Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.2 GFlops    9.7 GB/s
 PS+SuMx(    16) [OK]    2.6 GFlops   11.1 GB/s
 PS+SuMx(    32) [OK]    2.6 GFlops   10.5 GB/s
 PS+SuMx(    64) [OK]    4.3 GFlops   17.6 GB/s
 PS+SuMx(   128) [OK]    6.7 GFlops   26.9 GB/s
 PS+SuMx(   256) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(   512) [OK]   11.2 GFlops   44.7 GB/s
 PS+SuMx(  1024) [OK]   11.8 GFlops   47.4 GB/s
 PS+SuMx(  2048) [OK]   13.5 GFlops   53.9 GB/s
 PS+SuMx(  4096) [OK]   13.2 GFlops   52.6 GB/s
 PS+SuMx(  8192) [OK]   14.4 GFlops   57.5 GB/s
 PS+SuMx( 16384) [OK]   14.1 GFlops   56.5 GB/s
 PS+SuMx( 32768) [OK]   14.9 GFlops   59.5 GB/s
 PS+SuMx( 65536) [OK]   15.3 GFlops   61.2 GB/s
 PS+SuMx(131072) [OK]   12.0 GFlops   47.8 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.6   15.8 121.7 [OK]    6.2   27.2 121.7
 PS+SuMx(    16)    4.5   18.8 121.7 [OK]    6.1   25.5 121.7
 PS+SuMx(    32)    4.9   20.1 121.7 [OK]    5.8   23.8 121.7
 PS+SuMx(    64)    6.5   26.5 121.7 [OK]    7.4   30.0 121.7
 PS+SuMx(   128)    7.2   28.8 121.7 [OK]    7.8   31.3 121.7
 PS+SuMx(   256)    9.4   37.8 121.7 [OK]   10.2   40.7 121.7
 PS+SuMx(   512)   11.6   46.3 121.7 [OK]   12.4   49.7 121.7
 PS+SuMx(  1024)   12.1   48.5 121.7 [OK]   12.9   51.6 121.7
 PS+SuMx(  2048)   13.7   54.9 121.7 [OK]   14.6   58.5 121.7
 PS+SuMx(  4096)   13.4   53.5 121.7 [OK]   14.2   56.8 121.7
 PS+SuMx(  8192)   14.5   58.2 121.7 [OK]   15.5   62.0 121.7
 PS+SuMx( 16384)   14.3   57.1 121.7 [OK]   15.2   60.9 121.7
 PS+SuMx( 32768)   15.1   60.3 121.7 [OK]   16.1   64.4 121.7
 PS+SuMx( 65536)   15.5   62.0 121.7 [OK]   16.5   66.2 121.7
 PS+SuMx(131072)   12.1   48.2 121.7 [OK]   12.7   50.8 121.7

Win7-64:

Code: [Select]
Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    2.0 GFlops    8.7 GB/s
 PS+SuMx(    16) [OK]    2.4 GFlops   10.2 GB/s
 PS+SuMx(    32) [OK]    2.4 GFlops    9.7 GB/s
 PS+SuMx(    64) [OK]    3.9 GFlops   15.8 GB/s
 PS+SuMx(   128) [OK]    5.6 GFlops   22.7 GB/s
 PS+SuMx(   256) [OK]    7.2 GFlops   29.0 GB/s
 PS+SuMx(   512) [OK]    8.7 GFlops   34.7 GB/s
 PS+SuMx(  1024) [OK]    9.0 GFlops   36.0 GB/s
 PS+SuMx(  2048) [OK]   10.0 GFlops   40.1 GB/s
 PS+SuMx(  4096) [OK]    9.8 GFlops   39.0 GB/s
 PS+SuMx(  8192) [OK]   10.4 GFlops   41.6 GB/s
 PS+SuMx( 16384) [OK]   10.2 GFlops   40.7 GB/s
 PS+SuMx( 32768) [OK]   10.8 GFlops   43.2 GB/s
 PS+SuMx( 65536) [OK]   10.9 GFlops   43.6 GB/s
 PS+SuMx(131072) [OK]    9.0 GFlops   36.1 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    3.4   14.9 121.7 [OK]    6.1   26.8 121.7
 PS+SuMx(    16)    4.2   17.6 121.7 [OK]    6.1   25.4 121.7
 PS+SuMx(    32)    4.6   18.7 121.7 [OK]    5.8   23.7 121.7
 PS+SuMx(    64)    6.0   24.2 121.7 [OK]    7.3   29.4 121.7
 PS+SuMx(   128)    6.5   26.0 121.7 [OK]    7.7   31.1 121.7
 PS+SuMx(   256)    8.3   33.3 121.7 [OK]   10.1   40.4 121.7
 PS+SuMx(   512)    9.9   39.8 121.7 [OK]   12.3   49.4 121.7
 PS+SuMx(  1024)   10.2   40.8 121.7 [OK]   12.8   51.3 121.7
 PS+SuMx(  2048)   11.3   45.2 121.7 [OK]   14.5   58.2 121.7
 PS+SuMx(  4096)   11.2   44.6 121.7 [OK]   14.1   56.3 121.7
 PS+SuMx(  8192)   12.1   48.3 121.7 [OK]   15.4   61.5 121.7
 PS+SuMx( 16384)   11.7   46.8 121.7 [OK]   15.1   60.4 121.7
 PS+SuMx( 32768)   12.2   48.8 121.7 [OK]   16.0   63.8 121.7
 PS+SuMx( 65536)   12.5   50.0 121.7 [OK]   16.4   65.8 121.7
 PS+SuMx(131072)   10.1   40.5 121.7 [OK]   12.6   50.5 121.7

Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #254 on: 24 Dec 2010, 06:47:56 am »
Phew!  cool, thanks  ;D

Not much headroom on that chip either, but I'll be happy with that small fractional improvement on the oldest cards for now.
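To put a number on that improvement, here is a rough sketch that parses the two line formats the unit test prints and computes the Opt1 speedup over stock per size. The regular expressions and the `parse_stock` / `parse_opt` helpers are hypothetical, written only against the log layout pasted in this thread; the sample rows are copied from the WinXP-32 8800 GTX log above.

```python
import re

# Stock rows look like: " PS+SuMx( 65536) [OK]   15.3 GFlops   61.2 GB/s"
STOCK_RE = re.compile(
    r"PS\+SuMx\(\s*(\d+)\)\s+\[OK\]\s+([\d.]+) GFlops\s+([\d.]+) GB/s"
)
# Opt1 rows look like: " PS+SuMx( 65536)   15.5   62.0 121.7 [OK]   16.5   66.2 121.7"
OPT_RE = re.compile(
    r"PS\+SuMx\(\s*(\d+)\)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+\[OK\]"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def parse_stock(text):
    """Map FFT size -> stock GFlops."""
    return {int(m.group(1)): float(m.group(2)) for m in STOCK_RE.finditer(text)}


def parse_opt(text):
    """Map FFT size -> (worst-case GFlops, best-case GFlops)."""
    return {
        int(m.group(1)): (float(m.group(2)), float(m.group(5)))
        for m in OPT_RE.finditer(text)
    }


sample = """
 PS+SuMx( 32768) [OK]   14.9 GFlops   59.5 GB/s
 PS+SuMx( 65536) [OK]   15.3 GFlops   61.2 GB/s
 PS+SuMx( 32768)   15.1   60.3 121.7 [OK]   16.1   64.4 121.7
 PS+SuMx( 65536)   15.5   62.0 121.7 [OK]   16.5   66.2 121.7
"""

stock = parse_stock(sample)
opt = parse_opt(sample)

# Speedup of Opt1 over stock, as (worst-case ratio, best-case ratio) per size.
speedups = {
    size: (opt[size][0] / stock[size], opt[size][1] / stock[size])
    for size in stock
    if size in opt
}

for size, (worst, best) in sorted(speedups.items()):
    print(f"{size:>7}: worst x{worst:.2f}, best x{best:.2f}")
```

On those rows the best case comes out at about 8% over stock at size 65536, with the worst case only about 1% ahead.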

Moving onto test #9 soon, will add in the FFTs, then will stream the test kernels after that, just to see what that does... Progress at last  ;D

 
