Author Topic: [Split] PowerSpectrum Unit Test  (Read 162579 times)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #225 on: 22 Dec 2010, 06:12:50 pm »
Well, to get that 30-50% speedup (1.5-2x) on the small GPU, we went a bit further than what the nVidia documentation specifies for efficient reductions, and the code 'looks nice' (a good sign in engineering)... Still the larger sizes to go, might have to send some notes back to nVidia after we finish this, to update the optimisation manual a bit  :o
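For readers who have not seen the pattern being tuned here, the following is a minimal CPU-side sketch of a tree reduction that yields a sum and a max (with its position) together. It only simulates in plain Python the stride-halving scheme commonly used for shared-memory reductions on GPUs; it is not the Opt1 kernel from this thread, and the function name `tree_reduce_sum_max` is made up for illustration.

```python
def tree_reduce_sum_max(values):
    """Simulate a stride-halving tree reduction over one block of values.

    Returns (total, maximum, index_of_maximum). The length must be a
    non-zero power of two, like the thread-block sizes discussed here.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    sums = list(values)      # running partial sums, one slot per "thread"
    maxs = list(values)      # running partial maxima
    idxs = list(range(n))    # original position of each partial maximum

    stride = n // 2
    while stride > 0:
        # On a GPU, these 'stride' iterations would run in parallel,
        # followed by a barrier before the next, smaller stride.
        for tid in range(stride):
            sums[tid] += sums[tid + stride]
            if maxs[tid + stride] > maxs[tid]:
                maxs[tid] = maxs[tid + stride]
                idxs[tid] = idxs[tid + stride]
        stride //= 2

    return sums[0], maxs[0], idxs[0]


if __name__ == "__main__":
    print(tree_reduce_sum_max([1.0, 5.0, 2.0, 8.0, 3.0, 0.5, 7.0, 4.0]))
```

Each pass halves the number of active slots, so a block of N values finishes in log2(N) passes.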

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #226 on: 22 Dec 2010, 06:29:34 pm »
Quote
looks like an issue?

Not 'our' problem  ;)  See what MSI Afterburner says (for memory). Maybe they confuse ION & ION2, don't know
« Last Edit: 22 Dec 2010, 06:32:47 pm by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #227 on: 23 Dec 2010, 04:45:06 am »
First post updated:
Quote
Update: PowerSpectrum(+summax reduction) Test #8 - 'Sanity check'
- Check of all needed reduction sizes
- minimal changes to larger sizes, larger than selected thrds/blk is 'almost' stock (but a bit better)
- Looking for any hardware that could yield [BAD] instead of [OK] on some sizes, particularly around selected thrds/blk
- Don't need full results, just confirmation all [OK] & no Opt1 'worst case' slower than stock
- Intend to integrate FFTs next, so this is a critical sanity check.
- With all sizes included it's a longer run, and it may take several runs to see whether a '[BAD]' will manifest.

Please test repeatedly on all CUDA-enabled GPUs... No posting of results please (too large for me to look through, I'll go crosseyed  ;)), just confirm all Opt1 [OK] & faster at all sizes, and alert me if you see any marked [BAD] or too slow. You may need to run several times to see if a problem appears or not.
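As a rough picture of what such a sanity check compares, here is a minimal CPU reference sketch. It assumes the power spectrum is re² + im² per complex sample and that the summax step produces a sum, a peak, and the peak position for each block of `fft_len` bins. The function names, the tuple layout, and the relative tolerance in `verdict` are illustrative assumptions, not the test program's actual code or pass criterion.

```python
def power_spectrum(samples):
    """Power of each complex sample given as (re, im) pairs."""
    return [re * re + im * im for re, im in samples]


def summax(power, fft_len):
    """Per block of fft_len bins: (sum, peak value, peak position in block)."""
    results = []
    for start in range(0, len(power), fft_len):
        chunk = power[start:start + fft_len]
        peak = max(chunk)
        results.append((sum(chunk), peak, chunk.index(peak)))
    return results


def verdict(candidate, reference, rel_tol=1e-5):
    """Return '[OK]' if candidate matches reference within rel_tol, else '[BAD]'.

    rel_tol is an assumed tolerance chosen for this sketch only.
    """
    if len(candidate) != len(reference):
        return "[BAD]"
    for (c_sum, c_max, c_pos), (r_sum, r_max, r_pos) in zip(candidate, reference):
        if c_pos != r_pos:
            return "[BAD]"
        if abs(c_sum - r_sum) > rel_tol * abs(r_sum):
            return "[BAD]"
        if abs(c_max - r_max) > rel_tol * abs(r_max):
            return "[BAD]"
    return "[OK]"


if __name__ == "__main__":
    data = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (1.0, 1.0)]
    ref = summax(power_spectrum(data), fft_len=2)
    print(ref, verdict(ref, ref))
```

An optimised kernel would be run on the same input and its output passed as `candidate`, with the stock or CPU result as `reference`.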

Jason
« Last Edit: 23 Dec 2010, 04:47:23 am by Jason G »

Offline glennaxl

  • Knight o' The Realm
  • **
  • Posts: 86
Re: [Split] PowerSpectrum Unit Test
« Reply #228 on: 23 Dec 2010, 06:05:33 am »
All systems are go except....

gtx 295
core 0 - 1 bad at test 1/5 under 128 size
core 1 - 1 slow at test 2/5 under 128 size

gtx 260 - 1 slow at test 4/5 under 256 size

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #229 on: 23 Dec 2010, 06:09:48 am »
Thanks! On the 295, is the video memory OC'd? I found here that Opt1 around the size matching #thrds/block (256 on Fermi, 128 on 2xx) can be unstable if the VRAM OC is pushed. I had to back off my video memory OC by 80MHz for it to stabilise.
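The block sizes mentioned above can be written down as a small lookup. This is only a sketch of the mapping as stated in this thread: 256 on Fermi (compute capability 2.x), 128 on GTX 2xx (compute capability 1.3), and the 64 thrds/block that the test logs later in this thread report for compute capability 1.1 devices. The function name is hypothetical, and treating compute capability 1.2 like 1.1 is an assumption.

```python
def threads_per_block(major, minor):
    """Map a CUDA compute capability to the Opt1 block size quoted in the thread."""
    if major >= 2:
        return 256   # Fermi class, e.g. GTX 470
    if (major, minor) >= (1, 3):
        return 128   # GTX 2xx class, e.g. GTX 260 / GTX 295
    return 64        # compute capability 1.1 parts (1.2 assumed to match)


if __name__ == "__main__":
    for cc in [(1, 1), (1, 3), (2, 0)]:
        print(cc, threads_per_block(*cc))
```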

GTX260 - Please run that one a few times & see if that's consistently slower than stock at size 256.  Will be checking that code  in the meantime.
[Edit:] I see you did, & got one slow out of 5 ... OK


[Edit2:] Darn 128 still a little unstable here too  ???, will dial size 128 & 256 back & replace the test shortly (might be pushing a tad hard )
Jason
« Last Edit: 23 Dec 2010, 06:16:21 am by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #230 on: 23 Dec 2010, 06:23:47 am »
@glennaxl: have updated the PowerSpectrumTest8 archive attached to first post, to dial back the borderline kernels a bit (for now, will dig deeper into those later if needed).

Jason
« Last Edit: 23 Dec 2010, 06:52:06 am by Jason G »

Offline glennaxl

  • Knight o' The Realm
  • **
  • Posts: 86
Re: [Split] PowerSpectrum Unit Test
« Reply #231 on: 23 Dec 2010, 07:25:44 am »
Yah, my gtx 295 vram is oc'd to 1080 from 999


The new test8 are all good now. Perfect!  ;)

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #232 on: 23 Dec 2010, 07:27:50 am »
Good, good. Will keep those ones dialled in a bit then, allowing some possible fine tuning later. It seems that by cramming that much data through we're beginning to find weak spots, so will look at moving on to FFT integration.

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #233 on: 23 Dec 2010, 09:41:44 am »
Hi Jason,
Device: GeForce GTX 470, 810 MHz clock, 1248 MB memory.
Stock best result
 PS+SuMx( 32768) [OK]   12.7 GFlops   50.7 GB/s

Opt best result
 PS+SuMx( 32768)   16.4   65.7 121.7 [OK]   27.8  111.4 121.7

all others are ok
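To put those GTX 470 figures in terms of speedup, here is a small arithmetic sketch using only the numbers quoted above (stock 12.7 GFlops at 50.7 GB/s; Opt1 worst case 16.4 and best case 27.8 GFlops at size 32768). The helper name `speedup` is made up for illustration.

```python
def speedup(optimised, stock):
    """Ratio of optimised throughput to stock throughput."""
    return optimised / stock


stock_gflops, stock_gbs = 12.7, 50.7
opt_worst_gflops, opt_best_gflops = 16.4, 27.8

worst = speedup(opt_worst_gflops, stock_gflops)   # about 1.29x
best = speedup(opt_best_gflops, stock_gflops)     # about 2.19x
bytes_per_flop = stock_gbs / stock_gflops         # about 4.0

if __name__ == "__main__":
    print(f"worst case {worst:.2f}x, best case {best:.2f}x, "
          f"{bytes_per_flop:.1f} bytes moved per flop")
```

The GB/s column is about four times the GFlops column in every log in this thread, so either column gives the same speedup ratio.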

Offline _heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 2117
Re: [Split] PowerSpectrum Unit Test
« Reply #234 on: 23 Dec 2010, 09:57:28 am »
Hi Jason,
excellent performance on the ION
worth posting the full result
PowerSpectrumTest8.exe -device 0

Device: ION, 1161 MHz clock, 242 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    0.4 GFlops    1.6 GB/s
 PS+SuMx(    16) [OK]    0.3 GFlops    1.5 GB/s
 PS+SuMx(    32) [OK]    0.3 GFlops    1.1 GB/s
 PS+SuMx(    64) [OK]    0.4 GFlops    1.8 GB/s
 PS+SuMx(   128) [OK]    0.7 GFlops    2.7 GB/s
 PS+SuMx(   256) [OK]    0.8 GFlops    3.4 GB/s
 PS+SuMx(   512) [OK]    1.1 GFlops    4.3 GB/s
 PS+SuMx(  1024) [OK]    1.1 GFlops    4.4 GB/s
 PS+SuMx(  2048) [OK]    1.2 GFlops    4.9 GB/s
 PS+SuMx(  4096) [OK]    1.2 GFlops    4.8 GB/s
 PS+SuMx(  8192) [OK]    1.3 GFlops    5.2 GB/s
 PS+SuMx( 16384) [OK]    1.3 GFlops    5.1 GB/s
 PS+SuMx( 32768) [OK]    1.3 GFlops    5.4 GB/s
 PS+SuMx( 65536) [OK]    1.4 GFlops    5.4 GB/s
 PS+SuMx(131072) [OK]    1.4 GFlops    5.6 GB/s

Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    0.6    2.5 121.7 [OK]    0.7    2.9 121.7
 PS+SuMx(    16)    0.6    2.4 121.7 [OK]    0.6    2.7 121.7
 PS+SuMx(    32)    0.6    2.3 121.7 [OK]    0.6    2.4 121.7
 PS+SuMx(    64)    0.7    2.8 121.7 [OK]    0.7    3.0 121.7
 PS+SuMx(   128)    0.7    2.7 121.7 [OK]    0.7    3.0 121.7
 PS+SuMx(   256)    0.9    3.5 121.7 [OK]    1.0    3.9 121.7
 PS+SuMx(   512)    1.1    4.5 121.7 [OK]    1.2    5.0 121.7
 PS+SuMx(  1024)    1.2    4.6 121.7 [OK]    1.3    5.1 121.7
 PS+SuMx(  2048)    1.3    5.3 121.7 [OK]    1.5    5.9 121.7
 PS+SuMx(  4096)    1.3    5.0 121.7 [OK]    1.4    5.6 121.7
 PS+SuMx(  8192)    1.4    5.5 121.7 [OK]    1.5    6.1 121.7
 PS+SuMx( 16384)    1.3    5.4 121.7 [OK]    1.5    6.0 121.7
 PS+SuMx( 32768)    1.4    5.7 121.7 [OK]    1.6    6.4 121.7
 PS+SuMx( 65536)    1.4    5.8 121.7 [OK]    1.6    6.5 121.7
 PS+SuMx(131072)    1.2    4.8 121.7 [OK]    1.7    6.6 121.7

.
Done

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #235 on: 23 Dec 2010, 10:18:54 am »
Yes, size 128k drops off a bit on mine too, not sure why yet.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: [Split] PowerSpectrum Unit Test
« Reply #236 on: 23 Dec 2010, 11:13:47 am »
Was able to get results for GSO9600 at last:


Device: GeForce 9600 GSO, 1700 MHz clock, 384 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    1.2 GFlops    5.4 GB/s
 PS+SuMx(    16) [OK]    1.6 GFlops    6.9 GB/s
 PS+SuMx(    32) [OK]    1.8 GFlops    7.3 GB/s
 PS+SuMx(    64) [OK]    2.9 GFlops   11.8 GB/s
 PS+SuMx(   128) [OK]    4.3 GFlops   17.1 GB/s
 PS+SuMx(   256) [OK]    5.5 GFlops   22.1 GB/s
 PS+SuMx(   512) [OK]    6.7 GFlops   27.0 GB/s
 PS+SuMx(  1024) [OK]    7.0 GFlops   28.1 GB/s
 PS+SuMx(  2048) [OK]    7.7 GFlops   30.8 GB/s
 PS+SuMx(  4096) [OK]    7.6 GFlops   30.4 GB/s
 PS+SuMx(  8192) [OK]    7.9 GFlops   31.6 GB/s
 PS+SuMx( 16384) [OK]    7.7 GFlops   31.0 GB/s
 PS+SuMx( 32768) [OK]    8.1 GFlops   32.5 GB/s
 PS+SuMx( 65536) [OK]    7.8 GFlops   31.3 GB/s
 PS+SuMx(131072) [OK]    8.0 GFlops   32.2 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    1.5    6.5 121.7 [OK]    4.5   19.6 121.7
 PS+SuMx(    16)    2.3    9.6 121.7 [OK]    4.8   20.0 121.7
 PS+SuMx(    32)    3.0   12.1 121.7 [OK]    4.5   18.5 121.7
 PS+SuMx(    64)    3.1   12.7 121.7 [OK]    5.4   21.7 121.7
 PS+SuMx(   128)    4.5   18.1 121.7 [OK]    5.3   21.3 121.7
 PS+SuMx(   256)    5.8   23.1 121.7 [OK]    6.5   25.9 121.7
 PS+SuMx(   512)    6.9   27.8 121.7 [OK]    7.5   30.0 121.7
 PS+SuMx(  1024)    7.3   29.1 121.7 [OK]    7.8   31.2 121.7
 PS+SuMx(  2048)    7.9   31.5 121.7 [OK]    8.4   33.6 121.7
 PS+SuMx(  4096)    7.8   31.1 121.7 [OK]    8.2   32.6 121.7
 PS+SuMx(  8192)    8.1   32.3 121.7 [OK]    8.5   33.9 121.7
 PS+SuMx( 16384)    7.9   31.5 121.7 [OK]    8.2   32.8 121.7
 PS+SuMx( 32768)    8.1   32.5 121.7 [OK]    8.6   34.6 121.7
 PS+SuMx( 65536)    5.7   22.7 121.7 [OK]    8.3   33.2 121.7
 PS+SuMx(131072)    8.2   32.6 121.7 [OK]    8.5   34.1 121.7

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: [Split] PowerSpectrum Unit Test
« Reply #237 on: 23 Dec 2010, 11:21:18 am »
Okay, here's test 8. Figured it would be better for me to post it rather than try to explain what I don't understand.  :8

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\perry>cd\test

C:\test> powerspectrumtest8.exe

Device: GeForce 9500 GT, 1848 MHz clock, 1006 MB memory.
Compute capability 1.1
Compiled with CUDA 3020.
                PowerSpectrum+summax Unit test #8 (Sanity Check)
Stock:
 PS+SuMx(     8) [OK]    0.7 GFlops    3.1 GB/s
 PS+SuMx(    16) [OK]    0.8 GFlops    3.2 GB/s
 PS+SuMx(    32) [OK]    0.7 GFlops    3.0 GB/s
 PS+SuMx(    64) [OK]    1.0 GFlops    4.2 GB/s
 PS+SuMx(   128) [OK]    0.8 GFlops    3.4 GB/s
 PS+SuMx(   256) [OK]    1.6 GFlops    6.6 GB/s
 PS+SuMx(   512) [OK]    2.0 GFlops    7.8 GB/s
 PS+SuMx(  1024) [OK]    2.1 GFlops    8.2 GB/s
 PS+SuMx(  2048) [OK]    2.1 GFlops    8.2 GB/s
 PS+SuMx(  4096) [OK]    2.0 GFlops    8.1 GB/s
 PS+SuMx(  8192) [OK]    2.1 GFlops    8.4 GB/s
 PS+SuMx( 16384) [OK]    2.1 GFlops    8.4 GB/s
 PS+SuMx( 32768) [OK]    0.5 GFlops    1.9 GB/s
 PS+SuMx( 65536) [OK]    0.4 GFlops    1.5 GB/s
 PS+SuMx(131072) [OK]    2.1 GFlops    8.5 GB/s


Opt1: 64 thrds/block
                        worst case              best case
                   GFlps  GB/s ulps         GFlps  GB/s ulps
 PS+SuMx(     8)    1.1    4.8 121.7 [OK]    1.5    6.8 121.7
 PS+SuMx(    16)    1.2    5.0 121.7 [OK]    1.7    6.9 121.7
 PS+SuMx(    32)    1.2    5.0 121.7 [OK]    1.5    6.1 121.7
 PS+SuMx(    64)    0.5    1.9 121.7 [OK]    1.7    7.1 121.7
 PS+SuMx(   128)    0.6    2.5 121.7 [OK]    1.8    7.2 121.7
 PS+SuMx(   256)    0.6    2.3 121.7 [OK]    2.1    8.3 121.7
 PS+SuMx(   512)    2.0    8.1 121.7 [OK]    2.5   10.1 121.7
 PS+SuMx(  1024)    1.9    7.8 121.7 [OK]    2.6   10.3 121.7
 PS+SuMx(  2048)    2.1    8.6 121.7 [OK]    2.6   10.3 121.7
 PS+SuMx(  4096)    0.5    2.1 121.7 [OK]    2.5   10.0 121.7
 PS+SuMx(  8192)    2.2    8.7 121.7 [OK]    2.8   11.1 121.7
 PS+SuMx( 16384)    2.1    8.2 121.7 [OK]    2.7   10.9 121.7
 PS+SuMx( 32768)    2.2    8.8 121.7 [OK]    2.8   11.1 121.7
 PS+SuMx( 65536)    2.2    8.9 121.7 [OK]    2.8   11.2 121.7
 PS+SuMx(131072)    2.3    9.2 121.7 [OK]    2.8   11.3 121.7



C:\test>

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #238 on: 23 Dec 2010, 12:12:55 pm »
Quote
Was able to get results for GSO9600 at last:

Ouch, not much headroom between worst & best (fast GDDR3 memory  on 9600GSO IIRC).  I reckon the 64k size is an anomaly worth looking into, as with the 128k drop-off on other cards (like ION).  Thankfully that part (larger sizes) is mostly stock, so there should be plenty of tweaking possibilities.... Even if only for a GFlop here and there.
« Last Edit: 23 Dec 2010, 12:26:17 pm by Jason G »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #239 on: 23 Dec 2010, 12:22:38 pm »
Quote
Okay, here's test 8. Figured it would be better for me to post it rather than try to explain what I don't understand.  :8

Thanks. A couple of sizes are choking there for whatever reason. I think I'm going to have to improve everything from size 64 & 128 upward before moving on to the FFTs ... Nice that it's working with all '[OK]'
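For anyone who would rather not eyeball a full log, here is a minimal sketch of a script that scans pasted test output for the two things asked for in this thread: any [BAD] marker, and any Opt1 worst case below the stock GFlops at the same size. The regular expressions are written against the log layout shown in the posts above. How a [BAD] row is laid out is an assumption, since none appears in this thread, and the names `check_run` and `SAMPLE` are illustrative.

```python
import re

# Stock rows look like:  PS+SuMx(    64) [OK]    1.0 GFlops    4.2 GB/s
STOCK = re.compile(r"PS\+SuMx\(\s*(\d+)\)\s+\[(OK|BAD)\]\s+([\d.]+) GFlops")
# Opt1 rows look like:   PS+SuMx(    64)    0.5    1.9 121.7 [OK]    1.7    7.1 121.7
OPT = re.compile(r"PS\+SuMx\(\s*(\d+)\)\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+\[(OK|BAD)\]")


def check_run(text):
    """Return a list of (size, problem) tuples found in one test log."""
    stock = {}
    problems = []
    for line in text.splitlines():
        m = STOCK.search(line)
        if m:
            size, status, gflops = int(m.group(1)), m.group(2), float(m.group(3))
            stock[size] = gflops
            if status == "BAD":
                problems.append((size, "stock [BAD]"))
            continue
        m = OPT.search(line)
        if m:
            size, worst, status = int(m.group(1)), float(m.group(2)), m.group(3)
            if status == "BAD":
                problems.append((size, "Opt1 [BAD]"))
            elif size in stock and worst < stock[size]:
                problems.append((size, "Opt1 worst case slower than stock"))
    return problems


# Four rows copied from the 9500 GT log above.
SAMPLE = """\
 PS+SuMx(     8) [OK]    0.7 GFlops    3.1 GB/s
 PS+SuMx(    64) [OK]    1.0 GFlops    4.2 GB/s
 PS+SuMx(     8)    1.1    4.8 121.7 [OK]    1.5    6.8 121.7
 PS+SuMx(    64)    0.5    1.9 121.7 [OK]    1.7    7.1 121.7
"""

if __name__ == "__main__":
    for size, problem in check_run(SAMPLE):
        print(size, problem)
```

On the four sample rows it flags size 64, where the Opt1 worst case (0.5 GFlops) is below stock (1.0 GFlops).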

 
