Author Topic: [Split] PowerSpectrum Unit Test  (Read 162613 times)

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: [Split] PowerSpectrum Unit Test
« Reply #180 on: 06 Dec 2010, 03:57:33 pm »
Thanks for the tolerances.  Being largely memory bound, the FLops tolerances are more than enough, and indicate +/- 10% variation of worst case on that.  I presume that's driving a display, so that's reasonable.

You're welcome - now what exactly makes you think the mobile GPU of a laptop might be driving a display? ;D
No bluescreens with the latest driver yet - touch wood...

I'll do statistics on all the numbers next time round then.
The road to hell is paved with good intentions

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #181 on: 06 Dec 2010, 05:36:07 pm »
OK, I ran version 6 of the tool on my system (Q6600/8GB/8800GTX) under both WinXP32 as well as Win7-64. If you want me to (re-)run other versions of the tool, let me know. ;)

Both logs follow, one below the other, starting with the older OS, WinXP32:

Device: GeForce 8800 GTX, 1350 MHz clock, 768 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   18.3 GFlops   73.1 GB/s 1183.3ulps

 SumMax (    64)    1.3 GFlops    5.5 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    4.3 GFlops   17.6 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.3 GFlops   73.1 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         6.4 GFlops   26.1 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         8.1 GFlops   32.7 GB/s 121.7ulps



Then Win7-64:

Device: GeForce 8800 GTX, 1350 MHz clock, 731 MB memory.
Compute capability 1.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   18.1 GFlops   72.5 GB/s 1183.3ulps

 SumMax (    64)    1.1 GFlops    4.8 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    3.8 GFlops   15.4 GB/s


GetPowerSpectrum() choice for Opt1: 64 thrds/block
     64 threads:       18.1 GFlops   72.6 GB/s 121.7ulps


Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
         5.4 GFlops   21.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         6.6 GFlops   26.8 GB/s 121.7ulps
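
To put a number on the XP32 vs Win7-64 gap in the two logs above, here is a minimal Python sketch. It is not part of the test tool; the dictionary names and the `percent_slower` helper are made up for illustration, and the figures are simply the GFlops values copied from the logs.

```python
# GFlops figures copied from the two test #6 logs above (8800 GTX).
xp32 = {
    "PwrSpec stock": 18.3,
    "SumMax stock": 1.3,
    "PS+SuMx stock": 4.3,
    "Opt1 worst case": 6.4,
    "Opt1 best case": 8.1,
}
win7_64 = {
    "PwrSpec stock": 18.1,
    "SumMax stock": 1.1,
    "PS+SuMx stock": 3.8,
    "Opt1 worst case": 5.4,
    "Opt1 best case": 6.6,
}


def percent_slower(reference, other):
    """How far `other` falls below `reference`, in percent of `reference`."""
    return (reference - other) / reference * 100.0


def compare(xp, win7):
    """Per-test Win7 shortfall relative to XP, in percent."""
    return {name: percent_slower(xp[name], win7[name]) for name in xp}


gap = compare(xp32, win7_64)
for name, pct in gap.items():
    print(f"{name:16s} Win7-64 is {pct:5.1f}% below XP32")
```

On these numbers the plain PwrSpec kernel is nearly identical on both systems, while the tests that involve SumMax and its transfers show the larger gap.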


Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #182 on: 06 Dec 2010, 06:19:49 pm »
Ahhh, hi Patrick.  Looks like your card should still be able to use pinned host memory, but isn't  :( .  It indeed doesn't support mapped memory (a different kind), but the test didn't engage the pinned memory improvement because I need to change how I detect that feature.  It looks like I'm checking the wrong feature flags... ooops  ::)

Will make a #7 end of week, and pay special attention to making sure that engages properly on compute capability 1.0 cards (that don't support mapped memory).
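
The distinction between the two features can be sketched as follows. This is only a Python model of the decision logic, not the test's actual code: `DeviceProps` and `host_memory_features` are hypothetical names, `can_map_host_memory` stands in for the `canMapHostMemory` field of CUDA's `cudaDeviceProp`, and `pinned_alloc_ok` is an assumed input representing whether a page-locked allocation (e.g. via `cudaHostAlloc`) succeeded.

```python
from dataclasses import dataclass


@dataclass
class DeviceProps:
    """Hypothetical stand-in for the few cudaDeviceProp fields used here."""
    name: str
    major: int                  # compute capability, major part
    minor: int                  # compute capability, minor part
    can_map_host_memory: bool   # mirrors cudaDeviceProp::canMapHostMemory


def host_memory_features(props, pinned_alloc_ok):
    """Decide which host-memory features to engage.

    pinned_alloc_ok: whether a page-locked host allocation succeeded
    (assumed input; in CUDA this would come from the allocation call).
    """
    # Pinned memory does not depend on mapped-memory support.
    use_pinned = pinned_alloc_ok
    # Mapped memory needs both a pinned allocation and device support.
    use_mapped = pinned_alloc_ok and props.can_map_host_memory
    return {"pinned": use_pinned, "mapped": use_mapped}


# The 8800 GTX from the logs above: compute capability 1.0, no mapped memory.
g80 = DeviceProps("GeForce 8800 GTX", 1, 0, can_map_host_memory=False)
features = host_memory_features(g80, pinned_alloc_ok=True)
print(features)
```

The point of keeping the two flags separate is that a card without mapped memory still takes the pinned path.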

Cheers for finding the problem  ;)
« Last Edit: 06 Dec 2010, 06:24:18 pm by Jason G »

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #183 on: 06 Dec 2010, 08:21:04 pm »
I have no idea what I did, but you're quite welcome. ;)

Regards, Patrick.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #184 on: 06 Dec 2010, 08:41:10 pm »
Thanks,

    It's what you (the test #6 anyway) didn't do  :D

This line's missing:
Quote
Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
         1.5 GFlops    5.9 GB/s 121.7ulps
Every ifft average & peak OK
   64 threads, fftlen 64: (best case, nothing to update)
         1.6 GFlops    6.7 GB/s 121.7ulps

When operational, that feature seems to add a touch of throughput on both XP & Vista/Win7, and seems to close the performance difference we've been so worried about.  You should get a boost when I fix that.
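
For anyone checking their own output, a small Python sketch that looks for that line in a saved log. The marker text is taken from the quoted output above; the function name and the idea of saving the log to a string are assumptions for illustration.

```python
# Marker line printed by the test when the pinned-memory path is engaged
# (text copied from the quoted output above).
PINNED_MARKER = "PowerSpectrumSumMax array pinned in host memory."


def pinned_memory_engaged(log_text):
    """Return True if the log contains the pinned-memory marker line."""
    return any(line.strip() == PINNED_MARKER for line in log_text.splitlines())


# Excerpt in the shape of a log where the feature did not engage.
log_without = """Opt1 (PSmod3+SM): 64 thrds/block
   64 threads, fftlen 64: (worst case: full summax copy)
"""

# Excerpt in the shape of a log where it did.
log_with = """Opt1 (PSmod3+SM): 64 thrds/block
PowerSpectrumSumMax array pinned in host memory.
   64 threads, fftlen 64: (worst case: full summax copy)
"""

print(pinned_memory_engaged(log_without), pinned_memory_engaged(log_with))
```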

Jason
« Last Edit: 06 Dec 2010, 08:45:07 pm by Jason G »

Offline PatrickV2

  • Knight o' The Round Table
  • ***
  • Posts: 139
Re: [Split] PowerSpectrum Unit Test
« Reply #185 on: 07 Dec 2010, 03:49:52 am »
Ah, ok, thanks for the elaboration. Looking forward to test #7 then!

Regards, Patrick.

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: [Split] PowerSpectrum Unit Test
« Reply #186 on: 08 Dec 2010, 07:10:12 am »
Windows XP32. GTX 570. Nvidia Driver 263.09.


Device: GeForce GTX 570, 1464 MHz clock, 1280 MB memory.
Compute capability 2.0
Compiled with CUDA 3020.
      PowerSpectrum+summax Unit test #6 (pinned mem)
Stock:
 PwrSpec<    64>   25.6 GFlops  102.5 GB/s   0.0ulps

 SumMax (    64)    1.9 GFlops    7.9 GB/s
Every ifft average & peak OK

 PS+SuMx(    64)    6.2 GFlops   25.1 GB/s


GetPowerSpectrum() choice for Opt1: 256 thrds/block
    256 threads:       33.3 GFlops  133.3 GB/s 121.7ulps


Opt1 (PSmod3+SM): 256 thrds/block
PowerSpectrumSumMax array pinned in host memory.
  256 threads, fftlen 64: (worst case: full summax copy)
        10.9 GFlops   44.0 GB/s 121.7ulps
Every ifft average & peak OK
  256 threads, fftlen 64: (best case, nothing to update)
        13.5 GFlops   54.7 GB/s 121.7ulps
Please stop using this 1366x768 glare displays: http://www.facebook.com/home.php?sk=group_153240404724993

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #187 on: 08 Dec 2010, 08:47:31 am »
570 wooot!  ;D

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: [Split] PowerSpectrum Unit Test
« Reply #188 on: 08 Dec 2010, 08:50:00 am »
570 wooot!  ;D

Borrowed it from a friend. It's hot, almost non-overclockable, and slightly slower than 480.

I am really looking forward to AMD HD6950/6970 !

[EDIT] Seems I got a bad sample. I've seen reports where the 570 has been overclocked to 840@4250 (stock: 732@3800) with air cooling.
« Last Edit: 09 Dec 2010, 02:43:55 am by Frizz »

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #189 on: 08 Dec 2010, 08:51:55 am »
It's hot, almost non-overclockable

Why bother?  Harvesting faulty parts, you think?

[Edit:] that worst case is slightly better than my 480 worst case, but the best cases are inferior.   From the powerspectrum I see the constraint is memory ( again  ::) ) ... So indeed these may not be a good choice for seti in the short term ... probably do Batman really well though  ::)
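
One way to see the memory constraint in the posted numbers is the ratio of GFlops to GB/s for the PwrSpec lines. A minimal Python sketch follows; the figures are copied from the logs in this thread, while the list layout and function name are just for illustration.

```python
# (label, GFlops, GB/s) for the PwrSpec lines posted in this thread.
pwrspec_runs = [
    ("8800 GTX XP32, stock", 18.3, 73.1),
    ("8800 GTX Win7-64, stock", 18.1, 72.5),
    ("GTX 570 XP32, stock", 25.6, 102.5),
    ("GTX 570 XP32, 256 threads", 33.3, 133.3),
]


def flops_per_byte(gflops, gb_per_s):
    """Arithmetic intensity actually achieved: floating point ops per byte moved."""
    return gflops / gb_per_s


ratios = {label: flops_per_byte(gf, gb) for label, gf, gb in pwrspec_runs}
for label, ratio in ratios.items():
    print(f"{label:28s} {ratio:.3f} flops/byte")
```

Every run lands at about 0.25 flops per byte, so the GFlops figure rises and falls with the bandwidth figure, which is consistent with the kernel being limited by memory rather than by arithmetic.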
« Last Edit: 08 Dec 2010, 08:56:17 am by Jason G »

Offline Frizz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 541
Re: [Split] PowerSpectrum Unit Test
« Reply #190 on: 08 Dec 2010, 09:16:19 am »
probably do Batman really well though  ::)

LOL

Yeah ... memory. They chopped the memory interface. I guess they did this so it doesn't get too close to the 580 - and not too far ahead of the 480.

GTX570: 320bit
GTX480 & GTX580: 384bit
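
As a rough illustration of what the narrower bus costs, peak bandwidth is bus width in bytes times the effective memory data rate. The Python sketch below assumes the "3800" in the stock figure quoted earlier (732@3800) is the effective data rate in MT/s; the 384-bit line is a hypothetical same-clock comparison, not the actual spec of the 480 or 580.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_rate_mt_s):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits
    effective_rate_mt_s: effective data rate in mega-transfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mt_s * 1e6 / 1e9


# GTX 570 at the stock memory rate quoted in this thread (assumed to be MT/s).
gtx570 = peak_bandwidth_gb_s(320, 3800)

# Hypothetical: a 384-bit interface at the same data rate, to isolate bus width.
wide_bus_same_rate = peak_bandwidth_gb_s(384, 3800)

print(f"320-bit: {gtx570:.1f} GB/s, 384-bit at same rate: {wide_bus_same_rate:.1f} GB/s")
print(f"ratio: {gtx570 / wide_bus_same_rate:.3f}")
```

At equal memory clocks the 320-bit interface delivers 320/384 of the bandwidth, i.e. about one sixth less.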


Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #191 on: 08 Dec 2010, 08:35:46 pm »
@All:  In the meantime, having identified the major issues at play with these code areas, along with appropriate techniques to use, I have come up with some ideas for a major redesign of the FFT->Powerspectrum->Summax(reduction)->FindSpikes pipeline, which currently accounts for roughly 40%-60% of processing.
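
For readers unfamiliar with the stages, here is a minimal pure-Python sketch of what the later three steps compute. It is an illustration only, not the SETI or test code: the FFT itself is left out (the input is assumed to be already transformed complex bins), the function names are made up, and the threshold value is a hypothetical placeholder.

```python
def power_spectrum(freq_data):
    """Power of each complex frequency bin: re^2 + im^2."""
    return [z.real * z.real + z.imag * z.imag for z in freq_data]


def sum_max(power, fftlen):
    """Reduction per FFT: (sum, max, bin of max) for each block of fftlen bins.

    Assumes len(power) is a multiple of fftlen.
    """
    results = []
    for start in range(0, len(power), fftlen):
        block = power[start:start + fftlen]
        peak = max(block)
        results.append((sum(block), peak, block.index(peak)))
    return results


def find_spikes(reduced, fftlen, threshold):
    """Report (fft index, bin, peak/mean) where the peak exceeds threshold x mean."""
    spikes = []
    for i, (total, peak, peak_bin) in enumerate(reduced):
        mean = total / fftlen
        if mean > 0 and peak / mean > threshold:
            spikes.append((i, peak_bin, peak / mean))
    return spikes


# Two length-4 "FFT outputs": one flat, one with a strong bin at index 2.
bins = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j,
        1 + 0j, 1 + 0j, 3 + 4j, 1 + 0j]
power = power_spectrum(bins)
reduced = sum_max(power, fftlen=4)
spikes = find_spikes(reduced, fftlen=4, threshold=3.0)  # threshold is hypothetical
print(spikes)
```

In this form only the per-FFT sums and maxima need to leave the reduction step, which is why the amount of summax data copied back to the host matters in the worst-case and best-case figures above.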

I'll change the format of the next test quite a bit, and spend time tomorrow to get things underway toward #7.

Jason
« Last Edit: 08 Dec 2010, 09:00:40 pm by Jason G »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: [Split] PowerSpectrum Unit Test
« Reply #192 on: 09 Dec 2010, 06:17:07 am »
I would suggest testing these code samples at different ratios of GPU frequency to memory frequency.
SubSpace's experiment with the beta OpenCL apps showed that this is a very informative approach.
(He established that the HD5 app beats the usual OpenCL MB app when the GPU engine is relatively fast and memory relatively slow, while the usual app wins when GPU clocks are lowered.)
I think that largely explains why other testers see longer execution times on VLAR with HD5 than with the usual app: their GPUs are not as fast relative to their memory.
The influence of memory can be brought out clearly this way.
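
Why such a clock sweep is informative can be sketched with a simple roofline-style model in Python: the run time is set by whichever of compute and memory traffic takes longer. This is a simplified assumption, not a measurement, and all device figures below are hypothetical; only the 0.25 flops/byte workload ratio is taken from the PwrSpec lines posted in this thread.

```python
def stage_times(flops, bytes_moved, core_gflops, mem_gb_s):
    """Return (compute_time, memory_time) in seconds for one kernel."""
    compute_time = flops / (core_gflops * 1e9)
    memory_time = bytes_moved / (mem_gb_s * 1e9)
    return compute_time, memory_time


def kernel_time(flops, bytes_moved, core_gflops, mem_gb_s):
    """Simplified model: the slower of compute and memory sets the run time."""
    return max(stage_times(flops, bytes_moved, core_gflops, mem_gb_s))


def limiting_factor(flops, bytes_moved, core_gflops, mem_gb_s):
    """Name the side that dominates under the model."""
    compute_time, memory_time = stage_times(flops, bytes_moved, core_gflops, mem_gb_s)
    return "memory" if memory_time >= compute_time else "compute"


# Workload at 0.25 flops/byte; the device numbers are hypothetical.
flops, bytes_moved = 1e9, 4e9
base = kernel_time(flops, bytes_moved, core_gflops=1000, mem_gb_s=100)
faster_core = kernel_time(flops, bytes_moved, core_gflops=1200, mem_gb_s=100)
faster_mem = kernel_time(flops, bytes_moved, core_gflops=1000, mem_gb_s=120)

print(limiting_factor(flops, bytes_moved, 1000, 100))
print(f"base {base:.4f}s, +20% core {faster_core:.4f}s, +20% memory {faster_mem:.4f}s")
```

Under this model a memory-bound kernel does not speed up when only the core clock rises, but does when the memory clock rises, so timing the same code at several clock ratios shows which side is the limit.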

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: [Split] PowerSpectrum Unit Test
« Reply #193 on: 09 Dec 2010, 03:29:04 pm »
Yes, in fact that's exactly what happened to confirm the memory-bound nature of what's going on (from yet another angle).

Steve's 480 core is clocked considerably higher than mine, yet he was initially achieving lower throughput than my card.  He tweaked his memory throughput for some improvement. 

After that, a discrepancy in throughput between XP and Win7 was noted, somewhere around the familiar 10%.  I added pinned memory for the transfers, to try to hide them.  With Ghost's help, in the heavy transfer case (worst case: full summax array copy, as with stock code) the XP-Win7 difference narrowed to ~4% or less, while WDDM proved more efficient at the raw processing in the best case (no transfers needed).

Now Steve's 480 achieves some 27% more throughput, in the worst case,  than mine does.  I take this as an indication that the transfer hiding is shifting the bottleneck around as intended, and that it's time to move on to more sophisticated code portions with the acquired tools & techniques.

Still learning stuff every day with these things.

Jason

Offline SciManStev

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 263
Re: [Split] PowerSpectrum Unit Test
« Reply #194 on: 09 Dec 2010, 06:33:02 pm »
I have kicked my 480 memory speed up to 1975 MHz, with plenty of room to go. The 480 cores are clocked at 860 MHz. I tried to increase my CPU memory, but 1774 MHz is as fast as I can get it. I was able to increase my CPU speed to 4.26 GHz with hyperthreading enabled, while maintaining about 57°C to 60°C core temps.

Steve

 
