Author Topic: Better sleep on Windows - new round  (Read 44150 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Better sleep on Windows - new round
« on: 17 Aug 2016, 09:12:15 am »
Here I'll collect all new attempts to free the CPU for time intervals shorter than a single millisecond.

The spin loop includes a check of the GPU event's ready state (so some overhead is implied). The code section for this experiment is:

Code: [Select]
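if(use_sleep){//R: spin-wait until readback has finished; the yield call below is what the test builds vary (SwitchToThread here, Sleep(0)/Sleep(1) in the other builds)
  //cq, err, use_sleep_ex, verbose, OCL_LOG_ERR and SleepQuantumCounter come from the surrounding app code
  cl_event ev;
  clEnqueueMarker(cq,&ev);//marker completes only after all previously enqueued commands have completed
  clFlush(cq);//submit the queue to the device before polling
  size_t wait_time=0;//number of yield iterations
  cl_int ret=CL_QUEUED;//execution status of the marker
  do{
    SwitchToThread();/*nanosleep(100);*//*Sleep(use_sleep_ex);*/
    wait_time++;
    err=clGetEventInfo(ev,CL_EVENT_COMMAND_EXECUTION_STATUS,sizeof(ret),&ret,NULL);
  }while(err==CL_SUCCESS && ret>CL_COMPLETE);//CL_COMPLETE is 0; negative status values are errors and also end the loop
  //profiling timestamps require a queue created with CL_QUEUE_PROFILING_ENABLE
  cl_ulong start=0,end=0;
  err=clGetEventProfilingInfo(ev,CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong),&start,NULL);
  err|=clGetEventProfilingInfo(ev,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&end,NULL);
  OCL_LOG_ERR("clGetEventProfilingInfo");
  float cur_quantum=(end-start)/(wait_time*1e6);//ns -> ms, averaged over iterations (wait_time>=1 after do-while)
  clReleaseEvent(ev);
  if(use_sleep_ex==1 && wait_time>7)SleepQuantumCounter::update(cur_quantum);//skip very short waits
  if(verbose==6){
    if(use_sleep_ex==1)fprintf(stderr,"current sleep quantum %2.4gms\t",cur_quantum);
    fprintf(stderr,"Sleep before triplet result map: Awaited %lu iterations for completion; elapsed %2.4gms\n",
      (unsigned long)wait_time,(end-start)/1e6);
  }
}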

While the counter provides the average sleep quantum, -v 6 gives per-instance results and allows manual averaging "by sight".
So I'll use a VHAR (AR=0.75) task with the SoG flavour, where spins longer than a second can occur on C-60 hardware.
The OS is Win7 x64.

Previous results were:
Sleep(1) can be 15 ms long on the C-60, too large a quantum for many kernels.
Adding -high_prec_timer makes it ~1 ms long, which is good enough, but changing the system-wide multimedia timer could hurt the performance of the whole host.
Using a nanosleep() implementation for Windows based on a waitable timer (https://gist.github.com/Youka/4153f12cf2e17a77314c) gave the same ~1 ms quantum (and the overhead of such a call is expected to be higher than that of Sleep(1), so no advantage here).
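For readers who want to reproduce this kind of quantum measurement outside Windows, here is a minimal POSIX sketch (an analogue, not the app's code): it requests a 1 ms nanosleep() repeatedly and divides the total CLOCK_MONOTONIC time by the number of calls, the same "elapsed / iterations" averaging used for cur_quantum in the code above. The helper names are hypothetical, and nanosleep() only stands in for Sleep(1); how far the result lands above 1 ms depends on the OS timer granularity.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Current CLOCK_MONOTONIC time in milliseconds. */
static double mono_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical helper: average real duration (ms) of `calls` (> 0) requests
   to sleep for `req_ns` nanoseconds. nanosleep() may oversleep; by how much
   depends on the OS timer granularity, much like Sleep(1) on Windows. */
static double measure_sleep_quantum_ms(long req_ns, int calls) {
    const struct timespec req = { .tv_sec = req_ns / 1000000000L,
                                  .tv_nsec = req_ns % 1000000000L };
    double start = mono_ms();
    for (int i = 0; i < calls; i++) {
        nanosleep(&req, NULL); /* interruption by a signal is ignored in this sketch */
    }
    return (mono_ms() - start) / calls; /* elapsed / iterations, as in cur_quantum */
}
```

For example, measure_sleep_quantum_ms(1000000L, 50) averages 50 one-millisecond requests; a result well above 1.0 would point to a coarse timer, similar to the 15 ms Sleep(1) seen on the C-60 without -high_prec_timer.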

Per Shaggie76's suggestion (http://setiathome.berkeley.edu/forum_thread.php?id=79954&postid=1809886), I'll explore SwitchToThread behavior under different host load modes.

For this experiment, the C-60 CPU frequency was fixed at 1 GHz, even in the P0 and P2 states, using the BrazosTweaker app.
The exact tuning line is: -period_iterations_num 4 -v 6 -use_sleep -high_prec_timer
« Last Edit: 17 Aug 2016, 10:10:18 am by Raistmer »

Raistmer
Re: Better sleep on Windows - new round
« Reply #1 on: 17 Aug 2016, 09:29:00 am »
1. Tight loop (without any sleep attempts).

CPU idle:
Typical sleep quantum is 9.57e-5 ms, i.e. ~100 ns: quite low overhead inside the spin loop (and, of course, full-core CPU usage).

CPU busy with MB (idle-priority processes):
Roughly the same 100 ns per loop and 100% core load by the GPU app.

2. Sleep(0) inside loop.

CPU idle:
Quantum size ~850 ns and full CPU core consumption by the GPU app.

CPU busy with MB:
Quantum size ~2 ms and ~2-3% CPU usage by the GPU app: a good mode.

Conclusion from this part:
Sleep(0) yields to lower-priority processes (!). The GPU app process runs at below-normal priority while CPU MB runs at idle priority (the lowest possible), and CPU MB still takes almost the full CPU to run.


3. Sleep(1) inside loop.

CPU idle:
Quantum size 1.0 ms, CPU consumption <~2%.

CPU busy with CPU MB:
Quantum size varies from 1.0 to 1.5 ms, but most readings are still 1.0 ms; CPU consumption by the GPU app <2%.

Conclusion for this part:
Sleep(1) with the high-precision multimedia timer is more stable than Sleep(0) in both CPU idle and busy modes: it saves CPU cycles and gives quite stable yield intervals.


4. SwitchToThread inside loop.

CPU idle:
Quantum size ~660 ns (less overhead than Sleep(0)); full-core CPU consumption (same as Sleep(0) on an idle CPU).

CPU busy:
Quantum size varies from 0.01 ms to ~2.7 ms, with most readings near 2.3 ms; CPU consumption <2%.

Summary for this part: in idle CPU mode, SwitchToThread (STT) is as useless as Sleep(0), with slightly less overhead. In busy CPU mode, STT and Sleep(0) behave very similarly. Full-task benchmarks are needed to see which is better, but currently neither seems better than Sleep(1).
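The four variants compared above differ only in what the spin loop does between status polls. The sketch below shows that structure in portable C under stated assumptions: the GPU event is replaced by a hypothetical work_done() deadline check, sched_yield() is used as a rough POSIX stand-in for SwitchToThread()/Sleep(0), and a 1 ms nanosleep() stands in for Sleep(1). It returns the iteration count and the per-iteration quantum, mirroring wait_time and cur_quantum; none of these names come from the app.

```c
#define _POSIX_C_SOURCE 199309L
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical yield strategies mirroring the variants tested in this thread. */
typedef enum { WAIT_TIGHT, WAIT_YIELD, WAIT_SLEEP_1MS } wait_mode;

typedef struct {
    size_t iterations;  /* like wait_time in the OpenCL snippet */
    double elapsed_ms;  /* total time spent waiting */
    double quantum_ms;  /* elapsed / iterations, like cur_quantum */
} wait_stats;

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Stand-in for polling CL_EVENT_COMMAND_EXECUTION_STATUS: the "GPU work"
   counts as complete once the deadline has passed. */
static int work_done(double deadline_ms) {
    return now_ms() >= deadline_ms;
}

static wait_stats spin_wait(wait_mode mode, double work_ms) {
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
    double start = now_ms();
    double deadline = start + work_ms;
    wait_stats s = { 0, 0.0, 0.0 };
    do {
        switch (mode) {
        case WAIT_TIGHT: break;                               /* pure busy-wait */
        case WAIT_YIELD: sched_yield(); break;                /* rough analogue of SwitchToThread()/Sleep(0) */
        case WAIT_SLEEP_1MS: nanosleep(&one_ms, NULL); break; /* rough analogue of Sleep(1) */
        }
        s.iterations++;
    } while (!work_done(deadline));
    s.elapsed_ms = now_ms() - start;
    s.quantum_ms = s.elapsed_ms / (double)s.iterations; /* iterations >= 1 thanks to do-while */
    return s;
}
```

With WAIT_TIGHT the quantum is expected to be very small while one core stays fully busy; with WAIT_SLEEP_1MS each iteration lasts at least 1 ms, trading wake-up latency for free CPU cycles, which is the same trade-off measured above.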

The next post will compare Sleep(0), Sleep(1) and SwitchToThread() for a PG-VHAR task on a fully loaded CPU. It will take some time to run.
« Last Edit: 17 Aug 2016, 11:21:07 am by Raistmer »

Raistmer
Re: Better sleep on Windows - new round
« Reply #2 on: 17 Aug 2016, 11:41:03 am »
For this test, the host was rebooted to restore the default multimedia timer behavior.
The -high_prec_timer option will be added in the next bench run.
No tuning line at all, so fully default settings.
The 1 GHz CPU frequency lock was reapplied after the reboot.
The binaries used for this test are attached, so readers can repeat it on any host equipped with an ATi GPU or an NV Fermi-or-newer GPU.

And, finally, results from C-60:

CPU busy, no special changes to the mm timer (and no sleep at all):

WU : AR075.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 17457.726 secs
      CPU 355.885 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  :
  Elapsed 16464.784 secs, speedup: 5.69%  ratio: 1.06x
      CPU 417.849 secs, speedup: -17.41%  ratio: 0.85x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  :
  Elapsed 16283.808 secs, speedup: 6.72%  ratio: 1.07x
      CPU 413.013 secs, speedup: -16.05%  ratio: 0.86x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  :
  Elapsed 16677.490 secs, speedup: 4.47%  ratio: 1.05x
      CPU 433.808 secs, speedup: -21.90%  ratio: 0.82x
 
WU : AR075_1.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 16971.441 secs
      CPU 338.897 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  :
  Elapsed 16420.791 secs, speedup: 3.24%  ratio: 1.03x
      CPU 434.338 secs, speedup: -28.16%  ratio: 0.78x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  :
  Elapsed 16908.043 secs, speedup: 0.37%  ratio: 1.00x
      CPU 455.117 secs, speedup: -34.29%  ratio: 0.74x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  :
  Elapsed 16489.931 secs, speedup: 2.84%  ratio: 1.03x
      CPU 437.832 secs, speedup: -29.19%  ratio: 0.77x
 
WU : PG1327_v8.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 880.452 secs
      CPU 59.764 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  :
  Elapsed 1050.319 secs, speedup: -19.29%  ratio: 0.84x
      CPU 79.514 secs, speedup: -33.05%  ratio: 0.75x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  :
  Elapsed 1042.097 secs, speedup: -18.36%  ratio: 0.84x
      CPU 78.188 secs, speedup: -30.83%  ratio: 0.76x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  :
  Elapsed 1046.668 secs, speedup: -18.88%  ratio: 0.84x
      CPU 77.891 secs, speedup: -30.33%  ratio: 0.77x
 
WU : PG1327_v8_1.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 1052.627 secs
      CPU 70.793 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  :
  Elapsed 1049.991 secs, speedup: 0.25%  ratio: 1.00x
      CPU 77.922 secs, speedup: -10.07%  ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  :
  Elapsed 1040.366 secs, speedup: 1.16%  ratio: 1.01x
      CPU 77.376 secs, speedup: -9.30%  ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  :
  Elapsed 1040.912 secs, speedup: 1.11%  ratio: 1.01x
      CPU 77.969 secs, speedup: -10.14%  ratio: 0.91x
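The speedup and ratio columns in these logs appear to follow speedup = (ref - test) / ref * 100 and ratio = ref / test, where ref is the first (reference) app of each block; this is inferred from the logged numbers, not taken from the bench tool's source. A minimal C sketch of that arithmetic, with hypothetical helper names:

```c
/* Hypothetical helpers reproducing the "speedup" and "ratio" columns
   (formulas inferred from the logged numbers, not from the bench tool). */
static double speedup_percent(double ref_secs, double test_secs) {
    return (ref_secs - test_secs) / ref_secs * 100.0; /* positive = less time than the reference */
}

static double ratio(double ref_secs, double test_secs) {
    return ref_secs / test_secs; /* > 1.00x = less time than the reference */
}
```

For example, speedup_percent(17457.726, 16464.784) gives about 5.69 and ratio(17457.726, 16464.784) about 1.06, matching the first AR075.wu elapsed row; for CPU time, speedup_percent(355.885, 417.849) gives about -17.41, so a negative "speedup" means more CPU time than the reference.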

Summary: on a busy system the variation in results is too large to clearly discriminate between these sleep versions. But the tendency is that the current Sleep(1) is an adequate approach. SwitchToThread could be used in other places to extract even more free CPU cycles from the GPU app, but it can't replace Sleep(1) in the bulk sleep areas.
This test shows the noise level ONLY, because the differing code was not used at all.
« Last Edit: 20 Aug 2016, 01:38:55 pm by Raistmer »

Offline Mike

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 2427
Re: Better sleep on Windows - new round
« Reply #3 on: 18 Aug 2016, 09:20:04 am »
Hello my name is Mike

Here is a bench of all sleep variants on my R9 380.
Default settings, system idle.

WU : PG0009_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 78.072 secs
      CPU 37.861 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe   :
  Elapsed 105.248 secs, speedup: -34.81%  ratio: 0.74x
      CPU 45.568 secs, speedup: -20.36%  ratio: 0.83x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe   :
  Elapsed 82.028 secs, speedup: -5.07%  ratio: 0.95x
      CPU 37.081 secs, speedup: 2.06%  ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe   :
  Elapsed 81.469 secs, speedup: -4.35%  ratio: 0.96x
      CPU 36.879 secs, speedup: 2.59%  ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe   :
  Elapsed 80.298 secs, speedup: -2.85%  ratio: 0.97x
      CPU 38.610 secs, speedup: -1.98%  ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe   :
  Elapsed 80.357 secs, speedup: -2.93%  ratio: 0.97x
      CPU 37.971 secs, speedup: -0.29%  ratio: 1.00x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe   :
  Elapsed 80.881 secs, speedup: -3.60%  ratio: 0.97x
      CPU 38.002 secs, speedup: -0.37%  ratio: 1.00x
 
WU : PG0395_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 54.637 secs
      CPU 35.787 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe   :
  Elapsed 57.780 secs, speedup: -5.75%  ratio: 0.95x
      CPU 36.208 secs, speedup: -1.18%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe   :
  Elapsed 57.933 secs, speedup: -6.03%  ratio: 0.94x
      CPU 36.161 secs, speedup: -1.05%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe   :
  Elapsed 57.769 secs, speedup: -5.73%  ratio: 0.95x
      CPU 35.459 secs, speedup: 0.92%  ratio: 1.01x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe   :
  Elapsed 56.500 secs, speedup: -3.41%  ratio: 0.97x
      CPU 38.251 secs, speedup: -6.89%  ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe   :
  Elapsed 57.102 secs, speedup: -4.51%  ratio: 0.96x
      CPU 38.064 secs, speedup: -6.36%  ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe   :
  Elapsed 56.563 secs, speedup: -3.53%  ratio: 0.97x
      CPU 38.454 secs, speedup: -7.45%  ratio: 0.93x
 
WU : PG0444_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 53.981 secs
      CPU 35.085 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe   :
  Elapsed 57.213 secs, speedup: -5.99%  ratio: 0.94x
      CPU 36.520 secs, speedup: -4.09%  ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe   :
  Elapsed 56.641 secs, speedup: -4.93%  ratio: 0.95x
      CPU 35.475 secs, speedup: -1.11%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe   :
  Elapsed 57.382 secs, speedup: -6.30%  ratio: 0.94x
      CPU 35.475 secs, speedup: -1.11%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe   :
  Elapsed 55.102 secs, speedup: -2.08%  ratio: 0.98x
      CPU 38.329 secs, speedup: -9.25%  ratio: 0.92x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe   :
  Elapsed 56.778 secs, speedup: -5.18%  ratio: 0.95x
      CPU 37.908 secs, speedup: -8.05%  ratio: 0.93x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe   :
  Elapsed 55.657 secs, speedup: -3.10%  ratio: 0.97x
      CPU 38.033 secs, speedup: -8.40%  ratio: 0.92x
 
WU : PG1327_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 62.481 secs
      CPU 36.145 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe   :
  Elapsed 66.964 secs, speedup: -7.17%  ratio: 0.93x
      CPU 36.941 secs, speedup: -2.20%  ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe   :
  Elapsed 67.738 secs, speedup: -8.41%  ratio: 0.92x
      CPU 36.941 secs, speedup: -2.20%  ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe   :
  Elapsed 68.462 secs, speedup: -9.57%  ratio: 0.91x
      CPU 36.379 secs, speedup: -0.65%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe   :
  Elapsed 66.963 secs, speedup: -7.17%  ratio: 0.93x
      CPU 42.323 secs, speedup: -17.09%  ratio: 0.85x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe   :
  Elapsed 66.555 secs, speedup: -6.52%  ratio: 0.94x
      CPU 42.198 secs, speedup: -16.75%  ratio: 0.86x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe   :
  Elapsed 66.071 secs, speedup: -5.75%  ratio: 0.95x
      CPU 41.917 secs, speedup: -15.97%  ratio: 0.86x
 
To me the picture is quite clear.
The faster the GPU apps get, the more CPU they use.
I don't think changing the high-precision timer is a good idea for stock development,
especially for hosts that are used for things other than crunching.

Now going to the hospital to see my new grandchild.
It's just 10 hours old.
« Last Edit: 18 Aug 2016, 09:32:47 am by Mike »

Raistmer
Re: Better sleep on Windows - new round
« Reply #4 on: 18 Aug 2016, 09:31:14 am »

Quote
Now going to the hospital to see my new grand child.
Its just 10 hours old.

Congrats, Mike! :)
I'll look at the test in detail meanwhile.

Mike
Re: Better sleep on Windows - new round
« Reply #5 on: 18 Aug 2016, 09:33:19 am »
Oops, forgot to attach the bench log.

Done.

Raistmer
Re: Better sleep on Windows - new round
« Reply #6 on: 18 Aug 2016, 09:33:57 am »

Quote
I don`t think changing high prec timer is a good idea for stock development.
Especially for hosts which are used for other things than crunching.

Yep, I will not make this the default, at least not before quite prolonged testing. It's a system-wide change... So -high_prec_timer will not be enabled by default in the next release.

Raistmer
Re: Better sleep on Windows - new round
« Reply #7 on: 18 Aug 2016, 09:43:15 am »
In the next testing window, please also try with a busy CPU and the mandatory -use_sleep option.

There is one very important difference between your GPU and my C-60 regarding this test. My C-60 is one of the slowest ATi devices, so it gets the low-perf path with sleep enabled by default.
Hence I omitted it in testing (still running on the netbook, BTW).

And all changes between these builds are wrapped in if(use_sleep), so enabling sleep is a requirement.

EDIT: and as usual these days, it seems you need some full-length tasks, not the PG set; your GPU is too fast.
For example, all the CPU time you see most probably came from startup code, not from icfft loop processing.
« Last Edit: 18 Aug 2016, 09:46:38 am by Raistmer »

Mike
Re: Better sleep on Windows - new round
« Reply #8 on: 18 Aug 2016, 12:58:35 pm »
Quote
In next testing window please try with CPU busy and mandatory -use_sleep option too.

There is one very important difference between your GPU and my C-60 regarding this test. My C-60 one of slowest ATi devices so get low-perf path with sleep enabled by default.
Hence I omit it in testing (still going, BTW, on netbook).

And all changes between these builds embraced with if(use_sleep) so to enable sleep is requirement.

EDIT: and as usual these days, seems you need some full-length tasks, not PG set. GPU too fast.
For example, all CPU time you see most probably came from startup code, not icfft loop processing.

You said no tuning line at all in your post above.

That's what I did. :(

Quote
No tuning line at all so fully default.
CPU fixation to 1GHz reapplied after reboot.
Binaries used for this test attached so reader can repeat it on any ATi GPU equipped host.

A short instruction on what you want would be helpful.

Will test with a busy CPU and -use_sleep.

« Last Edit: 18 Aug 2016, 01:00:36 pm by Mike »

Mike
Re: Better sleep on Windows - new round
« Reply #9 on: 18 Aug 2016, 04:06:29 pm »

Here is my bench with a busy CPU using -use_sleep.

WU : PG0009_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 78.072 secs
      CPU 37.861 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe  -use_sleep :
  Elapsed 130.747 secs, speedup: -67.47%  ratio: 0.60x
      CPU 37.113 secs, speedup: 1.98%  ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe  -use_sleep :
  Elapsed 86.725 secs, speedup: -11.08%  ratio: 0.90x
      CPU 40.092 secs, speedup: -5.89%  ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe  -use_sleep :
  Elapsed 87.227 secs, speedup: -11.73%  ratio: 0.90x
      CPU 39.359 secs, speedup: -3.96%  ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  -use_sleep :
  Elapsed 85.652 secs, speedup: -9.71%  ratio: 0.91x
      CPU 41.902 secs, speedup: -10.67%  ratio: 0.90x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  -use_sleep :
  Elapsed 84.263 secs, speedup: -7.93%  ratio: 0.93x
      CPU 39.811 secs, speedup: -5.15%  ratio: 0.95x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  -use_sleep :
  Elapsed 85.323 secs, speedup: -9.29%  ratio: 0.92x
      CPU 41.886 secs, speedup: -10.63%  ratio: 0.90x
 
WU : PG0395_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 54.637 secs
      CPU 35.787 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe  -use_sleep :
  Elapsed 65.286 secs, speedup: -19.49%  ratio: 0.84x
      CPU 34.679 secs, speedup: 3.10%  ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe  -use_sleep :
  Elapsed 61.540 secs, speedup: -12.63%  ratio: 0.89x
      CPU 35.038 secs, speedup: 2.09%  ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe  -use_sleep :
  Elapsed 62.675 secs, speedup: -14.71%  ratio: 0.87x
      CPU 34.913 secs, speedup: 2.44%  ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  -use_sleep :
  Elapsed 58.698 secs, speedup: -7.43%  ratio: 0.93x
      CPU 49.218 secs, speedup: -37.53%  ratio: 0.73x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  -use_sleep :
  Elapsed 59.481 secs, speedup: -8.87%  ratio: 0.92x
      CPU 40.966 secs, speedup: -14.47%  ratio: 0.87x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  -use_sleep :
  Elapsed 59.418 secs, speedup: -8.75%  ratio: 0.92x
      CPU 49.094 secs, speedup: -37.18%  ratio: 0.73x
 
WU : PG0444_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 53.981 secs
      CPU 35.085 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe  -use_sleep :
  Elapsed 62.600 secs, speedup: -15.97%  ratio: 0.86x
      CPU 34.476 secs, speedup: 1.74%  ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe  -use_sleep :
  Elapsed 61.373 secs, speedup: -13.69%  ratio: 0.88x
      CPU 35.584 secs, speedup: -1.42%  ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe  -use_sleep :
  Elapsed 61.562 secs, speedup: -14.04%  ratio: 0.88x
      CPU 35.943 secs, speedup: -2.45%  ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  -use_sleep :
  Elapsed 57.967 secs, speedup: -7.38%  ratio: 0.93x
      CPU 48.735 secs, speedup: -38.91%  ratio: 0.72x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  -use_sleep :
  Elapsed 58.220 secs, speedup: -7.85%  ratio: 0.93x
      CPU 40.295 secs, speedup: -14.85%  ratio: 0.87x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  -use_sleep :
  Elapsed 58.184 secs, speedup: -7.79%  ratio: 0.93x
      CPU 48.329 secs, speedup: -37.75%  ratio: 0.73x
 
WU : PG1327_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 62.481 secs
      CPU 36.145 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe  -use_sleep :
  Elapsed 69.000 secs, speedup: -10.43%  ratio: 0.91x
      CPU 39.624 secs, speedup: -9.63%  ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe  -use_sleep :
  Elapsed 69.452 secs, speedup: -11.16%  ratio: 0.90x
      CPU 38.579 secs, speedup: -6.73%  ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe  -use_sleep :
  Elapsed 68.927 secs, speedup: -10.32%  ratio: 0.91x
      CPU 37.846 secs, speedup: -4.71%  ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe  -use_sleep :
  Elapsed 67.176 secs, speedup: -7.51%  ratio: 0.93x
      CPU 44.101 secs, speedup: -22.01%  ratio: 0.82x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe  -use_sleep :
  Elapsed 67.968 secs, speedup: -8.78%  ratio: 0.92x
      CPU 43.836 secs, speedup: -21.28%  ratio: 0.82x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe  -use_sleep :
  Elapsed 67.928 secs, speedup: -8.72%  ratio: 0.92x
      CPU 43.477 secs, speedup: -20.28%  ratio: 0.83x
 

Raistmer
Re: Better sleep on Windows - new round
« Reply #10 on: 19 Aug 2016, 04:31:00 am »
Damn, I forgot that the ATi low-perf path, unlike the NV one, doesn't enable sleep. My two-day C-60 test is screwed :-\

Raistmer
Re: Better sleep on Windows - new round
« Reply #11 on: 19 Aug 2016, 04:48:24 am »
From Mike's run:

PG1327
Sleep0: class SleepQuantum:      total=43.846642,   N=32,   <>=1.3702075,   min=0.93442535   max=1.6681368
Sleep1: class SleepQuantum:      total=43.289009,   N=31,   <>=1.3964196,   min=1.1869471   max=1.8000549
SwitchTothread:class SleepQuantum:      total=44.513672,   N=32,   <>=1.3910522,   min=0.948681   max=1.8100463

Summary: we should forget about the PG set for this GPU, and especially for this test. There are only ~30 occurrences for the whole task, and not even all of them were modified.
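The SleepQuantum lines report a running total, the number of updates N, the mean <> and the min/max; the mean is simply total / N (for the Sleep0 run, 43.846642 / 32 is about 1.3702). Below is a hypothetical C counterpart of such a counter, assuming it only accumulates these statistics; the app's actual SleepQuantumCounter is C++ and its implementation is not shown in this thread.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical counter: running total, number of updates N, min and max. */
typedef struct {
    double total;
    unsigned long n;
    double min;
    double max;
} sleep_quantum_stats;

static void sq_init(sleep_quantum_stats *s) {
    s->total = 0.0;
    s->n = 0;
    s->min = DBL_MAX;   /* any real sample will be smaller */
    s->max = -DBL_MAX;  /* any real sample will be larger */
}

/* Record one per-wait quantum (ms), like SleepQuantumCounter::update(cur_quantum). */
static void sq_update(sleep_quantum_stats *s, double quantum_ms) {
    s->total += quantum_ms;
    s->n++;
    if (quantum_ms < s->min) s->min = quantum_ms;
    if (quantum_ms > s->max) s->max = quantum_ms;
}

/* Mean quantum, reported as <> in the log lines; 0 if nothing was recorded. */
static double sq_mean(const sleep_quantum_stats *s) {
    return s->n ? s->total / (double)s->n : 0.0;
}

/* Print in a layout similar to the log lines quoted above. */
static void sq_print(const sleep_quantum_stats *s, FILE *out) {
    fprintf(out, "class SleepQuantum:      total=%.8g,   N=%lu,   <>=%.8g,   min=%.8g   max=%.8g\n",
            s->total, s->n, sq_mean(s), s->min, s->max);
}
```

In the snippet at the top of the thread the update is only made when wait_time>7, so N counts only waits that took more than 7 iterations; a small N means the mean rests on few samples.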


@Mike please repeat a similar test at the next opportunity with the task attached here. I hope it runs longer and gives more chances to test.

P.S. From my failed C-60 test one can conclude that ATi is not very suitable for this test: the ATi runtime frees the CPU well enough that non-sleep results blend in with sleep ones. I'll provide an NV flavour version soon too.
« Last Edit: 19 Aug 2016, 04:53:24 am by Raistmer »

Raistmer
Re: Better sleep on Windows - new round
« Reply #12 on: 19 Aug 2016, 06:50:08 am »
Binaries updated: http://lunatics.kwsn.info/index.php/topic,1812.msg61017.html#msg61017
-both ATi and NV flavors
-all occurrences changed, so the SleepQuantum counter now really reflects usage of the particular sleep method (Sleep0/1/STT).

All builds are SoG ones. SoG currently uses 2 sleep-wait loops. These builds explore whether any replacement for Sleep(1) can reduce the GPU app's CPU consumption in these loops. It may be possible to squeeze out more free CPU cycles by using STT or Sleep(0), but that will be the topic of a separate investigation and is unlikely to go into the upcoming release.

For testing, use a busy CPU (there is no sense in freeing CPU cycles if nobody uses them) and -use_sleep in the tuning line.
Though some configs have sleep enabled by default, it's too easy to make a mistake, so for this test always pass -use_sleep manually.

More benchmark results will follow. I suggest using long-enough tasks and checking the N parameter of the SleepQuantum counter: it's the number of updates the counter received. It's worth getting this number high enough to obtain representative data for this test.

Raistmer
Re: Better sleep on Windows - new round
« Reply #13 on: 19 Aug 2016, 08:19:40 am »
Small preliminary test on GT720:

-use_sleep in tuning line
CPU busy

WU : PG1327_v8.wu
MB8_win_x64_AVX_VS2010_r3330.exe -verb -nog :
  Elapsed 226.306 secs
      CPU 223.315 secs
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe  :
  Elapsed 260.736 secs, speedup: -15.21%  ratio: 0.87x
      CPU 19.641 secs, speedup: 91.20%  ratio: 11.37x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe  :
  Elapsed 259.995 secs, speedup: -14.89%  ratio: 0.87x
      CPU 18.939 secs, speedup: 91.52%  ratio: 11.79x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe  :
  Elapsed 259.921 secs, speedup: -14.85%  ratio: 0.87x
      CPU 19.828 secs, speedup: 91.12%  ratio: 11.26x
setiathome_8.16_windows_intelx86__opencl_nvidia_SoG.exe  :
  Elapsed 259.128 secs, speedup: -14.50%  ratio: 0.87x
      CPU 43.602 secs, speedup: 80.48%  ratio: 5.12x
setiathome_8.17_windows_intelx86__opencl_nvidia_SoG.exe  :
  Elapsed 259.860 secs, speedup: -14.83%  ratio: 0.87x
      CPU 19.017 secs, speedup: 91.48%  ratio: 11.74x

No strong differences between sleep methods, but one thing to notice: 8.17 is definitely better than 8.16 with -use_sleep (CPU 19.017 secs vs 43.602 secs).

And SleepQuantum's values are:

Sleep0: class SleepQuantum:      total=91.016396,   N=40,   <>=2.2754099,   min=0.076670475   max=4.2926121
Sleep1: class SleepQuantum:      total=66.940231,   N=62,   <>=1.0796812,   min=0.80534756   max=8.826087
STT     class SleepQuantum:      total=162.07121,   N=33,   <>=4.9112489,   min=4.226912   max=6.0012178
default:class SleepQuantum:      total=46.431198,   N=47,   <>=0.98789783,   min=0.90345198   max=1.0177377

The default build actually matches Sleep1, so the difference between these two lines shows the noise level of this test: longer tasks are definitely required.
« Last Edit: 19 Aug 2016, 08:26:09 am by Raistmer »

Mike
Re: Better sleep on Windows - new round
« Reply #14 on: 19 Aug 2016, 10:08:34 am »
Here is a bench with AR 0.75.

Roughly similar results for all 3 sleep variants.