Better sleep on Windows - new round

Raistmer:
Damn, I forgot that the ATi low-perf path doesn't enable sleep the way the NV one does. My two-day C-60 test is screwed   :-\

Raistmer:
From Mike's run:

PG1327
Sleep0: class SleepQuantum:      total=43.846642,   N=32,   <>=1.3702075,   min=0.93442535   max=1.6681368
Sleep1: class SleepQuantum:      total=43.289009,   N=31,   <>=1.3964196,   min=1.1869471   max=1.8000549
SwitchToThread:class SleepQuantum:      total=44.513672,   N=32,   <>=1.3910522,   min=0.948681   max=1.8100463

Summary: we should forget about the PG set for this GPU, and especially for this test. There are only ~30 occurrences for the whole task, and not even all of them were modified.


@Mike, please repeat a similar test at the next opportunity with the task attached here. I hope it runs longer and gives more chances to test.

P.S. From my failed C-60 test one can conclude that ATi is not well suited for this test: the ATi runtime frees the CPU well enough on its own that non-sleep results get mixed with sleep ones. I'll provide an NV flavour soon too.

Raistmer:
Binaries updated: http://lunatics.kwsn.info/index.php/topic,1812.msg61017.html#msg61017
-both ATi and NV flavors
-all occurrences changed, so the SleepQuantum counter now really represents usage of the particular sleep method (Sleep0/1/STT).

All builds are SoG builds. SoG currently uses 2 sleep-wait loops. These builds explore whether replacing Sleep(1) with something else can reduce the CPU consumption of the GPU app in these loops. It may be possible to squeeze out more free CPU cycles by using STT (SwitchToThread) or Sleep(0), but that will be the topic of a separate investigation and is unlikely to go into the upcoming release.
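A minimal sketch of the kind of sleep-wait loop these builds vary, assuming the loop polls a completion predicate and gives up the CPU between polls. All names here (`wait_method`, `yield_once`, `sleep_wait`, `countdown_done`) are hypothetical and not taken from the SoG source; in a real OpenCL app the predicate might query `clGetEventInfo` with `CL_EVENT_COMMAND_EXECUTION_STATUS`. On Windows it calls `Sleep(0)`, `Sleep(1)` or `SwitchToThread()`; elsewhere it falls back to POSIX stand-ins only so that it compiles.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() under strict C modes */
#endif

#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#include <time.h>
#endif

/* Hypothetical names; the real SoG code may be organised differently. */
typedef enum { WAIT_SLEEP0, WAIT_SLEEP1, WAIT_STT } wait_method;

/* Predicate polled by the loop, e.g. "has the GPU kernel finished?". */
typedef bool (*done_fn)(void *ctx);

/* Give up the CPU once, using the selected method. */
void yield_once(wait_method m)
{
#ifdef _WIN32
    switch (m) {
    case WAIT_SLEEP0: Sleep(0); break;         /* give away the rest of the time slice, stay ready */
    case WAIT_SLEEP1: Sleep(1); break;         /* block; often for a whole timer tick */
    case WAIT_STT:    SwitchToThread(); break; /* run another ready thread on this core, if any */
    }
#else
    /* POSIX stand-ins so the sketch builds off Windows (not the Win32 semantics). */
    if (m == WAIT_SLEEP1) {
        struct timespec ts = { 0, 1000000L };  /* 1 ms */
        nanosleep(&ts, NULL);
    } else {
        sched_yield();
    }
#endif
}

/* Sleep-wait loop: poll is_done(ctx), yielding between polls.
   Returns how many times it yielded. */
unsigned long sleep_wait(done_fn is_done, void *ctx, wait_method m)
{
    unsigned long yields = 0;
    while (!is_done(ctx)) {
        yield_once(m);
        ++yields;
    }
    return yields;
}

/* Test predicate: reports completion after *remaining unsuccessful polls. */
bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```

The trade-off between the three methods: `Sleep(1)` may block for a whole system timer tick (about 15.6 ms by default on Windows, less if some application has raised the timer resolution), which frees the CPU but can delay noticing that the GPU is done, while `Sleep(0)` and `SwitchToThread()` return immediately when no other thread is ready, so they react faster but can spin when the CPU is otherwise idle.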

For testing, use a busy CPU (there is no point in freeing CPU cycles if nothing uses them) and put -use_sleep in the tuning line.
Although some configs have sleep enabled by default, it's too easy to make a mistake, so always specify -use_sleep manually for this test.

More benchmark results will follow. I suggest using long-enough tasks and checking the N parameter of the SleepQuantum counter: it's the number of updates the counter received. It's worth getting this number high enough to have representative data for this test.
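The counter lines in this thread are consistent with a simple running accumulator: in every line, `<>` equals total/N (for example 43.846642 / 32 ≈ 1.3702 in Mike's Sleep0 line), so `<>` reads as the mean, and N is the number of updates. Below is a minimal sketch of such an accumulator, assuming one sample is recorded per update. The struct and function names are hypothetical, not the app's real class, and the thread does not state the units of the samples.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical re-creation of the SleepQuantum output fields. */
typedef struct {
    double total;     /* sum of all recorded samples */
    unsigned long n;  /* number of updates (the "N" field) */
    double min, max;  /* smallest and largest sample seen */
} sleep_quantum;

void sq_init(sleep_quantum *q)
{
    q->total = 0.0;
    q->n = 0;
    q->min = DBL_MAX;   /* any real sample will be smaller */
    q->max = -DBL_MAX;  /* any real sample will be larger */
}

/* Record one sample (one update of the counter). */
void sq_update(sleep_quantum *q, double sample)
{
    q->total += sample;
    q->n += 1;
    if (sample < q->min) q->min = sample;
    if (sample > q->max) q->max = sample;
}

/* The "<>" field: arithmetic mean, total / N (0 if nothing was recorded). */
double sq_mean(const sleep_quantum *q)
{
    return q->n ? q->total / (double)q->n : 0.0;
}

/* Print in the same layout as the counter lines posted in this thread. */
void sq_print(const char *label, const sleep_quantum *q)
{
    printf("%s class SleepQuantum:      total=%g,   N=%lu,   <>=%g,   min=%g   max=%g\n",
           label, q->total, q->n, sq_mean(q), q->min, q->max);
}
```

With only ~30-60 updates per task, a single slow sample moves the mean noticeably, which is why a high N matters for comparing sleep methods.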

Raistmer:
Small preliminary test on GT720:

-use_sleep in tuning line
CPU busy

WU : PG1327_v8.wu
MB8_win_x64_AVX_VS2010_r3330.exe -verb -nog :
  Elapsed 226.306 secs
      CPU 223.315 secs
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe  :
  Elapsed 260.736 secs, speedup: -15.21%  ratio: 0.87x
      CPU 19.641 secs, speedup: 91.20%  ratio: 11.37x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe  :
  Elapsed 259.995 secs, speedup: -14.89%  ratio: 0.87x
      CPU 18.939 secs, speedup: 91.52%  ratio: 11.79x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe  :
  Elapsed 259.921 secs, speedup: -14.85%  ratio: 0.87x
      CPU 19.828 secs, speedup: 91.12%  ratio: 11.26x
setiathome_8.16_windows_intelx86__opencl_nvidia_SoG.exe  :
  Elapsed 259.128 secs, speedup: -14.50%  ratio: 0.87x
      CPU 43.602 secs, speedup: 80.48%  ratio: 5.12x
setiathome_8.17_windows_intelx86__opencl_nvidia_SoG.exe  :
  Elapsed 259.860 secs, speedup: -14.83%  ratio: 0.87x
      CPU 19.017 secs, speedup: 91.48%  ratio: 11.74x

No strong differences between sleep methods, but one thing to notice: with -use_sleep, 8.17 is definitely better than 8.16 (about 19 s vs 43.6 s of CPU time).
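Every speedup/ratio pair in the results above is consistent with comparing against the first row (the CPU app, MB8_win_x64_AVX_VS2010_r3330.exe): speedup = (ref - t) / ref × 100 and ratio = ref / t. The helpers below are a hypothetical re-creation of that arithmetic, inferred from the printed numbers, not the bench script's own code.

```c
#include <stdio.h>

/* "speedup": percentage of the reference time saved (negative = slower). */
double speedup_pct(double ref, double t)
{
    return (ref - t) / ref * 100.0;
}

/* "ratio": reference time divided by measured time (below 1 = slower). */
double ratio_x(double ref, double t)
{
    return ref / t;
}

/* Print one report line in the same layout as the benchmark output. */
void print_row(const char *what, double ref, double t)
{
    printf("%s %.3f secs, speedup: %.2f%%  ratio: %.2fx\n",
           what, t, speedup_pct(ref, t), ratio_x(ref, t));
}
```

For example, with the reference CPU time of 223.315 s, `print_row("CPU", 223.315, 43.602)` gives the 8.16 values (80.48%, 5.12x). Comparing the two SoG versions directly, 8.16 used about 2.3 times as much CPU time as 8.17 (43.602 / 19.017 ≈ 2.29), while elapsed times stayed within about 0.3%.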

And the SleepQuantum values are:

Sleep0: class SleepQuantum:      total=91.016396,   N=40,   <>=2.2754099,   min=0.076670475   max=4.2926121
Sleep1: class SleepQuantum:      total=66.940231,   N=62,   <>=1.0796812,   min=0.80534756   max=8.826087
STT:    class SleepQuantum:      total=162.07121,   N=33,   <>=4.9112489,   min=4.226912   max=6.0012178
default:class SleepQuantum:      total=46.431198,   N=47,   <>=0.98789783,   min=0.90345198   max=1.0177377

default is effectively the same as Sleep1, so the difference between these two lines shows the noise level for this test. Definitely more prolonged tasks are required.

Mike:
Here is a bench with AR 0.75.

Roughly similar on all 3 sleep variants.
