Better sleep on Windows - new round
Raistmer:
Damn, I forgot that the ATi low-perf path doesn't enable sleep, unlike the NV one. My two-day C-60 test is ruined :-\
Raistmer:
From Mike's run:
PG1327
Sleep0: class SleepQuantum: total=43.846642, N=32, <>=1.3702075, min=0.93442535 max=1.6681368
Sleep1: class SleepQuantum: total=43.289009, N=31, <>=1.3964196, min=1.1869471 max=1.8000549
SwitchToThread: class SleepQuantum: total=44.513672, N=32, <>=1.3910522, min=0.948681 max=1.8100463
Summary: we should forget about the PG set for this GPU, and especially for this test. There are only ~30 occurrences in the whole task, and not even all of them were modified.
@Mike, please repeat a similar test at the next opportunity with the task attached here. I hope it runs longer and gives more chances to test.
P.S. From my failed C-60 test one can conclude that ATi is not well suited for this test. The ATi runtime frees the CPU well enough that non-sleep results blend in with sleep ones. I'll provide an NV flavour soon too.
Raistmer:
Binaries updated: http://lunatics.kwsn.info/index.php/topic,1812.msg61017.html#msg61017
-both ATi and NV flavors
-all occurrences changed, so now the SleepQuantum counter really represents usage of the particular sleep method (Sleep0/1/STT).
All builds are SoG ones. SoG currently uses 2 sleep-wait loops. These builds explore whether any replacement for Sleep(1) can improve the GPU app's CPU consumption in these loops. It may be possible to squeeze out more free CPU cycles by using STT or Sleep(0), but that will be the topic of a separate investigation and will hardly make it into the near release.
For testing, use a busy CPU (there is no sense in freeing CPU cycles if nobody uses them) and put -use_sleep in the tuning line.
Though some configs have sleep enabled by default, it's too easy to make a mistake, so for this test always specify -use_sleep manually.
More benchmark results will follow. I suggest using long enough tasks and looking at the N parameter of the SleepQuantum counter: it's the number of updates the counter received. It's worth getting this number high enough to obtain representative data for this test.
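For readers who want to see how a counter with these fields (total, N, <>, min, max) behaves, here is a minimal sketch. It is a Python analogue, not the app's code: the class body, the `sleep_wait` helper and the `WAIT_METHODS` table are hypothetical, `time.sleep(0)` and `time.sleep(0.001)` only stand in for Sleep(0) and Sleep(1), the STT variant is left out, and the millisecond unit is an assumption. What the real counter measures per update is not stated in this thread; the sketch simply times each wait call.

```python
import threading
import time


class SleepQuantum:
    """Running statistics over wait durations (milliseconds, by assumption)."""

    def __init__(self):
        self.total = 0.0
        self.n = 0
        self.min = float("inf")
        self.max = 0.0

    def update(self, ms):
        # One update per measured wait; N in the logs counts these updates.
        self.total += ms
        self.n += 1
        self.min = min(self.min, ms)
        self.max = max(self.max, ms)

    @property
    def mean(self):
        # The "<>" field in the logs equals total / N.
        return self.total / self.n if self.n else 0.0

    def __str__(self):
        return (f"class SleepQuantum: total={self.total:.6f}, N={self.n}, "
                f"<>={self.mean:.7f}, min={self.min:.7f} max={self.max:.7f}")


# Hypothetical stand-ins for two of the variants under test.
WAIT_METHODS = {
    "Sleep0": lambda: time.sleep(0),      # yield without requesting a delay
    "Sleep1": lambda: time.sleep(0.001),  # request roughly 1 ms of sleep
}


def sleep_wait(done, wait, counter):
    """Poll `done` until it is set, timing every call to `wait`."""
    while not done.is_set():
        start = time.perf_counter()
        wait()
        counter.update((time.perf_counter() - start) * 1000.0)


if __name__ == "__main__":
    for name, wait in WAIT_METHODS.items():
        done = threading.Event()
        counter = SleepQuantum()
        # A timer plays the role of the GPU finishing its work after 50 ms.
        threading.Timer(0.05, done.set).start()
        sleep_wait(done, wait, counter)
        print(f"{name}: {counter}")
```

With a small N the mean is dominated by a handful of samples, which is why a long task is needed before comparing variants.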
Raistmer:
Small preliminary test on GT720:
-use_sleep in tuning line
CPU busy
WU : PG1327_v8.wu
MB8_win_x64_AVX_VS2010_r3330.exe -verb -nog :
Elapsed 226.306 secs
CPU 223.315 secs
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep0.exe :
Elapsed 260.736 secs, speedup: -15.21% ratio: 0.87x
CPU 19.641 secs, speedup: 91.20% ratio: 11.37x
MB8_win_x86_SSE3_OpenCL_NV_SoG_Sleep1.exe :
Elapsed 259.995 secs, speedup: -14.89% ratio: 0.87x
CPU 18.939 secs, speedup: 91.52% ratio: 11.79x
MB8_win_x86_SSE3_OpenCL_NV_SoG_STT.exe :
Elapsed 259.921 secs, speedup: -14.85% ratio: 0.87x
CPU 19.828 secs, speedup: 91.12% ratio: 11.26x
setiathome_8.16_windows_intelx86__opencl_nvidia_SoG.exe :
Elapsed 259.128 secs, speedup: -14.50% ratio: 0.87x
CPU 43.602 secs, speedup: 80.48% ratio: 5.12x
setiathome_8.17_windows_intelx86__opencl_nvidia_SoG.exe :
Elapsed 259.860 secs, speedup: -14.83% ratio: 0.87x
CPU 19.017 secs, speedup: 91.48% ratio: 11.74x
No strong differences between sleep methods, but one thing to notice: 8.17 is definitely better with -use_sleep than 8.16.
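The speedup and ratio columns can be reproduced from the raw timings. The formulas below are inferred from the posted numbers, not taken from the benchmark tool itself, and the `compare` helper is a hypothetical name: the first listed build (the AVX CPU app) is treated as the reference.

```python
def compare(reference_secs, candidate_secs):
    """Return (speedup_percent, ratio) of a candidate relative to the reference."""
    speedup = (reference_secs - candidate_secs) / reference_secs * 100.0
    ratio = reference_secs / candidate_secs
    return speedup, ratio


# Reference CPU build vs. the Sleep0 SoG build, timings from the GT720 run.
elapsed = compare(226.306, 260.736)
cpu = compare(223.315, 19.641)
print(f"Elapsed speedup: {elapsed[0]:.2f}% ratio: {elapsed[1]:.2f}x")
print(f"CPU speedup: {cpu[0]:.2f}% ratio: {cpu[1]:.2f}x")
```

A negative elapsed speedup means the GPU build took longer in wall-clock time than the reference, while the CPU figure shows how much CPU time it freed.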
And the SleepQuantum values are:
Sleep0: class SleepQuantum: total=91.016396, N=40, <>=2.2754099, min=0.076670475 max=4.2926121
Sleep1: class SleepQuantum: total=66.940231, N=62, <>=1.0796812, min=0.80534756 max=8.826087
STT: class SleepQuantum: total=162.07121, N=33, <>=4.9112489, min=4.226912 max=6.0012178
default: class SleepQuantum: total=46.431198, N=47, <>=0.98789783, min=0.90345198 max=1.0177377
The default build actually matches Sleep1, so the difference between the two shows the noise level of this test; longer-running tasks are definitely required.
Mike:
Here is a bench with AR 0.75.
Roughly similar on all 3 sleep variants.