Pages: 1 ... 3 4 [5] 6 7 ... 10
41
« Last post by Raistmer on 19 Aug 2016, 04:48:24 am »
From Mike's run:
PG1327
Sleep0: class SleepQuantum: total=43.846642, N=32, <>=1.3702075, min=0.93442535 max=1.6681368
Sleep1: class SleepQuantum: total=43.289009, N=31, <>=1.3964196, min=1.1869471 max=1.8000549
SwitchTothread: class SleepQuantum: total=44.513672, N=32, <>=1.3910522, min=0.948681 max=1.8100463
Summary: we should forget about the PG set for this GPU, and especially for this test. There are only ~30 occurrences per whole task, and not even all of them were modified.
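For readers checking the statistics above: the <> column appears to be the plain mean, total / N. A minimal Python check under that assumption (the helper name mean_matches and the tolerance are made up for illustration; the numbers are copied from the PG1327 lines):

```python
# Rows copied from the PG1327 statistics: (label, total, N, reported <> value).
rows = [
    ("Sleep0", 43.846642, 32, 1.3702075),
    ("Sleep1", 43.289009, 31, 1.3964196),
    ("SwitchTothread", 44.513672, 32, 1.3910522),
]


def mean_matches(total, n, reported, tol=1e-6):
    """Return True if total / n agrees with the reported <> value within tol."""
    return abs(total / n - reported) <= tol


for label, total, n, reported in rows:
    # Print the recomputed mean next to the consistency verdict.
    print(label, total / n, mean_matches(total, n, reported))
```

With N around 30 per task, each mean rests on very few samples, which is the point of the summary.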
@Mike, please repeat a similar test at the next opportunity with the task attached here. I hope it lasts longer and gives more chances to test.
P.S. From my failed C-60 test one can conclude that ATi is not a very good fit for this test: the ATi runtime frees the CPU well enough that non-sleep results mix with the sleep ones. I'll provide an NV flavour version soon too.
42
« Last post by Raistmer on 19 Aug 2016, 04:31:00 am »
Damn, I forgot that the ATi low-perf path doesn't enable sleep, unlike the NV one. My two-day C-60 test is screwed.
43
« Last post by Mike on 18 Aug 2016, 04:06:29 pm »
Here is my bench with a busy CPU, using -use_sleep.
WU : PG0009_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 78.072 secs
  CPU 37.861 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe -use_sleep :
  Elapsed 130.747 secs, speedup: -67.47% ratio: 0.60x
  CPU 37.113 secs, speedup: 1.98% ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe -use_sleep :
  Elapsed 86.725 secs, speedup: -11.08% ratio: 0.90x
  CPU 40.092 secs, speedup: -5.89% ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -use_sleep :
  Elapsed 87.227 secs, speedup: -11.73% ratio: 0.90x
  CPU 39.359 secs, speedup: -3.96% ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe -use_sleep :
  Elapsed 85.652 secs, speedup: -9.71% ratio: 0.91x
  CPU 41.902 secs, speedup: -10.67% ratio: 0.90x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe -use_sleep :
  Elapsed 84.263 secs, speedup: -7.93% ratio: 0.93x
  CPU 39.811 secs, speedup: -5.15% ratio: 0.95x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe -use_sleep :
  Elapsed 85.323 secs, speedup: -9.29% ratio: 0.92x
  CPU 41.886 secs, speedup: -10.63% ratio: 0.90x

WU : PG0395_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 54.637 secs
  CPU 35.787 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe -use_sleep :
  Elapsed 65.286 secs, speedup: -19.49% ratio: 0.84x
  CPU 34.679 secs, speedup: 3.10% ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe -use_sleep :
  Elapsed 61.540 secs, speedup: -12.63% ratio: 0.89x
  CPU 35.038 secs, speedup: 2.09% ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -use_sleep :
  Elapsed 62.675 secs, speedup: -14.71% ratio: 0.87x
  CPU 34.913 secs, speedup: 2.44% ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe -use_sleep :
  Elapsed 58.698 secs, speedup: -7.43% ratio: 0.93x
  CPU 49.218 secs, speedup: -37.53% ratio: 0.73x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe -use_sleep :
  Elapsed 59.481 secs, speedup: -8.87% ratio: 0.92x
  CPU 40.966 secs, speedup: -14.47% ratio: 0.87x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe -use_sleep :
  Elapsed 59.418 secs, speedup: -8.75% ratio: 0.92x
  CPU 49.094 secs, speedup: -37.18% ratio: 0.73x

WU : PG0444_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 53.981 secs
  CPU 35.085 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe -use_sleep :
  Elapsed 62.600 secs, speedup: -15.97% ratio: 0.86x
  CPU 34.476 secs, speedup: 1.74% ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe -use_sleep :
  Elapsed 61.373 secs, speedup: -13.69% ratio: 0.88x
  CPU 35.584 secs, speedup: -1.42% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -use_sleep :
  Elapsed 61.562 secs, speedup: -14.04% ratio: 0.88x
  CPU 35.943 secs, speedup: -2.45% ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe -use_sleep :
  Elapsed 57.967 secs, speedup: -7.38% ratio: 0.93x
  CPU 48.735 secs, speedup: -38.91% ratio: 0.72x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe -use_sleep :
  Elapsed 58.220 secs, speedup: -7.85% ratio: 0.93x
  CPU 40.295 secs, speedup: -14.85% ratio: 0.87x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe -use_sleep :
  Elapsed 58.184 secs, speedup: -7.79% ratio: 0.93x
  CPU 48.329 secs, speedup: -37.75% ratio: 0.73x

WU : PG1327_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 62.481 secs
  CPU 36.145 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe -use_sleep :
  Elapsed 69.000 secs, speedup: -10.43% ratio: 0.91x
  CPU 39.624 secs, speedup: -9.63% ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe -use_sleep :
  Elapsed 69.452 secs, speedup: -11.16% ratio: 0.90x
  CPU 38.579 secs, speedup: -6.73% ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe -use_sleep :
  Elapsed 68.927 secs, speedup: -10.32% ratio: 0.91x
  CPU 37.846 secs, speedup: -4.71% ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe -use_sleep :
  Elapsed 67.176 secs, speedup: -7.51% ratio: 0.93x
  CPU 44.101 secs, speedup: -22.01% ratio: 0.82x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe -use_sleep :
  Elapsed 67.968 secs, speedup: -8.78% ratio: 0.92x
  CPU 43.836 secs, speedup: -21.28% ratio: 0.82x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe -use_sleep :
  Elapsed 67.928 secs, speedup: -8.72% ratio: 0.92x
  CPU 43.477 secs, speedup: -20.28% ratio: 0.83x
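For anyone reading the log: the speedup and ratio columns look like they are computed against the r3330 reference run of the same WU. The formulas below are inferred from the numbers, not taken from the bench tool itself, and the function names are made up for illustration. A small Python sketch:

```python
def speedup_percent(reference, candidate):
    """Relative time saved versus the reference run, in percent (negative means slower)."""
    return (reference - candidate) / reference * 100.0


def ratio(reference, candidate):
    """Reference time divided by candidate time (below 1.0 means slower)."""
    return reference / candidate


# PG0009_v7.wu: r3330 reference versus r3430 with -use_sleep, values from the log.
ref_elapsed, new_elapsed = 78.072, 130.747
ref_cpu, new_cpu = 37.861, 37.113

print(f"Elapsed: speedup {speedup_percent(ref_elapsed, new_elapsed):.2f}% "
      f"ratio {ratio(ref_elapsed, new_elapsed):.2f}x")
print(f"CPU: speedup {speedup_percent(ref_cpu, new_cpu):.2f}% "
      f"ratio {ratio(ref_cpu, new_cpu):.2f}x")
```

Under this reading, a negative speedup simply means the build took longer than r3330 on that WU.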
44
« Last post by Mike on 18 Aug 2016, 12:58:35 pm »
Quote from: Raistmer on 18 Aug 2016, 09:43:15 am
"In the next testing window, please also try with the CPU busy and with the -use_sleep option, which is mandatory.
There is one very important difference between your GPU and my C-60 regarding this test. My C-60 is one of the slowest ATi devices, so it gets the low-perf path with sleep enabled by default. Hence I omitted it in my testing (still running, BTW, on the netbook).
And all changes between these builds are wrapped in if(use_sleep), so enabling sleep is a requirement.
EDIT: and, as usual these days, it seems you need some full-length tasks, not the PG set; the GPU is too fast. For example, all the CPU time you see most probably came from startup code, not from icfft loop processing."
You said no tuning line at all in your post above. That's what I did. No tuning line at all so fully default. CPU fixation to 1GHz reapplied after reboot. Binaries used for this test attached so reader can repeat it on any ATi GPU equipped host. A short instruction on what you want would be helpful. Will test with CPU busy and -use_sleep.
45
« Last post by Raistmer on 18 Aug 2016, 09:43:15 am »
In the next testing window, please also try with the CPU busy and with the -use_sleep option, which is mandatory.
There is one very important difference between your GPU and my C-60 regarding this test. My C-60 is one of the slowest ATi devices, so it gets the low-perf path with sleep enabled by default. Hence I omitted it in my testing (still running, BTW, on the netbook).
And all changes between these builds are wrapped in if(use_sleep), so enabling sleep is a requirement.
EDIT: and, as usual these days, it seems you need some full-length tasks, not the PG set; the GPU is too fast. For example, all the CPU time you see most probably came from startup code, not from icfft loop processing.
46
« Last post by Raistmer on 18 Aug 2016, 09:33:57 am »
Quote from: Mike on 18 Aug 2016, 09:20:04 am
"I don't think changing the high-precision timer is a good idea for stock development, especially for hosts which are used for things other than crunching."
Yep, I will not make this the default, at least not before quite prolonged testing. It's a system-wide change... So -high_prec_timer will not be enabled by default in the next release.
47
« Last post by Mike on 18 Aug 2016, 09:33:19 am »
Oops, forgot to attach the bench log.
Done.
48
« Last post by Raistmer on 18 Aug 2016, 09:31:14 am »
Quote from: Mike on 18 Aug 2016, 09:20:04 am
"Now going to the hospital to see my new grandchild. It's just 10 hours old."
Congrats, Mike! I'll look at the test in detail in the meantime.
49
« Last post by Mike on 18 Aug 2016, 09:20:04 am »
Hello my name is Mike
Here is a bench of all sleep variants on my R9 380. Default settings, system idle.
WU : PG0009_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 78.072 secs
  CPU 37.861 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe :
  Elapsed 105.248 secs, speedup: -34.81% ratio: 0.74x
  CPU 45.568 secs, speedup: -20.36% ratio: 0.83x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe :
  Elapsed 82.028 secs, speedup: -5.07% ratio: 0.95x
  CPU 37.081 secs, speedup: 2.06% ratio: 1.02x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe :
  Elapsed 81.469 secs, speedup: -4.35% ratio: 0.96x
  CPU 36.879 secs, speedup: 2.59% ratio: 1.03x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 80.298 secs, speedup: -2.85% ratio: 0.97x
  CPU 38.610 secs, speedup: -1.98% ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 80.357 secs, speedup: -2.93% ratio: 0.97x
  CPU 37.971 secs, speedup: -0.29% ratio: 1.00x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 80.881 secs, speedup: -3.60% ratio: 0.97x
  CPU 38.002 secs, speedup: -0.37% ratio: 1.00x

WU : PG0395_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 54.637 secs
  CPU 35.787 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe :
  Elapsed 57.780 secs, speedup: -5.75% ratio: 0.95x
  CPU 36.208 secs, speedup: -1.18% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe :
  Elapsed 57.933 secs, speedup: -6.03% ratio: 0.94x
  CPU 36.161 secs, speedup: -1.05% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe :
  Elapsed 57.769 secs, speedup: -5.73% ratio: 0.95x
  CPU 35.459 secs, speedup: 0.92% ratio: 1.01x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 56.500 secs, speedup: -3.41% ratio: 0.97x
  CPU 38.251 secs, speedup: -6.89% ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 57.102 secs, speedup: -4.51% ratio: 0.96x
  CPU 38.064 secs, speedup: -6.36% ratio: 0.94x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 56.563 secs, speedup: -3.53% ratio: 0.97x
  CPU 38.454 secs, speedup: -7.45% ratio: 0.93x

WU : PG0444_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 53.981 secs
  CPU 35.085 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe :
  Elapsed 57.213 secs, speedup: -5.99% ratio: 0.94x
  CPU 36.520 secs, speedup: -4.09% ratio: 0.96x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe :
  Elapsed 56.641 secs, speedup: -4.93% ratio: 0.95x
  CPU 35.475 secs, speedup: -1.11% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe :
  Elapsed 57.382 secs, speedup: -6.30% ratio: 0.94x
  CPU 35.475 secs, speedup: -1.11% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 55.102 secs, speedup: -2.08% ratio: 0.98x
  CPU 38.329 secs, speedup: -9.25% ratio: 0.92x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 56.778 secs, speedup: -5.18% ratio: 0.95x
  CPU 37.908 secs, speedup: -8.05% ratio: 0.93x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 55.657 secs, speedup: -3.10% ratio: 0.97x
  CPU 38.033 secs, speedup: -8.40% ratio: 0.92x

WU : PG1327_v7.wu
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe -verb -nog :
  Elapsed 62.481 secs
  CPU 36.145 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe :
  Elapsed 66.964 secs, speedup: -7.17% ratio: 0.93x
  CPU 36.941 secs, speedup: -2.20% ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3486.exe :
  Elapsed 67.738 secs, speedup: -8.41% ratio: 0.92x
  CPU 36.941 secs, speedup: -2.20% ratio: 0.98x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3500.exe :
  Elapsed 68.462 secs, speedup: -9.57% ratio: 0.91x
  CPU 36.379 secs, speedup: -0.65% ratio: 0.99x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 66.963 secs, speedup: -7.17% ratio: 0.93x
  CPU 42.323 secs, speedup: -17.09% ratio: 0.85x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 66.555 secs, speedup: -6.52% ratio: 0.94x
  CPU 42.198 secs, speedup: -16.75% ratio: 0.86x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 66.071 secs, speedup: -5.75% ratio: 0.95x
  CPU 41.917 secs, speedup: -15.97% ratio: 0.86x

To me the picture is quite clear. The faster the GPU apps get, the more CPU they use.
I don't think changing the high-precision timer is a good idea for stock development, especially for hosts which are used for things other than crunching.
Now going to the hospital to see my new grandchild. It's just 10 hours old.
50
« Last post by Raistmer on 17 Aug 2016, 11:41:03 am »
For this test the host was rebooted to restore the default multimedia timer behavior. The -high_prec_time option will be added in the next bench run. No tuning line at all, so fully default. CPU fixation to 1GHz was reapplied after the reboot. The binaries used for this test are attached, so the reader can repeat it on any host equipped with an ATi or NV (FERMI+) GPU.
And, finally, results from C-60:
CPU busy, no special changes in mm timer (and no sleep at all):
WU : AR075.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 17457.726 secs
  CPU 355.885 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 16464.784 secs, speedup: 5.69% ratio: 1.06x
  CPU 417.849 secs, speedup: -17.41% ratio: 0.85x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 16283.808 secs, speedup: 6.72% ratio: 1.07x
  CPU 413.013 secs, speedup: -16.05% ratio: 0.86x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 16677.490 secs, speedup: 4.47% ratio: 1.05x
  CPU 433.808 secs, speedup: -21.90% ratio: 0.82x

WU : AR075_1.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 16971.441 secs
  CPU 338.897 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 16420.791 secs, speedup: 3.24% ratio: 1.03x
  CPU 434.338 secs, speedup: -28.16% ratio: 0.78x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 16908.043 secs, speedup: 0.37% ratio: 1.00x
  CPU 455.117 secs, speedup: -34.29% ratio: 0.74x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 16489.931 secs, speedup: 2.84% ratio: 1.03x
  CPU 437.832 secs, speedup: -29.19% ratio: 0.77x

WU : PG1327_v8.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 880.452 secs
  CPU 59.764 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 1050.319 secs, speedup: -19.29% ratio: 0.84x
  CPU 79.514 secs, speedup: -33.05% ratio: 0.75x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 1042.097 secs, speedup: -18.36% ratio: 0.84x
  CPU 78.188 secs, speedup: -30.83% ratio: 0.76x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 1046.668 secs, speedup: -18.88% ratio: 0.84x
  CPU 77.891 secs, speedup: -30.33% ratio: 0.77x

WU : PG1327_v8_1.wu
setiathome_8.12_windows_intelx86__opencl_ati5_sah.exe -verb -nog :
  Elapsed 1052.627 secs
  CPU 70.793 secs
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep0.exe :
  Elapsed 1049.991 secs, speedup: 0.25% ratio: 1.00x
  CPU 77.922 secs, speedup: -10.07% ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_Sleep1.exe :
  Elapsed 1040.366 secs, speedup: 1.16% ratio: 1.01x
  CPU 77.376 secs, speedup: -9.30% ratio: 0.91x
MB8_win_x86_SSE2_OpenCL_ATi_HD5_SwitchTothread.exe :
  Elapsed 1040.912 secs, speedup: 1.11% ratio: 1.01x
  CPU 77.969 secs, speedup: -10.14% ratio: 0.91x
Summary: running on a busy system makes the variation in results too big to discriminate clearly between these sleep versions. But the tendency is: the current Sleep(1) is an adequate approach. It is possible to use SwitchToThread in other places to extract even more free CPU cycles from the GPU app, but it can't be a replacement for Sleep(1) in bulk sleep areas. This test shows the noise level ONLY, because the differing part was not used at all.
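For anyone who wants a feel for how coarse a 1 ms sleep is on their own host, here is a minimal Python sketch. It is only an analog: time.sleep(0.001) stands in for Sleep(1) and time.sleep(0) for a yield-style call. It does not call the Win32 functions used in the builds, and the name measure_sleep and the sample count are made up for illustration.

```python
import time


def measure_sleep(requested_s, samples=20):
    """Call time.sleep(requested_s) repeatedly; summarize actual durations in ms."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)
        durations.append((time.perf_counter() - start) * 1000.0)
    total = sum(durations)
    return {
        "total": total,
        "N": len(durations),
        "mean": total / len(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Assumption: 1 ms sleep as an analog of Sleep(1), zero-length sleep as a yield analog.
stats_1ms = measure_sleep(0.001)
stats_0 = measure_sleep(0.0)

print("sleep(1 ms):", stats_1ms)
print("sleep(0):   ", stats_0)
```

On a busy host the min/max spread of the 1 ms case may widen noticeably, which would fit the summary's point about result variation, though the actual numbers depend on the OS timer resolution and load.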