Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Raistmer on 30 Oct 2013, 04:55:23 am

Title: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 30 Oct 2013, 04:55:23 am
The problem with small kernels and the current GPU driver implementation is that the drivers apparently don't use hardware interrupts to inform the CPU that the GPU is ready; they use polling instead. And that polling consumes CPU.

It was shown for Linux that when one uses a substitute library that makes it possible to sleep for a fraction of a ms, the CPU load of the NV OpenCL GPU app drops considerably.
So, it looks like we need something that could suspend CPU-based polling for a fraction of a ms under Windows too.
The usual Sleep(1); call suspends the worker thread for a much longer time, which results in a big performance drop (though it reduces CPU usage as well and could be used if the kernels were bigger).

So, any proposals on how to do a nanosleep under Windows are welcome.

A few references: http://www.geisswerks.com/ryan/FAQS/timing.html
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Richard Haselgrove on 30 Oct 2013, 05:36:13 am
This is all way above my head, but I suspect you may need some lateral thinking here.

Rather than looking for a better sleep, is sleeping the right thing to do in the first place?

Hasn't it been suggested that an asynchronous callback (http://en.wikipedia.org/wiki/Callback_(computer_programming)) would be better than either an interrupt or a sleep?
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 30 Oct 2013, 06:00:24 am
An async callback is very limited in what it can do inside itself.
In general, we give the GPU some work, wait until the work completes, read back the result and give it more work.
A callback function can't read back the result and submit more work. So it would have to set some flag. But how do we check that flag then? Polling? Then 100% CPU usage again. Sleeping? Then performance is lost if the sleep is too long.

And being completely asynchronous (i.e., doing absolutely all processing in callbacks without any sync point) means an essentially different program. Maybe it would indeed be good to have such a program, but it doesn't help with the current one.

Quote
Callbacks must return promptly. The behavior of calling expensive system routines, OpenCL API calls to create contexts or command-queues, or blocking OpenCL operations from the following list below, in a callback is undefined.
clFinish,
clWaitForEvents,
blocking calls to clEnqueueReadBuffer, clEnqueueReadBufferRect,
clEnqueueWriteBuffer, clEnqueueWriteBufferRect,
blocking calls to clEnqueueReadImage and clEnqueueWriteImage,
blocking calls to clEnqueueMapBuffer and clEnqueueMapImage,
blocking calls to clBuildProgram, clCompileProgram or clLinkProgram
If an application needs to wait for completion of a routine from the above list in a callback, please use the non-blocking form of the function, and assign a completion callback to it to do the remainder of your work.
(OpenCL 1.2 manual, section 5.9)

EDIT: and I would not set callbacks against interrupts. They are just on different levels of the hierarchy. Most probably callbacks are implemented via interrupts. AFAIK it's not possible for a GPU device to make the CPU execute some code "directly". The CPU has to be notified somehow first. So either the CPU asks the GPU "should I start?" (polling) or the GPU "gives a kick" to the CPU (an interrupt). An interrupt handler is actually a low-level async callback.
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Richard Haselgrove on 30 Oct 2013, 08:23:32 am
The "other side" seem to have a rather longer list: CUPTI Callback API (http://docs.nvidia.com/cuda/cupti/r_main.html#r_callback_api)

That's a link to the CUDA 5.5 toolkit profiling API documentation. I believe there may be a CUDA 6.0 toolkit available to developers under NDA, either 'now' or 'real soon now' - haven't heard any details, because of NDA, obviously. But the 5.5 toolkit is the current public version, released 1 August 2013 - and the profiling callback tools are flagged as 'new in this release', so I think callbacks are on the active development pathway.

Of course, that's for CUDA only, and says nothing about whether the tools are exposed via the OpenCL middleware. We got a steer, didn't we, that NVidia was cooling on OpenCL support? Maybe those are questions better directed at the Khronos group and the OpenCL development community.
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 30 Oct 2013, 09:05:43 am
What we have already learned is that each OpenCL implementation is different.
What helps Intel doesn't help NV and is not needed for AMD. So let's leave Khronos untouched ;)
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Josef W. Segur on 30 Oct 2013, 07:55:01 pm
There's http://stackoverflow.com/questions/85122/sleep-less-than-one-millisecond/11456112#11456112 which has some attempts. I do not know if that works.
                                                    Joe
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Urs Echternacht on 30 Oct 2013, 08:11:39 pm
http://www.tutorials.de/c-c/229782-c-sleep-kleiner-als-ms-2.html#post1197886
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: arkayn on 30 Oct 2013, 08:17:38 pm
http://www.tutorials.de/c-c/229782-c-sleep-kleiner-als-ms-2.html#post1197886

And a translated page for Raistmer
http://translate.google.com/translate?sl=auto&tl=ru&js=n&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.tutorials.de%2Fc-c%2F229782-c-sleep-kleiner-als-ms-2.html%23post1197886&act=url

and Joe
http://translate.google.com/translate?hl=en&sl=de&tl=en&u=http%3A%2F%2Fwww.tutorials.de%2Fc-c%2F229782-c-sleep-kleiner-als-ms-2.html%23post1197886
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Urs Echternacht on 30 Oct 2013, 08:34:03 pm
Thanks Arkayn. (German does not translate very well into English when done by a machine! In this case "security" should read "computer"!  ;) )

Here is some POSIX source: nanosleep.c (http://code.google.com/p/libpthread/source/browse/src/nanosleep.c)

Here is the same for Apple: nanosleep.c (http://www.opensource.apple.com/source/Libc/Libc-320/gen/nanosleep.c)

Additionally, there is a function "clock_nanosleep()" which allows choosing between different clocks on Linux (real time, monotonic, other).
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 31 Oct 2013, 06:14:43 am
Quote
There's http://stackoverflow.com/questions/85122/sleep-less-than-one-millisecond/11456112#11456112 which has some attempts. I do not know if that works.
                                                    Joe

No, Joe, I did a quick "google" at a glance - it will not work.
The aim is not just to wait for a fraction of a millisecond but to _sleep_ (i.e., not load the CPU) for that fraction. I'm starting to think it's just impossible for Windows to do, because the required time is less than the system quantum ...
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 31 Oct 2013, 06:17:16 am
Quote
http://www.tutorials.de/c-c/229782-c-sleep-kleiner-als-ms-2.html#post1197886

Oh, I'm not too strong in German now (unfortunately), but the code sample looks familiar - similar code was in my first review of the references. Again: yes, it will wait a fraction of a ms, but with 100% or so CPU load, because it constantly queries the performance counter.
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 31 Oct 2013, 06:18:49 am
Quote
http://www.tutorials.de/c-c/229782-c-sleep-kleiner-als-ms-2.html#post1197886

And a translated page for Raistmer
http://translate.google.com/translate?sl=auto&tl=ru&js=n&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.tutorials.de%2Fc-c%2F229782-c-sleep-kleiner-als-ms-2.html%23post1197886&act=url

and Joe
http://translate.google.com/translate?hl=en&sl=de&tl=en&u=http%3A%2F%2Fwww.tutorials.de%2Fc-c%2F229782-c-sleep-kleiner-als-ms-2.html%23post1197886

Thanks, but the conclusion is no good:
"
Forget it under Windows.
You can only access the timers that the OS makes available to you. On Windows, the minimum is 1 ms.
"

EDIT: LoL, they mention QNX... yeah, nice OS, but we need the same on Windows....
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 31 Oct 2013, 06:22:20 am
Quote
Thanks Arkayn. (German does not translate very well into English when done by a machine! In this case "security" should read "computer"!  ;) )

Here is some POSIX source: nanosleep.c (http://code.google.com/p/libpthread/source/browse/src/nanosleep.c)

Here is the same for Apple: nanosleep.c (http://www.opensource.apple.com/source/Libc/Libc-320/gen/nanosleep.c)

Additionally, there is a function "clock_nanosleep()" which allows choosing between different clocks on Linux (real time, monotonic, other).

And again, I saw the same code in my first search attempt (my comments are marked with //R:):

Quote
    want = u64 = request->tv_sec * POW10_3 + request->tv_nsec / POW10_6;
    while (u64 > 0 && rc == 0) {
        if (u64 >= MAX_SLEEP_IN_MS) ms = MAX_SLEEP_IN_MS;
        else ms = (unsigned long) u64;

        u64 -= ms;
        rc = SleepEx(ms, TRUE); //R: Sleep, but in ms scale
    }

    if (rc != 0) { /* WAIT_IO_COMPLETION (192) */
        if (remain != NULL) {
            GetSystemTimeAsFileTime(&_end.ft);
            real = (_end.ns100 - _start.ns100) / POW10_4;

            if (real >= want) u64 = 0;
            else u64 = want - real;

            remain->tv_sec = u64 / POW10_3;
            remain->tv_nsec = (long) (u64 % POW10_3) * POW10_6; //R: just report how many ns to sleep w/o real way to do such sleep
        }
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 31 Oct 2013, 06:44:38 am
And the initial suggestion that started this search:

Quote
1) export LD_PRELOAD=libsleep.so

You would not have to reserve any physical or logical cores for AP.
-- The 100% usage is only for yield() - an idle loop inside the NVIDIA OpenCL driver. Libsleep.so replaces yield() with nanosleep. This gives lower priority tasks (CPU tasks) an opportunity to run.

Obviously, we can't directly do the same, because the Windows port of nanosleep can't sleep for less than 1 ms. Sleep(0) will do the same as yield(). But attempting "nanosleep(0.1ms)" instead will lead to an even more CPU-demanding loop, because the nanosleep-for-Windows code quoted earlier will just return immediately with a remaining sleep fraction of 0.1 ms, and the loop will reiterate giving no context-switch opportunity at all.
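
To illustrate why a sub-millisecond request falls straight through that port, here is a minimal sketch of just the conversion step from the quoted nanosleep.c excerpt. The function name request_to_ms is hypothetical and the constants are assumed to be the plain powers of ten their names suggest; the sketch only reproduces the integer arithmetic, not the sleeping itself.

```c
#include <assert.h>

/* Assumed values, matching the names used in the quoted excerpt. */
#define POW10_3 1000ULL
#define POW10_6 1000000ULL

/* Hypothetical helper: number of whole milliseconds the quoted loop
   would hand to SleepEx for a given timespec-style request. */
unsigned long long request_to_ms(long tv_sec, long tv_nsec)
{
    assert(tv_sec >= 0 && tv_nsec >= 0 && tv_nsec < 1000000000L);
    /* Integer division drops everything below 1 ms. */
    return (unsigned long long)tv_sec * POW10_3
         + (unsigned long long)tv_nsec / POW10_6;
}
```

With tv_sec = 0 and tv_nsec = 100000 (0.1 ms) the result is 0, so the `while (u64 > 0 ...)` loop in the excerpt never runs and SleepEx is never called. A caller that retries in a loop would therefore spin without ever yielding.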
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Urs Echternacht on 31 Oct 2013, 06:02:51 pm
Read your own reference from the first post in a little more detail. That person solved the problem of 100% CPU activity by combining QueryPerformanceCounter() with Sleep(0) and some other little tricks.

And, by the way, if it is possible to reduce the "active" wait time to some fraction of a millisecond, wouldn't that reduce the total CPU and elapsed time, too?
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 09 Nov 2013, 07:18:18 am
Though the initial goal looks unreachable, I would like to collect here some useful info about Windows time slice management.

This utility: http://technet.microsoft.com/en-us/sysinternals/bb897568
gives this result for my dev netbook:

C:\bin>Clockres.exe


ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 15.600 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.000 ms

It could explain why I did not see any changes when I used timeBeginPeriod(1); inside the app.
What about your PCs? What value do they use?

WinXP AthlonXP, no GPGPU-enabled GPU inside:
Maximum timer interval: 15.625 ms
Minimum timer interval: 1.000 ms
Current timer interval: 15.625 ms
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 09 Nov 2013, 07:55:44 am
Quote
Read your own reference from the first post in a little more detail. That person solved the problem of 100% CPU activity by combining QueryPerformanceCounter() with Sleep(0) and some other little tricks.

And, by the way, if it is possible to reduce the "active" wait time to some fraction of a millisecond, wouldn't that reduce the total CPU and elapsed time, too?

Sleep(0) in a high-priority thread (and our GPU worker thread has higher priority than the worker thread in a CPU app) should "return" to the higher-priority thread, not to the idle-priority CPU thread. So, with Sleep(0) we "make a call" to the OS scheduler; the scheduler looks at the current situation, sees the GPU thread non-blocked, sees the CPU thread non-blocked too but with idle priority, and decides to run the GPU thread again. Maybe I'm wrong, though, about the state the GPU thread will be in after calling Sleep(0).
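
The reasoning above can be sketched with a toy strict-priority picker. This is only a model under stated assumptions: the type toy_thread, the function toy_pick_next and the priority numbers are all hypothetical, and the real Windows scheduler (priority boosts, multiple cores, affinity) is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type, not a Windows structure. */
typedef struct {
    const char *name;
    int priority;  /* larger value = more urgent (assumed convention) */
    int ready;     /* nonzero if the thread is runnable, i.e. not blocked */
} toy_thread;

/* Returns the index of the thread a strict-priority scheduler would run
   next: the highest-priority ready thread, or -1 if none is ready. */
int toy_pick_next(const toy_thread *threads, size_t count)
{
    int best = -1;
    assert(threads != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (!threads[i].ready)
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}
```

In this model a thread that merely yields stays ready, so a higher-priority GPU worker is picked again ahead of an idle-priority CPU worker. Only when the GPU worker is actually blocked (ready = 0, as during a real sleep) does the CPU worker get selected.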
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Raistmer on 09 Nov 2013, 09:01:07 am
ooops... this relates more to CPU BOINC apps, but it should be known:

Quote
Note that the threads part of a process running in the idle process priority class always receive a single thread quantum (2 clock ticks), ignoring any sort of quantum configuration settings, whether set by default or set through the registry.
(Windows Internals, 6th edition, part 1, p. 428)

This means that by running ALL BOINC CPU applications as idle-priority-class processes we deliberately degrade their performance! Even on dedicated crunchers the OS will have to make scheduling decisions much more often than it would for non-idle priority classes, especially on a server OS.
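
As a rough back-of-the-envelope check of that claim, the sketch below turns "quantum in clock ticks" into a quantum length and a count of quantum expiries per second. The 15.6 ms tick is taken from the ClockRes "Maximum timer interval" outputs in this thread; the 12-tick figure used for comparison is an assumption (the commonly cited default long quantum on server editions), and the function names are hypothetical. Windows accounts quanta in a more refined way internally, so treat the numbers as order-of-magnitude only.

```c
#include <assert.h>

/* Quantum length in microseconds for a given clock interval and tick count. */
long quantum_us(long clock_interval_us, int ticks)
{
    assert(clock_interval_us > 0 && ticks > 0);
    return clock_interval_us * ticks;
}

/* Whole number of quantum expiries per second for a thread that is
   always ready to run (integer division, fractional part dropped). */
long quantum_expiries_per_second(long clock_interval_us, int ticks)
{
    return 1000000L / quantum_us(clock_interval_us, ticks);
}
```

With a 15600 us tick, 2 ticks give a 31200 us quantum, about 32 expiries per second, while the assumed 12 ticks give 187200 us, about 5 per second. Under those assumptions the scheduler would be invoked roughly six times as often for the idle class.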
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: Josef W. Segur on 10 Nov 2013, 12:16:05 am
Quote
Though the initial goal looks unreachable, I would like to collect here some useful info about Windows time slice management.

This utility: http://technet.microsoft.com/en-us/sysinternals/bb897568
...

Windows 7 SP1 x64, AMD A10-4600M quad core system:

ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 15.600 ms
Minimum timer interval: 0.500 ms
Current timer interval: 15.600 ms



Windows 2000 SP4, Pentium-M single core system:

ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 10.014 ms
Minimum timer interval: 1.003 ms
Current timer interval: 10.014 ms

                                                   Joe
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: mr.mac52 on 10 Nov 2013, 11:06:00 am
Windows 7 SP1 x64 Core 2 Quad Q9300 @2.5GHz

ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 15.600 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.000 ms

Windows 7 SP1 x64 Core i7-4770K @3.5GHz

ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 15.600 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.000 ms
Title: Re: Sleeping for less than 1ms in Windows - is it possible and how?
Post by: arkayn on 10 Nov 2013, 12:10:21 pm
FX-4100 Windows 7 SP1 x64

ClockRes v2.0 - View the system clock resolution
Copyright (C) 2009 Mark Russinovich
SysInternals - www.sysinternals.com

Maximum timer interval: 15.600 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.000 ms