Author Topic: Sleeping for less than 1ms in Windows - is it possible and how?  (Read 27576 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
The problem with small kernels is that current GPU drivers apparently do not use hardware interrupts to tell the CPU that the GPU is ready; they poll instead, and that polling consumes CPU.

It was shown on Linux that substituting a library which makes it possible to sleep for a fraction of a ms considerably reduces the CPU load of the NV OpenCL GPU app.
So it looks like we need something that can suspend the CPU-based polling for a fraction of a ms under Windows too.
The usual Sleep(1); call suspends the worker thread for much longer, which results in a big performance drop (though it reduces CPU usage as well and could be used if the kernels were bigger).

So any proposals for how to do a nanosleep under Windows are welcome.

A few references: http://www.geisswerks.com/ryan/FAQS/timing.html
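
To put a number on "much longer", here is a minimal sketch that measures how long a 1 ms sleep request really takes. The helper names (now_us, measure_one_ms_sleep_us) are made up for illustration, and the non-Windows branch is only there so the same file builds elsewhere; the result depends on the system timer resolution, so no particular value is promised.

```c
#define _POSIX_C_SOURCE 200809L
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Current time in microseconds from a high-resolution monotonic clock. */
static double now_us(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    return (double)t.QuadPart * 1e6 / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
#endif
}

/* Request a 1 ms sleep and return how long the thread was really
   away, in microseconds. */
static double measure_one_ms_sleep_us(void)
{
    double start = now_us();
#ifdef _WIN32
    Sleep(1);                              /* shortest non-zero Sleep() request */
#else
    struct timespec req = { 0, 1000000L }; /* 1 ms */
    nanosleep(&req, NULL);
#endif
    return now_us() - start;
}
```

Calling measure_one_ms_sleep_us() a few hundred times and comparing the average with a typical kernel run time shows whether Sleep(1) is affordable for a given kernel size.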

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #1 on: 30 Oct 2013, 05:36:13 am »
This is all way above my head, but I suspect you may need some lateral thinking here.

Rather than looking for a better sleep, is sleeping the right thing to do in the first place?

Hasn't it been suggested that an asynchronous callback would be better than either an interrupt or a sleep?

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #2 on: 30 Oct 2013, 06:00:24 am »
An async callback is very limited in what it can do inside itself.
In general, we give the GPU some work, wait until the work completes, read back the result and give it more work.
A callback function can't read back the result and submit more work, so it has to set some flag. But how do we check that flag then? Polling? Then it's 100% CPU usage again. Sleeping? Then performance is lost if the sleep is too long.

And being completely asynchronous (i.e. absolutely all processing in callbacks without any sync point) means an essentially different program. Maybe such a program would indeed be good to have, but it doesn't help with the current one.

Quote
Callbacks must return promptly. The behavior of calling expensive system routines, OpenCL API calls to create contexts or command-queues, or blocking OpenCL operations from the following list below, in a callback is undefined.
clFinish,
clWaitForEvents,
blocking calls to clEnqueueReadBuffer, clEnqueueReadBufferRect, clEnqueueWriteBuffer, clEnqueueWriteBufferRect,
blocking calls to clEnqueueReadImage and clEnqueueWriteImage,
blocking calls to clEnqueueMapBuffer and clEnqueueMapImage,
blocking calls to clBuildProgram, clCompileProgram or clLinkProgram
If an application needs to wait for completion of a routine from the above list in a callback, please use the non-blocking form of the function, and assign a completion callback to it to do the remainder of your work.
(OpenCL 1.2 manual, section 5.9)

EDIT: and I would not set callbacks against interrupts. They are just on different levels of the hierarchy. Most probably callbacks are implemented via interrupts. AFAIK it's not possible for a GPU device to make the CPU execute some code "directly"; the CPU has to be notified somehow first. So either the CPU asks the GPU "should I start?" (polling) or the GPU "gives a kick" to the CPU (interrupt). An interrupt handler is actually a low-level async callback.
« Last Edit: 30 Oct 2013, 06:31:55 am by Raistmer »
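
The flag dilemma described above can be sketched roughly like this. It does not touch OpenCL at all: on_gpu_complete is a hypothetical stand-in for whatever would be registered as the completion callback, and gpu_done and wait_for_gpu are names invented for the sketch. It only shows the two choices the host thread has once the callback can do nothing but raise a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Raised by the completion callback, read by the worker thread. */
static atomic_int gpu_done;

/* Hypothetical callback body: it only raises the flag and returns promptly. */
static void on_gpu_complete(void *user_data)
{
    (void)user_data;
    atomic_store(&gpu_done, 1);
}

/* Wait for the flag.
   sleep_ms == 0: pure polling, lowest latency but a fully loaded core.
   sleep_ms  > 0: sleep between checks, less CPU but up to one whole
                  sleep period of extra latency per kernel.
   Returns how many times the flag was checked. */
static unsigned long wait_for_gpu(unsigned sleep_ms)
{
    unsigned long checks = 1;
    while (!atomic_load(&gpu_done)) {
        if (sleep_ms > 0) {
#ifdef _WIN32
            Sleep(sleep_ms);
#else
            struct timespec req = { (time_t)(sleep_ms / 1000),
                                    (long)(sleep_ms % 1000) * 1000000L };
            nanosleep(&req, NULL);
#endif
        }
        ++checks;
    }
    atomic_store(&gpu_done, 0);   /* re-arm for the next kernel */
    return checks;
}
```

With kernels that finish in a fraction of a ms, wait_for_gpu(1) spends most of its time asleep after the work is already done, and wait_for_gpu(0) is the 100% CPU case.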

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #3 on: 30 Oct 2013, 08:23:32 am »
The "other side" seem to have a rather longer list: CUPTI Callback API

That's a link to the CUDA 5.5 toolkit profiling API documentation. I believe there may be a CUDA 6.0 toolkit available to developers under NDA, either 'now' or 'real soon now' - haven't heard any details, because of NDA, obviously. But the 5.5 toolkit is the current public version, released 1 August 2013 - and the profiling callback tools are flagged as 'new in this release', so I think callbacks are on the active development pathway.

Of course, that's for CUDA only, and says nothing about whether the tools are exposed via the OpenCL middleware. We got a steer, didn't we, that NVidia was cooling on OpenCL support? Maybe those are questions better directed at the Khronos group and the OpenCL development community.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #4 on: 30 Oct 2013, 09:05:43 am »
What we have already learned is that each OpenCL implementation is different.
What helps Intel doesn't help NV and isn't needed for AMD. So let's leave Khronos untouched ;)

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #5 on: 30 Oct 2013, 07:55:01 pm »
There's http://stackoverflow.com/questions/85122/sleep-less-than-one-millisecond/11456112#11456112 which has some attempts. I do not know if that works.
                                                    Joe

Offline Urs Echternacht

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 4121
  • ++
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #8 on: 30 Oct 2013, 08:34:03 pm »
Thanks Arkayn. (German does not translate very well into English when done by a machine! In this case "security" should read "computer"!  ;) )

Here is some POSIX source: nanosleep.c

Here is the same for Apple: nanosleep.c

Additionally there is a function "clock_nanosleep()" which allows choosing between different clocks on Linux (real time, monotonic, others).
« Last Edit: 30 Oct 2013, 08:42:15 pm by Urs Echternacht »
_\|/_
U r s
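
For reference, a minimal Linux-only sketch of the clock_nanosleep() call mentioned above, using the monotonic clock and a relative interval. The wrapper name sleep_monotonic_ns is made up for the example, and very old glibc versions may need -lrt when linking.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Sleep for `ns` nanoseconds (0 <= ns < 1000000000) measured on
   CLOCK_MONOTONIC. Returns 0 on success, otherwise an error number
   such as EINTR (the value is returned directly, not through errno). */
static int sleep_monotonic_ns(long ns)
{
    struct timespec req = { 0, ns };
    /* flags == 0 means the interval is relative to "now" */
    return clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
}
```

sleep_monotonic_ns(100000) asks for a 0.1 ms sleep; how close the kernel gets to that depends on the timer configuration of the machine.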

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #9 on: 31 Oct 2013, 06:14:43 am »
Quote
There's http://stackoverflow.com/questions/85122/sleep-less-than-one-millisecond/11456112#11456112 which has some attempts. I do not know if that works.
                                                    Joe

No, Joe, I did a quick "google" and it will not work.
The aim is not just to wait for a fraction of a millisecond but to _sleep_ (i.e., not load the CPU) for that fraction. I'm starting to think it's just impossible on Windows because the required time is less than the system quantum ...

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #10 on: 31 Oct 2013, 06:17:16 am »
http://www.tutorials.de/c-c/229782-c-sleep-kleiner-als-ms-2.html#post1197886
Oh, my German isn't too strong now (unfortunately), but the code sample looks familiar: similar code was in my first review of the references. Again: yes, it will wait for a fraction of a ms, but with 100% or so CPU load because it constantly queries the performance counter.
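
A minimal sketch of the kind of performance-counter wait being described, to show where the CPU load comes from. The names now_ns and busy_wait_us are invented here and this is not the code from the linked page; the non-Windows branch is only there so the sketch builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    /* split the division so the multiplication cannot overflow */
    return (t.QuadPart / freq.QuadPart) * 1000000000LL
         + (t.QuadPart % freq.QuadPart) * 1000000000LL / freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
#endif
}

/* Wait `us` microseconds by re-reading the counter in a tight loop.
   The wait is accurate, but the thread never leaves the CPU.
   Returns how many times the counter was queried. */
static unsigned long busy_wait_us(unsigned us)
{
    const long long deadline = now_ns() + (long long)us * 1000LL;
    unsigned long queries = 1;
    while (now_ns() < deadline)
        ++queries;
    return queries;
}
```

The returned query count gives a feel for how much work the core does during even a 0.1 ms wait.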

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #12 on: 31 Oct 2013, 06:22:20 am »
Quote
Thanks Arkayn. (German does not translate very well into English when done by a machine! In this case "security" should read "computer"!  ;) )

Here is some POSIX source: nanosleep.c

Here is the same for Apple: nanosleep.c

Additionally there is a function "clock_nanosleep()" which allows choosing between different clocks on Linux (real time, monotonic, others).

And again, I saw the same code in my first search attempt (my comments are marked //R:):

Quote
    want = u64 = request->tv_sec * POW10_3 + request->tv_nsec / POW10_6;
    while (u64 > 0 && rc == 0) {
        if (u64 >= MAX_SLEEP_IN_MS) ms = MAX_SLEEP_IN_MS;
        else ms = (unsigned long) u64;

        u64 -= ms;
        rc = SleepEx(ms, TRUE); //R: sleeps, but only at ms scale
    }

    if (rc != 0) { /* WAIT_IO_COMPLETION (192) */
        if (remain != NULL) {
            GetSystemTimeAsFileTime(&_end.ft);
            real = (_end.ns100 - _start.ns100) / POW10_4;

            if (real >= want) u64 = 0;
            else u64 = want - real;

            remain->tv_sec = u64 / POW10_3;
            remain->tv_nsec = (long) (u64 % POW10_3) * POW10_6; //R: just reports how many ns remain, with no real way to sleep that long
        }
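
The arithmetic in the first line of that excerpt can be checked in isolation. This is a small sketch, assuming POW10_3 and POW10_6 are 1000 and 1000000 as their names suggest; request_to_ms is a name made up for the example.

```c
#include <stdint.h>

#define POW10_3 1000      /* assumed value, from the macro name */
#define POW10_6 1000000   /* assumed value, from the macro name */

/* Same conversion as the first line of the excerpt: the request is
   reduced to whole milliseconds, so any sub-ms part is dropped. */
static uint64_t request_to_ms(long tv_sec, long tv_nsec)
{
    return (uint64_t)tv_sec * POW10_3 + (uint64_t)tv_nsec / POW10_6;
}
```

A 0.1 ms request (tv_sec = 0, tv_nsec = 100000) becomes 0 ms, so the while loop never runs and SleepEx is never called; a 1.5 ms request becomes 1 ms.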

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #13 on: 31 Oct 2013, 06:44:38 am »
And the initial suggestion that started this search:

Quote
1) export LD_PRELOAD=libsleep.so

You would not have to reserve any physical or logical cores for AP.
-- The 100% usage is only for yield() - an idle loop inside the NVIDIA OpenCL driver. Libsleep.so replaces yield() with nanosleep. This gives lower priority tasks (CPU tasks) an opportunity to run.

Obviously, we can't directly do the same because we can't use the nanosleep Windows port to sleep less than 1ms. Sleep(0) will do the same as yield(). But attempting to do "nanosleep(0.1ms)" instead will lead to an even more CPU-demanding loop, because the Windows nanosleep code from my previous post will just return immediately with a remaining sleep fraction of 0.1ms, and the loop will reiterate, giving no context switch opportunity at all.
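
For readers who have not seen the LD_PRELOAD trick, the Linux suggestion quoted above could look roughly like the sketch below. This is not the real libsleep.so source, which is not shown in this thread; it assumes the driver's idle loop calls sched_yield(), and the 0.1 ms pause (YIELD_SLEEP_NS) is an arbitrary value picked for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/* Assumed pause length; the thread does not say what libsleep.so uses. */
#define YIELD_SLEEP_NS 100000L   /* 0.1 ms */

/* Replacement for sched_yield(): instead of returning at once when no
   other thread is ready, put the caller to sleep for a short time so
   lower priority tasks get the core. */
int sched_yield(void)
{
    struct timespec req = { 0, YIELD_SLEEP_NS };
    nanosleep(&req, NULL);
    return 0;
}
```

Built as a shared object (for example gcc -shared -fPIC -o libsleep_sketch.so sleep_shim.c, both file names hypothetical) and loaded with LD_PRELOAD, it is found by the dynamic linker before the libc version. Windows has no direct counterpart to this preload mechanism, and the sub-ms sleep itself is the missing piece there.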

Offline Urs Echternacht

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 4121
  • ++
Re: Sleeping for less than 1ms in Windows - is it possible and how?
« Reply #14 on: 31 Oct 2013, 06:02:51 pm »
Read your own reference from the first post in a little more detail. That person solved the problem of 100% CPU activity by combining QueryPerformanceCounter() with Sleep(0) and some other little tricks.

And, by the way, if it is possible to reduce the "active" wait time to some fraction of a millisecond, wouldn't that reduce the total CPU and elapsed time, too?
_\|/_
U r s
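
A minimal sketch of the general idea of combining the performance counter with Sleep(0). It is not a copy of the code on the referenced page: hybrid_wait_us, now_ns and the 2 ms threshold are assumptions made for the example, and the non-Windows branch (nanosleep and sched_yield) is only there so the sketch builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#include <time.h>
#endif

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    return (t.QuadPart / freq.QuadPart) * 1000000000LL
         + (t.QuadPart % freq.QuadPart) * 1000000000LL / freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
#endif
}

/* Wait `us` microseconds: real 1 ms sleeps while plenty of time is
   left, then give up the time slice repeatedly until the deadline. */
static void hybrid_wait_us(unsigned us)
{
    const long long deadline  = now_ns() + (long long)us * 1000LL;
    const long long coarse_ns = 2000000LL;   /* assumed threshold: 2 ms */

    for (;;) {
        long long left = deadline - now_ns();
        if (left <= 0)
            break;
#ifdef _WIN32
        if (left > coarse_ns) Sleep(1);   /* far away: really sleep      */
        else                  Sleep(0);   /* close: only yield the slice */
#else
        if (left > coarse_ns) {
            struct timespec req = { 0, 1000000L };
            nanosleep(&req, NULL);
        } else {
            sched_yield();
        }
#endif
    }
}
```

Note that Sleep(0) only hands the core over when another thread is ready to run; on an otherwise idle core it returns immediately, so the final phase can still show up as CPU load for waits shorter than the threshold.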

 
