Sleeping for less than 1ms in Windows - is it possible and how?

Raistmer:
The problem with small kernels and the current GPU driver implementations is that they apparently don't use hardware interrupts to tell the CPU that the GPU is ready; they poll instead. And that polling consumes CPU.

It was shown on Linux that when one uses a substitute library that makes it possible to sleep for a fraction of a millisecond, the CPU load of the NV OpenCL GPU app drops considerably.
So it looks like we need something that could suspend the CPU-based polling for a fraction of a millisecond under Windows too.
The usual Sleep(1); call suspends the worker thread for much longer than that, which results in a big performance drop (though it reduces CPU usage as well, and could be used if the kernels were bigger).

So, any proposals for how to get a nanosleep equivalent under Windows are welcome.
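For illustration, one portable fallback is to wait on a high-resolution clock and yield the time slice on every pass instead of calling Sleep(). Sleep() only takes whole milliseconds, and its real granularity follows the system timer tick. The sketch below is a rough idea, not a tested fix for this app: `wait_at_least_us` is a made-up helper name, and the code assumes a C++11 compiler.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not from any library): wait at least `us` microseconds
// by yielding the time slice in a loop instead of calling Sleep().
// Returns the time actually spent waiting, in microseconds.
inline long long wait_at_least_us(long long us) {
    assert(us >= 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const auto deadline = start + std::chrono::microseconds(us);
    while (clock::now() < deadline) {
        // Offer the core to any other ready thread; returns at once if none.
        std::this_thread::yield();
    }
    const auto elapsed = clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Something like `wait_at_least_us(200);` could then sit between two checks of the GPU status. Note the limitation: if no other thread is ready to run, the loop still keeps the core busy, so this helps other work get CPU time rather than lowering the load figure. A real sub-millisecond blocking sleep would need a finer timer from the OS.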

A few references: http://www.geisswerks.com/ryan/FAQS/timing.html

Richard Haselgrove:
This is all way above my head, but I suspect you may need some lateral thinking here.

Rather than looking for a better sleep, is sleeping the right thing to do in the first place?

Hasn't it been suggested that an asynchronous callback would be better than either an interrupt or a sleep?

Raistmer:
An async callback is very limited in what it can do inside itself.
In general, we give the GPU some work, wait until the work completes, read back the result and give it more work.
A callback function can't read back the result and submit more work. So it should set some flag. But how do we check that flag then? Polling? Then 100% CPU usage again. Sleeping? Then a performance loss if the sleep is too long.
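A third option for checking such a flag is to let the host thread block on a condition variable that the callback signals, so the thread neither polls nor sleeps for a fixed time. Below is a minimal sketch in portable C++. `CompletionSignal` and `run_fake_kernel_and_wait` are made-up names, and a plain thread stands in for the driver's callback. In real OpenCL code `notify()` would presumably be called from a function registered with clSetEventCallback, assuming a short lock and a signal count as "returning promptly". Whether CPU load really drops would still depend on how the driver itself detects completion.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a one-shot "work finished" signal.
struct CompletionSignal {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Meant to be called from the completion callback: set the flag, wake the host.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    }

    // Host thread blocks here without spinning; resets the flag for reuse.
    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return done; });
        done = false;
    }
};

// Stand-in for "enqueue kernel, wait for completion": a helper thread
// plays the role of the driver calling back after about 300 microseconds.
inline bool run_fake_kernel_and_wait() {
    CompletionSignal sig;
    std::thread fake_gpu([&sig] {
        std::this_thread::sleep_for(std::chrono::microseconds(300));
        sig.notify();
    });
    sig.wait();       // host sleeps until notify() runs
    fake_gpu.join();
    return true;      // here the host would read back results and enqueue more work
}
```

The readback and the next enqueue stay in the host thread after `wait()` returns, so the structure of the current program does not have to change.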

And being completely asynchronous (i.e. absolutely all processing in callbacks, without any sync point) means an essentially different program. Maybe it would indeed be good to have such a program, but it doesn't help with the current one.

--- Quote ---
Callbacks must return promptly. The behavior of calling expensive system routines, OpenCL API calls to create contexts or command-queues, or blocking OpenCL operations from the following list below, in a callback is undefined.
clFinish,
clWaitForEvents,
blocking calls to clEnqueueReadBuffer, clEnqueueReadBufferRect,
clEnqueueWriteBuffer, clEnqueueWriteBufferRect,
blocking calls to clEnqueueReadImage and clEnqueueWriteImage,
blocking calls to clEnqueueMapBuffer and clEnqueueMapImage,
blocking calls to clBuildProgram, clCompileProgram or clLinkProgram
If an application needs to wait for completion of a routine from the above list in a callback, please use the non-blocking form of the function, and assign a completion callback to it to do the remainder of your work.
--- End quote ---
(OpenCL 1.2 manual, section 5.9)

EDIT: and I would not set callbacks against interrupts. They are just at different levels of the hierarchy. Most probably callbacks are implemented via interrupts. AFAIK it's not possible for a GPU device to make the CPU execute some code "directly". The CPU has to be notified somehow first. So either the CPU asks the GPU "should I start?" (polling), or the GPU "gives a kick" to the CPU (an interrupt). An interrupt handler is actually a low-level async callback.

Richard Haselgrove:
The "other side" seem to have a rather longer list: CUPTI Callback API

That's a link to the CUDA 5.5 toolkit profiling API documentation. I believe there may be a CUDA 6.0 toolkit available to developers under NDA, either 'now' or 'real soon now' - haven't heard any details, because of NDA, obviously. But the 5.5 toolkit is the current public version, released 1 August 2013 - and the profiling callback tools are flagged as 'new in this release', so I think callbacks are on the active development pathway.

Of course, that's for CUDA only, and says nothing about whether the tools are exposed via the OpenCL middleware. We got a steer, didn't we, that NVidia was cooling on OpenCL support? Maybe those are questions better directed at the Khronos group and the OpenCL development community.

Raistmer:
What we have already learned is that each OpenCL implementation is different.
What helps Intel doesn't help NV and isn't needed for AMD. So let's leave Khronos untouched ;)
