AK V8 + CUDA MB team work mod
Raistmer:
yes....
But I thought that in such an "overloaded" condition, with many processes in the ready state, the OS should just use round-robin scheduling (for example) on each priority level and give a full quantum to each app. And it does that, except it can't realize that while one core is busy with a higher-priority process, it could still execute lower-priority ones on the other cores... It fails to do that.
Priorities were: the non-BOINC thread at normal (16?); 2 AK_v8 and 2 Einstein - 4 worker threads in all, each with a priority of 1; and the CUDA worker thread at priority 3.
So the CUDA thread should preempt all the other BOINC threads (and it usually does), but it can't compete with the non-BOINC thread unless an explicit affinity is set...
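The failure mode described above can be caricatured in a toy model. This is an illustrative sketch only - real Windows scheduling is far more involved (it applies anti-starvation priority boosts, for one) - but it shows why a low-priority thread stuck on the same core as a normal-priority thread starves, while separating them by affinity fixes it:

```python
# Toy per-core strict-priority scheduler. Each core runs only its
# highest-priority ready thread each quantum (ties broken round-robin),
# and threads never migrate between cores. Thread priorities mirror the
# post: non-BOINC at 16, CUDA feeder at 3, CPU workers at 1.
from collections import defaultdict

def simulate(assignment, quanta=1000):
    """assignment: {core: [(name, priority), ...]}.
    Returns how many quanta of CPU each thread received."""
    ran = defaultdict(int)
    rr = defaultdict(int)  # round-robin pointer per core
    for _ in range(quanta):
        for core, threads in assignment.items():
            if not threads:
                continue
            top = max(p for _, p in threads)
            candidates = [t for t in threads if t[1] == top]
            pick = candidates[rr[core] % len(candidates)]
            rr[core] += 1
            ran[pick[0]] += 1
    return dict(ran)

# Case 1: the CUDA feeder lands on the same core as the
# normal-priority non-BOINC thread -> it never gets a quantum.
clash = {
    0: [("non-BOINC", 16), ("cuda", 3)],
    1: [("ak_v8 #1", 1)],
    2: [("ak_v8 #2", 1)],
    3: [("einstein #1", 1), ("einstein #2", 1)],
}

# Case 2: affinity keeps the non-BOINC thread off the feeder's core;
# now the feeder preempts the BOINC worker, as intended.
separated = {
    0: [("non-BOINC", 16)],
    1: [("cuda", 3), ("ak_v8 #1", 1)],
    2: [("ak_v8 #2", 1)],
    3: [("einstein #1", 1), ("einstein #2", 1)],
}

print(simulate(clash))      # "cuda" gets 0 quanta
print(simulate(separated))  # "cuda" gets all 1000 quanta on core 1
```

A global scheduler that considered all cores at once would run the four highest-priority ready threads and the feeder would always find a core; per-core, sticky placement is what creates the starvation.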
Jason G:
--- Quote from: Raistmer on 05 Feb 2009, 10:06:07 am ---yes....
But I thought that in such an "overloaded" condition, with many processes in the ready state, the OS should just use round-robin scheduling (for example) on each priority level and give a full quantum to each app. And it does that, except it can't realize that while one core is busy with a higher-priority process, it could still execute lower-priority ones on the other cores... It fails to do that.
Priorities were: the non-BOINC thread at normal (16?); 2 AK_v8 and 2 Einstein - 4 worker threads in all, each with a priority of 1; and the CUDA worker thread at priority 3.
So the CUDA thread should preempt all the other BOINC threads (and it usually does), but it can't compete with the non-BOINC thread unless an explicit affinity is set...
--- End quote ---
Correct. An oversubscribed condition means there is not enough CPU to go around no matter what, so the scheduler hands out even shares to try to be fair and to minimise the impact on the user experience (via round robin). It doesn't magically provide extra slices. But subscribing 5.04 cores' worth of full time slices to 4 cores is still 'only' oversubscribed by 25%+, which I don't think would normally be enough for the user to notice (except in games, RthDribl, etc.), but it is probably enough to make the CUDA feeder miss its window repeatedly? Dunno, that's your department ;) I only know what I see in non-GPU mode running RthDribl, and it probably still applies when running the GPU app (maybe even more so).
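The "25%+" figure follows directly from the numbers quoted. One plausible reading of where 5.04 comes from (an assumption on my part: four full-time CPU workers, one full-time non-BOINC thread, and a CUDA feeder wanting roughly 4% of a core) is spelled out below; the conclusion only needs the ratio:

```python
# Assumed breakdown of the "5.04 cores" of demand quoted in the post;
# only the demand/capacity ratio matters for the oversubscription figure.
demand = 4 * 1.0 + 1.0 + 0.04   # CPU workers + non-BOINC + CUDA feeder
cores = 4
oversub = demand / cores - 1
print(f"{oversub:.0%} oversubscribed")  # 26% oversubscribed
```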
Raistmer:
My point (and why I call it a "bug") is that if I manually constrain the resources available to the CUDA app (exclude some cores), it works better - with the same load on the system; I don't create free time slices that way, right? What prevents the OS from doing the same on its own?...
Jason G:
--- Quote from: Raistmer on 05 Feb 2009, 10:25:42 am ---My point (and why I call it a "bug") is that if I manually constrain the resources available to the CUDA app (exclude some cores), it works better - with the same load on the system; I don't create free time slices that way, right? What prevents the OS from doing the same on its own?...
--- End quote ---
Ah, you're getting to the crux of the problem there, IMO.
Here you have intelligent knowledge of the programs running, so you can say "I want this program to go over onto this other core, where it won't clash with that other program". A CPU scheduler can't do that - it is dumb - and that is Intel's point in the remark that they would like to see OSes get out of trying to manage threads. Making the scheduler much smarter would just load the system more, so it is probably trying to be too smart already.
As soon as the cores are oversubscribed, Windows treats the threads all equally, so to it there is no difference where it puts the CUDA app or anything else, because every core is full no matter what. But we know there are differences between the three other kinds of apps: in many cases the AK app, and probably the Einstein one, will yield, while the CUDA one likely relies on switching to kernel mode (which is slow) and has prioritised interrupt events and callbacks, but is still user mode, so it gets treated equally. Thankfully AKv8, for example, has very little I/O to check, probably has some points where it yields waiting for memory, and almost never makes kernel calls.
Jason
Raistmer:
Agreed :)
Well, the attached app is a little more "smart" in this sense: it will keep its worker thread on a single (the first) core. That could indeed cause some performance degradation in the general case (it can't fill an idle window on another core), but it really helped in my situation. And the GPU temperature reached is the highest (the same as when no non-BOINC app was running).
It can't be considered an upgrade, because it will help in some cases (when the CUDA app appears to freeze for no reason) but can decrease performance (slightly, if at all) in other cases.
ADDON:
And a suggestion: if you experience delays (the PC behaves sluggishly) when running the CUDA app while browsing the Internet, playing a game or watching video on a multi-core system, try excluding the first CPU (by setting the affinity for the process in Task Manager) for the non-BOINC app that experiences the delays (i.e. the browser, game or media player). You could get a better experience that way. Don't forget to upgrade to the attached build in this case, of course.
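For anyone setting this programmatically rather than through Task Manager's checkboxes: an affinity mask is just a bitfield with one bit per logical CPU, and "exclude the first CPU" means clearing bit 0. A minimal sketch of the mask arithmetic (the mask value is what Windows APIs such as SetProcessAffinityMask accept; the 4-core count here is an example):

```python
# Affinity masks: bit N set = logical CPU N allowed.
# "Exclude the first CPU" on a 4-core machine clears bit 0.
cores = 4
all_cpus = (1 << cores) - 1      # 0b1111: CPUs 0-3 allowed
without_cpu0 = all_cpus & ~0b1   # 0b1110: CPUs 1-3 only
print(bin(without_cpu0))  # 0b1110
```

So ticking every CPU box except "CPU 0" in Task Manager corresponds to a mask of 0xE on this machine.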
[attachment deleted by admin]