Author Topic: AK V8 + CUDA MB team work mod (Read 140748 times)

Jason G · « **Reply #105 on:** 05 Feb 2009, 09:09:42 am »

Quote from: Raistmer on 05 Feb 2009, 08:58:50 am

Pure example of CPU scheduling bug in Vista IMHO.

Nope, it's not a bug, it's on purpose. If it fills up the first processors first, then the other three can power down and save energy (probably intended for notebooks, I suppose). You might want to double-check that power saving is disabled in the BIOS (and in the Vista settings?), and do exactly what you did: assign the single-threaded program to the last core. You shouldn't need to place affinity controls on the CUDA feeder then, but it would be interesting to know if it then migrates to the last core.

XP Rules!

Raistmer · « **Reply #106 on:** 05 Feb 2009, 09:14:22 am »

Quote from: Jason G on 05 Feb 2009, 09:09:42 am

Quote from: Raistmer on 05 Feb 2009, 08:58:50 am
Pure example of CPU scheduling bug in Vista IMHO.

Nope, it's not a bug, it's on purpose. If it fills up the first processors first, then the other three can power down and save energy. You might want to double-check that power saving is disabled in the BIOS, and do exactly what you did: assign the single-threaded program to the last core. You shouldn't need to place affinity controls on the CUDA feeder then, but it would be interesting to know if it then migrates to the last core.

XP Rules!

Sorry, not the case. Power saving is disabled; moreover, you forgot that all CPUs are still busy with another 4 CPU-based tasks (2 AK_v8 and 2 Einstein at the moment of observation). So there is absolutely no power saving that could be done there.

And changing affinity only for the non-BOINC app is not enough. After I did that, the situation remained the same. Only when I excluded that core for the CUDA app did it begin to work as usual.
I repeated this experiment a few times, so I'm pretty sure of that.

Jason G · « **Reply #107 on:** 05 Feb 2009, 09:24:20 am »

Quote from: Raistmer on 05 Feb 2009, 09:14:22 am

Power saving is disabled; moreover, you forgot that all CPUs are still busy with another 4 CPU-based tasks (2 AK_v8 and 2 Einstein at the moment of observation). So there is absolutely no power saving that could be done there.

LoL, fair enough, you didn't say you were running other apps. The scheduling algorithm will still attempt to oversubscribe the first core before moving on, so it is still the Windows strategy at work, despite all cores running.

Quote

And changing affinity only for the non-BOINC app is not enough. After I did that, the situation remained the same. Only when I excluded that core for the CUDA app did it begin to work as usual.
I repeated this experiment a few times, so I'm pretty sure of that.

Yep, so you are running these apps: the single-threaded chem model + 2x AK_v8, 2x Einstein and 1x CUDA, which equals full subscription + 1.04 cores. Unfortunately the Windows scheduler, on any version, isn't very good with that. For proof, run RthDribl (a GPU test app) without other apps or BOINC running (smooth), then run BOINC with no CUDA app and look again at RthDribl (jerky). BOINC alone, interacting with the Windows CPU scheduler, is already oversubscribed (using up all time slices).

If oversubscribed, and you have some app that interacts to cause crowded scheduling, it might be better to reduce the BOINC allocation by 1 core. [Hmm... Might be nice if BOINC adjusted this automatically on the fly ....]
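As a rough illustration of the "reduce the allocation by 1 core" idea, here is a minimal Python sketch. It is not anything BOINC actually does: the function name `boinc_core_budget` and its inputs are hypothetical, and it assumes the load from non-BOINC programs is already known as a number of cores' worth of CPU time.

```python
import math

def boinc_core_budget(total_cores, external_load):
    """Hypothetical helper: how many cores to leave to BOINC CPU tasks.

    total_cores   -- number of CPU cores in the machine
    external_load -- cores' worth of CPU time used by non-BOINC programs
                     (assumed to be measured elsewhere)
    """
    # Reserve whole cores for the external load, rounding up.
    reserved = math.ceil(external_load)
    # Never return a negative budget.
    return max(0, total_cores - reserved)

# The quad-core case from this thread: one single-threaded non-BOINC program.
print(boinc_core_budget(4, 1.0))
```

For the quad discussed here, with one single-threaded non-BOINC program, this gives BOINC 3 cores instead of 4.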

Jason

Richard Haselgrove · « **Reply #108 on:** 05 Feb 2009, 09:54:40 am »

Quote from: Jason G on 05 Feb 2009, 09:24:20 am

[Hmm... Might be nice if Boinc adjusted this automatically on the fly ....]

Did you see the trac tickets #841 and #842 that Jord made us write the other day?

Sounds like we need to extend <exclusive_app> to include <exclusive_app_resource_count_n> - "My game takes 2 cpu cores and a video card, please".

Jason G · « **Reply #109 on:** 05 Feb 2009, 10:05:13 am »

I didn't see it, no, but it is one thing that makes sense, and I'll take a closer look. I would kind of like to see BOINC detect the oversubscription status and throttle itself back a core (or more) without intervention (as obviously Windows doesn't quite manage).

Some good discussion on Intel's TBB wiki recognises the problems:

Quote

...In the future, we hope to see additional interfaces in operating systems to coordinate threaded applications including those built with TBB. We agree with those who have called for OSes to get out of the business of scheduling threads and focus instead on allocation of processors to applications. It’s an interesting topic to say the least.

Raistmer · « **Reply #110 on:** 05 Feb 2009, 10:06:07 am »

yes....
But I thought that in such an "overloaded" condition, where many processes are in the ready state, the OS should just use round-robin scheduling (for example) on each priority level and give a full quantum to each app. And it will do that, except it can't realize that if one core is busy with a higher-priority process, it could still execute low-priority ones on the other cores... It fails to do that.
Priorities were: non-BOINC thread - normal (16 ?); 2 AK_v8 and 2 Einstein - 4 worker threads, each with priority 1; CUDA worker thread - priority 3.
So, CUDA should preempt all other BOINC threads (and it usually does) but can't fight the non-BOINC thread unless explicit affinity is set...
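The expectation in this post can be sketched as a toy model in Python. This is only an idealised "global priority" scheduler, not how the Windows scheduler is implemented; the thread names are made up for illustration and the priority numbers are the ones quoted above.

```python
def pick_running(threads, n_cores):
    """Toy global scheduler: run the n_cores highest-priority ready threads.

    threads -- list of (name, priority) pairs, higher number = higher priority
    """
    # sorted() is stable, so threads of equal priority keep their list order.
    ranked = sorted(threads, key=lambda t: t[1], reverse=True)
    return [name for name, _ in ranked[:n_cores]]

# Hypothetical names; priorities as quoted in the post above.
threads = [
    ("chem_model", 16),   # non-BOINC thread
    ("cuda_feeder", 3),   # CUDA worker thread
    ("ak_v8_a", 1),
    ("ak_v8_b", 1),
    ("einstein_a", 1),
    ("einstein_b", 1),
]

running = pick_running(threads, 4)
print(running)
```

In this idealised model the CUDA worker always gets a core next to the non-BOINC thread, which is the behaviour being asked for; the observation in the thread is that the real scheduler did not behave this way without explicit affinity.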

Jason G · « **Reply #111 on:** 05 Feb 2009, 10:13:50 am »

Quote from: Raistmer on 05 Feb 2009, 10:06:07 am

yes....
But I thought that in such an "overloaded" condition, where many processes are in the ready state, the OS should just use round-robin scheduling (for example) on each priority level and give a full quantum to each app. And it will do that, except it can't realize that if one core is busy with a higher-priority process, it could still execute low-priority ones on the other cores... It fails to do that.
Priorities were: non-BOINC thread - normal (16 ?); 2 AK_v8 and 2 Einstein - 4 worker threads, each with priority 1; CUDA worker thread - priority 3.
So, CUDA should preempt all other BOINC threads (and it usually does) but can't fight the non-BOINC thread unless explicit affinity is set...

Correct, an oversubscribed condition means there is not enough to go around no matter what, so it gives 'Even Stevens' to 'try' to be fair and to minimise the user experience impact (via round robin). It doesn't magically provide extra slices. But subscribing 5.04 cores' worth of full time slices to 4 cores is still 'only' oversubscribed by 25%+, which I don't think would normally be enough for the user to notice (except in games, RthDribl etc.), but is probably enough to make the CUDA feeder miss its window repeatedly? Dunno, that's your department.

I only know what I see in non-GPU mode running RthDribl, and it probably still applies when running the GPU app (maybe even more so).
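The "25%+" figure can be checked with a few lines of Python. This is just the arithmetic from the posts above (1 non-BOINC thread, 2 AK_v8, 2 Einstein, and the 0.04 of a core attributed to the CUDA feeder); the function name is made up for the example.

```python
def oversubscription(demand_cores, total_cores):
    """Fraction by which CPU demand exceeds the available cores (0.0 if none)."""
    return max(0.0, demand_cores / total_cores - 1.0)

# 1 chem model + 2 AK_v8 + 2 Einstein + 0.04 for the CUDA feeder
demand = 1 + 2 + 2 + 0.04
ratio = oversubscription(demand, 4)
print(f"demand {demand:.2f} cores on 4 cores: {ratio:.0%} oversubscribed")
```

That works out to 5.04 / 4 = 1.26, i.e. about 26% over.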

Raistmer · « **Reply #112 on:** 05 Feb 2009, 10:25:42 am »

My point is (and why I call it a "bug") that if I manually constrain the resources available to the CUDA app (exclude some cores), it works better (with the same load on the system; I don't create free time slices that way, right?). What prevents the OS from doing the same at its own level?...

Jason G · « **Reply #113 on:** 05 Feb 2009, 10:37:31 am »

Quote from: Raistmer on 05 Feb 2009, 10:25:42 am

My point is (and why I call it a "bug") that if I manually constrain the resources available to the CUDA app (exclude some cores), it works better (with the same load on the system; I don't create free time slices that way, right?). What prevents the OS from doing the same at its own level?...

Ah, you're getting to the crux of the problem there IMO.

Here you have intelligent knowledge of the programs running, so you can say "I want this program to go over onto this other core, where it won't clash with this other program". A CPU scheduler can't do that, it is dumb, and that is Intel's point in that remark that they would like to see OSes get out of trying to manage threads. Making the scheduler much smarter would just load the system more, so it is probably trying to be too smart already.

As soon as the cores are oversubscribed, Windows will treat the threads all equally, so to it there is no difference where it puts the CUDA app or anything else, because it is full no matter what. But we know there are differences between the three other kinds of apps: in many cases the AK app, and probably the Einstein one, will yield, but the CUDA one likely relies on switching to kernel mode (which is slow) and has prioritised interrupt events and callbacks, yet is still user mode so gets treated equally. Thankfully AKv8, for example, has very little I/O to check, probably has some times to yield while waiting for memory, and almost never makes kernel calls.

Jason

Raistmer · « **Reply #114 on:** 05 Feb 2009, 10:50:51 am »

Agreed

Well, the attached app is a little more "smart" in this sense and will leave the worker thread on a single (first) core. It could indeed give some performance degradation in the general case (it can't fill an idle window on another core), but it really helped in my situation. And the GPU temperature achieved is the highest (the same as when no non-BOINC app runs).

It can't be considered an upgrade, because it will help in some cases (when the CUDA app appears to freeze without reason) and can decrease performance (slightly, if at all) in other cases.

ADDON:
And a suggestion: if you experience delays (the PC behaves sluggishly) when running the CUDA app while browsing the Internet, playing a game or watching video on a multicore system, try excluding the first CPU (by setting the process affinity in Task Manager) for the non-BOINC app that experiences delays (i.e. browser, game, media player). You could get a better experience that way. Don't forget to upgrade to the attached build in this case, of course.
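For anyone who prefers to think of the Task Manager checkboxes as a bitmask, here is a small Python sketch that builds one. It only computes the number; it does not apply it to any process. The function name `affinity_mask` is made up for this example, and it assumes the usual convention that bit 0 stands for the first core (the convention used by the Windows `SetProcessAffinityMask` call).

```python
def affinity_mask(n_cores, excluded=()):
    """Build an affinity bitmask with one bit per allowed core (bit 0 = first core)."""
    mask = 0
    for core in range(n_cores):
        if core not in excluded:
            mask |= 1 << core
    return mask

# Non-BOINC app on a quad: allow every core except the first one.
app_mask = affinity_mask(4, excluded={0})
# Worker thread kept on the first core only, as the attached build is described.
worker_mask = affinity_mask(4, excluded={1, 2, 3})

print(bin(app_mask), bin(worker_mask))
```

The two masks do not overlap, which is the point of the suggestion: the sluggish app and the CUDA worker no longer compete for the same core.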

[attachment deleted by admin]

Jason G · « **Reply #115 on:** 05 Feb 2009, 10:59:32 am »

Will see what it does on a dual core & 9600GSO OC'd 20%, as I've run out of AstroPulse

Raistmer · « **Reply #116 on:** 05 Feb 2009, 11:07:01 am »

Quote from: Jason G on 05 Feb 2009, 10:59:32 am

Will see what it does on a dual core & 9600GSO OC'd 20%, as I've run out of AstroPulse

Well, I've been doing a production run with the CUDA "team" combo for a few days already (that's why so many updates in the last 2 days - I started to notice flaws and ways they could be eliminated). Will see how fast my RAC climbs back and higher.

Richard Haselgrove · « **Reply #117 on:** 05 Feb 2009, 11:11:09 am »

Quote from: Raistmer on 05 Feb 2009, 11:07:01 am

Well, I've been doing a production run with the CUDA "team" combo for a few days already (that's why so many updates in the last 2 days - I started to notice flaws and ways they could be eliminated). Will see how fast my RAC climbs back and higher.

Careful - you'll get drummed out of the BOINC Union if you start doing things like that. Noticing flaws, indeed? And fixing them? Unheard of!

Raistmer · « **Reply #118 on:** 05 Feb 2009, 11:12:48 am »

LoL

Well, here is my current RAC history (for the quad with 9600GSO)

Jason G · « **Reply #119 on:** 05 Feb 2009, 11:18:54 am »

LoL ... You need fewer cores, and to run AstroPulse

Let me see if I can remember my photobucket account.

[Now for this new app, I have my AstroPulse section, and Enhanced with this new exe with the CUDA & FFTW DLLs, right? Anything else needed?]
