The only thing that seems suspicious to me is that you suspend BOINC while running the tool. In the default configuration BOINC leaves suspended applications in memory, so they are then confronted with unexpected changes in client_state.xml. My advice is to stop BOINC completely rather than suspend it. I don't think this is what causes the disk-full errors, though...
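If it helps, this is roughly what I mean, as a sketch only (it assumes a default Windows install where BOINC runs as a service named "BOINC" and boinccmd is available in the BOINC program folder):

rem Suspending leaves the science apps loaded in memory:
rem   boinccmd --set_run_mode never
rem Stopping the client unloads them completely, which is what I'd advise
rem before letting the tool edit client_state.xml:
net stop BOINC
rem ... run the reschedule tool here ...
net start BOINC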
Marius... This morning I found that 2 of my host computers had run out of CUDA work. The CPUs had lots of work. I can't reach Berkeley at the moment because their bandwidth is maxed out. I set the config file as follows:

[Settings]
TrueAngleRate=1

After running ReSchedule I observed that several work units were aborted by the AutoKill function of the V11 CUDA app. Is it possible that, in the reassignment of work from the CPUs to CUDA, some VLAR/VHAR work was moved that should not have been?
But surely your program stops BOINC first, thus removing the (suspended) apps from memory? I certainly saw the manager window empty of tasks.
Quote from: MarkJ on 04 Jul 2009, 07:51:49 am
I did notice that the rebranded work units immediately went into "running, High Priority" mode. Seems their estimated time was 20+ hours. That estimate is dropping like a stone as it starts crunching them so it will work it out, but they will be finished by the time the TSI has come up (60 mins).

If tasks are estimated at 20 hours but finishing in less than 1 hour, you have a horribly out-of-kilter Duration Correction Factor. With no VLAR handler in place, I would have expected a sawtooth waveform with maybe a factor of 4x between peak and valley, but 20x? No wonder you were seeing preemptions and EDF on boinc_alpha.

I suggest a sanity check on the FLOPs estimates in your app_info: once the after-effects of your CUDA/VLAR processing have worked their way out of your system, estimates for all SETI work (MB/CPU, MB/CUDA, and AP) should be accurate to within a few %.
I did notice that the rebranded work units immediately went into "running, High Priority" mode. Seems their estimated time was 20+ hours. That estimate is dropping like a stone as it starts crunching them so it will work it out, but they will be finished by the time the TSI has come up (60 mins).
CPU tasks: 470 (470 VLAR/VHAR)
GPU tasks: 15 (0 VLAR/VHAR)
Starting BOINC service
BOINC service started

and I'm trying to get a 50-50 split on the work units. All this is run from a batch file with this command.........
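For illustration only (this is not the actual command from the post above, and the executable name and path are my shorthand), the wrapper batch file can be as simple as something like:

@echo off
rem Hypothetical wrapper - the real command line is not shown in this thread.
rem Assumes the rescheduler is ReSchedule.exe sitting next to this batch file,
rem and that it stops and restarts the BOINC service itself, as the
rem "Starting BOINC service" lines in the output above suggest.
cd /d "%~dp0"
ReSchedule.exe
pause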
OK... thanks for the explanation. If you reschedule 100% to the GPU, doesn't that leave the CPUs cold?
Yes, didn't I tell you you would see an awkward rise in V*AR? V*AR tasks are always forced onto the CPU, no matter what happens, to avoid the VLAR kill; then, as a bonus, it tries to conform to the 50% rule, but unfortunately it cannot always manage that.
Suggestions for future versions: perhaps only VLAR should be forced to the CPU. Allow VHAR to go either way, defaulting to CPU. Maybe the test function could show the task counts by AR group (VLAR/VHAR/other) before and after, with the current settings.
Quote from: samuel7 on 05 Jul 2009, 11:29:36 am
Suggestions for future versions: perhaps only VLAR should be forced to the CPU. Allow VHAR to go either way, defaulting to CPU. Maybe the test function could show the task counts by AR group (VLAR/VHAR/other) before and after, with the current settings.

Internally I know exactly those groups, but if you are using Raistmer's MB_6.08_mod_CUDA_V11_VLARKill_refined.exe, all VHARs will be automatically killed AFAIK (true angle rate > 1.127), so I assume you are running a different app that can handle a VHAR? Or am I wrong here? (I've been using Raistmer's builds for several years now and don't even know what the original apps are any more.)
There's no actual need to rebrand VHAR. They run adequately on the GPU, and they don't cause any screen stuttering or other computer-use problems. Raistmer has just observed a slight efficiency gain (i.e. optimisation) by using the CPU instead of the GPU for VHAR, but it's not enough to lose any sleep over. In any case, much of the efficiency gain is lost on Quads and above when there are lots of VHAR about, because they suffer from memory bus contention.

VLAR are entirely different. Their efficiency on the GPU is abysmal, and the general screen delay makes the rest of the computer unusable - it's driven away many potential crunchers. Anything you can do to keep them off the GPU is a big step forward - but I prefer rebranding to killing.