CPU <-> GPU rebranding
Geek@Play:
Marius.........This morning I found 2 of my host computers had run out of CUDA work. The CPUs had lots of work. I am incommunicado with Berkeley because bandwidth is maxed out at Berkeley. I set the config file as follows.
[Settings]
Position=50
OnlyVLarVHar=0
TrueAngleRate=1
DataPath=D:\BOINC\
BoincBinPath=C:\Program Files\BOINC\
[Automatic]
Automatic=0
Interval=4
CPUPerInterval=50
GPUPerInterval=150
After running ReSchedule I observed several work units were aborted by the AutoKill function of the V11 CUDA app. Is it possible that in the reassignment of work from CPUs to CUDA some VLAR/VHAR work was moved that should not have been?
[edit]
Actually more than several. 20 spread out on 4 computers.
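For anyone wanting to check these settings from a script before running the tool, here is a minimal sketch that reads the config quoted above with Python's standard configparser. The values are copied from the post; the helper names are hypothetical, and treating TrueAngleRate as a 0/1 flag is an assumption based on Marius describing it as a checkbox.

```python
import configparser

# Settings copied verbatim from the post above. How the tool itself
# interprets each key is not confirmed here.
CONFIG_TEXT = r"""
[Settings]
Position=50
OnlyVLarVHar=0
TrueAngleRate=1
DataPath=D:\BOINC\
BoincBinPath=C:\Program Files\BOINC\
[Automatic]
Automatic=0
Interval=4
CPUPerInterval=50
GPUPerInterval=150
"""


def load_settings(text):
    """Parse the INI text; hypothetical helper, not part of the tool."""
    # interpolation=None so characters in Windows paths are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key names in their original case instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(text)
    return parser


def true_angle_rate_enabled(parser):
    """Assumption: TrueAngleRate is a 0/1 flag (it is a checkbox in the GUI)."""
    return parser.getboolean("Settings", "TrueAngleRate")


parser = load_settings(CONFIG_TEXT)
if true_angle_rate_enabled(parser):
    print("TrueAngleRate is on: the alternative VLAR/VHAR detection is in use")
print("GPU units per interval:", parser.getint("Automatic", "GPUPerInterval"))
```

A check like this makes it easy to spot hosts that still have TrueAngleRate=1 before a reschedule.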
Samuel:
--- Quote from: Marius on 05 Jul 2009, 08:49:45 am ---The only thing that is kind of suspicious to me is that you suspend BOINC while running the tool. In the basic configuration BOINC leaves the applications in memory, confronting them with unexpected changes in client_state.xml. My advice is to stop BOINC completely rather than suspend it. But I don't think this will cause the disk-full errors though....
--- End quote ---
But surely your program stops BOINC first thus removing the (suspended) apps from memory. I certainly saw the manager window empty of tasks.
The client's managed to download some new units, I'll try different approaches after Federer and Roddick have finished.
Marius:
--- Quote from: Geek@Play on 05 Jul 2009, 09:59:39 am ---Marius.........This morning I found 2 of my host computers had run out of CUDA work. The CPUs had lots of work. I am incommunicado with Berkeley because bandwidth is maxed out at Berkeley. I set the config file as follows.
[Settings]
...
TrueAngleRate=1
...
After running ReSchedule I observed several work units were aborted by the AutoKill function of the V11 CUDA app. Is it possible that in the reassignment of work from CPUs to CUDA some VLAR/VHAR work was moved that should not have been?
--- End quote ---
Please uncheck the TrueAngleRate checkbox and do another reschedule. I think you will notice an awkward rise in the V*AR units.
With TrueAngleRate checked, the tool uses a different VLAR/VHAR detection method (see the reschedule.txt file). Some of the units it moves could then be aborted by vlarkill. I will remove that detection method in 1.9, as it is causing too many problems with vlarkill.
Greetings,
Marius
Marius:
--- Quote from: samuel7 on 05 Jul 2009, 10:15:16 am ---But surely your program stops BOINC first thus removing the (suspended) apps from memory. I certainly saw the manager window empty of tasks.
--- End quote ---
You are absolutely right, my mistake ;)
MarkJ:
--- Quote from: Richard Haselgrove on 04 Jul 2009, 08:18:51 am ---
--- Quote from: MarkJ on 04 Jul 2009, 07:51:49 am ---
I did notice that the rebranded work units immediately went into "running, High Priority" mode. Seems their estimated time was 20+ hours. That estimate is dropping like a stone as it starts crunching them so it will work it out, but they will be finished by the time the TSI has come up (60 mins).
--- End quote ---
If tasks are estimated at 20 hours, but finishing in less than 1 hour, you have a horribly out-of-kilter Duration Correction Factor. With no VLAR handler in place, I would have expected a sawtooth waveform with maybe a factor of 4x between peak and valley, but 20x? ??? No wonder you were seeing preemptions and EDF on boinc_alpha.
I suggest a sanity-check on the FLOPs estimates in your app_info: once the after-effects of your CUDA/VLAR processing have worked their way out of your system, estimates for all SETI work (MB/CPU, MB/CUDA, and AP) should be accurate within a few %.
--- End quote ---
The i7 I referred to in Boinc_alpha (still has about 10 tasks waiting) says the Seti DCF is 3.18. I did calculate the flops for the app_info but it's always possible that I got them wrong. :o
There are a bunch of 608's that have completed with 4 hour elapsed times, quite likely because they were V*AR and didn't get changed over before starting. These have probably skewed the DCF value.
I only keep a 1 day cache so it's possible to flush it, reset debts/DCF and recalc flops for the app_info. I was kinda hoping the optimized AP 505 would be available, then I could do the app_info and check the other values at the same time.
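The numbers in this exchange can be worked through with a short sketch. It assumes the client's estimate is roughly rsc_fpops_est / flops * DCF, which is my understanding of how BOINC clients of that era combined the app_info flops value with the Duration Correction Factor; the function names are hypothetical. The 20 hour estimate, the under-1-hour runtime and the DCF of 3.18 come from the posts above.

```python
def estimated_runtime_s(rsc_fpops_est, flops, dcf):
    """Assumed estimate formula: work-unit fpops / app flops, scaled by DCF."""
    return rsc_fpops_est / flops * dcf


def residual_overestimate(estimated_h, actual_h, dcf):
    """How far off the estimate still is once the DCF is divided out.

    A result well above 1 suggests the flops value in app_info (or the
    fpops estimate of the work unit) is off, not just the DCF.
    """
    return (estimated_h / actual_h) / dcf


# Figures quoted in the thread: estimated 20 h, finishing in about 1 h, DCF 3.18.
overall = 20.0 / 1.0
residual = residual_overestimate(20.0, 1.0, 3.18)
print(f"overall overestimate: {overall:.1f}x")
print(f"left over after removing DCF: {residual:.2f}x")
```

If the formula assumption holds, a DCF of 3.18 explains only part of a 20x overestimate, which is consistent with Richard's suggestion to sanity-check the FLOPs figures in app_info.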