Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => GPU crunching => Topic started by: Yellow_Horror on 24 Feb 2009, 04:50:31 am

Title: It works!
Post by: Yellow_Horror on 24 Feb 2009, 04:50:31 am
I deleted my cc_config.xml and fed BOINC v6.6.10 the following app_info.xml:

Quote
<app_info>

<app>
    <name>astropulse</name>
</app>
<file_info>
    <name>ap_5.00r103_SSE3.exe</name>
    <executable/>
</file_info>
<app_version>
    <app_name>astropulse</app_name>
    <version_num>500</version_num>
    <file_ref>
        <file_name>ap_5.00r103_SSE3.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>

<app>
    <name>astropulse_v5</name>
</app>
<file_info>
    <name>ap_5.03r112_SSE3.exe</name>
    <executable/>
</file_info>
<app_version>
    <app_name>astropulse_v5</app_name>
    <version_num>503</version_num>
    <file_ref>
        <file_name>ap_5.03r112_SSE3.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>

<app>
    <name>setiathome_enhanced</name>
</app>
<file_info>
    <name>MB_6.08_mod_CUDA_V9.exe</name>
    <executable/>
</file_info>
<file_info>
    <name>cudart.dll</name>
    <executable/>
</file_info>
<file_info>
    <name>cufft.dll</name>
    <executable/>
</file_info>
<file_info>
    <name>libfftw3f-3-1-1a_upx.dll</name>
    <executable/>
</file_info>
<file_info>
    <name>AK_v8_win_SSE3.exe</name>
    <executable/>
</file_info>

<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <file_ref>
        <file_name>AK_v8_win_SSE3.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>

<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>0.040000</avg_ncpus>
    <max_ncpus>0.040000</max_ncpus>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>MB_6.08_mod_CUDA_V9.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>cudart.dll</file_name>
    </file_ref>
    <file_ref>
        <file_name>cufft.dll</file_name>
    </file_ref>
    <file_ref>
        <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    </file_ref>
</app_version>

</app_info>

Now it crunches two Seti MB 6.03 tasks on the CPU and one Seti MB 6.08 task on CUDA. I can even use the "Use GPU while computer is in use" switch! This is the workaround I have needed ever since the CUDA version of Seti first appeared (because I don't like video lag while I work or play games). The only remaining trouble is how to suspend CUDA while I watch DVD video, because BOINC detects no "user activity" then.

Any suggestions about how to fix this issue (and any other suggestions) are welcome.

P.S. I still hope to see a non-laggy CUDA app in the future, even if it is not as fast.
Title: Re: It works!
Post by: Jason G on 24 Feb 2009, 04:58:57 am
Hi Yellow_Horror,
   Can you confirm that, when it downloads new work with the optimised app_info you show, it is no longer constrained to marking everything with the highest version only?  I'll give it a try shortly to see for myself, but this is definite progress.

My suggestion for the lag during video would be to add your media player programs and games to the exclusive app entries in cc_config.xml, though that would of course stop processing on the CPUs too.
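
For anyone who wants to try the exclusive-app route, here is a minimal sketch that builds such a cc_config.xml. It assumes the `<exclusive_app>` entries sit inside the `<options>` section of `<cc_config>`; the executable names are hypothetical placeholders, not taken from this thread.

```python
import xml.etree.ElementTree as ET

def build_cc_config(exclusive_apps):
    """Build a cc_config document with one <exclusive_app> entry per executable."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for exe in exclusive_apps:
        ET.SubElement(options, "exclusive_app").text = exe
    return root

# Hypothetical executable names; replace them with your own player and games.
config = build_cc_config(["my_dvd_player.exe", "my_game.exe"])
print(ET.tostring(config, encoding="unicode"))
```

The printed text would be saved as cc_config.xml in the BOINC data folder and the client told to re-read its config file. As noted above, this pauses the CPU tasks as well as the CUDA one.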

Jason
Title: Re: It works!
Post by: Yellow_Horror on 24 Feb 2009, 05:29:31 am
   Can you confirm that, when it downloads new work with the optimised app_info you show, it is no longer constrained to marking everything with the highest version only?  I'll give it a try shortly to see for myself, but this is definite progress.

Sorry, I can't check it now, because I have a lot of WUs that were downloaded by the stock apps when I first ran the new BOINC. Maybe in a few hours I can answer the question.

Thank you for your suggestion as well, but I never want to suspend the Seti CPU applications. They have always been user-friendly, ever since I joined the project in June 2003.
Title: Re: It works!
Post by: Jason G on 24 Feb 2009, 05:32:18 am
Yeah, tough new growing pains! ... Now I have it set up similar to yours, but with no multibeams in cache, so I'll find out when I can get some new cuda work, exactly what it does.

Jason

[Edit:]
Just checked the Boinc logs, and no mentions of anonymous platform behaviour changes, so my suspicion is the operation will be the same as 6.6.9 and earlier, which had stock separate CPU & GPU work fetches working properly, but when switching to app_info will mark all [newly fetched] work as 6.08  (but I'll wait and see)

The one major change that IS helpful was applied in 6.6.9.  When tasks marked with an older revision run out, it no longer deletes the old app if it is marked with a different plan_class.  That is good, because for my own use it enabled me to manually give my cpu app a new application name in app_info, then modify client_state to allocate chosen tasks to the CPU to work on simultaneously with cuda ones.  Works very well for me, but not for the beginner or faint of heart!  :)
Title: Re: It works!
Post by: Yellow_Horror on 24 Feb 2009, 06:36:30 am
I notice that with my new config the CUDA app sometimes freezes at the beginning of a WU. The CPU time stops at 3 seconds and the GPU is totally idle. Suspending the frozen task doesn't help: no new CUDA task starts. Stopping and restarting BOINC helps, until the next freeze. I didn't have this trouble with the "V8a team" mod on BOINC 6.4.5, but I remember something similar with the early stock CUDA apps (though IIRC suspending a task helped with stock).

What may be the cause?
Title: Re: It works!
Post by: Jason G on 24 Feb 2009, 06:46:25 am
What may be the cause?

Not sure, but I've seen some kind of discussion about DLLs, maybe that's it.  I'm not sure which ones I'm running anymore, but as I updated to v9 recently, I may get the same issue.  Will see.

Jason
Title: Re: It works!
Post by: Yellow_Horror on 24 Feb 2009, 08:18:24 am
When tasks marked with an older revision run out, it no longer deletes the old app if it is marked with a different plan_class.  That is good, because for my own use it enabled me to manually give my cpu app a new application name in app_info, then modify client_state to allocate chosen tasks to the CPU to work on simultaneously with cuda ones.  Works very well for me, but not for the beginner or faint of heart!  :)

It isn't an option for me, because while I like the idea of sharing my free CPU/GPU cycles with SETI science, I really hate the idea of constantly doing the same with my own free time. I have already spent some time playing with CUDA versions and options because it was an interesting innovation, but I am gradually getting tired of it. The SETI CUDA application was not beta-tested enough from the beginning (maybe nVidia pushed a public release before the New Year for advertising reasons?), it is still very user-annoying when running, and the BOINC user options to manage CUDA work are still (after 2 months!) deficient even in development versions... Where is the "Run And Forget" spirit of the good old distributed computing projects like the original SETI@home?
Title: Re: It works!
Post by: Jason G on 24 Feb 2009, 08:24:16 am
Oh all agreed! Having to mess with workarounds and settings and special configs is annoying and error prone.  Hopefully the gradual movement in Boinc code will fix the scheduling issues so that anonymous platform will work 'out of the box'. There seem to be downsides to any particular workaround approach at the moment, which I hope will be resolved.  We shall see.

Jason
Title: Re: It works!
Post by: Jason G on 24 Feb 2009, 08:31:33 am
Any Ideas what's stopping this from asking for Cuda work ?

Quote
2/24/2009 11:55:25 PM      [wfd] ------- start work fetch state -------
2/24/2009 11:55:25 PM      [wfd] CPU: shortfall 0.00 nidle 0.00 est. delay 376335.23 RS fetchable 100.00 runnable 100.00
2/24/2009 11:55:25 PM   SETI@home   [wfd] CPU: runshare 1.00 debt 0.00 backoff dt 0.00 int 0.00
2/24/2009 11:55:25 PM      [wfd] CUDA: shortfall 371520.00 nidle 1.00 est. delay 0.00 RS fetchable 0.00 runnable 0.00
2/24/2009 11:55:25 PM   SETI@home   [wfd] CUDA: runshare 0.00 debt 0.00 backoff dt 81174.41 int 86400.00
2/24/2009 11:55:25 PM   SETI@home   [wfd] overall_debt 0
2/24/2009 11:55:25 PM      [wfd] ------- end work fetch state -------
2/24/2009 11:55:25 PM      No project chosen for work fetch

Both 6.6.9 & 6.6.10 seem to be doing this for me, and I don't quite know what it means.
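
One possible reading of the log above, offered as an assumption rather than a confirmed answer: in the per-project `[wfd]` lines, `backoff dt` looks like the number of seconds left before the client will ask that project for that resource again, and `int` like the current backoff interval. If that is right, the CUDA line says the client is waiting out a 24 hour backoff (86400 seconds) before requesting CUDA work again. A small sketch that pulls those two numbers out of such a line and converts them to hours (the function name and regex are mine, based only on the line format shown in the quote):

```python
import re

# Matches the per-project [wfd] line for one resource, e.g.
# "... [wfd] CUDA: runshare 0.00 debt 0.00 backoff dt 81174.41 int 86400.00"
WFD_BACKOFF = re.compile(
    r"\[wfd\]\s+(?P<resource>\w+):.*?backoff dt (?P<dt>[\d.]+) int (?P<interval>[\d.]+)"
)

def parse_backoff(line):
    """Return (resource, hours_remaining, interval_hours), or None if the
    line carries no backoff fields. Assumes both values are in seconds."""
    m = WFD_BACKOFF.search(line)
    if m is None:
        return None
    return (
        m.group("resource"),
        float(m.group("dt")) / 3600.0,
        float(m.group("interval")) / 3600.0,
    )

line = ("2/24/2009 11:55:25 PM   SETI@home   [wfd] CUDA: runshare 0.00 "
        "debt 0.00 backoff dt 81174.41 int 86400.00")
resource, remaining_h, interval_h = parse_backoff(line)
print(f"{resource}: {remaining_h:.1f} h left of a {interval_h:.0f} h backoff")
```

The CPU line in the same log shows `backoff dt 0.00 int 0.00`, which under the same assumption would mean no backoff for CPU work.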
Title: Re: It works!
Post by: Yellow_Horror on 24 Feb 2009, 10:18:33 am
it no longer deletes the old app, if it is marked with a different plan_class

Are there any clues on how to define a plan_class for the CPU version of the Seti MB app?

P.S. I have switched to the V7 CUDA app to see whether the CUDA freezes persist (to work out whether the new BOINC or the new app is at fault).
Title: Re: It works!
Post by: Yellow_Horror on 24 Feb 2009, 03:27:17 pm
No freezes with V7 so far. I think the V9 app from the multi-GPU package is the one to blame.
Title: Re: It works!
Post by: Raistmer on 24 Feb 2009, 03:31:35 pm
No freezes with V7 so far. I think the V9 app from the multi-GPU package is the one to blame.
Replace the DLLs with older versions.
Title: Re: It works!
Post by: Yellow_Horror on 25 Feb 2009, 10:50:33 am
Replace the DLLs with older versions.
I got a freeze with the same symptoms using MB_6.08_mod_CUDA_V9.exe with the old DLLs.
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 11:07:59 am
Yeah.  I've confirmed this behaviour now on my machine also (v9 with old DLLs).  At about 6.4% it decided to spontaneously pause & ran for an hour at full GPU use (by temperature) with no progress, marked as 'Waiting to Run' (normally finishes in <30 mins or so).  Increasing ncpus by 1 (& rereading the config file) started up another astropulse instead  :-\  so I reset ncpus, restarted boinc and it resumed normally, but similarly got stuck on the next task.   No obvious complaints in stderr, and angle ranges were both ~0.44.

I've switched back to v7vlarkill to check everything else is OK, and all is running normally (2xAPs + 1xCuda, with Maik's script monitoring the show, modified to restart boinc in case of a stuck WU instead of terminating the process).

Important note: I am using a development Boinc 6.6.9 at this time (after getting the same response from 6.6.10), on which I had 'work_fetch_debug' turned on, and I believe some mechanism in these new versions may be causing the waiting.  I get the impression it may be some 'twiddling' of the scheduling operations going on behind the scenes, interacting with the app while trying to get things 'right' for the normal user, but I don't know for sure, partly as I don't know what some of the more detailed log messages mean.

Jason
Title: Re: It works!
Post by: Raistmer on 25 Feb 2009, 11:43:57 am
Yeah.  I've confirmed this behaviour now on my machine also (v9 with old DLLs).  At about 6.4% it decided to spontaneously pause & ran for an hour at full GPU use (by temperature) with no progress, marked as 'Waiting to Run' (normally finishes in <30 mins or so).  Increasing ncpus by 1 (& rereading the config file) started up another astropulse instead  :-\  so I reset ncpus, restarted boinc and it resumed normally, but similarly got stuck on the next task.   No obvious complaints in stderr, and angle ranges were both ~0.44.


"Waiting to run" is BOINC's label. The app refused to wait and continued crunching. Pity that you didn't look at state.sah for that task to see whether it progressed or not...
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 11:48:04 am
Well I waited an hour, and it never finished. I can certainly plug the app in again and watch state.sah when it freezes.  It seems to be a regular occurrence, at least with this particular app and Boinc combination (and I do suspect pretty strongly that Boinc's scheduling behaviour is to blame).

Shouldn't take long.

Title: Re: It works!
Post by: Raistmer on 25 Feb 2009, 11:55:49 am
I need to know whether the "refuse to die" mod works or not. If it really doesn't work, I will remove it from CUDA MB.

So I need the following info on this topic:

1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
2) What is the access state of the GPU_lock file (is it accessible or not)?
3) Is there any progress inside state.sah of the task in the "waiting" state?
4) How many CUDA MB processes are running in the system at this time?
5) Temperature of the GPU (busy or idle).

Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 12:04:20 pm
I can answer most of those now, will update remaining ones at next lockup

....
So I need the following info on this topic:

1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
    2 x AP 5.03 on CPU cores
2) What is the access state of the GPU_lock file (is it accessible or not)?
     ...
3) Is there any progress inside state.sah of the task in the "waiting" state?
     ...
4) How many CUDA MB processes are running in the system at this time?
     1, on the only GPU, 'Waiting to Run'
5) Temperature of the GPU (busy or idle).
     Busy, full load temp ~75C (not overheating for this GPU either, no screen jerkiness or other apparent issue, can still run ATI Tool with no artifacts)
Title: Re: It works!
Post by: Raistmer on 25 Feb 2009, 12:19:10 pm

4) How many CUDA MB processes running in system at this time
     1 on only GPU, 'Waiting to Run'

I meant OS processes, not BOINC ones.

And it seems you don't use the ncpus field? Why only 2 AP tasks on a dual core + GPU?
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 12:21:14 pm
Read my original post again.  I tried both ways.  If you want me to do this with ncpus=3 .. will do.
Title: Re: It works!
Post by: Geek@Play on 25 Feb 2009, 12:43:12 pm
1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
   At this time I would have 4 AstroPulse work units running on 4 CPU cores.

2) What is the access state of the GPU_lock file (is it accessible or not)?
   I do not have this file, or do not know where it is.  I only have "boinc_lockfile", located in the slots folders.

3) Is there any progress inside state.sah of the task in the "waiting" state?
   Included here are 2 state files captured several minutes apart.

4) How many CUDA MB processes are running in the system at this time?
   Only the one (1) that is showing "waiting to run" in Boinc Manager.

5) Temperature of the GPU (busy or idle).
   65C busy, 56C idle.

[attachment deleted by admin]
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 12:54:25 pm
Okay,

 switched to ncpus = 3, and whammo! .. stuck at 1.712% ... (I reckon some kind of oversubscription mechanism is at play here)
[Filling in answers as I gather the data]
[Edit:] Now while collecting info, this one jumped from 1.712% to 100%, looks completed
- AR is 0.443445
- Claimed credit says ~1.94 credit  ???
- wall time in stderr ~18 mins 20 secs (~typical full length run)
http://setiathome.berkeley.edu/result.php?resultid=1172313302

Quote
1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
  3 x AP 5.03 (1 per CPU core + 1)
2) What is the access state of the GPU_lock file (is it accessible or not)?
  Opens as an empty file in notepad without complaint.
3) Is there any progress inside state.sah of the task in the "waiting" state?
  Not visibly, though the timestamp was incrementing every checkpoint period etc.
4) How many CUDA MB processes are running in the system at this time?
  Only the 1 'Waiting to run'
5) Temperature of the GPU (busy or idle).
  Busy, 75C (normal); if I run ATITool at the same time, no artefacts.
Title: Re: It works!
Post by: Raistmer on 25 Feb 2009, 01:14:01 pm
1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
   At this time I would have 4 AstroPulse work units running on 4 CPU cores.

2) What is the access state of the GPU_lock file (is it accessible or not)?
   I do not have this file, or do not know where it is.  I only have "boinc_lockfile", located in the slots folders.

3) Is there any progress inside state.sah of the task in the "waiting" state?
   Included here are 2 state files captured several minutes apart.

4) How many CUDA MB processes are running in the system at this time?
   Only the one (1) that is showing "waiting to run" in Boinc Manager.

5) Temperature of the GPU (busy or idle).
   65C busy, 56C idle.

state.sah1
<prog>0.23420485</prog>

state.sah2
<prog>0.29913810</prog>

That is, the task in that slot continues to make progress. It all seems to work as intended.
Now it would be interesting to watch until the <prog> value reaches 1, to see what BOINC does in that case.
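
The comparison above can be repeated for any pair of snapshots with a few lines of script. This is only a sketch: it assumes each state.sah copy contains a single `<prog>` element holding a fraction between 0 and 1, as in the two values shown, and the function names are made up for illustration.

```python
import re

PROG = re.compile(r"<prog>([0-9.]+)</prog>")

def read_prog(state_text):
    """Extract the <prog> fraction from the text of a state.sah snapshot."""
    m = PROG.search(state_text)
    if m is None:
        raise ValueError("no <prog> element found")
    return float(m.group(1))

def progress_between(earlier_text, later_text):
    """Difference in <prog> between two snapshots of the same task."""
    return read_prog(later_text) - read_prog(earlier_text)

# The two values quoted above, standing in for the full file contents.
snap1 = "<prog>0.23420485</prog>"
snap2 = "<prog>0.29913810</prog>"
delta = progress_between(snap1, snap2)
print(f"progress advanced by {delta * 100:.2f} percentage points")
```

A positive difference means the app is still crunching even though BOINC Manager shows the task as "waiting to run"; a zero difference over several checkpoint periods would point to a real stall.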
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 01:19:56 pm
Okay, so why the massive 1.94 credit claim on my full length no progress run? (See updated run info)

It seems there are no function calls to accumulate the flops while it is running (compute_fraction_done or whatever the Cuda app uses). [Edit: It will be interesting to see what the wingman claims.]
Title: Re: It works!
Post by: Geek@Play on 25 Feb 2009, 01:24:14 pm
Ageless posted here............

http://setiathome.berkeley.edu/forum_thread.php?id=52090&nowrap=true#869394

This may be a Boinc problem?  If so my apologies to Raistmer.
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 01:29:09 pm
That was the first suspicion I mentioned (Boinc cpu scheduling foulups), but after experiencing it I'm not so sure.  It may simply not be receiving the updated flops counts for whatever reason.  Hopefully it is their problem not ours  ;)
Title: Re: It works!
Post by: Raistmer on 25 Feb 2009, 01:33:51 pm
Okay, so why the massive 1.94 credit claim on my full length no progress run? (See updated run info)

It seems there are no function calls to accumulate the flops while it is running (compute_fraction_done or whatever the Cuda app uses). [Edit: It will be interesting to see what the wingman claims.]
LoL  You know, I never cared about credits much ;D
But it's interesting indeed why it can't accumulate flops in this case... very interesting.
But apart from the credit question, everything just works as intended! BOINC tried to leave your GPU IDLE (it started only CPU apps), but the CUDA app resisted and kept using the GPU at least until it finished its work. All possibilities to keep the GPU busy were used!
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 01:44:03 pm
LoL... Yeah, something's busted though for this weird behaviour.  Boinc or app doesn't matter.

Running 3 AP's has my system rather oversubscribed it seems, so throughput was quite a bit better running 2 APs rather than 3.

Will be reverting to my 'classic' setup, and if that doesn't work, will consider pawning my PC for beer money  :P  [Damn it's working no beer for me  ;) ]

Jason
Title: Re: It works!
Post by: Richard Haselgrove on 25 Feb 2009, 02:00:51 pm

[Damn it's working no beer for me  ;) ]

Jason


I'm sure we can sort something out if you join Matt on his tour of Lancashire, Yorkshire, Nottinghamshire, Cambridgeshire.... :P
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 02:09:12 pm
I'm sure we can sort something out if you join Matt on his tour of Lancashire, Yorkshire, Nottinghamshire, Cambridgeshire.... :P

If that's the tour where we all travel through townships cross country in a yellow combi-van wearing poncho's and funny hats, stopping at every pub along the way, Count Me In!  ;D
Title: Re: It works!
Post by: Yellow_Horror on 25 Feb 2009, 02:26:58 pm
I need to know whether the "refuse to die" mod works or not. If it really doesn't work, I will remove it from CUDA MB.

So I need the following info on this topic:

1) When a task processed by CUDA MB enters "waiting to run" mode, which tasks are in "running" mode?
2) What is the access state of the GPU_lock file (is it accessible or not)?
3) Is there any progress inside state.sah of the task in the "waiting" state?
4) How many CUDA MB processes are running in the system at this time?
5) Temperature of the GPU (busy or idle).

There seem to be two different types of "wrong" behaviour on my system.

The first is that the app freezes at 3 seconds of CPU time with progress at 0%. The GPU is idle (as shown by the "GPU Usage" and "GPU Temp" parameters in nVidia System Monitor). Suspending and resuming the task doesn't help (no new task is started, and the frozen task makes no progress after resume). Stopping and restarting BOINC helps: the previously frozen task runs normally and crunching continues... until the next freeze. Sorry, I have no info about gpu_file_lock and state.sah at this moment. I will look at the next freeze.

The second variant is that a task suddenly gets the "waiting to run" state in BOINC Manager. The GPU seems to be working, gpu_file_lock is locked, and state.sah is updated regularly. Now I am waiting to see whether such a task finishes in time.
Title: Re: It works!
Post by: Yellow_Horror on 25 Feb 2009, 02:54:54 pm
The "Waiting to run" task has finished (suddenly, from BOINC's point of view) and been reported.
http://setiathome.berkeley.edu/workunit.php?wuid=404255137
The claimed credit seems to be low :(
Title: Re: It works!
Post by: Jason G on 25 Feb 2009, 03:23:56 pm
...
The claimed credit seems to be low :(

Better than my lousy 1.94 credits, LoL ;D
Title: Re: It works!
Post by: Yellow_Horror on 25 Feb 2009, 04:10:09 pm
The claimed credit seems to be low :(

Better than my lousy 1.94 credits, LoL ;D

My task stuck at about 30%, yours at 1.712%. Probably that makes sense.
Title: Re: It works!
Post by: Yellow_Horror on 26 Feb 2009, 06:28:23 am
I notice another sign of strange behaviour:

Quote
26/02/2009 16:13:14   SETI@home   [task_debug] result 16ja09ab.7620.1708.12.8.217_2 checkpointed
26/02/2009 16:13:16   SETI@home   [cpu_sched] Preempting 16ja09ab.7620.1708.12.8.217_2 (removed from memory)
26/02/2009 16:13:16   SETI@home   [task_debug] task_state=QUIT_PENDING for 16ja09ab.7620.1708.12.8.217_2 from preempt
26/02/2009 16:13:17   SETI@home   [task_debug] Process for 16ja09ab.7620.1708.12.8.217_2 exited
26/02/2009 16:13:17   SETI@home   [task_debug] task_state=UNINITIALIZED for 16ja09ab.7620.1708.12.8.217_2 from handle_premature_exit
26/02/2009 16:13:17      [coproc_debug] freeing 1 of coproc CUDA
26/02/2009 16:13:17      [work_fetch_debug] Request work fetch: application exited
26/02/2009 16:13:17   SETI@home   [cpu_sched] Starting 16ja09ab.7620.1708.12.8.217_2(resume)
26/02/2009 16:13:17      [coproc_debug] reserving 1 of coproc CUDA
26/02/2009 16:13:17   SETI@home   [task_debug] task_state=EXECUTING for 16ja09ab.7620.1708.12.8.217_2 from start
26/02/2009 16:13:17   SETI@home   Restarting task 16ja09ab.7620.1708.12.8.217_2 using setiathome_enhanced version 608


This repeats every few minutes, dropping overall performance.

I use BOINC 6.6.10. Is such behaviour normal or not? Should I report it to the developers as a bug?
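
To put a number on "every few minutes" before reporting it, a saved copy of the message log can be scanned for preemptions per task. The sketch below relies only on the `[cpu_sched] Preempting <task>` line format visible in the quote above; the function name is made up, and reading the log from a file is left out.

```python
import re
from collections import Counter

PREEMPT = re.compile(r"\[cpu_sched\] Preempting (\S+)")

def count_preemptions(log_lines):
    """Count '[cpu_sched] Preempting <task>' messages per task name."""
    counts = Counter()
    for line in log_lines:
        m = PREEMPT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Three lines copied from the log excerpt above.
sample = [
    "26/02/2009 16:13:14   SETI@home   [task_debug] result 16ja09ab.7620.1708.12.8.217_2 checkpointed",
    "26/02/2009 16:13:16   SETI@home   [cpu_sched] Preempting 16ja09ab.7620.1708.12.8.217_2 (removed from memory)",
    "26/02/2009 16:13:17   SETI@home   [cpu_sched] Starting 16ja09ab.7620.1708.12.8.217_2(resume)",
]
for task, n in count_preemptions(sample).items():
    print(f"{task}: preempted {n} time(s)")
```

Run over a few hours of log, a task that shows many preemptions while being the only CUDA task on a single-project host would be good evidence to attach to a bug report.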
Title: Re: It works!
Post by: Richard Haselgrove on 26 Feb 2009, 06:57:16 am
It probably depends what other projects you have the host attached to, and what resource share SETI has been given.

BOINC should only 'preempt' a SETI task (second line of log) if processing time is owed to another project, and it wants to start (or re-start) a task for that other project. Was any other project invoked, below the end of the log you've posted?

I haven't risked testing any of the v6.6.x line yet, but I see this quite regularly with v6.4.5 - particularly since I also run Astropulse (sometimes intensively testing AP optimisations), and AP counts towards SETI resource share. Although I haven't looked at task debug, what I've seen is entirely compatible with:

Another project needs a turn --> preempt CUDA --> start Einstein (or whatever) --> oops, CUDA is idle --> start CUDA: I end up with 5 running CPU tasks on a 4-core CPU.

Yes, I think this should be reported to the developers, but I've been holding off while they've been wrestling with the comprehensive re-write of all the work-fetch issues. But I've seen signs that they might be about to make the v6.6 range 'recommended' - should we push them into reviewing task switching first, or wait for v6.8?

One simple but, I suspect, possibly overlooked point: if your resource share is less than your CUDA hardware proportion, BOINC is always going to have a work allocation dilemma. For a quad plus one GPU, resource share needs to be at least 20%: duo plus 2 GPU, 50%: and so on.
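
The two figures in that last paragraph follow from simple arithmetic if each CPU core and each GPU is counted as one equal processing unit. That equal weighting is my reading of the rule of thumb, not something confirmed from the BOINC scheduler code:

```python
def min_resource_share(cpu_cores, gpus):
    """Smallest resource share (in percent) that covers every GPU, counting
    each CPU core and each GPU as one equal processing unit."""
    return 100.0 * gpus / (cpu_cores + gpus)

print(min_resource_share(4, 1))  # quad core plus one GPU
print(min_resource_share(2, 2))  # dual core plus two GPUs
```

This gives 20% for the quad with one GPU and 50% for the duo with two GPUs, matching the numbers quoted above.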
Title: Re: It works!
Post by: Yellow_Horror on 26 Feb 2009, 10:51:49 am
It probably depends what other projects you have the host attached to, and what resource share SETI has been given.

BOINC should only 'preempt' a SETI task (second line of log) if processing time is owed to another project, and it wants to start (or re-start) a task for that other project. Was any other project invoked, below the end of the log you've posted?

There are no other projects on this host.
Title: Re: It works!
Post by: Richard Haselgrove on 26 Feb 2009, 11:59:55 am
It probably depends what other projects you have the host attached to, and what resource share SETI has been given.

BOINC should only 'preempt' a SETI task (second line of log) if processing time is owed to another project, and it wants to start (or re-start) a task for that other project. Was any other project invoked, below the end of the log you've posted?

There are no other projects on this host.

The only other reason I can think of for a pre-empt is to start a task in 'High Priority' because it might be in deadline trouble. I have a box at the moment which is alternating: it does a VLAR task, which drives up TDCF - runs 3 or 4 shorties in EDF, which drives it back down - goes back for another VLAR - panics and does a couple of shorties - etc. etc. Anything like that on your system?

If it's preempting with no discernable reason, that should certainly be reported to the developers, with extended logs (four or five full cycles) attached.
Title: Re: It works!
Post by: Yellow_Horror on 26 Feb 2009, 12:32:50 pm
I don't know whether it is related to the previous problem or not, but I noticed one more strange thing: this morning my BOINC downloaded some new WUs that have typical Seti MB names, but these tasks have an empty "application" field in the BOINC "Tasks" panel. If I suspend all other tasks, one of these "no-app" tasks starts to be crunched by the V9 CUDA application, but its "application" field remains empty.

Maybe I am missing something critical in my app_info.xml (it has remained unchanged since I started this topic)?
Title: Re: It works!
Post by: Leopoldo on 26 Feb 2009, 12:58:23 pm
But these tasks have empty "application" field in the BOINC "Tasks" panel.

Each downloaded workunit is registered in the file client_state.xml in the BOINC data folder. The record there contains info about the application name and version.
Title: Re: It works!
Post by: Yellow_Horror on 26 Feb 2009, 01:38:53 pm
Here is the part of client_state.xml corresponding to one of the "app-empty" tasks.
Nothing weird as far as I can see:
Quote

<workunit>
    <name>11ja09af.31819.74374.10.8.178</name>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <rsc_fpops_est>78918495483347.094000</rsc_fpops_est>
    <rsc_fpops_bound>789184954833471.000000</rsc_fpops_bound>
    <rsc_memory_bound>33554432.000000</rsc_memory_bound>
    <rsc_disk_bound>33554432.000000</rsc_disk_bound>
    <file_ref>
        <file_name>11ja09af.31819.74374.10.8.178</file_name>
        <open_name>work_unit.sah</open_name>
    </file_ref>
</workunit>
Title: Re: It works!
Post by: Jason G on 26 Feb 2009, 01:46:14 pm
Probably cosmetics related to the juggling in process to get the app scheduling 'right' .  I think they might have quite some playing to do to get the mechanism to work with app_info file the way we'd all like to see, so I'm expecting some related things to break and change, then be fixed again afterwards.
Title: Re: It works!
Post by: Yellow_Horror on 26 Feb 2009, 03:54:36 pm
As far as I can see on the Seti@Home message board, randomly stopping and resuming a task is a bug in 6.6.10. I will see whether it is fixed in 6.6.11... or downgrade to 6.6.9.

Can someone tell me the exact address of the BOINC "wishlist"? I would like an option to suspend CUDA while a fullscreen application (other than the screensaver, of course) is running. IMHO, it is the simplest way to solve the "laggy DVD player" issue.
Title: Re: It works!
Post by: Richard Haselgrove on 26 Feb 2009, 04:20:29 pm
The official route is to make out a trac ticket.

You could add a comment to reinforce http://boinc.berkeley.edu/trac/ticket/842, which has just been postponed from 'v6.6' to 'Undetermined' (i.e. 'far future')
Title: Re: It works!
Post by: Richard Haselgrove on 26 Feb 2009, 06:02:46 pm
Sorry, that seems to have backfired. David Anderson has replied, following Yellow_Horror's addition:

Quote from: David Anderson
resolution set to wontfix.

Currently, there's a preference to not do GPU computing while the computer is in use, and a config option to not do any computing while particular applications are running. That's about it for now; I don't know of a way to find out if another graphics-intensive app is running.
Title: Re: It works!
Post by: Yellow_Horror on 10 Mar 2009, 05:16:23 am
BOINC v6.6.12 just downloaded some "setiathome_enhanced 6.03" tasks to feed my CPU. It fell into "high priority" mode afterwards, so I can't check now whether it can correctly download 6.08 tasks to feed the GPU (there are some "application-empty GPU WUs" left to do from the previous version of BOINC).
Title: Re: It works!
Post by: Claggy on 10 Mar 2009, 06:17:54 am
If you put <sched_ops>1</sched_ops> in a cc_config.xml file you can see what type of work request you get, either CPU or GPU.
With Boinc 6.6.12, CPU work requests do work, but GPU work requests didn't: you just got a 24 hour backoff. But Seti Main now has a new scheduler,
so GPU requests might work now. On Boinc 6.6.14, if the CPU tasks are in "high priority", it'll still do a GPU request to get Cuda work; 6.6.12 might be the same.

Claggy
Title: Re: It works!
Post by: Yellow_Horror on 10 Mar 2009, 06:59:40 am
The "high priority" state has cleared for now, so I experimented with increasing my "additional work buffer" step by step. At first, my BOINC got a few 6.03 WUs with each increase, just as I expected. But at the last step (to 8.8 days) it suddenly got twenty 6.03 WUs and then one additional 6.08 (CUDA) WU. That outcome seems strange to me. Let's see how it continues.

P.S. It seems that 6.6.12 gets GPU work when asking for GPU work only. When asking for CPU+GPU work, it gets CPU work only. I will try 6.6.14 later today.