VBscript Fights Cuda

Jason G:
I finished AP testing and loaded up the 6.08 CUDA app with the VLAR Kill mod, but I have still been getting frozen tasks, so I've just now enabled the script again.

Since they fixed a lot of things, and the app itself will be terminating the sluggish VLARs anyway, one thing that seems to work for me is restarting BOINC. Is it possible to modify the script so that it restarts BOINC when it detects a stuck task (with perhaps a few seconds between stopping and starting the service... for those of us with a service install anyway  ;) ) rather than terminating the task?

Anyway, seeing how it will go.  Pretty good if I can stop having to manually check the machine every few minutes  ;).
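
For anyone who wants to try the restart idea outside the script, here is a minimal sketch in Python (it is not part of Maik's VBScript). It stops a Windows service, waits a few seconds, then starts it again using the standard `net stop` / `net start` commands. The service name "BOINC" and the 5 second pause are assumptions for illustration; check the actual service name on your own install, and note that it needs to run from an account that is allowed to control services.

```python
import subprocess
import time


def restart_service(name="BOINC", pause_seconds=5,
                    run=subprocess.run, sleep=time.sleep):
    """Stop a Windows service, pause, then start it again.

    `name` is an assumed placeholder; look up the real service name first.
    `run` and `sleep` can be swapped out, which makes a dry run possible.
    Returns the exit codes of the stop and start commands.
    """
    stop = run(["net", "stop", name])
    sleep(pause_seconds)  # give the service a moment to shut down cleanly
    start = run(["net", "start", name])
    return stop.returncode, start.returncode
```

Calling `restart_service()` from an elevated prompt would be the replacement for the "terminate the task" step; a non-zero exit code from either command suggests the stop or start did not go through.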

Jason G:
I've had one task stick so far that the script seemed unable to terminate.

Relevant log fragment (sorry for the long post  :( ) [Edit: It has not happened again since this one time. A fluke?]


--- Quote ---1/26/2009 1:25:48 PM > -watching new task: 16dc08ad.22822.1708.5.8.113_1
1/26/2009 1:25:48 PM > -true angle range: 0.43389603140378
1/26/2009 1:25:48 PM >
1/26/2009 1:25:48 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (1. time)
1/26/2009 1:26:34 PM > ID: 2260 0%, running: 46s
1/26/2009 1:26:34 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (2. time)
1/26/2009 1:27:19 PM > ID: 2260 0%, running: 91s
1/26/2009 1:27:19 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (3. time)
1/26/2009 1:28:05 PM > ID: 2260 0%, running: 137s
1/26/2009 1:28:05 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (4. time)
1/26/2009 1:28:05 PM >
1/26/2009 1:28:05 PM > ! terminating process MB_6.08_mod_VLAR_kill_CUDA.exe
1/26/2009 1:28:05 PM > ! - crashed WU-file: 16dc08ad.22822.1708.5.8.113_1
1/26/2009 1:28:05 PM >  --- > RunningTime: 137 --- MinRunTime: 120
1/26/2009 1:28:05 PM >
1/26/2009 1:28:05 PM > trying to copy wu
1/26/2009 1:28:05 PM > File: 16dc08ad.22822.1708.5.8.113.wu copied ...
1/26/2009 1:28:05 PM > --> trying copy of stderr ...
1/26/2009 1:28:05 PM > --> WU_copy_log.txt found
1/26/2009 1:28:05 PM > --> client_state.xml found
1/26/2009 1:28:50 PM > ID: 2260 0%, running: 182s
1/26/2009 1:28:50 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (1. time)
1/26/2009 1:29:36 PM > ID: 2260 0%, running: 228s
1/26/2009 1:29:36 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (2. time)
1/26/2009 1:30:21 PM > ID: 2260 0%, running: 273s
1/26/2009 1:30:21 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (3. time)
1/26/2009 1:31:07 PM > ID: 2260 0%, running: 319s
1/26/2009 1:31:07 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (4. time)
1/26/2009 1:31:07 PM >
1/26/2009 1:31:07 PM > ! terminating process MB_6.08_mod_VLAR_kill_CUDA.exe
1/26/2009 1:31:07 PM > ! - crashed WU-file: 16dc08ad.22822.1708.5.8.113_1
1/26/2009 1:31:07 PM >  --- > RunningTime: 319 --- MinRunTime: 120
1/26/2009 1:31:07 PM >
1/26/2009 1:31:07 PM > trying to copy wu
1/26/2009 1:31:07 PM > 16dc08ad.22822.1708.5.8.113.wu already exists, no copy needed
1/26/2009 1:31:07 PM > --> trying copy of stderr ...
1/26/2009 1:31:07 PM > --> WU_copy_log.txt found
1/26/2009 1:31:07 PM > --> client_state.xml found
1/26/2009 1:31:52 PM > ID: 2260 0%, running: 364s
1/26/2009 1:31:52 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (1. time)
1/26/2009 1:32:38 PM > ID: 2260 0%, running: 410s
1/26/2009 1:32:38 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (2. time)
1/26/2009 1:33:23 PM > ID: 2260 0%, running: 455s
1/26/2009 1:33:23 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (3. time)
1/26/2009 1:34:09 PM > ID: 2260 0%, running: 501s
1/26/2009 1:34:09 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (4. time)
1/26/2009 1:34:09 PM >
1/26/2009 1:34:09 PM > ! terminating process MB_6.08_mod_VLAR_kill_CUDA.exe
1/26/2009 1:34:09 PM > ! - crashed WU-file: 16dc08ad.22822.1708.5.8.113_1
1/26/2009 1:34:09 PM >  --- > RunningTime: 501 --- MinRunTime: 120
1/26/2009 1:34:09 PM >
1/26/2009 1:34:09 PM > trying to copy wu
1/26/2009 1:34:09 PM > 16dc08ad.22822.1708.5.8.113.wu already exists, no copy needed

--- End quote ---

Then I manually aborted the task (actually killed it in Task Manager), though a restart of the BOINC service would more than likely have allowed it to process:


--- Quote ---1/26/2009 1:34:09 PM > --> trying copy of stderr ...
1/26/2009 1:34:09 PM > --> WU_copy_log.txt found
1/26/2009 1:34:09 PM > --> client_state.xml found
1/26/2009 1:34:54 PM > ID: 2260 0%, running: 546s
1/26/2009 1:34:54 PM > MB_6.08_mod_VLAR_kill_CUDA.exe at 0% (1. time)
1/26/2009 1:35:40 PM >
1/26/2009 1:35:40 PM > -watching new task: 16dc08ad.22822.1708.5.8.74_1
1/26/2009 1:35:40 PM > -true angle range: 0.43389603140378
1/26/2009 1:35:40 PM >
1/26/2009 1:35:40 PM > resetting counter
1/26/2009 1:35:40 PM > -increasing BreakPerCycle to 43 sec
1/26/2009 1:35:40 PM > -increasing CpuReadTime to 4 sec
1/26/2009 1:36:27 PM > ID: 1864 11%, running: 47s
--- End quote ---
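
Reading the log above, the detection rule appears to be: count consecutive checks where the task sits at 0%, and act on the fourth one if the running time has passed MinRunTime (120 s). The following Python sketch models that rule only as inferred from the log lines; the class and parameter names are made up for illustration and the real script may well work differently.

```python
class StuckWatcher:
    """Flags a task as stuck after several consecutive 0% polls.

    The thresholds mirror what the log shows (4 polls, MinRunTime 120 s);
    they are assumptions, not values read from the script itself.
    """

    def __init__(self, max_zero_polls=4, min_run_time=120):
        self.max_zero_polls = max_zero_polls
        self.min_run_time = min_run_time
        self.zero_polls = 0
        self.pid = None

    def poll(self, pid, progress, running_seconds):
        """Return True when the watched process looks stuck."""
        if pid != self.pid:
            # A different process ID means a new task: start counting afresh.
            self.pid = pid
            self.zero_polls = 0
        if progress > 0:
            self.zero_polls = 0
            return False
        self.zero_polls += 1
        if (self.zero_polls >= self.max_zero_polls
                and running_seconds >= self.min_run_time):
            self.zero_polls = 0  # counting restarts after each attempt
            return True
        return False
```

Fed with the running times from the log (46 s, 91 s, 137 s for process 2260), the fourth 0% poll is the one that triggers, which matches the "! terminating process" line at 137 s.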

Maik:
@Jason: I can't reproduce the error, sorry.
I can see that the script wasn't terminating the task, because the process ID did not change.
It's the same error that RandyC reported.

@RandyC: What OS are you using? The 'VB-Script terminate command' does not work on all OSes.
And no, you do not need -sam to make the script terminate idling task processes.

Maybe both of you could attach a full (not snipped) log.txt next time?
I didn't really expect the script to be needed for such a long time, so I did not add any routine to print debug information with the internal variable settings ^^
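
The "process ID did not change" check can also be run over a saved log.txt. Below is a small Python sketch that looks for a "! terminating process" line followed by an "ID:" line showing the same process ID as before, which is the pattern in Jason's fragment (ID 2260 before and after each terminate). The line formats are taken from the quoted log; the function name is hypothetical. One caveat: Windows can reuse process IDs, so a match is a strong hint rather than proof.

```python
import re

ID_RE = re.compile(r"> ID: (\d+) ")
TERMINATE_MARK = "! terminating process"


def failed_terminations(lines):
    """Return the process IDs that are still reported after a terminate attempt."""
    last_pid = None   # most recent process ID seen in an "ID:" line
    pending = None    # process ID that was being watched when terminate was logged
    failed = []
    for line in lines:
        if TERMINATE_MARK in line:
            pending = last_pid
            continue
        match = ID_RE.search(line)
        if match:
            pid = int(match.group(1))
            if pending is not None and pid == pending:
                failed.append(pid)  # same ID is back: the kill did not take
            pending = None
            last_pid = pid
    return failed
```

Running it over a full log, for example `failed_terminations(open("log.txt"))`, gives one entry per terminate attempt that left the same process running.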

Jason G:
Sure, will attach a full log if it happens again. I have run out of MB work since then, and it hasn't happened again.

randyconk:

--- Quote from: Maik on 27 Jan 2009, 12:32:01 am ---@Jason: I can't reproduce the error, sorry.
I can see that the script wasn't terminating the task, because the process ID did not change.
It's the same error that RandyC reported.

@RandyC: What OS are you using? The 'VB-Script terminate command' does not work on all OSes.
And no, you do not need -sam to make the script terminate idling task processes.

Maybe both of you could attach a full (not snipped) log.txt next time?
I didn't really expect the script to be needed for such a long time, so I did not add any routine to print debug information with the internal variable settings ^^

--- End quote ---

I am using XP Pro 32-bit SP3. I don't know if that's my problem or not. One thing I thought of a while ago: I am using a service install for BOINC. It may be that, with a service install, the user account running your script is not allowed to terminate BOINC's tasks.
