SETI MB CUDA for Linux
Metod, S56RKO:
Here are my experiences after some days: it works ;)
My observations:
* settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
* settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
* It helps a lot to set 'run GPU tasks while computer is in use' ... that's why I never observed a GPU task being run, yet they did run while I was away. Don't do it blindly; think about interactive use of the system.
* Setting the niceness of the CPU part of a GPU task to 0 (normal priority) doesn't seem to affect things a lot, but it doesn't hurt either. One thing, not really by-the-book: one needs to run the script as root, or else raising the priority fails (only root can increase priority). Which opens a potential security hole.
* At least the app I'm running (x86, 2.2, vlar-kill) has a nasty habit of complaining:
--- Code: ---Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.
--- End code ---
Seems benign though as most results have validated. Is there any particular reason for this error being reported and app seemingly still operating OK?
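The niceness point above can be illustrated with a short Python sketch (hypothetical, not part of any actual BOINC script): lowering a process's nice value, i.e. raising its priority, is a privileged operation, which is why the renice step in the script only works as root.

```python
import os

# Read the current nice value without changing it (increment of 0).
current = os.nice(0)
print("current niceness:", current)

# Making the process "nicer" (lower priority) is always allowed.
lowered = os.nice(5)
print("after os.nice(5):", lowered)

# Going the other way (raising priority) is privileged: for an
# ordinary user the kernel refuses with EPERM (PermissionError).
try:
    os.nice(-5)
    print("raised priority (running as root?)")
except PermissionError:
    print("PermissionError: only root can increase priority")
```

This is the same restriction the shell's `renice` runs into, hence the security trade-off of running the whole script as root.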
Sunu, thank you for all the advice!
Metod, S56RKO:
--- Quote from: riofl on 31 Aug 2010, 05:09:44 am ---i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?
--- End quote ---
I can't say anything about worthiness, however the new one works for me.
sunu:
--- Quote from: riofl on 31 Aug 2010, 05:09:44 am ---i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?
--- End quote ---
There have been quite a few releases between them, so not really a big jump. You can try it and if you don't like it, revert back.
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
--- End quote ---
Wrong
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
--- End quote ---
Wrong
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---setting niceness of CPU-part of GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt.
--- End quote ---
It seems to depend on the kernel/distro used. Some systems seem to benefit a lot from it, others not so much.
Metod, S56RKO:
--- Quote from: sunu on 31 Aug 2010, 07:27:09 am ---
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
--- End quote ---
Wrong
--- End quote ---
How so? I've tried values between 0.00 and 0.15 and haven't noticed any difference. The only time I could imagine a difference showing up is if there are multiple (probably more than 3-4) GPUs installed and used.
--- Quote from: sunu on 31 Aug 2010, 07:27:09 am ---
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
--- End quote ---
Wrong
--- End quote ---
If not, what then? My estimates are currently way too high (around 4 days), so I tried to fix that by changing the <flops> value. If I set it 10 times larger, WUs erred out due to excessive resources used. Run time (wall clock) was roughly the same as for successful WUs, so I can attribute the error only to the too-high <flops> value.
Josef W. Segur:
--- Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am ---settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
--- End quote ---
--- Quote from: sunu on 31 Aug 2010, 07:27:09 am ---Wrong
--- End quote ---
--- Quote from: Metod, S56RKO on 31 Aug 2010, 02:09:49 pm ---How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only time that I could imagine the difference to pop up is if there are multiple (probably more than 3-4) GPUs installed and used.
--- End quote ---
Set 1 and BOINC will reserve a full CPU for each GPU. Set 0.71 as the project app_plan is doing for some hosts running stock Windows builds and if the system has 2 GPUs one CPU will be reserved, etc. You're right that small fractional settings are generally insignificant.
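A rough sketch of that reservation arithmetic as described above (the exact rounding the client uses is my assumption here; the 0.71 figure is from the project's app_plan example): the client multiplies <avg_ncpus> by the number of GPU tasks and reserves the integer part as whole CPUs.

```python
import math

def reserved_cpus(avg_ncpus: float, n_gpus: int) -> int:
    """Whole CPUs set aside when n_gpus GPU tasks each declare
    avg_ncpus of a CPU; the integer part of the sum is reserved."""
    return math.floor(avg_ncpus * n_gpus)

print(reserved_cpus(1.0, 1))   # 1 -- a full CPU per GPU
print(reserved_cpus(0.71, 2))  # 0.71 * 2 = 1.42 -> 1 CPU reserved
print(reserved_cpus(0.05, 1))  # 0 -- small fractions reserve nothing
```

This is why small fractional settings on a one- or two-GPU host are insignificant: the sum never reaches a whole CPU.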
--- Quote from: Metod ---settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
--- End quote ---
--- Quote from: sunu ---Wrong
--- End quote ---
--- Quote from: Metod ---If not, what then? My estimates are currently way too high (around 4 days) so I tried to fix it by changing <flops> value. If I set it 10-times larger, WUs erred out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to too high <flops> value.
--- End quote ---
The relationships are: rsc_fpops_bound / flops = elapsed time limit; DCF * rsc_fpops_est / flops = estimated runtime; rsc_fpops_bound = 10 * rsc_fpops_est.
If DCF is near or greater than 10 as sometimes happens, the estimated runtime is longer than the allowed runtime. Reducing DCF can reduce the estimates and thereby allow work fetch, without changing the allowed runtime. Adjusting flops to more than a realistic value for the host is not a very good idea, but adjusting rsc_fpops_bound values higher can protect against those errors.
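Plugging hypothetical numbers into those relationships (the flops and rsc_fpops_est values below are made up for illustration) shows why a DCF above 10 blocks work fetch while the elapsed time limit stays put:

```python
# Hypothetical numbers, just to exercise the relationships above.
flops = 1.0e10            # claimed device speed: 10 GFLOPS
rsc_fpops_est = 3.6e13    # server's estimate of the work in a task
rsc_fpops_bound = 10 * rsc_fpops_est

# Allowed runtime in seconds; independent of DCF.
elapsed_limit = rsc_fpops_bound / flops

def estimated_runtime(dcf: float) -> float:
    """Client's runtime estimate, scaled by the duration correction factor."""
    return dcf * rsc_fpops_est / flops

print(elapsed_limit)              # 36000.0 s
print(estimated_runtime(1.0))     # 3600.0 s -- fits easily
print(estimated_runtime(12.0))    # 43200.0 s -- exceeds the limit
```

Raising flops shrinks both numbers proportionally, so it fixes the estimate; but if flops then overstates the host's real speed, the actual wall time can overrun elapsed_limit, which would match the "excessive resources" errors Metod saw.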
With the servers attempting to provide rsc_fpops_est and _bound values which are about right for DCF 1.0, we can hope things will settle down after they have enough data to know how fast the applications are. Unfortunately the initial transitions are painful.
Joe