Author Topic: SETI MB CUDA for Linux (Read 543504 times)

Metod, S56RKO · « **Reply #720 on:** 31 Aug 2010, 07:07:35 am »

Here are my experiences after a few days: it works.

My observations:

- settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
- settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
- it helps a lot to enable 'run GPU tasks while computer in use'. With it disabled I never saw a GPU task running, although tasks did run while I was away. Don't enable it blindly; think about interactive use of the system.
- setting niceness of CPU-part of GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt. One thing that is not really by the book: the script has to run as root, or else setting the higher priority fails (only root can increase priority). That opens a potential security hole.
- At least the app I'm running (x86, 2.2, vlar-kill) has a nasty habit of complaining:

Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.

It seems benign, though, as most results have validated. Is there a particular reason this error is reported while the app still seems to operate OK?
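The permission problem with niceness mentioned above can be illustrated with a minimal sketch. This is not the script used in this thread; the helper name `try_set_niceness` is hypothetical, and the example only assumes the standard Linux behaviour that an unprivileged user can normally raise a nice value but not lower it.

```python
import os


def try_set_niceness(pid: int, niceness: int) -> bool:
    """Try to set the niceness of process `pid`; return True on success.

    Assumption: on a typical Linux system, lowering the nice value
    (i.e. raising priority) fails with a permission error unless the
    caller is root or otherwise privileged.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
        return True
    except PermissionError:
        # Typical outcome when a normal user asks for a higher priority.
        return False


if __name__ == "__main__":
    me = os.getpid()
    current = os.getpriority(os.PRIO_PROCESS, me)
    print("current niceness:", current)
    # Re-applying the current value is always permitted, so this only
    # demonstrates the call; it does not change anything.
    print("re-applied current value:", try_set_niceness(me, current))
```

A script built this way can run unprivileged and simply report when the renice was refused, instead of requiring root for the whole script.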

Sunu, thank you for all the advice!

Metod, S56RKO · « **Reply #721 on:** 31 Aug 2010, 07:09:01 am »

Quote from: riofl on 31 Aug 2010, 05:09:44 am

i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?

I can't say anything about worthiness, however the new one works for me.

sunu · « **Reply #722 on:** 31 Aug 2010, 07:27:09 am »

Quote from: riofl on 31 Aug 2010, 05:09:44 am

i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?

There have been quite a few releases between them, so it's not really a big jump. You can try it and, if you don't like it, revert.

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am

settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)

Wrong

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am

settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used

Wrong

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am

setting niceness of CPU-part of GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt.

It seems to depend on the kernel/distro used. Some systems seem to benefit greatly from it, others not so much.

Metod, S56RKO · « **Reply #723 on:** 31 Aug 2010, 02:09:49 pm »

Quote from: sunu on 31 Aug 2010, 07:27:09 am

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am
settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
Wrong

How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only time that I could imagine the difference to pop up is if there are multiple (probably more than 3-4) GPUs installed and used.

Quote from: sunu on 31 Aug 2010, 07:27:09 am

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am
settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
Wrong

If not, what then? My estimates are currently way too high (around 4 days) so I tried to fix it by changing <flops> value. If I set it 10-times larger, WUs erred out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to too high <flops> value.

Josef W. Segur · « **Reply #724 on:** 31 Aug 2010, 03:05:46 pm »

Quote from: Metod, S56RKO on 31 Aug 2010, 07:07:35 am

settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)

Quote from: sunu on 31 Aug 2010, 07:27:09 am

Wrong

Quote from: Metod, S56RKO on 31 Aug 2010, 02:09:49 pm

How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only time that I could imagine the difference to pop up is if there are multiple (probably more than 3-4) GPUs installed and used.

Set it to 1 and BOINC will reserve a full CPU for each GPU. Set it to 0.71, as the project app_plan does for some hosts running stock Windows builds, and a system with 2 GPUs will have one CPU reserved, and so on. You're right that small fractional settings are generally insignificant.
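A rough sketch of that arithmetic, for illustration only. The function name `reserved_cpus` is hypothetical, and the rule used here (whole CPUs reserved = the summed fractional usage rounded down) is an assumption chosen to match the examples above, not a description of BOINC's actual scheduler code.

```python
import math


def reserved_cpus(num_gpu_tasks: int, avg_ncpus: float) -> int:
    """Estimate how many whole CPUs are kept free for running GPU tasks.

    Assumption: fractional CPU usage is summed over all running GPU
    tasks and only the whole part is effectively reserved.
    """
    return math.floor(num_gpu_tasks * avg_ncpus)


if __name__ == "__main__":
    for gpus, ncpus in [(2, 1.0), (2, 0.71), (3, 0.15)]:
        print(f"{gpus} GPU tasks at avg_ncpus={ncpus}: "
              f"{reserved_cpus(gpus, ncpus)} CPU(s) reserved")
```

Under this assumption, values between 0.00 and 0.15 reserve nothing on a host with only a few GPUs, which would explain why no difference was observed.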

Quote from: Metod

settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used

Quote from: sunu

Wrong

Quote from: Metod

If not, what then? My estimates are currently way too high (around 4 days) so I tried to fix it by changing <flops> value. If I set it 10-times larger, WUs erred out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to too high <flops> value.

The relationships are:

rsc_fpops_bound / flops = elapsed time limit
DCF * rsc_fpops_est / flops = estimated runtime
rsc_fpops_bound = 10 * rsc_fpops_est

If DCF is near or greater than 10 as sometimes happens, the estimated runtime is longer than the allowed runtime. Reducing DCF can reduce the estimates and thereby allow work fetch, without changing the allowed runtime. Adjusting flops to more than a realistic value for the host is not a very good idea, but adjusting rsc_fpops_bound values higher can protect against those errors.

With the servers attempting to provide rsc_fpops_est and _bound values which are about right for DCF 1.0, we can hope things will settle down after they have enough data to know how fast the applications are. Unfortunately the initial transitions are painful.
Joe
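The relationships in the post above can be worked through numerically. This is only a sketch: the function names are made up for illustration, and the values for rsc_fpops_est and flops are hypothetical round numbers, not taken from any real workunit.

```python
def elapsed_time_limit(rsc_fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before it is aborted."""
    return rsc_fpops_bound / flops


def estimated_runtime(dcf: float, rsc_fpops_est: float, flops: float) -> float:
    """Seconds the client expects the task to take."""
    return dcf * rsc_fpops_est / flops


if __name__ == "__main__":
    rsc_fpops_est = 3e13                   # hypothetical value
    rsc_fpops_bound = 10 * rsc_fpops_est   # relationship given in the post
    flops = 1e10                           # hypothetical <flops> setting

    print("limit:", elapsed_time_limit(rsc_fpops_bound, flops))
    print("estimate at DCF 1:", estimated_runtime(1.0, rsc_fpops_est, flops))
    # With DCF at 10 the estimate reaches the allowed runtime.
    print("estimate at DCF 10:", estimated_runtime(10.0, rsc_fpops_est, flops))
    # A tenfold <flops> shrinks the limit tenfold; the real runtime does not change.
    print("limit at 10x flops:", elapsed_time_limit(rsc_fpops_bound, 10 * flops))
```

With these numbers the limit drops from 30000 s to 3000 s when flops is multiplied by ten, so a task that really needs about 3000 s or more is aborted. That matches the "excessive resources" errors reported earlier in the thread.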

riofl · « **Reply #725 on:** 01 Oct 2010, 10:49:19 pm »

either today's batch of downloads is supposed to take a very long time for a gpu to complete or i have something going wrong. my fastest gpu is taking 3 hours 2 minutes to reach 85%! and even the others are taking 10 to 15 minutes longer on the other 2 gpus. all 3 gpu temps are also much lower than normal. typically they run 58-65c max load and i have not seen them rise above 50c in several hours.

is this a 'common' experience others are having too today or am i facing something going haywire?

Claggy · « **Reply #726 on:** 02 Oct 2010, 07:23:28 am »

Quote from: riofl on 01 Oct 2010, 10:49:19 pm

either today's batch of downloads is supposed to take a very long time for a gpu to complete or i have something going wrong. my fastest gpu is taking 3 hours 2 minutes to reach 85%! and even the others are taking 10 to 15 minutes longer on the other 2 gpus. all 3 gpu temps are also much lower than normal. typically they run 58-65c max load and i have not seen them rise above 50c in several hours.

is this a 'common' experience others are having too today or am i facing something going haywire?

Check out your results:

resultid=1717782169 on hostid=4166601

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>

SETI@home MB CUDA 3.0 6.09 Linux 64bit - r16 by Crunch3r :p 
- thread priority mod 

setiathome_CUDA: Found 3 CUDA device(s):
   Device 1 : GeForce GTX 285 
           totalGlobalMem = 1073020928 
           sharedMemPerBlock = 16384 
           regsPerBlock = 16384 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 512 
           clockRate = 1476000 
           totalConstMem = 65536 
           major = 1 
           minor = 3 
           textureAlignment = 256 
           deviceOverlap = 1 
           multiProcessorCount = 30 
   Device 2 : GeForce GTX 295 
           totalGlobalMem = 939327488 
           sharedMemPerBlock = 16384 
           regsPerBlock = 16384 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 512 
           clockRate = 1345500 
           totalConstMem = 65536 
           major = 1 
           minor = 3 
           textureAlignment = 256 
           deviceOverlap = 1 
           multiProcessorCount = 30 
   Device 3 : GeForce GTX 295 
           totalGlobalMem = 939327488 
           sharedMemPerBlock = 16384 
           regsPerBlock = 16384 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 512 
           clockRate = 1345500 
           totalConstMem = 65536 
           major = 1 
           minor = 3 
           textureAlignment = 256 
           deviceOverlap = 1 
           multiProcessorCount = 30 
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
setiathome_enhanced 6.01 Revision: 737 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.11.0

Work Unit Info:
...............
WU true angle range is :  1.433000

Flopcounter: 11714606392639.039062

Spike count:    1
Pulse count:    0
Triplet count:  0
Gaussian count: 0
05:22:35 (16178): called boinc_finish

</stderr_txt>

I suggest you first try restarting BOINC, then your computer.

Claggy
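Checking a result for this condition can be done mechanically. The sketch below searches a task's stderr text for the fallback message shown in the output above; the function names are hypothetical, and fetching the stderr text from the project website is left out.

```python
FALLBACK_MARKER = "Falling back to HOST CPU processing"


def fell_back_to_cpu(stderr_text: str) -> bool:
    """Return True if the CUDA app reported falling back to the CPU."""
    return FALLBACK_MARKER in stderr_text


def cuda_error_lines(stderr_text: str) -> list:
    """Collect the lines in which the app reported a CUDA error."""
    return [line for line in stderr_text.splitlines()
            if line.startswith("Cuda error")]


if __name__ == "__main__":
    # Shortened sample based on the stderr output quoted in the post above.
    sample = (
        "SETI@home using CUDA accelerated device GeForce GTX 285\n"
        "Cuda error 'cufftPlan1d(...)' in file './cudaAcc_fft.cu' in line 49 : "
        "no CUDA-capable device is available.\n"
        "setiathome_CUDA: CUDA runtime ERROR in plan FFT. "
        "Falling back to HOST CPU processing...\n"
    )
    print("fell back to CPU:", fell_back_to_cpu(sample))
    print("CUDA errors found:", len(cuda_error_lines(sample)))
```

A task that fell back this way still finishes and can validate, so unusually long runtimes and low GPU temperatures are often the only visible symptoms.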

riofl · « **Reply #727 on:** 02 Oct 2010, 12:35:17 pm »

i noticed it was using 100% cpu and that is what tipped me off as well.
i shut down for about 1 min then restarted and that seems to have cured it.
i am wondering though if this is a symptom of something going bad or if it
was just the occasional 'fluke'.

sunu · « **Reply #728 on:** 02 Oct 2010, 05:19:48 pm »

Do you still make heavy use of your main graphics card?

What driver do you use?

riofl · « **Reply #729 on:** 02 Oct 2010, 10:32:30 pm »

Quote from: sunu on 02 Oct 2010, 05:19:48 pm

Do you still make heavy use of your main graphics card?

What driver do you use?

i'm using the 256.53 driver with cuda 3.1

yes. i have a monitor off both ports of the 285 and one monitor off the 295 and i make heavy use of them, though mostly it is in ssh, browser, email, instant msg, and text editor windows. the monitors are set up in a xinerama/twinview mixture to get 3 on one desktop.

this is the first time this problem has happened, and since i power cycled the machine it has not happened again. i did do something out of the ordinary yesterday, though: i tried to watch a training seminar video but it wouldn't play. i somehow had the wrong codec versions, since it did work 2 weeks ago. the player wound up eating all available memory and i had to kill it, which may have tossed the vid card into a strange state. i think when something like this happens in the future, like with the vid player, i'll just power off and start up again to be safe.

sunu · « **Reply #730 on:** 02 Oct 2010, 11:22:27 pm »

Why cuda 3.1? I think you shouldn't use it. Cuda 3.x is intended for different software and hardware.

riofl · « **Reply #731 on:** 03 Oct 2010, 09:11:29 am »

i forget who told me but they said it was backward compatible and that performance was better.

riofl · « **Reply #732 on:** 03 Oct 2010, 09:20:21 am »

Quote from: sunu on 02 Oct 2010, 11:22:27 pm

Why cuda 3.1? I think you shouldn't use it. Cuda 3.x is intended for different software and hardware.

that must have been in an upgrade done yesterday or the day before. the list was long and i really didn't look carefully at it.
i have reinstalled cuda-toolkit 2.1. it appears device 2 has now started causing issues too, finishing each workunit as soon as it began working on it.
this happened in the past hour i think... hopefully this will cure the problems.

sunu · « **Reply #733 on:** 03 Oct 2010, 05:09:57 pm »

Cuda 2.3 would be the best choice.

riofl · « **Reply #734 on:** 03 Oct 2010, 06:39:29 pm »

argh. i didn't even notice the typo. yes, i installed 2.3 not 2.1. sorry
