Author Topic: SETI MB CUDA for Linux (Read 656078 times)

riofl · « **Reply #330 on:** 15 Aug 2009, 11:03:59 pm »

well we shall see. i compiled 6.9.0 and it properly uses the 2 devices but the reporting is broken. it tells me in the msg log that i have 2 teslas. i have not looked at the code at all but it seems to me that the devices would be kept in an array and it should be a very simple thing to traverse the array reporting the proper string in each. seems like the index is broken. minor issue but i would think it would only take a few seconds for someone familiar with the code to fix that.
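A minimal sketch of the kind of indexing slip guessed at above, for illustration only. It is not BOINC's code: the function names and the message format are hypothetical, and the device names are simply the two cards mentioned later in this thread.

```python
# Hypothetical illustration; BOINC's real device-reporting code is not shown here.
devices = ["GTX 285", "Tesla C1060"]  # device 0 and device 1 from this thread


def report_buggy(devs):
    # Assumed bug: the loop index is ignored and one fixed entry is reused,
    # so every device is reported under the same name.
    last = len(devs) - 1
    return [f"CUDA device {i}: {devs[last]}" for i in range(len(devs))]


def report_fixed(devs):
    # Walk the list and report each entry under its own index.
    return [f"CUDA device {i}: {name}" for i, name in enumerate(devs)]


for line in report_buggy(devices):
    print("buggy:", line)
for line in report_fixed(devices):
    print("fixed:", line)
```

With the assumed bug both lines name the Tesla, which matches the "2 teslas" symptom. The fixed version names each card once.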

the message log also states it cannot load library libcal.so. too late tonight but ill look to see if it is supposed to be created by the boinc make and try to track down what happened. if not then i dont know where it comes from.

boincmgr would not compile for me so im still using 6.6.37 manager but it works.

if it continues to work like it has in the past 5 min it will be nice to run a new version for a change

riofl · « **Reply #331 on:** 15 Aug 2009, 11:20:22 pm »

hmm either that libcal has something to do with workunit calculations or i just got an entire cache full of big units. not one time is under 2:45 and watching boinc process it is EXTREMELY slow both on the cpus and gpus. system load and all other things are normal. unless since 6.6.11 calculations were severely broken that it will take a while for this version to fix that up and get it right. dunno. will see what it looks like in the morning.

sunu · « **Reply #332 on:** 16 Aug 2009, 05:11:30 am »

Yes, trunk is the one to get.

What boinc reports is a minor cosmetic bug. The important thing is to use all gpus properly.

libcal.so is for ATI cards (something like libcudart.so for NVIDIA cards). ATI card support was added a couple of days ago for milkyway@home. It should be irrelevant to us.

I've never bothered with boincmgr while compiling from source. I use the released ones. As long as boinc works properly, we're ok.

Lately there was an increase in sensitivity so most of the recent workunits are big ones. In my pc they take about 12-15 min for the gpu and about 1:45-2:00 hours for the cpu. Boinc doesn't have anything to do with the speed of computations, unless it uses 100% of the CPU slowing things down.

riofl · « **Reply #333 on:** 16 Aug 2009, 06:38:19 am »

Quote from: sunu on 16 Aug 2009, 05:11:30 am

Yes, trunk is the one to get.

What boinc reports is a minor cosmetic bug. The important thing is to use all gpus properly.

libcal.so is for ATI cards (something like libcudart.so for NVIDIA cards). ATI card support was added a couple of days ago for milkyway@home. It should be irrelevant to us.

I've never bothered with boincmgr while compiling from source. I use the released ones. As long as boinc works properly, we're ok.

Lately there was an increase in sensitivity so most of the recent workunits are big ones. In my pc they take about 12-15 min for the gpu and about 1:45-2:00 hours for the cpu. Boinc doesn't have anything to do with the speed of computations, unless it uses 100% of the CPU slowing things down.

was concerned since previously i have never had a cuda work unit take more than 15min to process with typical 9 to 13 min, they are now taking approx 30 min for each card. and my rac has dropped for this machine by more than 400 points. ill just keep plugging away for a while to let things settle out. nothing was changed in the 'backend' applications so it must be the larger workunits presented.

overall boinc seems to be managing things nicely. it no longer keeps a backlog of completed units to report which is refreshing.

sunu · « **Reply #334 on:** 16 Aug 2009, 08:20:23 am »

Quote from: riofl on 16 Aug 2009, 06:38:19 am

was concerned since previously i have never had a cuda work unit take more than 15min to process with typical 9 to 13 min, they are now taking approx 30 min for each card. and my rac has dropped for this machine by more than 400 points. ill just keep plugging away for a while to let things settle out. nothing was changed in the 'backend' applications so it must be the larger workunits presented.

No, this is not good. Check how boinc handles the tasks. When a cuda workunit finishes, does it also stop the other one running to start a new pair?

riofl · « **Reply #335 on:** 16 Aug 2009, 08:55:36 am »

Quote from: sunu on 16 Aug 2009, 08:20:23 am

Quote from: riofl on 16 Aug 2009, 06:38:19 am
was concerned since previously i have never had a cuda work unit take more than 15min to process with typical 9 to 13 min, they are now taking approx 30 min for each card. and my rac has dropped for this machine by more than 400 points. ill just keep plugging away for a while to let things settle out. nothing was changed in the 'backend' applications so it must be the larger workunits presented.

No, this is not good. Check how boinc handles the tasks. When a cuda workunit finishes, does it also stop the other one running to start a new pair?

i think it may be because it is asking for new gpu workunits and keeps getting no work available so when it uploads a finished unit it asks for more work and reports at the same time. dunno..

no when one finishes it starts a new one and the one that was in progress continues uninterrupted.

riofl · « **Reply #336 on:** 16 Aug 2009, 09:21:07 am »

Quote from: sunu on 16 Aug 2009, 08:20:23 am

Quote from: riofl on 16 Aug 2009, 06:38:19 am
was concerned since previously i have never had a cuda work unit take more than 15min to process with typical 9 to 13 min, they are now taking approx 30 min for each card. and my rac has dropped for this machine by more than 400 points. ill just keep plugging away for a while to let things settle out. nothing was changed in the 'backend' applications so it must be the larger workunits presented.

No, this is not good. Check how boinc handles the tasks. When a cuda workunit finishes, does it also stop the other one running to start a new pair?

hmm found a few interesting things in wandering thru the workunits on the web. found a few of this one:

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 2.715027
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.

i am assuming it did not have enough allocated ram so i increased the allocation substantially. plenty of disk space allocation (40G available to boinc, 385mb used). also my pending credit is higher than ever at 80k+ so maybe that is also why my rac has dropped. it simply needs to catch up to itself.

all this simultaneously makes it impossible to point a finger

especially since i also this week replaced my screwy ballistix ram with ocz blade ram.. went from 4x1gb 2.0v sticks to 2x2gb 1.8v sticks . this ocz should be reliable. had 11 RMAs on ballistix in 16 months and just got tired of it. the ocz has given the machine a slightly smoother personality so i am hopeful there but i have no clue how the raw performance is ... technically it should be better since i went from 4pcs dual channel to 2pcs dual channel which is supposed to be an improvement, plus the lower voltage is better as well.

sunu · « **Reply #337 on:** 16 Aug 2009, 12:25:06 pm »

Unless you have a faulty card (if it is cuda) or cpu/ram (if it is a cpu workunit), result overflows are pretty much "normal" and they don't have anything to do with your memory/storage allocations.

Check why your pending credit has increased. Is it genuine "waiting for validation" or is it suspicious "validation inconclusive"? If it is the latter, check whether the results you returned for those workunits are very strange or different from your wingman's. Check also the invalid category of your tasks page to see if there are any there.

30min for a CUDA wu seems too much. Unless you have a lower end card.

riofl · « **Reply #338 on:** 16 Aug 2009, 11:05:15 pm »

Quote from: sunu on 16 Aug 2009, 12:25:06 pm

Unless you have a faulty card (if it is cuda) or cpu/ram (if it is a cpu workunit), result overflows are pretty much "normal" and they don't have anything to do with your memory/storage allocations.

Check why your pending credit has increased. Is it genuine "waiting for validation" or is it suspicious "validation inconclusive"? If it is the latter, check whether the results you returned for those workunits are very strange or different from your wingman's. Check also the invalid category of your tasks page to see if there are any there.

30min for a CUDA wu seems too much. Unless you have a lower end card.

there were only 2 or maybe 3 overflow errors out of 8 or 9 pages i looked through. there are a lot of waiting for validation for inconclusive but the majority are processed and validated. card device 0 is a gtx 285, a xfx overclocked black edition (127gflops by boinc) and the device1 is a prerelease tesla c1060 which has 1gb ram instead of the 4 in production and a bit slower clock speeds (74gflops by boinc)..

it seems that the workunits are very large. boinc is showing time to completion for those waiting to process of about 2:40 in this current cache, including the cuda workunits. the only changes made besides new downloads since the 13min workunits for cuda and 50min to 1.5hr workunits for cpu has been the boinc upgrade from 6.6.11 and the change in system ram on monday. the gpus both are running a satisfactory temp of 62-67c under load and cpus under load betw 52 and 59c with averages around 55c so its all running cool enough. the bios diagnostics show nothing wrong so my only guess is the kind of workunits i am getting now.

glxgears is showing around 10kFPS which is where the gtx 285 has run since i first got it and nvidia-settings shows both cards running at their maximum performance level although the wording changed for the gtx. it used to say maximum performance now it says desktop, but the numbers are still the same and i suspect it is a change of driver versions that changed it. i have been running the same driver for weeks now.

also desktop performance is as good as it always was.. so i am at a loss to explain the sudden 30min cuda processing unless it is the workunits supplied. the script is making sure there are no vlar/vhar fed to the cuda devices. in fact, lately the cpu workunits have been nothing but vlar/vhar units with whatever normal ones they may have been assigned being changed to cuda.

my pending credits have always been around 40k but it jumped to 80k i guess recently. i cannot say for sure because i rarely check it so there has been maybe a month or two between those numbers.

i know my rac drops when i have boinc shut down for several hours and that is normal, but over the past week i have lost now nearly 500 points in average on this one machine. i wonder if running that script and stopping/restarting boinc with an 8 second delay 3 times an hour may be causing the drop?

the only other thing that may be affecting it is the ambient temp of the room which has been considerably higher this week raising the ambient of the case. gtx ambient is running around 55c now and previous weeks it has run closer to 48c but none of this is anywhere close to limits that would cause any kind of power/speed controls kicking in to cool things down.

what is interesting is i just thought of looking at the boinc cpu benchmarks which i largely ignore so i just ran them a few seconds ago. the floating point is within normal range it has always been but the interesting thing is the integer benchmark is just under 4k higher than normal! that may be the new system ram configuration affecting that though.

weirdness abounds...

sunu · « **Reply #339 on:** 17 Aug 2009, 07:32:43 am »

riofl, give me a link to your host.

A self-compiled boinc also gave me increased benchmarks. They don't have any real importance though.

riofl · « **Reply #340 on:** 17 Aug 2009, 11:09:49 pm »

Quote from: sunu on 17 Aug 2009, 07:32:43 am

riofl, give me a link to your host.

A self-compiled boinc also gave me increased benchmarks. They don't have any real importance though.

ok so then it doesnt mean anything about my ram change... hope this is the right link. i took it from the details link from my computer listings page

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4166601

sunu · « **Reply #341 on:** 18 Aug 2009, 12:04:40 pm »

riofl, I'm sure you know, your tesla card has some problems. It gives errors in some workunits. If you look in your errors page, all those workunits were run by the tesla card. It does run successfully though in other workunits.

Checking the reported run times, I don't see any significant difference between eg 18 August and 14 August when you were running 6.6.11.

I do see though that most of the workunits were restarted 2 or 3 or more times. The initialization phase of a cuda task takes about 30 sec. If it is restarted 2 times you lose 1 min and with a total computation time of eg. 14 min you lose 7% credit right there.
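The restart arithmetic above can be checked with a few lines. This is only a sketch: the 30 sec initialization and 14 min run time are the example figures from this post, and the function name is made up for illustration.

```python
def restart_overhead(restarts, init_seconds=30.0, compute_minutes=14.0):
    """Return (minutes lost, fraction of run time lost) to repeated initialization.

    Defaults are the example figures from the post: about 30 s of CUDA
    initialization per restart and a 14 minute task.
    """
    lost_minutes = restarts * init_seconds / 60.0
    return lost_minutes, lost_minutes / compute_minutes


lost, fraction = restart_overhead(2)
print(f"{lost:.1f} min lost, {fraction:.1%} of a 14 min task")
# prints: 1.0 min lost, 7.1% of a 14 min task
```

Three restarts on the same task would cost 1.5 min, or a bit under 11%.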

You've said that you run the rebranding script several times per hour, why? I search for vlars once per day, sometimes once per two days, and that is more than enough. If newly downloaded tasks get crunched only a few hours later, increase your cache so they are crunched after 2-3 or more days so running the script once per day will be enough.

riofl · « **Reply #342 on:** 18 Aug 2009, 12:42:54 pm »

Quote from: sunu on 18 Aug 2009, 12:04:40 pm

riofl, I'm sure you know, your tesla card has some problems. It gives errors in some workunits. If you look in your errors page, all those workunits were run by the tesla card. It does run successfully though in other workunits.

Checking the reported run times, I don't see any significant difference between eg 18 August and 14 August when you were running 6.6.11.

I do see though that most of the workunits were restarted 2 or 3 or more times. The initialization phase of a cuda task takes about 30 sec. If it is restarted 2 times you lose 1 min and with a total computation time of eg. 14 min you lose 7% credit right there.

You've said that you run the rebranding script several times per hour, why? I search for vlars once per day, sometimes once per two days, and that is more than enough. If newly downloaded tasks get crunched only a few hours later, increase your cache so they are crunched after 2-3 or more days so running the script once per day will be enough.

yes i am going to retire the tesla shortly but first im going to try to replace the existing incorrect thermal pads on all the chips. if that doesnt fix it i will ship it back to my boss who sent it to me in the first place and let him use it on a windows setup. i will then run just the single gtx285 for a month or 2 and then get another gtx285 to replace the tesla. the 285 is considerably faster than the tesla anyway. (127gflops vs 74gflops by boinc measurements)

the reason i picked every 20 min was i noticed a number of computation error results when i ran it just once an hour. i suppose the easiest way to keep it from downloading all the time is to set the cache to 10 days, get it all then run the script then turn it back to 2 days or something so it wont download more. when i ran the cpugpureport script several times i found that it showed vlar/vhar assigned to gpu sometimes several times in an hour meaning it got more workunits. a second reason for doing this is the tesla locks up sometimes as much as every hour and a restart of boinc cures it.

maybe ill just ignore all that and run the script once every few hours using a large cache so it wont get more without me making it do so. that way i can keep control of it rejecting vlars.

thanks for checking. i didnt think frequent usage of the script would cause that much of a change.

riofl · « **Reply #343 on:** 18 Aug 2009, 01:45:37 pm »

ok i changed my cache from 6 days to 10 but no workunits.. prob cause the project is in maintenance. script is set to run every 4 hrs until it gets its cache then ill change that to once a day and set the cache back to 2 days.

lets see what that does

sunu · « **Reply #344 on:** 18 Aug 2009, 02:02:43 pm »

Why play with your cache levels? Change from 6 to 10 and then back to 2, why? Pick a cache level and leave it there. I'm using 10 days. If you were using 6 days, it is fine also. 6 days cache means that the workunits downloaded now will be crunched in about 6 days, so you have 6 days to check for vlars. No need to run that script x times per hour or x times per day.
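A rough sketch of the reasoning above, under stated assumptions: a freshly downloaded task waits about as long as the cache setting before it starts, and each script run restarts boinc once (as described earlier in the thread). The function names and the safety margin are hypothetical, not part of boinc or the rebranding script.

```python
def max_scan_interval_days(cache_days, margin=0.5):
    """Longest gap between vlar scans that still reaches each task before it starts.

    Assumes a new task sits in the queue for roughly cache_days.
    margin < 1 is an arbitrary safety factor chosen for this sketch.
    """
    return cache_days * margin


def restarts_per_day(runs_per_hour):
    # Assumes every script run stops and restarts boinc once.
    return runs_per_hour * 24


print(max_scan_interval_days(6))   # gap in days that is still safe with a 6 day cache
print(max_scan_interval_days(10))  # same for a 10 day cache
print(restarts_per_day(3))         # restarts per day when running every 20 min
```

Under these assumptions a scan every day or two is well inside the safe gap for either cache size, while three runs an hour adds dozens of restarts a day for no extra coverage.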
