SETI MB CUDA for Linux
riofl:
--- Quote from: sunu on 16 Aug 2009, 08:20:23 am ---
--- Quote from: riofl on 16 Aug 2009, 06:38:19 am ---
was concerned, since previously I have never had a CUDA workunit take more than 15 min to process (typically 9 to 13 min); they are now taking approx. 30 min on each card, and my RAC for this machine has dropped by more than 400 points. I'll just keep plugging away for a while to let things settle out. Nothing was changed in the 'backend' applications, so it must be the larger workunits being sent.
--- End quote ---
No, this is not good. Check how BOINC handles the tasks. When a CUDA workunit finishes, does it also stop the other running one to start a new pair?
--- End quote ---
I think it may be because it keeps asking for new GPU workunits and keeps getting "no work available", so when it uploads a finished unit it asks for more work and reports at the same time. Dunno..
No, when one finishes it starts a new one, and the one that was in progress continues uninterrupted.
riofl:
Hmm, found a few interesting things while wandering through the workunits on the web. Found a few like this one:
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 2.715027
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
I am assuming it did not have enough RAM allocated, so I increased the allocation substantially. There is plenty of disk space allocated (40 GB available to BOINC, 385 MB used). Also, my pending credit is higher than ever at 80k+, so maybe that is also why my RAC has dropped; it simply needs to catch up with itself.
All this happening simultaneously makes it impossible to point a finger :P especially since this week I also replaced my screwy Ballistix RAM with OCZ Blade RAM: went from 4x1 GB 2.0 V sticks to 2x2 GB 1.8 V sticks. This OCZ should be reliable. I had 11 RMAs on the Ballistix in 16 months and just got tired of it. The OCZ has given the machine a slightly smoother personality, so I am hopeful there, but I have no clue how the raw performance is... Technically it should be better, since I went from 4 sticks in dual channel to 2 sticks in dual channel, which is supposed to be an improvement, plus the lower voltage is better as well.
sunu:
Unless you have a faulty card (if it is CUDA) or CPU/RAM (if it is a CPU workunit), result overflows are pretty much "normal" and they have nothing to do with your memory/storage allocations.
Check why your pending credit has increased. Is it genuine "waiting for validation", or suspicious "validation inconclusive"? If it is the latter, check those workunits to see whether you have returned results that are very strange and different from your wingman's. Also check the invalid category of your tasks page to see if there are any there.
30 min for a CUDA WU seems too much, unless you have a lower-end card.
riofl:
--- Quote from: sunu on 16 Aug 2009, 12:25:06 pm ---
Unless you have a faulty card (if it is CUDA) or CPU/RAM (if it is a CPU workunit), result overflows are pretty much "normal" and they have nothing to do with your memory/storage allocations.
Check why your pending credit has increased. Is it genuine "waiting for validation", or suspicious "validation inconclusive"? If it is the latter, check those workunits to see whether you have returned results that are very strange and different from your wingman's. Also check the invalid category of your tasks page to see if there are any there.
30 min for a CUDA WU seems too much, unless you have a lower-end card.
--- End quote ---
There were only 2 or maybe 3 overflow errors in the 8 or 9 pages I looked through. There are a lot waiting for validation or inconclusive, but the majority are processed and validated. Device 0 is a GTX 285, an XFX overclocked Black Edition (127 GFLOPS according to BOINC), and device 1 is a pre-release Tesla C1060, which has 1 GB of RAM instead of the 4 GB of the production card and slightly slower clock speeds (74 GFLOPS according to BOINC).
It seems that the workunits are very large. BOINC is showing a time to completion of about 2:40 for those waiting to be processed in the current cache, including the CUDA workunits. Apart from new downloads, the only changes made since the days of 13 min CUDA workunits and 50 min to 1.5 hr CPU workunits have been the BOINC upgrade from 6.6.11 and the change in system RAM on Monday. Both GPUs are running at a satisfactory 62-67 °C under load, and the CPUs under load between 52 and 59 °C with averages around 55 °C, so it's all running cool enough. The BIOS diagnostics show nothing wrong, so my only guess is the kind of workunits I am getting now.
glxgears is showing around 10k FPS, which is where the GTX 285 has run since I first got it, and nvidia-settings shows both cards running at their maximum performance level, although the wording changed for the GTX: it used to say "maximum performance", now it says "desktop", but the numbers are still the same. I suspect a change of driver version altered the wording; I have been running the same driver for weeks now.
Also, desktop performance is as good as it always was, so I am at a loss to explain the sudden 30 min CUDA processing times unless it is the workunits supplied. The script is making sure no VLAR/VHAR units are fed to the CUDA devices. In fact, lately the CPU workunits have been nothing but VLAR/VHAR units, with whatever normal ones they may have been assigned being moved to CUDA.
My pending credit has always been around 40k, but it has jumped to 80k, recently I guess. I cannot say for sure because I rarely check it, so there may have been a month or two between those numbers.
I know my RAC drops when I have BOINC shut down for several hours, and that is normal, but over the past week I have now lost nearly 500 points of average on this one machine. I wonder if running that script, which stops and restarts BOINC with an 8-second delay 3 times an hour, may be causing the drop?
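A quick back-of-the-envelope check of the restart overhead asked about here, as a minimal Python sketch. It assumes the only time lost per restart is the 8-second delay; the helper names and the 60-second rollback figure are hypothetical, and real losses also depend on how far each running task has to fall back to its last checkpoint, which this does not measure.

```python
# Rough estimate of the wall-clock time BOINC spends stopped because of the
# restart script. Assumption: only the 8-second delay is lost per restart;
# task start-up time and work redone since the last checkpoint are NOT
# counted unless passed in explicitly as lost_work_s.

RESTARTS_PER_HOUR = 3      # from the post: 3 restarts per hour
DELAY_SECONDS = 8          # from the post: 8-second delay per restart
SECONDS_PER_HOUR = 3600


def idle_fraction(restarts_per_hour: int, delay_s: float) -> float:
    """Fraction of wall-clock time with BOINC stopped (delay only)."""
    return restarts_per_hour * delay_s / SECONDS_PER_HOUR


def idle_fraction_with_rollback(restarts_per_hour: int, delay_s: float,
                                lost_work_s: float) -> float:
    """Same estimate plus a hypothetical amount of work redone per restart
    (e.g. progress since the last checkpoint). lost_work_s is an assumption."""
    return restarts_per_hour * (delay_s + lost_work_s) / SECONDS_PER_HOUR


if __name__ == "__main__":
    f = idle_fraction(RESTARTS_PER_HOUR, DELAY_SECONDS)
    print(f"delay only: {f:.2%} of wall-clock time")
    # Hypothetical: 60 s of work redone after each restart.
    g = idle_fraction_with_rollback(RESTARTS_PER_HOUR, DELAY_SECONDS, 60)
    print(f"delay + 60 s rollback (hypothetical): {g:.2%}")
```

With the delay alone this comes to 24 s per hour, well under 1% of wall-clock time, so if the script does account for a noticeable drop, the likelier cost would be work redone after each restart rather than the delay itself (an assumption, not measured here).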
The only other thing that may be affecting it is the ambient temperature of the room, which has been considerably higher this week, raising the temperature inside the case. The GTX ambient is running around 55 °C now, while in previous weeks it ran closer to 48 °C, but none of this is anywhere close to the limits that would make any kind of power/speed control kick in to cool things down.
What is interesting: I just thought of looking at the BOINC CPU benchmarks, which I largely ignore, so I ran them a few seconds ago. The floating-point result is within the range it has always been in, but the integer benchmark is just under 4k higher than normal! That may be the new system RAM configuration affecting it, though.
Weirdness abounds... :)
sunu:
riofl, give me a link to your host.
A self-compiled BOINC also gave me increased benchmarks. They don't have any real importance, though.