Author Topic: SETI MB CUDA for Linux  (Read 503908 times)

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: SETI MB CUDA for Linux
« Reply #555 on: 17 Jan 2010, 07:03:09 pm »
Page One of this thread? And I think there might be one or two others spread about on different pages in this thread.

Claggy

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #556 on: 17 Jan 2010, 07:22:28 pm »
page 1 points to the one i have, other than the vlar killer. i didn't keep the url when i got the vlar killer app, and am just looking for something newer than jan 2009. the vlarkill was july 2009, and since it was labeled -2.2- i'm assuming there is a non-vlarkill app that was also updated? searches on this site or on google show nothing. it's too hard finding updates; it's like people want to hide them to make a game out of it. i'll just stick with what i'm running.

thanks!


Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #557 on: 17 Jan 2010, 07:41:17 pm »
> is there a link for a setiathome 6.08 app that's not a vlar killer, for cuda 2.2+?

No, because there is no such app.

> all i can find is a 64bit app dated january 2009. anything newer?

No. If you want a non-VLAR-kill app, that's the one to get. It's only a tiny bit slower than the 2.2, so you won't be missing much.
« Last Edit: 17 Jan 2010, 07:44:51 pm by sunu »

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #558 on: 18 Jan 2010, 04:32:32 am »
ahh ok, thanks... it will probably make up the difference by completing more units, since it won't reject any. i finally got to look at some of the error units, and the vlar killer got to them even with the script. the script considers 0.13 to be a vlar angle; i had changed that somewhat to give the cpu some work units that didn't meet that criterion. i had mine set to call anything under 0.30 a vlar and move it to the cpu, but the vlar killer killed some work units with angles of approx. 0.49, so i decided to revert back to the non-killer app.
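
(for anyone curious, the cutoff test such a rescheduling script applies boils down to something like the sketch below, written in C++ just for illustration; route_to_cpu and the 0.30 default are made-up names and values, not the actual script:)

// Hypothetical sketch of the AR cutoff test a rescheduling script applies.
// route_to_cpu and the 0.30 default are illustrations, not the real script.
#include <iostream>

// Workunits below the cutoff angle range are treated as VLAR-like and
// handed to the CPU app; everything else stays on the GPU.
bool route_to_cpu(double true_angle_range, double cutoff = 0.30) {
    return true_angle_range < cutoff;
}

int main() {
    const double sample_ars[] = {0.008, 0.13, 0.25, 0.44, 10.4};
    for (double ar : sample_ars)
        std::cout << "AR " << ar << " -> "
                  << (route_to_cpu(ar) ? "CPU" : "GPU") << "\n";
    return 0;
}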

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #559 on: 18 Jan 2010, 07:43:21 am »
hm... what a weird "vlar killer" you have??? It's mid-range ARs where the GPU is most effective!
Check your results again; that was probably a different kind of error.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #560 on: 18 Jan 2010, 08:13:34 am »
Yes, I agree with Raistmer, that must have been something different. I don't know exactly which ARs trigger the VLAR kill, but I've never seen it kill anything larger than 0.20.

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #561 on: 18 Jan 2010, 01:04:38 pm »
the error said something to the effect of "VLAR killed, angle 0.49".

i thought it was a bit high, but there were several work units with the same range of angles that were killed by it. the non-killer app is working perfectly. i am down to using 0.25 as the cutoff angle, and it's just purring right along even with the desktop features enabled that i like.

i was getting ready to go out and get an ati card for my video and use the nvidia cards for cuda only, since i thought the gpu resources i was using for the desktop were causing problems with the cuda apps (desktop cube, shading and transparency features etc.; experimenting with those just to see what 'glitter' was like :P ). especially since things started working again when i turned those features off. but with this non-killer app i can keep everything enabled and it all lives together well.

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #562 on: 18 Jan 2010, 01:14:53 pm »
ok, i went back and looked at them again. i was not very awake the last time i looked. it was another error, an fft.cu error.... however, i don't get that with this app, unless it was just those specific workunits.

here are a few. there were maybe 30 workunits that errored out; i looked at 10 of them just now and all had the same error except this first one.


Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
setiathome_enhanced 6.01 Revision: 402 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.7.0

Work Unit Info:
...............
WU true angle range is :  10.416071
SETI@home error -12 Unknown error
cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel
File: ./cudaAcc_pulsefind.cu
Line: 232

--------------------------------------------

setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
setiathome_enhanced 6.01 Revision: 402 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.7.0

Work Unit Info:
...............
WU true angle range is :  0.437965
CUFFT error in file './cudaAcc_fft.cu' in line 62.



-----------------------------------------

setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
setiathome_enhanced 6.01 Revision: 402 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.7.0

Work Unit Info:
...............
WU true angle range is :  0.407435
CUFFT error in file './cudaAcc_fft.cu' in line 62.

------------------------------------------


setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
setiathome_enhanced 6.01 Revision: 402 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.7.0

Work Unit Info:
...............
WU true angle range is :  0.437965
CUFFT error in file './cudaAcc_fft.cu' in line 62.
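
(for reference: messages of the form "CUFFT error in file ... in line ..." come from an error-check wrapper around each cufft call. a minimal sketch of that pattern in CUDA C++, a guess at the general shape rather than the actual cudaAcc_fft.cu source:)

// Sketch of the error-check pattern that produces messages like
// "CUFFT error in file './cudaAcc_fft.cu' in line 62." This is a guess
// at the general shape, not the actual SETI source.
#include <cstdio>
#include <cstdlib>
#include <cufft.h>

#define CUFFT_SAFE_CALL(call)                                         \
    do {                                                              \
        cufftResult err = (call);                                     \
        if (err != CUFFT_SUCCESS) {                                   \
            fprintf(stderr, "CUFFT error in file '%s' in line %d.\n", \
                    __FILE__, __LINE__);                              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    cufftHandle plan;
    // A failing plan (e.g. the GPU is out of memory) trips the macro here.
    CUFFT_SAFE_CALL(cufftPlan1d(&plan, 64 * 1024, CUFFT_C2C, 8));
    cufftDestroy(plan);
    return 0;
}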



Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #563 on: 18 Jan 2010, 07:15:03 pm »
riofl, is the computer 4166601 ( http://setiathome.berkeley.edu/show_host_detail.php?hostid=4166601 ) yours?

The cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel error is a "normal" one; there's nothing to be read into it.

The errors from that computer's page are interesting. They occur right after the "preparatory" phase on the CPU, just when the GPU was supposed to take over. I've checked a few, and all seem to happen on your "good" GTX285 card and not on the problematic Tesla card. Am I right?

If I remember correctly you were experiencing unusually high run times on your GPUs; does that still happen?

There is definitely something not right with the setup of this computer.

I think I've asked you before and you told me the brand of your motherboard; can you remind me?

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #564 on: 19 Jan 2010, 12:16:26 pm »
Quote from: sunu on 18 Jan 2010, 07:15:03 pm
[quoted in full above]


yes, that is the computer. the tesla is problematic in that it simply locks up randomly; there is a hardware problem with it. restarting boinc cures it for some time. vidram test software shows a bad ram chip around the 700mb mark. i think my workstation is using more resources than i realized, and the gtx285 is simply overwhelmed, without enough resources left for seti, if i have the kde options enabled.

my times now are averaging 16-18 min off the tesla and 19-22 min off the gtx285, much better than previously at around 30 min. my scores have finally climbed to near 15k like you said they should be.

the computing errors were happening just as the gpu was supposed to take over. that was when i had all the 'cute' features of kde4 enabled, which included dimming of unfocused windows, cube desktop switching and several other things including sharpen desktop (all experimental, to see what it was like to use a workstation with glitz enabled). i also use dual 24" monitors, each at 1920x1200 using the nvidia twinview option, so i am sure that takes up a bit of vid resources as well. i also use different backgrounds on each of 9 desktops, the same image loaded on each monitor/desktop.

once i disabled the glitz and glitter options, did a power-down restart to allow everything to clear, and changed back to the older non-vlar-killer app, all the errors stopped.

the system is an intel q6600 quad processor overclocked to 3.0ghz using a 9 multiplier and 333mhz bus. the OCZ ram is adjusted to its stock frequency of ddr2-1066; the factory-recommended ram timings were adjusted slightly from 5-5-5-18 to 5-5-5-15, and cpu and ram voltages are at stock factory recommendations. instead of auto, the pci-e bus speed is locked at 100mhz, since the gigabyte board in full auto mode tends to adjust everything as it wants, which could be dangerous.

the motherboard is a gigabyte GA-P35-DS4 rev 2.1.

things have been stable for the past 20 hours or so, since i readjusted everything back to the standard dull desktop :)


« Last Edit: 19 Jan 2010, 12:21:15 pm by riofl »

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: SETI MB CUDA for Linux
« Reply #565 on: 19 Jan 2010, 01:23:26 pm »
This may be something we ought to remember when writing stock boilerplate answers to message board grumbles. It's already fairly standard to say "don't expect your graphics card to draw a screensaver while crunching CUDA". I think it's also sometimes mentioned, though perhaps less often than it should be, that the "Aero" effects in Vista, and whatever they call the equivalent in Windows 7, eat up VRAM - much more so, in my opinion, than the simple frame buffer for the final output, which is small at any resolution (a 32-bit, double-buffered 1920 x 1200 display comes to only about 18MB). I've never seen any problem with the 1600 x 1200 screens I use here, even on 512MB CUDA cards.

I've also started to see reports from users of Mac OS X, who have just gained the ability to run Einstein on CUDA - or not, if they only have 512MB. One poster attributed the loss of 125MB of available memory (512MB --> 387MB) to OS effects alone.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #566 on: 19 Jan 2010, 01:34:52 pm »
> i think my workstation is using more resources than i realized, and the gtx285 is simply overwhelmed, without enough resources left for seti, if i have the kde options enabled.
> ...
> the computing errors were happening just as the gpu was supposed to take over. that was when i had all the 'cute' features of kde4 enabled, which included dimming of unfocused windows, cube desktop switching and several other things including sharpen desktop (all experimental, to see what it was like to use a workstation with glitz enabled). i also use dual 24" monitors, each at 1920x1200 using the nvidia twinview option, so i am sure that takes up a bit of vid resources as well. i also use different backgrounds on each of 9 desktops, the same image loaded on each monitor/desktop.

With 1 GB of video RAM I think you should be safe. Still, even if it had no memory left, it should throw an out-of-memory error message and switch to CPU computing, not error out completely.

> my times now are averaging 16-18 min off the tesla and 19-22 min off the gtx285, much better than previously at around 30 min. my scores have finally climbed to near 15k like you said they should be.

Maybe you could do better, 20000+ RAC  ;)

> once i disabled the glitz and glitter options, did a power-down restart to allow everything to clear, and changed back to the older non-vlar-killer app, all the errors stopped.
> ...
> things have been stable for the past 20 hours or so :)

If you were looking at your errors page and didn't see new errors, that's because it hasn't been updated since 17 January, not because there weren't any.

> cpu and ram voltages are at stock factory recommendations.

Maybe this isn't enough? What are their values? For the CPU voltage, don't look at the BIOS; see the real value with 100% CPU utilization under seti.

> since i readjusted everything back to the standard dull desktop :)

Personally I don't like kde's effects now that I've seen them in sidux, and I've switched them off too. Compiz effects are way better, I think.

> I've also started to see reports from users of Mac OS X, who have just gained the ability to run Einstein on CUDA - or not, if they only have 512MB. One poster attributed the loss of 125MB of available memory (512MB --> 387MB) to OS effects alone.

1GB of video RAM might not seem excessive any more, but the bare minimum?  ::)
« Last Edit: 19 Jan 2010, 01:37:03 pm by sunu »

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #567 on: 19 Jan 2010, 05:56:57 pm »
out of memory error: true. it's just weird they errored out like they did. maybe the gpu did not have enough shader resources when the desktop options were enabled. i don't know when shader threads get used, but i'm assuming the dimming options and the cube rotation / transparency options use the shaders.

yeah, some of the options aren't bad. i like a *slight* fade of non-focused windows; it makes it easier to pay attention to the one in focus. but the rest... the cube stuff started making me dizzy :)

i have no idea how much vidram i'm actually using, but i'm sure it's quite a bit. 9x2 backgrounds of 1 to 3mb each would use up a good 30mb of vid memory, if that's where they're kept. i really don't know how the desktop interfaces with the vid cards; i will have to check some of the utilities to see if they show it, or find one on the net.
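
(if the utilities don't show it, a few lines against the CUDA runtime will. a minimal sketch using cudaMemGetInfo, a standard runtime API call; build it with nvcc:)

// Minimal sketch: report free vs. total video memory on each CUDA device
// via the runtime API's cudaMemGetInfo. Build with: nvcc meminfo.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        size_t free_bytes = 0, total_bytes = 0;
        if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess)
            continue;  // skip devices we cannot query
        printf("Device %d: %zu MB free of %zu MB total\n",
               dev, free_bytes >> 20, total_bytes >> 20);
    }
    return 0;
}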

20k+ rac huh? might be pushing this puppy a little bit

i have only used the voltages in the bios. under load i have nothing that reads them properly; for some reason lm-sensors and gkrellm report the voltage sensors in error. for example... 2.85v for the 12v line? nope, nada. the only voltage readouts that make any sense are the ram and some cpu voltages, but i am guessing at those since they are only labelled in1, in2, in3, in4 etc. the only thing i know for sure is correct is in1 as the ram voltage; it matches what the bios says. same for the fans and temps: temp1 is the mosfets and temp2 is the southbridge. i discovered that with a hair dryer against the chips, and discovered which fanX belonged to which fan by unplugging the fan to see which one dropped to 0.


to give you an idea, here is the sensors output:

Adapter: ISA adapter
in0:         +1.22 V  (min =  +0.00 V, max =  +4.08 V)
in1:         +1.89 V  (min =  +0.00 V, max =  +4.08 V)
in2:         +3.22 V  (min =  +0.00 V, max =  +4.08 V)
in3:         +2.94 V  (min =  +0.00 V, max =  +4.08 V)
in4:         +1.84 V  (min =  +0.00 V, max =  +4.08 V)
in5:         +0.08 V  (min =  +0.00 V, max =  +4.08 V)
in6:         +1.02 V  (min =  +0.00 V, max =  +4.08 V)
in7:         +2.93 V  (min =  +0.00 V, max =  +4.08 V)
in8:         +3.30 V
fan1:       2360 RPM  (min =    0 RPM)
fan2:       2102 RPM  (min =    0 RPM)
fan3:       1406 RPM  (min =    0 RPM)
fan4:       1415 RPM  (min =    0 RPM)
temp1:       +41.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:       +44.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
temp3:        -2.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
cpu0_vid:   +1.219 V


i am going to have to find a reliable reporting tool to make sure of what they really are.
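
(in the meantime, the raw values can be read straight out of sysfs. a small sketch; the hwmon path, and which inN is which, are board- and driver-specific assumptions to adjust:)

// Sketch: read one lm-sensors voltage channel through the sysfs hwmon
// interface. The path below is an assumption; hwmon numbering and the
// meaning of each inN vary by motherboard and driver.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // inN_input files report the voltage in millivolts.
    const std::string path = "/sys/class/hwmon/hwmon0/device/in0_input";
    std::ifstream in(path);
    long millivolts = 0;
    if (!(in >> millivolts)) {
        std::cerr << "could not read " << path << "\n";
        return 1;
    }
    std::cout << "in0 = " << millivolts / 1000.0 << " V\n";
    return 0;
}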

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #568 on: 19 Jan 2010, 07:43:16 pm »
> 20k+ rac huh? might be pushing this puppy a little bit

Why not?  :)

> i have only used the voltages in the bios. under load i have nothing that reads them properly; for some reason lm-sensors and gkrellm report the voltage sensors in error. for example... 2.85v for the 12v line? nope, nada. the only voltage readouts that make any sense are the ram and some cpu voltages, but i am guessing at those since they are only labelled in1, in2, in3, in4 etc. the only thing i know for sure is correct is in1 as the ram voltage; it matches what the bios says. same for the fans and temps: temp1 is the mosfets and temp2 is the southbridge. i discovered that with a hair dryer against the chips, and discovered which fanX belonged to which fan by unplugging the fan to see which one dropped to 0.

I agree that lm-sensors reports most of the voltages incorrectly; with the rest I'll have to disagree.
There are no RAM voltages in those values; in fact, I don't think there is a utility that can show them, windows, linux or whatever.
temp1 the mosfets? Maybe, if you watercool them. Mosfets run hot, really hot, 100+ °C.

From the values you posted, the two that resemble your CPU voltage (vcore) are in0 (1.22) and cpu0_vid (1.219). "cpu0_vid" is a very interesting name. The VID is something like a default voltage for the chip: the lower it is, the more overclockable the chip is. A Q6600 with a VID of 1.219 would be very, very good. My Q6600, with a VID of 1.2750 (average for this chip), has easily gone to 3.24 GHz.

The thing is, I don't think lm-sensors can show the VID of a chip, and it is just the vcore with a fancy name. Now, if we assume that vcore = in0 = cpu0_vid = 1.22, I think that is a little low for 3 GHz. Maybe try 1.24-1.25 volts. Still, it depends on the VID of the chip; if the VID really is 1.219, then 1.22 is not necessarily bad. Then again, I don't think the VID can take such a value (1.219); it goes in fixed increments.

Is this machine dual boot with windows by any chance?
« Last Edit: 19 Jan 2010, 07:55:19 pm by sunu »

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #569 on: 19 Jan 2010, 11:08:56 pm »
Quote from: sunu on 19 Jan 2010, 07:43:16 pm
[quoted in full above]

well, the mobo has heatpipe cooling for the mosfets and the north and south bridges. when i messed with the hair dryer i aimed it at each of the heatsinks; the two ends, mosfets and southbridge, are where i got the most sensitive individual temp variations. aiming at the northbridge (the middle heatsink on the pipe) gave me hardly any variation at all, and it was mostly even changes between the 2 temps.

here is the image gallery. temps 1 and 2 are most sensitive to changes on the outer two heatsinks, which the manual says are the mosfets on the left and the southbridge on the right in the board-top image.

http://www.newegg.com/Product/ImageGallery.aspx?CurImage=13-128-064-S01&SCList=13-128-064-S01%2c13-128-064-S02%2c13-128-064-S03%2c13-128-064-S04%2c13-128-064-S05%2c13-128-064-S06%2c13-128-064-S07&S7ImageFlag=2&Item=N82E16813128064&Depa=0&WaterMark=1&Description=GIGABYTE%20GA-P35-DS4%20Rev.%202.0%20LGA%20775%20Intel%20P35%20ATX%20Ultra%20Durable%20II%20Intel%20Motherboard


i only know the processor is a "G0" chip, which is the best-overclocking stepping of that model; the other one is a "B" something. i can check the bios readings for the cpu voltages to see what they say. this voltage will probably be displayed in ht as well.

no, i don't have dual boot; i am exclusively linux. i do have xp in virtualbox so i can guide people to configs they need to change for my job, but other than that it sits there crunching numbers. when i first got the gtx285 i did hook an ide drive into my system and installed windows on it so i could try out riva tuner and evga precision. i wanted to change the calibration of the auto fan setups on both cards so they would be more aggressive, but found i could not do that within the bios itself, only by setting things with the driver. since i have no counterpart to do so in linux, i just use nvclock to set the fans at 100% all the time. so anyway, if i have to, i can boot from that ide hdd. when i do, i always protect every other drive in my system by unplugging them :) i trust windows about as far as i would actually use it for my desktop, which is a very solid NEVER. :P

of course, booting from windows will not get my system running at 100% load, since boinc will not be installed. in fact, the windows installation on that ide drive does not even have networking capabilities. i made sure it was stuck to the hard drive only, and anything added has to be done by cd. i will not give raw windows a chance to touch anything on my network; in my opinion it explores networks too much, trying to know more than it needs to about a lan. my virtualbox windows xp has a tunnel directly to the 'outside' and has no permission to touch anything i have not specified on my system or my network, which is nothing. it has to use outside dns, time, and anything else it needs, since it cannot see my machine or my lan.

are there programs i would have to install on the windows drive? if so, i would have to download them with linux and burn a cd for them.

« Last Edit: 19 Jan 2010, 11:11:17 pm by riofl »

 
