SETI MB CUDA for Linux
Raistmer:
--- Quote from: sunu on 12 Jul 2009, 04:31:18 pm ---I have a GTX280 and he has a GTX295; under CUDA, each GTX295 core performs more or less the same as a GTX280. His saying that he is not running X makes the error message much stranger.
--- End quote ---
From that we can probably conclude that this error is not connected with watchdog timer expiration. Some other reason...
b0b3r:
--- Quote from: sunu on 12 Jul 2009, 04:18:26 pm ---
--- Quote from: b0b3r on 12 Jul 2009, 04:05:32 pm ---6.4.5 is currently marked as stable on linux that's why I'm using it
--- End quote ---
Disregard that and get a newer version.
...
--- End quote ---
As you advised, I did some tests with newer versions of BOINC and got strange results:
- With 6.6.36, tasks don't run; they hang with "Waiting" status. So I enabled all the debug flags in cc_config, but it gave me no answer as to what they are waiting for.
- With 6.6.20, tasks run, but both of them on a single GPU?
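(For reference, "all the debug flags in cc_config" means a cc_config.xml along these lines — a sketch using the BOINC 6.x log flags most relevant to stuck GPU tasks; place it in the BOINC data directory and restart the client:)

```xml
<cc_config>
  <log_flags>
    <!-- log task start/finish events -->
    <task>1</task>
    <!-- trace CPU scheduler decisions: why a task is or is not run -->
    <cpu_sched_debug>1</cpu_sched_debug>
    <!-- trace coprocessor (CUDA) assignment to tasks -->
    <coproc_debug>1</coproc_debug>
    <!-- trace scheduler RPCs to the project server -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```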
When I go back to 6.4.5, tasks run on both GPUs, but with errors. I should point out that the errors happen only on one GPU. So I thought it might be a hardware problem and decided to do some tests with cudamemtester. It ran for a few hours with no errors, but with the default settings it does not run the long-term changes-detection test. When I started that test, it showed a lot of errors on this single GPU (the second has no errors). With no good diagnostic tools available on Linux (nvclock has poor support for G200 chips), I decided to install Windows on this box.
Up to this point I have observed some interesting behaviour. In idle mode both GPUs and the RAM have lower clocks and voltage (0.975V); when they get loaded the clocks go up and the voltage of the first GPU rises to 1.035V, but the second one stays at 0.975V. I don't know how accurate GPU-Z's measurements are, but that is what it shows.
For the next few weeks I want to do some testing with the stock SETI CUDA application, first at the default voltage and then with it manually set to 1.035V, and then I'll write some feedback.
sunu:
--- Quote from: b0b3r on 13 Jul 2009, 06:03:10 pm ---- With 6.6.36, tasks don't run; they hang with "Waiting" status. So I enabled all the debug flags in cc_config, but it gave me no answer as to what they are waiting for.
--- End quote ---
6.6.36 has an option, checked by default in the preferences, not to run CUDA while the PC is in use. Uncheck it and you should be OK.
As to the other stuff you mention, interesting. What brand/model are your video cards and motherboard?
b0b3r:
MB is Asus M2N-VM DVI and VGA is EVGA GTX 295 with Backplate.
riofl:
--- Quote from: sunu on 04 Jul 2009, 03:00:48 pm ---Follow all steps (1-4) below:
1) Use a newer boinc version. The latest is 6.6.36, http://boinc.berkeley.edu/download_all.php . I haven't checked it, I use 6.6.20, direct download link http://boinc.berkeley.edu/dl/boinc_6.6.20_x86_64-pc-linux-gnu.sh
2) Make sure all the appropriate CUDA libs from the 2.2 toolkit:
libcudart.so
libcudart.so.2
libcudart.so.2.2
libcufft.so
libcufft.so.2
libcufft.so.2.2
are in the projects/setiathome.berkeley.edu directory.
3) Edit your ld.so.conf (or your distro's corresponding ld-something file) accordingly, adding the above location of the CUDA libs.
4) Place a copy of the cuda client in one of the following locations:
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/usr/games
--- End quote ---
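(Steps 2-4 above can be scripted roughly as follows — a sketch, not a definitive recipe; the CUDA_LIB and BOINC_DIR paths are assumptions to adjust for your install, and the ld.so.conf.d approach applies to distros that support that directory:)

```shell
#!/bin/sh
# Assumed locations -- adjust for your system
CUDA_LIB=/usr/local/cuda/lib64
BOINC_DIR=/home/boinc/BOINC
PROJECT_DIR="$BOINC_DIR/projects/setiathome.berkeley.edu"

# Step 2: copy the CUDA 2.2 runtime and FFT libs into the project directory
cp "$CUDA_LIB"/libcudart.so* "$CUDA_LIB"/libcufft.so* "$PROJECT_DIR"/

# Step 3: tell the dynamic linker where the libs live, then refresh its cache
echo "$PROJECT_DIR" > /etc/ld.so.conf.d/cuda-seti.conf
ldconfig

# Step 4: place a copy of the cuda client in one of the searched locations
cp "$PROJECT_DIR"/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu /usr/local/bin/
```

Steps 3 and 4 need root; run the script with sudo.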
I have done so and it cured my original problems nicely. However, I have since added a usable card (a Tesla) and now I get some device mix-ups from BOINC. I have posted this message in the BOINC Linux forum and on Crunch3r's forum, but so far no replies, so I am hoping someone here may have an answer. It appears that BOINC's device recognition/usage system is borked. I don't mention it in the message below, but I do have the CUDA 2.2 SDK. Here is the message I posted:
I have a weird problem. The system works fine with just my video card, or even with just the Tesla when I tell BOINC to use only one card (because device 1 is the last one on the command line). But it falls apart when trying to use both.
My setup is as follows:
1. Linux x86_64 running on a q6600 intel system
2. Video card is GTX 285 in first pci-e slot
3. Tesla C1060 is installed in 2nd pci-e slot
4. Boinc version is 6.6.36
5. Nvidia driver is 185.18.14
6. My number_of_gpus is set to 2. I had it at 1 and it made no difference in the behaviour below.
7. I have use_all_gpus set to 1, assuming it takes a true/false value.
8. I have this statement in my app_info.xml:
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
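(Worth noting: as far as I understand BOINC's scheduler, <count> inside <coproc> is the number of GPUs each task instance uses, not the total number of GPUs in the box, so 1 should be correct for one task per card. A minimal app_version block along those lines would look roughly like this — the file name is taken from the process list below, while the app name and version number 608 are my assumptions based on the 6.08 binary:)

```xml
<app_info>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <coproc>
      <type>CUDA</type>
      <!-- GPUs used per task, not total GPUs in the machine -->
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```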
When I start BOINC, it reports two Tesla cards instead of the proper ones. Older BOINC versions identify both cards properly. If this were just a naming problem I could live with it, but...
With the above coproc count set to 1, when I do a ps ax to look at my process list, this is what I see:
7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
and it uses the GTX285 for both simultaneously!
When I have the coproc count set to 2, it uses only the Tesla and runs only one process. It gets both device numbers, but the GTX 285 is not used:
10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1
How can I get this to do the right thing and provide me with processes like these using both cards?
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 1
How can I fix this? I know others are using two cards successfully. I am pulling my hair out trying to get this working.