Author Topic: SETI MB CUDA for Linux  (Read 503677 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #210 on: 12 Jul 2009, 03:38:19 pm »
Raistmer do you mean something like this? From cuda 2.2 release notes:

o Individual GPU program launches are limited to a run time
  of less than 5 seconds on a GPU with a display attached.
  Exceeding this time limit causes a launch failure reported
  through the CUDA driver or the CUDA runtime. GPUs without
  a display attached are not subject to the 5 second run time
  restriction. For this reason it is recommended that CUDA is
  run on a GPU that is NOT attached to an X display.

So yes, it also exists in linux.
Exactly. Timer value varies between OSes but it's the same thing.
Quote
Curiously, I've crunched tens of thousands of workunits with my GPU that also runs X without ever seeing that kind of error.
Well, maybe you have a faster GPU than the user who has the issues?...

Tye

  • Guest
Re: SETI MB CUDA for Linux
« Reply #211 on: 12 Jul 2009, 03:50:52 pm »
A GPU that is used by Windows for video output will be subject to a 2 or 3 second timeout, but a second GPU will not.
I don't know if this is relevant to Linux though.

Well, if it is because of the first gpu also drawing the screen then it will probably also exist in linux. We don't have a big sample of seti cuda users with multi gpus in linux. Actually the sample is non-existent  :D

What Tye describes might be some faulty config, strange driver behavior, or some weird motherboard-gpu-gpu hardware incompatibility.

Looking at it a bit closer, it turns out that one 680i motherboard (an ASUS 680i MB) can work with the problem card in the primary slot and the other brand/model (an XFX 680i) cannot, so I think you're right and it's a weird motherboard-gpu issue (it is even crashy with one of those gpus in the primary slot - but put a different one in the primary and move it to the secondary and it's fine).  I may look and see if there's a newer BIOS later, but since all three GPUs are stable and in different machines CUDA'ing away with the 185 drivers, I'll probably take a break from messing with them for a while.  ;)  Plus I don't have any unused CUDA GPUs on hand to test with.

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #212 on: 12 Jul 2009, 04:05:32 pm »
To clarify:
- this host has 3 gpus.
- one nvidia 630 in the chipset and two on the 295 card.
- the display is on the 630 in the chipset.
- the 295 card is dedicated to cuda.
- also this machine is not a workstation; there is no xorg, only a text console, and I mostly work through ssh.

Your system sees three devices.

In your host 5018683, boinc doesn't even see your graphics cards. Are you sure that you have installed them correctly?

Also in both of your hosts upgrade boinc. 6.4.5 is too old.

I am actually doing some tests, so that's why the cuda devices disappeared.
6.4.5 is currently marked as stable on linux that's why I'm using it and like you see there is no problem with second host.

The strange thing is that both hosts are exactly the same machine. The only difference is that host number 2 is newer (meaning it was built with the same components but about a week later) and got some new workunits that the older one has not tried yet.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #213 on: 12 Jul 2009, 04:18:26 pm »
6.4.5 is currently marked as stable on linux that's why I'm using it

Disregard that and get a newer version.

and like you see there is no problem with second host.

It has problems, those "unspecified launch failure" errors.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #214 on: 12 Jul 2009, 04:31:18 pm »
Well, maybe you have a faster GPU than the user who has the issues?...
I have a GTX280 and he has a GTX295; each GTX295 core is more or less the same as a GTX280 under cuda. Him saying that he is not running X makes the error message much more strange.

I may look and see if there's a newer BIOS later

I would do the same.
« Last Edit: 12 Jul 2009, 06:22:45 pm by sunu »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #215 on: 12 Jul 2009, 06:34:56 pm »
I have a GTX280 and he has a GTX295; each GTX295 core is more or less the same as a GTX280 under cuda. Him saying that he is not running X makes the error message much more strange.
We could probably conclude from that that this error is not connected with watchdog timer expiration. Some other reason...

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #216 on: 13 Jul 2009, 06:03:10 pm »
6.4.5 is currently marked as stable on linux that's why I'm using it
Disregard that and get a newer version.
...

As you advised, I did some tests with newer versions of boinc and got strange results:
- with 6.6.36 tasks don't run, they hang with "Waiting" status. So I enabled all debug flags in cc_config, but it gave me no answer as to what it is waiting for.
- with 6.6.20 tasks run, but both of them on a single gpu?

When I go back to 6.4.5, tasks run on both gpus but with errors. However, I noticed that the errors happen on only one gpu. So I thought maybe it is a hardware problem and decided to do some tests with cudamemtester. It ran for a few hours with no errors, but with default settings it does not start the test for long time changes detection. I started this test and it showed a lot of errors on this single gpu (the second one had no errors). With no good tools available for further diagnosis in linux (nvclock has poor support for g200 chips), I decided to install windows on this box.

So far I have observed interesting behaviour. In idle mode both gpus and the ram have lower clocks and voltage (0.975V); when it gets loaded the clocks go up and the voltage of the first gpu goes to 1.035V, but the second one stays at 0.975V. I don't know how accurate the measurement in gpu-z is, but that is what it shows.

For the next few weeks I want to do some testing with the stock seti-cuda application, both with the default voltage and with it manually set to 1.035V, and then I'll write some feedback.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #217 on: 13 Jul 2009, 06:56:35 pm »
- with 6.6.36 tasks don't run, they hang with "Waiting" status. So I enabled all debug flags in cc_config, but it gave me no answer as to what it is waiting for.

6.6.36 has an option checked by default in preferences to not run cuda while the pc is in use. Uncheck it and you should be ok.

As to the other stuff you mention, interesting. What brand/model are your video cards and motherboard?
« Last Edit: 13 Jul 2009, 06:58:36 pm by sunu »

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #218 on: 14 Jul 2009, 04:35:00 am »
MB is Asus M2N-VM DVI and VGA is EVGA GTX 295 with Backplate.

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #219 on: 14 Jul 2009, 08:48:58 am »
Follow all steps (1-4) below:

1)  Use a newer boinc version. The latest is 6.6.36, http://boinc.berkeley.edu/download_all.php . I haven't checked it, I use 6.6.20, direct download link http://boinc.berkeley.edu/dl/boinc_6.6.20_x86_64-pc-linux-gnu.sh
2)  Make sure all the appropriate cuda libs from 2.2 toolkit

libcudart.so
libcudart.so.2
libcudart.so.2.2
libcufft.so
libcufft.so.2
libcufft.so.2.2

are in the projects/setiathome.berkeley.edu directory.

3)  Edit accordingly your ld.so.conf or the corresponding ld-something file of your distro with the above location of the cuda libs.

4)  Place a copy of the cuda client in one of the following locations:

/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/usr/games
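
Step 2 above can be checked quickly before restarting boinc. The following is only a minimal Python sketch: the function name `missing_cuda_libs` and the example path are made up for illustration, and only the library file names come from the list in step 2.

```python
from pathlib import Path

# File names taken from step 2 above (CUDA 2.2 toolkit).
REQUIRED_LIBS = [
    "libcudart.so",
    "libcudart.so.2",
    "libcudart.so.2.2",
    "libcufft.so",
    "libcufft.so.2",
    "libcufft.so.2.2",
]


def missing_cuda_libs(project_dir):
    """Return the required library names that are not present in project_dir."""
    directory = Path(project_dir)
    return [name for name in REQUIRED_LIBS if not (directory / name).exists()]


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever your BOINC data directory lives.
    print(missing_cuda_libs("projects/setiathome.berkeley.edu"))
```

An empty list means all six names were found. Broken symlinks count as missing, which is usually what you want here.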


I have done so and it cured my original problems nicely. However, I have since added a usable card (Tesla) and now I have some device mixups from boinc. I have posted this message in the boinc linux forum and crunch3r's forum but so far no replies. I am hoping someone here may have an answer. It appears that boinc's device recognition/usage system is borked. I don't mention it in the message below but I do have the cuda 2.2 sdk. Here is the message I posted:

I have a weird problem. The system works fine with just my vid card, or even with just the Tesla when telling Boinc to use only one card, because device 1 is the last one in the cmdline. But it falls apart trying to use both.

My setup is as follows:

1. Linux x86_64 running on a q6600 intel system
2. Video card is GTX 285 in first pci-e slot
3. Tesla C1060 is installed in 2nd pci-e slot
4. Boinc version is 6.6.36
5. Nvidia driver is 185.18.14
6. My number_of_gpus is set to 2. I had it at 1 and it made no difference in this behavior below.
6. I have use_all_gpus set to 1 assuming it is a true/false required.
7. I have this statement in my app_info.xml:

<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>


When I start Boinc, it reports 2 Tesla cards instead of the proper ones. Older boincs properly identify both cards. If this were just a naming problem I could live with this but....
With the above coproc statement set to 1,
When I do a ps ax to look at my process list this is what I see:

7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

and it uses the GTX285 for both simultaneously!

When I have the coproc statement set to 2, it uses the Tesla only and runs only 1 process. it has both device numbers but the GTX285 is not used:

10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1

How can I get this to do the right thing and provide me with processes like these using both cards?

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 1


How can I fix this? I know others are using 2 cards successfully.


I am pulling my hair out over this to get it working.


Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: SETI MB CUDA for Linux
« Reply #220 on: 14 Jul 2009, 02:08:16 pm »
...
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
...

Start by changing the <count> value from 2 to 1.  This tag specifies how many GPUs the application (each instance) uses.  AFAIK so far they only ever use 1.  Other stuff, I'm sure some more Linux savvy people can help you with.
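
To spot this in an existing file, a small Python sketch like the one below can list every `<coproc>` entry and flag any count other than 1. The helper name `coproc_counts` is hypothetical, and the fragment is the one quoted above; in a real app_info.xml the element sits deeper in the file, which the sketch handles by searching at any depth.

```python
import xml.etree.ElementTree as ET


def coproc_counts(xml_text):
    """Return a list of (type, count) pairs for every <coproc> element found."""
    root = ET.fromstring(xml_text)
    pairs = []
    # iter() also matches the root element itself, so a bare fragment works too.
    for coproc in root.iter("coproc"):
        kind = (coproc.findtext("type") or "").strip()
        count = float((coproc.findtext("count") or "0").strip())
        pairs.append((kind, count))
    return pairs


# The fragment quoted in this thread.
fragment = """
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
"""

for kind, count in coproc_counts(fragment):
    if count != 1:
        print(f"{kind}: count is {count:g}; each app instance is expected to use 1 GPU")
```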

Jason
« Last Edit: 14 Jul 2009, 02:10:28 pm by Jason G »

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: SETI MB CUDA for Linux
« Reply #221 on: 14 Jul 2009, 03:12:00 pm »
When I start Boinc, it reports 2 Tesla cards instead of the proper ones. Older boincs properly identify both cards. If this were just a naming problem I could live with this but....
With the above coproc statement set to 1,
When I do a ps ax to look at my process list this is what I see:

7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

and it uses the GTX285 for both simultaneously!

When I have the coproc statement set to 2, it uses the Tesla only and runs only 1 process. it has both device numbers but the GTX285 is not used:

10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1

How can I get this to do the right thing and provide me with processes like these using both cards?

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 1


How can I fix this? I know others are using 2 cards successfully.

Because Boinc versions greater than 6.6.25 only use the most capable GPU, use a cc_config.xml with this in it:

<cc_config>
  <options>
        <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
 

See How do I configure my client using the cc_config.xml file?
for more options and debug flags.
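
To confirm that an existing cc_config.xml really carries this option, something like the following Python sketch can be used. The function name `uses_all_gpus` is hypothetical; the element names are the ones shown in the snippet above.

```python
import xml.etree.ElementTree as ET


def uses_all_gpus(cc_config_text):
    """Return True if <options><use_all_gpus> is present and set to 1."""
    root = ET.fromstring(cc_config_text)
    value = root.findtext("options/use_all_gpus")
    return value is not None and value.strip() == "1"


example = """
<cc_config>
  <options>
        <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
"""

print(uses_all_gpus(example))
```

To check a real file, read its contents first and pass the text in, e.g. `uses_all_gpus(open("cc_config.xml").read())` from the BOINC data directory.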

Claggy

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #222 on: 14 Jul 2009, 04:54:55 pm »
When I start Boinc, it reports 2 Tesla cards instead of the proper ones. Older boincs properly identify both cards. If this were just a naming problem I could live with this but....
With the above coproc statement set to 1,
When I do a ps ax to look at my process list this is what I see:

7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

and it uses the GTX285 for both simultaneously!

When I have the coproc statement set to 2, it uses the Tesla only and runs only 1 process. it has both device numbers but the GTX285 is not used:

10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1

How can I get this to do the right thing and provide me with processes like these using both cards?

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 1


How can I fix this? I know others are using 2 cards successfully.

Because Boinc versions greater than 6.6.25 only use the most capable GPU, use a cc_config.xml with this in it:

<cc_config>
  <options>
        <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
 

See How do I configure my client using the cc_config.xml file?
for more options and debug flags.

Claggy

Thanks for the reply! See my item #6 I do have that set in cc_config.

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #223 on: 14 Jul 2009, 05:05:12 pm »
...
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
...

Start by changing the <count> value from 2 to 1.  This tag specifies how many GPUs the application (each instance) uses.  AFAIK so far they only ever use 1.  Other stuff, I'm sure some more Linux savvy people can help you with.

Jason


I agree, but if I do not set it to 2, then it feeds 2 workunits simultaneously to my GTX, which is my device 0. My process list shows

7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0

By setting it to 2, it fools the seti app into using only the Tesla, which is at the end of the device parameter list, and it only feeds 1 wu at a time. So for now, until I find out how to get boinc to set the second app to device 1, this is the most efficient setting. It gives this:

10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1

and the app parses things out and finds device 1 at the end, so it uses that and ignores device 0.
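
The same check can be done on the `ps ax` output without reading it by eye. This is a minimal Python sketch with hypothetical helper names (`effective_device`, `pids_by_device`). It assumes, based only on the behaviour observed above and not on the app's source, that the last `--device` on the command line is the one the app uses.

```python
from collections import defaultdict


def effective_device(command_line):
    """Return the value of the last --device argument, or None if there is none."""
    tokens = command_line.split()
    device = None
    for i, token in enumerate(tokens[:-1]):
        if token == "--device" and tokens[i + 1].isdigit():
            device = int(tokens[i + 1])  # later occurrences overwrite earlier ones
    return device


def pids_by_device(ps_lines):
    """Group process IDs by the device each command line ends up on."""
    groups = defaultdict(list)
    for line in ps_lines:
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5:
            continue
        device = effective_device(fields[4])
        if device is not None:
            groups[device].append(int(fields[0]))
    return dict(groups)


sample = [
    "7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0",
    "7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0",
]
for device, pids in pids_by_device(sample).items():
    if len(pids) > 1:
        print(f"device {device} is shared by PIDs {pids}")
```

Any device that shows up with more than one PID is being fed several workunits at once, which is the situation described for the GTX285.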

I need to somehow make boinc recognize that there are 2 individual, non-identical cards in the system, which they broke (it used to do that), and then to assign an individual seti app to each device.

I have a feeling this is broken boinc source and I *really* do not want to take the time to dig through it to find where it discovers the devices, fix that and then fix the device assignments. My workload does not allow me the time needed :( . They had it working to report proper devices in an earlier version, and then it looks like someone decided to place the report on 2 lines and that broke it. However, the device assignments were messed up even in that early version (6.6.20 I think it was).

I was hoping some kind of screwball configuration magic workaround would force it to do what I need, but I am beginning to doubt that.

Offline riofl

  • Knight o' The Round Table
  • ***
  • Posts: 240
Re: SETI MB CUDA for Linux
« Reply #224 on: 18 Jul 2009, 07:47:42 pm »
Got it working! The formula to make 2 devices work simultaneously in linux is to use, of all things, the ancient 6.4.5 boinc! The device reporter is still borked: it reports 2 Teslas instead of 1 GTX and 1 Tesla, but it feeds 1 wu to each device like it is supposed to.

 
