Author Topic: SETI MB CUDA for Linux (Read 583125 times)

letni · « **Reply #285 on:** 01 Aug 2009, 12:28:39 pm »

Thanks for the replies.. I am currently trying to use version 6.6.36.. I tried 6.6.20 earlier and had the same results.. I have read in places that the 32 bit cuda binary at the beginning of this thread is broken.. I will try the 64 bit binary once I can reload the machine with a 64 bit linux.

Thanks

riofl · « **Reply #286 on:** 01 Aug 2009, 06:38:26 pm »

Quote from: koschi on 31 Jul 2009, 02:31:03 pm

Could you provide a link to your host, or some failed work units?
If you follow the instructions provided by sunu and make sure the Nvidia modules are loaded, then it will also work for you

I'm running the 190.18 driver with CUDA 2.3 libraries and 2.2VLARkill app now on two machines with G92 chips, so far no issues.

hmm must be something i missed then because when i run the same software combination with boinc 6.6.11 and 2 gt200 series cards, it runs fine for about 10 to 12 hours then the desktop begins to pause and sometimes lock up and even the entire system ground to a halt one time. i returned to 2.2 and 185.18.29 and had no trouble since. and yes i paid very careful attention to be sure the 2.3 libs were in their proper places and that ldconfig found them and that ldd to the 2.2 vlarkill app showed no errors and then rebooted the system to make doubly sure the environment was sane.

my gkrellm monitors were the most sensitive to this behavior and began displaying symptoms before it got to the noticeable level affecting my desktops. since i run an extremely busy set of desktops (2 of the desktops display a total of 29 gkrellm monitor strips monitoring our servers in real time) i suspect the 190 driver isn't ready for prime time yet for linux when handling more than near idle desktop load plus cuda.

letni · « **Reply #287 on:** 01 Aug 2009, 10:32:04 pm »

Quote from: koschi on 31 Jul 2009, 02:31:03 pm

Could you provide a link to your host, or some failed work units?
If you follow the instructions provided by sunu and make sure the Nvidia modules are loaded, then it will also work for you

I'm running the 190.18 driver with CUDA 2.3 libraries and 2.2VLARkill app now on two machines with G92 chips, so far no issues.

Here is the machine with 32bit Slackware installed..
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5050097

I am playing around with 64 bit Redhat on the same machine (with the same results) never gets past "Ready to start"
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5052513

Of note.. I am also using the Windows BOINC client to connect to the Linux box via remote RPC; this is how I can see it never gets past "Ready to start".

Thanks..

Letni

koschi · « **Reply #288 on:** 02 Aug 2009, 03:20:52 am »

http://setiathome.berkeley.edu/result.php?resultid=1323470350

Quote

[...]
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 8600 GTS is okay
SIGSEGV: segmentation violation
Stack trace (16 frames):
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x47cba9]
/lib64/libpthread.so.0[0x7f0954f4f0f0]
/usr/lib64/libcuda.so.1[0x7f09559c3920]
/usr/lib64/libcuda.so.1[0x7f09559c9684]
/usr/lib64/libcuda.so.1[0x7f0955992a0f]
/usr/lib64/libcuda.so.1[0x7f095571e296]
/usr/lib64/libcuda.so.1[0x7f095572ebab]
/usr/lib64/libcuda.so.1[0x7f0955716190]
/usr/lib64/libcuda.so.1(cuCtxCreate+0xaa)[0x7f095571000a]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x5ace4b]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x40d4ca]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x419f23]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x424c7d]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x407f60]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f0954bec576]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu(__gxx_personality_v0+0x241)[0x407be9]

Exiting...

</stderr_txt>
]]>

The same errors that I got when trying to use the 2.2VLARkill without putting the app into /usr/bin or any other directory within $PATH. Please check if it helps to copy the app there.
Please also check with ldd setiathome-CUDA-6.08.x86_64-pc-linux-gnu that all needed libraries are in place.
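
The ldd check suggested above can be narrowed down to just the unresolved libraries. This is a minimal sketch, not something posted in the thread: the helper name missing_libs is made up for illustration, and it only filters whatever ldd prints.

```shell
#!/bin/bash
# Hypothetical helper: print every "not found" line from ldd-style output
# read on stdin. Returns 1 if at least one library is unresolved, 0 otherwise.
missing_libs() {
    if grep "not found"; then
        return 1
    fi
    return 0
}

# Usage (binary name taken from the stack trace above; run it from the
# directory that holds the app):
#   ldd ./setiathome-CUDA-6.08.x86_64-pc-linux-gnu | missing_libs
```

If the function prints nothing and returns 0, the dynamic linker resolved every library, so the segfault probably has another cause.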

sunu · « **Reply #289 on:** 02 Aug 2009, 07:07:41 am »

I had some pc troubles and I was offline the whole past week. I'll try to answer some messages that have been posted since.

Quote from: koschi on 26 Jul 2009, 07:12:19 am

After crunching other projects for some months, I restarted SETI@GPU just this morning, but using the initial CUDA build for Linux which uses 100% of one CPU core as well...
So these new versions, that can be found in crunch3rs board, will use only a few % of one CPU? That would be awesome...
I'm currently still at 180.60, some 2.6.30 rc and 6.6.17, but willing to update if I could free up that core with a newer version of the app

The 100% core usage was a bug with 2.1 and earlier linux cuda libs. With 2.2 and later this has been fixed. Our seti cuda client had nothing to do with it. So you can use anything you like as long as you use 2.2 or later cuda libraries.

Quote from: Tye on 27 Jul 2009, 02:59:42 pm

Yup, that's been bothering me too. I'm wondering if there's a way to trick it into reporting clock time rather than cpu time... I'm using nvidia 185.18.14 and BOINC 6.6.11 btw since I'd like to do multi-GPU here soon.

Are you talking about boinc manager? Some time down the road, boinc manager changed from cpu time to elapsed time. If you use boinc 6.6.11 for multi-gpu you can use a later boinc manager version that shows the elapsed time. I'm using boinc 6.6.11 with boinc manager 6.6.37.
If you're talking about the times reported to Berkeley, you can't change that.

Quote from: riofl on 27 Jul 2009, 03:45:43 pm

6.6.37 was reporting proper cpu/gpu times, but when i went back to 6.6.11 to use multiple devices that time reporting broke. i am not sure if adding a flops statement in app_info.xml will help with that or not.

See my previous reply and no, flops in app_info.xml will not help.

Quote from: koschi on 27 Jul 2009, 04:58:04 pm

The default priority of nice 10 seems to slow the process down on my box, once I switched it to 0 or -5, it processed much faster and collected up CPU time quicker.

Are you talking about the CPU client or the GPU one?

Quote from: Kunin on 29 Jul 2009, 01:41:09 pm

I tried to go to the link, but every time I go to calbe.dw70.de I get access denied... is there another place to get it?

Kunin I've attached it below.

Quote from: koschi on 30 Jul 2009, 02:12:51 am

Has anyone tried the CUDA 2.2 client together with 190.xx drivers and the CUDA 2.3 dlls, if there is some speed-up like under Windows?

I haven't run a comparison but there is no harm using it.

Quote from: riofl on 31 Jul 2009, 12:30:42 pm

does anyone know if there is a cuda 2.3 vlarkill x86_64 app available yet? i am switching everything to 2.3 and the 190 driver today.

Unless Crunch3r makes one... But I don't think there will be any worthy speedup (at least in windows there isn't).

Quote from: letni on 31 Jul 2009, 01:14:06 pm

I'm trying to get CUDA working (with the 32 bit binary posted at message 1 of this thread) with my new 8600GTS in Slackware Linux and I'm having issues.. I have run the nvidia installer, etc, but I get some weird errors..

Please don't use the 32bit client. When we were testing it, it had a strange bug and didn't produce valid results. I haven't checked though if newer cuda libraries make any difference.

Quote from: letni on 31 Jul 2009, 01:14:06 pm

1. The output shows I have a cuda device, however, it says I have revision 0 of the driver installed, even though I have installed the 185.18.14 and updated to the 185.18.31..
CUDA device: GeForce 8600 GTS (driver version 0, compute capability 1.1, 255MB, est. 18GFLOPS).

This is just cosmetic. Don't pay attention to it.

Quote from: letni on 31 Jul 2009, 01:14:06 pm

2. I have modified my app_info.xml to allow both AK_V8_SSE3 (32bit) and the cuda to run simultaneously (included .xml file).. I have 3 active tasks being worked on, two (for my dual CPU) say setiathome_enhanced 6.03 and run just fine. The third is the CUDA one, which says setiathome_enhanced 6.08 (cuda), and its status NEVER goes past Ready to start. It will eventually error out with Computation error.
Anyone have any thoughts or advice on how to debug this?

There is an option that is on by default to not run cuda tasks when pc is in use. Check global_prefs.xml for <run_gpu_if_user_active>0</run_gpu_if_user_active> and change that 0 to 1.
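
The change described above could be scripted roughly as follows. Treat it as a sketch under assumptions: the function name is made up, the location of global_prefs.xml depends on where your BOINC data directory lives (so it is passed in as an argument), and sed -i is the GNU form. The client has to re-read its preferences (for example after a restart) before the change takes effect, and preferences downloaded from the project may overwrite the file again.

```shell
#!/bin/bash
# Hypothetical helper: set <run_gpu_if_user_active> to 1 in the preferences
# file given as $1. A backup copy is left next to it as <file>.bak.
enable_gpu_when_active() {
    prefs="$1"
    cp "$prefs" "$prefs.bak" || return 1
    sed -i 's|<run_gpu_if_user_active>0</run_gpu_if_user_active>|<run_gpu_if_user_active>1</run_gpu_if_user_active>|' "$prefs"
}

# Usage (the path is an assumption, adjust it to your install):
#   enable_gpu_when_active /path/to/BOINC/global_prefs.xml
```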

Quote from: letni on 31 Jul 2009, 01:14:06 pm

4. I'm not using XWindows at all. This is all console based only.. Is Xorg required to be running to utilize CUDA?

Just copy&pasting from Nvidia:
In order to run CUDA applications, the CUDA module must be
loaded and the entries in /dev created. This may be achieved
by initializing X Windows, or by creating a script to load the
kernel module and create the entries.

An example script (to be run at boot time):

Code: [Select]

#!/bin/bash

modprobe nvidia

if [ "$?" -eq 0 ]; then

    # Count the number of NVIDIA controllers found.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

    # Create one device node per controller: /dev/nvidia0 .. /dev/nvidiaN.
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i;
    done

    # Create the control device node.
    mknod -m 666 /dev/nvidiactl c 195 255

else
    exit 1
fi
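
After the boot script has run, it may help to confirm that the nodes it is supposed to create really exist. The check below is only an illustrative sketch: check_nvidia_nodes is a made-up name, and the second argument exists so the function can be pointed at a directory other than /dev.

```shell
#!/bin/bash
# Hypothetical helper: report which of the expected device nodes are missing.
# $1 = number of GPUs, $2 = device directory (defaults to /dev).
# Returns 0 if everything is present, 1 if anything is missing.
check_nvidia_nodes() {
    count="$1"
    dev="${2:-/dev}"
    missing=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ ! -e "$dev/nvidia$i" ]; then
            echo "missing: $dev/nvidia$i"
            missing=1
        fi
        i=$((i + 1))
    done
    if [ ! -e "$dev/nvidiactl" ]; then
        echo "missing: $dev/nvidiactl"
        missing=1
    fi
    return "$missing"
}

# Usage for a machine with a single card:
#   check_nvidia_nodes 1
```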

Quote from: riofl on 01 Aug 2009, 06:38:26 pm

my gkrellm monitors were the most sensitive to this behavior and began displaying symptoms before it got to the noticeable level affecting my desktops. since i run an extremely busy set of desktops (2 of the desktops display a total of 29 gkrellm monitor strips monitoring our servers in real time) i suspect the 190 driver isn't ready for prime time yet for linux when handling more than near idle desktop load plus cuda.

I'm also using 190.18 and 2.3 cuda with no issues. This is my everyday pc, with firefox with a multitude of tabs and many other applications opening and closing. Have you checked if gkrellm has some kind of memory leak?

[attachment deleted by admin]

Kunin · « **Reply #290 on:** 02 Aug 2009, 07:52:03 am »

Thanks, but before I use it how does the VLAR kill work? I rebrand all of my VLAR to the CPU as soon as I can, and prefer to work whatever units I get since I have the 8 cores just sitting there most of the time.

sunu · « **Reply #291 on:** 02 Aug 2009, 08:08:12 am »

Quote from: Kunin on 02 Aug 2009, 07:52:03 am

Thanks, but before I use it how does the VLAR kill work? I rebrand all of my VLAR to the CPU as soon as I can, and prefer to work whatever units I get since I have the 8 cores just sitting there most of the time.

Well if you miss a VLAR from the rebranding and it gets to your GPU, it will get aborted almost instantly by the client.

Tye · « **Reply #292 on:** 02 Aug 2009, 10:41:12 am »

Quote from: sunu on 02 Aug 2009, 07:07:41 am

Are you talking about boinc manager? Some time down the road, boinc manager changed from cpu time to elapsed time. If you use boinc 6.6.11 for multi-gpu you can use a later boinc manager version that shows the elapsed time. I'm using boinc 6.6.11 with boinc manager 6.6.37.

Perfect - I didn't think of doing that! Thanks again, sunu - you continue to be a big help and it's definitely appreciated!

sunu · « **Reply #293 on:** 02 Aug 2009, 10:48:40 am »

Quote from: Tye on 02 Aug 2009, 10:41:12 am

Perfect - I didn't think of doing that! Thanks again, sunu - you continue to be a big help and it's definitely appreciated!

Thanks Tye!

Tye · « **Reply #294 on:** 02 Aug 2009, 11:16:11 am »

Quote from: Tye on 02 Aug 2009, 10:41:12 am

Quote from: sunu on 02 Aug 2009, 07:07:41 am
Are you talking about boinc manager? Some time down the road, boinc manager changed from cpu time to elapsed time. If you use boinc 6.6.11 for multi-gpu you can use a later boinc manager version that shows the elapsed time. I'm using boinc 6.6.11 with boinc manager 6.6.37.

Perfect - I didn't think of doing that! Thanks again, sunu - you continue to be a big help and it's definitely appreciated!

Argh - somehow it doesn't like running with 6.6.11 boinc and 6.6.36 boincmgr... Is there some trick to that I'm missing?

sunu · « **Reply #295 on:** 02 Aug 2009, 12:17:15 pm »

Quote from: Tye on 02 Aug 2009, 11:16:11 am

Argh - somehow it doesn't like running with 6.6.11 boinc and 6.6.36 boincmgr... Is there some trick to that I'm missing?

What problem do you have? Just copy boincmgr to your 6.6.11 installation.

riofl · « **Reply #296 on:** 02 Aug 2009, 02:28:33 pm »

typically memory use hardly changes once my system is stabilized into 'work mode'. this incarnation of gkrellm has been working fine for months and only showed erratic behavior with the 190 driver. the second i went back to the 185 driver all problems vanished. i even tried recompiling gkrellm and all supporting libraries and sensors just to be sure. so when i reverted back to the 185 version and 2.2 i again recompiled everything mentioned just to be safe. i originally used the gentoo-supplied 2.3 and 190 and when i started having trouble i went directly to nvidia and got them from there, uninstalled the previous ones and used the nvidia installers. same behavior.

i cannot say for sure whether it is the cuda 2.3 libraries or the 190 driver or both. one or both simply do not like something on my system i guess. i just reverted back to 185 and 2.2 and its smooth sailing once again.

Tye · « **Reply #297 on:** 02 Aug 2009, 04:19:14 pm »

Quote from: sunu on 02 Aug 2009, 12:17:15 pm

Quote from: Tye on 02 Aug 2009, 11:16:11 am
Argh - somehow it doesn't like running with 6.6.11 boinc and 6.6.36 boincmgr... Is there some trick to that I'm missing?

What problem do you have? Just copy boincmgr to your 6.6.11 installation.

Yep, that's what I did, but it just sits there frozen at the "Communicating with client" portion on startup. No messages, no display, no processes starting, etc.

sunu · « **Reply #298 on:** 02 Aug 2009, 05:46:58 pm »

Quote from: Tye on 02 Aug 2009, 04:19:14 pm

Yep, that's what I did, but it just sits there frozen at the "Communicating with client" portion on startup. No messages, no display, no processes starting, etc.

Start boinc first and then open boinc manager.
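
One way to make sure the client is really up before the manager is opened is to wait for its GUI RPC port (31416 by default) to accept connections. This is only a sketch: wait_for_port is a made-up helper, the /dev/tcp redirection needs bash, and the binary names in the usage comment are assumptions about a plain Berkeley install.

```shell
#!/bin/bash
# Hypothetical helper: wait until a TCP port accepts connections.
# $1 = host, $2 = port, $3 = maximum number of one-second attempts.
# Returns 0 as soon as a connection succeeds, 1 if every attempt fails.
wait_for_port() {
    host="$1"
    port="$2"
    tries="$3"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # bash's /dev/tcp pseudo-device opens a TCP connection; the
        # subshell closes it again when it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage sketch, run from the BOINC directory:
#   ./boinc &
#   wait_for_port 127.0.0.1 31416 30 && ./boincmgr
```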

sunu · « **Reply #299 on:** 02 Aug 2009, 06:19:55 pm »

@letni
Please see my big post above, where I talk about cuda with no X server. Are you sure you've set it all up correctly? Also make sure you use compatible nvidia drivers and cuda libraries.

Also see my post http://lunatics.kwsn.net/linux/seti-mb-cuda-for-linux.msg19014.html#msg19014 and follow it to the letter.

Your card with 256 MB is borderline. You might get some out of memory messages here and there.
