Author Topic: SETI MB CUDA for Linux  (Read 503599 times)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #195 on: 12 Jul 2009, 07:01:59 am »
In Windows, the difference between the first and second PCI-E slots (if the first has a monitor attached and the second does not) is this:
The GPU that Windows uses for video output is subject to a 2 or 3 second timeout, but the second GPU is not.
I don't know whether this is relevant to Linux, though.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #196 on: 12 Jul 2009, 09:26:46 am »
The GPU that Windows uses for video output is subject to a 2 or 3 second timeout, but the second GPU is not.
I don't know whether this is relevant to Linux, though.

Well, if it is because of the first gpu also drawing the screen then it will probably also exist in linux. We don't have a big sample of seti cuda users with multi gpus in linux. Actually the sample is non-existent  :D

What Tye describes might be some faulty config, strange driver behavior, or some weird motherboard-gpu-gpu hardware incompatibility.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #197 on: 12 Jul 2009, 10:10:04 am »
I'm not sure it exists in Linux. It's not a GPU feature, it's a Windows feature: Windows (Vista) will kill the driver after more than 2 seconds of "no answer" from it.
I don't know whether the Linux kernel implements such a watchdog mechanism or not.
GPUs that don't output video are not subject to this "driver hung" check and can run long kernels. That's why surely not everything that works OK on a Tesla will work OK on users' GPUs (even if newer GPUs are slightly faster than the first released Teslas, IMHO).

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #198 on: 12 Jul 2009, 11:56:47 am »
Hello everyone

Came across an interesting error message in task 1294937260 while researching something else.

Quote
SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r :p

Error: API mismatch: the NVIDIA kernel module has version 180.29,
but this NVIDIA driver component has version 180.60. 
...

Something to watch for when fiddling about with Linux drivers and modules.

The anonymous owner of host 5011059 seems to be having a real problem getting his or her GTX 295 running under gentoo.

With this host I don't have any problems. It just happened during a system upgrade.

I have a real problem with host 5018683. I don't have any idea what's wrong. It isn't overclocked or overheating. The GPUs run at about 75C~77C at full load (~52C idle). Other CUDA programs work fine, but with SETI almost all tasks end with:

cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.2/cufft/src/execute.cu, line 1070
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.2/cufft/src/execute.cu, line 1070
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.2/cufft/src/cufft.cu, line 147
cufft: ERROR: CUFFT_EXEC_FAILED
Cuda error 'cufftExecC2C' in file './cudaAcc_fft.cu' in line 63 : unspecified launch failure.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file './cudaAcc_PowerSpectrum.cu' in line 56 : unspecified launch failure.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file './cudaAcc_PowerSpectrum.cu' in line 56 : unspecified launch failure.
Cuda error 'cudaAcc_summax32_kernel' in file './cudaAcc_summax.cu' in line 148 : unspecified launch failure.
Cuda error 'cudaAcc_summax32_kernel' in file './cudaAcc_summax.cu' in line 148 : unspecified launch failure.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file './cudaAcc_summax.cu' in line 161 : unspecified launch failure.

I would be thankful for any ideas about what's wrong and how to solve it.

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: SETI MB CUDA for Linux
« Reply #199 on: 12 Jul 2009, 12:06:10 pm »
The FFT library kernel launch failed, most probably because of an incompatibility between the CUDA runtime and the video driver in use.
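
When a log like the one above repeats the same failure many times, it can help to tally which calls failed and with what message. The following is only a rough sketch: the function name `summarize_cuda_errors` and the regular expression are illustrative assumptions, written against the line format visible in the posted log, not anything the SETI client provides.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in the posted log:
#   Cuda error '<call>' in file '<path>' in line <n> : <message>.
# The pattern is an assumption based on that log, not a documented format.
CUDA_ERROR_RE = re.compile(
    r"^Cuda error '(?P<call>.+)' in file '(?P<file>[^']+)' "
    r"in line (?P<line>\d+) : (?P<msg>.+?)\.?$"
)

def summarize_cuda_errors(log_text):
    """Count (call, message) pairs across all 'Cuda error' lines."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = CUDA_ERROR_RE.match(raw.strip())
        if match:
            counts[(match.group("call"), match.group("msg"))] += 1
    return counts

# Three lines copied from the log in the post above.
SAMPLE_LOG = """\
Cuda error 'cufftExecC2C' in file './cudaAcc_fft.cu' in line 63 : unspecified launch failure.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file './cudaAcc_PowerSpectrum.cu' in line 56 : unspecified launch failure.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file './cudaAcc_PowerSpectrum.cu' in line 56 : unspecified launch failure.
"""

ERROR_COUNTS = summarize_cuda_errors(SAMPLE_LOG)
```

Lines that do not start with "Cuda error" (for example the "cufft: ERROR" lines) are simply skipped by this sketch.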

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #200 on: 12 Jul 2009, 12:26:26 pm »
I tried these combinations of drivers and cuda:
- drv 180.29 with cuda 2.1
- drv 180.60 with cuda 2.1
- drv 185.18.14 with cuda 2.1
- drv 185.18.14 with cuda 2.2

And they all give the same results. The strange thing is that just a few weeks back (drv 180.29, cuda 2.1) everything worked fine; maybe there is something wrong with those work units.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #201 on: 12 Jul 2009, 01:14:09 pm »
b0b3r, as a start, run ldd on the seti client and post the output here, along with your xorg.0.log.

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #202 on: 12 Jul 2009, 01:20:38 pm »
I don't use Xorg on this machine. Here is ldd output:

linux-vdso.so.1 =>  (0x00007fffeb1ff000)
libcufft.so.2 => /opt/cuda/lib/libcufft.so.2 (0x00007f38e2c23000)
libcudart.so.2 => /opt/cuda/lib/libcudart.so.2 (0x00007f38e29e5000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f38e2518000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.2/libstdc++.so.6 (0x00007f38e220d000)
libm.so.6 => /lib/libm.so.6 (0x00007f38e1f88000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f38e1d6c000)
libc.so.6 => /lib/libc.so.6 (0x00007f38e19f9000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f38e17f5000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f38e15de000)
librt.so.1 => /lib/librt.so.1 (0x00007f38e13d5000)
libz.so.1 => /lib/libz.so.1 (0x00007f38e11bf000)
/lib64/ld-linux-x86-64.so.2 (0x00007f38e2f3e000)
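
In a listing like the one above, the useful part is where each CUDA library resolves from: libcuda.so.1 ships with the NVIDIA driver, while libcudart and libcufft come from the CUDA toolkit. A minimal sketch for pulling those entries out of pasted ldd output follows; the helper names `parse_ldd` and `cuda_libraries` are made up for illustration.

```python
def parse_ldd(output):
    """Map each library name to the path ldd resolved it to (None if no path)."""
    resolved = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line has no "=>"
        name, _, rest = line.partition("=>")
        fields = rest.split()
        # The path, when present, comes before the "(0x...)" load address.
        path = fields[0] if fields and not fields[0].startswith("(") else None
        resolved[name.strip()] = path
    return resolved

def cuda_libraries(resolved):
    """Keep only the CUDA-related entries (driver library, runtime, CUFFT)."""
    return {
        name: path
        for name, path in resolved.items()
        if name.startswith(("libcuda", "libcufft"))
    }

# Lines copied from the ldd output in the post above.
SAMPLE_LDD = """\
linux-vdso.so.1 =>  (0x00007fffeb1ff000)
libcufft.so.2 => /opt/cuda/lib/libcufft.so.2 (0x00007f38e2c23000)
libcudart.so.2 => /opt/cuda/lib/libcudart.so.2 (0x00007f38e29e5000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f38e2518000)
libm.so.6 => /lib/libm.so.6 (0x00007f38e1f88000)
/lib64/ld-linux-x86-64.so.2 (0x00007f38e2f3e000)
"""

CUDA_LIBS = cuda_libraries(parse_ldd(SAMPLE_LDD))
```

The sketch only reports paths; deciding whether the toolkit in /opt/cuda matches the installed driver still has to be done by hand.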

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #203 on: 12 Jul 2009, 01:34:36 pm »
I found some information on Google suggesting it may be a problem with a memory leak in cufft; here is an example.

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #204 on: 12 Jul 2009, 01:57:31 pm »
I don't use Xorg on this machine.

What do you use? What does

ls /dev/nv*

give you?

b0b3r

  • Guest
Re: SETI MB CUDA for Linux
« Reply #205 on: 12 Jul 2009, 02:04:20 pm »
ls -al /dev/nv*
crw-rw-rw- 1 root root 195,   0 Jul 12 20:01 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Jul 12 20:01 /dev/nvidia1
crw-rw-rw- 1 root root 195,   2 Jul 12 20:01 /dev/nvidia2
crw-rw-rw- 1 root root 195, 255 Jul 12 20:01 /dev/nvidiactl

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #206 on: 12 Jul 2009, 02:22:28 pm »
Your system sees three devices.

On your host 5018683, BOINC doesn't even see your graphics cards. Are you sure that you have installed them correctly?

Also, upgrade BOINC on both of your hosts; 6.4.5 is too old.
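
The "three devices" reading comes from the numbered /dev/nvidiaN nodes in the listing; /dev/nvidiactl is the control node rather than a GPU. As a small sketch of the same count (the function name `count_gpu_nodes` is hypothetical):

```python
import re

def count_gpu_nodes(paths):
    """Count /dev/nvidiaN entries; /dev/nvidiactl is the control node, not a GPU."""
    return sum(1 for path in paths if re.fullmatch(r"/dev/nvidia\d+", path))

# Device nodes taken from the ls output posted above.
SAMPLE_NODES = ["/dev/nvidia0", "/dev/nvidia1", "/dev/nvidia2", "/dev/nvidiactl"]
GPU_NODE_COUNT = count_gpu_nodes(SAMPLE_NODES)
```

On a live system the list could come from `glob.glob("/dev/nv*")` instead of the pasted sample.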

pp

  • Guest
Re: SETI MB CUDA for Linux
« Reply #207 on: 12 Jul 2009, 02:29:50 pm »
Came across an interesting error message in task 1294937260 while researching something else.

Quote
SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r :p

Error: API mismatch: the NVIDIA kernel module has version 180.29,
but this NVIDIA driver component has version 180.60.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
setiathome_CUDA: Found 1 CUDA device(s):
Cuda error 'cudaGetDeviceProperties( &cDevProp, i )' in file './cudaAcceleration.cu' in line 138 : initialization error.

Something to watch for when fiddling about with Linux drivers and modules.

The anonymous owner of host 5011059 seems to be having a real problem getting his or her GTX 295 running under gentoo.

He hasn't installed the NVIDIA drivers properly.


Well, actually he has. But after installing the new package he neither rebooted nor loaded the new module, so he's still running his system with the old version in memory. It's an easy mistake to make in Gentoo - been there, done that.  :)
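
For anyone scanning task output for this situation, a minimal sketch that pulls the two version numbers out of the "API mismatch" message quoted above. The name `parse_api_mismatch` and the regular expression are assumptions based only on the wording of that one message.

```python
import re

# Pattern written against the message text quoted in this post; the two
# halves of the sentence are split across lines in the task output.
MISMATCH_RE = re.compile(
    r"kernel module has version (?P<module>\d+(?:\.\d+)*),\s*"
    r"but this NVIDIA driver component has version (?P<component>\d+(?:\.\d+)*)"
)

def parse_api_mismatch(text):
    """Return (kernel_module_version, component_version), or None if absent."""
    match = MISMATCH_RE.search(text)
    if match is None:
        return None
    return match.group("module"), match.group("component")

# Message copied from the quoted task output.
SAMPLE_MESSAGE = """\
Error: API mismatch: the NVIDIA kernel module has version 180.29,
but this NVIDIA driver component has version 180.60.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
"""

MISMATCH = parse_api_mismatch(SAMPLE_MESSAGE)
```

On Linux the version of the currently loaded kernel module is normally reported in /proc/driver/nvidia/version, which is a quick way to confirm that an old module is still in memory after a package upgrade.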

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: SETI MB CUDA for Linux
« Reply #208 on: 12 Jul 2009, 03:03:43 pm »
I'm not sure it exists in Linux. It's not a GPU feature, it's a Windows feature: Windows (Vista) will kill the driver after more than 2 seconds of "no answer" from it.
I don't know whether the Linux kernel implements such a watchdog mechanism or not.
GPUs that don't output video are not subject to this "driver hung" check and can run long kernels. That's why surely not everything that works OK on a Tesla will work OK on users' GPUs (even if newer GPUs are slightly faster than the first released Teslas, IMHO).

Raistmer do you mean something like this? From cuda 2.2 release notes:

o Individual GPU program launches are limited to a run time
  of less than 5 seconds on a GPU with a display attached.
  Exceeding this time limit causes a launch failure reported
  through the CUDA driver or the CUDA runtime. GPUs without
  a display attached are not subject to the 5 second run time
  restriction. For this reason it is recommended that CUDA is
  run on a GPU that is NOT attached to an X display.

So yes, it also exists in linux.


@b0b3r: The error you posted above with the "unspecified launch failure" messages might be because of that.

Curiously, I've crunched tens of thousands of workunits with my GPU that also runs X without ever seeing that kind of error.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: SETI MB CUDA for Linux
« Reply #209 on: 12 Jul 2009, 03:28:06 pm »
....Individual GPU program launches are limited to a run time
  of less than 5 seconds on a GPU with a display attached.....
  Yes.  In embedded microcontroller system terminology, that's called a "Watchdog Timer".  Crazy people program GPUs to take longer than that, Lunatics try to fix it.
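
One common way to stay under a watchdog limit like this is to split a long job into several shorter launches. The sketch below only does the host-side arithmetic, under loudly stated assumptions: `plan_launches`, the throughput estimate `items_per_second`, and the `safety` margin are all hypothetical, the 5 second default is the figure from the CUDA 2.2 release notes quoted earlier in the thread, and none of this describes how the SETI MB application actually schedules its kernels.

```python
import math

def plan_launches(total_items, items_per_second, limit_seconds=5.0, safety=0.5):
    """Split total_items into chunks whose estimated run time stays under
    safety * limit_seconds.

    items_per_second is a measured throughput estimate (hypothetical input);
    safety is the fraction of the limit each launch is allowed to use.
    """
    if total_items < 0 or items_per_second <= 0 or not 0 < safety <= 1:
        raise ValueError("invalid planning parameters")
    # Largest chunk that is estimated to finish within the time budget.
    max_items = max(1, math.floor(items_per_second * limit_seconds * safety))
    chunks = []
    remaining = total_items
    while remaining > 0:
        size = min(max_items, remaining)
        chunks.append(size)
        remaining -= size
    return chunks
```

With a throughput of 1 item per second and the default half-limit margin, each launch would be planned for at most 2 items, so 10 items would be spread over five launches. The estimate is only as good as the throughput measurement behind it.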

 
