SETI MB CUDA for Linux

pp:
bit-tech.net recently had an entertaining article about this. They tried both 4x GTX 295 and 7x 9600 GT and, not surprisingly, heat was a problem. They tested with Folding@Home rather than SETI, but the article has lots of nice pictures...
http://www.bit-tech.net/bits/2009/08/03/how-to-build-the-best-folding-rig/1

sunu:
Thanks for the link, I haven't seen that.

That's where water cooling enters the picture. 4 x BFG NVIDIA GeForce GTX 295 H2OC 1792MB PCIe 2.0 with ThermoIntelligence Advanced Cooling Solution or 7 x BFG NVIDIA GeForce GTX 285 H2O+ 1GB PCIe 2.0 with ThermoIntelligence Advanced Cooling Solution or any other water cooled solution.

macros:

--- Quote from: pp on 19 Aug 2009, 05:12:23 am ---Are you still running CUDA 2.1? The 100% CPU was apparently a bug in those libraries. Upgrade CUDA to 2.3, nvidia-drivers to 190.xx  and replace your setiathome executable with the 2.2 version and optionally renice that process if you think it's too slow.

--- End quote ---

--- Quote from: sunu on 19 Aug 2009, 05:32:30 am ---Macros, what pp says. Make sure you're using cuda 2.2 or later together with a compatible nvidia driver.

--- End quote ---

--- Quote from: riofl on 19 Aug 2009, 06:16:12 am ---i think you will find the best response by setting your preferences to use 6 or 7 cpus instead of 8, leaving 1 for cuda and your desktop to use. i played around a bit with max_ncpus but did not find a huge difference. mine is set at 0.35.

absolutely, if you do nothing else, change your cuda toolkit and sdk to 2.2 and get the 2.2 application. make sure your driver is at the minimum 185.14 or 185.29. i am using 185.29.

ver 2.1 had huge flaws in it. i have heard 2.3 is even better, however i have not had good luck with 2.3 so i went back to 2.2 until i can figure out what went wrong.

--- End quote ---

--- Quote from: sunu on 19 Aug 2009, 06:30:12 am ---Small correction to riofl: The driver versions are 185.18.14 and 185.18.29. Latest is 185.18.31. Macros, if you go to cuda 2.3 you'll need 190.18.

Macros, what card are you using? Maybe that 99% is because your card goes out of memory?

--- End quote ---
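
For reference, the driver minimums mentioned in the quotes above can be written down as a small check. This is only a sketch: the table holds nothing but the numbers quoted in this thread (185.18.14 for CUDA 2.2, 190.18 for CUDA 2.3), the names `MIN_DRIVER`, `parse_version` and `driver_ok` are made up for illustration, and it is not an official NVIDIA compatibility list.

```python
# Minimum driver per CUDA toolkit, taken only from the posts quoted above.
# Hypothetical helper names; not an official NVIDIA compatibility table.
MIN_DRIVER = {
    "2.2": (185, 18, 14),
    "2.3": (190, 18),
}


def parse_version(text):
    """Turn '185.18.14' into (185, 18, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_ok(cuda, driver):
    """True if the driver meets the minimum quoted in this thread for that CUDA release."""
    return parse_version(driver) >= MIN_DRIVER[cuda]


print(driver_ok("2.2", "185.18.14"))  # the driver installed from the PPA
print(driver_ok("2.3", "185.18.31"))  # latest 185 series against CUDA 2.3
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.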

Thanks for everyone's hints.

I've installed nvidia-drivers version 185.18.14 from the Ubuntu PPA source (x-updates), since I don't want to get on the 'manual track' of managing nvidia drivers here, plus the CUDA 2.2 libraries. I've also upgraded to the setiathome-CUDA_2.2_6.08.x86_64_vlarkill.tar.bz2 client as pp suggested. The first results weren't satisfactory: the setiathome CUDA client would crash with the following error output:


--- Code: ---<core_client_version>6.6.37</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r :p

setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : Quadro FX 4600
           totalGlobalMem = 804585472
           sharedMemPerBlock = 16384
           regsPerBlock = 8192
           warpSize = 32
           memPitch = 262144
           maxThreadsPerBlock = 512
           clockRate = 1188000
           totalConstMem = 65536
           major = 1
           minor = 0
           textureAlignment = 256
           deviceOverlap = 0
           multiProcessorCount = 12
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: Quadro FX 4600 is okay
SIGSEGV: segmentation violation
Stack trace (16 frames):
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x47cba9]
/lib/libpthread.so.0[0x7f96066ac080]
/usr/lib/libcuda.so.1[0x7f9607123020]
/usr/lib/libcuda.so.1[0x7f9607128d84]
/usr/lib/libcuda.so.1[0x7f96070f210f]
/usr/lib/libcuda.so.1[0x7f9606e7db3b]
/usr/lib/libcuda.so.1[0x7f9606e8e46b]
/usr/lib/libcuda.so.1[0x7f9606e76211]
/usr/lib/libcuda.so.1(cuCtxCreate+0xaa)[0x7f9606e6ffaa]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x5ace4b]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x40d4ca]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x419f23]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x424c7d]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x407f60]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f96063495a6]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu(__gxx_personality_v0+0x241)[0x407be9]

Exiting...

</stderr_txt>
]]>
--- End code ---
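
A side note on the "code 193 (0xc1, -63)" line in that output: all three numbers are the same byte, shown as decimal, hex, and as a signed 8-bit value. A minimal sketch of the arithmetic, where `describe_exit_code` is just an illustrative name and not something from BOINC:

```python
def describe_exit_code(code):
    """Format an exit status as decimal, hex and signed 8-bit, like the log line above."""
    byte = code & 0xFF                            # keep only the low 8 bits
    signed = byte - 256 if byte > 127 else byte   # reinterpret as two's complement
    return f"{byte} ({byte:#x}, {signed})"


print(describe_exit_code(193))
```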

Then I tried running the seti CUDA client standalone on the very same workunit and guess what - it worked.  :o
Messing around, I've ended up in a state where one CUDA task is running (and this time apparently correctly, at around 3-4% CPU time), but I don't have an explanation for the previous crashes.

The machine is:
Dual QC Xeon X5460 @ 3.16GHz
16GiB RAM
nVidia Quadro FX 4600
Ubuntu 9.04 w/ 2.6.28-15-server (I understood from other threads that this might be an issue, but that doesn't really add up with the fact that I didn't have a single compute error until I upgraded to CUDA 2.2 + the 2.2 seti CUDA client)
boinc ver. 6.6.37

pp:
The crash dump is still referencing the old executable. Did you update your app_info.xml? Also make sure you copy the new libcudart.so.2 and libcufft.so.2 to your projects/setiathome.berkeley.edu directory. And finally, as stated in another thread, also copy the new executable to /usr/local/bin or whatever directory you have in your PATH. I have had no problems since following this advice (well, apart from having to renice the executable to level 0 to give it enough CPU time).
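
The copy steps above could be scripted roughly like this. It is only a sketch under assumptions: the function name `install_cuda_app` and all four arguments are placeholders supplied by the caller, the two library names come from the post, and putting the executable in the project directory as well as the PATH directory is one reading of "also copy". It does not touch app_info.xml, which still has to be edited by hand.

```python
import shutil
from pathlib import Path

# Library names as given in the post above.
LIBS = ("libcudart.so.2", "libcufft.so.2")


def install_cuda_app(src_dir, project_dir, bin_dir, exe_name):
    """Copy the CUDA libraries and the executable to the suggested places.

    All four arguments are placeholders chosen by the caller; nothing here
    is read from BOINC itself.
    """
    src, project, bindir = Path(src_dir), Path(project_dir), Path(bin_dir)
    copied = []
    # Libraries go into the project directory.
    for name in LIBS:
        copied.append(shutil.copy2(src / name, project / name))
    # The executable goes next to the libraries and into a PATH directory.
    for target in (project, bindir):
        copied.append(shutil.copy2(src / exe_name, target / exe_name))
    return [str(path) for path in copied]
```

Called with the unpacked tarball as `src_dir`, projects/setiathome.berkeley.edu under your BOINC data directory as `project_dir`, and /usr/local/bin as `bin_dir`, it returns the list of files it wrote. `shutil.copy2` keeps the permission bits, so the executable stays executable.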

macros:

--- Quote from: pp on 19 Aug 2009, 11:04:18 am ---The crash dump is still referencing the old executable.

--- End quote ---

True, but I got the same with the newer one. I just picked one from the error list and didn't notice it's the old one...
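
To avoid picking the wrong result next time, a quick way to see which binary a dump came from is to pull the module names out of the stack frames. A rough sketch, assuming frames look like the ones posted earlier (`name[0xaddr]` or `name(symbol+0xoff)[0xaddr]`); `FRAME` and `modules_in_trace` are names made up for this example:

```python
import re

# A frame is "module[0xaddr]" or "module(symbol+0xoff)[0xaddr]".
FRAME = re.compile(r"^(?P<module>[^\s\[\(]+)(?:\([^)]*\))?\[0x[0-9a-f]+\]$")


def modules_in_trace(stderr_text):
    """Return the distinct binaries/libraries named in the stack frames, in order."""
    seen = []
    for line in stderr_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group("module") not in seen:
            seen.append(match.group("module"))
    return seen


# A few frames copied from the dump posted earlier in the thread.
sample = """\
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x47cba9]
/lib/libpthread.so.0[0x7f96066ac080]
/usr/lib/libcuda.so.1(cuCtxCreate+0xaa)[0x7f9606e6ffaa]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x5ace4b]
"""
print(modules_in_trace(sample))
```

The first entry in the list is the executable that crashed, which can then be compared with the name in app_info.xml.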


--- Quote ---Did you update your app_info.xml? Also make sure you copy the new libcudart.so.2 and libcufft.so.2 to your projects/setiathome.berkeley.edu directory.
And finally, as stated in another thread, also copy the new executable to /usr/local/bin or whatever directory you have in your PATH. I have had no problems since following this advice (well, apart from having to renice the executable to level 0 to give it enough CPU time).
--- End quote ---

Yes, I did all that. Anyway, it seems to be running now. Because I didn't make one change at a time, I don't know exactly what the cause was.  ;) ::)
Besides, it's just the first WU; hopefully there will be no more errors.

edit: It works. Finally :)
