Unfortunately, in app_info.xml for MB+AP (both 32- and 64-bit), line 25 has a very small error: one '.' too many.
I guess you made your point then :P (couldn't help myself :-X)
I crunched several units. One was already correctly validated. The others have error messages:
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
....
I have a GeForce 8800 GT. What's wrong here? :(
OK... To jump in with both feet and an AMD Athlon64 X2 + nVidia 8600 GT and...
Read all about the fun on s@h: Boinc 6.6.2 just released (CUDA) (http://setiathome.berkeley.edu/forum_thread.php?id=51556)
Briefly, it appears to be working with two nice 19 tasks and additionally a nice 10 s@h CUDA task. Except that the Boinc Manager consistently crashes whilst starting. The boinc client runs on unperturbed...
Hmm, 256MB seems a bit borderline in your case. I've got a GTX 280 and I've never had out-of-memory errors. Do you have any compiz 3d effects enabled?
Nope.
Simply 1280x1024, no effects, 4 desktops, running Mandriva 2009.0 with KDE4.
Any diagnostics that can be done?
Cheers,
Martin
I'm using the latest official stable drivers 180.22. Go to http://www.nvidia.com/object/linux_display_ia32_180.22.html to grab them.
Have you overclocked your card? Do you know what temperatures it gets when running cuda?
I also tried seti_home in text console standalone mode, but then all WUs produce lots of strange errors. Is CUDA only supported from within an X session?
CUDA (of course) only works when the Nvidia driver is loaded. This is usually done during start of the X server/subsystem. I guess the driver wasn't loaded/unloaded when you started/switched to standalone mode...
I have another question:
CPU utilization is still almost 100% for SETI and the temperature on the graphics card stays low (50 °C).
Task Status shows "Running (0.04% CPUs, 1 CUDA)"; progress proceeds very slowly.
Is it possible that the CPU is too weak to feed the graphics card (Athlon X2 3800+)?
The CUDA Linux app on my Q6600 with 9800GT (512MB) is also consuming 100% (so one full core) CPU time :-/
Just wonder how to force CUDA MB to use 100% of CPU?? ::) ::) ::)
The CUDA Linux app on my Q6600 with 9800GT (512MB) is also consuming 100% (so one full core) CPU time :-/
Yes, for performance reasons (both the cuda app itself and general desktop responsiveness), Crunch3r made it that way.
Really, more likely it's CPU fallback mode...
Check my results, WUs with angle range of ~0.43 take just ~19 minutes, those took close to 1h before on my machine...
Raistmer, this is what Crunch3r said:
P.S. yes, the app needs a full core, cuz i've removed that part of the code that messes around with that on windows.
It also slows down the gpu if you'd run a 4+1 config.
Since it uses a full core, cpu time is actually wall time. It's not like in windows where cpu time << wall time.
OK, replace walltime with CPU time in my previous post. Anyway, I'm interested in those numbers.
I attached the CUDA start and stop messages from the log, run times are ~ the same I think...
Where to get this PG*.wu test set?
In the download area of this site, in the benchmarking section.
I had a look at Hefto99's host (http://setiathome.berkeley.edu/results.php?hostid=4774614) today; he's now completed two Cuda WUs with Crunch3r's 608 r06 Cuda app,
both have completed on CPU fallback after 'CUDA runtime ERROR in device memory allocation' in about 15K secs (4 hrs),
no credit granted yet.
Claggy
Hm.. As I expected... Still interested in times for (clean) CUDA work with 100% CPU allocation. I don't understand why 100% CPU usage is needed for the CUDA app at all. That's why I want to look at elapsed times for tasks I know, not some "wild" task from the server...
OK, I have finally got it running under the XFCE display manager :)
A 14.21-credit WU finished in under 15 minutes, although it is still falling back:
CUDA runtime ERROR in device memory allocation (Step 1 of 3). Falling back to HOST CPU processing...
Other tasks look much better, no more runtime ERRORS :)
And CPU usage with those tasks?
My CPU is running at 2GHz (Athlon X2 3800+), the graphics card is an 8600 GT at default clocks, openSUSE 11.1 64-bit, here are some results:
I see pretty much the same on my system for an AthlonXP 6400+ and 8600GT GPU (256 MB VRAM).
[...]
CPU utilization is almost 100% for SETI
I see pretty much the same [100% CPU] on my system for an AthlonXP 6400+ and 8600GT GPU (256 MB VRAM).
And for a brief comparison of a very few examples (sorting by AR):
Is the CPU doing a busy-wait poll of the GPU? Or why the high CPU utilisation?
As an experiment I'm keeping the CPU priority down to nice 19 (instead of the default 10) to see if there is any slowdown for the CUDA processing. However, that only reduces the CPU load to between 75% and 90% for a core...
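If you want to see how much CPU the cuda app really takes (and whether it is just spinning while it waits for the GPU), sampling the process is enough. A minimal sketch, assuming the binary name used earlier in this thread; adjust the name and use whatever monitoring tool you have installed:

PID=$(pgrep -f setiathome-CUDA | head -n 1)

# five one-second samples of that process's CPU share
top -b -d 1 -n 5 -p "$PID" | grep "$PID"

# or, if the sysstat package is installed:
# pidstat -p "$PID" 1 5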
I have been trying to get the SETI MB CUDA client running, and every work unit so far immediately hits a computation error as soon as it tries to crunch.
The error log looks like this:
[...]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu: error while loading shared libraries: libcufft.so.2: invalid ELF header
[...]
The machine has an AMD Opteron 248, with 2 Quadro 5600 FX. These are equivalent to GeForce 8800 GTXs and G80 chipsets. I am using CentOS 5.2 (x86_64) with Linux Kernel 2.6.18-8.el5. I am using NVIDIA drivers x86_64-180.22 and CUDA toolkit 2.1 64-bit. I have tried BOINC clients 6.4.5 and 6.6.2. I have tried using the CUDA toolkit libs and the ones provided in Crunch3r's package.
Have any of you run into this problem and/or have any suggestions?
chess@chess-desktop:~/Documents/setiathome-CUDA-6.08.x86_64$ ldd setiathome-CUDA-6.08.x86_64-pc-linux-gnu
linux-vdso.so.1 => (0x00007ffff01ff000)
libcufft.so.2 => /usr/lib/libcufft.so.2 (0x00007f80e7cbc000)
libcudart.so.2 => /usr/lib/libcudart.so.2 (0x00007f80e7a7e000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f80e7607000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f80e72fa000)
libm.so.6 => /lib/libm.so.6 (0x00007f80e7075000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f80e6e59000)
libc.so.6 => /lib/libc.so.6 (0x00007f80e6ae7000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f80e68e3000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f80e66cb000)
librt.so.1 => /lib/librt.so.1 (0x00007f80e64c2000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f80e62aa000)
/lib64/ld-linux-x86-64.so.2 (0x00007f80e7fd6000)
Operating System Linux
2.6.27-11-generic
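For what it's worth, an "invalid ELF header" from the loader usually means the library it picked up is the wrong architecture (e.g. a 32-bit libcufft found first by the 64-bit app) or a damaged file. A quick check, using the paths from the ldd output above:

# what each library actually is (-L follows symlinks)
file -L /usr/lib/libcufft.so.2 /usr/lib/libcudart.so.2 /usr/lib/libcuda.so.1
file -L setiathome-CUDA-6.08.x86_64-pc-linux-gnu

# or look at the ELF header directly; Class should say ELF64 for the 64-bit app
readelf -h /usr/lib/libcufft.so.2 | grep -E 'Class|Machine'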
Now I updated to 180.22. But same situation. The card is not overclocked and temperature is 62 C. There are two types of cases:
- small WUs with a time to completion of ~7 min are running fine without any errors and granted credit is the same as claimed credit (14-15)
- big WUs with a time to completion of ~25 min have the gaussfit error message, and the granted credit is smaller than the claimed credit (claimed ~50, granted ~40)
Also, as I see from workunits page, Windows clients crunch units for 100-200 seconds on 8800 GTS 256MB, while my 8600 GT 256MB run about 1000-2000 seconds for the same unit, it's a huge abnormal difference for similar hardware.
The windows version is faster than the linux version.
http://setiathome.berkeley.edu/workunit.php?wuid=423584760
More correctly - the Linux build uses much more CPU time than the Windows one. Why it does so - that's the question.
No, you missed the point: if the CPU is busy - it's busy.
But if the CPU is free - it can be used somewhere else.
It seems that in Linux the CPU is busy the whole time the CUDA app runs (I can draw conclusions only from reading posts of course, I didn't run it on my own host).
Watch the Deferred Procedure Calls process (DPCs) %CPU usage in Process Explorer, with & without the Cuda app running.
For what? Elapsed == WALL CLOCK.
The next question is: does it matter? Well, no, because it is really hidden and doesn't add to wall time, but it does consume some small portion of the overall machine resources available to other apps.
SETI@home MB CUDA 608 Linux 32bit SM 1.0 - r06 by Crunch3r :p
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce 8800 GTS 512
totalGlobalMem = 536543232
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1620000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 16
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 8800 GTS 512 is okay
SETI@home using CUDA accelerated device GeForce 8800 GTS 512
setiathome_enhanced 6.01 Revision: 402 g++ (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
libboinc: BOINC 6.5.0
Work Unit Info:
...............
WU true angle range is : 0.447697
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 626.74337 0.00000
sse1_ChirpData_ak 49414.78434 0.00098
v_vTranspose4 22247.17831 0.00000
BH SSE folding 9764.43373 0.00000
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Lysius, what kind of cuda client are you running? Those tasks marked "Error while computing" seem to have been killed with the VLAR autokill. As far as I know, no linux cuda client with a VLAR autokill function has been released.
Yes Lysius, there is something wrong with the 32bit app. I've found and downloaded one of the WUs that you've done ( http://setiathome.berkeley.edu/workunit.php?wuid=424266619 ). The 32bit app gives me the same errors, while the 64bit app is good. Worse still, the two results are weakly similar.
Any news on this?
And is there a guide anywhere on how to test an app without using BOINC and reporting possible garbage?
Any news on this?
Unfortunately no. All 32bit builds had the same error. If you can, run the 64bit app; otherwise it's better not to run CUDA for the time being.
Is the source code for this available? Just wondering if it would be possible to compile a 32bit one myself just to test. I would really like to put a 9500GT to work (my other htpc is based on xbmc with the vdpau mod; I'd just like to try if crunching and htpc-ing are possible at the same time).
You could start with the sources from Berkeley's SVN.
# ldd setiathome-CUDA-6.08.x86_64-pc-linux-gnu
libcufft.so.2 => /usr/lib64/libcufft.so.2 (0x00002b1499d87000)
libcudart.so.2 => /usr/lib64/libcudart.so.2 (0x00002b149a0a1000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00002b149a2df000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000031b0e00000)
libm.so.6 => /lib64/libm.so.6 (0x00000031ac200000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00000031aca00000)
libc.so.6 => /lib64/libc.so.6 (0x00000031abe00000)
libdl.so.2 => /lib64/libdl.so.2 (0x00000031ac600000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000031b0a00000)
librt.so.1 => /lib64/librt.so.1 (0x00000031b0600000)
libz.so.1 => /usr/lib64/libz.so.1 (0x00000031ace00000)
/lib64/ld-linux-x86-64.so.2 (0x00000031aae00000)
14:52 [---] Starting BOINC client version 6.6.20 for x86_64-pc-linux-gnu
10-May-2009 13:54:52 [---] This a development version of BOINC and may not function properly
10-May-2009 13:54:52 [---] log flags: task, file_xfer, sched_ops
10-May-2009 13:54:52 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3 c-ares/1.5.1
10-May-2009 13:54:52 [---] Data directory: /usr/lib/BOINC
10-May-2009 13:54:52 [SETI@home] Found app_info.xml; using anonymous platform
10-May-2009 13:54:52 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [Family 6 Model 26 Stepping 4]
10-May-2009 13:54:52 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
10-May-2009 13:54:52 [---] OS: Linux: 2.6.18-128.1.10.el5
10-May-2009 13:54:52 [---] Memory: 11.72 GB physical, 13.69 GB virtual
10-May-2009 13:54:52 [---] Disk: 888.98 GB total, 836.73 GB free
10-May-2009 13:54:52 [---] Local time is UTC +1 hours
10-May-2009 13:54:52 [---] CUDA device: GeForce 9400 GT (driver version 0, CUDA version 1.1, 511MB, est. 16GFLOPS)
10-May-2009 13:54:52 [---] Not using a proxy
10-May-2009 13:54:52 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4917711; location: (none); project prefs: default
10-May-2009 13:54:52 [SETI@home] General prefs: from SETI@home (last modified 02-Mar-2009 19:24:50)
10-May-2009 13:54:52 [SETI@home] Host location: none
10-May-2009 13:54:52 [SETI@home] General prefs: using your defaults
10-May-2009 13:54:52 [---] Preferences limit memory usage when active to 6001.50MB
10-May-2009 13:54:52 [---] Preferences limit memory usage when idle to 10802.69MB
10-May-2009 13:54:52 [---] Preferences limit disk usage to 0.50GB
10-May-2009 14:04:31 [SETI@home] Starting 27fe09ab.28487.20931.9.8.139_1
10-May-2009 14:31:47 [SETI@home] Computation for task 27fe09ab.28487.20931.9.8.139_1 finished
I have a core i7 920 + 9400GT 512MB, but it took about 30min (wall clock) to do a WU with CUDA.
Given the time it took to finish and that one of the CPU cores was running at full load, I suspect it is working in fallback mode, but I do not see any error message in boinc.log.
Maybe those wus were VLAR?
Also, as BOINC starts 2 normal CPU crunchers on my C2D E4400 and SETI-CUDA additionally grabs one CPU at 100%, the crunchers start fighting for the second CPU and everything goes very slowly, including X-server response time.
Finally, I tried Slackware64-current, and so I tried the x86_64 version of Seti-Cuda with the same hardware.
The symptoms are the same - slowdown of X-server response time, so I don't like it....
Can you give a link to your host?
If that was meant for me, you need to explain that to me like the noob I am. ;) You want me to provide more detailed system info? Just let me know what you need.
Yes, this is an SSE3-capable Athlon64. My nvidia drivers are 180.60, and if you know what the boinc client is looking for to identify them, I can inform the maintainer.
It doesn't matter much what boinc shows. Get the newer nvidia driver if you can.
I compiled the trunk version of boinc successfully yesterday, but I'll have to go through that Makefile in detail before I swap to it.
Why compile? Use the ready packages from the links I gave you above.
The app_info.xml that came with the CUDA client contains several identical sections but with different version numbers. Aren't they redundant?
Some of them maybe, but leave them as they are. It doesn't matter.
# grep version_num app_info.xml
<version_num>528</version_num>
<version_num>603</version_num>
<version_num>605</version_num>
<version_num>606</version_num>
<version_num>608</version_num>
I downloaded the SSE3-capable version of Astropulse v5 and there's a new app_info.xml included with it. Can I just add that info to my current app_info.xml to have boinc use both programs?
Get the 64bit one if you haven't already; it's much faster than the 32bit one. Yes, add the info to your current app_info.xml.
To have it use two instances on my dual core, I assume that's controlled by the "100% CPU" setting in my S@H profile?
That's a bit more complicated. If you want to run 2+1 (that is, 2 astropulse on your cpu and 1 multibeam on your gpu) you'll need to get the 185.18.14 nvidia driver just to begin with. More tinkering will be needed, but we are here to help. If you stay with your current 180.60 you can only use 1+1 (1 cpu + 1 gpu).
I'll see if I can try to isolate the cpu hog first before I try Astropulse.
What cpu hog?
I was referring to the behaviour dtiger reported last in his post a few pages back in this thread: http://lunatics.kwsn.net/linux/seti-mb-cuda-for-linux.msg14847.html#msg14847
185.19 also gives computation errors and it's the latest from nvidia's ftp. Guess my energy has to focus on the boinc version next... I'll just have to empty my queue before I continue experimenting.
Experiments with CUDA 2.2
I tried cuda 2.2 with 185.18.08 beta driver and our 64 bit linux app.
1. 185.18.08 driver isn't compatible with 2.1 cuda libraries. You have to install the 2.2 ones.
2. Current optimized app doesn't have any problem running with the new 2.2 libs.
3. With cuda 2.2, the current linux app exhibits the same behavior as the windows app. It no longer uses a full core: only for the first few seconds, and then cpu utilisation hovers around 0-2%. Now in linux we potentially (see #6) can also use a 4+1 config, not only 3+1 as it is now.
4. Computation time is better with 2.2. Using file creation/modification timestamps (since it is now impossible to get accurate computation times for 2.2), I got for a random 0.44 AR wu:
2.1 9min 30sec
2.2 8min 28sec
5. Results are strongly similar
6. While standalone it runs ok, under BOINC I couldn't make it run, I get instant computation errors.
Huh, once every few days all CUDA WUs error out (libraries etc).
http://setiweb.ssl.berkeley.edu/result.php?resultid=1262888457
boinc restart, and all is OK.
Iztok
Do you switch your wus back and forth between cpu and gpu? I caught these two wus:
http://setiweb.ssl.berkeley.edu/result.php?resultid=1260953059
http://setiweb.ssl.berkeley.edu/result.php?resultid=1260953058
Also I've read various bad things about 6.6.3x boinc versions. Maybe try a different boinc? I'm using 6.6.20.
2.2 libraries?
What do you mean?
Also try a different boinc version.
Your app_info.xml seems ok. If I'm not mistaken, 6.4.5 doesn't support running two different versions of the same app simultaneously (e.g. 6.03 on the cpu and 6.08 on the gpu). If you already have 6.03s it will run them, but all new wus will be 6.08 (gpu).
6.3.20 is then next to try, after SETI starts sending new job again.
Hi Tye!
I don't have much time right now, I'll post back in a few hours.
If you wish to install the NVIDIA Linux graphics driver on a Debian GNU/Linux or Ubuntu system that ships with Xorg 7.x, please ensure that your system meets the following requirements:
* development tools like make (build-essential) and gcc are installed
* the linux-headers package matching the installed Linux kernel is installed
* the pkg-config and xserver-xorg-dev packages are installed
* the nvidia-glx package has been uninstalled with the --purge option and the files /etc/init.d/nvidia-glx and /etc/init.d/nvidia-kernel do not exist
If you use Ubuntu, please also ensure that the linux-restricted-modules or linux-restricted-modules-common packages have been uninstalled. Alternatively, you can edit the /etc/default/linux-restricted-modules or /etc/default/linux-restricted-modules-common configuration file and disable the NVIDIA linux-restricted kernel modules (nvidia, nvidia_legacy) via:
DISABLED_MODULES="nv nvidia_new"
Additionally, delete the following file if it exists:
/lib/linux-restricted-modules/.nvidia_new_installed
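A command-line sketch of those prerequisites on Ubuntu (package names as listed above; exact names can differ between releases, so treat this as a guide rather than a recipe):

# build tools, kernel headers and X development packages
sudo apt-get install build-essential pkg-config xserver-xorg-dev linux-headers-$(uname -r)

# remove the packaged driver completely
sudo apt-get remove --purge nvidia-glx
sudo rm -f /etc/init.d/nvidia-glx /etc/init.d/nvidia-kernel

# Ubuntu only: drop the restricted modules (or disable them as described above)
sudo apt-get remove --purge linux-restricted-modules-common
sudo rm -f /lib/linux-restricted-modules/.nvidia_new_installed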
(--) PCI: (0@1:0:0) nVidia Corporation GeForce 9600 GSO rev 162, Mem @ 0xcc000000/16777216,
0xb0000000/268435456, 0xca000000/33554432, I/O @ 0x00009c00/128, BIOS @ 0x????????/131072
(--) PCI: (0@3:0:0) nVidia Corporation GeForce 9600 GSO rev 162, Mem @ 0xc8000000/16777216,
0xa0000000/268435456, 0xc6000000/33554432, I/O @ 0x00008c00/128, BIOS @ 0x????????/131072
...
(II) NVIDIA(0): NVIDIA GPU GeForce 9600 GSO (G92) at PCI:1:0:0 (GPU-0)
(--) NVIDIA(0): Memory: 786432 kBytes
(--) NVIDIA(0): VideoBIOS: 62.92.4c.00.06
...
(II) NVIDIA(GPU-1): NVIDIA GPU GeForce 9600 GSO (G92) at PCI:3:0:0 (GPU-1)
(--) NVIDIA(GPU-1): Memory: 786432 kBytes
(--) NVIDIA(GPU-1): VideoBIOS: 62.92.4c.00.06
Device 1 : GeForce 9600 GSO
totalGlobalMem = 804585472
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1350000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 12
Device 2 : GeForce 9600 GSO 512
totalGlobalMem = 536608768
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1600000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 6
BTW, the 9.04 kernel did have an update recently, to 2.6.28-13 rather than the 2.6.28-11 that it shipped with. Does the newer kernel still have the same problems?
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:01:00:0"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:03:00:0"
EndSection
Option "SLI" "False"
Also, I'm willing to use the 8.10 kernel but I can't seem to make it install with my vbox stuff. Any hints?
When it gets near the end of the kernel packages install, it fails on adding two things for virtualbox into the kernel.
If I just try with the -image and the -headers packages, the -headers fail, but the image installs and shows up in the menu.lst. Is that all I really need? I get the error that the -headers package can't be installed because it depends on itself. ;)
BTW, the modules that fail now are nvidia (180.44), vboxdrv (2.1.4), and vboxnetflt (2.1.4).
Came across an interesting error message in task 1294937260 (http://setiathome.berkeley.edu/result.php?resultid=1294937260) while researching something else:
SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r :p
Error: API mismatch: the NVIDIA kernel module has version 180.29,
but this NVIDIA driver component has version 180.60. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
setiathome_CUDA: Found 1 CUDA device(s):
Cuda error 'cudaGetDeviceProperties( &cDevProp, i )' in file './cudaAcceleration.cu' in line 138 : initialization error.
Something to watch for when fiddling about with Linux drivers and modules.
The anonymous owner of host 5011059 (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5011059) seems to be having a real problem getting his or her GTX 295 running under gentoo.
Figured out what the crashiness was - turns out that the 9600 GSOs I had that were 768 do not like being in the primary PCI-Ex slot. If I put them in the secondary slot and use a different card in the primary, then all is good. Even the 9600 GSO 512 does fine in the primary - just not my two GSO 768s. Argh. Looks like no double-CUDA'ing with them, but might try mixed-mode with the 512 if I can pick up another card and put it where the 512 is being used now...
A GPU that is used by Windows for video output will be subject to a 3- or 2-second timeout, but a second GPU will not.
Don't know if this is relevant to Linux though.
I don't use Xorg on this machine.
He hasn't installed the NVIDIA drivers properly.
Not sure it exists in linux. It's not a GPU feature, it's a windows feature - it will kill the driver (Vista) after more than 2 secs of "no answer" from it.
Don't know if the Linux kernel implements such a watchdog mechanism or not.
GPUs that don't output video aren't subject to this "driver hung" check and can run long kernels. That's why surely not all that works OK on a Tesla will work OK on users' GPUs (even if newer GPUs are slightly faster than the first released Teslas IMHO)
....Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached.....
Yes. In embedded microcontroller system terminology, that's called a "Watchdog Timer". Crazy people program GPUs to take longer than that; Lunatics try to fix it.
Raistmer, do you mean something like this? From the cuda 2.2 release notes:
o Individual GPU program launches are limited to a run time
of less than 5 seconds on a GPU with a display attached.
Exceeding this time limit causes a launch failure reported
through the CUDA driver or the CUDA runtime. GPUs without
a display attached are not subject to the 5 second run time
restriction. For this reason it is recommended that CUDA is
run on a GPU that is NOT attached to an X display.
So yes, it also exists in linux.
Exactly. Timer value varies between OSes but it's the same thing.
Curiously, I've crunched tens of thousands of workunits with my GPU that also runs X without ever seeing that kind of error.
Well, maybe you have a faster GPU than the user who has issues?...
Well, if it is because of the first gpu also drawing the screen then it will probably also exist in linux. We don't have a big sample of seti cuda users with multi gpus in linux. Actually the sample is non-existent :D
What Tye describes might be some faulty config, strange driver behavior, or some weird motherboard-gpu-gpu hardware incompatibility.
Your system sees three devices.
In your host 5018683, boinc doesn't even see your graphics cards. Are you sure that you have installed them correctly?
Also in both of your hosts upgrade boinc. 6.4.5 is too old.
6.4.5 is currently marked as stable on linux, that's why I'm using it,
and as you can see there is no problem with the second host (http://setiathome.berkeley.edu/show_host_detail.php?hostid=5011059).
Well, maybe you have a faster GPU than the user who has issues?...
I have a GTX280 and he a GTX295; each GTX295 core is more or less the same as a GTX280 under cuda. Him saying that he is not running X makes the error message much more strange.
I may look and see if there's a newer BIOS later.
I have a GTX280 and he a GTX295; each GTX295 core is more or less the same as a GTX280 under cuda. Him saying that he is not running X makes the error message much more strange.
Probably we can conclude from that that this error is not connected with watchdog timer expiration. Some other reason...
6.4.5 is currently marked as stable on linux, that's why I'm using it.
Disregard that and get a newer version.
...
- with 6.6.36, tasks don't run; they hang with "Waiting" status. So I enabled all the debug flags in cc_config, but it gives me no answer as to what it is waiting for.
Follow all steps (1-4) below:
1) Use a newer boinc version. The latest is 6.6.36, http://boinc.berkeley.edu/download_all.php . I haven't checked it, I use 6.6.20, direct download link http://boinc.berkeley.edu/dl/boinc_6.6.20_x86_64-pc-linux-gnu.sh
2) Make sure all the appropriate cuda libs from 2.2 toolkit
libcudart.so
libcudart.so.2
libcudart.so.2.2
libcufft.so
libcufft.so.2
libcufft.so.2.2
are in the projects/setiathome.berkeley.edu directory.
3) Edit accordingly your ld.so.conf or the corresponding ld-something file of your distro with the above location of the cuda libs.
4) Place a copy of the cuda client in one of the following locations:
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/usr/games
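A shell sketch of steps 2-4, assuming the 2.2 toolkit put its libraries in /usr/local/cuda/lib, BOINC's data directory is /var/lib/boinc-client, and the client file name is the one used later in this thread; all three are assumptions, adjust them for your install:

PROJ=/var/lib/boinc-client/projects/setiathome.berkeley.edu

# 2) copy the 2.2 runtime and FFT libraries into the project directory
sudo cp -a /usr/local/cuda/lib/libcudart.so* /usr/local/cuda/lib/libcufft.so* "$PROJ"

# 3) tell the dynamic loader where they are (append to /etc/ld.so.conf directly
#    if your distro has no ld.so.conf.d directory), then rebuild the cache
echo "$PROJ" | sudo tee /etc/ld.so.conf.d/seti-cuda.conf
sudo ldconfig

# 4) put a copy of the cuda client somewhere in the default PATH
sudo cp "$PROJ"/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu /usr/local/bin/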
...
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
...
When I start Boinc, it reports 2 Tesla cards instead of the proper ones. Older boincs properly identify both cards. If this were just a naming problem I could live with this but....
With the above coproc statement set to 1,
When I do a ps ax to look at my process list this is what I see:
7987 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
7988 ? RNLl 0:01 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
and it uses the GTX285 for both simultaneously!
When I have the coproc statement set to 2, it uses the Tesla only and runs only 1 process. it has both device numbers but the GTX285 is not used:
10170 ? RNLl 0:07 setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0 --device 1
How can I get this to do the right thing and provide me with processes like these using both cards?
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 0
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu --device 1
How can I fix this? I know others are using 2 cards successfully.
Because Boinc versions greater than 6.6.25 only use the most capable GPU, use a cc_config.xml with this in it:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
See How do I configure my client using the cc_config.xml file?
(http://boincfaq.mundayweb.com/index.php?language=1&view=91) for more options and debug flags.
Claggy
...
<coproc>
<type>CUDA</type>
<count>2</count>
</coproc>
...
Start by changing the 2 (highlighted above) to a 1. This tag specifies how many GPUs each application instance uses. AFAIK so far they only ever use 1. Other stuff, I'm sure some more Linux-savvy people can help you with.
Jason
If this has already been posted, please ignore, but the BOINC developers for some reason decided to use only the most capable GPU by default. Before that they had something which would use more than 1 only if they were identical, etc.
Anyhow, if you have more than one GPU and want BOINC 6.6.25 or later to use all, you need a cc_config.xml (http://boinc.berkeley.edu/wiki/Client_configuration) with the option <use_all_gpus>1</use_all_gpus>.
Joe
\\skip@c17-desktop:/var/lib/boinc-client/projects/setiathome.berkeley.edu$ ldd setiathome-CUDA-6.08.x86_64-pc-linux-gnu
linux-vdso.so.1 => (0x00007fff69fff000)
libcufft.so.2 => /usr/lib/libcufft.so.2 (0x00007f16618de000)
libcudart.so.2 => /usr/lib/libcudart.so.2 (0x00007f16616a0000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f16611ff000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1660ef2000)
libm.so.6 => /lib/libm.so.6 (0x00007f1660c6d000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1660a51000)
libc.so.6 => /lib/libc.so.6 (0x00007f16606df000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f16604db000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f16602c3000)
librt.so.1 => /lib/librt.so.1 (0x00007f16600bb000)
libz.so.1 => /lib/libz.so.1 (0x00007f165fea3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1661bf8000)
I can confirm the above, I tried it with 6.6.36, 6.6.20 and 6.4.5 and ONLY 6.4.5 worked correctly (one WU to one GPU). Both 6.6.20 and 6.6.36 put both on the same device, and 6.6.36 I had to do funky things to get it working (had to install 6.6.20, run it, shut down, install 6.6.36 without removing anything, start 6.6.36... and repeat that any time I shut it down, VERY odd).
It could just be something with our OS, Ubuntu 8.10 x64, but for now 6.4.5 works... now I just need to find some way to get the CPU to do some work.
If someone remembers any showstopper bugs with 6.6.11 please tell.
That was just before the bombshell about not supporting app_info.xml properly. v6.6.12 doesn't support app_info, v6.6.14 started to put it back together (there was no .13). So you may have problems getting v6.6.11 working with both CUDA and an optimised CPU app, or indeed Astropulse.
Now that you mention it, I think all the newly downloaded wus went to cuda 6.08. No assignments to 6.03. Maybe it didn't ask? :-\. When did boinc start supporting both 6.03 and 6.08 concurrently?
I'd need to check if you need a detailed answer, but I think:
v6.6.1 for stock applications
v6.6.14 with app_info.xml
Edit: v6.4.5 allows you to specify and run both apps, but will only fetch for the higher-numbered one.
If you could find when boinc started to fetch for both 6.03 and 6.08, I would be very grateful.
I'll keep monitoring 6.6.11 to see how it behaves. If it can't fetch for 6.03, I'll have to do it manually till a fixed boinc is released. I'll try to report the bug in boinc_alpha and I hope I'll get some attention.
LOL I just checked my servers page and although it does everything right locally it is just the opposite of all the others in reporting! instead of what others report 2-teslas, this one reports to the page or at least the page displays 2 gtx285! oh well. minor problem affecting nothing but my ego :)
it got a few cuda workunits so it is fetching both. it appears 6.6.11 works fine.
yeah, it got near 100 cpu AK workunits before it got the cuda ones. it is downloading both properly. it seems to work just fine.
Well, then, there is nothing more we could ask. 6.6.11 it is for multi-gpus in linux.
is the flops section of the app info file really important? i'm not using it. my file is below. does the addition of the flops help with processing efficiency?
does the cpu-gpu perl script V5 available in another topic actually catch vlars and vhars? maybe it doesn't report them because i have not seen it report any yet. a little concerned because i have had several computation error workunits from the 'vlar-killer 2.2 cuda' app and with task viewing turned off at seti i cannot tell.
I haven't used that script. Might be that vlar_kill has a greater vlar angle range than the script is looking for? To see if it is working correctly, after a script run that doesn't report anything, make a manual search in the workunits directory to see how many files (workunits) contain the text <true_angle_range>0.0 and then see if some of them are still assigned to cuda.
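That manual search boils down to a grep. A sketch, assuming the usual BOINC data directory; adjust the path to your setup:

cd /var/lib/boinc-client/projects/setiathome.berkeley.edu

# list workunit files whose true angle range starts with 0.0 (VLAR candidates)
grep -Il '<true_angle_range>0\.0' * 2>/dev/null

# just count them
grep -Il '<true_angle_range>0\.0' * 2>/dev/null | wc -l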
the other thing is: would it be better to have my ratio set so that boinc never sees a shortage of cuda workunits, so it never gets cuda workunits but only cpu workunits, and then let this script supply the gpu work? i would think if it handles vlars it would be the most efficient way to never get one scheduled for cuda?
In both cases you don't avoid using the script. Choose whatever suits you better.
my tesla cannot take them. its an early engineering version of it and it locks right up on vlars. this is the main reason i chose to run the vlar killer app to protect it.
I see this line:
<file_name>setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu</file_name>
But I only have the old app without the 2.2 part - where are you getting this CUDA app?
sunu reported the url, but a reminder: you need to install the 2.2 cuda sdk and tools too, and update the libs in every directory you have them in from the sdk lib dir - especially /usr/lib64, the boinc dir and the seti project dir - and make sure those paths are in ld.so.conf and rerun ldconfig.
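After copying the libraries and editing ld.so.conf it's worth checking that the loader and the app really resolve to the 2.2 copies, for example:

sudo ldconfig                                   # rebuild the loader cache
ldconfig -p | grep -E 'libcudart|libcufft'      # which copies the loader knows about
ldd setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu | grep -E 'cudart|cufft'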
Any noticeable speedup?
Will it work with the new 190 driver and CUDA 2.3?
With the 185 driver and 2.2 libraries, I tried the 2.2 VLARkill app and putting the libraries in /usr/lib, the boinc folder, the proj*/setiat* folder, and my ~/bin but I kept getting computation errors with the 2.2 VLARkill app. Argh.
Still only single GPU, but I've got the CPU and GPU both doing MB stuff thanks to you guys now. Will pick up another GPU at some point though.
Multi-gpu problems of the past with seti CUDA are pretty much solved now in linux.
stupid question.. what sets boinc's timing to report completed workunits?
There's a whole long list of possible triggers (see John McLeod VII's post at SETI for all the gory details), but in general no 'ready to report' task should hang around on your computer for longer than 24 hours. So they should disappear at least once per day without you doing anything.
Thanks a lot for your tips sunu,
it runs very well now; I have got my first valid WUs on my GTS250 (about 5-10 min for a WU). It uses about 2% of CPU time to run, so I can crunch 2 other seti WUs on my C2D E7300.
Bye
After crunching other projects for some months, I restarted SETI@GPU just this morning, but using the initial CUDA build for Linux, which uses 100% of one CPU core as well...
So these new versions, that can be found on crunch3r's board, will use only a few % of one CPU? That would be awesome...
I'm currently still at 180.60, some 2.6.30 rc and 6.6.17, but willing to update if I could free up that core with a newer version of the app :)
I can confirm that setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu no longer takes 100% CPU (only about 2-4% now on this PC). That's with CUDA 2.2 - I haven't been brave enough to upgrade to CUDA 2.3 yet. Also using nVidia 185.18.14 and BOINC 6.9.0. The only downside so far is that the CPU time column now only shows actual CPU time used which is only a couple of minutes during the 19 minutes CUDA run. So no good way of checking exactly how long a WU takes now unless I monitor the clock manually.
Yup, that's been bothering me too. I'm wondering if there's a way to trick it into reporting clock time rather than cpu time... I'm using nvidia 185.18.14 and BOINC 6.6.11 btw since I'd like to do multi-GPU here soon.
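Until boinc/boincmgr shows elapsed time for these tasks, one workaround is to read the wall-clock age of the running process directly; a sketch, using the binary name from this thread (adjust it to yours):

# elapsed (wall-clock) time of the running cuda task(s), format [[dd-]hh:]mm:ss
ps -o pid,etime,args -p $(pgrep -f setiathome-6.08.CUDA)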
Try nvclock: http://www.linuxhardware.org/nvclock/#download
The current 0.8b4 is already part of Ubuntu Jaunty; if you are using some older version or another flavour of Linux you might want to compile it yourself, if it isn't available at that level. Around December/January quite some progress was made in that tool, though it's still not supporting all nvidia cards.
My 8800GTS also uses the ADT7473, temperature readings show an offset of 8°C to the nvidia-settings temperature reading, fan controlling is working.
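For reference, two ways to read the temperature on that kind of setup (sensor detection and a running X server are assumptions here): lm-sensors for the ADT7473 chip mentioned above, nvidia-settings for the driver's own reading.

sensors                             # needs lm-sensors with the adt7473 chip detected
nvidia-settings -q gpucoretemp -t   # needs a running X server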
I see this line:
<file_name>setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu</file_name>
But I only have the old app without the 2.2 part - where are you getting this CUDA app?
Get it from http://calbe.dw70.de/mb/viewtopic.php?f=9&t=110
Could you provide a link to your host, or some failed work units?
If you follow the instructions provided by sunu and make sure the Nvidia modules are loaded, then it will also work for you :)
I'm running the 190.18 driver with CUDA 2.3 libraries and the 2.2VLARkill app now on two machines with G92 chips, so far no issues.
[...]
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 8600 GTS is okay
SIGSEGV: segmentation violation
Stack trace (16 frames):
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x47cba9]
/lib64/libpthread.so.0[0x7f0954f4f0f0]
/usr/lib64/libcuda.so.1[0x7f09559c3920]
/usr/lib64/libcuda.so.1[0x7f09559c9684]
/usr/lib64/libcuda.so.1[0x7f0955992a0f]
/usr/lib64/libcuda.so.1[0x7f095571e296]
/usr/lib64/libcuda.so.1[0x7f095572ebab]
/usr/lib64/libcuda.so.1[0x7f0955716190]
/usr/lib64/libcuda.so.1(cuCtxCreate+0xaa)[0x7f095571000a]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x5ace4b]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x40d4ca]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x419f23]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x424c7d]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x407f60]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f0954bec576]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu(__gxx_personality_v0+0x241)[0x407be9]
Exiting...
</stderr_txt>
]]>
After crunching other projects for some months, I restarted SETI@GPU just this morning, but using the initial CUDA build for Linux which uses 100% of one CPU core as well... So these new versions, that can be found on crunch3r's board, will use only a few % of one CPU? That would be awesome... I'm currently still at 180.60, some 2.6.30 rc and 6.6.17, but willing to update if I could free up that core with a newer version of the app :)
The 100% core usage was a bug with 2.1 and earlier linux cuda libs. With 2.2 and later this has been fixed. Our seti cuda client had nothing to do with it. So you can use anything you like as long as you use 2.2 or later cuda libraries.
Yup, that's been bothering me too. I'm wondering if there's a way to trick it into reporting clock time rather than cpu time... I'm using nvidia 185.18.14 and BOINC 6.6.11 btw since I'd like to do multi-GPU here soon.
Are you talking about boinc manager? Some time down the road, boinc manager changed from cpu time to elapsed time. If you use boinc 6.6.11 for multi-gpu you can use a later boinc manager version that shows the elapsed time. I'm using boinc 6.6.11 with boinc manager 6.6.37.
6.6.37 was reporting proper cpu/gpu times, but when i went back to 6.6.11 to use multiple devices that time reporting broke. i am not sure if adding a flops statement in app_info.xml will help with that or not.
See my previous reply; and no, flops in app_info.xml will not help.
The default priority of nice 10 seems to slow the process down on my box; once I switched it to 0 or -5, it processed much faster and collected up CPU time quicker.
Are you talking about the CPU client or the GPU one?
I tried to go to the link, but every time I go to calbe.dw70.de I get access denied... is there another place to get it?
Kunin, I've attached it below.
Has anyone tried the CUDA 2.2 client together with the 190.xx drivers and the CUDA 2.3 libs, to see if there is some speed-up like under Windows?
I haven't run a comparison but there is no harm in using it.
does anyone know if there is a cuda 2.3 vlarkill x86_64 app available yet? i am switching everything to 2.3 and the 190 driver today.
Unless Crunch3r makes one... But I don't think there will be any worthwhile speedup (at least in windows there isn't).
I'm trying to get CUDA working (with the 32 bit binary posted at message 1 of this thread) with my new 8600GTS in Slackware Linux and I'm having issues. I have run the nvidia installer, etc, but I get some weird errors.
Please don't use the 32bit client. When we were testing it, it had a strange bug and didn't produce valid results. I haven't checked though if newer cuda libraries make any difference.
1. The output shows I have a cuda device; however, it says I have revision 0 of the driver installed, even though I have installed 185.18.14 and upgraded to 185.18.31.
This is just cosmetic. Don't pay attention to it.
CUDA device: GeForce 8600 GTS (driver version 0, compute capability 1.1, 255MB, est. 18GFLOPS).
2. I have modified my app_info.xml to allow both AK_V8_SSE3 (32bit) and the cuda app to run simultaneously (included .xml file). I have 3 active tasks being worked on; two (for my dual CPU) say setiathome_enhanced 6.03 and run just fine. The third is the CUDA one, setiathome_enhanced 6.08 (cuda), and its status NEVER goes past "Ready to start". It will eventually error out with a computation error. Anyone have any thoughts or advice on how to debug this?
There is an option that is on by default to not run cuda tasks when the PC is in use. Check global_prefs.xml for <run_gpu_if_user_active>0</run_gpu_if_user_active> and change that 0 to 1.
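For reference, that edit as a one-liner (run it in the BOINC data directory and restart the client afterwards; note that the project can rewrite global_prefs.xml when preferences are re-read):

sed -i 's|<run_gpu_if_user_active>0</run_gpu_if_user_active>|<run_gpu_if_user_active>1</run_gpu_if_user_active>|' global_prefs.xml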
4. I'm not using XWindows at all. This is all console-based only. Is Xorg required to be running to utilize CUDA?
Just copy & pasting from Nvidia:
#!/bin/bash
modprobe nvidia
if [ "$?" -eq 0 ]; then
# Count the number of NVIDIA controllers found.
N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`
N=`expr $N3D + $NVGA - 1`
for i in `seq 0 $N`; do
mknod -m 666 /dev/nvidia$i c 195 $i;
done
mknod -m 666 /dev/nvidiactl c 195 255
else
exit 1
fi
my gkrellm monitors were the most sensitive to this behavior and began displaying symptoms before it got to the noticeable level affecting my desktops. since i run an extremely busy set of desktops (2 of the desktops display a total of 29 gkrellm monitor strips monitoring our servers in real time) i suspect the 190 driver isn't ready for prime time yet for linux when handling more than near-idle desktop load plus cuda.
I'm also using 190.18 and 2.3 cuda with no issues. This is my everyday pc, with firefox with a multitude of tabs and many other applications opening and closing. Have you checked if gkrellm has some kind of memory leak?
Thanks, but before I use it how does the VLAR kill work? I rebrand all of my VLAR to the CPU as soon as I can, and prefer to work whatever units I get since I have the 8 cores just sitting there most of the time.
Are you talking about boinc manager? Some time down the road, boinc manager changed from cpu time to elapsed time. If you use boinc 6.6.11 for multi-gpu you can use a later boinc manager version that shows the elapsed time. I'm using boinc 6.6.11 with boinc manager 6.6.37.
Perfect - I didn't think of doing that! Thanks again, sunu - you continue to be a big help and it's definitely appreciated!
Argh - somehow it doesn't like running with 6.6.11 boinc and 6.6.36 boincmgr... Is there some trick to that I'm missing?
What problem do you have? Just copy boincmgr to your 6.6.11 installation.
Yep, that's what I did, but it just sits there frozen at the "Communicating with client" portion on startup. No messages, no display, no processes starting, etc.
@letni
Please see my big post above, I'm talking about cuda with no X server. Are you sure you've set it up all correctly? Also make sure you use compatible nvidia drivers and cuda libraries.
Also see my post http://lunatics.kwsn.net/linux/seti-mb-cuda-for-linux.msg19014.html#msg19014 and follow it to the letter.
Your card with 256 MB is borderline. You might get some out of memory messages here and there.
Thanks, but before I use it how does the VLAR kill work? I rebrand all of my VLAR to the CPU as soon as I can, and prefer to work whatever units I get since I have the 8 cores just sitting there most of the time.
Well if you miss a VLAR from the rebranding and it gets to your GPU, it will get aborted almost instantly by the client.
That was the issue the whole time. I guess I figured that just because I am not using XWindows on a headless machine, I technically wasn't using the video card. Enabling that let it do its thing :)
Thanks everyone for all your help.
letni
letni, the hosts that you've posted above are full of errors. Are you sure you've solved all your problems?
Ok, so it kills it as soon as it actually starts processing and no sooner?
Yes. When it starts to crunch a wu, it checks if this wu is a VLAR and if it is, it aborts.
I have the client running currently: 2 threads on AK_V8_64bit_SSE3 (Pentium D) and one thread on the cuda novlar 2.2 build. The CUDA is extremely slow and I have noticed that the cuda thread is taking up 100% of one (out of 2) of my CPUs, slowing down my SSE3 CPU threads. Here are the versions:
Thanks everyone for all your help.
letni
letni, can you give us a link to a wu done by the gpu?
I'm suspecting that you get an out of memory error and the wu is then done in cpu, that's why you get 100% cpu utilization by the cuda app.
That is absolutely the case. My first WU was finally finished by the "GPU" and the error messages show MALLOC out-of-memory errors. Looks like time for a better video card!
letni, what cuda app do you use? The 2.2 one or the one from the first post of this thread? The 2.2 one is very memory hungry. Try using the one from the first post of this thread and see if it helps. Mind you that it has a different name so you'll have to edit your app_info.xml accordingly.
That did the trick. No 100% CPU usage anymore with the older binary. Chugging away very fast now!! Hopefully no more computation errors!
letni, I see you've already completed a few WUs with your GPU. Happy crunching! :)
if i were to build a machine with 4 GTX295 cards, will boinc see and use all 8 cuda processors properly and efficiently, or would i be better off using 4 GTX285 single-gpu cards?
Well, that is your choice. Boinc won't have any problems at all.
And if it doesn't work you can send that machine to me... ;D
Well, probably, first, he's going to bring that machine down on my head hard for making him blow his money away, but I'll take it anyway. ;D
The default priority of nice 10 seems to slow the process down on my box, once I switched it to 0 or -5, it processed much faster and collected up CPU time quicker.
BOINC has 0, AK_V8_linux64_sse3 has 19 and setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu has 10 so obviously there's something I don't understand here. ;D Right now I'm using an external daemon to renice the processes now and then.
Back in the days when cuda needed a whole core, I was running a 3+1 config in my quad core. All processes had the lowest priority (19) and I don't think I had any serious slowdown, maybe a minute or so, not more. And this was my everyday desktop so many things were running, firefox with many many tabs, full 3d compiz effects, everyday backups, etc.
Only now that cuda shares a core with the other seti@home tasks, I started renicing them only to make them higher priority than the other seti@home instances. I think -5 is not necessary.
Great news! Maybe soon I won't have to use 6.4.5 for crunching and 6.6.11 for downloading (for some reason 6.6.11 would randomly stop using the GPUs, but 6.4.5 always says high priority so would never download new WU).
6.6.11 has a bug where, if a GPU job is running and a 2nd GPU job with an earlier deadline arrives, neither job is ever executed. Maybe you got hit by this.
I use a script running in an infinite loop to notify me when this happens. Then a boinc restart fixes it... until next time. Also turning off "leave applications in memory while suspended" in your computing preferences seem to help a bit, but it doesn't solve it completely.
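A rough sketch of what such a watchdog can look like; the ~/BOINC path, the "CUDA" process-name match and the ten-minute threshold are assumptions to adjust to your own setup:
#!/bin/bash
# Watchdog sketch: if no cuda task has been seen running for a while,
# assume the GPU-stall bug has hit and restart the boinc client.
BOINC_DIR="$HOME/BOINC"     # assumed client directory
misses=0
while true; do
    if pgrep -f "setiathome.*CUDA" > /dev/null; then
        misses=0
    else
        misses=$((misses + 1))
    fi
    # roughly 10 minutes with no cuda process -> restart boinc
    if [ "$misses" -ge 10 ]; then
        echo "$(date): no cuda task seen, restarting boinc" >> "$BOINC_DIR/watchdog.log"
        (cd "$BOINC_DIR" && ./boinccmd --quit)
        sleep 15
        (cd "$BOINC_DIR" && ./boinc --daemon)
        misses=0
    fi
    sleep 60
done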
Sounds like it since it happens randomly. On days I work (12 hour shifts) I'm at my computer maybe 3-4 hours, so the odds of me catching it are slim, hence I use 6.4.5 for crunching. I just switch to 6.6.11 to download a 5-10 day cache, rebrand it all and then go back to crunching.
Well, you could use a script to restart boinc automatically when the bug kicks in. I didn't do it because I didn't like too much complexity and in the end I preferred to have control over boinc restarts.
wow. never knew that... i don't see that in my system but that is because i run the cpugpu perl script often to catch random downloads so boinc gets restarted several times an hour.
Restarting boinc several times an hour surely squashed that bug. :D
soon as the scheduler gets fixed i'll give the new one a shot :)
Yesterday, some changes to the scheduler were introduced. The problems I posted above, it seems, were across all platforms as the changes were generic. The battle with task scheduling in boinc is an ongoing and never ending one.
I haven't checked boinc with the new changes to see how it runs. Feel free to check it. At last we'll have a "modern" boinc release with proper multi-gpu support in linux in the next round of "official" releases.
or do i have to use svn and hope for the best? :P
Yes, you have to compile from source.
Yes, trunk is the one to get.
What boinc reports is a minor cosmetic bug. The important thing is to use all gpus properly.
libcal.so is for ATI cards (something like libcudart.so for NVIDIA cards). ATI card support was added a couple of days ago for milkyway@home. It should be irrelevant to us.
I've never bothered with boincmgr while compiling from source. I use the released ones. As long as boinc works properly, we're ok.
Lately there was an increase in sensitivity so most of the recent workunits are big ones. In my pc they take about 12-15 min for the gpu and about 1:45-2:00 hours for the cpu. Boinc doesn't have anything to do with the speed of computations, unless it uses 100% of the CPU slowing things down.
i was concerned since previously i have never had a cuda work unit take more than 15 min to process, with a typical 9 to 13 min; they are now taking approx 30 min for each card, and my rac has dropped for this machine by more than 400 points. i'll just keep plugging away for a while to let things settle out. nothing was changed in the 'backend' applications so it must be the larger workunits presented.
No, this is not good. Check how boinc handles the tasks. When a cuda workunit finishes, does it also stop the other one running to start a new pair?
Unless you have a faulty card (if it is cuda) or cpu/ram (if it is a cpu workunit), result overflows are pretty much "normal" and they don't have anything to do with your memory/storage allocations.
Check why your pending cache has increased. Is it genuine "waiting for validation" or is it suspicious "validation inconclusive"? If it is the latter, check those workunits to see whether you have returned results that are very strange and different from your wingman's. Check also the invalid category of your tasks page to see if there are any there.
30 min for a CUDA wu seems too much, unless you have a lower end card.
riofl, give me a link to your host.
Compiled boinc gave me increased benchmarks too. They don't have any real importance though.
riofl, I'm sure you know, your tesla card has some problems. It gives errors in some workunits. If you look in your errors page, all those workunits were run by the tesla card. It does run successfully though in other workunits.
Checking the reported run times, I don't see any significant difference between eg 18 August and 14 August when you were running 6.6.11.
I do see though that most of the workunits were restarted 2 or 3 or more times. The initialization phase of a cuda task takes about 30 sec. If it is restarted 2 times you lose 1 min and with a total computation time of eg. 14 min you lose 7% credit right there.
You've said that you run the rebranding script several times per hour, why? I search for vlars once per day, sometimes once per two days, and that is more than enough. If newly downloaded tasks get crunched only a few hours later, increase your cache so they are crunched after 2-3 or more days so running the script once per day will be enough.
Why play with your cache levels? Change from 6 to 10 and then back to 2, why? Pick a cache level and leave it there. I'm using 10 days. If you were using 6 days, it is fine also. 6 days cache means that the workunits downloaded now will be crunched in about 6 days, so you have 6 days to check for vlars. No need to run that script x times per hour or x times per day.
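For the curious, the core of what such a script checks is just the angle range of each downloaded multibeam file. A minimal sketch, assuming the workunit files sit in the usual project directory, that they carry a true_angle_range field in their header, and that anything below roughly 0.13 counts as a VLAR (all three are assumptions; the real perl script is the authoritative version):
#!/bin/bash
# List workunits whose angle range looks like a VLAR.
PROJECT_DIR="$HOME/BOINC/projects/setiathome.berkeley.edu"
for wu in "$PROJECT_DIR"/*; do
    [ -f "$wu" ] || continue
    # pull the first number following the true_angle_range field, if present
    ar=$(grep -a -m1 'true_angle_range' "$wu" 2>/dev/null | grep -o '[0-9][0-9.]*' | head -n1)
    [ -z "$ar" ] && continue
    if awk -v a="$ar" 'BEGIN { exit !(a < 0.13) }'; then
        echo "possible VLAR: $(basename "$wu")  (angle range $ar)"
    fi
done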
ahh yes except for one thing. i have seen even this new version of boinc obey the due dates and pick the next workunit from among the newly downloaded. if they stayed in ascending date order i would agree but it does not seem to work that way for me. at least 3 or 4 times so far i noticed a cuda and/or cpu workunit placed on hold to pick up one that had a closer due date that was recently downloaded. this means there is a danger the gpu app will reject a possible vlar before it can be flagged.
This happens only for vhar workunits, they have shorter deadlines than the rest. VLARs have "normal" deadlines and they are crunched when their time comes, about x(cache) days after they've been downloaded.
Shameless plug: I've reached #4 in the top hosts list (http://setiathome.berkeley.edu/top_hosts.php). I don't know how long I can hold on there though. Attaching pdf for future proof.
Back in the days when cuda needed a whole core, I was running a 3+1 config in my quad core. All processes had the lowest priority (19) and I don't think I had any serious slowdown, maybe a minute or so, not more. And this was my everyday desktop so many things were running, firefox with many many tabs, full 3d compiz effects, everyday backups, etc.
Only now that cuda shares a core with the other seti@home tasks, I started renicing them only to make them higher priority than the other seti@home instances. I think -5 is not necessary.
Question regarding this. I am using the default settings in app_info.xml's <app_version> for cuda as follows:
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
The problem is that the setiathome-CUDA process obviously demands more than that and is able to eat up a whole core's worth of CPU time. That results in the other (regular CPU) processes fighting over CPU time, context switches, cache thrashing etc.:
PID PR NI RES SHR %CPU TIME+ COMMAND
15538 39 19 48m 1472 101 13:44.03 AK_V8_linux64_s
15539 39 19 48m 1464 101 20:17.47 AK_V8_linux64_s
15540 39 19 48m 1468 101 20:25.35 AK_V8_linux64_s
15541 39 19 48m 1464 99 19:52.12 AK_V8_linux64_s
15544 39 19 48m 1484 99 20:30.54 AK_V8_linux64_s
15545 30 10 114m 10m 99 18:42.69 setiathome-CUDA
15546 39 19 48m 1488 94 19:55.04 AK_V8_linux64_s
16208 39 19 48m 1488 51 12:22.11 AK_V8_linux64_s
15542 39 19 48m 1472 46 12:14.44 AK_V8_linux64_s
Now to the question - am I doing something wrong and cuda does not behave correctly?
Or is this normal and I should just set avg_ncpus & max_ncpus to 1, and pin the process to some core + make it use it exclusively?
Are you still running CUDA 2.1? The 100% CPU was apparently a bug in those libraries. Upgrade CUDA to 2.3, nvidia-drivers to 190.xx and replace your setiathome executable with the 2.2 version and optionally renice that process if you think it's too slow.
Macros, what pp says. Make sure you're using cuda 2.2 or later together with a compatible nvidia driver.
i think you will find the best response setting your preferences to use 6 or 7 cpus instead of 8, leaving 1 for cuda and your desktop to use. i played around a bit with max_ncpus but did not find a huge difference. mine is set at 0.35.
absolutely, if you do nothing else, change your cuda toolkit and sdk to 2.2 and get the 2.2 application. make sure your driver is at the minimum 185.14 or 185.29. i am using 185.29.
ver 2.1 had huge flaws in it. i have heard 2.3 is even better, however i have not had good luck with 2.3 so i went back to 2.2 until i can figure out what went wrong.
Small correction to riofl: The driver versions are 185.18.14 and 185.18.29. Latest is 185.18.31. Macros, if you go to cuda 2.3 you'll need 190.18.
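If in doubt about what is actually installed, a quick sanity check along these lines can help; the paths are the typical ones and the executable name is whatever your app_info.xml references, so adjust both:
# which kernel driver is loaded right now
cat /proc/driver/nvidia/version
# which cudart/cufft libraries the project directory carries
ls -l ~/BOINC/projects/setiathome.berkeley.edu/libcu*.so*
# which libraries the cuda app will actually pick up at run time
ldd ~/BOINC/projects/setiathome.berkeley.edu/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu | grep -i -E 'cuda|cufft'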
Macros, what card are you using? Maybe that 99% is because your card goes out of memory?
<core_client_version>6.6.37</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SETI@home MB CUDA 608 Linux 64bit SM 1.0 - r06 by Crunch3r :p
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : Quadro FX 4600
totalGlobalMem = 804585472
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1188000
totalConstMem = 65536
major = 1
minor = 0
textureAlignment = 256
deviceOverlap = 0
multiProcessorCount = 12
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: Quadro FX 4600 is okay
SIGSEGV: segmentation violation
Stack trace (16 frames):
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x47cba9]
/lib/libpthread.so.0[0x7f96066ac080]
/usr/lib/libcuda.so.1[0x7f9607123020]
/usr/lib/libcuda.so.1[0x7f9607128d84]
/usr/lib/libcuda.so.1[0x7f96070f210f]
/usr/lib/libcuda.so.1[0x7f9606e7db3b]
/usr/lib/libcuda.so.1[0x7f9606e8e46b]
/usr/lib/libcuda.so.1[0x7f9606e76211]
/usr/lib/libcuda.so.1(cuCtxCreate+0xaa)[0x7f9606e6ffaa]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x5ace4b]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x40d4ca]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x419f23]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x424c7d]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu[0x407f60]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f96063495a6]
setiathome-CUDA-6.08.x86_64-pc-linux-gnu(__gxx_personality_v0+0x241)[0x407be9]
Exiting...
</stderr_txt>
]]>
The crash dump is still referencing the old executable.
Did you update your app_info.xml? Also make sure you copy the new libcudart.so.2 and libcufft.so.2 to your projects/setiathome.berkeley.edu directory.
And finally, as stated in another thread, also copy the new executable to /usr/local/bin or whatever directory you have in your PATH. I have had no problems since following these advices (well, apart from having to renice the executable to level 0 to give it enough CPU time).
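Roughly, those steps as commands; the directory you unpacked the new build into is a placeholder, and the renice is only needed if the task is being starved:
# assumed locations
SRC=~/Downloads/cuda_2.2_app          # wherever the new build was unpacked
PROJ=~/BOINC/projects/setiathome.berkeley.edu
# the runtime libraries and the executable go into the project directory
cp "$SRC"/libcudart.so.2 "$SRC"/libcufft.so.2 "$PROJ"/
cp "$SRC"/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu "$PROJ"/
# a copy somewhere in the PATH, as suggested above
sudo cp "$SRC"/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu /usr/local/bin/
# update app_info.xml to the new executable name, restart boinc,
# and bump the running task's priority if needed
renice 0 $(pgrep -f 'setiathome.*CUDA')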
The crash dump is still referencing the old executable.
True, but I got the same for the newer one; I just picked one from the error list and didn't notice it's the old one...
Did you update your app_info.xml? Also make sure you copy the new libcudart.so.2 and libcufft.so.2 to your projects/setiathome.berkeley.edu directory.
And finally, as stated in another thread, also copy the new executable to /usr/local/bin or whatever directory you have in your PATH. I have had no problems since following these advices (well, apart from having to renice the executable to level 0 to give it enough CPU time).
Yes, I did all that. Anyway, it seems to be running now; since I didn't make one change at a time, I don't know exactly what the cause was. ;) ::)
Besides, it's just the first WU, hopefully there will be no more errors.
edit: It works. Finally :)
one thing you need to make sure of is that the project directory where the cuda libs are is listed in the ld.so.conf file and that you have run ldconfig. without that it is very likely it would crash possibly a few times and then find its libraries by accident.
Yeah, I always verify DSO availability with ldd on every client ...
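As commands, a sketch of that point (the file name under /etc/ld.so.conf.d/ is arbitrary; on systems without that directory the path goes straight into /etc/ld.so.conf):
# make the cuda libs in the project directory visible to the dynamic linker
echo "$HOME/BOINC/projects/setiathome.berkeley.edu" | sudo tee /etc/ld.so.conf.d/boinc-cuda.conf
sudo ldconfig
# verify: nothing in the output should say "not found"
ldd "$HOME/BOINC/projects/setiathome.berkeley.edu/setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu"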
riofl, what is happening?
I've checked again your host today and I've seen this: http://setiathome.berkeley.edu/results.php?hostid=4166601&offset=40&show_names=0&state=2
All the 200- and 300-second tasks were done by your 285. All the two-digit-second tasks were done by your tesla. This is completely abnormal.
Hi.
I was just wondering if there are any known issues in using the CUDA client with the 2.6.30 kernel ? I recently built a 2.6.30 kernel (to see if the AP units will fail), and noticed that my CUDA units were appreciably slower (taking over an hour).
I just switched back to the latest ubuntu kernel (2.6.28-15-generic), which seems to work fine.
Any suggestions, any particular info you need ? I'm using the same nvidia driver in both cases (185.18.31, on an x86_64 platform)
I'm gonna try a vanilla-built 2.6.28.10 kernel, see if I get the same performance issues (and hopefully successful AP units ...).
This is fun ! Damn I missed this stuff !
i am trying to help someone in the boinc cuda forum. his workunits are giving strange things. he runs windows, the vlarkill cuda app and the ak cpu app, a 9400gt 1gb ram and the 190 driver with 2.3 toolkit. this is one of his workunits.. they are all similar..
his cuda app is specifically CUDA - MB_6.08_CUDA_V12_VLARKill_FPLim248.exe.
...
i dont know what the fft lines mean but i have a feeling they should not be there. my first thought was to drop back to the 185 series drivers and the 2.2 toolkit. im not sure 2.3 and a 9400 can work together.
any ideas?
...
The FFT lines are normal for those builds, ...
You need the 2.2 CUDA binary: http://calbe.dw70.de/mb/viewtopic.php?p=868#p868
I try this application too but here is another problem
ldd setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu
./setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu)
CentOS5 have no GLIBCXX_3.4.9 in updates now.
Libor
Search for a newer version of "libstdc++", GLIBCXX_3.4.9 is the version number of that library.
... currently trying the 2.6.27 kernel, and CUDA seems fine (and hoping that AP is stable).
On main the only way currently is to run your cache dry, detach from the project and immediately reattach. That way the "lost" wus will be marked "detached" and sent out again.
Quick question though. I had about 70 WUs uploaded, but not reported. I then tinkered with my app_info.xml, and lost all the WUs, so can't report them ! Is there a way to report with the boinc client ? Otherwise, there's 70 units sitting there, and no one will ever know !!!
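If you do go the detach/reattach route described above, it can be done from the command line; the account key is of course your own, and this is only a sketch (remember to let the cache run dry first):
cd ~/BOINC
# report/upload whatever is still pending, then detach and reattach
./boinccmd --project http://setiathome.berkeley.edu/ update
./boinccmd --project http://setiathome.berkeley.edu/ detach
./boinccmd --project_attach http://setiathome.berkeley.edu/ YOUR_ACCOUNT_KEY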
I don't understand something. The developers obviously know that cuda devices hate vlar and vhar workunits. Why can't they put the smarts that are in the perl script directly into boinc, so that as it downloads a workunit it reads the angles etc. and decides where to assign it at that point? It should be an easy thing to implement, saving lots of trouble for people like me who have a card that locks up at the slightest hint of a vlar and who have to run something external to make sure 'proper' workunits are fed to cuda.
seems to me that this is a gross oversight leaving this out of boinc.
Hey all.
I recently got a computation error with one of my CUDA WU's, but it wasn't a segfault. Maybe there's some info in there for you guys:
http://setiathome.berkeley.edu/result.php?resultid=1347109626
Thanks, but it's a known error.
now i dont know about desktop settings much but there is one change i made in the past few weeks with nvidia-settings. i unchecked Sync to VBlank in xvideo settings and also unchecked sync to vblank and allow flipping in the opengl settings. wasn't sure what they did but there seemed to be no difference. should they be checked?
Sync to vblank and flipping don't have anything to do with cuda computations. They are about tearing.
power mizer, which seems to not have settings, says adaptive clocking enabled, performance level2, performance mode desktop. level2 is the 3d setting above, however i remember when i first got the card performance mode said maximum performance and somewhere along the line it changed to desktop. since the other settings are the same i can only assume it is a function of which driver is being used for which text shows up.
I don't like powermizer at all, but it doesn't seem you have a problem there.
since 6.9.0 reports 2 teslas, could it be possible it is mixing up which device is 0 and which is 1? because it is completely odd since the tesla is running gpu 500mhz and memory 900mhz so it should be considerably slower. it rates both devices it thinks are teslas at 74 gflops yet the 285 is rated by 6.6.11 as 127 gflops
Again, what boinc reports is irrelevant, just cosmetic. What the cuda client sees is important and in your case it reports your cards right.
i am going to reboot this tomorrow so when i do i am going to go over the settings in cmos. presently it is set to auto on pci-e bus frequency. maybe i will fix it at 100mhz .. it could be doing God knows what in auto.
I don't think pci-e bus frequency has any noticeable effect on cuda speed, and even if it did, it should affect both your cards, not only one.
also the 3 digit time workunits are still the 285 and 2 digit the tesla. i wonder if it has something to do with how busy my desktops are? i have quite a lot going on 24/7 with 18 gkrellm server monitors running in one desktop, usually 4 or 5 browser windows in different desktops with maybe 28 or so tabs open, average 8 or 10 ssh konqueror tabs open into our servers, email, virtualbox running xp which also runs boinc, kopete, 8 or 9 postit notes in the various desktops, a few kedit windows open plus momentary things like adobe reader, smplayer or whatever.. i'm in totally new territory here. my experience in graphics cards is plug it in and make sure it works with a stable and peppy screen :)
This is a very busy desktop. Have you tried running a few workunits with absolutely nothing of the above running? I think your times will return to "normal".
however the 'busyness' of the desktops is not new and was basically the same when i had 10-13min workunits out of both cards.
Hello,
I tried Crunch3r's CUDA seti application and searched google and this forum too, but the result is not OK.
I tried CUDA 2.1 and the application setiathome-CUDA-6.08.x86_64-pc-linux-gnu. This computes OK but takes 100% of CPU.
http://setiathome.berkeley.edu/result.php?resultid=1340190227
Now I test CUDA 2.3 with the same application and the result is SIGSEGV: segmentation violation.
I tried adding setiathome-CUDA-6.08.x86_64-pc-linux-gnu to /usr/local/bin but it is still not working.
CUDA 2.2 gives a segmentation violation too.
Thanks for any ideas
Libor
Cuda 2.1 libraries have a bug; that's why you have 100% CPU utilisation. You need 2.2 or later. Also please follow my post in http://lunatics.kwsn.net/linux/seti-mb-cuda-for-linux.msg19014.html#msg19014 very carefully.
I try this application too but here is another problem
Centos, since it's more enterprise oriented, uses old versions of ...well everything. Have you had any success?
ldd setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu
./setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu)
CentOS5 have no GLIBCXX_3.4.9 in updates now.
Libor
Is it even possible to use CUDA with two different devices? If so, what am I missing?
Of course it is. Any link to your host? Also try putting
no, i have not tried just running boinc without a gui.. i will try that this coming weekend when i can spare some downtime from work and monitoring the servers. will let it run for 1 hr with no X running and then will go in and see if there are any differences.
Leave X, just close all those apps you have running. Just the desktop with boinc in the background.
thing is, the usage of my desktops has not changed much at all during the past year so i had the same stuff open with the 13min workunits a few months ago. will be interesting to see if the 3 digit numbers move into 2 digit though on the tasks report.
The bigger multibeam workunits started about a month or two ago.
i hate powermizer myself but i cannot find any options to turn it off and leave the card in high perf mode at all times. every time i spot check it its always in hi perf mode so maybe my temps are not high enough to trigger it (assuming temp is its only trigger) and if idle is a trigger, my desktop is never idle even when i go to bed, all the gkrellm monitors are advancing their graphs every second.
Many people have tried many ways to turn off powermizer usually with no success. :D Powermizer levels are triggered by GPU usage or very high (95+°C) temperatures.
seems so strange with all the mb servers down, my cuda cards are both idling at around 46c. really odd since i am used to them being in the low or mid 60s all the time.
I have some WUs cached for a few days more ;D
ok i'll close down all my 'server' functions as well like my jabber server, bind, etc. so it's just x and boinc running.
thing is, the usage of my desktops has not changed much at all during the past year so i had the same stuff open with the 13min workunits a few months ago. will be interesting to see if the 3 digit numbers move into 2 digit though on the tasks report.
The bigger multibeam workunits started about a month or two ago.
hehe that's about the time i started noticing issues. maybe they're not issues after all.
i hate powermizer myself but i cannot find any options to turn it off and leave the card in high perf mode at all times. every time i spot check it its always in hi perf mode so maybe my temps are not high enough to trigger it (assuming temp is its only trigger) and if idle is a trigger, my desktop is never idle even when i go to bed, all the gkrellm monitors are advancing their graphs every second.
Many people have tried many ways to turn off powermizer usually with no success. :D Powermizer levels are triggered by GPU usage or very high (95+°C) temperatures.
ok well i hardly do anything involving true graphics besides cuda running on that stuff and i have my hardware monitors set to shut the system down if the gpu gets to 80c.. once i adjusted the fans and air flow in the case they have never gone above 70c.
seems so strange with all the mb servers down, my cuda cards are both idling at around 46c. really odd since i am used to them being in the low or mid 60s all the time.
I have some WUs cached for a few days more ;D
You can switch those VHARs to your graphics cards.
These result overflows are common. You should pay attention though if you generate a lot of these while your wingmen return "good" results. In that case it could be a hardware problem on your part and these results will be invalid.
After a small hiatus I'm back :)
... snip ...
@riofl and lordvader about kernel versions
Have you compiled these kernels yourselves or have you got them from elsewhere? Maybe some performance/optimization options you left out? Do you have any nvidia related errors in your syslog when running cuda? Any other observations with these newer kernels?
About the kernel versions. I compiled the 2.6.30 kernel using the same config used in Ubuntu's 2.6.28 kernel. I used the same config on the 2.6.27 kernel, which currently gives me equivalent CUDA performance, as well as stable Astropulse results.
credit granted is the same for a given workunit whether it is processed by cpu or gpu correct?
Yes.
my thinking is that since the vhar are short, only taking my gpu about 12 min to process, what would be the harm in assigning all vhar to gpu to speed up things?
Some people say that CPU is more efficient for VHARs than GPU but I can't see their reasoning.
That's right, same config for all of them.
I've downloaded the precompiled kernels, but won't get a chance to try them till the end of the weekend, at the earliest.
Processor type and features --->
[*] Supported processor vendors --->
[*] Support Intel processors
[ ] Support Cyrix processors
[ ] Support AMD processors
[ ] Support Centaur processors
[ ] Support Transmeta processors
[ ] Support UMC processors
credit granted is the same for a given workunit whether it is processed by cpu or gpu correct?
Yes.
As noted by people over at the S@H forums, the GPU seems to claim credit that is about 30% higher than what the CPU does. If your wingman uses the CPU you will be granted the lower value, but if your wingman uses the GPU like you, you will receive the higher value, it seems.
Shameless plug:
I've managed to climb to the #3 spot in the Top hosts (http://setiathome.berkeley.edu/top_hosts.php) list. This is probably the highest I'll ever be so I'll savour the moment. :D
PDF attached for future proof.
...
SETI@home MB CUDA_2.2 608 Linux 64bit SM 1.0 - r12 by Crunch3r :p
VLAR autokill mod
...
cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.1/cufft/src/execute.cu, line 1070
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.1/cufft/src/execute.cu, line 1070
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: /root/cuda-stuff/sw/rel/gpgpu/toolkit/r2.1/cufft/src/cufft.cu, line 151
cufft: ERROR: CUFFT_EXEC_FAILED
...
I've installed Cuda Driver 185.18.14 (2.2? from the nvidia website). Previously installed I had 185.18.36.
185.18.14 is older than 185.18.36, why rollback? Also try cuda 2.3 with 190.xx driver, it's faster than 2.2.
well problem is i am now spoiled.. i had an 8600gt 256mb card i used for my desktop and ran the tesla for cuda before i got the 285. the 285 is several orders of magnitude better in desktop performance. i think i would rather just replace the tesla with a 2nd 285 and let that one crunch full speed and let this one do as it can. would still be a large improvement over the tesla in the 2nd slot. either that or maybe buy a motherboard with 3 slots that can take 3 of these cards leaving room for them to breathe and get a gtx 260 to use for my desktop and minor cuda crunching and let both 285 have at it full steam. i expect the 260 should be up to the task for my desktops.
Or maybe get a GTX295 in place of the tesla and no need for a new motherboard.
TO ALL
Please see this thread (http://setiathome.berkeley.edu/forum_thread.php?id=55317) and take proper action (abort those workunits): I've lost quite a few credits because of this.
Badly-prepared data is actually pretty rare at SETI - that's why I made such a point of drawing that set to Eric's attention.
The point Sunu was making is that those WUs don't error out while crunching: they run full duration, and then error out when the time comes to upload the results. That's why they're a waste of time.
yeah a 295 is an option. then it can dual crunch away and let the 285 'limp' along :P.. an idea to consider.. i suppose i could take the $ for that from my savings for my new project next year..
Well if you want RAC right here, right now a GTX295 is probably your best choice with your current setup (as it was the case with mine). I was "forced" to upgrade because my GTX280 burned about 2 months ago.
You have a future project in mind so if I were you I would pursue that. End of 2009 beginning of 2010 we will have the update to Nehalem processors while NVIDIA is going to release its next generation of graphics cards (about Christmas time) with the next generation dual card probably in the 1st quarter of 2010.
any preferences in brand on the 295? i was thinking of going with xfx only because my 285 is an xfx black edition.. also looked at asus and evga
My GTX280 that burned was EVGA but I stayed with them. Both my GTX285 and GTX295 are EVGA. XFX should be good and ASUS even more. If you take the plunge and buy now prefer someone who gives you a step-up option as you might catch the new NVIDIA cards when they will be released, so EVGA, BFG or XFX (don't know if XFX has a step-up program).
that one is hopefully going to give any cray supercomputers a run for their money.
I hope we'll have a linux machine in the #1 spot of the top hosts list.
Sunu,
I just reporting back to update you on my Fedora Core 10 64bit machine. I have successfully installed the 2.3cuda libraries/stuff and the machine has been chugging through the workunits.
I have a couple of questions, one of which is worrying. Occasionally over the past week the machine has locked up and only a reset has cleared the issue. I had a look at /var/log/messages and I see a number of NVRM: Xid messages. I've googled around and didn't get a clear answer, so does anyone here have an idea? Here are the entries:-
Sep 17 17:11:41 fc10 kernel: NVRM: Xid (0002:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
Sep 18 04:58:05 fc10 kernel: NVRM: Xid (0002:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
Sep 18 05:11:07 fc10 kernel: NVRM: Xid (0002:00): 13, 0003 00000000 000050c0 00000368 00000000 00000100
The second question, is there anyway to ensure that the card is being used to the max, is there any tuning or monitoring of the card that would assist?
Thanks
Ian
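While the cause is unknown, one low-effort thing is to keep an eye on how often the Xid messages appear and whether they line up with particular workunits; note the log is /var/log/syslog instead of /var/log/messages on some distros:
# count the Xid events so far and show the most recent ones
grep -c 'NVRM: Xid' /var/log/messages
grep 'NVRM: Xid' /var/log/messages | tail -n 5
# or watch them arrive live while a cuda task runs
tail -f /var/log/messages | grep --line-buffered 'NVRM: Xid'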
Sunu,
The Xid problem I reported was in a system that didn't have any window manager running; it was running in mode 3, not 5, the machine was just basic vt100. However there did seem to be a hardware problem, and last Sunday the disk packed up. Today, after a careful reinstall with modifications to logical volumes and mounting, I've managed to get the machine back to a workable state. I've installed the two Cuda 2.3 packages. However I've missed/messed something as my tasks keep aborting. The ldd of the seti executable seems ok, but as I say the thing fails. What have I done wrong in the attached task error output?
<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SETI@home MB CUDA_2.2 608 Linux 64bit SM 1.0 - r12 by Crunch3r :p
VLAR autokill mod
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce 9600 GT
totalGlobalMem = 536608768
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1600000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 8
SIGSEGV: segmentation violation
Stack trace (17 frames):
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu(boinc_catch_signal+0x43)[0x485ef3]
/lib64/libpthread.so.0[0x60880f0]
/usr/lib64/libcuda.so.1[0xb8d980]
/usr/lib64/libcuda.so.1[0xb933c4]
/usr/lib64/libcuda.so.1[0xb63557]
/usr/lib64/libcuda.so.1[0xb0ecf7]
/usr/lib64/libcuda.so.1[0xb2052b]
/usr/lib64/libcuda.so.1[0xb05940]
/usr/lib64/libcuda.so.1[0xafea8a]
/usr/lib64/libcuda.so.1(cuCtxCreate+0x57)[0xb59187]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu[0x5bf335]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu[0x413c5b]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu[0x41f68d]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu[0x42b54d]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu[0x408707]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x6c9d546]
setiathome-6.08.CUDA_2.2_x86_64-pc-linux-gnu(__gxx_personality_v0+0x219)[0x408349]
Exiting...
</stderr_txt>
]]>
Thanks Ian!
Shameless plug:
I've managed to climb to the #3 spot in the Top hosts (http://setiathome.berkeley.edu/top_hosts.php) list. This is probably the highest I'll ever be so I'll savour the moment. :D
PDF attached for future proof.
And now we have two Linux machines among the top 20. Don't know yet how high it will reach though... :D
ok you say you installed cuda 2.3 libraries and the 2.3 v190 series driver right (earlier drivers won't work) ?
your error report says the app is cuda 2.2 so it will error. the app must also be cuda 2.3 compliant. those who explained things to me insisted that the driver, toolkit and app must use the same cuda version. i don't believe there is such a thing as 'backward compatibility' with cuda.
ls -l /usr/lib64/libcuda*
There's a third Linux computer among the top 20 hosts now but it's not me this time though. I will however fight his 4xGTX275 with my single GTX295! ;D
I don't think so. It is a relatively new host (since it was upgraded) and that's why it has a lower RAC. It is currently generating a lot more points: http://pl.boincstats.com/stats/host_graph.php?pr=sah&id=5011059 ;D
Compiling my already über optimized 2.6.31-kernel with some -floop-interchange or -floop-strip-mine will take care of that... or I'll just throw in another 295. :D Nice to see another Gentooer on the list though... but I hope the heat in your room makes your skin curl up and peel off! ;D
Darn, darn, darn! ;D Well, I guess I have to let you hunt down Vyper by yourself then. I wish you luck. :D Would be nice with a Linux computer on top and a majority of them occupying the top 10...
i have been doing a lot of studying on the 285 vs 295 battle that has been going on in my brain. each element of the 295 is slower than the 285 by a reasonable margin (approx 160gflops difference.. 285=1062gflops while 295=894gflops per element). the thing the 295 has is 'density' to make up for it. so even if it takes longer to do a wu than the 285 does, it can do 2 of them in the same package in an attempt to make up for it which works ok.. wonder if there is an extended length motherboard out there that will take 8 pcie devices with a reasonable distance spread (at least 1/2 - 1 in between mounted devices)? there are cases available that can handle this, but i have not found a mobo that can.. for raw speed i would be more inclined to put 8 285 in something than 4 295. thoughts? i have a feeling i am not accounting for something here besides the obvious power requirements and cost savings of 1 295 vs 2 285...
maybe i should be looking into addon pcie density expansion like the nvidia supercomputer appliances do,putting 4 teslas into a single pcie slot.. wonder if empty appliance devices are available....
You may also consider this card http://www.asus.com/product.aspx?P_ID=3OXEUQmsHmmewEyu&templete=2 if you have enough money. ;D
But generally I've observed that the difference in Seti speed between the 260sp216, 275 and 285 is not that big compared to the difference in price. So if the theoretical speed difference is about 20%, you should be happy if you see only about half of that in Seti. That's because those are peak values of GPU capability, but real computation also depends on cpu, bus and memory speed, and even more on the application architecture.
hmm yeah ... my basis is on integer gflops since i have not found double precision gflops comparisons. basing performance comparisons between my tesla at 933 integer gflops and my 285 at 1062 integer gflops, boinc displays them as 74 and 127 gflops respectively. now, considering the 295 is slower in integer gflops per processing system (894 each) than the tesla, i would expect it would display less than 74gflops each half.
which basically means that for a given card, a 295 using both halves will only give approximately 50-60% higher performance in total then a single 285 which makes me curious about its value other than accepting that 50% more per physical device is preferable. i just wonder if since the 295 is essentially supposed to be 2x 285 with slightly degraded performance why it is so? it has 4 less pixel shaders (28 vs 32) and smaller memory bus width (448 vs 512 which to me is the most major item). although these vary by mfgr, in general the 295 also has slower default clock speeds. admittedly lower clock speeds will help with eliminating heat buildup, but instead of using the same default heatsink assy, put a better designed one on to compensate and keep the performance up. guess i just wonder why its design doesn't make a lot of sense or maybe i am in wishful thinking mode that it 'should' be a 2x full 285 units when in fact it is 2x crippled 285 units.
I've been using BOINC 6.6.11 for awhile now, to make sure it handles my multi-GPUs of different types. Is there any newer version that will also do this yet? Sunu, I think you were also using 6.6.11...
cd
# grab the 6.10.11 release tag from the boinc svn repository
svn co http://boinc.berkeley.edu/svn/tags/boinc_core_release_6_10_11
cd boinc_core_release_6_10_11
# generate the configure script, then build the client only (no server, no fcgi)
./_autosetup
./configure --disable-server --disable-fcgi --enable-unicode CFLAGS="-march=core2 -O2 -pipe" CXXFLAGS="-march=core2 -O2 -pipe"
make
# build the self-extracting "sea" package that collects the client files
cd packages/generic/sea
make
# drop the bundled libcudart (the cuda app supplies its own in the project directory)
rm BOINC/libcudart.so
# copy everything into the working BOINC directory and start the client
cp -rv BOINC/* ~/BOINC/
cd ~/BOINC
./boinc --allow_remote_gui_rpc --daemon
# to stop it later:
cd ~/BOINC
./boinccmd --quit
IanJ, have you solved your problems? What version exactly are your nvidia drivers? You can try any one of 190.18, 190.25, 190.32, 190.36, to see if those xid errors go away.
Riofl and Sunu,
Just an update. It looks like the copying of the seti cuda executable into the /usr/sbin directory finally got it to calm down and start crunching.
The NVRM Xid issue continues but now doesn't lock up the machine. It's been up nearly a week without a lockup, but I've seen eight in the past three days. As the machine continues on happily I'll forget about it for now. During the reinstall last week I took off the expansion card blanking plates (this machine has only one card in it, the 9600GT) so the machine can get a bit more air.
Thanks for your help!
Ian
your error report says the app is cuda 2.2 so it will error. the app must also be cuda 2.3 compliant. those who explained things to me insisted that the driver, toolkit and app must use the same cuda version. i don't believe there is such a thing as 'backward compatibility' with cuda.
The driver and libs have to support each other but not the app. The app can be a lower number (previous version) with no problems but can't be a higher number. So the duo driver/libs have backwards compatibility. At least that's how it seems right now.
And now we have two Linux machines among the top 20. Don't know yet how high it will reach though... :D
Congrats pp! Welcome to the top 20 hosts club!
I don't think so. It is a relatively new host (since it was upgraded) and that's why it has a lower RAC. It is currently generating a lot more points.
b0b3r is that third linux machine yours? Congrats to you too!
I've been using BOINC 6.6.11 for awhile now, to make sure it handles my multi-GPUs of different types. Is there any newer version that will also do this yet? Sunu, I think you were also using 6.6.11...
Yes I'm still using 6.6.11. 6.6.40 (the recommended version) should have proper multi-gpu support, and also 6.10.11 that Richard says is now available. Link: http://boinc.berkeley.edu/download_all.php I've seen many reports of problematic scheduling for 6.10.x but I think most of them have been solved.
Are these machines dedicated crunchers or do you use them also as your desktops?
For the moment it's a dedicated cruncher but it's supposed to be my regular desktop computer. I still have to fit some more disks, serial I/O cards and deal with the air flow inside the box. It just breaks my heart to stop it from crunching to do that ;D
whoah. seems like things are moving fast :) that 8 gpu machine must be able to cook a turkey placed behind it or at least heat a small home! those cards are entirely too close. each one will cascade more heat into the next one until the end one must run well over 130c! that thing would have to run in an ambient temp environment of near 0c to keep all those puppies cool.
That's why I'm saying that for multi multi-gpu installations a water-cooling setup is better.
btw for those talking about boinc versions, i have been running 6.9.0 compiled from source and it has worked fine for multiple gpus... the scheduler is a bit odd in that it will refuse to pick up work units until it is down to less than a few hundred, then it goes into panic mode and tries to continuously request them until it gets its quantity back up, rather than begin asking for more when the queue reaches around 50%, which seems the sensible place to refill.
As I said to Tye, you could try 6.6.40 or 6.10.11, they should have proper multi-gpu support.
You could also consider ready for water cooling cards like this (http://www.evga.com/products/moreInfo.asp?pn=017-P3-1297-AR&family=Geforce%20200%20Series%20Family) or this (http://www.bfgtech.com/bfgegtx2951792h2ocwbe.aspx) or, for even easier water-cooling, plug 'n play, self-contained water-cooled cards like these (http://www.bfgtech.com/bfgrgtx2951792h2ocle.aspx).
i can see the need for water cooling when packing those babies together but i think i can get away without water on the standalone replacement for my tesla.
heh... might as well go for broke and get this case to put it all in :P
I don't think cooling-wise it'll have many big advantages over "standard" cases.
absolutely the greatest 'style' ive ever seen!
http://www.ttlevel10.com/
i can see the need for water cooling when packing those babies together but i think i can get away without water on the standalone replacement for my tesla.
Of course, I'm talking about watercooling only for your multi multi-gpu project. For a single card you can do without it.
heh... might as well go for broke and get this case to put it all in :P
I don't think cooling-wise it'll have many big advantages over "standard" cases.
absolutely the greatest 'style' ive ever seen!
http://www.ttlevel10.com/
an alternative would be a mountainmods u2 ufo duality case... holds 2 complete computer systems, 4 power supplies and enough fans to create a tornado
That's what I was going to tell you!!! If you want a truly high end case go Mountain Mods and never look back.
or i could just settle for another of my boring workstation cases, a thermaltake armor series full tower, but the newer one with the dual large fans over the gpus instead of one big one in the middle of the side cover. dont get me wrong its a fantastic case, but its becoming a bit too 'traditional' for my tastes :)
I currently have an oldie thermaltake xaser III. It definitely shows its age; when it was designed the thermal requirements were way lower than today, and I have modded it quite a lot to keep my current system cool and it still can't keep up.
i have always loved mountainmods.. i just love the look of the level10. maybe sometime i'll just get one of those to put my workstation into. dunno, but i had mostly made up my mind for the ufo series for the new project, especially since if i can find a mobo with enough extended pci slots i can space the cards out more and order a special build on the ufo to cover that.
if i can find a mobo with enough extended pci slots i can space the cards out more and order a special build on the ufo to cover that.
Unfortunately if you go multi (4x) dual-slot cards they'll have to sit right next to each other, so no matter how strong the air circulation is, it is very difficult for it to reach the tiny space between the cards. That's why I'm telling you to go watercooling for your super project. But that's what the Mountain Mods cases are built for: extreme watercooling setups. So another reason to go Mountain Mods.
what helped mine was i put 2 more drive bays in the front so there are 3 fans, and i replaced the rear 120mm led fan with a 110cfm adjustable fan.. those 2 additions made all the difference.
120mm? What is that? :D In the days of the xaser III, 80mm were more than enough! I wish my case had 120mm fans.
You speak of trying other, higher 190.xx versions; where do I get them? The Nvidia download site seems to have only one version, 190.18, the one I'm using.
Thanks
Ian
But I've just switched over to a 2.6.27 kernel, which also has the CPU tasks at 19, and CUDA at 10, and everything seems to be running fine ...
* * * * * renice 0 `pgrep setiathome` >/dev/null
What priority do your seti apps (CPU and GPU) run?
In sidux (which uses 2.6.31-5) I've seen way too slow crunching with the default priorities, 19 for CPU and 10 for GPU tasks. Renicing the GPU tasks to 0, they sped up considerably. Maybe newer kernels need more aggressive priority levels for cuda.
I'm using the script attached below to renice the cuda tasks to 0 (it runs in infinite loop, checking every 5 seconds for seti cuda tasks, renicing them to 0).
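The attachment isn't reproduced here, but the idea is roughly the following; the process-name match is an assumption, adjust it to the executable name in your app_info.xml:
#!/bin/bash
# renice any running seti cuda task to priority 0, checking every 5 seconds
while true; do
    pids=$(pgrep -f 'setiathome.*CUDA')
    if [ -n "$pids" ]; then
        renice 0 $pids > /dev/null 2>&1
    fi
    sleep 5
done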
can you run gpus at higher levels such as -1 or so, or is it the nature of the gpu system to not go below 0? just wondering if there is only marginal benefit at running them at -1 or maybe even -5.
You can run them at -5 if you like but I didn't notice any difference in speed running them at -5 compared to 0.
i did notice an apparent slow down using -5 so i went back to 0 but again that was short term.
So we see the same result, at least with my ubuntu machine using the old, 2.6.27-14 kernel.
Well I had the units run overnight with the renice script running in the background, and it really made no difference.
CUDA workunits under the 2.6.31 kernel take between 2-4 times longer than they should. I may add, that while these units are running, compositing effects (such as wobbly windows and other eye candy) are extremely stuttery. This wasn't the case with the 2.6.27 kernel.
I'm running a phenom 955 CPU, GTX 275 GPU.
zgrep DEBUG /proc/config.gz
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_X86_CPU_DEBUG=m
CONFIG_PM_DEBUG=y
CONFIG_IRDA_DEBUG=y
CONFIG_CFG80211_REG_DEBUG=y
CONFIG_CFG80211_DEBUGFS=y
CONFIG_MAC80211_DEBUGFS=y
CONFIG_WIMAX_DEBUG_LEVEL=8
CONFIG_PNP_DEBUG_MESSAGES=y
CONFIG_CB710_DEBUG_ASSUMPTIONS=y
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_SCSI_MVSAS_DEBUG=y
CONFIG_SCSI_LPFC_DEBUG_FS=y
CONFIG_SCSI_DEBUG=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_MLX4_DEBUG=y
CONFIG_ATH9K_DEBUG=y
CONFIG_LIBIPW_DEBUG=y
CONFIG_B43LEGACY_DEBUG=y
CONFIG_WIMAX_I2400M_DEBUG_LEVEL=8
CONFIG_ATM_FORE200E_DEBUG=0
CONFIG_USB_SERIAL_DEBUG=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_AMSO1100_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_THINKPAD_ACPI_DEBUGFACILITIES=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_RODATA=y
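To pin down which of those options the fast kernel lacks, diffing the two configs narrows it quickly; the config file names below are examples, use whatever kernels you actually have installed:
# compare the debug-related options of the fast and the slow kernel
diff <(grep DEBUG /boot/config-2.6.27-14-generic | sort) \
     <(grep DEBUG /boot/config-2.6.31 | sort)
# for the currently running kernel, /proc/config.gz can be used instead
zgrep DEBUG /proc/config.gz | sort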
interesting... doesnt look like i will have the large increase you did but my rac went down by about 100 points but my pending went up by almost 300 points.. wonder why pending increases when running more aggressively?
Interesting ...
Here's the output.
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
<snip>
interesting... doesnt look like i will have the large increase you did but my rac went down by about 100 points but my pending went up by almost 300 points.. wonder why pending increases when running more aggressively?
Because you complete them faster than your wingmate.
Typically, when testing new builds, you start with an empty cache (at least that's the way you should do it). So the run starts quickly. And if the build is any good, it'll finish quicker too ;D.
The ones with bad times were from the stock Kubuntu 9.10 kernel.
are they still supplying larger workunits? i remember a few months ago my average was 11 to 13 min consistantly, now its averaging 28 to 29 min consistantly.
Holy €^#! Nez has a serious problem!
Eric's FAQ on the subject, in the staff blog, at http://setiathome.berkeley.edu/forum_thread.php?id=56450
is there a link for a setiathome 6.08 app not a vlar killer for cuda 2.2+?
No, because there is no such app.
all i can find is a 64bit app dated january 2009. anything newer?
No. If you want a non-VLAR-kill app that's the one to get. It's only a tiny bit slower than the 2.2 so you won't be missing much.
riofl, is the computer 4166601 ( http://setiathome.berkeley.edu/show_host_detail.php?hostid=4166601 ) yours?
The error "cudaAcc_find_triplets erroneously found a triplet twice in find_triplets_kernel" is a "normal" one. There is nothing to worry about.
The errors from that computer's page are interesting. They occur right after the "preparatory" phase in the CPU and when the GPU was supposed to take over. I've checked a few and all seem to happen in your "good" GTX285 card and not in the problematic tesla card. am I right?
If I remember correctly you were experiencing unusually high run times in your GPUs, does it still happen?
There is definitely something not right with the setup of this computer.
I think I've asked you before and you have told me the brand of your motherboard, can you remind me?
i think my workstation is using more resources than i think it does and the gtx285 is simply overwhelmed if i have kde options enabled and does not have enough resources for seti.
With 1 GB ram I think you should be safe. Still, even if it had no memory left it should throw an out of memory error message and switch to CPU computing, not error out completely.
...
the computing errors were happening just as the gpu was supposed to take over. that was when i had all the 'cute' features of kde4 enabled which included dimming of unfocused windows and cube desktop switching and several other things including sharpen desktop (all experimental to see what it was like to use a workstation that had glitz enabled). i also use dual 24" monitors each at 1920x1200 using nvidia twinview option so i am sure that takes up a bit of vid resources as well. i also use different backgrounds on each of 9 desktops, same image loaded in each monitor/desktop.
my times now are averaging 16-18 min off the tesla and 19-22 min off the gtx285. much better than previously at around 30 min. my scores have finally climbed to near 15k like you said they should be.
Maybe you could do better, 20000+ RAC ;)
once i disabled the glitz and glitter options and did a power down restart to allow everything to clear and changed back to the older non vlar killer app, all the errors stopped.
If you were looking in your errors page and didn't see new errors, that was because they hadn't been updated since 17th January, not because there weren't new errors.
...
things have been stable for the past 20 hours or so :)
cpu and ram voltages are stock factory recommendations, instead of auto.
Maybe this isn't enough? What are their values? For cpu voltages don't look at bios, see the real value with 100% CPU utilization under seti.
since i readjusted everything back to standard dull desktop :)
Personally I don't like kde's effects now that I've seen them in sidux and I've also switched them off. Compiz effects are way better I think.
I've also started to see reports from users of Mac OS X, who have just gained the ability to run Einstein on CUDA - or not, if they only have 512MB. One poster attributed the loss of 125MB available memory (512MB --> 387MB) to OS effects alone.
1GB video RAM might not seem excessive any more but the bare minimum? ::)
20k+ rac huh? might be pushing this puppy a little bit
Why not? :)
i have only used the voltages in the bios.. under load i have nothing that reads them properly. for some reason lm sensors and gkrellm report the voltage sensors are in error.. for example... 2.85v for the 12v line? nope.. nada... only voltage readouts that make any sense are the ram and some cpu voltages but i am guessing they are that since they are only labelled in1 in2 in3 in4 etc... the only thing i know for sure is correct is in1 as ram voltage. it matches what the bios says, and the fans and temps. temp1 is the mosfets and temp2 is the southbridge. i discovered that with a hair dryer against the chips.. and discovered which fanX belonged to which fan by unplugging the fan to see which one dropped to 0.
I agree that lm-sensors reports most of the voltages incorrectly, with the rest I'll have to disagree.
There are no RAM voltages in those values, in fact I don't think there is a utility that can show them, windows, linux or whatever.
temp1 mosfets? Maybe, if you watercool them. Mosfets go high, really high, 100+ °C.
From the values you posted, the two that resemble your CPU voltage (vcore) are in0 (1.22) and cpu0_vid (1.219). The "cpu0_vid" is a very interesting name. "VID" is something like a default voltage for the chip. The lower it is the more overclockable the chip is. A Q6600 with a VID of 1.219 is very very good. My Q6600 with a VID of 1.2750 (average for this chip) has easily gone to 3.24 GHz.
The thing is I don't think lm-sensors can show the VID of a chip and it is just the vcore with a fancy name. Now if we assume that vcore=in0=cpu0_vid=1.22, I think it is a little low for 3GHz. Maybe try 1.24-1.25 volts. Still it depends on the VID of the chip. If the VID is really 1.219 then 1.22 is not necessarily bad. Then again I don't think the VID can take such a value (1.219), it goes in increments.
Is this machine dual boot with windows by any chance?
Let's do something else. Post again the above values with 0% cpu utilization (idle, nothing running) and 100%.
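An easy way to grab both sets of readings for posting (a sketch, assuming lm-sensors is installed and the chip drivers are loaded):
sensors > sensors-idle.txt
# start seti or prime95 in another terminal, wait a minute or two, then:
sensors > sensors-load.txt
diff -u sensors-idle.txt sensors-load.txt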
Hello
Can you also include output of "nvidia-smi -lsa" command when gpus are on load?
Well, you never know, seems like cpu0_vid is the VID after all. Still, I'd like to see it from windows too.
With this VID and this motherboard I think you can easily go to 3.6GHz if you have a decent cooler.
Don't bother with windows but if you're going to do it anyway or for future reference:
CPU-Z from http://www.cpuid.com/cpuz.php Among other things it will show the vcore.
Core Temp http://www.alcpu.com/CoreTemp/ Most importantly it shows the VID of the processor.
Prime95 http://www.mersenne.org/freesoft/ For stability testing and to load the processor 100%.
As for the cooler, it's decent. Personally I don't like my fans autothrottled and don't connect them on the motherboard. I connect them directly to the PSU cables for 100% speed always.
Ok. Be sure to check how the vcore changes between 0% and 100% cpu utilization. And we'll see if lm_sensors reports the correct VID.
In prime95 you'll need to start 4 threads for your quad core. It has three tests: small FFTs, large FFTs and a mixed mode. Choose small FFTs.
What voltage did you use when you were running at 3.6 GHz?
if i remember correctly (it has been almost 2 yrs since i did this) i believe i upped the VID to 1.3 v or so.
Please refer to it as vcore. VID is constant, something like a characteristic printed on the chip. 1.3 volts is very very good for 3.6Ghz. I also use 1.3 volts but for 3.24 Ghz.
but since i always really wanted a 3.0 ghz machine but at the time could not afford the processor, and this can safely go to that speed i decided to go against my beliefs so that i had the best of both worlds for myself. :)
That was also the reasoning behind my decision. I settled for 3.24Ghz. I didn't want to go beyond that since my motherboard is very basic.
interesting.. didnt know freesoft had a prime95 for linux 64. got that too just to have it. could be useful.
so in0 is the Vcore and its voltage matches the cpu0_vid... which i find out appears to be a static listing. it never moves so maybe it reads something from the cpu? dunno.
the sensors core0VID is the VID as reported by the chip itself. still there is a difference between linux and windows.
Why do you say that? We were never sure about core0vid and as I said in another post, 1.219 as a VID was a strange number anyway. Coretemp clears things up for us: your chip's VID is 1.2500 and that core0VID from lm-sensors is something else.
linux says it is 1.219 while coretemp reports it as 1.2500
under load after 4 iterations of the small fft test
1.216 volts under load? At 3GHz? Might be dangerously low. Maybe not. See also below.
core0=56
core1=56
core2=51
core3=51
cpuz coreV=1.2500 initially then after a few seconds drops to 1.216
played around at 3.6ghz but since i have changed ram from the last time (old ram was crucial ballistix new ram is OCZ low voltage Blade series) I find that at 3.6ghz the lowest besides 1000 i can go is 1200 which the ram plainly complains about even upped from 1.8 to 2.0 volts which is .1 v higher than recommended maximum. so i had no success in stability at 3.6ghz so i dropped it to 3.45ghz along with 1152 on the ram (base is 1066) and VCore set at 1.312v.
What vcore at 3.6GHz? Might not be the memory's fault but too low vcore. You also say "vcore set at 1.312v". Where, in bios? Don't look there. Look at what cpu-z says under 100% load. That's the one to watch. Before I pencil modded my motherboard I had to set the vcore in bios at 1.4750 in order to achieve 1.3000 under 100% load.
the system passed small fft, large fft and mixed bag with flying colors. i then booted into memtest86 and ran 2 iterations of its test and all passed just fine.
How much time did you let prime95 run? Many people consider a system stable after 12, 18 or even 24 hours of prime95 load. For my overclock of 3.24Ghz I went overboard and ran prime95 for three days: one day small FFTs, one day large and one day mixed ;D Also for memtest, I think they recommend to leave it for many hours as well to be sure that your ram is good and stable. Whenever I buy new ram the first thing I do is run memtest for 24 hours.
what do you run on your cpus? multibeam or astropulse? do you find astropulse gives higher credit than the equivalent time with MB would?
There is no AP work now at all, so the question is a highly theoretical one :P
it filled me with more than 3000 workunits..
anyway, my times per workunit completion are noticeably shorter, yet my scores for this machine have dropped from 18.6k to 17.5k and my pending credit jumped from 89k to 105k! is this an indication boinc isn't happy with the changes to this system?
No, it means that your pending credit increased ;D
http://setiathome.berkeley.edu/results.php?hostid=4166601&offset=0&show_names=0&state=4
hmm i seem to be getting some invalids almost daily ... considering the number of work units i got swamped with do you think this points to a problem in my system?
is there a cuda 3.0 seti app available yet? i installed 3.0 toolkit but the existing app still uses 2.2
YES LINUX x64 CUDA 3.0
http://calbe.dw70.de/mb/viewtopic.php?f=9&t=120
ok this is driving me insane :) On linux:
does anyone know where the boinc_client gui stores its list of computers in its pulldown? i have a mess of old ones in there and cannot find where it keeps them. i did a search in the /var/lib/boinc dir for the names of a few of the computers.. nothing... searched /etc, tried a locate for boinc and nothing shows up.
does boinc/seti-app support ATI Radeon cards yet? if so anyone have any performance results compared to nvidia? i have been looking at reviews and comparisons but outside of gaming graphs very little is to the point for this specific application.
what is disappointing is the performance of the fermi cards compared to a gtx285 outside of gaming. the TDP is horrendously high for the performance points you appear to get.. until i see one actually in use in this application to compare average time differences i still tend to lean to the gtx285. the gtx470 isnt bad for TDP and is mildly higher than the 285 but the 480 appears to be a power hog for what it appears to give back.
the radeon cards appear to outperform nvidia in gaming and TDP but how do they compare when processing workunits?
ok this is driving me insane :) On linux:
does anyone know where the boinc_client gui stores its list of computers in its pulldown? i have a mess of old ones in there and cannot find where it keeps them. i did a search in the /var/lib/boinc dir for the names of a few of the computers.. nothing... searched /etc, tried a locate for boinc and nothing shows up.
Look for ".BOINC Manager" and in there the section starting with "[Compter MRU]" should contain what you are searching.
is anyone else getting these messages all of a sudden, or did something go wrong in my setup? i checked my app_info and all is as it has been..
Thu 10 Jun 2010 05:44:04 AM EDT SETI@home Message from server: No work sent
Thu 10 Jun 2010 05:44:04 AM EDT SETI@home Message from server: Your app_info.xml file doesn't have a usable version of SETI@home Enhanced.
Thu 10 Jun 2010 05:44:04 AM EDT SETI@home Message from server: (reached daily quota of 100 tasks)
It was broken actually, but instead of fixing only the broken part DA brought a whole package of changes to SETI main, most of which were not needed...
yet another reason why i believe firmly if it aint broke dont fix it :P
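For anyone else getting the "doesn't have a usable version" message: the scheduler is saying that none of the <app_version> entries in your app_info.xml match an app name/version it currently accepts, so the thing to check is those entries against what the project now publishes. A stripped-down sketch of the structure it expects (the names, file name and version number below are examples only, not the ones to copy):
<app_info>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>setiathome-CUDA-example</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <plan_class>cuda</plan_class>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>setiathome-CUDA-example</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>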
In one of my systems I have a 8800 GT, 9600 GSO, 250 GTS all working together - very nice. Still, I'm thinking of picking up a 460, but I did hear that the 400s weren't doing CUDA so well. Any advice/thoughts/benches?
I crunshed several units. One was already correctly validated. The others have error messages:
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 506 : invalid configuration argument.
....
I have a GeForce 8800 GT. Whats wrong here ? :(
For the Windows version I would propose updating drivers. No idea if it is applicable to Linux...
NVIDIA GPU 0: GeForce 8600 GT (driver version unknown, CUDA version 3010, compute capability 1.1, 255MB, 99 GFLOPS peak)
<prog>0.20329889</prog>
- Change the red number to something much lower, I don't know what a proper number for your card will be, maybe half of it.
- You have to make the cuda app run at 0 priority. I use the following script that runs in infinite loop:
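A minimal sketch of that kind of loop, assuming the goal is simply to keep the running CUDA app at nice 0 (the process name match and the interval are examples; raising priority above the default needs root or an equivalent right):
#!/bin/sh
while true; do
    for pid in $(pgrep setiathome); do
        renice 0 -p "$pid" > /dev/null 2>&1
    done
    sleep 60
done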
I assume that this flops stuff is more or less aesthetics.
Absolutely NOT! Check http://lunatics.kwsn.net/3-linux/seti-mb-cuda-for-linux.msg19829.html#msg19829 and http://lunatics.kwsn.net/5-windows/appinfo-flops-question.0.html
I've left toolkit 3.1 while installing 2.0 libraries systemwide as well. ldd shows that the correct (2.0) libraries are then loaded by the seti/cuda app.
If you don't need them for other things DELETE the 3.1 and 2.0 and KEEP the 2.3.
I've updated toolkit to 2.3 now and Crunch3r's app to the one you proposed. I'll check and see how it works out.
- You have to make the cuda app run at 0 priority. I use the following script that runs in infinite loop:
Could this be reason for quite long run times on my card?
Meanwhile two results got verified and peers were running the stock GPU app for Windows. One peer had a GTX295, the other a GTX285, both had 5 to 8 times shorter run times. Or is such a speed difference expected between more modern and almost ancient GPUs?
Of course a gtx295 or gtx285 will be multiple times faster than an 8600gt. If they weren't, why would people then fork out multiple hundred euros/dollars to buy them?
I assume that this flops stuff is more or less aesthetics.Absolutely NOT! Check http://lunatics.kwsn.net/3-linux/seti-mb-cuda-for-linux.msg19829.html#msg19829 and http://lunatics.kwsn.net/5-windows/appinfo-flops-question.0.html
- You have to make the cuda app run at 0 priority. I use the following script that runs in infinite loop:
Could this be reason for quite long run times on my card?
Yes, you ABSOLUTELY have to run the cuda app at priority 0.
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.
Seems benign though as most results have validated. Is there any particular reason for this error being reported and the app seemingly still operating OK?
i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
i'm always a bit suspicious of large jumps in upgrade versions. worth it? avoid it?
i'm currently using nvidia-drivers-195.36.31. noticed an upgrade available to nvidia-drivers-256.52.
There have been quite a few releases between them, so not really a big jump. You can try it and if you don't like it, revert back.
setting niceness of CPU-part of GPU task to 0 (normal priority) doesn't seem to affect things a lot, but doesn't hurt.
It seems to depend on the kernel/distro used. Some systems seem to benefit highly from it, others not so much.
settings of <avg_ncpus> and <max_ncpus> don't matter much (if at all)
Wrong
How so? I've tried some values between 0.00 and 0.15 and I haven't noticed any difference. The only time that I could imagine the difference to pop up is if there are multiple (probably more than 3-4) GPUs installed and used.
settings of <flops> should not be too high or else BOINC bails out due to excessive resources (read: CPU/GPU cycles) being used
Wrong
If not, what then? My estimates are currently way too high (around 4 days) so I tried to fix it by changing <flops> value. If I set it 10-times larger, WUs erred out due to excessive resources used. Run time (wall) was roughly the same as for successful WUs, so I can attribute the error only to too high <flops> value.
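For what it's worth, one plausible mechanism for those errors, going by how the client appears to use <flops>: the time estimate is roughly rsc_fpops_est / flops and the abort limit roughly rsc_fpops_bound / flops, so a <flops> value 10 times too large shrinks the allowed run time 10-fold even though the real run time is unchanged. Back-of-the-envelope, with made-up numbers:
RSC_FPOPS_EST=30000000000000      # from the workunit header
RSC_FPOPS_BOUND=300000000000000   # often an order of magnitude above the estimate
FLOPS=30000000000                 # candidate <flops> entry in app_info.xml
echo "estimate:    $((RSC_FPOPS_EST / FLOPS)) s"
echo "abort limit: $((RSC_FPOPS_BOUND / FLOPS)) s"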
either today's batch of downloads is supposed to take a very long time for a gpu to complete or i have something going wrong. my fastest gpu is taking 3 hours 2 minutes to reach 85%! and even the others are taking 10 to 15 minutes longer on the other 2 gpus. all 3 gpu temps are also much lower than normal. typically they run 58-65c max load and i have not seen them rise above 50c in several hours.
Check out your results:
is this a 'common' experience others are having too today or am i facing something going haywire?
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
SETI@home MB CUDA 3.0 6.09 Linux 64bit - r16 by Crunch3r :p
- thread priority mod
setiathome_CUDA: Found 3 CUDA device(s):
Device 1 : GeForce GTX 285
totalGlobalMem = 1073020928
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1476000
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 295
totalGlobalMem = 939327488
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1345500
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 3 : GeForce GTX 295
totalGlobalMem = 939327488
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 512
clockRate = 1345500
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 285 is okay
SETI@home using CUDA accelerated device GeForce GTX 285
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file './cudaAcc_fft.cu' in line 49 : no CUDA-capable device is available.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
setiathome_enhanced 6.01 Revision: 737 g++ (GCC) 4.2.1 (SUSE Linux)
libboinc: BOINC 6.11.0
Work Unit Info:
...............
WU true angle range is : 1.433000
Flopcounter: 11714606392639.039062
Spike count: 1
Pulse count: 0
Triplet count: 0
Gaussian count: 0
05:22:35 (16178): called boinc_finish
</stderr_txt>
Do you still make heavy use of your main graphics card?
What driver do you use?
Why cuda 3.1? I think you shouldn't use it. Cuda 3.x is intended for different software and hardware.
Do you raise the priority of the seti cuda app? Also why don't you use a newer graphics driver?
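A few quick checks worth doing when stderr says "no CUDA-capable device is available" (nothing build-specific, just the usual suspects):
lsmod | grep nvidia      # is the kernel module loaded at all?
ls -l /dev/nvidia*       # do the device nodes exist, and can the boinc user read/write them?
nvidia-smi               # does the driver itself still see the cards?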
At least the app I'm running (x86, 2.2, vlar-kill) has a nasty habit of complaining:
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.
Seems benign though as most results have validated. Is there any particular reason for this error being reported and app seemingly still operating OK?
I solved the issue. Apparently, setiathome-CUDA_3.0_6.09.x86_64_vlarkill doesn't raise the priority if boinc is run by user "boinc" - it needs admin privilege. any thoughts?
Do you raise the priority of the seti cuda app? Also why don't you use a newer graphics driver?
Yah I found that was the problem - NICE is not playing nice at all ;D
Upgraded the driver and also used crunch3r cuda3 bins as the cuda2.2 doesn't have raised priority.
BTW, I have 1 rig that doesn't raise priority at system start-up (I also set a 20sec delay but it doesn't work). I have to manually restart boinc so setiathome-CUDA_3.0_6.09.x86_64_vlarkill will be raised. My other rigs work perfectly fine though.
Maybe the boinc user needs the right to increase process priority?
Isn't giving the boinc user the right to increase process priority the same as root? or is there a specific group that I need to add boinc to?
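If the idea is to let the boinc user raise priority without running the whole client as root, the usual knob is the "nice" item in /etc/security/limits.conf (a sketch; the value is an example, it needs a fresh login/restart of the client, and whether it applies to a client started from an init script depends on the distro):
# /etc/security/limits.conf
# <user>   <type>   <item>   <value>
boinc      -        nice     -10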
Why do you need the boinc user and other stuff? Keep the garbage out of your system. Install it in your home directory and go from there.
In Windows a service (protected) install is required for BOINC to work w/o user logon. If Linux can launch a user's apps (installed in a particular user home dir) w/o user logon - good.
My rig is running ubuntu server with no x-window, stripped down kernel (using localmodconfig - 9MB kernel) and boinc running as a service. Add some startup script to initialize cuda and other stuff. Everything is running perfectly now, just boinc running as root - i don't think its a risk as i don't have anything in this rig - just a pure cruncher :)
Also, it's good when some kind of autologon is enabled. But if the PC sits idle and just waits for a user to come and log on - it's not good.
If you read the first dozen pages of this thread, you'll see that the original 32bit Linux Cuda app was broken; it might be that the x86, 2.2, vlar-kill app isn't totally reliable either.
At least the app I'm running (x86, 2.2, vlar-kill) has a nasty habit of complaining:
Cuda error 'GaussFit_kernel' in file './cudaAcc_gaussfit.cu' in line 497 : invalid configuration argument.
Seems benign though as most results have validated. Is there any particular reason for this error being reported and app seemingly still operating OK?
Can anybody comment on the error above? I don't think I've got any results without this 'error' while almost all of them validated OK. It's annoying to see thousands of lines of error while meaningful information gets truncated.
Edit: The old broken 32bit Linux Cuda app really should be removed from the first post.
[Mod:] Removed outdated build
Edit: The old broken 32bit Linux Cuda app really should be removed from the first post.
Done:
[Mod:] Removed outdated build
Was 64bit OK? I can put that back if needed, or better yet, put it in a proper place.
Any news on this?
Unfortunately no. All 32bit builds had the same error. If you can, run the 64bit app, else it's better not to run CUDA for the time being.
Then I think somebody should add a warning to the original post or remove the 32bit app entirely.
Unfortunately I only have 32bit linux installed. The last time I tried 64bit I had a sound problem in some 32bit games, so I just reverted to 32bit. Perhaps I should give it a try again, but I don't think this will happen anytime soon.
[Mod:] Removed outdated/bad 32 bit build
If you read the first dozen pages of this thread, you'll see that the original 32bit Linux Cuda app was broken; it might be that the x86, 2.2, vlar-kill app isn't totally reliable either.
If you can, try swapping to a 64bit OS and use the 64bit app instead.
ldd setiathome-CUDA_3.0_6.09.x86_64_vlarkill
linux-vdso.so.1 => (0x00007ffff6fff000)
libcufft.so.3 => /usr/local/cuda/lib64/libcufft.so.3 (0x00007fa18cf7d000)
libcuda.so.1 => not found
libcudart.so.3 => /usr/local/cuda/lib64/libcudart.so.3 (0x00007fa18cd2f000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fa18ca29000)
libm.so.6 => /lib/libm.so.6 (0x00007fa18c7a6000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007fa18c588000)
libc.so.6 => /lib/libc.so.6 (0x00007fa18c205000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa18ed44000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fa18c001000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fa18bdea000)
librt.so.1 => /lib/librt.so.1 (0x00007fa18bbe2000)
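One thing to note in that output: libcuda.so.1 comes from the display driver, not from the toolkit, so "not found" usually just means the driver's library directory isn't on the loader path. A sketch of the usual workaround (directory names vary by distro and driver version):
export LD_LIBRARY_PATH=/usr/lib/nvidia-current:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ldd ./setiathome-CUDA_3.0_6.09.x86_64_vlarkill | grep -i cuda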
Anyone tried setiathome-CUDA_3.0_6.09.x86_64_vlarkill with cuda32?
Supposedly setiathome-CUDA_3.0_6.09.x86_64_vlarkill was done for fermi compatibility but when I tested it, it wasn't working.
Not surprising given the Fermi incompatibilities inherited from 'old stock' Cuda code (all platforms) are fairly specific, complex, and not related directly to Cuda version (apart from needing at least Cuda 3 for fermi binaries).
When I tested it, it produced garbage results in all WUs except VLAR ones. It could run VLAR workunits in fermi GPUs and produce valid results.
FYI,
Have got a few other responsibilities sorted today, so should hopefully be able to get some Linux guys on the case soon. No firm timetable yet, for Linux, but wheels are in motion. Will try to provide more info as something more tangible develops.
Nice to hear that :)
what i cannot understand is why then does gpu0 still work under the old syntax?
My problem is this. The system and BOINC both recognise there are 2 video cards and BOINC runs 2 GPU units. However they are both running on the one card. It's the card in the first PCIE socket that's doing all the work. The one in socket 2 stays dead cold. Using a cc_config file made no difference
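For reference (and evidently it didn't help in this case), the cc_config.xml in the BOINC data directory normally needs <use_all_gpus>, because with mixed cards the client by default only uses the one it rates as "best". A minimal sketch:
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
Then restart the client, or tell it to re-read the file with boinccmd --read_cc_config.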
Now I get it.
Why didn't you just say you wanted the Driver version and the CUDA app ? :D
Give me the output of :
ps -o cmd --no-headers -p $(pgrep setiathome)
and
nvidia-smi -q
I've never used a distribution provided nvidia driver. I always use nvidia's own driver. The latest official version is 275.09.07. Try that.
I tried installing that driver but the NVidia installer is a pain, the xorg.conf it creates doesn't work (X crashes on start up) and the problem is too subtle for me to pick up quickly.
Also post your xorg.conf and your xorg log.
Will do but I'll have to reinstall the 2nd card first to get meaningful files. will post them tomorrow.
Also what version are your cuda libraries?
Cos I work on the "if it ain't broke don't fix it" principle :) Plus I don't have the time to spend compiling kernels and drivers. Repository kernels may be slow updating but they work with a minimum of fuss (usually).
2.6.33 is almost a year and a half old. Why don't you use a newer kernel?
one answer was to make the 2nd card think it had a monitor connected
This is/was for windows. Doesn't matter in linux. Unless you want to run multiple X screens or servers.
I'll try your xorg file when the project is back up and report back then.
Is this with only one GPU?
No - both were installed
1) Why do you run as root?
I don't - I just happened to be root in the console when I ran those commands
2) There is no "--device x" flag on the running app. Is this because of small terminal window? Make it bigger and recheck.Pardon my ignorance but where should that line be ?
3) This isn't what I would expect from nvidia-smi. Sample output: ....
That's all that came up. (Is this a clue to the problem?)
It shows only one instance running.
Maximize your terminal window and rerun it. Alternatively rerun it with > ps.txt at the end and post the file.
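A way to get the untruncated command lines regardless of terminal width (a sketch):
ps -ww -o pid,cmd --no-headers -p $(pgrep setiathome) > ps.txt
grep -o -- '--device [0-9]*' ps.txt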
No, it is a clue that you need to update kernel, driver, distro, everything ;D
BTW the correct command for nvidia-smi is: nvidia-smi -a
The -a, -s and -g arguments are now deprecated in favor of -q and -i, respectively. However, the old arguments still work for this release.
ugh. just reviewed the specs of the nvs 420 and 450.. neither will work for me :( they only have 512mb ram . i would clobber that in a heartbeat. typically my setup chews up a good 1gb+ vidram and thats without boinc running..
Option "BaseMosaic" "boolean"
This option can be used to extend a single X screen transparently across all of the available display outputs on each GPU. This is like SLI Mosaic mode except that it does not require a video bridge connected to the graphics cards. Due to this Base Mosaic does not guarantee there will be no tearing between the display boundaries. Base Mosaic is supported on all the configurations supported by SLI Mosaic Mode. It is also supported on Quadro FX 380, Quadro FX 580 and all G80 or higher non-mobile NVS cards.
Use this in conjunction with the MetaModes X configuration option to specify the combination of mode(s) used on each display. nvidia-xconfig can be used to configure Base Mosaic via a command like nvidia-xconfig --base-mosaic --metamodes=METAMODES where the METAMODES string specifies the desired grid configuration. For example, to configure four DFPs in a 2x2 configuration, each running at 1920x1024, with two DFPs connected to two cards, the command would be:
nvidia-xconfig --base-mosaic --metamodes="GPU-0.DFP-0: 1920x1024+0+0, GPU-0.DFP-1: 1920x1024+1920+0, GPU-1.DFP-0: 1920x1024+0+1024, GPU-1.DFP-1: 1920x1024+1920+1024"
You can get a EVGA 450 SC with 1GB.
Mike
If it is only for display, what do you need 1GB for? Maybe use 2 NVSs?
Nvidia gives extra capabilities/configuration options to their professional series of cards. For example those cards support BaseMosaic. BaseMosaic seems the perfect solution in your case:
Option "BaseMosaic" "boolean"
This option can be used to extend a single X screen transparently across all of the available display outputs on each GPU. This is like SLI Mosaic mode except that it does not require a video bridge connected to the graphics cards. Due to this Base Mosaic does not guarantee there will be no tearing between the display boundaries. Base Mosaic is supported on all the configurations supported by SLI Mosaic Mode. It is also supported on Quadro FX 380, Quadro FX 580 and all G80 or higher non-mobile NVS cards.
Use this in conjunction with the MetaModes X configuration option to specify the combination of mode(s) used on each display. nvidia-xconfig can be used to configure Base Mosaic via a command like nvidia-xconfig --base-mosaic --metamodes=METAMODES where the METAMODES string specifies the desired grid configuration. For example, to configure four DFPs in a 2x2 configuration, each running at 1920x1024, with two DFPs connected to two cards, the command would be:
nvidia-xconfig --base-mosaic --metamodes="GPU-0.DFP-0: 1920x1024+0+0, GPU-0.DFP-1: 1920x1024+1920+0, GPU-1.DFP-0: 1920x1024+0+1024, GPU-1.DFP-1: 1920x1024+1920+1024"
running diag software revealed my average vidram usage without boinc running is 800mb to 1gb.
That would be nvidia-smi -a ?
What driver version do you use? I don't remember nvidia-smi showing percentages for memory use.
shows Total, Used & Free here on 280.13 Ubuntu proprietary driver supplied through the additional drivers automatic thing, Ubuntu 11.10 64 bit. I'm guessing that he just did the math from that.
What driver version do you use? I don't remember nvidia-smi showing percentages for memory use.
nvidia-smi (without the -a option) now prints a pretty formatted output which includes the % memory usage.
+------------------------------------------------------+
| NVIDIA-SMI 3.295.20 Driver Version: 295.20 |
|-------------------------------+----------------------+----------------------+
| Nb. Name | Bus Id Disp. | Volatile ECC SB / DB |
| Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. |
|===============================+======================+======================|
| 0. GeForce GTX 460 | 0000:01:00.0 N/A | N/A N/A |
| 20% 36 C N/A N/A / N/A | 26% 200MB / 767MB | N/A Default |
|-------------------------------+----------------------+----------------------|
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0. Not Supported |
+-----------------------------------------------------------------------------+