Author Topic: SETI MB CUDA for Linux (Read 579039 times)

pp · « **Reply #450 on:** 30 Sep 2009, 09:56:40 am »

Darn, darn, darn!

Well, I guess I have to let you hunt down Vyper by yourself then. I wish you luck.

Would be nice with a Linux computer on top and a majority of them occupying the top 10...

b0b3r · « **Reply #451 on:** 30 Sep 2009, 10:30:19 am »

Quote from: pp on 30 Sep 2009, 09:56:40 am

Darn, darn, darn! Well, I guess I have to let you hunt down Vyper by yourself then. I wish you luck. Would be nice with a Linux computer on top and a majority of them occupying the top 10...

Thanks. I also wish you luck.

It's sad, but I don't think this host will beat Vyper's. Not at this time; maybe someday.

It would be very nice to see many Linux hosts on the top computers list, and generally more Linux hosts in BOINC.

riofl · « **Reply #452 on:** 01 Oct 2009, 08:56:50 am »

i have been doing a lot of studying on the 285 vs 295 battle that has been going on in my brain. each element of the 295 is slower than the 285 by a reasonable margin (approx 160gflops difference.. 285=1062gflops while 295=894gflops per element). the thing the 295 has is 'density' to make up for it. so even if it takes longer to do a wu than the 285 does, it can do 2 of them in the same package in an attempt to make up for it which works ok.. wonder if there is an extended length motherboard out there that will take 8 pcie devices with a reasonable distance spread (at least 1/2 - 1 in between mounted devices)? there are cases available that can handle this, but i have not found a mobo that can.. for raw speed i would be more inclined to put 8 285 in something than 4 295. thoughts? i have a feeling i am not accounting for something here besides the obvious power requirements and cost savings of 1 295 vs 2 285...

maybe i should be looking into addon pcie density expansion like the nvidia supercomputer appliances do,putting 4 teslas into a single pcie slot.. wonder if empty appliance devices are available....

b0b3r · « **Reply #453 on:** 01 Oct 2009, 10:00:02 am »

Quote from: riofl on 01 Oct 2009, 08:56:50 am

i have been doing a lot of studying on the 285 vs 295 battle that has been going on in my brain. each element of the 295 is slower than the 285 by a reasonable margin (approx 160gflops difference.. 285=1062gflops while 295=894gflops per element). the thing the 295 has is 'density' to make up for it. so even if it takes longer to do a wu than the 285 does, it can do 2 of them in the same package in an attempt to make up for it which works ok.. wonder if there is an extended length motherboard out there that will take 8 pcie devices with a reasonable distance spread (at least 1/2 - 1 in between mounted devices)? there are cases available that can handle this, but i have not found a mobo that can.. for raw speed i would be more inclined to put 8 285 in something than 4 295. thoughts? i have a feeling i am not accounting for something here besides the obvious power requirements and cost savings of 1 295 vs 2 285...

maybe i should be looking into addon pcie density expansion like the nvidia supercomputer appliances do,putting 4 teslas into a single pcie slot.. wonder if empty appliance devices are available....

You may also consider this card http://www.asus.com/product.aspx?P_ID=3OXEUQmsHmmewEyu&templete=2 if you have enough money.

But in general I have observed that the difference in SETI speed between the 260sp216, 275 and 285 is small compared to the difference in price. If the theoretical speed difference is about 20%, you should be happy to see about half of that in SETI. That's because the quoted figures are peak values of GPU capability, while real computation also depends on CPU, bus and memory speed, and even more on application architecture.

riofl · « **Reply #454 on:** 01 Oct 2009, 11:45:43 am »

Quote from: b0b3r on 01 Oct 2009, 10:00:02 am

Quote from: riofl on 01 Oct 2009, 08:56:50 am
i have been doing a lot of studying on the 285 vs 295 battle that has been going on in my brain. each element of the 295 is slower than the 285 by a reasonable margin (approx 160gflops difference.. 285=1062gflops while 295=894gflops per element). the thing the 295 has is 'density' to make up for it. so even if it takes longer to do a wu than the 285 does, it can do 2 of them in the same package in an attempt to make up for it which works ok.. wonder if there is an extended length motherboard out there that will take 8 pcie devices with a reasonable distance spread (at least 1/2 - 1 in between mounted devices)? there are cases available that can handle this, but i have not found a mobo that can.. for raw speed i would be more inclined to put 8 285 in something than 4 295. thoughts? i have a feeling i am not accounting for something here besides the obvious power requirements and cost savings of 1 295 vs 2 285...

maybe i should be looking into addon pcie density expansion like the nvidia supercomputer appliances do,putting 4 teslas into a single pcie slot.. wonder if empty appliance devices are available....

You may also consider this card http://www.asus.com/product.aspx?P_ID=3OXEUQmsHmmewEyu&templete=2 if you have enough money.

But generally i observed that it's not so big difference in Seti speed between 260sp216 and 275 and 285 compared to difference in price. So if theoretical speed difference is about 20% you should be happy if you see only about half of that in Seti. That's because they are peak value of GPU capability, but real computation depends also on cpu, bus, memory speed and even more on application architecture.

hmm yeah ... my basis is on integer gflops since i have not found double precision gflops comparisons. basing performance comparisons between my tesla at 933 integer gflops and my 285 at 1062 integer gflops, boinc displays them as 74 and 127 gflops respectively. now, considering the 295 is slower in integer gflops per processing system (894 each) than the tesla, i would expect it would display less than 74gflops each half.
which basically means that a 295 using both halves will only give approximately 50-60% higher total performance than a single 285, which makes me curious about its value other than accepting that 50% more per physical device is preferable. i just wonder, since the 295 is essentially supposed to be 2x 285 with slightly degraded performance, why is it so? it has 4 fewer pixel shaders (28 vs 32) and a smaller memory bus width (448 vs 512, which to me is the most major item). although these vary by mfgr, in general the 295 also has slower default clock speeds. admittedly lower clock speeds will help with eliminating heat buildup, but instead of using the same default heatsink assy, why not put a better designed one on to compensate and keep the performance up? guess i just wonder why its design doesn't make a lot of sense, or maybe i am in wishful thinking mode that it 'should' be 2x full 285 units when in fact it is 2x crippled 285 units.

b0b3r · « **Reply #455 on:** 01 Oct 2009, 12:53:44 pm »

Quote from: riofl on 01 Oct 2009, 11:45:43 am

hmm yeah ... my basis is on integer gflops since i have not found double precision gflops comparisons. basing performance comparisons between my tesla at 933 integer gflops and my 285 at 1062 integer gflops, boinc displays them as 74 and 127 gflops respectively. now, considering the 295 is slower in integer gflops per processing system (894 each) than the tesla, i would expect it would display less than 74gflops each half.
which basically means that for a given card, a 295 using both halves will only give approximately 50-60% higher performance in total then a single 285 which makes me curious about its value other than accepting that 50% more per physical device is preferable. i just wonder if since the 295 is essentially supposed to be 2x 285 with slightly degraded performance why it is so? it has 4 less pixel shaders (28 vs 32) and smaller memory bus width (448 vs 512 which to me is the most major item). although these vary by mfgr, in general the 295 also has slower default clock speeds. admittedly lower clock speeds will help with eliminating heat buildup, but instead of using the same default heatsink assy, put a better designed one on to compensate and keep the performance up. guess i just wonder why its design doesn't make a lot of sense or maybe i am in wishful thinking mode that it 'should' be a 2x full 285 units when in fact it is 2x crippled 285 units.

First, there is no such thing as "integer gflops". There are single (32-bit) or double (64-bit) precision floating-point operations, and double-precision performance is indeed about 8 times lower than single on NVIDIA GPUs. The reason each GPU on the 295 is clocked lower is heat production. According to the documentation, the 295 draws 290W for its two GPUs, while the 285 draws about 205W for a single GPU. So a card built from two 285s at normal clocks could produce over 410W of heat, which is very hard to dissipate. Even the 295 with its 290W is a very hot card and needs very good cooling to work stably. The Asus card from the link is built with two fully clocked 285 chips and its cooling system is very big: it takes 2.5 slots.

The difference in the number of pixel shaders is not important for CUDA computing. CUDA uses the shader processors, of which both chips have 240, organized into 30 multiprocessors (8 shaders in each). Memory is faster on the 285, 159GB/s vs 112GB/s for a single GPU on the 295, and the 285's shader clock is faster too. Again according to the documentation, theoretical peak single-precision performance is 1062Gflop/s for the 285 (about 130Gflop/s in double) and 895Gflop/s for each GPU of the 295 (about 112Gflop/s in double). The difference is about 15%, but what is the difference in real computation time? Unfortunately I don't have a 285 to test with, but I do have a 275 and a 260sp216. The theoretical difference is over 25% (275: 1010Gflop/s, 260: 804Gflop/s), but the real computation time for SETI is about 670 sec. for the 275 and about 750 sec. for the 260 (with a normal 0.44 AR unit). So the real difference is a little over 10%, like I said before.

What I'm trying to say is that each GPU on the 295 is theoretically about 15% slower than the 285, but in real computation each 295 GPU will only be about 5% to 6% slower. So with the 295 we get more than 185% of the performance of a 285 for not much difference in price.
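For anyone who wants to redo the arithmetic with their own cards, here is a rough sketch using only the figures quoted in this thread. The helper names `pct_faster` and `pct_slower` are made up for illustration, and the 6% real-world penalty per 295 GPU is the estimate above, not a measurement.

```shell
#!/bin/sh
# Rough check of the percentages discussed above.
# pct_faster and pct_slower are hypothetical helper names for this sketch.
LC_ALL=C; export LC_ALL   # keep "." as the decimal separator in awk output

# pct_faster A B: how much larger A is than B, in percent.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# pct_slower A B: how much smaller A is than B, in percent.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (1 - a / b) * 100 }'
}

# Theoretical peak of the 275 over the 260sp216 (Gflop/s).
echo "275 vs 260, theoretical: $(pct_faster 1010 804)%"
# Measured SETI run time of the 260 over the 275 (seconds per unit).
echo "275 vs 260, measured:    $(pct_faster 750 670)%"
# Theoretical peak of one 295 GPU below the 285 (Gflop/s).
echo "295 GPU vs 285, theory:  $(pct_slower 895 1062)%"

# Whole 295 relative to one 285, assuming each GPU is 6% slower in practice.
whole_295=$(awk 'BEGIN { printf "%.0f\n", 2 * (100 - 6) }')
echo "whole 295 vs one 285:    ${whole_295}%"
```

The theoretical gap between the 275 and the 260 comes out at roughly 26%, while the measured run times differ by only about 12%, which is the "about half" effect described earlier.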

riofl · « **Reply #456 on:** 03 Oct 2009, 05:12:09 am »

ahh... thanks. makes sense. i have a nasty habit of thinking myself into corners. unfortunately that asus mars is untouchable for me. just way too much $$. 295 does sound like the way to go and i think i have enough air flow around the gpus to keep it cool. my 285 and tesla both never go above 65c with a summer room ambient temp of 28c. took running the fans at 100%, adding extra bottom front fan to move cooler air into the lower case pocket plus a few small pci slot exhaust fans. with the spacing of my mobo's pcie slots it is not easy getting the extra heated air out from between the 2 cards. had to mount a little 1 in fan on the tops of the cards aiming between them to move the air out which dropped both card temps quite a bit. probably would have been easier to buy another case side cover with fans directly over the gpus. mine has a single 25cm fan in the middle of the cover.

b0b3r · « **Reply #457 on:** 03 Oct 2009, 05:25:04 am »

I would also advise waiting for the G300. Its release may greatly change the price of the 295.

Tye · « **Reply #458 on:** 04 Oct 2009, 07:53:43 am »

I've been using BOINC 6.6.11 for awhile now, to make sure it handles my multi-GPUs of different types. Is there any newer version that will also do this yet? Sunu, I think you were also using 6.6.11...

pp · « **Reply #459 on:** 04 Oct 2009, 08:36:31 am »

I've been using the 6.10.x series for a while and it works correctly with multiple GPUs. Use 6.10.7 or later, because earlier versions in the series had a new bug that preempted all CUDA tasks.

b0b3r · « **Reply #460 on:** 04 Oct 2009, 08:37:05 am »

Quote from: Tye on 04 Oct 2009, 07:53:43 am

I've been using BOINC 6.6.11 for awhile now, to make sure it handles my multi-GPUs of different types. Is there any newer version that will also do this yet? Sunu, I think you were also using 6.6.11...

The version currently marked as stable is 6.6.40, and for me it works with multiple different GPUs.

Richard Haselgrove · « **Reply #461 on:** 04 Oct 2009, 09:05:21 am »

There are very few pre-compiled Linux v6.10.xx versions available for download - the last was v6.10.6

Rom Walton was asked yesterday for a v6.10.11 build, and replied "Alright, I'll have them out tonight." - but no sign yet (the Berkeley server problem may have got in the way). But worth keeping an eye open.

pp · « **Reply #462 on:** 04 Oct 2009, 10:25:39 am »

It's actually quite simple to build your own Boinc client. Just make sure you have all the dependencies installed as listed on the following page in the columns "Core client" and "BOINC Manager".
http://boinc.berkeley.edu/trac/wiki/SoftwarePrereqsUnix

Download whatever version you want with subversion. Available versions can be found here: http://boinc.berkeley.edu/trac/browser/tags

Code: [Select]

cd
svn co http://boinc.berkeley.edu/svn/tags/boinc_core_release_6_10_11

Run autosetup and configure. Make sure to define -march with whatever is appropriate for your own CPU.

Code: [Select]

cd boinc_core_release_6_10_11
./_autosetup
./configure --disable-server --disable-fcgi --enable-unicode CFLAGS="-march=core2 -O2 -pipe" CXXFLAGS="-march=core2 -O2 -pipe"

If everything went well you can compile the code.

Code: [Select]

make

To make things easier there's a Makefile to create a distributable Boinc data folder with all the files.

Code: [Select]

cd packages/generic/sea
make

Remove the included libcudart.so from the package since it can potentially interfere with your CUDA installation.

Code: [Select]

rm BOINC/libcudart.so

After stopping your currently running client and backing it up, you can copy the new version into your current Boinc data folder and overwrite the old binaries. This step depends heavily on how your current client is installed by your distribution. I suggest you move it to your home folder and run it manually from there in the future. In my particular case the copy command looks like this.

Code: [Select]

cp -rv BOINC/* ~/BOINC/

Command to start the client.

Code: [Select]

cd ~/BOINC
./boinc --allow_remote_gui_rpc --daemon

Command to stop the client.

Code: [Select]

cd ~/BOINC
./boinccmd --quit

It's easy to make shell files to perform those commands if you don't want to type them manually.
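As a rough idea of what such a shell file could look like, here is a minimal sketch that wraps the start and stop commands above. The script name `boinc-ctl.sh`, the `BOINC_DIR` variable and the `DRY_RUN` switch are my own inventions for this example, not anything BOINC provides; adjust the folder to wherever your client actually lives.

```shell
#!/bin/sh
# boinc-ctl.sh: hypothetical wrapper around the start/stop commands above.
# BOINC_DIR and DRY_RUN are assumptions of this sketch, not BOINC settings.
BOINC_DIR="${BOINC_DIR:-$HOME/BOINC}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# boinc_ctl start|stop: change into the data folder and run the client tools.
boinc_ctl() {
    case "$1" in
        start) (cd "$BOINC_DIR" && run ./boinc --allow_remote_gui_rpc --daemon) ;;
        stop)  (cd "$BOINC_DIR" && run ./boinccmd --quit) ;;
        *)     echo "usage: boinc-ctl.sh start|stop" >&2; return 2 ;;
    esac
}

# Only dispatch when the script is called with an argument.
if [ "$#" -gt 0 ]; then
    boinc_ctl "$@"
fi
```

Saved as `boinc-ctl.sh` and made executable, it would be called as `./boinc-ctl.sh start` or `./boinc-ctl.sh stop`. Running it with `DRY_RUN=1` only prints the command, which is handy for checking the folder first.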

Richard Haselgrove · « **Reply #463 on:** 04 Oct 2009, 02:41:19 pm »

Linux v6.10.11 BOINC is now available for download from Berkeley.

sunu · « **Reply #464 on:** 04 Oct 2009, 03:45:07 pm »

Quote from: IanJ on 30 Sep 2009, 08:43:49 am

Riofl and Sunu,
Just an update. It looks like the copying of the seti cuda executable into the /usr/sbin directory finally got it to calm down and start crunching.
The NVRM Xid issue continues but now doesn't lock up the machine. It's been up nearly a week without a lockup, but I've seen eight in the past three days. As the machine continues on happily I'll forget about it for now. During the reinstall last week I took off the expansion card blanking plates (this machine has only one card in it, the 9600GT) so the machine can get a bit more air.
Thanks for your help!
Ian

IanJ, have you solved your problems? What version exactly are your nvidia drivers? You can try any one of 190.18, 190.25, 190.32, 190.36, to see if those Xid errors go away.

Quote from: riofl on 24 Sep 2009, 05:48:35 pm

your error report says the app is cuda 2.2 so it will error. the app must also be cuda 2.3 compliant. those who explained things to me insisted that the driver, toolkit and app must use the same cuda version. i don't believe there is such a thing as 'backward compatibility' with cuda.

The driver and libs have to support each other but not the app. The app can be a lower number (previous version) with no problems but can't be a higher number. So the duo driver/libs have backwards compatibility. At least that's how it seems right now.
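To spell that rule out, here is a small sketch of the check. It only encodes the observation above (app CUDA version must not be higher than the driver/libs version); it is not an official NVIDIA compatibility test, and the function name `cuda_app_ok` is hypothetical.

```shell
#!/bin/sh
# cuda_app_ok APP_VERSION LIBS_VERSION
# Succeeds when the app's CUDA version is not newer than the driver/libs
# version. Versions are "major.minor" strings such as 2.2 or 2.3.
# Hypothetical helper: it encodes the rule of thumb above, nothing more.
cuda_app_ok() {
    awk -v app="$1" -v libs="$2" 'BEGIN {
        split(app, a, "."); split(libs, l, ".")
        # Compare major numbers first, then minor numbers, as integers.
        if (a[1] + 0 != l[1] + 0) exit !(a[1] + 0 < l[1] + 0)
        exit !(a[2] + 0 <= l[2] + 0)
    }'
}

# Try a few app / driver+libs combinations.
while read -r app libs; do
    if cuda_app_ok "$app" "$libs"; then
        echo "app CUDA $app with driver/libs CUDA $libs: should run"
    else
        echo "app CUDA $app with driver/libs CUDA $libs: expect errors"
    fi
done <<EOF
2.2 2.3
2.3 2.3
2.3 2.2
EOF
```

So a CUDA 2.2 app on a 2.3 driver and libs passes the check, while a 2.3 app on a 2.2 setup does not.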

Quote from: pp on 24 Sep 2009, 02:07:20 pm

And now we have two Linux machines among the top 20. Don't know yet how high it will reach though...

Congrats pp! Welcome to the top 20 hosts club!

Quote from: b0b3r on 30 Sep 2009, 08:33:32 am

I don't think so. It is a relatively new host (since it was upgraded) and that's why it has a lower RAC. It is currently generating a lot more points

b0b3r is that third linux machine yours? Congrats to you too!

Are these machines dedicated crunchers, or do you also use them as your desktops?
We need more, to flood the top 20 hosts list with linux machines. riofl is next in line with his super project. I probably won't upgrade for a year or more so eventually I'll get off the top 20.

@riofl
A GTX295 is essentially 2xGTX275 clocked lower.
Currently there are 2 motherboards with 7 PCIE slots that I know of, ASUS and EVGA (intel). So you can have 7 GTX285 with single slot water-cooling or 4 GTX295 (I would say preferably also water-cooled). Which one of the two configs would have higher RAC? I don't know, maybe still the 4 GTX295.
A few days ago, nvidia presented their next architecture GF100. Currently we don't know when the actual cards will come out, optimists say end of 2009, pessimists say Q1 2010. Since you're going to invest some serious money, if I were you, I would wait for the new cards and put my money there. I wouldn't be too happy to put multi thousand dollars in a project and in two months it would be already surpassed.
You could also go more pro/hardcore. This is the first I've seen with 8 dual-slot cards.

Quote from: Tye on 04 Oct 2009, 07:53:43 am

I've been using BOINC 6.6.11 for awhile now, to make sure it handles my multi-GPUs of different types. Is there any newer version that will also do this yet? Sunu, I think you were also using 6.6.11...

Yes I'm still using 6.6.11. 6.6.40 (the recommended version) should have proper multi-gpu support and also 6.10.11 that Richard says is now available. Link http://boinc.berkeley.edu/download_all.php I've seen many reports with problematic scheduling for 6.10.x but I think most of them have been solved.
