Seti@Home optimized science apps and information

Optimized Seti@Home apps => Linux => Topic started by: Simon on 02 Apr 2007, 04:08:06 pm

Title: New compiles of Rev-2.2B code released
Post by: Simon on 02 Apr 2007, 04:08:06 pm
Hi folks,

I've uploaded some new compiles of the Rev-2.2B apps.

Contrary to the initial apps, these will not be static/statified, but instead come in two flavours - Kernel-2.4 and Kernel-2.6.
Should you still have trouble with libraries and/or segfaults, please post in this thread; still, 2 versions per app should be enough, I hope.

The compilation platform we used, Suse 9.2, is really not all that current - if your libraries are still not recent enough, it's probably high time you upgraded.

The download links (http://lunatics.at/index.php?module=Downloads;catd=14) remain the same; if you had trouble with the initially released applications, please try these.

Also, the SSE3-P4 app that was initially released has some accuracy problems; if you're running it on any of your hosts, this version is a required upgrade - you'll lose credit and turn in invalid results if you don't upgrade.

Regards,
Simon.
Title: Re: New compiles of Rev-2.2B code released
Post by: clk on 02 Apr 2007, 10:48:28 pm
Howdy,
     It works, as long as you don't run more than one instance.  On RHEL3 (Kernel-2.4) on a P4 2.8GHz I tried both the SSE2-P4 and SSE3-P4 apps.  With one instance of Seti running, everything was OK.  But since I'm running Hyper-Threading, BOINC starts a second Seti task, which instantly errors out with:

Quote
2007-04-02 22:08:12 [SETI@home] Can't get shared memory segment name: can't get shared mem segment name
2007-04-02 22:08:12 [SETI@home] Unrecoverable error for result 16oc03ab.11131.5954.29822.3.78_1 (Couldn't start or resume: -202)
2007-04-02 22:08:12 [SETI@home] Deferring scheduler requests for 1 minutes and 0 seconds
2007-04-02 22:08:12 [---] Rescheduling CPU: start failed
2007-04-02 22:08:12 [SETI@home] Computation for task 16oc03ab.11131.5954.29822.3.78_1 finished

That is this result http://setiathome.berkeley.edu/result.php?resultid=510804716

Meanwhile the first, original, Seti workunit continues processing and eventually finishes.

The funny thing is, by playing with "suspend task" I can get BOINC to start a task for another project (Einstein in this case) and it will run ok in tandem with the first Seti task.  You just cannot run two Seti tasks simultaneously.

At this point I've completed one workunit http://setiathome.berkeley.edu/workunit.php?wuid=122308461.  Mine is the one for 11614 seconds.  The workunit hasn't validated at this point in time, still waiting for another result.  Hopefully it is ok.

But that still leaves the bigger question of why the shared memory fault.

Thanks again for all the effort you guys are putting into this work.
Best of Luck, and get some sleep.
Cheers, Chris
Title: Re: New compiles of Rev-2.2B code released
Post by: Metod, S56RKO on 03 Apr 2007, 03:46:04 am
Quote
Also, the SSE3-P4 app that was initially released has some accuracy problems; if you're running it on any of your hosts, this version is a required upgrade - you'll lose credit and turn in invalid results if you don't upgrade.

I guess this app for kernel 2.6 won't run on my 64-bit Debian Etch:

$ ldd KWSN-R2.2B-SSE3-P4
        linux-gate.so.1 =>  (0xffffe000)
        libdl.so.2 => /lib32/libdl.so.2 (0xf7ee8000)
        libm.so.6 => /lib32/libm.so.6 (0xf7ec4000)
        libgcc_s.so.1 => not found
        libpthread.so.0 => /lib32/libpthread.so.0 (0xf7eb2000)
        libc.so.6 => /lib32/libc.so.6 (0xf7d87000)
        /lib/ld-linux.so.2 (0xf7f04000)


As you can see I don't have libgcc_s.so.1 for 32-bit. Can you link at least gcc libraries statically?
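
For anyone checking several hosts for the same problem, here is a minimal sketch that picks the unresolved libraries out of `ldd` output like the listing above. The function name `missing_libraries` and the `SAMPLE` constant are made up for illustration; the sample is an excerpt of the output posted here.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "libfoo.so.1 => /path (0xaddr)" or "libfoo.so.1 => not found".
        name, arrow, target = line.strip().partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


# Excerpt of the ldd output from the post above.
SAMPLE = """\
        linux-gate.so.1 =>  (0xffffe000)
        libdl.so.2 => /lib32/libdl.so.2 (0xf7ee8000)
        libgcc_s.so.1 => not found
        libc.so.6 => /lib32/libc.so.6 (0xf7d87000)
        /lib/ld-linux.so.2 (0xf7f04000)
"""

print(missing_libraries(SAMPLE))
```

On the build side, GCC's `-static-libgcc` link option is the usual way to drop a runtime dependency on libgcc_s.so.1; whether it fits the way these apps are built is not something this thread confirms.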
Title: Re: New compiles of Rev-2.2B code released
Post by: clk on 03 Apr 2007, 11:08:59 pm
Howdy again,
     Well, another variation.  The new compile of SSE2-P4 runs a little better on a RHEL3 2.4 kernel when used with BOINC 5.8.15 than with 5.4.9 or 5.4.11.  This was tested on two separate boxes, and the results were about the same.  On a BOINC 5.8.15 client, when a series of Seti workunits is downloaded and work starts, the first workunit runs OK, the second workunit crashes, but then the third workunit runs OK.  So, on a Hyper-Threading CPU you can get two Seti workunits running simultaneously.  This is better than on a BOINC 5.4.9 client, where only the first Seti workunit will run and any attempt to start another workunit will crash (see my previous post in this thread).
     Continuing with the BOINC 5.8.15 client: when one of the Seti workunits finishes, the next workunit crashes, but a second attempt to start another workunit runs OK.  Obviously, I'm not continuing with this new code on the 2.4 kernel for now; I don't want to "error out" a bunch of workunits.  Hopefully this adds to the understanding and leads to a solution for the problem.

Two of the error workunits are:
http://setiathome.berkeley.edu/result.php?resultid=511457534
and, http://setiathome.berkeley.edu/result.php?resultid=511457667

The relevant log outputs:
Quote
2007-04-03 16:22:50 [SETI@home] Starting 16oc03ab.11131.14736.840898.3.129_0
2007-04-03 16:22:50 [SETI@home] Deferring communication for 1 min 0 sec
2007-04-03 16:22:50 [SETI@home] Reason: Unrecoverable error for result 16oc03ab.11131.14736.840898.3.129_0 (Can't get shared memory segment name: can't get shared mem segment name)
2007-04-03 16:22:51 [SETI@home] Computation for task 16oc03ab.11131.14736.840898.3.129_0 finished
2007-04-03 16:22:51 [SETI@home] Output file 16oc03ab.11131.14736.840898.3.129_0_0 for task 16oc03ab.11131.14736.840898.3.129_0 absent

and,
Quote
2007-04-03 13:25:57 [SETI@home] Starting 16oc03ab.11131.14736.840898.3.157_3
2007-04-03 13:25:57 [SETI@home] Deferring communication for 1 min 0 sec
2007-04-03 13:25:57 [SETI@home] Reason: Unrecoverable error for result 16oc03ab.11131.14736.840898.3.157_3 (Can't get shared memory segment name: can't get shared mem segment name)
2007-04-03 13:25:59 [SETI@home] Computation for task 16oc03ab.11131.14736.840898.3.157_3 finished
2007-04-03 13:25:59 [SETI@home] Output file 16oc03ab.11131.14736.840898.3.157_3_0 for task 16oc03ab.11131.14736.840898.3.157_3 absent

So, on a 2.4 Kernel with Boinc 5.8.15 basically the same shared memory fault, but not as often as with Boinc 5.4.9 (or 5.4.11).
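
To compare how often the fault occurs between clients, a rough sketch like the following could tally the affected tasks from a saved message log. It assumes the 5.8.15-style line format quoted above, where the reason appears in parentheses after the result name; the names `unwrap` and `shared_memory_failures` are made up for illustration, and wrapped lines are rejoined first because copied logs sometimes break mid-message.

```python
import re
from typing import List

# A log entry starts with a timestamp such as "2007-04-03 16:22:50 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")
# Captures the result name and the parenthesised reason.
FAILURE = re.compile(r"Unrecoverable error for result (\S+) \((.*)\)\s*$")


def unwrap(log_text: str) -> List[str]:
    """Rejoin entries that were wrapped onto a continuation line."""
    entries: List[str] = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line.rstrip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def shared_memory_failures(log_text: str) -> List[str]:
    """Return result names whose failure reason mentions shared memory."""
    failed = []
    for entry in unwrap(log_text):
        match = FAILURE.search(entry)
        if match and "shared mem" in match.group(2):
            failed.append(match.group(1))
    return failed


# Excerpt of the first log quoted above, including the wrapped line.
SAMPLE = """\
2007-04-03 16:22:50 [SETI@home] Starting 16oc03ab.11131.14736.840898.3.129_0
2007-04-03 16:22:50 [SETI@home] Reason: Unrecoverable error for result 16oc03ab.11131.14736.840898.3.129_0 (Can't get shared memory segment
 name: can't get shared mem segment name)
2007-04-03 16:22:51 [SETI@home] Computation for task 16oc03ab.11131.14736.840898.3.129_0 finished
"""

print(shared_memory_failures(SAMPLE))
```

The 5.4.9 log from the earlier post puts the shared memory message on its own line before the "Unrecoverable error" line, so this sketch would need a different pattern for that client.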

I'm an experimental scientist, so I'm always looking for more data.  So, here are a couple of the "successful" workunits for completeness; maybe they will be of some use.
http://setiathome.berkeley.edu/result.php?resultid=511457685
http://setiathome.berkeley.edu/result.php?resultid=511457749

If I can help speed up the work I'll be happy to test anything you want to throw my way.  We can see if it sticks to the wall.

Ciao, Chris
Title: Re: New compiles of Rev-2.2B code released
Post by: Ned Slider on 04 Apr 2007, 03:22:06 pm
SSE client looking good so far on a couple CentOS 4 boxes, Athlon XP's (2.6 kernel). Seeing great speed improvements and no errors.

Great work everyone!

Regards,

Ned
Title: Re: New compiles of Rev-2.2B code released
Post by: Simon on 04 Apr 2007, 04:08:28 pm
Thanks for your replies guys,

maybe we can figure out something for the future. However, 2.4.x kernel systems really won't be a major focus; I do realize this is a bother for people (I still have some 2.4 systems myself running Slackware, but they do just fine with the 2.2B apps).

The scheduling problems with BOINC are interesting; for me, 5.8.x works as well as 5.4.11 on two Pentium-D boxes. The only errors I had there were because the initial SSE3-P4 version had an accuracy problem.

We'll see what we can figure out, and the sources plus some info on how to get the thing compiled will be online soon.

Regards,
Simon.
Title: Re: New compiles of Rev-2.2B code released
Post by: michael37 on 05 Apr 2007, 10:44:33 am
Simon and everyone,

I use all kinds of flavors of the 2.4 applications (sse, sse2, sse3, sse2-generic) with RHEL3 and BOINC 5.4.11 as well as BOINC 5.6.4.  I have not seen any problems running multiple Seti apps at once, up to 4 per SMP system.  The errors you are seeing may be environmental.
Title: Re: New compiles of Rev-2.2B code released
Post by: clk on 18 Jun 2007, 01:09:27 am
Howdy all,
     This is to report that the problem I reported earlier in this thread is no longer a problem. 

     I.e. I had seen, on multiple boxes, errors when more than one instance of the new Seti 2.2B code was run simultaneously (with Hyper-Threading on these boxes).  What exactly the cure was, I don't know.  After a recent update of the RHEL3 system software I noted that there had been an update of glibc to 2.3.2-95.50.  I do not know if this new glibc package is the cure, but it triggered my curiosity to retry the 2.2B code on these RHEL3 boxes.  And, voilà, everything works great.  So, on 4 different RHEL3 systems I've now run the 2.2B code with no problems.

     I still don't know why I had problems with the 2.2B code on RHEL3 boxes while michael37 (see preceding post in this thread) did not.  I would hazard a guess that maybe he had more updated systems at that time, since I have to test new updates for compatibility with our software applications before installing them.

     Anyway, I'm now pumping out data faster than ever with the 2.2B code running on all the boxes (except an old SGI and SUN).  And when the opportunity arises I'll install the even faster 64bit code  (http://lunatics.at/index.php?module=Downloads;catd=16) on all my Linux boxes.

Thanks again for all the effort.  Czesc, Chris
Title: Re: New compiles of Rev-2.2B code released
Post by: blackdragon on 18 Jun 2007, 12:56:57 pm
Simon,

I tried to install 32-bit SSE2 generic for 2.4.X linux, but I got the following error:

<core_client_version>5.8.16</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1)
</message>
<stderr_txt>
ar=0.434345 NumCfft=71773 NumGauss=           451610376 NumPulse=         86584468096 NumTriplet=       7549050339328
KWSN-R2.2B-SSE2-generic: /lib/tls/libc.so.6: version `GLIBC_2.3.4' not found (required by KWSN-R2.2B-SSE2-generic)

</stderr_txt>
]]>

Is it true that I need GLIBC_2.3.4 for the 2.4 version also? (I think I tried both the 2.4 and 2.6 releases and got the same error.)

Thanks

Title: Re: New compiles of Rev-2.2B code released
Post by: Crunch3r on 18 Jun 2007, 02:23:58 pm
It does require GLIBC 2.3.4. Can you upgrade your host?
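
As a rough way to confirm the mismatch on a given host, the sketch below pulls the required version out of a loader error like the one in the stderr above and compares it with an installed glibc version string. The helper names (`parse_version`, `required_glibc`, `glibc_satisfies`, `installed_glibc`) are made up for illustration. `platform.libc_ver()` is from the Python standard library. The plain "installed >= required" comparison is an assumption that holds for ordinary glibc releases; it is not how the loader itself decides.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Matches the loader's complaint, e.g. "version `GLIBC_2.3.4' not found".
LOADER_ERROR = re.compile(r"version `GLIBC_(\d+(?:\.\d+)*)' not found")


def parse_version(text: str) -> Optional[Version]:
    """Turn '2.3.2' (or a package string like '2.3.2-95.50') into (2, 3, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def required_glibc(stderr_text: str) -> Optional[Version]:
    """Extract the GLIBC version the binary asked for, if the error is present."""
    match = LOADER_ERROR.search(stderr_text)
    return parse_version(match.group(1)) if match else None


def glibc_satisfies(installed: str, required: Version) -> bool:
    """True if the installed version is at least the required one."""
    have = parse_version(installed)
    return have is not None and have >= required


def installed_glibc() -> str:
    """Version string reported by the standard library; empty if unknown."""
    return platform.libc_ver()[1]


# The loader error from the stderr output quoted in this thread.
STDERR_SAMPLE = (
    "KWSN-R2.2B-SSE2-generic: /lib/tls/libc.so.6: version `GLIBC_2.3.4' "
    "not found (required by KWSN-R2.2B-SSE2-generic)"
)

needed = required_glibc(STDERR_SAMPLE)
print("required:", needed)
print("would glibc 2.3.2 do?", glibc_satisfies("2.3.2", needed))
```

On the affected host, passing `installed_glibc()` as the first argument to `glibc_satisfies` checks the running system instead of a hard-coded string.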
Title: Re: New compiles of Rev-2.2B code released
Post by: michael37 on 25 Jun 2007, 06:31:11 pm
When you download the 32-bit packages, look for the directory "Files-to-install".  Inside it, there are Kernel-2.4 and Kernel-2.6 directories.

Try using the binary build from the Kernel-2.4 directory.  It only requires glibc-2.3.2 and has a much better chance of running on your older 2.4-based system.

Title: Re: New compiles of Rev-2.2B code released
Post by: blackdragon on 05 Jul 2007, 03:52:46 pm
Simon,

I double-checked today: the Kernel-2.4 version gave the same error, so it seems it does require glibc-2.3.4.
Is there any particular reason for not supporting glibc-2.3.2 in this version?

P.S.
I tried a few other downloads and they all support glibc-2.3.2 in their Kernel-2.4 versions; only this one is the exception.
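
One way to settle which glibc a given download really needs, without running it, is to look for the `GLIBC_x.y.z` version tags stored in the binary. The sketch below scans raw bytes for those tags and reports the highest one. The function names are made up for illustration, and the byte string in the demo is fabricated, not taken from any of the released apps.

```python
import re
from typing import List, Tuple

# Version tags appear as plain ASCII such as "GLIBC_2.3.4" inside the binary.
GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def glibc_versions(binary: bytes) -> List[Tuple[int, ...]]:
    """Return the distinct GLIBC version tags found, sorted ascending."""
    found = set()
    for match in GLIBC_TAG.finditer(binary):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(part) for part in parts))
    return sorted(found)


def glibc_versions_in_file(path: str) -> List[Tuple[int, ...]]:
    """Convenience wrapper: read a binary from disk and scan it."""
    with open(path, "rb") as handle:
        return glibc_versions(handle.read())


# Fabricated stand-in for a binary's string table, for demonstration only.
FAKE_BINARY = b"\x00GLIBC_2.0\x00GLIBC_2.3.4\x00GLIBC_2.1\x00GLIBC_2.3.4\x00"

versions = glibc_versions(FAKE_BINARY)
print("tags found:", versions)
print("highest required:", max(versions))
```

Running `glibc_versions_in_file` on the apps in both the Kernel-2.4 and Kernel-2.6 directories would show whether the two builds in a package actually differ in their highest tag.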