Author Topic: 2.4V updated apps. (Read 33988 times)

Sutaru Tsureku · « **Reply #30 on:** 04 Sep 2007, 12:42:07 am »

Quote from: Crunch3r on 03 Sep 2007, 04:10:29 pm

Quote from: Raistmer on 03 Sep 2007, 03:56:42 pm
According to my own tests, the 64-bit SSSE3 build under a 64-bit OS is the best one for such a CPU.
So right now KWSN_2.4V_SSSE3_MB.exe (from the 2.4V_Windows_x64_SSSE3 archive) is probably the leader.

That's what I've been telling people all day long. However... I do see a possibility to gain another 10 to max 15% in performance... but ONLY for the 64-bit app.

Anyhow, we need to get a common base (the 2.4V changes) for ALL apps, that is Linux, Windows, and UNIX, before we can start figuring out how to get some more performance...

So I have a QX6700 with WinVista Home Basic 64-bit..
Do I get the best performance with the SSSE3 32-bit app now?

BTW.
I saw that the opt. app has a lower 'Claimed credit' than the stock app..
Does this happen 'only' sometimes, with this particular AR?

It's only -0.02, but..

(The opt. app is from 08/26/2007)
_____________________________________________________

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_enhanced 5.27 DevC++/MinGW

Work Unit Info:
...............
WU true angle range is : 1.393579
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 0.00013 0.00000
sse1_ChirpData_ak 0.01417 0.00000
v_vTranspose4 0.00449 0.00000
AK SSE folding 0.00083 0.00000

Flopcounter: 5876485106912.311500

Spike count: 1
Pulse count: 0
Triplet count: 2
Gaussian count: 0

</stderr_txt>
]]>

Validate state Initial
Claimed credit 19.4006742531251
_____________________________________________________

_____________________________________________________

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSSE3 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4v|xT|FFT:IPP_SSSE3|Ben-Joe
CPUID: Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
Speed: 4 x 3143 MHz
Cache: L1=64K L2=4096K
Features: MMX SSE SSE2 SSE3 SSSE3

Work Unit Info
True angle range: 1.393579

Spikes Pulses Triplets Gaussians Flops
1 0 2 0 5875824229395

</stderr_txt>
]]>

Validate state Initial
Claimed credit 19.3820590900193
_____________________________________________________
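
For scale, the gap between the two claims above can be worked out directly. This is only a quick sketch using the numbers copied from the two results; the variable names are made up for illustration, and it says nothing about how BOINC actually derives claimed credit.

```python
# Values copied from the two results quoted above.
stock_credit = 19.4006742531251     # setiathome_enhanced 5.27 (stock app)
opt_credit = 19.3820590900193       # 2.4v SSSE3 32-bit (optimized app)
stock_flops = 5876485106912.3115    # "Flopcounter" line of the stock result
opt_flops = 5875824229395.0         # "Flops" column of the optimized result


def rel_diff(a, b):
    """Difference between a and b, expressed as a fraction of a."""
    return (a - b) / a


credit_gap = stock_credit - opt_credit
flop_gap = stock_flops - opt_flops

print(f"credit gap: {credit_gap:.4f} ({rel_diff(stock_credit, opt_credit):.3%})")
print(f"flop gap:   {flop_gap:.0f} ({rel_diff(stock_flops, opt_flops):.4%})")
```

Rounded to two decimals, the credit gap is the 0.02 mentioned in the post.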

Josef W. Segur · « **Reply #31 on:** 04 Sep 2007, 01:28:34 am »

Quote from: Sutaru Tsureku on 04 Sep 2007, 12:42:07 am

...
BTW.
I saw that the opt. app has a lower 'Claimed credit' than the stock app..
Does this happen 'only' sometimes, with this particular AR?

It's only -0.02, but..

Some of the alternative routines which are checked for performance just after startup have flop counting embedded. The stock app uses a different and longer lasting routine to test for which routines are optimal, so accrues more flops due to testing.

If the angle range were within the limits for Gaussian fitting (about 0.226 to 1.12), then two WUs with the same angle range but different data could have larger credit differences, because each Gaussian test starts with a precheck which can get out quickly if the data has too little range to possibly contain a Gaussian. When it takes that early exit, fewer flops are counted for the test.
Joe
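
As a small illustration of the angle range check described above, here is a sketch that tests whether a WU falls inside the Gaussian fitting window. The limits are the approximate figures from this post, not exact values from the application source, and the function name and the inclusive comparison are assumptions made for the example.

```python
# Approximate limits quoted in the post above; the real application
# may use slightly different values.
GAUSS_AR_MIN = 0.226
GAUSS_AR_MAX = 1.12


def gaussian_fitting_applies(angle_range: float) -> bool:
    """Return True if the WU's true angle range is inside the
    (approximate) window in which Gaussian fitting is attempted."""
    return GAUSS_AR_MIN <= angle_range <= GAUSS_AR_MAX


# The WU from the quoted results has a true angle range of 1.393579.
print(gaussian_fitting_applies(1.393579))
```

For the WU in question this prints False, so the Gaussian precheck does not come into play there.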

Raistmer · « **Reply #32 on:** 04 Sep 2007, 01:53:40 am »

Quote from: Sutaru Tsureku on 04 Sep 2007, 12:42:07 am

So I have a QX6700 with WinVista Home Basic 64-bit..
Do I get the best performance with the SSSE3 32-bit app now?

Under Win2003 it's the 64-bit one (on a Core2 class CPU). Probably the same for 64-bit Vista...

Josef W. Segur · « **Reply #33 on:** 04 Sep 2007, 01:55:19 am »

Quote from: msattler on 03 Sep 2007, 09:22:43 pm

...
If there were a way to test the same app on 2 cores or 4 cores simultaneously, I wouldn't mind knowing if it can be done and trying it... Would it be a hard thing to modify the knabench script to do it, or is it really just not worth the bother?

It might be possible to modify knabench that way, but certainly difficult.

There is a way to do realistic testing, though. It requires a cache of work, but nothing with a deadline tight enough to push BOINC into EDF (earliest deadline first) scheduling during the test.

1. Turn off Network activity in BOINC, then shut it down.
2. Make another folder, say BOINCTEST.
3. Copy everything from the BOINC folder and its subdirectories to BOINCTEST.
4. Install the application you want to test in the project folder below BOINCTEST.
5. Start a timer and the Boinc Manager in BOINCTEST.
6. Run for, say, two hours, then save all messages from BOINC Manager and shut down. Make a copy of client_state.xml; that and the saved messages are the test results.
7. To test another app, wipe out all the contents of BOINCTEST and go back to step 3.

This should be possible on any platform with minor modifications. I wouldn't recommend comparing more than two apps this way, since it requires going through the messages and/or client_state.xml files and checking time differences, contents of stderr reports, etc. But it's about as realistic as testing can be, with each test using identical WUs starting at the same points.
Joe
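
The file shuffling in steps 2 to 4, 6 and 7 can be scripted. What follows is a minimal sketch, not a tested tool: the function names are invented for this example, every path (including the project subfolder) is something you supply yourself, and starting and stopping BOINC, turning off network activity and saving the Manager messages are still done by hand.

```python
import shutil
from pathlib import Path


def make_test_copy(boinc_dir, test_dir):
    """Steps 2, 3 and 7: (re)create test_dir as a full copy of boinc_dir."""
    src, dst = Path(boinc_dir).resolve(), Path(test_dir).resolve()
    # Refuse layouts where wiping the test folder could touch the original.
    if src == dst or src in dst.parents or dst in src.parents:
        raise ValueError("test folder must be separate from the BOINC folder")
    if dst.exists():
        shutil.rmtree(dst)       # step 7: wipe the previous test
    shutil.copytree(src, dst)    # step 3: copy everything, subfolders included
    return dst


def install_app(app_files, test_dir, project_subdir):
    """Step 4: copy the app under test into the project folder of the copy."""
    project = Path(test_dir) / project_subdir
    project.mkdir(parents=True, exist_ok=True)
    for f in app_files:
        shutil.copy2(f, project / Path(f).name)
    return project


def save_results(test_dir, results_dir):
    """Step 6 (in part): keep client_state.xml from the finished test run."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(Path(test_dir) / "client_state.xml", out))
```

Call make_test_copy with BOINC shut down, then install_app, run the test by hand, and finish with save_results into a folder outside BOINCTEST so the next wipe doesn't remove it.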

Raistmer · « **Reply #34 on:** 04 Sep 2007, 02:22:44 am »

Well, this approach assumes "normal" full-length WUs. A really realistic one.

But at least one WU per core should be completed during the test, because the % of work done doesn't change perfectly linearly during WU calculation, right? This can take more than 2 hours on slower CPUs.

Is the CPU time for WUs with the same AR spread so widely that a statistical approach isn't possible?
And how does the CPU time logged on the web page correspond to the real time spent on a WU (assuming the app runs 100% of the time)? Are any CPU-time corrections performed?

Quote from: Josef W. Segur on 04 Sep 2007, 01:55:19 am

It might be possible to modify knabench that way, but certainly difficult.

All we need is some utility that starts the prescribed app in the prescribed quantity, sets affinity for each child process (optional step? do the latest BOINC versions do this?), waits for all children to exit, then exits itself.
Such a utility could then be used instead of the optimized app in knabench, right? This approach would test the "worst case" of simultaneous calculation: the time for completion of all work on all cores.
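
A rough sketch of such a launcher, just to make the idea concrete. It is not knabench code and the function name is invented here. Pinning uses os.sched_setaffinity, which Python's standard library only provides on Linux; on Windows the pin step is silently skipped in this sketch and would need something else.

```python
import os
import subprocess
import sys
import time


def run_simultaneously(cmd, copies, workdirs=None, pin=False):
    """Start `copies` instances of `cmd`, optionally pin each one to a CPU,
    wait for all of them, and return (elapsed_seconds, exit_codes)."""
    ncpu = os.cpu_count() or 1
    procs = []
    start = time.perf_counter()
    for i in range(copies):
        cwd = workdirs[i] if workdirs else None
        p = subprocess.Popen(cmd, cwd=cwd)
        if pin and hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(p.pid, {i % ncpu})
            except OSError:
                pass  # CPU not available to us, or the child already exited
        procs.append(p)
    codes = [p.wait() for p in procs]
    # Elapsed time covers the slowest child: the "worst case" for all cores.
    return time.perf_counter() - start, codes


if __name__ == "__main__":
    # Stand-in command; replace with the science app and one folder per copy.
    elapsed, codes = run_simultaneously([sys.executable, "-c", "pass"], 2)
    print(f"{elapsed:.2f} s, exit codes {codes}")
```

Each copy would need its own working folder with its own WU, passed in through workdirs, since the app writes its state and result files next to the WU.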

msattler · « **Reply #35 on:** 04 Sep 2007, 10:17:11 am »

Quote from: Josef W. Segur on 04 Sep 2007, 01:55:19 am

Quote from: msattler on 03 Sep 2007, 09:22:43 pm
...
If there were a way to test the same app on 2 cores or 4 cores simultaneously, I wouldn't mind knowing if it can be done and trying it... Would it be a hard thing to modify the knabench script to do it, or is it really just not worth the bother?

It might be possible to modify knabench that way, but certainly difficult.

There is a way to do realistic testing, though. It requires a cache of work, but none which might cause going into EDF during the test.

1. Turn off Network activity in BOINC, then shut it down.
2. Make another folder, say BOINCTEST.
3. Copy everything from the BOINC folder and its subdirectories to BOINCTEST.
4. Install the application you want to test in the project folder below BOINCTEST.
5. Start a timer and the Boinc Manager in BOINCTEST.
6. Run for say two hours then save all messages from BOINC Manager and shut down. Make a copy of client_state.xml, that and the saved messages are the test results.
7. To test another app, wipe out all the contents of BOINCTEST and go back to step 3.

This should be possible on any platform with minor modifications. I wouldn't recommend comparing more than two apps this way, it does require going through the messages and/or client_state.xml files and checking time differences, contents of stderr reports, etc. But it's about as realistic as testing can be, each test using identical WUs starting at the same points.
Joe

Thanks Joe! You've given me some food for thought there. As you mentioned earlier, it may be very time consuming to play with, but you've got my curiosity going now. As the holiday is over and I have to go back to work today, it'll have to wait until perhaps this weekend, but I may experiment with your approach.

Josef W. Segur · « **Reply #36 on:** 04 Sep 2007, 01:14:19 pm »

Quote from: Raistmer on 04 Sep 2007, 02:22:44 am

Well, this approach assumes "normal" full-length WUs. A really realistic one. But at least one WU per core should be completed during the test, because the % of work done doesn't change perfectly linearly during WU calculation, right? This can take more than 2 hours on slower CPUs.

Although the progress isn't perfectly linear, it is monotonic (never goes backward) and is close enough to linear to remain useful. I don't think the method can provide precise speed comparison in any case, but should clearly indicate which of two apps is faster on whatever mix of work is present. Completing WUs for each core would give result files which could be compared, but my presumption was this sort of extended testing would only be used for apps already known to produce correct results.

Quote

Is the CPU time for WUs with the same AR spread so widely that a statistical approach isn't possible?

Contention can cause something like 30% CPU time differences; the data in WUs with equal angle range probably causes no more than 2%.

Quote

And how does the CPU time logged on the web page correspond to the real time spent on a WU (assuming the app runs 100% of the time)? Are any CPU-time corrections performed?

IIRC, BOINC doesn't start counting CPU time when it launches the app, but rather when the app initiates its BOINC interface. After that, CPU time accrues as accurately as the OS allows. On my Win2k Pentium-M system, Windows Task Manager shows about 2.5 seconds more CPU time for the current SETI task than BOINC Manager does. Most of that difference is probably delay in the BOINC Manager getting the data from the core client and displaying it.
Joe

msattler · « **Reply #37 on:** 04 Sep 2007, 02:07:51 pm »

Well Joe, my thoughts were somewhere along the lines of cloning the WUs, so that you had 4 copies of the same WU (to test on a quad) and could get 4 instances of the same WU to run at the same time.

Raistmer · « **Reply #38 on:** 04 Sep 2007, 02:22:52 pm »

Thank you very much for the detailed answer! You're right, there is no need for a linear percentage to pick the faster/slower app when all the % are bigger or all are smaller.
I imagined a case where, let's say, WU-1 got to 50% and WU-2 to 95%, and with the second app WU-1 got to 52% and WU-2 to 90%. In that case we can't just sum up nonlinear %. But I don't know whether such a situation will come up in real testing or not (BTW, completing the full WU doesn't help anyway, you're right).

Just one more point to clarify: is the total CPU time for a WU the same time that is logged with the result on the project web page? No artificial time correction (some multiplier or so)?
As I remember, there was a time when some optimized app adjusted the logged CPU time to achieve correct credit allocation; my question arose from that case.

Josef W. Segur · « **Reply #39 on:** 04 Sep 2007, 09:13:02 pm »

Quote from: msattler on 04 Sep 2007, 02:07:51 pm

Well Joe, my thoughts were somewhere along the lines of cloning the WUs, so that you had 4 copies of the same WU (to test on a quad) and could get 4 instances of the same WU to run at the same time.

That's probably possible by naming the cloned WUs with existing queued WU names and suspending other WUs so only those run. It may cause maximum contention, having all 4 cores trying to do exactly the same things at the same time. OTOH, initial contention might get the 4 instances an ideal amount out of phase so they'd perform very well.
Joe

Josef W. Segur · « **Reply #40 on:** 04 Sep 2007, 09:29:55 pm »

Quote from: Raistmer on 04 Sep 2007, 02:22:52 pm

...
Just one more point to clarify: is the total CPU time for a WU the same time that is logged with the result on the project web page? No artificial time correction (some multiplier or so)?
As I remember, there was a time when some optimized app adjusted the logged CPU time to achieve correct credit allocation; my question arose from that case.

Trux's optimized BOINC core client "calibration" feature adjusted both reported CPU time and BOINC benchmarks. It was a well-intentioned attempt to correct the logical flaw in the old method of generating credit claims. Our apps certainly don't make any time adjustments; total CPU time for a day of running will be very close to 24 hours times the number of CPUs in the host.
Joe

Vyper · « **Reply #41 on:** 05 Sep 2007, 07:37:41 am »

One idea for this is to update Knabench to have a separate Multithread folder where the temporary files can be created and one or more specifically chosen WUs lie.

A little program is called to see how many threads the CPU can run in parallel and then creates the dirs cpu1, cpu2, cpu3 and cpu4, for instance..

Then you could create a call procedure that executes multiple apps, one calculating in each thread, and waits for the last one to return; perhaps you could even make a call routine that executes on CPU/thread X (affinity)..

If this could be accomplished we would soon see which app is the best compile for parallel execution..

These are thoughts and nothing but thoughts.

There is an app called Wprime where you can enter how many threads it should start, and a DOS window appears that takes care of this.. http://www.wprime.net ..

Kind Regards Vyper
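
The "little program" part of this idea is easy to sketch. The following is only an illustration under assumed names (prepare_cpu_dirs and the folder layout are made up here, not part of Knabench): it asks the OS how many logical CPUs there are and creates cpu1, cpu2, ... with a copy of the chosen WU in each.

```python
import os
import shutil
from pathlib import Path


def prepare_cpu_dirs(base_dir, wu_file, threads=None):
    """Create base_dir/cpu1 .. cpuN and put a copy of wu_file in each.

    N defaults to the number of logical CPUs the OS reports."""
    n = threads or os.cpu_count() or 1
    dirs = []
    for i in range(1, n + 1):
        d = Path(base_dir) / f"cpu{i}"
        d.mkdir(parents=True, exist_ok=True)
        shutil.copy2(wu_file, d / Path(wu_file).name)
        dirs.append(d)
    return dirs
```

The returned list of folders can then be handed to whatever starts one app instance per folder and waits for the last one to return.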

H Elzinga · « **Reply #42 on:** 14 Oct 2007, 04:50:38 am »

Quote from: Crunch3r on 02 Sep 2007, 03:00:06 pm

Howdy,

there are new apps ready for download both Windows x32 and x64 incl. GFX enabled ones, ALL are new.

You can see there's been a little change in the name tag as well: 2.4v ---> 2.4V is the new one.

There will be a credit multiplier shown in the log file (stderr.txt).
Those apps are compatible with a soon to be released 5.28 stock application that reads the credit multiplier from the workunit header.

DOWNLOAD ---> http://calbe.dw70.de/seti.html

EDIT

Make sure you have a look at the app_info.xml first! There might be typos in there. So to make sure all will work, have a look for yourself.

HTH
Crunch3r

Are there plans to release a new automatic installer / test and benchmark tool, or should I just download the same app as the 2.2 version currently running and assume it is again the fastest for my setup?

Josef W. Segur · « **Reply #43 on:** 14 Oct 2007, 10:55:34 am »

Quote from: H Elzinga on 14 Oct 2007, 04:50:38 am

Are there plans to release a new automatic installer / test and benchmark tool, or should I just download the same app as the 2.2 version currently running and assume it is again the fastest for my setup?

Installing the 2.4V equivalents to the 2.2B versions you were using is the best approach for now. There may eventually be an automatic install / test, but not soon.
Joe

H Elzinga · « **Reply #44 on:** 15 Oct 2007, 03:38:15 am »

Will give it a try today.
Thanks.
