New version of the KWSN Test & Benchmark Tool with Auto-Installer released

KarVi:
Arnulf:

Are you sure about that?

My processor is only an X2 3800+ with 512 KB of cache per core, where the Opteron has 1 MB, but according to my tests the modified Intel "only" core was faster on my AMD64.

And returned results seem to indicate the same.

Some results for comparison (notice the QxN versus QxB in the Rev line):

The Intel "only" core on my A64:

***CUT START***
CPU time   7928.3125
stderr out   

<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit 'Ni!' based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxN|FFT:IPP_SSE2|Ben-Joe)
    CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
     cpus: 1 cores: 2 threads: 1   cache: L1=64K  L2=512K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 2564 MHz  -- read megs/sec: L1=14068, L2=7206, RAM=3067

Work Unit Info
True angle range:  0.421187

Spikes Pulses Triplets Gaussians Flops
   0      2       2        4     16726219691776
</stderr_txt>

Validate state   Valid
Claimed credit   64.8528194067711
***CUT END***

The generic client on my A64:

***CUT START***
CPU time   8045.375
stderr out   

<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized Windows SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
    CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
     cpus: 1 cores: 2 threads: 1   cache: L1=64K  L2=512K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 2564 MHz  -- read megs/sec: L1=14063, L2=7058, RAM=3159

Work Unit Info
True angle range:  0.426463

Spikes Pulses Triplets Gaussians Flops
   2      2       4        2     16092869927990
</stderr_txt>

Validate state   Valid
Claimed credit   62.3971229846839
***CUT END***

There's of course nothing proven by displaying two results, but judging by the results my system has finished since the patch, the Intel client seems about 100 to 200 secs faster on 60+ credit WUs. It is a little faster on other WUs too, but of course not by as much, since they have shorter runtimes.
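The two work units above are not identical (the angle ranges and flop counts differ), so raw CPU time is only part of the picture. One rough way to normalise is to divide the reported Flops by the CPU time. The sketch below does that for the two results quoted in this post. It assumes the Flops figure in stderr is counted the same way by both builds, which is not confirmed anywhere in this thread, and the function name is made up for illustration.

```python
def flops_per_second(flops, cpu_seconds):
    """Reported floating-point operations divided by CPU time."""
    return flops / cpu_seconds

# Figures copied from the two results quoted above.
intel_only = flops_per_second(16726219691776, 7928.3125)  # QxN build
generic = flops_per_second(16092869927990, 8045.375)      # QxB build

ratio = intel_only / generic
print(f"Intel-only: {intel_only / 1e9:.3f} GFlop/s")
print(f"generic:    {generic / 1e9:.3f} GFlop/s")
print(f"ratio:      {ratio:.3f}")
```

Two results are still only two results, so the ratio is a hint rather than a measurement.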

Delerious:
Some food for thought on Intel chip revisions: the A1 was bought about 4 months before the B2. Both are on the same MB (Abit AB9 Pro), with the same BIOS version and the same memory modules at the same settings.

Each test was run 3x and the results averaged.

C2D E6600 Conroe B2 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  124 seconds

Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  226 seconds

C2D E6600 Conroe A1 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  135 seconds

Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  238 seconds

That's a serious performance increase between one Intel stepping and another; allowing for fluctuations, that's still a gain of nearly 8%. :)
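For reference, the gain can be worked out per test from the averaged timings above. This small sketch measures it as the reduction in runtime relative to the A1 chip. The helper name is only illustrative, and a different definition (for example a throughput ratio) would give slightly different numbers.

```python
def runtime_reduction_pct(old_seconds, new_seconds):
    """Percentage of runtime saved when going from old_seconds to new_seconds."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds

# (A1 seconds, B2 seconds), taken from the averaged runs above.
timings = {"Medium": (135, 124), "Long": (238, 226)}

for test, (a1, b2) in timings.items():
    print(f"{test}: {runtime_reduction_pct(a1, b2):.1f}% less runtime on B2")
```

The medium and long tests do not have to agree, so it is worth looking at both before quoting a single figure.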

Arnulf:
Hi KarVi!

This is clipped from one of my results, and one thing that springs to mind is that I'm running the 5.5.0 app while you are running the .tx36 calibrating app - I don't know if that may be the cause?

As you can see below, the app chooses various approaches for the different parts of the analysis.
I patched the Intel versions and then renamed them to replace the generic versions,
then I compared the different runs in the "KWSN - CPU Test & Benchmark Tool V2"

The generic SSE2 was the fastest on my system, in all three versions of the test.

But I will re-run the tests just to be sure and report back to you!  :P

Update!

Having re-run the tests with potentially changed clients I have to agree with you, the Intel is the fastest.
And I'm changing to the fastest one now. .... :D

--------------------------------------------------------------------------------------------------------------------
Starting tests. This will take a few minutes, please be patient!

Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!

Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for  624 seconds

Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for  606 seconds

Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for  628 seconds

Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for  630 seconds

Skipping other apps - SSE2 is quicker than SSE if supported.

Finished with test run!
--------------------------------------------------------------------------------------------------------------------------

Arnulf
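If you want to rank the apps without eyeballing the output, the tool's log lines are regular enough to parse. This is a minimal sketch that assumes the line shape seen in the output pasted above ("Testing NAME...ran for N seconds"); the function and variable names are hypothetical and not part of the KWSN tool.

```python
import re

LOG = """\
Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!
Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for  624 seconds
Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for  606 seconds
Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for  628 seconds
Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for  630 seconds
"""

# Line shape assumed from the pasted output, not from any tool documentation.
RESULT = re.compile(r"Testing (\S+?)\.\.\.ran for\s+(\d+) seconds")

def rank_apps(log_text):
    """Return (app, seconds) pairs, fastest first; apps that did not run are skipped."""
    runs = [(name, int(secs)) for name, secs in RESULT.findall(log_text)]
    return sorted(runs, key=lambda run: run[1])

ranking = rank_apps(LOG)
for name, secs in ranking:
    print(f"{secs:4d} s  {name}")
```

Apps reported as "does not work on your system!" simply do not match the pattern and are left out of the ranking.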

-------------------------------------------------------------------------------------------------------------------------

<core_client_version>5.5.0</core_client_version>
<stderr_txt>
ChirpData--[ak's_sse3_chirp]:  10199721 (chosen)
 GetPeak--[hand_opt]:     11601 (chosen)
   f_sum--[hand_sse]:     37522 (chosen)
GetChiSq--[hoisted+abs(]:     40813 (chosen)
IPP FFT SSE2(64K)[original]:   4788086 (chosen)
Bench Time: 0.42 seconds
work_len=1048576
Optimized Windows SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
    CPUID: 'AMD K8 Opteron DC 2 (Italy)'
     cpus: 2 cores: 2 threads: 1   cache: L1=64K  L2=1024K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 1799 MHz  -- read megs/sec: L1=9868, L2=4944, RAM=2325

Work Unit Info
True angle range:  0.624283

Spikes Pulses Triplets Gaussians Flops
   1      1       2        0     12471831233146
</stderr_txt>

KarVi:
Arnulf:

Although I'm using the tx36 calibrating client, all calibration is turned off. The only feature I use is its ability to set processor affinity, so that each cruncher stays on its own CPU core (normally each one runs on both cores to varying degrees). According to what I've read, setting the affinity is a little faster; the gain should be about 1-2%.
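To illustrate what "one cruncher per core" means in terms of affinity masks, here is a small sketch. It only computes the bitmasks (bit i set means the process may run on core i); how the tx36 client actually applies them is not shown in this thread, and the function names are hypothetical.

```python
def one_core_per_worker(num_workers, num_cores):
    """Return one affinity bitmask per worker, each pinned to a single core."""
    if num_workers > num_cores:
        raise ValueError("more workers than cores")
    return [1 << core for core in range(num_workers)]

def cores_in_mask(mask):
    """List the core indices enabled in an affinity bitmask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

# Two crunchers on a dual-core CPU such as the X2 3800+.
masks = one_core_per_worker(2, 2)
for worker, mask in enumerate(masks):
    print(f"cruncher {worker}: mask {mask:#04b} -> cores {cores_in_mask(mask)}")
```

On Windows, a mask of this form is the kind of value the Win32 SetProcessAffinityMask call takes; without pinning, the scheduler is free to move each cruncher between cores.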

Every little bit counts.

Glad to see you came to the same conclusion, that the Intel "only" rev is the fastest.

Intel claims that the Intel "only" files use special hidden registers that are only present in Intel chips, but they still run flawlessly on AMD systems. I find it extremely bad behaviour to restrict the competition in such a way, although Intel is probably within their rights to do it (who knows).

It seems that your Opteron, with its larger cache, benefits even more than my X2 3800+. My difference was only 8 secs, yours is 24 secs, and even SSE3 is faster on yours, which it is not on my system.

Simon:
About the ICCpatched exes - it's funny, in some tests they're quicker, in some they're not. My opinion on the issue is still not set in stone.


--- Quote from: Delerious on 22 Nov 2006, 08:34:32 am ---[...]
That's a serious performance increase between one Intel stepping and another; allowing for fluctuations, that's still a gain of nearly 8%. :)

--- End quote ---

Nice, didn't know that the different steppings were *that* much apart in performance...

Wonder what my Woodcrest is, will have to check with CPU-Z.

Thanks,
Simon.
