Author Topic: New version of the KWSN Test & Benchmark Tool with Auto-Installer released  (Read 28872 times)

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Arnulf:

Are you sure about that?

My processor is only an X2 3800+ with just 512 kB of cache per core, where the Opteron has 1 MB, but according to my tests the modified Intel "only" core was faster on my AMD64.

And returned results seem to indicate the same.

Some results for comparison (notice the QxN versus QxB in the Rev line):

The Intel "only" core on my A64:

***CUT START***
CPU time   7928.3125
stderr out   

<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit 'Ni!' based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxN|FFT:IPP_SSE2|Ben-Joe)
    CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
     cpus: 1 cores: 2 threads: 1   cache: L1=64K  L2=512K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 2564 MHz  -- read megs/sec: L1=14068, L2=7206, RAM=3067

Work Unit Info
True angle range:  0.421187

Spikes Pulses Triplets Gaussians Flops
   0      2       2        4     16726219691776
</stderr_txt>

Validate state   Valid
Claimed credit   64.8528194067711
***CUT END***

The generic client on my A64:

***CUT START***
CPU time   8045.375
stderr out   

<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
Optimized Windows SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
    CPUID: 'AMD K8 Athlon 64 X2 (Toledo)'
     cpus: 1 cores: 2 threads: 1   cache: L1=64K  L2=512K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 2564 MHz  -- read megs/sec: L1=14063, L2=7058, RAM=3159

Work Unit Info
True angle range:  0.426463

Spikes Pulses Triplets Gaussians Flops
   2      2       4        2     16092869927990
</stderr_txt>

Validate state   Valid
Claimed credit   62.3971229846839
***CUT END***

Of course, displaying two results proves nothing, but judging by the results my system has finished since the patch, the Intel client seems to be about 100 to 200 secs faster on 60+ credit WUs. It is also a little faster on other WUs, though not by as much, since they have shorter runtimes.
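Since the two results above come from different WUs, raw CPU time is not a like-for-like comparison. One rough way to compare them is to divide the reported Flops by the CPU time. This is only a sketch: it assumes the Flops figure in stderr is a fair measure of the work done, and the variable names are made up for illustration.

```python
# Figures copied from the two stderr excerpts above.
results = {
    "QxN": {"cpu_seconds": 7928.3125, "flops": 16726219691776},  # Intel "only" core
    "QxB": {"cpu_seconds": 8045.375, "flops": 16092869927990},   # generic client
}


def flops_per_second(result):
    """Reported floating-point operations divided by CPU time."""
    return result["flops"] / result["cpu_seconds"]


rates = {name: flops_per_second(r) for name, r in results.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.3f} GFLOP/s")

# Relative throughput of the Intel "only" build versus the generic one.
ratio = rates["QxN"] / rates["QxB"]
print(f"QxN relative to QxB: {(ratio - 1) * 100:+.1f}%")
```

Two WUs are still far too few to settle anything. The sketch only removes the difference in WU size from the comparison.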
« Last Edit: 21 Nov 2006, 01:24:55 pm by KarVi »
A smile is the shortest distance between two people (Victor Borge).

Delerious

  • Guest
Some food for thought on Intel chip revisions: the A1 was bought about 4 months before the B2. Both are on the same MB (Abit AB9 Pro), with the same BIOS version and the same memory modules at the same settings.

Each test was run 3x and the times were then averaged.

C2D E6600 Conroe B2 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  124 seconds

Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  226 seconds

C2D E6600 Conroe A1 revision/step
Medium
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  135 seconds

Long
Testing setiathome-kwsn-ssse3-c2-v141.exe...ran for  238 seconds

That's a serious performance increase between one Intel stepping and another. Even allowing for fluctuations, that's still a gain of nearly 8%. :)
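For anyone who wants to check the percentages, here is a small sketch using the averaged times listed above. The function name is just illustrative, and the gain is measured as the reduction in run time relative to the A1 stepping.

```python
# Averaged run times in seconds, copied from the post above.
timings = {
    "Medium": {"A1": 135, "B2": 124},
    "Long": {"A1": 238, "B2": 226},
}


def percent_faster(old_seconds, new_seconds):
    """Reduction in run time relative to the older stepping, in percent."""
    return (old_seconds - new_seconds) / old_seconds * 100


gains = {test: percent_faster(t["A1"], t["B2"]) for test, t in timings.items()}
for test, gain in gains.items():
    print(f"{test}: B2 is {gain:.1f}% faster than A1")
```

On these numbers the Medium test gives roughly 8% and the Long test roughly 5%.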
« Last Edit: 22 Nov 2006, 08:57:21 am by Delerious »

Offline Arnulf

  • Alpha Tester
  • Knight o' The Realm
  • ***
  • Posts: 63
Hi KarVi!

This is clipped from one of my results, and one thing that springs to mind is that I'm running the 5.5.0 app while you are running the .tx36 calibrating app - I don't know if that may be the cause?

As you can see below, the app chooses various approaches for the different parts of the analysis.
I patched the Intel versions and then renamed them to replace the generic versions,
then I compared the different runs in the "KWSN - CPU Test & Benchmark Tool V2".

The generic SSE2 was the fastest on my system in all three versions of the test.

But I will re-run the tests just to be sure and report back to you!  :P

Update!

Having re-run the tests with potentially changed clients, I have to agree with you: the Intel is the fastest.
And I'm changing to the fastest one now. .... :D

--------------------------------------------------------------------------------------------------------------------
Starting tests. This will take a few minutes, please be patient!

Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!

Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for  624 seconds

Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for  606 seconds

Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for  628 seconds

Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for  630 seconds

Skipping other apps - SSE2 is quicker than SSE if supported.

Finished with test run!
--------------------------------------------------------------------------------------------------------------------------
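If you save the tool's output, a short script can pick out the fastest app for you. This is only a sketch: it assumes the output lines look exactly like the ones pasted above, and the regular expression and function name are my own, not part of the benchmark tool.

```python
import re

# Output lines copied from the run above.
BENCH_OUTPUT = """\
Testing setiathome-kwsn-ssse3-c2-v141.exe...does not work on your system!
Testing SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe...ran for  624 seconds
Testing SaH_5.15_KWSN_SSE2-Intel_Ben-Joe_2.0_B.exe...ran for  606 seconds
Testing SaH_5.15_KWSN_SSE2-PM_Ben-Joe_2.0_B.exe...ran for  628 seconds
Testing SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe...ran for  630 seconds
"""

# Matches only the lines that report a run time; unsupported apps are skipped.
LINE = re.compile(r"^Testing (?P<app>\S+?)\.\.\.ran for\s+(?P<secs>\d+) seconds$")


def parse_times(text):
    """Return {app name: run time in seconds} for every app that ran."""
    times = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            times[match.group("app")] = int(match.group("secs"))
    return times


times = parse_times(BENCH_OUTPUT)
fastest = min(times, key=times.get)
# List the apps from fastest to slowest with the gap to the fastest one.
for app, secs in sorted(times.items(), key=lambda item: item[1]):
    print(f"{secs:4d} s  (+{secs - times[fastest]:2d} s)  {app}")
```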

Arnulf

-------------------------------------------------------------------------------------------------------------------------

<core_client_version>5.5.0</core_client_version>
<stderr_txt>
ChirpData--[ak's_sse3_chirp]:  10199721 (chosen)
 GetPeak--[hand_opt]:     11601 (chosen)
   f_sum--[hand_sse]:     37522 (chosen)
GetChiSq--[hoisted+abs(]:     40813 (chosen)
IPP FFT SSE2(64K)[original]:   4788086 (chosen)
Bench Time: 0.42 seconds
work_len=1048576
Optimized Windows SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Chicken Good!'
      Rev: (R-2.0|QxB|FFT:IPP_SSE2|Ben-Joe|)
    CPUID: 'AMD K8 Opteron DC 2 (Italy)'
     cpus: 2 cores: 2 threads: 1   cache: L1=64K  L2=1024K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3 
    speed: 1799 MHz  -- read megs/sec: L1=9868, L2=4944, RAM=2325

Work Unit Info
True angle range:  0.624283

Spikes Pulses Triplets Gaussians Flops
   1      1       2        0     12471831233146
</stderr_txt>
« Last Edit: 22 Nov 2006, 02:27:41 pm by Arnulf »

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Arnulf:

Although I'm using the tx36 calibrating client, all calibration is turned off. The only feature I use is its ability to set processor affinity, so that each cruncher stays on its own CPU core (normally each one runs on both cores to varying degrees). According to what I've read, setting the affinity is a little faster; the gain should be about 1-2%.
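To illustrate what setting affinity means, here is a minimal sketch that pins the current process to one core. It is not how the tx36 client does it: it uses Python's os.sched_setaffinity, which only exists on some platforms (Linux, for instance, but not Windows), and the function name is invented for the example.

```python
import os


def pin_to_one_core(core_index=0):
    """Restrict the current process to a single CPU core, if the OS allows it.

    Returns the resulting affinity set, or None on platforms where
    os.sched_setaffinity is unavailable (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))   # cores this process may use
    core = allowed[core_index % len(allowed)]   # stay within the allowed set
    os.sched_setaffinity(0, {core})             # pid 0 means "this process"
    return os.sched_getaffinity(0)


print(pin_to_one_core(0))
```

With two crunchers you would pin one to each core, so the scheduler no longer moves them back and forth.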

Every little bit counts.

Glad to see you came to the same conclusion, that the Intel "only" rev is the fastest.

Intel claims that the Intel "only" files use special hidden registers that are only present in Intel chips, yet they run flawlessly on AMD systems. I find it extremely bad behaviour to restrict the competition in such a way, although Intel is probably within its rights to do it (who knows).

It seems that your Opteron, with its larger cache, benefits even more than my X2 3800+. My difference was only 8 secs, yours is 24 secs, and even SSE3 is faster, which it is not on my system.

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
About the ICC-patched exes - it's funny, in some tests they're quicker, in some they're not. My opinion on the issue is still not set in stone.

[...]
That's a serious performance increase between one Intel stepping and another. Even allowing for fluctuations, that's still a gain of nearly 8%. :)

Nice, didn't know that the different steppings were *that* much apart in performance...

Wonder what my Woodcrest is, will have to check with CPU-Z.

Thanks,
Simon.
« Last Edit: 22 Nov 2006, 03:05:25 pm by Simon »

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Simon:

On my machine the results are conclusive, and repeatable.

Every time I've run the test, the Intel "only" version was faster.

Of course things can change with other WUs: a different number of spikes, gaussians, triplets, a different angle range, you name it.

But until something else is proven, I'll stick with what my tests tell me.

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
KarVi,

I think it's also pretty model-dependent. For example, on more recent (X2+) AMD CPUs, it seems the Intel-only patched apps are quicker. On my A64 3500+ single core (S939, 512K cache), they were slower the last time I tested.

So it's not really such a clear picture for everybody, but like you said: go with what your tests tell you.

I'm currently running some more tests on that A64 system and will put the app on BOINC for a few days to compare run times.

<edit>
Short: same runtime
Medium: SSE2-generic was 6 seconds faster
Long: SSE2-generic was 10 seconds faster
Guess I won't be putting it on BOINC then ;) Like I said, very model-dependent...
</edit>

Regards,
Simon.
« Last Edit: 22 Nov 2006, 04:44:13 pm by Simon »

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Simon:

What rev. is your A64?

I seem to recall something about AMD making some small optimizations to the core when they added SSE3 support, and that those optimizations targeted SIMD and FPU execution.

Perhaps this makes a difference? Of course, if your A64 has SSE3, then that's not the case :-)

But if it is the case, the recommendation could be narrowed down to this: if the chip supports SSE3, then the Intel "only" version is recommended (for anyone who is able to patch it).

BenHer

  • Guest
Patch? --- Patch?...what is this Patch thing you refer to Sir?  ::)

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
It's possible to modify the Intel "only" versions of the applications so that they can run on AMD chips, often even faster than the generic SSE2 version.

But this requires that you know specifically what to change, or that you have a little patch program or script which changes the code for you.

That's the patch I'm talking about.

Offline Vyper

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 376
Edited away because of stupidity of me..  :-[  Sorry
« Last Edit: 23 Nov 2006, 05:15:14 pm by Vyper »

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Vyper,

thanks...but, you might have gone to page 1 of this thread, where I posted the URL already.

Anyway, again (not meant personally, goes for everyone) -

We're aware of the situation, we have the relevant information, we've tested things extensively. We're however also slowly getting a bit exasperated with all the hubbub about patching or not.

My final words: I'm not going to put up patched executables, but in no way am I keeping you from doing what you have to.

Now, please, can we get back to more interesting topics? Also, please can everyone make sure to read the WHOLE thread? ;)

Regards,
Simon.

talaktalan

  • Guest
Hi,

I've run the v2 benchmark on a dual Intel PIII 1GHz (MMX and SSE-capable) system w/ Win2000, and the unsupported apps (that is, the SSE2 and SSE3 apps) cause the benchmark application to crash.

As a simple workaround I've replaced each unsupported app with an empty text file renamed to the matching .exe filename (while I was at it, I've also replaced one of the unsupported apps with the standard Berkeley client for comparison ;D ).

As a suggestion: would it be possible to include a button in the next version of your benchmark tool to disable one or more of the apps before starting the benchmark run?
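A rough sketch of how such a filter could work is to skip every app whose name implies a CPU feature the machine lacks. Everything here is an assumption for illustration: the name-to-feature table, the function names and the plain SSE filename are hypothetical, and this is not how the KWSN tool actually decides what to run.

```python
# Hypothetical table: filename fragment -> CPU feature that build needs.
# Most specific fragments come first, because "ssse3" also contains "sse3".
REQUIRED_FEATURE = [
    ("ssse3", "ssse3"),
    ("sse3", "sse3"),
    ("sse2", "sse2"),
    ("sse", "sse"),
]


def required_feature(exe_name):
    """Guess the SIMD feature an app needs from its filename."""
    name = exe_name.lower()
    for fragment, feature in REQUIRED_FEATURE:
        if fragment in name:
            return feature
    return None  # nothing inferred from the name, assume it runs anywhere


def runnable_apps(exe_names, cpu_features):
    """Keep only the apps whose inferred requirement the CPU satisfies."""
    runnable = []
    for exe in exe_names:
        need = required_feature(exe)
        if need is None or need in cpu_features:
            runnable.append(exe)
    return runnable


apps = [
    "setiathome-kwsn-ssse3-c2-v141.exe",
    "SaH_5.15_KWSN_SSE3_Ben-Joe_2.0_B.exe",
    "SaH_5.15_KWSN_SSE2_generic_Ben-Joe_2.0_B.exe",
    "SaH_5.15_KWSN_SSE_Ben-Joe_2.0_B.exe",  # hypothetical filename for an SSE build
]
piii_features = {"mmx", "sse"}  # the PIII described above
selected = runnable_apps(apps, piii_features)
print(selected)
```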

Anyway, thx a lot for the tool. 8)

Regards

Alex

Offline Arnulf

  • Alpha Tester
  • Knight o' The Realm
  • ***
  • Posts: 63
I would just like to add that the potentially modified SSE2_Intel app has taken the RAC of my Opteron rig from 1250 to 1550 over the last month.

Arnulf  ;D

 
