Author Topic: New apps based on code revision 2.2 'Noo? No, Ni!' have been released!  (Read 75075 times)

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Hi folks,

Code Revision 2.2 'Noo? No, Ni!' of the KWSN/Lunatics.at SETI@Home optimized apps has been released.

This new revision supersedes the previously released Rev. 2.0 apps for all systems as well as the 1.41 app for Core-2 based systems.

The people responsible for this new version are, in no particular order, Josef W. Segur, Ben Herndon and Alex Kan. A great big "thank you", guys!

Thanks also to the tireless members of the prerelease test community!

Windows optimized apps (static executables, 32/64-bit compatible)

Several optimized apps are available. Please use a program like CPU-Z to find out exactly what your system supports.

From the two lists below (Intel-only, or AMD and Intel-compatible), choose the first app, in order, that your system supports.


Intel-only apps (do NOT run these on AMD systems!)

Core 2 SSSE3-optimized app (Conroe, Woodcrest, Clovertown, Kentsfield, Merom, but NOT Core Solo/Duo!)
Pentium M/Core Solo SSE2-optimized app
Only use this app on Pentium M-based systems like Dothan, Banias or Core Solo/Duo (NOT Core 2 Duo!).
Pentium 4/Pentium-D SSE3-optimized app
Pentium 4 SSE2-optimized app


AMD and Intel-compatible apps
SSE2-generic optimized app
SSE-optimized app
MMX-optimized app
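
The "first app, in order, that your system supports" rule can be sketched roughly as follows. This is only an illustration: the family labels, the feature sets, and the `pick_app` helper are assumptions made for the example, not the detection logic used by the apps or by CPU-Z.

```python
# Sketch only: the family labels and feature sets are illustrative assumptions,
# not taken from the apps' actual CPU detection.

# (app name, CPU families it targets, instruction set it needs), in list order.
INTEL_ONLY_APPS = [
    ("Core 2 SSSE3-optimized app", {"Core 2"}, "SSSE3"),
    ("Pentium M/Core Solo SSE2-optimized app",
     {"Pentium M", "Core Solo", "Core Duo"}, "SSE2"),
    ("Pentium 4/Pentium-D SSE3-optimized app", {"Pentium 4", "Pentium D"}, "SSE3"),
    ("Pentium 4 SSE2-optimized app", {"Pentium 4"}, "SSE2"),
]

# (app name, instruction set it needs), in list order.
GENERIC_APPS = [
    ("SSE2-generic optimized app", "SSE2"),
    ("SSE-optimized app", "SSE"),
    ("MMX-optimized app", "MMX"),
]


def pick_app(vendor, family, features):
    """Return the first listed app this CPU supports, or None if none fits."""
    # The Intel-only list is consulted only for Intel CPUs.
    if vendor == "Intel":
        for name, families, needed in INTEL_ONLY_APPS:
            if family in families and needed in features:
                return name
    # Everything else falls through to the AMD and Intel-compatible list.
    for name, needed in GENERIC_APPS:
        if needed in features:
            return name
    return None


print(pick_app("Intel", "Core 2", {"MMX", "SSE", "SSE2", "SSE3", "SSSE3"}))
print(pick_app("AMD", "Athlon XP", {"MMX", "SSE"}))
```

The feature set passed in would be whatever CPU-Z reports for the host.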

As always, use these apps at your own risk. A pre-edited app_info.xml has been included with all files. Please unpack the files to a temporary location using 7-Zip (get it from http://www.7-zip.org), then read Instructions.txt in the folder it creates.

An Auto-Installer for these versions will come tomorrow or the day after, along with versions including graphics.

Happy crunching!
Regards,
Simon
« Last Edit: 12 Sep 2010, 12:20:22 am by Gecko »

Furex

  • Guest
What are the improvements in the newest release? I've read something about C2D, so isn't there anything new for older machines?

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Thanks to all the programmers for their great efforts! I'm already running the SSE client on my AthlonXP, and a patched Intel "only" SSE2 on my A64.

Looking forward to seeing how much it improves my times, but it seems that the AthlonXP likes the new app. a _lot_.

Also looking forward to seeing reports on whether the Intel SSE2 or SSE3 app is the faster one for my A64.

Furex:

As far as I remember, there are some rather huge improvements in this release.
They found a memory read that was happening every cycle, and found a way to reduce these reads a lot. This is the biggest improvement, and should help _all_ processors. There is also some new SSE code in the release.
The numbers I've been hearing are improvements of 20-30%, and maybe even more, in total crunching time. I'm not 100% sure about these numbers, my memory can be a little inconsistent  ;D , but time will tell us about the improvements.
« Last Edit: 16 Feb 2007, 12:29:57 pm by KarVi »
A smile is the shortest distance between two people (Victor Borge).

popandbob

  • Guest
Seeing about a 10-20% improvement over 2.0 so far  ;D ;D

Thanks for the time you have put into this.

BoB

Furex

  • Guest
I had what appears to be a slight improvement on my Dual Core Opteron by using the patched Intel SSE3 app over Generic SSE2 (still gathering stats to be sure). Edit: I was talking about 2.0, of course.

Is there a thread detailing the changes in release 2.2? I wasn't able to find one.
« Last Edit: 16 Feb 2007, 12:34:20 pm by Furex »

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Thanks to Simon and the Coop, as always. Another splendid effort.

One tiny wee bugette in CPU-ID - it thinks my Xeon 53xx 'Clovertown' is a Xeon 51xx 'Woodcrest' - but that's not important: it can be tidied up later, or not, as you see fit.

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
I've also found a tiny error in the output of WUs.

This is the features output for my A64 on the Intel SSE2 client.

Features: MMX, 3DNow!, 3DNow!+, SSE, SSE3, SSE3,

Notice that SSE3 seems to be supported twice, and SSE2 is not  :)

Ronon Dex

  • Guest
Thanx for the GOOD work!  :)


I have the Intel QX6700.
I use the SSSE3 Core2 app.

The <stderr_txt> isn't correct, is it?

--------------------------------------------------------
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE3 32-bit based on seti V5.15 'Noo? No - Ni!'
Revision: R-2.2|xT|FFT:IPP_SSE3|Ben-Joe
CPUID: Intel Xeon 51xx 'Woodcrest'
CPUs: 1, cores: 4, threads: 1 cache: L1=32K, L2=4096K, L3=0K
Features: MMX, SSE, SSE3, SSE3, SSSE3
speed: 2666 MHz -- read MB/s: L1=9951, L2=8617, RAM=5579
--------------------------------------------------------

Only for information!  :)


EDIT:
I downloaded KWSN_2.2_SSSE3-C2_Ben-Joe.exe (with the other files), and Task Manager shows it running.
Or is the link for the SSSE3 app not right, so that I actually have the SSE3 app?
Or is the name of my app not right: it says SSSE3 but is really SSE3?
« Last Edit: 16 Feb 2007, 04:19:59 pm by Ronon Dex »

Offline Urs Echternacht

  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 4121
  • ++
Furex wrote: "What are the improvements of the newest release? I've read something about C2D, so isn't there anything new for older machines?"

credit for wu: 57.17
Pentium3 1.4GHz@1.63GHz:
v1.3_SSE 21700secs
v2.2_SSE 17775secs
improvement appr. -18%

credit for wu: 60.86
Pentium M 2.0GHz@2.4GHz:
v2.0_SSE2_PM  7821secs
v2.2_SSE2_PM  6348secs
improvement appr. -19% (but first wu was validated INVALID)
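
For anyone who wants to check the percentages, they follow from the run times quoted above. A minimal sketch, where `runtime_reduction_percent` is just a helper name made up for this example:

```python
def runtime_reduction_percent(old_secs, new_secs):
    """Percentage of run time saved when going from old_secs to new_secs."""
    return (old_secs - new_secs) / old_secs * 100.0


# Run times as posted above (seconds per WU).
results = [
    ("Pentium3, v1.3_SSE -> v2.2_SSE", 21700, 17775),
    ("Pentium M, v2.0_SSE2_PM -> v2.2_SSE2_PM", 7821, 6348),
]

for label, old, new in results:
    print(f"{label}: {runtime_reduction_percent(old, new):.1f}% less time")
```

This gives roughly 18.1% and 18.8%, in line with the rounded figures of 18% and 19%.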

Thanks to all the optimizers over here.
« Last Edit: 16 Feb 2007, 06:19:29 pm by Urs Echternacht »
_\|/_
U r s

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Eek!

Seems I made a typo when changing the stats output; the first "SSE3" should be SSE2, instead.

Guess I'll have to recompile the apps again :) Thanks for noticing. Since it's a cosmetic thing only (it doesn't affect any function choices), it won't be a required upgrade.
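
To illustrate why a typo like this is cosmetic only, here is a small sketch. The flag values, label tables, and `features_line` helper are hypothetical and are not the app's actual source; the point is only that the printed label and the flag used for function choices are separate things.

```python
# Hypothetical feature flag bits, for illustration only.
MMX, SSE, SSE2, SSE3, SSSE3 = 1, 2, 4, 8, 16

# Label table with the typo: the SSE2 flag is printed as "SSE3".
LABELS_WITH_TYPO = [(MMX, "MMX"), (SSE, "SSE"), (SSE2, "SSE3"),
                    (SSE3, "SSE3"), (SSSE3, "SSSE3")]

# Corrected label table.
LABELS_FIXED = [(MMX, "MMX"), (SSE, "SSE"), (SSE2, "SSE2"),
                (SSE3, "SSE3"), (SSSE3, "SSSE3")]


def features_line(flags, labels):
    """Build the 'Features:' line from the flag bits that are set."""
    return "Features: " + ", ".join(name for bit, name in labels if flags & bit)


core2_flags = MMX | SSE | SSE2 | SSE3 | SSSE3

print(features_line(core2_flags, LABELS_WITH_TYPO))
print(features_line(core2_flags, LABELS_FIXED))

# Function choices would test the flag bit itself, so the label typo
# does not change which code path is used.
uses_sse2_code = bool(core2_flags & SSE2)
```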

Furex wrote: "What are the improvements of the newest release? I've read something about C2D, so isn't there anything new for older machines?"

As for changes vs. 2.0 -

Improved pulse folding
Improved accuracy (especially on Core 2 systems vs. 1.41)
Benchmarking for the various folding versions
Some extra chirp functions adapted from Alex Kan's code (SSE and SSE2, was only SSE3 before)
Benchmark fixes for function selection (the app tests each available function for sub-tasks like chirping, pulse folding, etc. up to the supported SSE level and uses the quickest; it sometimes chose incorrectly, which is now fixed)
Major efficiency improvement by Joe Segur - Not doing transpose when it's not needed
Doing transpose on 4 FFT chunks at a time rather than 1

and some others I probably forgot. Ben and Joe can complete the list or correct it.
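
The benchmark-and-pick idea from the list above (time each available function up to the supported SSE level, then use the quickest) can be sketched like this. It is a simplified illustration, not the app's code: the candidate functions are plain Python stand-ins, and `SSE_LEVELS`, `time_once`, and `pick_fastest` are names invented for the example.

```python
import time

# Instruction-set levels, ordered from lowest to highest (assumed ordering).
SSE_LEVELS = ["MMX", "SSE", "SSE2", "SSE3"]


def time_once(func, data):
    """Time a single call of func(data) in seconds."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def pick_fastest(candidates, max_level, data, repeats=3):
    """Return the name of the quickest candidate the host can run.

    candidates: list of (name, required_level, func).
    max_level: highest level the host supports.
    """
    limit = SSE_LEVELS.index(max_level)
    best_name, best_time = None, float("inf")
    for name, level, func in candidates:
        if SSE_LEVELS.index(level) > limit:
            continue  # host cannot run this variant, so never test it
        # Take the best of several runs to reduce timing noise.
        elapsed = min(time_once(func, data) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


# Stand-in "folding" variants; real ones would use different instruction sets.
def fold_plain(data):
    return sum(data)


def fold_slow(data):
    time.sleep(0.02)  # artificially slow, to make the choice obvious
    return sum(data)


def fold_sse3(data):
    return sum(data)


CANDIDATES = [
    ("fold_plain", "MMX", fold_plain),
    ("fold_slow", "SSE", fold_slow),
    ("fold_sse3", "SSE3", fold_sse3),
]

print(pick_fastest(CANDIDATES, "SSE", list(range(1000))))
```

Taking the minimum over a few repeats is one simple way to avoid picking the wrong function because of a single noisy measurement.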

How the apps will perform depends a lot on your host. On my Xeon 3.0 HT system, I saw an unbelievable jump of 60%. On my A64, around 15%, on my PD 805, around 20%, same on PD 9xx hosts. Around 10% quicker than 1.41 on Core 2 systems for most WUs, and around 35% quicker for the dreaded 58.7s, if I remember correctly. These are values I've seen in my benchmarks, so they may not reflect your results.

That said, you have my word you'll be pleasantly surprised because it is quicker than 2.0B and 1.41 on ALL hosts I've tested it on.

HTH,
Simon.
« Last Edit: 16 Feb 2007, 04:33:03 pm by Simon »

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!

Ronon Dex wrote:
CPUID: Intel Xeon 51xx 'Woodcrest'
Features: MMX, SSE, SSE3, SSE3, SSSE3

EDIT:
I downloaded KWSN_2.2_SSSE3-C2_Ben-Joe.exe (with the other files), and Task Manager shows it running.
Or is the link for the SSSE3 app not right, so that I actually have the SSE3 app?
Or is the name of my app not right: it says SSSE3 but is really SSE3?

You have the correct app, and the stderr output is also correct, with two exceptions: the small error others pointed out, where SSE2 in the features line becomes SSE3 (I've coloured it red above), and our CPUID table, which needs an update for quad-cores.

We have tested a lot of versions on Core 2 based systems. Try as we might, SSSE3 does not offer any usable functions for SETI@Home crunching, so calling the app SSSE3 really is not true as such. However, you can tell the Intel compiler to optimize the program for Core 2 systems, and this does work.
In the end, the SSE3 functions coupled with the Core 2 optimizations produced the fastest crunch times, so that's why we did it this way.

So that's why it says SSE3 on Core 2, and the app is called SSE3-C2.

All's well ;)

Simon.
« Last Edit: 16 Feb 2007, 04:27:27 pm by Simon »

Ronon Dex

  • Guest
Thanx for explaining for the "ignorant" people, like me! ;)

Small suggestion :) :


In the <stderr_txt>, instead of:
Version: Windows SSE3 32-bit based on seti V5.15 'Noo? No - Ni!'
use:
Version: Windows SSE3-Core2 32-bit based on seti V5.15 'Noo? No - Ni!'

And for the name of the app, instead of:
KWSN_2.2_SSSE3-C2_Ben-Joe.exe
use:
KWSN_2.2_SSE3-Core2_Ben-Joe.exe

That way, people like me will know they have the correct app. ... ;)


Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Thing is,

do you put it so it's correct or do you put it so people can more easily identify it? ;) It's always a trade-off.

In any case, an auto-installer package should come up shortly, as well as some recompiles with rectified stderr output.

Tsk, I really need more sleep :)

Regards,
Simon.

Offline KarVi

  • Alpha Tester
  • Knight Templar
  • ***
  • Posts: 252
Simon:

I had an idea it was only a cosmetic error, and found it a little funny that my CPU had 2xSSE3 (actually it has, since it's dual core) :-)

Something else for anybody who is interested:

I have done the quite tiresome job of making all the new applications work with the previous auto-installer and tester (a lot of patching and renaming...), to find out which application does best on my A64 (CPUID: AMD Athlon 64 X2 'Toledo'). It's a Socket 939 Athlon64 X2 3800+ clocked at 2475 MHz.

After patching all the Intel "only" versions, and renaming them and the generic version to the correct old application names, I let the program run a medium size test.

The result was this.

Patched Intel "only" SSE3-P4:   214 seconds.
Patched Intel "only" SSE2-P4:   212 seconds.
Patched Intel "only" SSE2-PM:   205 seconds.
Generic SSE2:                   212 seconds.

It seems that the fastest version for my A64 this time is the SSE2-PM, and it seems to be quite a lot faster.

More testing has to be done, but until then, I'm running the SSE2-PM version, and letting it stretch its legs.
« Last Edit: 16 Feb 2007, 05:51:17 pm by KarVi »

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Thanks for those results, Karsten!

Quite interesting. I'd say that since the Pentium M has a short pipeline like the Athlon64s do, that may be the deciding factor for the speedup you're seeing.

I believe so far people have only patched the P4-SSEx versions. Good idea there.

Let us know how it goes!
Simon.

 
