Author Topic: About the Usefulness: Optimized SETI@apps for AMD64 Opteron with SSE4a  (Read 12906 times)

paxv

  • Guest
Hi to all:


Though not a complete noob, I restarted crunching just recently (I did crunch for the old Seti project before).

I was wondering:

Would it be useful to have any SSE4a-optimized apps for the AMD Opteron 2376+ line?
I do know there is an SSE4a for AMD, but does it enhance performance like the SSE4.1 and 4.2 used by Intel? ::)

And last question:

I'm crunching happily with my 2 quad-core Opteron 2376s (2.3 GHz) on SSE3 now, for half the money of a Core i7 940 based system. ;) With a good mobo and 16 GB of RAM it works nicely. I use a CUDA video card (Nvidia 9600GT 1 GB) but have 8 cores:
Can I crunch two different applications at once: maybe Enhanced on 1 core and Astropulse v5 on the other 7? If so, can one lock this?

Greetings,

Ronald Zaneveld aka PaxV

I hope to get to 9k-10k points a day on average. Using 20 hrs for an Astropulse v5.0 unit on each of 7 cores, I make roughly 7 * 24 hrs / 20 hrs = 8.4 units a day, and 8.4 * 1250 pts = 10.5k points a day on average, with the remaining core working on Leiden and Einstein.  :o

Thanks to AK and all others for your inspiring work.
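
The points estimate in the post above can be checked with a few lines of C++. This is only a sketch of that arithmetic: the function names are made up for illustration, and the 20 hours per unit and 1250 points per unit are the poster's own estimates, not measured values.

```cpp
#include <cassert>

// Hypothetical helper: work units finished per day when `cores` cores
// each complete one unit every `hours_per_unit` hours.
inline double units_per_day(int cores, double hours_per_unit) {
    assert(cores > 0 && hours_per_unit > 0.0);  // guard against nonsense input
    return cores * 24.0 / hours_per_unit;
}

// Hypothetical helper: credit per day, given the credit granted per unit.
inline double points_per_day(int cores, double hours_per_unit,
                             double points_per_unit) {
    return units_per_day(cores, hours_per_unit) * points_per_unit;
}
```

Calling `points_per_day(7, 20.0, 1250.0)` reproduces the figure in the post: 8.4 units a day, about 10,500 points.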

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Would it be useful to have any SSE4a-optimized apps for the AMD Opteron 2376+ line?
I do know there is an SSE4a for AMD, but does it enhance performance like the SSE4.1 and 4.2 used by Intel? ::)
Do you know someone who will do an SSE4a opt app? There is no SSE4a support for now, so your question is a very theoretical one.

paxv

  • Guest
I used to be a GIS/Database programmer, but I wasn't working on something like this...

For me it's a long time ago. It has been 10 years since I had to quit programming because of epilepsy (caused by CRT screens mostly). With TFT it's better now (no more refresh), and I'm working with a system that's a bit more powerful than those of the late 90's with the Pentium IIs and 1st Athlons.

I wouldn't have a clue how to optimize the app. Reverse engineer AP first and then recode on my computer? AP is 32-bit and I'm running a 64-bit system: Vista 64-bit and Linux (Ubuntu 8.10 64-bit).

I am just wondering. If it WOULD be useful I could think about things...

Greetz

PaxV

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Why reverse engineering?? Both AP and MB source code in C++ is available for download, both stock and optimized versions.
So everyone could download it, read it and maybe notice some points for improvement.

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
...
Would it be useful to have any SSE4a-optimized apps for the AMD Opteron 2376+ line?
I do know there is an SSE4a for AMD, but does it enhance performance like the SSE4.1 and 4.2 used by Intel?
...

The 6 instructions in SSE4a are listed in http://en.wikipedia.org/wiki/SSE4#SSE4a. Maybe 1 or 2 could possibly be useful, but not for any of the really intense computations done in either S@H Enhanced or Astropulse, so might at best provide a slight improvement.

The SSE3 instructions provide almost all the functionality needed for optimisation. Intel's SSSE3 and SSE4.1 provide some enhancement, but not a huge amount for our work. SSE4.2 seems unlikely to help at all.

I wouldn't have a clue how to optimize the app. Reverse engineer AP first and then recode on my computer? AP is 32-bit and I'm running a 64-bit system: Vista 64-bit and Linux (Ubuntu 8.10 64-bit).
...

Reverse engineering isn't needed, these applications are GPL so source code for both stock and our optimized versions is available. There are certainly more optimization possibilities, and someone thinking specifically about 64 bit builds might find some opportunities to tune for that.
                                                                                   Joe
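
For anyone wanting to see which of the instruction sets discussed above a given CPU advertises, here is a minimal C++ sketch that decodes the relevant CPUID feature bits. The struct and function names are hypothetical, and the code only interprets register values handed to it. On a real machine those values would come from the CPUID instruction, for example through `__get_cpuid` in GCC/Clang's `<cpuid.h>` or `__cpuid` in MSVC's `<intrin.h>`. It is not how BOINC or the optimized apps do their detection.

```cpp
#include <cstdint>
#include <string>

// Raw CPUID register values (hypothetical struct for this sketch).
struct CpuidRegs {
    std::uint32_t leaf1_ecx;  // CPUID leaf 0x00000001, ECX
    std::uint32_t leaf1_edx;  // CPUID leaf 0x00000001, EDX
    std::uint32_t ext1_ecx;   // CPUID leaf 0x80000001, ECX (extended leaf)
};

// True if bit `n` of `reg` is set.
inline bool bit(std::uint32_t reg, unsigned n) { return ((reg >> n) & 1u) != 0; }

// Build a space-separated list of the SSE-family features that are flagged.
std::string simd_features(const CpuidRegs& r) {
    std::string out;
    auto add = [&out](bool present, const char* name) {
        if (!present) return;
        if (!out.empty()) out += ' ';
        out += name;
    };
    add(bit(r.leaf1_edx, 25), "sse");
    add(bit(r.leaf1_edx, 26), "sse2");
    add(bit(r.leaf1_ecx, 0),  "sse3");
    add(bit(r.leaf1_ecx, 9),  "ssse3");
    add(bit(r.leaf1_ecx, 19), "sse4.1");
    add(bit(r.leaf1_ecx, 20), "sse4.2");
    add(bit(r.ext1_ecx, 6),   "sse4a");   // AMD-specific, extended leaf
    return out;
}
```

Note that SSE4a lives in the extended leaf 0x80000001 rather than in leaf 1, so a tool that only reads leaf 1 will never report it.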


paxv

  • Guest
Thank you for your time and answer.

Lastly this question, a little off topic but still related to the former:

BOINC lists my processor as being SSE/SSE2 and
I know for a fact it's an SSE/SSE2/SSE3/SSE4a
(2376 Opteron Shanghai)

Doesn't BOINC check that well or is it AMD/Intel differences?



(SSSE3 and SSE4.1/SSE4.2 are Intel only)

Offline arkayn

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 1230
  • Aaaarrrrgggghhhh
    • My Little Place On The Internet
Thank you for your time and answer.

Lastly this question, a little off topic but still related to the former:

BOINC lists my processor as being SSE/SSE2 and
I know for a fact it's an SSE/SSE2/SSE3/SSE4a
(2376 Opteron Shanghai)

Doesn't BOINC check that well or is it AMD/Intel differences?



(SSSE3 and SSE4.1/SSE4.2 are Intel only)

BOINC only knows what the OS knows; XP only knows up to SSE2.

paxv

  • Guest
I run windows Vista 64...

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
I run windows Vista 64...
Vista is "clever" only in unneeded areas it seems. Big big crap in short ;) ;D

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
I run windows Vista 64...

Then it may be reporting SSE3 as pni (Prescott New Instructions).
                                                                               Joe
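
Since SSE3 can show up under the name pni, a feature check should accept either spelling. Below is a small C++ sketch of that idea. The function name is made up, and the input is assumed to be a space-separated feature string of the kind Linux prints in /proc/cpuinfo; the exact string BOINC shows on Vista is not given in this thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if a space-separated feature list mentions SSE3,
// either as "sse3" or under its older name "pni" (Prescott New Instructions).
bool has_sse3(const std::string& flags) {
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        // Compare whole tokens so that "ssse3" is not mistaken for "sse3".
        if (token == "sse3" || token == "pni") return true;
    }
    return false;
}
```

Matching whole tokens instead of substrings matters here, because Intel's SSSE3 would otherwise be counted as SSE3.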

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
I'd suggest a 'current generation' OS has little direct use for detection or implementation of anything beyond SSE2, given that the primary cacheability & streaming instructions useful in drivers etc. are SSE2 and below.  SSE3 unaligned loads are great, but since the OS must support SSE2 limited chips anyway, then that instruction becomes redundant, since the datasets would already be carefully aligned in things like drivers and runtime libraries. 

For the bulk of SSE3 and above, the instructions become more application targeted and vendor specific, which applies to things like codecs and compression software (We love horizontal math too!) rather than OS function, so the necessity isn't really there for any detection, implementation or support at OS level.  Really, the same probably applies to BOINC too, since it's just a coarser grained version of the same thing  :o ... Only better  :P

paxv

  • Guest
So noted. Thank U all 4 Ur time

 
