Seti@Home optimized science apps and information

Optimized Seti@Home apps => Discussion Forum => Topic started by: KarVi on 20 Jan 2007, 06:17:05 pm

Title: For the programmers:
Post by: KarVi on 20 Jan 2007, 06:17:05 pm: Since I'm not a programmer, this is totally incomprehensible to me, but this link:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

is AMD's official optimization guide for the Athlon64 family.

If you don't already use it, perhaps some gains could be found for the many AMD users?

I have no idea if it's easy or not to implement the things written in this document, but it would be interesting to see what an application that was specifically optimized for the Athlon64 architecture could gain in performance.
Title: Re: For the programmers:
Post by: BenHer on 21 Jan 2007, 01:35:08 pm: Our current release (2.0b or c) screams on Athlon systems, and also on earlier Intel systems.

It doesn't do anywhere near as well on Core or Core2 chips. Since none of us programmers has one, we really have conceptual issues making the code faster on them. All of our routines that work really fast on the other chips I mentioned don't seem to speed up the crunching on the Core chips...and so far we have no brilliant insights as to why.

Let me explain why having one is necessary to the process.
To develop an optimized section of a program:
1. The programmer conceives of a method that *might* make it faster.
2. He/she then writes a separate version of the source code with these hopeful improvements.
2.a. Re-coding
3. After coding, they try to compile...
3.a. at which point they discover some errors they overlooked in their coding.
3.b. After possibly several attempts, the executable finally builds.
4. Then the executable is run in either a short WU test or a test-bench run.
5. If the test-bench results are incorrect, the programmer returns to step 2.a.
6. If a test bench is used, then a later short WU test must be run to verify the code works.
7. At this point timing tests are also done to see whether the changes improved the code speed, and if so by how much. If the results don't validate, however, extra speed doesn't matter.
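
As a rough illustration of steps 4 to 7 (not the project's actual test bench), a tiny C harness might first check a candidate routine against a trusted reference and only then time it. Every name here (`power_ref`, `power_unrolled`, `results_match`, `time_calls`) is hypothetical, and the sum-of-squares routine is just a stand-in for real crunching code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Trusted reference routine: a plain sum of squares. Hypothetical stand-in
   for whatever slow-but-correct code the optimized version must match. */
static double power_ref(const float *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * x[i];
    return acc;
}

/* Candidate routine: same math, unrolled by four with separate accumulators. */
static double power_unrolled(const float *x, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += (double)x[i]     * x[i];
        a1 += (double)x[i + 1] * x[i + 1];
        a2 += (double)x[i + 2] * x[i + 2];
        a3 += (double)x[i + 3] * x[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        a0 += (double)x[i] * x[i];
    return (a0 + a1) + (a2 + a3);
}

/* Steps 5 and 6: accept the candidate only if it agrees with the reference
   within a relative tolerance. Returns 1 on agreement, 0 otherwise. */
static int results_match(double ref, double got, double rel_tol) {
    double diff = ref - got;
    double mag = ref;
    if (diff < 0.0) diff = -diff;
    if (mag < 0.0) mag = -mag;
    return diff <= rel_tol * mag;
}

typedef double (*power_fn)(const float *, size_t);

/* Step 7: CPU seconds spent in `reps` calls. The call goes through a
   volatile function pointer so the compiler has to make every call. */
static double time_calls(power_fn f, const float *x, size_t n, int reps,
                         double *sink) {
    assert(reps > 0 && sink != NULL);
    power_fn volatile call = f;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        *sink += call(x, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

The order matters: call `results_match` on the reference and candidate outputs first, and only compare `time_calls` figures for candidates that pass, since a fast routine that fails validation is useless.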

So...if the programmer doesn't have the destination platform, in this case Core or Core2 chips, then between each of steps 4, 5, 6 and 7 they would have to email or post their executable and wait for someone who did have the chip to test it, then post back the results.

Hope this makes it all clearer.

P.S.: A core2 chip with motherboard + RAM + case would probably cost, in my area, about $170+80+50. Could I afford this...yes, easily. Am I interested enough in seti to buy it when I otherwise have no need for it...no.

P.P.S.: Alex Kan has a Core2 based Macintosh - and he has written some very fast code for that. Crunch3r copied his source over to an Intel machine and modified it so it would compile for PC. He has only an internal version, whose status we don't know.
Title: Re: For the programmers:
Post by: KarVi on 22 Jan 2007, 10:37:53 am: Thanks for the explanation.

I had an idea about how development work progresses, and it was pretty much as you describe here.

Reading AMD's optimization guide could, I think, help the programmer in steps 1 and 2 to write code that doesn't execute in an unfortunate way.

I think it's a proven point that Intel's compilers don't do AMD chips any favours. So helping the AMD compiles becomes the programmer's unfortunate job, and an optimization guide must be a worthwhile read?

I know time constraints are a serious factor, but then again, having an optimization guide at hand could be helpful when looking for examples of effective coding? And doing things the right way the first time (optimization-wise) must be good?

You must excuse me if I in any way sound as if I don't respect the work you are already doing, because I have the greatest respect for what you guys are doing.

I'm just the type myself that likes to read manuals, and I think I benefit greatly by doing so, as I often know things about a product that others don't, because they don't take the time.

I'm just trying to help in my own very limited way.

If my input is not wanted I will stop giving it.
Title: Re: For the programmers:
Post by: BenHer on 22 Jan 2007, 02:54:46 pm: Sorry on my part, KarVi ... I don't do it very often, but sometimes I "go off". :o Heh.

You are definitely correct about Intel "shortchanging" AMD in certain compiles.
If you do some searching around the forums here you will find more details about that.

The actual placement of opcodes in the object code works pretty well on both AMD and Intel, but some library functions are really built for Intel.
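
One way a library routine can end up "built for Intel", offered purely as an illustration and not as a description of any specific library, is by choosing its fast path from the CPU vendor string instead of from the feature the fast path needs. In this C sketch the `cpu_info` struct, the `code_path` enum and both `pick_*` functions are hypothetical; only the vendor strings "GenuineIntel" and "AuthenticAMD" are real CPUID values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { PATH_GENERIC, PATH_SSE2 } code_path;

/* Hypothetical CPU description. A real program would fill this in from
   the CPUID instruction; here it is supplied by hand. */
typedef struct {
    const char *vendor;   /* e.g. "GenuineIntel" or "AuthenticAMD" */
    bool has_sse2;        /* does the chip report SSE2 support? */
} cpu_info;

/* Vendor-gated dispatch: the SSE2 path is taken only when the vendor
   string matches, so a capable chip from another vendor falls through
   to the generic path. */
static code_path pick_by_vendor(const cpu_info *cpu) {
    assert(cpu != NULL && cpu->vendor != NULL);
    if (strcmp(cpu->vendor, "GenuineIntel") == 0 && cpu->has_sse2)
        return PATH_SSE2;
    return PATH_GENERIC;
}

/* Feature-gated dispatch: any chip that reports SSE2 gets the SSE2 path. */
static code_path pick_by_feature(const cpu_info *cpu) {
    assert(cpu != NULL);
    return cpu->has_sse2 ? PATH_SSE2 : PATH_GENERIC;
}
```

With a `cpu_info` of `{ "AuthenticAMD", true }`, `pick_by_vendor` returns `PATH_GENERIC` while `pick_by_feature` returns `PATH_SSE2`, which is the kind of gap that can leave an SSE2-capable AMD chip running the slower code.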

Personally I have a copy of that AMD optimization guide as well as the corresponding Intel one. But the other programmers might not have gotten one...so good heads up.
Title: Re: For the programmers:
Post by: Josef W. Segur on 22 Jan 2007, 05:00:53 pm: Quote from: KarVi on 20 Jan 2007, 06:17:05 pm
Since I'm not a programmer, this is totally incomprehensible to me, but this link:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

is AMD's official optimization guide for the Athlon64 family.

If you don't already use it, perhaps some gains could be found for the many AMD users?

I have no idea if it's easy or not to implement the things written in this document, but it would be interesting to see what an application that was specifically optimized for the Athlon64 architecture could gain in performance.

I fully agree with Ben's replies, and for the record I have that document plus an earlier (pre-64) version, and also both current and old versions of the Intel optimization advice. I don't claim to have fully absorbed the contents, though.

In terms of AMD-specific optimisation, what would be interesting is a comparison of the AMD and Intel advice, looking for contradictions and/or differences in emphasis. And neither set of documents goes very far into how much improvement can be expected from a specific kind of optimization, so it is hard to judge where to direct effort at improvement.

Ben provided a facility, which runs when the 2.0 builds start, to test various versions of optimized code. The version of each optimized function which is fastest on that system is used for crunching the WU. It is certainly true that on AMD systems different choices are made by that process than on most Intel systems, so to that extent the 2.0 builds already adapt to AMD. But the variety of optimized routines we've generated certainly doesn't cover all possibilities.
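
The general idea of timing several versions at startup and keeping the fastest can be sketched in C as below. This is only a minimal sketch of the technique, not the code in the 2.0 builds; the `variant` table, `time_variant`, `pick_fastest` and the two summing routines are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef double (*sum_fn)(const float *, size_t);

/* Two hypothetical versions of the same routine. */
static double sum_simple(const float *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

static double sum_unrolled2(const float *x, size_t n) {
    double a0 = 0.0, a1 = 0.0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        a0 += x[i];
        a1 += x[i + 1];
    }
    if (i < n)                      /* odd element left over */
        a0 += x[i];
    return a0 + a1;
}

typedef struct {
    const char *name;
    sum_fn fn;
} variant;

/* CPU seconds for `reps` calls. The volatile function pointer and the
   volatile sink force every call to actually be made. */
static double time_variant(sum_fn f, const float *x, size_t n, int reps) {
    sum_fn volatile call = f;
    volatile double sink = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink += call(x, n);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Times every variant on the given data and returns the index of the
   fastest one; ties keep the earlier entry. */
static size_t pick_fastest(const variant *v, size_t count,
                           const float *x, size_t n, int reps) {
    assert(count > 0);
    size_t best = 0;
    double best_t = time_variant(v[0].fn, x, n, reps);
    for (size_t i = 1; i < count; i++) {
        double t = time_variant(v[i].fn, x, n, reps);
        if (t < best_t) {
            best_t = t;
            best = i;
        }
    }
    return best;
}
```

A program would call `pick_fastest` once at startup and store `v[best].fn` in a function pointer used for the rest of the run, so the winner can differ from one machine to the next.
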
Joe