Interesting F/U on Intel Compiler vs. AMD issue


Gecko_R7:
Old subject here, but it's very interesting who the FTC is consulting...
Click the link for Agner's blog comments.


--- Quote ---THE US Federal Trade Commission (FTC) apparently is interested in the fact that Intel's compiler deliberately cripples performance for non-Intel processors such as those made by AMD and VIA.
Writing in his blog, programming expert Agner Fog said that it appears that Chipzilla's compiler can produce different versions of pieces of code, with each version being optimised for a specific processor and/or instruction set. The system detects which CPU it's running on and chooses the optimal code path accordingly.
But it also checks what instruction sets are supported by the CPU and it also checks the vendor ID string. If the string says 'GenuineIntel' then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will use the slowest version of the code it can find.
While this is known, few Intel compiler users actually seem to know about it. Chipzilla does not say that the compiler is Intel-specific, either.
Fog said that if more programmers knew this fact they would probably use another compiler as everyone wants their code to run just as well on AMD's processors as on Intel's.
Some benchmarking programs are affected by this, up to a point where benchmark results can differ greatly depending on how a processor identifies itself.
It seems that in the fine print of the AMD settlement Intel has agreed to fix this problem. But apparently the FTC will still be interested because VIA could still be disadvantaged.
--- End quote ---
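The vendor-ID check the article describes can be reproduced with the x86 CPUID instruction. A minimal sketch, assuming GCC/Clang on x86 (`__get_cpuid` from `<cpuid.h>`); the function names here are mine for illustration, not anything from Intel's runtime:

```cpp
#include <cpuid.h>   // GCC/Clang wrapper around the x86 CPUID instruction
#include <cstring>
#include <string>

// CPUID leaf 0 returns the 12-byte vendor ID split across EBX, EDX, ECX.
std::string cpu_vendor() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return "";                        // CPUID not available
    char v[13];
    std::memcpy(v + 0, &ebx, 4);          // e.g. "Genu"
    std::memcpy(v + 4, &edx, 4);          // e.g. "ineI"
    std::memcpy(v + 8, &ecx, 4);          // e.g. "ntel"
    v[12] = '\0';
    return v;
}

// This is the kind of test the article says the dispatcher performs:
// "GenuineIntel" on Intel parts, "AuthenticAMD" on AMD, etc.
bool is_genuine_intel() {
    return cpu_vendor() == "GenuineIntel";
}
```

A vendor-neutral dispatcher would instead test the feature flags from CPUID leaf 1 (SSE2, SSE3, ...) and ignore the vendor string entirely — which is exactly the fix being asked of Intel here.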

Jason G:
Funnily enough, was looking at the most recent of Agner's comments last night (in a different context), and the situation indeed hasn't changed AFAICT.  As you know, the fact that Intel's dynamic dispatch mechanism is effectively 'broken' is why we avoid the issue by having, for multibeam, multiple platform targeted builds with single code paths only.  The net effect is that, without instantiating our own dispatch mechanism, we have the many MultiBeam builds which is a huge maintenance nightmare.

Now that the AstroPulse codebase is maturing somewhat, it too could potentially head down that road ... I'm looking at alternative methods... (Including those described by Agner of course)

KarVi:
I think it's about time something was done about it!

It will be interesting to see if any changes are made, and if so, how much speedup AMD/VIA users will see.

Jason G:
For completeness, and interest, here are the workarounds described by Agner in his optimization manuals:
- In green are the approaches we already use for MultiBeam, and for the sole ICC-compiled component library of AstroPulse (FFTW SSE; release AstroPulse was always an MSVC build). These approaches require multiple platform-specific builds.
- In yellow is what we could do to hopefully bring the build count back down.
- In orange is the true crux of the matter.

In short, we don't use the dynamic dispatch mechanisms in the Intel compiler, and never have. So any fix they apply to this (which I hope they do) would reduce our build count and save a lot of work whose energy could be directed elsewhere, but it won't directly influence the speed of our builds on any brand of CPU.

Optimizing software in C++
An optimization guide for Windows, Linux and Mac
platforms
By Agner Fog. Copenhagen University College of Engineering.
Copyright © 2009. Last updated 2009-09-26.
pp.126-127

--- Quote ---The behavior of the Intel compiler puts the programmer in a bad dilemma. You may prefer
to use the Intel compiler because it has many advanced optimizing features available, and
you may want to use the well optimized Intel function libraries, but who would like to put a
tag on his program saying that it doesn’t work well on non-Intel machines?
Possible solutions to this problem are the following:
• Compile for a specific instruction set, e.g. SSE2. The compiler will produce the
optimal code for this instruction set and insert only the SSE2 version of most library
functions without CPU dispatching. Only a few library functions still have a CPU
dispatcher in this case. Test if the program will run on an AMD CPU. If an error
message is issued then it is necessary to replace the CPU detection function as
described below. The program will not be compatible with old microprocessors.
• Compile with option /QxO. This will include a special version of certain library
functions for AMD processors with SSE2. This performs reasonably on AMD
processors but not optimally. A program compiled with /QxO will not run on any
processor prior to SSE2.
• Make two or more versions of the most critical part of the code and compile them
separately with the appropriate instruction set specified. Insert an explicit CPU
dispatching in the code to call the version that fits the microprocessor it is running
on.
• Replace the CPU detection function of the Intel compiler with another function with
the same name. This method is described below.
• Make calls directly to the CPU-specific versions of the library functions. The
CPU-specific functions typically have names ending in .J for the SSE2 version and
.A for the generic version. The dot in the function names is not allowed in C++ so
you need to use objconv or a similar utility for adding an alias to these library
entry names.
• The ideal solution would be an open source library of well-optimized functions with a
performance that can compete with Intel’s libraries and with support for multiple
platforms and multiple instruction sets. I have no knowledge of any such library.
The performance on non-Intel processors can be improved by using one or more of the
above methods if the most time-consuming part of the program contains automatic CPU
dispatching or memory-intensive functions such as memcpy, memmove, memset, or
mathematical functions such as pow, log, exp, sin, etc.

--- End quote ---
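The "explicit CPU dispatching" option above can be sketched with a plain function pointer chosen once at startup. This is a hedged example, not Agner's or Intel's code: the toy kernel and names are mine, and it assumes GCC/Clang on x86 for `__builtin_cpu_supports`. The key point is that it dispatches on what the CPU supports, never on the vendor ID:

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics
#endif

// Generic lowest-common-denominator version: correct on any CPU.
static float sum_generic(const float* a, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

#if defined(__SSE2__)
// SSE2 version: four floats per iteration, scalar tail for the remainder.
static float sum_sse2(const float* a, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) s += a[i];
    return s;
}
#endif

using sum_fn = float (*)(const float*, std::size_t);

// Runtime dispatch on the CPU's feature flags, not its vendor string.
static sum_fn pick_sum() {
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("sse2")) return sum_sse2;
#endif
    return sum_generic;
}

static const sum_fn sum = pick_sum();  // resolved once, before main()
```

In a real project each implementation would typically live in its own translation unit compiled with the matching -m flags (mirroring the "compile separately with the appropriate instruction set" bullet); the sketch only shows the dispatch shape.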

Gizbar:
I'm not a programmer, and the last thing I managed to create by myself was in 'Basic' over 20 years ago. But even I can see that 'Chipzilla' isn't playing fairly. Are they so worried that their CPUs aren't competitive enough to cope? Team Green beat them with the Athlon XP and then with the 64-bit CPUs (for the desktop, at least). Then came C2Ds and C2Qs and then on to Core i7 etc... They've taken the performance crown back by a country mile.

I would have thought that they would have taken these 'personalisations' out by now, so that they could beat everybody fairly on the same code base. They've got enough financial muscle to get the R&D done on these new chips, to employ the programmers to write the compilers, and to be fair, everyone else is struggling to keep up.

They seem to be playing low'n'dirty just to be seen to be the best, when they truly could be, just by playing by the rules. They would make every programmer's life easier, because programmers wouldn't have to optimise two sets of code for every program.

I mean, all the 'clone' Intel CPUs (and I know AMD seems to be the only one left who can even produce any! And yes, I do remember Cyrix producing the first 166MHz Pentium clone!) have to conform to the standard design/microcode etc. to be compatible, so why shouldn't any program written perform to its utmost, whatever CPU it's running on?

regards, Gizbar.
