THE US Federal Trade Commission (FTC) is apparently interested in the fact that Intel's compiler deliberately cripples performance on non-Intel processors, such as those made by AMD and VIA.

Writing in his blog, programming expert Agner Fog said that Chipzilla's compiler can produce several versions of a piece of code, each optimised for a specific processor and/or instruction set. The program detects which CPU it is running on and chooses the optimal code path accordingly.

But as well as checking which instruction sets the CPU supports, it also checks the vendor ID string. If the string says 'GenuineIntel', it uses the optimal code path. If the CPU is not from Intel then, in most cases, it uses the slowest version of the code it can find.

While this is known, few Intel compiler users actually seem to be aware of it, and Chipzilla does not say that the compiler is Intel-specific either.

Fog said that if more programmers knew this they would probably use another compiler, as everyone wants their code to run just as well on AMD's processors as on Intel's.

Some benchmarking programs are affected by this, to the point where benchmark results can differ greatly depending on how a processor identifies itself.

It seems that in the fine print of the AMD settlement Intel has agreed to fix this problem. But the FTC will apparently still be interested, because VIA could still be disadvantaged.
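The dispatch behaviour described above can be modelled in a few lines. This is a simplified illustration, not Intel's actual dispatcher: CodePath, Features, vendor_gated and feature_based are hypothetical names. It assumes the standard CPUID layout, in which leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order) and leaf 1 reports SSE2 in EDX bit 26 and SSE3 in ECX bit 0. The optional host_vendor reads the real registers through GCC/Clang's <cpuid.h> and is only compiled on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>

// Hypothetical code paths a dispatcher might choose between.
enum class CodePath { Generic, SSE2, SSE3 };

struct Features {
    bool sse2;
    bool sse3;
};

// Rebuild the 12-byte vendor string from CPUID leaf 0.
// The characters are stored in EBX, EDX, ECX order, lowest byte first.
std::string vendor_from_regs(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    std::string vendor;
    for (std::uint32_t reg : {ebx, edx, ecx}) {
        for (int shift = 0; shift < 32; shift += 8) {
            vendor.push_back(static_cast<char>((reg >> shift) & 0xFFu));
        }
    }
    return vendor;
}

// Decode the two feature bits used here from CPUID leaf 1:
// SSE2 is EDX bit 26, SSE3 is ECX bit 0.
Features features_from_leaf1(std::uint32_t ecx, std::uint32_t edx) {
    return Features{((edx >> 26) & 1u) != 0, (ecx & 1u) != 0};
}

// Feature-based policy: pick the best path the CPU says it supports.
CodePath feature_based(Features f) {
    if (f.sse3) return CodePath::SSE3;
    if (f.sse2) return CodePath::SSE2;
    return CodePath::Generic;
}

// Simplified model of the policy described in the article: the optimized
// paths are only used when the vendor string is "GenuineIntel"; any other
// vendor gets the generic path regardless of its features.
CodePath vendor_gated(const std::string& vendor, Features f) {
    if (vendor != "GenuineIntel") return CodePath::Generic;
    return feature_based(f);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

// Read the vendor string of the CPU this code runs on (GCC/Clang on x86 only).
std::string host_vendor() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return "";
    return vendor_from_regs(ebx, edx, ecx);
}
#endif
```

Given identical feature flags, vendor_gated picks CodePath::SSE3 for "GenuineIntel" but CodePath::Generic for "AuthenticAMD", while feature_based picks CodePath::SSE3 for both. The only difference between the two policies is the vendor string comparison.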
The behavior of the Intel compiler puts the programmer in a bad dilemma. You may prefer to use the Intel compiler because it has many advanced optimizing features available, and you may want to use the well optimized Intel function libraries, but who would like to put a tag on his program saying that it doesn’t work well on non-Intel machines?

Possible solutions to this problem are the following:
• Compile for a specific instruction set, e.g. SSE2. The compiler will produce the optimal code for this instruction set and insert only the SSE2 version of most library functions without CPU dispatching. Only a few library functions still have a CPU dispatcher in this case. Test if the program will run on an AMD CPU. If an error message is issued then it is necessary to replace the CPU detection function as described below. The program will not be compatible with old microprocessors.
• Compile with option /QxO. This will include a special version of certain library functions for AMD processors with SSE2. This performs reasonably on AMD processors but not optimally. A program compiled with /QxO will not run on any processor prior to SSE2.
• Make two or more versions of the most critical part of the code and compile them separately with the appropriate instruction set specified. Insert an explicit CPU dispatching in the code to call the version that fits the microprocessor it is running on.
• Replace the CPU detection function of the Intel compiler with another function with the same name. This method is described below.
• Make calls directly to the CPU-specific versions of the library functions. The CPU-specific functions typically have names ending in .J for the SSE2 version and .A for the generic version. The dot in the function names is not allowed in C++ so you need to use objconv or a similar utility for adding an alias to these library entry names.
• The ideal solution would be an open source library of well-optimized functions with a performance that can compete with Intel’s libraries and with support for multiple platforms and multiple instruction sets. I have no knowledge of any such library.

The performance on non-Intel processors can be improved by using one or more of the above methods if the most time-consuming part of the program contains automatic CPU dispatching or memory-intensive functions such as memcpy, memmove, memset, or mathematical functions such as pow, log, exp, sin, etc.
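The third workaround in the list, separate versions of the critical code plus an explicit dispatcher, might look roughly like this. It is a minimal sketch under several assumptions: sum_generic, sum_sse2 and sum are hypothetical names, the SSE2 version uses the standard <emmintrin.h> intrinsics and is only compiled when the compiler defines __SSE2__, and the runtime check uses GCC/Clang's __builtin_cpu_supports. In a real project the SSE2 version would normally live in its own source file compiled with the SSE2 option, as the list suggests. The key point is that the choice is made from the features the CPU reports, never from its vendor string.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Generic version: a plain scalar loop that runs on any processor.
double sum_generic(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

#if defined(__SSE2__)
// SSE2 version: adds two doubles per instruction, then handles a leftover
// element. The summation order differs from sum_generic, so results may
// differ in the last bits for arbitrary floating-point input.
double sum_sse2(const double* a, std::size_t n) {
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];
    for (; i < n; ++i) s += a[i];
    return s;
}
#endif

// Ask the CPU which instruction sets it supports; the vendor is never checked.
bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

using SumFn = double (*)(const double*, std::size_t);

// Choose the implementation based on reported features only.
SumFn pick_sum() {
#if defined(__SSE2__)
    if (cpu_has_sse2()) return sum_sse2;
#endif
    return sum_generic;
}

// Public entry point: resolve the dispatcher on first use, then reuse it.
double sum(const double* a, std::size_t n) {
    static const SumFn impl = pick_sum();
    return impl(a, n);
}
```

Callers simply call sum(data, n); the function pointer is resolved on the first call and reused afterwards, so detection is paid for once. On x86-64 SSE2 is always present, so in practice this pattern matters mainly for 32-bit builds and for newer instruction sets such as SSE3.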
have to conform to the standard design/microcode etc. to be compatible, so why shouldn't any program written perform to its utmost, whatever CPU it's running on?

Regards, Gizbar.
Because for current CPUs the x86 instruction set is pretty high-level. One can implement the same x86 IA very differently. It says nothing about instruction reordering, for example, or about splitting a single x86 op into many micro-ops and then merging them, and so on and so forth. x86 can be implemented in different ways and still be compatible, and for these different implementations the arrangement of instructions and the choice of instructions themselves matter a lot.

For example, Athlon x64 supports SSE3. It can execute those instructions, but so inefficiently that an SSE2-only build turned out to run faster on this chip.
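Because a CPU that supports an instruction set may not execute it quickly, one alternative to deciding purely from feature flags is to time each candidate code path on the actual machine and keep the fastest; FFTW's planner does something similar when asked to measure. This is a minimal sketch assuming the candidates are interchangeable implementations of the same work. pick_fastest is a hypothetical helper, and a real program would run it once at start-up on a representative workload.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Run every candidate several times on the same work and return the index of
// the one with the shortest best-case run time. Feature flags only say that an
// instruction set is available; timing shows whether it is faster on this chip.
std::size_t pick_fastest(const std::vector<std::function<void()>>& candidates,
                         int repeats = 3) {
    assert(!candidates.empty() && repeats > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t best = 0;
    Clock::duration best_time = Clock::duration::max();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        Clock::duration fastest_run = Clock::duration::max();
        for (int r = 0; r < repeats; ++r) {  // best of several runs reduces noise
            const Clock::time_point start = Clock::now();
            candidates[i]();
            const Clock::duration elapsed = Clock::now() - start;
            if (elapsed < fastest_run) fastest_run = elapsed;
        }
        if (fastest_run < best_time) {
            best_time = fastest_run;
            best = i;
        }
    }
    return best;
}
```

Given, say, an SSE2 and an SSE3 version of the same loop wrapped in lambdas, pick_fastest returns the index of whichever actually ran faster on this chip, which would catch the case described above where the SSE2-only code wins. Keeping the best of several runs reduces the influence of interruptions and cache warm-up.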
For completeness, and interest, here are the workarounds mentioned, as described by Agner in his optimization manuals:
- In Green are the approaches we already use for multibeam, and for the sole ICC-compiled component library of Astropulse (FFTW SSE; release Astropulse was always an MSVC build)... These approaches require multiple platform-specific builds.
- In Yellow are what we could do to hopefully bring the build count back down.
- In Orange is the true crux of the matter.

In short, we don't use the dynamic dispatch mechanisms in the Intel compiler. Never have. So any fix they apply to this, which I hope they do, would reduce our build count and probably save a lot of work whose energy could be directed elsewhere, but it won't directly influence the speed of our builds on any brand of CPU.

Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms. By Agner Fog. Copenhagen University College of Engineering. Copyright © 2009. Last updated 2009-09-26. pp. 126-127.

Quote:
The behavior of the Intel compiler puts the programmer in a bad dilemma. You may prefer to use the Intel compiler because it has many advanced optimizing features available, and you may want to use the well optimized Intel function libraries, but who would like to put a tag on his program saying that it doesn’t work well on non-Intel machines?

Possible solutions to this problem are the following:
• Compile for a specific instruction set, e.g. SSE2. The compiler will produce the optimal code for this instruction set and insert only the SSE2 version of most library functions without CPU dispatching. Only a few library functions still have a CPU dispatcher in this case. Test if the program will run on an AMD CPU. If an error message is issued then it is necessary to replace the CPU detection function as described below. The program will not be compatible with old microprocessors.
• Compile with option /QxO. This will include a special version of certain library functions for AMD processors with SSE2. This performs reasonably on AMD processors but not optimally. A program compiled with /QxO will not run on any processor prior to SSE2.
• Make two or more versions of the most critical part of the code and compile them separately with the appropriate instruction set specified. Insert an explicit CPU dispatching in the code to call the version that fits the microprocessor it is running on.
• Replace the CPU detection function of the Intel compiler with another function with the same name. This method is described below.
• Make calls directly to the CPU-specific versions of the library functions. The CPU-specific functions typically have names ending in .J for the SSE2 version and .A for the generic version. The dot in the function names is not allowed in C++ so you need to use objconv or a similar utility for adding an alias to these library entry names.
• The ideal solution would be an open source library of well-optimized functions with a performance that can compete with Intel’s libraries and with support for multiple platforms and multiple instruction sets. I have no knowledge of any such library.

The performance on non-Intel processors can be improved by using one or more of the above methods if the most time-consuming part of the program contains automatic CPU dispatching or memory-intensive functions such as memcpy, memmove, memset, or mathematical functions such as pow, log, exp, sin, etc.
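The multiple platform-specific builds mentioned in the post usually come from compiling the same source several times with different instruction-set flags. A minimal sketch, assuming GCC/Clang-style predefined macros (__SSE2__ and __SSE3__, set by options such as -msse2 and -msse3) and MSVC's _M_X64 and _M_IX86_FP; build_target and build_runs_on_host are hypothetical names, and the host check is only implemented for GCC/Clang on x86.

```cpp
#include <cassert>
#include <string>

// Which instruction set this particular build was compiled for. The same
// source is built several times, e.g. with g++ -msse2 and g++ -msse3, and the
// predefined macros differ between the builds.
const char* build_target() {
#if defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "generic";
#endif
}

// Check that the running CPU supports what this build was compiled for, so the
// program can exit with a clear message instead of hitting an illegal
// instruction. Only implemented for GCC/Clang on x86 in this sketch.
bool build_runs_on_host() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
#if defined(__SSE3__)
    return __builtin_cpu_supports("sse3") != 0;
#elif defined(__SSE2__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return true;
#endif
#else
    return true;  // no check available for this compiler/architecture
#endif
}
```

Each build could call build_runs_on_host() at start-up and, if it returns false, a launcher could fall back to a more conservative build. Like the dispatcher sketch, this relies only on reported features, not on the vendor string.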