ASM of compiled source of certain functions - by the Intel Compiler
Vyper:
--- Quote from: BenHer on 04 Aug 2006, 07:30:41 pm ---I am posting 2 sections of v_chirp - one is part of my SSE2-vectorized version, and the other is the current enhanced one (copied from your downloads section). There is one item they still haven't corrected: a bunch of math that should be hoisted out of the main loop. I haven't looked closely enough at the assembly example to determine whether the Intel C++ compiler hoisted it or not.
--- End quote ---
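Loop-invariant hoisting, as mentioned in the quote, can be sketched as follows. This is an illustrative reconstruction only, assuming hypothetical names (`chirp_rate`, `sample_rate`, `chirp_phase`), not the actual v_ChirpData code:

```cpp
#include <vector>

// Hypothetical sketch of hoisting loop-invariant math out of a chirp-style loop.
std::vector<float> chirp_phase(double chirp_rate, double sample_rate, int n) {
    std::vector<float> phase(n);
    // Hoisted: this factor does not depend on the loop index,
    // so it is computed once instead of n times inside the loop.
    const double k = 0.5 * chirp_rate / (sample_rate * sample_rate);
    for (int i = 0; i < n; ++i) {
        double t2 = static_cast<double>(i) * i;   // i^2 in sample units
        phase[i] = static_cast<float>(k * t2);    // angle = k * i^2
    }
    return phase;
}
```

If the divide and multiplies stayed inside the loop, the compiler may or may not hoist them itself, which is exactly the uncertainty the quote raises about the Intel compiler's output.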
Well, I have exchanged that part and it is compiling right now, to see if it produces similar and eventually faster results.
We'll see if it works or if it errors out.
EDIT: Well, of course it errored out. The line time = (1/sample_rate); was the one producing the error.
EDIT2: Bah, the compiler errors freaked me out. Don't know really what to do. Aborting it. :-(
//Vyper
Simon:
BenHer,
I tried to modify your code snippet to compile, but it really is missing quite a lot of variable declarations.
Trying to divine what types you intended them to be is kind of time-consuming and hasn't produced anything that builds yet ;)
In any case, I'd be delighted if you could post a link to an archive of your sources, or the full source file this appears in (plus possible headers).
Thanks!
Simon.
BenHer:
Simon,
It's all at that website (the CVS section of the sourceforge site I mentioned in the first post) in various files. The subdirectory (/opt) is where I put all of my changed or original code. Note it's a CVS repository, so there might be several code versions for each file. Code from javalizard is for Mac.
The names of the files should be indicative of what they contain and I documented most stuff I believe.
The design philosophy is to have different versions of each function that can benefit from different PCs' abilities. For any function to be enhanced, the original function name is changed to orig_<func name>. A function pointer is created that has the original function's name. Different enhanced versions are made of each function. So, if I was just improving a function to make better use of multiple FPU execution units, I would begin that function name with opt_; for SSE2 I would begin with sse2_.
So you might have orig_v_ChirpData, sse_v_ChirpData, and amd_v_ChirpData (3DNow!) versions of that function.
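The dispatch scheme described above can be sketched roughly as below. The signatures and bodies are placeholders (the real v_ChirpData takes more parameters); only the naming and function-pointer idea come from the post:

```cpp
// Hypothetical sketch of the orig_/sse2_ function-pointer dispatch scheme.
typedef int (*ChirpFn)(int len);

int orig_v_ChirpData(int len) { return len; }   // portable baseline (stub)
int sse2_v_ChirpData(int len) { return len; }   // SSE2 path (stub)

// The pointer keeps the original function's name, so existing call
// sites are unchanged; the variant is chosen once at startup.
ChirpFn v_ChirpData = orig_v_ChirpData;

void select_chirp(bool has_sse2) {
    v_ChirpData = has_sse2 ? sse2_v_ChirpData : orig_v_ChirpData;
}
```

In a real build, has_sse2 would come from a CPUID check at startup, and each variant would live in its own source file under /opt.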
For all code I used the compiler's built-in intrinsics for SSE and SSE2 opcodes, but encased in macros of my own naming. They all start with s_ (for SIMD). The compiler often reorganizes the opcode placement in the finished code to what it feels would be optimal (sometimes it is, sometimes not), so my placement of code is sometimes designed to get the compiler to put the opcodes where I want them after optimization.
I wrote many macros of my own for frequently used sequences of instructions, such as s_copyRtoI (which duplicates the R value on top of the I value) or s_negR (which XORs the sign bit of the R value(s) of a SIMD register, negating them).
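A hedged reconstruction of what such macros might look like, assuming interleaved complex data [r0, i0, r1, i1] in an __m128; these are guesses from the descriptions above, not the actual macros:

```cpp
#include <emmintrin.h>   // SSE2 intrinsics (includes SSE's xmmintrin.h)

// Negate only the R lanes (lanes 0 and 2) by XORing their sign bits.
// Note: XOR with all-ones would flip every bit; flipping just the
// sign bit is how IEEE floats are negated.
static inline __m128 s_negR(__m128 v) {
    const __m128 sign_r = _mm_castsi128_ps(
        _mm_set_epi32(0, (int)0x80000000, 0, (int)0x80000000));
    return _mm_xor_ps(v, sign_r);
}

// Duplicate each R value on top of the I value next to it:
// [r0, i0, r1, i1] -> [r0, r0, r1, r1].
static inline __m128 s_copyRtoI(__m128 v) {
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 0, 0));
}
```

Wrapping sequences like these in named macros (or inline functions) leaves the Intel compiler free to schedule the underlying opcodes, which matches the scheduling behavior described above.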
Question: What speedup (as a percentage) does your code get with P4 - non HT? Back then (21 mo ago) I was getting about 55%.
Simon:
Ah,
thanks for reminding me, I forgot you already posted that link. Your organization of optimized functions is what I was planning myself - for general and specific optimization. Your structure seems pretty logical, and the opt/ subdir is exactly what I wanted to do.
So anyway, in the future I'll be emulating that structure once I figure out how exactly to do it. In addition to SSE1/2/3 specific optimizations even core-specific ones could be implemented (like Michael did in his hand-coded inline assembly that seems to work on P-D 8xx and later machines only).
When I find some time, I'll try and incorporate the sse2 chirpdata function as a start.
As for speedup, it all depends on how you calculate it. Also, remember enhanced already incorporates a lot of caching that did not exist in the standard apps back then.
Anyway, you may find the comparison tables useful. They don't have recent compile results, but those aren't much more than 2-3% quicker at most.
Regards,
Simon.
Josef W. Segur:
--- Quote from: BenHer on 06 Aug 2006, 07:39:41 pm ---Question: What speedup (as a percentage) does your code get with P4 - non HT? Back then (21 mo ago) I was getting about 55%.
--- End quote ---
On my Willamette P4 1.6 GHz the time reduction is about 60%, but that's atypical. I'd say that 45 to 50 percent would be the comparable figure.
In case you didn't know, I'll note that Eric Korpela switched to Dev-C++/MinGW for the Windows builds starting with the 5.10 version. He'd been trying to do that for some time; when he succeeded, those gcc builds were somewhat faster than the Visual C++ ones on his Windows test systems.
Joe