Current Profile Analysis and points to optimize
BenHer:
Hmm...just checked out the older version of the SETI source, optimized for the Mac by Alex Kan & Rick Berry, from their website http://writhe.org.uk/seti@home/. Note: the latest modified file was dated 9-15-2005, so I'm guessing it was pre-"enhanced"...
They not only optimized existing functions, they also cleaned up formatting, added documentation, rewrote entire sections, and changed the way computations were performed (chirping)...so apparently they have reviewed some of the math. ::)
They also commented many undocumented routines inside the source, so they seem to have worked through what Eric K. et al were trying to achieve with many of their functions.
Regarding an earlier question..."can some students be tasked with reviewing the math..." Alex is apparently a U.C. Berkeley engineering student.
Josef W. Segur:
--- Quote from: BenHer on 15 Aug 2006, 04:36:21 pm ---Hmm...just checked out the older version of the SETI source, optimized for the Mac by Alex Kan & Rick Berry, from their website http://writhe.org.uk/seti@home/. Note: the latest modified file was dated 9-15-2005, so I'm guessing it was pre-"enhanced"...
They not only optimized existing functions, they also cleaned up formatting, added documentation, rewrote entire sections, and changed the way computations were performed (chirping)...so apparently they have reviewed some of the math. ::)
They also commented many undocumented routines inside the source, so they seem to have worked through what Eric K. et al were trying to achieve with many of their functions.
--- End quote ---
I was impressed, too. Later source can be found at http://tbp.berkeley.edu/~alexkan/seti/. I'm wondering if I can restate some of the vectorized routines from the 6.1 source to compile with DevC++/MinGW. If I can get up to speed soon enough, I'll try to get at least some x86 SIMD variants into 5.17+. OTOH, you could probably do that much more efficiently than I...
--- Quote ---Regarding an earlier question..."can some students be tasked with reviewing the math..." Alex is apparently a U.C. Berkeley engineering student.
--- End quote ---
Graduate, now. I was also reading the MacNN forum posts related to those optimized S@H apps; they were quite interesting.
Joe
Josef W. Segur:
--- Quote from: BenHer on 15 Aug 2006, 02:32:53 pm ---Figured out how to tell ICC to super optimize v_getPowerSpectrum...hand coding could hardly improve on it.
--- End quote ---
Is that ippsPowerSpectr_32fc() ?
Joe
chboss:
Yes, Alex's Mac client is impressive....
MacMini G4 1.25GHz RAC 219
Athlon XP 2600+ (Linux) RAC 212
If some of their improvements can be brought over to the Linux version it would certainly be helpful.
BenHer:
I've gotten about a 20% improvement so far vs. Simon's SSE3 Athlon exe.
SIMD is only a part of it...many of the bottlenecks yield to simple programming optimizations.
First identify what is slow...second identify why...then fix it. Several were float/int conversions that weren't needed...others were if-thens inside loops (a big no-no)...another was an 'abs( )' inside a loop, and removing it gave a big speedup.
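To illustrate the kind of fix described above, here is a minimal sketch in C. It is not taken from the SETI@home source; the function names `scale_slow` and `scale_fast` and their parameters are hypothetical. The slow version re-tests an invariant flag, calls `abs()`, and converts int to float on every pass; the fast version does each of those once, before the loop.

```c
#include <stddef.h>
#include <stdlib.h>   /* abs */

/* Before (hypothetical example): each iteration re-tests a flag that never
   changes, recomputes abs(gain), and converts the int result to float. */
void scale_slow(float *out, const float *in, size_t n, int gain, int invert)
{
    for (size_t i = 0; i < n; i++) {
        if (invert)
            out[i] = -in[i] * abs(gain);
        else
            out[i] = in[i] * abs(gain);
    }
}

/* After: the abs(), the int-to-float conversion, and the branch are all
   hoisted out, leaving one multiply per element in the loop body. */
void scale_fast(float *out, const float *in, size_t n, int gain, int invert)
{
    const float g = (float)abs(gain);   /* computed and converted once */
    const float k = invert ? -g : g;    /* branch resolved once */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

An optimizing compiler may perform some of this hoisting by itself, so it is worth checking the profile before and after rather than assuming a gain.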
I've also incorporated Alex's re-ordered power spectrum table from 5.17, but without using another table...it's all inside the original power spectrum table.
Have to verify it all vs. the test WUs now...for general development I'm only testing against short WU 2 vs. release-515. WU 2 verifies strongly...time on my Athlon 64 3800 X2 (using only core #2): 537 seconds.
In my latest build, find_pulse (and its new sub-functions) uses 19.02% of WU time, Intel's FFT uses 17.92%, and the cache misses for the PoT functions are down to 15.7%.
Might be able to squeeze another 5-10% out...harder now though.
--- Quote ---Is that ippsPowerSpectr_32fc() ? - Joe
--- End quote ---
No...I just let the Intel compiler vectorize the loop, but I gave it better hints that the loop could be vectorized.
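The post does not say which hints were used, so the following is only a sketch of common ones, under assumptions: the loop is taken to compute re² + im² per sample, and the `cplx32` type and `power_spectrum` function are hypothetical stand-ins, not the actual v_getPowerSpectrum code. The C99 `restrict` qualifier promises the compiler that input and output do not overlap, and `#pragma ivdep` (guarded so that only the Intel compiler sees it) asserts that there are no loop-carried dependences.

```c
#include <stddef.h>

/* Hypothetical stand-in for a complex sample: real and imaginary parts. */
typedef struct { float re, im; } cplx32;

/* Assumed form of the loop: power[i] = re^2 + im^2.
   restrict tells the compiler that freq and power never alias, which removes
   one common obstacle to auto-vectorization. */
void power_spectrum(const cplx32 *restrict freq, float *restrict power, size_t n)
{
#ifdef __INTEL_COMPILER
#pragma ivdep   /* Intel compiler only: assert no loop-carried dependences */
#endif
    for (size_t i = 0; i < n; i++) {
        const float re = freq[i].re;
        const float im = freq[i].im;
        power[i] = re * re + im * im;
    }
}
```

Keeping the loop body free of branches and function calls, with a plain counted index, also tends to help the vectorizer. Whether the loop actually vectorizes should be confirmed from the compiler's vectorization report.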
Simon,
I suggest you check out AutoIt3 at http://www.autoitscript.com/autoit3/ for automating the testing...I'm going to write a short script myself (timing in seconds, etc.).