Author Topic: Current Profile Analysis and points to optimze  (Read 29706 times)

BenHer

  • Guest
Re: Current Profile Analysis and points to optimze
« Reply #15 on: 15 Aug 2006, 04:36:21 pm »
Hmm...just checked out the older version of the seti source by Alex Kan & Rick Berry (their optimized Mac source code) from their website http://writhe.org.uk/seti@home/ Note: the latest modified file was 9-15-2005, so it was pre-"enhanced", I'm guessing...

They not only optimized existing functions, they also cleaned up formatting, added documentation, rewrote entire sections and changed the way computations were performed (chirping)...so apparently they have reviewed some of the math.  ::)

They also commented many undocumented routines inside the source, so they seem to have worked through what Eric K. et al were trying to achieve with many of their functions.

Regarding an earlier question..."can some students be tasked with reviewing the math..."  Alex is apparently a U.C. Berkeley engineering student.

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: Current Profile Analysis and points to optimze
« Reply #16 on: 19 Aug 2006, 04:38:23 pm »
Quote
Hmm...just checked out the older version of the seti source by Alex Kan & Rick Berry (their optimized Mac source code) from their website http://writhe.org.uk/seti@home/ Note: the latest modified file was 9-15-2005, so it was pre-"enhanced", I'm guessing...

They not only optimized existing functions, they also cleaned up formatting, added documentation, rewrote entire sections and changed the way computations were performed (chirping)...so apparently they have reviewed some of the math.  ::)

They also commented many undocumented routines inside the source, so they seem to have worked through what Eric K. et al were trying to achieve with many of their functions.

I was impressed, too. Later source can be found at http://tbp.berkeley.edu/~alexkan/seti/. I'm wondering if I can restate some of the vectorized routines from the 6.1 source to compile with DevC++/MinGW. If I can get up to speed soon enough, I'll try to get at least some x86 SIMD variants into 5.17+. OTOH, you could probably do that much more efficiently than I...

Quote
Regarding an earlier question..."can some students be tasked with reviewing the math..."  Alex is apparently a U.C. Berkeley engineering student.

Graduate, now. I was reading the Macnn forum posts related to those optimized S@H apps, that was also quite interesting.
                                                                       Joe

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: Current Profile Analysis and points to optimze
« Reply #17 on: 19 Aug 2006, 04:49:35 pm »
Quote
Figured out how to tell ICC to super optimize v_getPowerSpectrum...hand coding could hardly improve on it.

Is that ippsPowerSpectr_32fc() ?
                                                                     Joe

chboss

  • Guest
Re: Current Profile Analysis and points to optimze
« Reply #18 on: 19 Aug 2006, 05:15:40 pm »
Yes, Alex's Mac client is impressive....

MacMini G4 1.25GHz  RAC 219
Athlon XP 2600+ (Linux) RAC 212

If some of their improvements can be brought over to the Linux version it would certainly be helpful.


BenHer

  • Guest
Re: Current Profile Analysis and points to optimze
« Reply #19 on: 20 Aug 2006, 01:43:04 am »
I've gotten about a 20% improvement so far vs. Simon's SSE3 Athlon exe.

SIMD is only a part of it...many of the bottlenecks are simple programming optimization.

1st identify what is slow...2nd identify why...fix. Several have been float/int conversions that aren't needed...others were if-thens inside of loops...a big no-no...another was an 'abs( )' inside a loop...big speed-up from removing that.
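To make the kinds of fixes listed above concrete, here is a minimal sketch. It is not taken from the SETI@home source: the function names, the `scale` parameter and the `use_abs` flag are all hypothetical. It shows a loop-invariant if-then and a repeated int-to-float conversion being moved out of a loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "before": the use_abs test and the int -> float conversion
// of scale are both loop-invariant, yet they are written inside the loop.
float sum_scaled_slow(const std::vector<float>& data, int scale, bool use_abs) {
    float total = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (use_abs)
            total += std::fabs(data[i]) * static_cast<float>(scale);
        else
            total += data[i] * static_cast<float>(scale);
    }
    return total;
}

// Hypothetical "after": convert once, branch once, keep the loop bodies simple.
float sum_scaled_fast(const std::vector<float>& data, int scale, bool use_abs) {
    const float fscale = static_cast<float>(scale);  // converted a single time
    float total = 0.0f;
    if (use_abs) {
        for (float x : data) total += std::fabs(x) * fscale;
    } else {
        for (float x : data) total += x * fscale;
    }
    return total;
}
```

Both versions do the same arithmetic in the same order, so they should return identical results. An optimizing compiler may already perform this hoisting on its own, so any gain has to be confirmed with the profiler rather than assumed.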

I've also incorporated Alex's power spectrum re-ordered table from 5.17, but without using another table...it's all inside of the original powerspectrum table.

Have to verify it all vs the test WUs now...am only testing against short WU 2 vs release-515 for general development.  WU2 verifies strongly...time on my Athlon 64 3800 X2 - using only core #2 -   537 seconds

In my latest...find_pulse (and its new sub functions) uses 19.02% of WU time...and Intel's FFT uses 17.92%...the cache misses for PoT functions are down to 15.7%.

Might be able to squeeze another 5-10% out...harder now though.

Quote
Is that ippsPowerSpectr_32fc() ?    - Joe
No...I just let Intel compiler vectorize the loop, but I gave it better hints that it could be vectorized.


Simon,
Suggest you check out the program AutoIt3 at http://www.autoitscript.com/autoit3/  for automating the testing...I'm going to write a short one myself...time seconds...etc.


Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Re: Current Profile Analysis and points to optimze
« Reply #20 on: 20 Aug 2006, 09:14:52 am »
Hi Ben,

Auto-It is pretty impressive stuff. Even more so, the 20% you said you got out of the 5.15 sources :) Any chance of getting an archive of your changes or a full source snapshot anytime soon? If I seem eager, I am ;)

Also, do those 20% translate to Intel systems too or is it AMD-only?

About telling ICC to vectorize things - are you doing that with "#pragma vector aligned" or "#pragma vector always"?

Regards,
Simon.
« Last Edit: 20 Aug 2006, 09:17:29 am by Simon »

BenHer

  • Guest
Re: Current Profile Analysis and points to optimze
« Reply #21 on: 20 Aug 2006, 01:52:28 pm »
Simon,

I use this code to tell it which pointers point to aligned buffers (in powerspectrum it's both)
Code:
#ifdef __INTEL_COMPILER
#define ALIGNED_YES( buffer_ ) __assume_aligned( buffer_, SIMD_ALIGN );
#else
#define ALIGNED_YES( buffer_ )
#endif
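
A sketch of how a macro like this might be used in a power-spectrum style loop. The loop body, the function name `power_spectrum` and the value 16 for `SIMD_ALIGN` are assumptions for illustration; the post shows neither the definition of `SIMD_ALIGN` nor the actual v_GetPowerSpectrum code.

```cpp
#include <cstddef>

#define SIMD_ALIGN 16  // assumed value; not shown in the post

#ifdef __INTEL_COMPILER
#define ALIGNED_YES( buffer_ ) __assume_aligned( buffer_, SIMD_ALIGN );
#else
#define ALIGNED_YES( buffer_ )
#endif

// Hypothetical loop: freq holds n interleaved complex values
// (re, im, re, im, ...) and power receives re*re + im*im for each one.
// The caller must guarantee that both buffers really are SIMD_ALIGN-aligned.
void power_spectrum(const float* freq, float* power, std::size_t n) {
    ALIGNED_YES(freq)   // the macro supplies its own trailing semicolon
    ALIGNED_YES(power)
    for (std::size_t i = 0; i < n; ++i) {
        const float re = freq[2 * i];
        const float im = freq[2 * i + 1];
        power[i] = re * re + im * im;
    }
}
```

On compilers other than Intel's the macro expands to nothing, so the function still builds and gives the same results, just without the alignment hint. With ICC the promise is unchecked: passing a misaligned buffer after asserting alignment can crash on aligned loads.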

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: Current Profile Analysis and points to optimze
« Reply #22 on: 21 Aug 2006, 12:04:12 am »
For approximate comparison, I built 5.17 on DevC++/MinGW with profiling enabled. I had to drop optimization to -O2 because the profiling code won't work with -fomit-frame-pointer. So FWIW here are some values from running WU2 with chirp limits 10 and 25, about 3 hours 41 minutes on my 1.4 GHz Pentium M:

37.90% find_pulse()
11.09% v_Transpose4()
 6.04% v_ChirpData()
 5.28% CalcTrigArray()
 5.24% GaussFit()
 5.22% f_GetChiSq()
 4.71% f_GetTrueMean()
 3.61% FindSpikes()
 3.29% f_GetPeak()
 2.57% lcgf()
 2.51% find_triplets()
 2.36% v_GetPowerSpectrum()
 1.95% float_to_uchar()
 1.62% t_funct()
 1.53% GetFixedPoT()
 1.27% analyze_pot()
                                                                   Joe
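
To put those shares into wall-clock terms, a small helper (the name `share_seconds` is made up for this example) converts a percentage of the roughly 3 h 41 min run, i.e. 3*3600 + 41*60 = 13260 seconds, into seconds:

```cpp
// Seconds spent in a routine, given its share of the profile in percent
// and the total run time in seconds.
double share_seconds(double percent, double total_seconds) {
    return total_seconds * percent / 100.0;
}
```

For find_pulse() at 37.90% this gives about 5025.5 seconds, or nearly 84 minutes of the run. The figure is only approximate: the run time itself is quoted as "about", and the profiling build ran at reduced optimization.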

 
