Author Topic: ASM of compiled source of certain functions - by the Intel Compiler  (Read 29537 times)

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Using the complete sse2_v_chirpdata function, analyzeFuncs.cpp compiles fine for me.
So next up is a quick test run vs. my own SSE2-optimized build without this edit :)

Simon.

<edit>seems I posted too soon, it didn't finish linking. Needs some more work to get it to produce a valid executable.</edit>
« Last Edit: 06 Aug 2006, 10:10:31 pm by Simon »

BenHer

  • Guest
Uhh...Simon

If you grabbed the sse2_v_chirpdata from sse2_opt.cpp, then you've gotten Evandro Menezes' version.  He did join the SourceForge project and was an authorized submitter, so those were his latest versions.

My latest version was the sse_v_chirpdata version (faster than his, if I recall).

I just read it now; it doesn't include Tetsuji's sin/cos tables or any caching, so it will probably be slower.

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Lol :)

Oops...I was wondering where the caching was, too...
So anyway, it helps being less tired than I was when I tried it.

Will try again with the file you pointed out.
Simon.

<edit>It's still a bit tough to integrate your function as it uses different variable types and a different number of arguments. Enhanced by default uses this:
Code: [Select]
extern int v_ChirpData(
    sah_complex * cx_DataArray,
    sah_complex *  cx_ChirpDataArray,
    int ChirpRateInd,
    double ChirpRate,
    int  ul_NumDataPoints,
    double sample_rate
  );

Yours uses this:
Code: [Select]
extern int v_ChirpData(
    float * fp_DataArray,
    float *  fp_ChirpDataArray,
    float f_ChirpRate,
    int  ul_NumDataPoints,
    double sample_rate
  );

Which is giving me all sorts of trouble about incompatible arguments. So for now, I'm going to put it in the "to do" drawer unless you want to jump in and incorporate it yourself (or maybe someone with more C++ skills than me does the same).</edit>
« Last Edit: 07 Aug 2006, 09:56:09 am by Simon »
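
One way to bridge the two signatures quoted above is a thin adapter that keeps the six-argument form and forwards to the float-based routine. The following is only a sketch under stated assumptions: it assumes sah_complex is an interleaved (real, imaginary) pair of floats, that ul_NumDataPoints counts complex points in both versions, and that ChirpRateInd is only needed for caching and can be ignored. float_ChirpData and adapter_ChirpData are hypothetical names, and the body of float_ChirpData is a placeholder copy, not Ben's chirp code.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: sah_complex is an interleaved (real, imaginary) pair of floats.
typedef float sah_complex[2];

// Hypothetical stand-in for the float-based routine (five arguments).
// Placeholder body: it only copies the data so the sketch can be run.
int float_ChirpData(float* fp_DataArray, float* fp_ChirpDataArray,
                    float f_ChirpRate, int ul_NumDataPoints,
                    double sample_rate) {
    (void)f_ChirpRate;
    (void)sample_rate;
    for (int i = 0; i < 2 * ul_NumDataPoints; ++i) {
        fp_ChirpDataArray[i] = fp_DataArray[i];
    }
    return 0;
}

// Adapter with the six-argument signature. It views each complex array as a
// flat float array and narrows the chirp rate from double to float.
int adapter_ChirpData(sah_complex* cx_DataArray,
                      sah_complex* cx_ChirpDataArray,
                      int ChirpRateInd, double ChirpRate,
                      int ul_NumDataPoints, double sample_rate) {
    assert(ul_NumDataPoints >= 0);
    (void)ChirpRateInd;  // assumption: only used for caching, unused here
    return float_ChirpData(&cx_DataArray[0][0], &cx_ChirpDataArray[0][0],
                           static_cast<float>(ChirpRate),
                           ul_NumDataPoints, sample_rate);
}
```

Note that narrowing ChirpRate to float loses precision, so results would still need to be validated against the reference build.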

BenHer

  • Guest
To incorporate the cache features into my code would take a little work...will check it out.

To verify it I would, of course, have to do all those things I mentioned in earlier post ;)

Offline korpela

  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 53
Hi Ben,

Sorry to be replying to an old thread. Just getting around to looking at this stuff.  Somehow I missed your checkin of the vectorization stuff at sourceforge.  I thought I was on the mailing list for checkins.  Apparently not....

Looks like you and Alex have been busy.   I don't know if you've seen the more recent versions of the source that check the speeds of at least some functions and use the fastest (in the client/vector directory).  Right now it just tests GetPowerSpectrum, ChirpData, Transpose, and BaselineSmooth.  (BaselineSmooth should be removed, since it really only gets called once.)
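
The "time each version and keep the fastest" idea can be sketched roughly as follows. This is not the code in client/vector; it is a minimal illustration that assumes all candidates share one signature and produce the same result. The names sum_func, sum_plain, sum_unroll4 and choose_fastest are made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

typedef float (*sum_func)(const float*, std::size_t);

// Straightforward reference version.
float sum_plain(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Same sum with four independent accumulators.
float sum_unroll4(const float* p, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; ++i) s0 += p[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

// Runs every candidate 'reps' times on the same data and returns the one
// with the smallest elapsed time.
sum_func choose_fastest(const std::vector<sum_func>& candidates,
                        const float* data, std::size_t n, int reps) {
    assert(!candidates.empty());
    sum_func best = candidates[0];
    auto best_time = std::chrono::steady_clock::duration::max();
    volatile float sink = 0.0f;  // keeps the calls from being optimized away
    for (sum_func f : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) sink = f(data, n);
        auto elapsed = std::chrono::steady_clock::now() - t0;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = f;
        }
    }
    (void)sink;
    return best;
}
```

The returned pointer would then be stored once at startup and used for all later calls.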

I'd like to extend this to more functions (gaussfit, pulse_find), but the problem is that those functions might generate output while being tested.  We'd need to modify them to either suppress the output or compartmentalize them so the tested routines don't include the output.  At any rate, if you can convert any of your routines you want added into the new format, please do so (you can use analyzeFuncs_sse.cpp and analyzeFuncs_altivec.cpp as guides).

I'm also adding functions hostinfo_have_altivec(), hostinfo_have_sse(), etc to the boinc api.  Unfortunately, as always I'm swamped with other work.  If there are other threads that I should be looking at, let me know.
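
For illustration only, a runtime feature check of this kind could look like the sketch below. It is not the BOINC implementation: the sketch_ names are hypothetical, and it assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin is available. On any other compiler or architecture it simply reports no SSE.

```cpp
#include <cassert>

// Hypothetical helper, not the BOINC API. Returns true if the CPU reports SSE.
bool sketch_have_sse() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse") != 0;
#else
    return false;  // assumption: unknown platform, report no SSE
#endif
}

// Hypothetical helper. Returns true if the CPU reports SSE2.
bool sketch_have_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Maps the checks to a simple level: 0 = plain FPU, 1 = SSE, 2 = SSE2.
int best_simd_level() {
    // Sanity check: a CPU with SSE2 also has SSE.
    assert(!sketch_have_sse2() || sketch_have_sse());
    if (sketch_have_sse2()) return 2;
    if (sketch_have_sse()) return 1;
    return 0;
}
```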

Eric

Offline korpela

  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 53
I also forgot to mention that you should feel free to create analyzeFuncs_mmx.cpp, analyzeFuncs_3dnow.cpp, analyzeFuncs_sse2.cpp, analyzeFuncs_sse3.cpp, and whatever else you feel like adding.

Eric

Offline Byron Leigh Hatch @ team Carl Sagan

  • Knave
  • Posts: 8
  • Tolerance , Peace and Best Wishes to All
    • My Computers


sorry to be off Topic

Hi Eric

Just wanted to say Hello and thank you and your colleagues for SETI@home

Best Wishes
Byron
Carl Sagan wrote:
"When Johannes Kepler found his long-cherished belief did not agree with the most precise observation, he accepted the uncomfortable fact. He preferred the hard truth to his dearest illusions; that is the heart of science."

Offline Simon

  • Ni!
  • Knight who says 'Ni!'
  • *****
  • Posts: 1045
    • Is it a bird? Is it a plane? No...its-the.net!
Hi Eric,

your access level is bumped. You should now see quite a bit more material to peruse, especially the pre-release boards.
Thanks for joining us here!

Regards,
Simon.

BenHer

  • Guest
Hi Eric,

Yea I had a look at your 5.17 source for the compare routine.

My source has been fully posted here on this board at this thread.

I have my own function speed testing routine, but it is generic and requires very little extra code for each new function to be tested.  It's inside the Optimizer/benchmark.cpp source.
Code: [Select]
struct bench_lst
    {
    f_token     token;
    _simd_type  simd_used;
    bool        tested;
    char        *name;
    void        *theFunc;
    }   bench_list[] =
    {
    PWRSPEC,    _fpu,   true,   "GetPowerSpectrum--",   &std_v_GetPowerSpectrum,
    F_SUM,      _fpu,   false,  "unroll4",      &opt_f_sum,
    CHI_SQ,     _fpu,   false,  "hoisted+abs(", &opt_f_GetChiSq,

#if defined( __SSE__ )
    CHIRP,      _sse,   true,   "sse_chirp",    &sse_ChirpData,
    SUM2_TBL,   _sse,   false,  "hand_sse",     &sse_tableSum2,
    F_SUM,      _sse,   false,  "hand_sse",     &sse_f_sum,

    SUM2_TBL,   _3DNow, true,   "hand_3Dnow",   &amd_tableSum2,
    F_SUM,      _3DNow, true,   "hand_3Dno",    &amd_f_sum,

    CHIRP,      _sse2,  false,  "sse2_chirp",   &sse2_ChirpData,
#elif defined( __ALTIVEC__ )
    SUM2_TBL,   _Altivec, true,   "hand_altv",   &altv_tableSum2,
    F_SUM,      _Altivec, true,   "hand_altv",    &altv_f_sum,

    CHIRP,      _Altivec,  false,  "altv_chirp",   &altv_ChirpData,
#endif
    // (any further entries were not included in this post)
    };


The advantage of this combined table format is that all functions for a given SIMD type, on say PowerPC vs. Intel, can be grouped together and conditionally compiled in one batch.
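
A cut-down, self-contained version of such a table might look like this. It is a sketch of the idea, not the contents of Optimizer/benchmark.cpp: the enum values, the typed function pointer (instead of void *), and the helpers count_entries and count_matching are all assumptions made so that the example compiles and runs on its own.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

enum f_token { F_SUM, CHI_SQ };
enum simd_type { SIMD_FPU, SIMD_SSE };
typedef float (*sum_fn)(const float*, std::size_t);

float std_f_sum(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

#if defined(__SSE__)
// Sums four lanes at a time, then adds the lanes and any leftover elements.
float sse_f_sum(const float* p, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) s += p[i];
    return s;
}
#endif

struct bench_entry {
    f_token     token;      // which logical function this entry implements
    simd_type   simd_used;  // instruction set it needs
    const char* name;
    sum_fn      func;
};

// One table; platform-specific rows are compiled in as a group.
static const bench_entry bench_table[] = {
    { F_SUM, SIMD_FPU, "plain",    &std_f_sum },
#if defined(__SSE__)
    { F_SUM, SIMD_SSE, "hand_sse", &sse_f_sum },
#endif
};
static const std::size_t bench_count =
    sizeof(bench_table) / sizeof(bench_table[0]);

// Number of table rows registered for a token.
int count_entries(f_token token) {
    int n = 0;
    for (std::size_t i = 0; i < bench_count; ++i)
        if (bench_table[i].token == token) ++n;
    return n;
}

// Number of rows for a token whose result on 'data' equals 'expected'.
int count_matching(f_token token, const float* data, std::size_t n,
                   float expected) {
    assert(data != 0);
    int ok = 0;
    for (std::size_t i = 0; i < bench_count; ++i)
        if (bench_table[i].token == token &&
            bench_table[i].func(data, n) == expected) ++ok;
    return ok;
}
```

A benchmark loop can then walk bench_table, skip rows whose simd_used is not supported by the host CPU, and check each remaining row against the reference result before timing it.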

I have written a full CPUID class which has been tested with virtually all CPUs out there...99% correct.  I made some Linux code for it, and Hans Dorn has made all the necessary corrections, then compiled and run it on Linux with ICC and GCC.  You may recall I wrote one a while back, also on SourceForge.

This can easily be incorporated into BOINC and get rid of all those O/S-named CPUs, which are quite variable and annoying.  I will be modifying it to use an external text file for its CPU definitions.  This way, if new CPUs are released, it is easy to just update the text file and have BOINC download it.  Add an MD5 sum or some such to reduce tampering.
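
To make the external-definitions idea concrete, here is a minimal sketch of reading such a file. The file format is invented for the example (one "vendor family model name" entry per line, with # comment lines); the real CPUID class and its format are not shown in this thread. Checksum verification (the MD5 step) is left out, since it needs a hashing library.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one line of the definitions file.
struct cpu_def {
    std::string vendor;
    int         family;
    int         model;
    std::string name;
};

// Parses "vendor family model name..." lines; skips blanks, '#' comments,
// and lines whose first three fields cannot be read.
std::vector<cpu_def> parse_cpu_defs(std::istream& in) {
    std::vector<cpu_def> defs;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        cpu_def d;
        if (!(fields >> d.vendor >> d.family >> d.model)) continue;
        std::getline(fields >> std::ws, d.name);  // rest of line is the name
        defs.push_back(d);
    }
    return defs;
}

// Returns the friendly name for a vendor/family/model triple, or "unknown".
std::string lookup_cpu_name(const std::vector<cpu_def>& defs,
                            const std::string& vendor, int family, int model) {
    assert(family >= 0 && model >= 0);
    for (std::size_t i = 0; i < defs.size(); ++i) {
        if (defs[i].vendor == vendor && defs[i].family == family &&
            defs[i].model == model) {
            return defs[i].name;
        }
    }
    return "unknown";
}
```

Because unknown CPUs fall back to "unknown" instead of failing, an out-of-date file degrades gracefully until a new one is downloaded.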

We have working function pointer replacements, in optimized FPU and SSE versions, for f_sum (summing of a table is used in many places), the loops inside find_pulse, v_chirpdata, getpowerspectrum, f_getChiSq, and f_GetPeak.  We have some additional functions for SSE2 and SSE3 where appropriate (the SSE3 code is borrowed from Alex).

As I said over on 'beta' boards, I've figured a way to do the 'transpose' function without using a separate table or even a separate function (I do it inside of getpowerspectrum).  And I've figured how to reduce the impact of all those nasty cache misses...non-temporal store instructions.

Non-temporals can actually be used to speed up several functions but I haven't gotten around to it.  Some functions make use of the cached data, but many don't, and for these non-temp is the way to go.
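
As a rough illustration of a non-temporal store, the sketch below copies a float array with the SSE _mm_stream_ps intrinsic, which writes to memory while hinting that the data should bypass the cache. It is not taken from the optimized source; stream_copy is a made-up name, the destination is assumed to be 16-byte aligned, and on builds without SSE the function falls back to a plain copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Copies n floats from src to dst. dst must be 16-byte aligned; src may be
// unaligned. Intended for output that will not be read again soon.
void stream_copy(float* dst, const float* src, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        // Non-temporal store of four floats; dst + i stays 16-byte aligned.
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));
    }
    _mm_sfence();  // make the streamed stores visible before returning
#endif
    for (; i < n; ++i) dst[i] = src[i];  // tail, or whole copy without SSE
}
```

Whether this helps depends on the access pattern: if the destination is read again shortly afterwards, ordinary stores that keep the data in cache are usually the better choice.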





Offline korpela

  • Volunteer Developer
  • Knight o' The Realm
  • *****
  • Posts: 53
Hi Ben,

By "source has been fully posted here on this board at this thread" do you mean this thread, or is a link missing?  If "this thread" means this thread, do you mean the stuff at the SourceForge site?

Eric 

BenHer

  • Guest
Sorry,

The developers on this board discuss pre-release window info on this forum and Unix Info on this forum.

The source is posted there.  Meant to attach a link to that previous post...the link on this line goes to the latest source.
« Last Edit: 12 Oct 2006, 03:52:19 pm by BenHer »

 
