
ASM of compiled source of certain functions - by the Intel Compiler


korpela:
I also forgot to mention that you should feel free to create analyzeFuncs_mmx.cpp, analyzeFuncs_3dnow.cpp, analyzeFuncs_sse2.cpp, analyzeFuncs_sse3.cpp, and whatever else you feel like adding.

Eric

Byron Leigh Hatch @ team Carl Sagan:


Sorry to be off topic.

Hi Eric,

Just wanted to say hello and thank you and your colleagues for SETI@home.

Best Wishes
Byron

Simon:
Hi Eric,

Your access level has been bumped. You should now see quite a bit more material to peruse, especially the pre-release boards.
Thanks for joining us here!

Regards,
Simon.

BenHer:
Hi Eric,

Yeah, I had a look at your 5.17 source for the compare routine.

My source has been fully posted here on this board at this thread.

I have my own function speed-testing routine. It is generic and requires very little extra code for each new function to be tested. It's inside the Optimizer/benchmark.cpp source.

--- Code: ---
struct bench_lst
    {
    f_token     token;
    _simd_type  simd_used;
    bool        tested;
    char        *name;
    void        *theFunc;
    }   bench_list[] =
    {
    PWRSPEC,    _fpu,   true,   "GetPowerSpectrum--",   &std_v_GetPowerSpectrum,
    F_SUM,      _fpu,   false,  "unroll4",      &opt_f_sum,
    CHI_SQ,     _fpu,   false,  "hoisted+abs(", &opt_f_GetChiSq,

#if defined( __SSE__ )
    CHIRP,      _sse,   true,   "sse_chirp",    &sse_ChirpData,
    SUM2_TBL,   _sse,   false,  "hand_sse",     &sse_tableSum2,
    F_SUM,      _sse,   false,  "hand_sse",     &sse_f_sum,

    SUM2_TBL,   _3DNow, true,   "hand_3Dnow",   &amd_tableSum2,
    F_SUM,      _3DNow, true,   "hand_3Dno",    &amd_f_sum,

    CHIRP,      _sse2,  false,  "sse2_chirp",   &sse2_ChirpData,
#elif defined( __ALTIVEC__ )
    SUM2_TBL,   _Altivec, true,   "hand_altv",   &altv_tableSum2,
    F_SUM,      _Altivec, true,   "hand_altv",    &altv_f_sum,

    CHIRP,      _Altivec,  false,  "altv_chirp",   &altv_ChirpData,
#endif
    };
--- End code ---

The advantage of this combined table format is that all functions for a given SIMD type (on, say, PowerPC vs. Intel) can be grouped together and conditionally compiled in one batch.
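
To illustrate the table-driven idea, here is a minimal sketch of a benchmark loop that walks such a table and times each entry. It is not the code from Optimizer/benchmark.cpp: the names `BenchEntry`, `bench_table`, `run_benchmarks`, `plain_f_sum` and `unroll4_f_sum` are hypothetical, the enums are cut-down stand-ins for `f_token` and `_simd_type`, and a typed function pointer is assumed in place of the `void *` in the table above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the thread's f_token and _simd_type enums.
enum class Token { F_SUM };
enum class Simd { Fpu, Sse };

using SumFn = float (*)(const float*, std::size_t);

// Plain reference loop.
float plain_f_sum(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Four-way unrolled variant, in the spirit of the "unroll4" table entry.
float unroll4_f_sum(const float* v, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

struct BenchEntry {
    Token token;
    Simd simd;
    const char* name;
    SumFn fn;
    double seconds;  // filled in by run_benchmarks
};

// Adding a new candidate only needs one more row here.
BenchEntry bench_table[] = {
    {Token::F_SUM, Simd::Fpu, "plain",   &plain_f_sum,   0.0},
    {Token::F_SUM, Simd::Fpu, "unroll4", &unroll4_f_sum, 0.0},
};
constexpr std::size_t bench_count = sizeof(bench_table) / sizeof(bench_table[0]);

// Times every table entry over `reps` calls and returns the index of the fastest.
std::size_t run_benchmarks(const std::vector<float>& data, int reps) {
    assert(reps > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t best = 0;
    for (std::size_t i = 0; i < bench_count; ++i) {
        volatile float sink = 0.0f;  // keeps the calls from being optimized away
        const auto t0 = Clock::now();
        for (int r = 0; r < reps; ++r) {
            sink = bench_table[i].fn(data.data(), data.size());
        }
        const auto t1 = Clock::now();
        bench_table[i].seconds = std::chrono::duration<double>(t1 - t0).count();
        if (bench_table[i].seconds < bench_table[best].seconds) best = i;
    }
    return best;
}
```

A caller would fill a `std::vector<float>` with test data, call `run_benchmarks(data, reps)`, and then read `bench_table[best].name` to see which variant won on that machine.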

I have written a full CPUID class which has been tested with virtually all CPUs out there and is about 99% correct. I wrote some Linux code for it, and Hans Dorn made all the necessary corrections, then compiled and ran it on Linux with ICC and GCC. You may recall I also wrote one a while back on SourceForge.

This can easily be incorporated into BOINC to get rid of all those OS-named CPUs, which are quite variable and annoying. I will be modifying it to use an external text file for its CPU definitions. This way, if new CPUs are released, it is easy to just update the text file and have BOINC download it. Adding an MD5 sum or some such would guard against tampering.
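
As a rough sketch of what an external definitions file could look like on the reading side: the line format below (`vendor family model name`, with `#` comments) is purely an assumption for illustration, as are the names `parse_cpu_definitions` and `lookup_cpu_name`. The checksum step is left out because the C++ standard library has no MD5 routine.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Key: (vendor string, family, model). All three would come from CPUID.
using CpuKey = std::tuple<std::string, int, int>;
using CpuTable = std::map<CpuKey, std::string>;

// Reads lines of the assumed form "vendor family model display name".
// Blank lines, '#' comments, and malformed lines are skipped.
CpuTable parse_cpu_definitions(std::istream& in) {
    CpuTable table;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string vendor, name;
        int family = 0, model = 0;
        if (!(fields >> vendor >> family >> model)) continue;
        std::getline(fields >> std::ws, name);  // rest of the line is the name
        if (!name.empty()) table[CpuKey{vendor, family, model}] = name;
    }
    return table;
}

// Returns the display name, or a fixed fallback for CPUs not in the file.
std::string lookup_cpu_name(const CpuTable& table, const std::string& vendor,
                            int family, int model) {
    assert(family >= 0 && model >= 0);  // CPUID fields are unsigned
    const auto it = table.find(CpuKey{vendor, family, model});
    return it == table.end() ? std::string("Unknown CPU") : it->second;
}
```

With a table like this, supporting a newly released CPU means adding one line to the text file rather than rebuilding the client.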

We have working function pointer replacements for optimized FPU and SSE versions of f_sum (summing a table is used in many places), the loops inside find_pulse, v_chirpdata, getpowerspectrum, f_getChiSq, and f_GetPeak. We have some additional functions for SSE2 and SSE3 where appropriate (SSE3 is borrowed from Alex).
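
For readers unfamiliar with the function pointer replacement technique, here is a minimal sketch of how it can work for a table sum. It is not the project's code: `generic_f_sum`, `sse_f_sum_sketch`, `choose_f_sum` and the `CpuFeature` flags are hypothetical names, and the feature mask is assumed to come from something like the CPUID class mentioned above. The SSE path is only compiled when the compiler defines `__SSE__`.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical feature flags; a real client would fill these from CPUID.
enum CpuFeature : unsigned { kFpuOnly = 0u, kHasSse = 1u << 0 };

// Portable fallback that works everywhere.
float generic_f_sum(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

#if defined(__SSE__)
// Sums four floats per iteration using unaligned SSE loads.
float sse_f_sum_sketch(const float* v, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) acc = _mm_add_ps(acc, _mm_loadu_ps(v + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) s += v[i];  // leftover elements
    return s;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Callers always go through this pointer; it starts at the safe default.
SumFn f_sum = &generic_f_sum;

// Swaps in the best implementation for the given features and
// returns a short label naming the choice.
const char* choose_f_sum(unsigned features) {
    f_sum = &generic_f_sum;
    const char* chosen = "generic";
#if defined(__SSE__)
    if (features & kHasSse) {
        f_sum = &sse_f_sum_sketch;
        chosen = "sse";
    }
#else
    (void)features;  // no SSE path compiled in
#endif
    return chosen;
}
```

The selection runs once at startup; after that, every call site pays only for an indirect call, and the hot loops never need to test CPU features themselves.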

As I said over on the 'beta' boards, I've figured out a way to do the 'transpose' function without using a separate table or even a separate function (I do it inside of getpowerspectrum). And I've figured out how to reduce the impact of all those nasty cache misses: non-temporal store instructions.

Non-temporals can actually be used to speed up several functions, but I haven't gotten around to it. Some functions make use of the cached data, but many don't, and for these non-temporal stores are the way to go.
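
For anyone who has not used non-temporal stores, here is a minimal sketch using the SSE intrinsic `_mm_stream_ps`. It is a generic scale-and-write loop, not the getpowerspectrum or transpose code discussed above; the function name `scale_streaming` is hypothetical, and the destination is assumed to be 16-byte aligned, which `_mm_stream_ps` requires. Without `__SSE__` it falls back to an ordinary loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Writes dst[i] = src[i] * k. The output is written once and not read back
// soon, so the SSE path uses stores that bypass the cache.
void scale_streaming(float* dst, const float* src, std::size_t n, float k) {
    // _mm_stream_ps needs a 16-byte aligned destination address.
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    std::size_t i = 0;
#if defined(__SSE__)
    const __m128 kv = _mm_set1_ps(k);
    for (; i + 4 <= n; i += 4) {
        const __m128 x = _mm_mul_ps(_mm_loadu_ps(src + i), kv);
        _mm_stream_ps(dst + i, x);  // non-temporal store (MOVNTPS)
    }
    _mm_sfence();  // order the streamed stores before any later stores
#endif
    for (; i < n; ++i) dst[i] = src[i] * k;  // tail, or whole array without SSE
}
```

Whether this helps depends on the access pattern: if the same output is read again shortly afterwards, ordinary cached stores are usually the better choice, which matches the distinction drawn above.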




korpela:
Hi Ben,

By "source has been fully posted here on this board at this thread" do you mean this thread, or is a link missing? If "this thread" means this thread, do you mean the stuff at the SourceForge site?

Eric 
