ASM of compiled source of certain functions - by the Intel Compiler
korpela:
I also forgot to mention that you should feel free to create analyzeFuncs_mmx.cpp, analyzeFunct_3dnow.cpp, analyzeFuncs_sse2.cpp, analyzeFuncs_sse3.cpp, and whatever else you feel like adding.
Eric
Byron Leigh Hatch @ team Carl Sagan:
Sorry to be off topic.
Hi Eric
Just wanted to say Hello and thank you and your colleagues for SETI@home
Best Wishes
Byron
Simon:
Hi Eric,
your access level is bumped. You should now see quite a bit more material to peruse, especially the pre-release boards.
Thanks for joining us here!
Regards,
Simon.
BenHer:
Hi Eric,
Yes, I had a look at your 5.17 source for the compare routine.
My source has been fully posted here on this board at this thread.
I have my own function speed-testing routine. It is generic and requires very little extra code for each new function to be tested. It's inside the Optimizer/benchmark.cpp source.
--- Code: ---
struct bench_lst
{
    f_token token;          // which routine this entry implements
    _simd_type simd_used;   // instruction set the entry needs
    bool tested;
    const char *name;       // const: every entry is a string literal
    void *theFunc;          // untyped pointer to the implementation
} bench_list[] =
{
    PWRSPEC, _fpu, true, "GetPowerSpectrum--", &std_v_GetPowerSpectrum,
    F_SUM, _fpu, false, "unroll4", &opt_f_sum,
    CHI_SQ, _fpu, false, "hoisted+abs(", &opt_f_GetChiSq,
#if defined( __SSE__ )
    CHIRP, _sse, true, "sse_chirp", &sse_ChirpData,
    SUM2_TBL, _sse, false, "hand_sse", &sse_tableSum2,
    F_SUM, _sse, false, "hand_sse", &sse_f_sum,
    SUM2_TBL, _3DNow, true, "hand_3Dnow", &amd_tableSum2,
    F_SUM, _3DNow, true, "hand_3Dno", &amd_f_sum,
    CHIRP, _sse2, false, "sse2_chirp", &sse2_ChirpData,
#elif defined( __ALTIVEC__ )
    SUM2_TBL, _Altivec, true, "hand_altv", &altv_tableSum2,
    F_SUM, _Altivec, true, "hand_altv", &altv_f_sum,
    CHIRP, _Altivec, false, "altv_chirp", &altv_ChirpData,
#endif
};
--- End code ---
The advantage of this combined table format is that all functions for a given SIMD instruction set, on PowerPC versus Intel for example, can be grouped together and conditionally compiled in one batch.
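To illustrate why a table like this needs so little code per function, here is a minimal sketch of a loop that times every entry. It is not the code in Optimizer/benchmark.cpp. The names `bench_entry`, `sum_fn`, `sum_bench`, `run_bench`, `plain_f_sum` and `unroll4_f_sum` are hypothetical, and the sketch assumes every entry shares one signature instead of using the untyped `void *` of the table above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical common signature, assumed for this sketch only.
using sum_fn = float (*)(const float *data, std::size_t n);

// Straightforward reference implementation.
static float plain_f_sum(const float *d, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += d[i];
    return s;
}

// Four independent accumulators, in the spirit of an "unroll4" entry.
static float unroll4_f_sum(const float *d, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += d[i];
        s1 += d[i + 1];
        s2 += d[i + 2];
        s3 += d[i + 3];
    }
    for (; i < n; ++i) s0 += d[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

struct bench_entry {
    const char *name;  // label for reports
    sum_fn func;       // implementation under test
    double seconds;    // fastest observed run, filled in by run_bench
    float result;      // last return value, kept for a sanity check
};

// Adding a new candidate is one more line here.
static bench_entry sum_bench[] = {
    {"plain",   &plain_f_sum,   0.0, 0.0f},
    {"unroll4", &unroll4_f_sum, 0.0, 0.0f},
};

// Times every entry `repeats` times and returns the index of the fastest.
static std::size_t run_bench(bench_entry *list, std::size_t count,
                             const std::vector<float> &data, int repeats) {
    assert(count > 0 && repeats > 0);
    std::size_t best = 0;
    for (std::size_t i = 0; i < count; ++i) {
        double fastest = -1.0;
        for (int r = 0; r < repeats; ++r) {
            auto t0 = std::chrono::steady_clock::now();
            list[i].result = list[i].func(data.data(), data.size());
            auto t1 = std::chrono::steady_clock::now();
            double s = std::chrono::duration<double>(t1 - t0).count();
            if (fastest < 0.0 || s < fastest) fastest = s;
        }
        list[i].seconds = fastest;
        if (list[i].seconds < list[best].seconds) best = i;
    }
    return best;
}
```

A caller would fill a buffer with representative data, call `run_bench(sum_bench, 2, data, repeats)`, and compare the `result` fields to confirm the candidates agree before trusting the timings.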
I have written a full CPUID class, which has been tested against virtually all CPUs out there and is about 99% correct. I wrote some Linux code for it, and Hans Dorn made all the necessary corrections, then compiled and ran it on Linux with ICC and GCC. You may recall I wrote one a while back as well, on SourceForge.
This can easily be incorporated into BOINC to get rid of all those OS-named CPUs, which are quite variable and annoying. I will be modifying it to use an external text file for its CPU definitions. That way, when new CPUs are released, it is easy to just update the text file and have BOINC download it. Adding an MD5 sum or some such would reduce tampering.
We have working function-pointer replacements, in optimized FPU and SSE versions, for f_sum (summing a table is used in many places), the loops inside find_pulse, v_chirpdata, getpowerspectrum, f_getChiSq and f_GetPeak. We have some additional functions for SSE2 and SSE3 where appropriate (the SSE3 code is borrowed from Alex).
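For readers unfamiliar with the function-pointer replacement idea, a minimal sketch follows. It is not the actual optimized source. `cpu_caps`, `select_f_sum`, `sum_table`, `fpu_f_sum` and `sse_f_sum_sketch` are hypothetical names, and the capability flag is assumed to come from some CPUID-style detection done elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical capability record; a real one would be filled from CPUID.
struct cpu_caps { bool sse; };

using f_sum_fn = float (*)(const float *, std::size_t);

// Portable fallback.
static float fpu_f_sum(const float *d, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += d[i];
    return s;
}

#if defined(__SSE__)
// Sums four floats per iteration; unaligned loads keep the sketch simple.
static float sse_f_sum_sketch(const float *d, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(d + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) s += d[i];  // leftover elements
    return s;
}
#endif

// Callers go through this pointer; it starts at the safe default.
static f_sum_fn f_sum = &fpu_f_sum;

// Picks the best implementation that was compiled in AND is supported.
// Returns a label naming the choice.
static const char *select_f_sum(const cpu_caps &caps) {
#if defined(__SSE__)
    if (caps.sse) {
        f_sum = &sse_f_sum_sketch;
        return "sse";
    }
#endif
    (void)caps;
    f_sum = &fpu_f_sum;
    return "fpu";
}

// Thin wrapper so call sites never touch the pointer directly.
static float sum_table(const float *d, std::size_t n) {
    assert(f_sum != nullptr);
    return f_sum(d, n);
}
```

The selection runs once at startup, so the per-call cost is a single indirect call. Note that the SSE version adds in a different order from the scalar loop, so results can differ in the last bits for general data.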
As I said over on the 'beta' boards, I've figured out a way to do the 'transpose' function without using a separate table or even a separate function (I do it inside getpowerspectrum). I've also figured out how to reduce the impact of all those nasty cache misses: non-temporal store instructions.
Non-temporal stores could actually speed up several functions, but I haven't gotten around to it. Some functions make use of the cached data, but many don't, and for those non-temporal stores are the way to go.
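As an illustration of what a non-temporal store looks like in practice, here is a minimal sketch of a power-spectrum loop that writes its output with `_mm_stream_ps`. It is not the routine described above: the transpose step is left out, and the name `power_spectrum_stream` and the interleaved (re, im) input layout are assumptions made for the example. Whether streaming actually helps depends on the buffer size and on whether the output is read again soon.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// in:  n complex samples stored as re0, im0, re1, im1, ...
// out: n floats, out[i] = re_i^2 + im_i^2; must be 16-byte aligned
//      when the SSE path is compiled in.
static void power_spectrum_stream(const float *in, float *out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // _mm_stream_ps requires a 16-byte aligned destination.
    assert((reinterpret_cast<std::uintptr_t>(out) & 15u) == 0);
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(in + 2 * i);      // re0 im0 re1 im1
        __m128 b = _mm_loadu_ps(in + 2 * i + 4);  // re2 im2 re3 im3
        __m128 re = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0));
        __m128 im = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1));
        __m128 pw = _mm_add_ps(_mm_mul_ps(re, re), _mm_mul_ps(im, im));
        _mm_stream_ps(out + i, pw);  // store that bypasses the cache
    }
    _mm_sfence();  // make the streamed stores visible before continuing
#endif
    // Scalar tail, and the whole loop when SSE is not available.
    for (; i < n; ++i) {
        float re = in[2 * i];
        float im = in[2 * i + 1];
        out[i] = re * re + im * im;
    }
}
```

The point of the streaming store is that the output does not evict data the next stage still needs from the cache, which only pays off when the output is not consumed immediately.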
korpela:
Hi Ben,
By "source has been fully posted here on this board at this thread", do you mean this thread, or is a link missing? If you do mean this thread, do you mean the stuff at the SourceForge site?
Eric