First Time build try, 2.2B [&2.3S9] science App
Jason G:
Thanks. I haven't really started yet, though; I'm just exploring the sources. I'm using Intel [ICC & IPP] on VS 2005 Pro, which seems to require a tweak here and there to compile. The later one seems easier to build, as the /Qx & SSEn configuration seems to be set up to apply itself to each project... much better.
Jason G:
Okay, some crude attempts at profiling [2.3S9] with the dummy workunits seem to be leading me straight to the already heavily vectorised SSE code (several sum and transpose functions). They look damn good at asm level.
I am a bit surprised that a lot of time (about 11% on my system) seems to be spent in Intel's implementation of memcpy. I haven't worked out why yet. I'm pretty sure I've seen better vectorised versions of that, but can't be sure... there seems to be a little something extra in that function. On a hunch, I think MSVC's version might be faster [due to that something extra].
The pulse finding and chirping functions themselves showed much lower down on the list as far as percent of total execution is concerned.
I guess this might be because I'm using dummy test WUs. At some stage I think I'll have to test with a few copies of real ones out of my BOINC cache, as I may be being led up the garden path :D.
Jason
Josef W. Segur:
--- Quote from: j_groothu on 03 Oct 2007, 04:30:07 pm ---Okay, some crude attempts at profiling [2.3S9] with the dummy workunits seem to be leading me straight to the already heavily vectorised SSE code (several sum and transpose functions). They look damn good at asm level.
I am a bit surprised that a lot of time (about 11% on my system) seems to be spent in Intel's implementation of memcpy. I haven't worked out why yet. I'm pretty sure I've seen better vectorised versions of that, but can't be sure... there seems to be a little something extra in that function. On a hunch, I think MSVC's version might be faster [due to that something extra].
--- End quote ---
Ben Herndon started (but didn't complete) an effort to add memcpy routines to the set of tested functions. It might be a useful addition.
--- Quote ---The pulse finding and chirping functions themselves showed much lower down on the list as far as percent of total execution is concerned.
I guess this might be because I'm using dummy test WUs. At some stage I think I'll have to test with a few copies of real ones out of my BOINC cache, as I may be being led up the garden path :D.
Jason
--- End quote ---
The shortened test WUs do emphasize what's done during startup and give more weight to the zero chirp testing. I agree that profiling to determine the hot spots would be more accurate with full WUs, but getting a spread of angle ranges is important too.
Joe
Jason G:
Hmm, thanks again Joe. After I collect some more data (to see if memcpy is really as hot as it looks... well, at least on my old P4 beast) I'll see if I still have some of my old memcpy versions on backups, and try to figure out some comparisons. I vaguely remember there was an MMX version that worked out faster than either the regular or SSE/SSE2 versions for data blocks from 1 MB to 200 MB. If I can find it, I'll try to figure out whether it could be applicable.
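A comparison like the one described above could be sketched roughly as follows. This is a minimal, hypothetical C harness, not code from the science app: `copy_fn`, `byte_loop_copy`, `copy_is_correct` and `time_copy` are names made up for illustration, and `byte_loop_copy` is only a placeholder for whatever MMX or SSE routine would be compared against the library `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by every copy routine under test (same shape as memcpy). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for a hand-tuned routine: a plain byte loop. */
void *byte_loop_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;
    for (i = 0; i < n; ++i)
        d[i] = s[i];
    return dst;
}

/* Returns 1 if fn copies n bytes faithfully, 0 on mismatch or alloc failure. */
int copy_is_correct(copy_fn fn, size_t n)
{
    unsigned char *src, *dst;
    size_t i;
    int ok = 0;

    assert(n > 0);
    src = malloc(n);
    dst = malloc(n);
    if (src && dst) {
        for (i = 0; i < n; ++i)
            src[i] = (unsigned char)(i * 31u + 7u);  /* non-trivial pattern */
        memset(dst, 0, n);
        fn(dst, src, n);
        ok = (memcmp(dst, src, n) == 0);
    }
    free(src);
    free(dst);
    return ok;
}

/* Average CPU seconds per copy of n bytes over reps runs; -1.0 on failure. */
double time_copy(copy_fn fn, size_t n, int reps)
{
    unsigned char *src, *dst;
    volatile unsigned char sink;
    clock_t t0, t1;
    int r;

    assert(n > 0 && reps > 0);
    src = malloc(n);
    dst = malloc(n);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);
    memset(dst, 0, n);            /* touch both buffers before timing */

    t0 = clock();
    for (r = 0; r < reps; ++r)
        fn(dst, src, n);
    t1 = clock();

    sink = dst[n - 1];            /* read the result so the copy is observed */
    (void)sink;
    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

Calling `time_copy(memcpy, size, reps)` and `time_copy(byte_loop_copy, size, reps)` over a range of sizes (say 1 MB up to 200 MB) would give a first-order comparison. `clock()` is coarse, so small blocks need many repetitions, and an aggressive optimiser may still simplify the loop, so results from a sketch like this should be treated as indicative only.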
_heinz:
Hi Jason,
If you find a very fast memcpy for MMX, that would be great. I have a diskless (2 GB CompactFlash) dual 200MMX crunching as a test machine.
Here you can see it.
Regards, heinz