Forum > Windows

sources with Orcas

<< < (8/10) > >>

Josef W. Segur:
It hasn't been mentioned lately that in 2004 Eric Korpela set up a setiboinc sourceforge project to encourage submission of optimized code. Ben Herndon participated in that, and his code in ../setiboinc/client/opt is where I got the original form of the GetPSO functions. I just made it into four separate versions to compare the prefetching. On some systems and some builds there have been clear differences particularly on my Pentium-M system and on Core 2 systems. On P4 or PD systems, OTOH, the differences have been small.

Francois Piednoel's PowerSpectrum code from the "Who? optimizations Part 2" NC thread may be even better. I hope he'll release source soon but that part would be fairly easy to fit into our codebase anyhow.
                                                     Joe

Jason G:
Thanks Joe,
    Though I did run maybe a couple of workunits when sah classic first started (I remember the news release on tv here in Oz... Incidently, could never rember the email I used then....), I just came back around last November....

 so really '2004' is indeed before my time... Thanks for the link, more reading is always handy.

Just opened the Pulse Timing project for the first time, Looks like I'll be occupied for the weekend trying to get a working build,  Looks Extremely handy for testing some of the things been worked on behind the scenes. 

Jason

_heinz:
Looking now to vectorize the most used functions of S@H.
Here I show you a modified FillTrigArray as it performs as part of chirpfft.cpp
Have fun   ;D
Heinz
---------------------------------------------------------------------------
PUBLIC   ?FillTrigArray@@YAXH@Z            ; FillTrigArray
; Function compile flags: /Ogtpy
;   COMDAT ?FillTrigArray@@YAXH@Z
_TEXT   SEGMENT
_k$ = 8                     ; size = 4
?FillTrigArray@@YAXH@Z PROC            ; FillTrigArray, COMDAT

; 731  :          CurrentTrig[k].Sin = ((CurrentTrig[k].Sin * TrigStep[k].Cos) + (CurrentTrig[k].Cos * TrigStep[k].Sin));

   mov   edx, DWORD PTR ?TrigStep@@3PAUSinCosArray@@A ; TrigStep
   mov   ecx, DWORD PTR ?CurrentTrig@@3PAUSinCosArray@@A ; CurrentTrig
   mov   eax, DWORD PTR _k$[esp-4]
   shl   eax, 4
   movsd   xmm1, QWORD PTR [eax+edx+8]
   mulsd   xmm1, QWORD PTR [eax+ecx]
   movsd   xmm0, QWORD PTR [eax+ecx+8]
   mulsd   xmm0, QWORD PTR [eax+edx]
   addsd   xmm0, xmm1
   movsd   QWORD PTR [eax+ecx], xmm0

; 732  :             CurrentTrig[k].Cos = ((CurrentTrig[k].Cos * TrigStep[k].Cos) - (CurrentTrig[k].Sin * TrigStep[k].Sin));

   mov   edx, DWORD PTR ?TrigStep@@3PAUSinCosArray@@A ; TrigStep
   movsd   xmm0, QWORD PTR [eax+edx+8]
   push   esi
   mov   esi, DWORD PTR ?CurrentTrig@@3PAUSinCosArray@@A ; CurrentTrig
   movsd   xmm1, QWORD PTR [eax+esi]
   mulsd   xmm0, QWORD PTR [eax+esi+8]
   mulsd   xmm1, QWORD PTR [eax+edx]
   lea   ecx, DWORD PTR [eax+esi+8]
   subsd   xmm0, xmm1
   movsd   QWORD PTR [ecx], xmm0
   pop   esi

; 733  :     }

   ret   0
?FillTrigArray@@YAXH@Z ENDP            ; FillTrigArray
_TEXT   ENDS

_heinz:
I.  Going parallel or how to cut the leek !
This morning I was in the kitchen to make a salad of leek. After washing the leafs I took one to cut it in  fit parts.
But how big is a fit part ? 1mm, 10mm, 100mm ?
I have a relative big leaf of 24 cm so we choose 10mm= 1cm as a fit part. Now we know I must cut it into 24 parts.
How todo that ?
1. we take the knife and cut it into 24 pieces one after the other. Wee need 24-1 = 23 cuts
We have stiil one tree to cut. This means sequential works.
or we do following-->
2. we cut the leaf in two eaqual parts(1 cut), then lay both parts parallel to each other and cut it. We need 12-1 = 11 cuts +1 extra cut from the first. Summary 12 cuts.
This means parallel work. We have 2 parallel trees to cut.The one extra cut and lay both parts parallel is the overhead(organize parallel work).
3. we cut the leaf into 2 parts (1 cut), laying both parts parallel, cut now again into 2 parts(1 cut), laying again the 2 parts parallel(have now 4 parallel) and cut it. We need still 6-1 cuts plus the 2 extra cuts = summary 7 cuts.
This means much more parallel. We have 4 trees. The overhead is now grown to 2 cuts plus laying 4 parts parallel.
4. we cut the leaf into 2 parts(1cut) laying both parts parallel, cut now again into 2 parts(1 cut), laying again both parts parallel and cut it into 2 parts(1 cut), laying the two parallel(have now 8 parts parallel) and cut. We need 3-1=2 cuts plus 3 extra cuts summary 5 cuts. The overhead grows now to 3 cuts plus laying 3 times (2³ = 8parts) parallel.

Summary of all:
1. sequential    = 23 cuts --> no overhead
2. parallel (2)    = 11+1=12 cuts  (1 cut overhead)
3. parallel (4)    =   5+2=7 cuts  (2 cuts overhead)
4. parallel ( 8 )  =   2+3=5 cuts  (3 cuts overhead)
-----------------------------------------------------------------------
In this way (4.) we can do the same work with still 5 cuts against (1.) 23 cuts if we use sequential work.
But attention the overhead with 3 is bigger as the real work-cuts(2) and we run 8 trees parallel.
This method of organize parallel work is called blocking. The problem is to determine the length of the pieces(fit parts) and the choose of parallel trees to get maximal performance. I believe every max performance solution is for every given work (1000), (100 000), (1 000 000) and machine a other. Therefore this thema is relative complex and so difficult to handle.

Believe me, the parallel cutted salad has a fine taste.  ;D

Have fun  ;D  ;D  ;D

regards heinz


Jason G:
 8)...Or the 'other' kind of coarse grained parallelism ... where you ask your mum to make you a sandwich instead, then you watch TV in parallel ;D  [ 0 cuts + 1 small communication overhead]

Jason

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version