optimized sources

Jason G:

--- Quote --- 4- Chirp function Block Prefetch, memcpy++ zerocase & 3phase chirp                  Generic x86   Untested        ~?.?%
--- End quote ---

Took a quick look between school and work; this looks like it may be easier to try than I thought. On my configuration the consistently selected chirp function is the outstanding "sse2_ChirpData_ak". Nice one.

The structure for potential 3-phase processing is already there, though as far as I can see it is currently straight SSE2 throughout, i.e. vectorised SIMD. The existing prefetch, processing and writing sections are all SSE2, clearly laid out, and have the clean, crystal-vase 'niceness' that makes you reluctant to tamper :D

With a few other adaptations, adjusting the prefetch, changing the processing to FPU, and suitably adjusting the streaming writes should do the trick,
  ... though for the P4 I would like to keep the aliasing issue in mind, which might dictate some of the block sizes and the order in which they are processed.
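
For anyone following along, the three-phase idea described above (bring a block in, process it, write it out) can be sketched roughly as below. This is only an illustration, not the actual sse2_ChirpData_ak code: the names `Complex32`, `kBlock` and `chirp_three_phase` are hypothetical, the chirp is assumed to be a quadratic-phase rotation of complex samples, and the prefetch and streaming writes are stood in for by plain loads and stores so the sketch stays portable. The block size is a placeholder that would need tuning.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sample type: one complex value as two floats.
struct Complex32 { float re; float im; };

// Hypothetical block size in samples; a real build would tune this to the cache.
constexpr std::size_t kBlock = 256;

// Sketch of a three-phase chirp: (1) touch a block so it is in cache,
// (2) rotate each sample with scalar (FPU) math, (3) write the block out.
// Assumes the chirp is a rotation by 2*pi*frac(0.5 * chirp_rate * t^2).
void chirp_three_phase(const Complex32* in, Complex32* out, std::size_t n,
                       double chirp_rate, double sample_rate) {
    assert(in != nullptr && out != nullptr && sample_rate > 0.0);
    const double two_pi = 6.283185307179586;
    Complex32 staged[kBlock];      // holds one processed block
    volatile float sink = 0.0f;    // keeps the phase 1 loads from being optimised away

    for (std::size_t base = 0; base < n; base += kBlock) {
        const std::size_t count = (n - base < kBlock) ? (n - base) : kBlock;

        // Phase 1: read one sample per 64 bytes (8 samples of 8 bytes each).
        // Stand-in for a real prefetch instruction.
        for (std::size_t i = 0; i < count; i += 8) {
            sink = in[base + i].re;
        }

        // Phase 2: scalar processing into the staging buffer.
        for (std::size_t i = 0; i < count; ++i) {
            const double t = static_cast<double>(base + i) / sample_rate;
            double ang = 0.5 * chirp_rate * t * t;
            ang -= std::floor(ang);            // keep only the fractional turn
            ang *= two_pi;
            const float c = static_cast<float>(std::cos(ang));
            const float s = static_cast<float>(std::sin(ang));
            staged[i].re = in[base + i].re * c - in[base + i].im * s;
            staged[i].im = in[base + i].re * s + in[base + i].im * c;
        }

        // Phase 3: write the finished block. Stand-in for streaming stores.
        for (std::size_t i = 0; i < count; ++i) {
            out[base + i] = staged[i];
        }
    }
}
```

Calling `chirp_three_phase(in, out, n, 0.0, rate)` should copy the data through unchanged, which makes a handy sanity check before swapping real prefetch and streaming-store instructions into phases 1 and 3.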

Oh for the weekend :D

Jason G:
First run of the original code [will need more runs for a proper baseline, though] (a very nice function already):

--------------------------------------------------------------------------------------
Testing xN SSE2 Build.

sse2_ChirpData_ak:

NumDataPoints = 1024*1024
test_points = 32768

Timer Frequency in:

Hz  =       3579545
MHz =       3.57955
GHz =    0.00358

Start Time =    1585115997106 Ticks
Stop Time  =    1585116003199 Ticks

Duration in Ticks   =  6093
Duration in seconds =  0.0017021716447

--------------------------------------------------------------------------------------

Inner loop executes 8192 times
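
The arithmetic behind the figures in the log above is just tick difference divided by timer frequency. A minimal sketch of that conversion follows; the helper names `ticks_to_seconds` and `seconds_per_iteration` are hypothetical and not taken from the test harness, and the per-iteration average is an extra convenience the log itself does not print.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert two raw counter readings into elapsed seconds,
// given the counter frequency in ticks per second (Hz).
double ticks_to_seconds(std::int64_t start_ticks, std::int64_t stop_ticks,
                        std::int64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    assert(stop_ticks >= start_ticks);
    return static_cast<double>(stop_ticks - start_ticks) /
           static_cast<double>(ticks_per_second);
}

// Hypothetical helper: average time spent per loop iteration.
double seconds_per_iteration(double total_seconds, std::int64_t iterations) {
    assert(iterations > 0);
    return total_seconds / static_cast<double>(iterations);
}
```

With the start and stop values from the log and the 3579545 Hz timer, `ticks_to_seconds` gives the 6093-tick duration as roughly 0.0017 seconds. At that timer resolution the whole run is only a few thousand ticks, so several runs are needed before reading much into small differences.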

_heinz:
Measuring is the best way to try code and find the optimal variants.  ;D

The loop construct in pulsefind.cpp is ready now, but not yet measured.
Today I will squeeze the case-construct code.
I still have some good ideas for eliminating code here and there... we will see...

Jason G:

--- Quote from: seti_britta on 07 Nov 2007, 11:47:04 am ---Measuring is the best way to try code and find the optimal variants.  ;D

The loop construct in pulsefind.cpp is ready now, but not yet measured.
Today I will squeeze the case-construct code.
I still have some good ideas for eliminating code here and there... we will see...

--- End quote ---

Great! A pulsefind baseline will be good too. Underneath pulsefind, it seems my machine also always selects the AK folding routines and spends much of its time in the x2AL version. I am running VTune on the chirp one now to look for any P4-specific slowdowns. Wickedly fast code though :D

_heinz:

--- Quote from: j_groothu on 07 Nov 2007, 12:14:29 pm ---
--- Quote from: seti_britta on 07 Nov 2007, 11:47:04 am ---


--- End quote ---

 I am running vtune on the chirp one now to look for any p4 specific slowdowns, wickedly fast code though :D


--- End quote ---
I have a heavily modified chirpfft.cpp which we can try too.
