optimized sources
Jason G:
--- Quote --- 4- Chirp function Block Prefetch, memcpy++ zerocase & 3phase chirp Generic x86 Untested ~?.?%
--- End quote ---
Took a quick look between school and work, and it looks like this may be easier to try than I thought. On my configuration the consistently selected chirping function is the outstanding "sse2_ChirpData_ak". Nice one.
The structure is already there for potential 3-phase processing, though it is currently straight SSE2, which makes it vectorised SIMD as far as I can see. The existing prefetch, processing and writing sections are all SSE2, clearly laid out, and have that clean, crystal-vase 'niceness' that makes you reluctant to tamper with it :D
With a few adaptations (adjusting the prefetch, changing the processing to FPU, and suitably adjusting the streaming writes) it should do the trick, though for the P4 I would like to keep the aliasing issue in mind, which might dictate some of the block sizes and the order in which they are processed.
Oh for the weekend :D
Jason G:
First run of the original code (it will need more runs for a proper baseline, though). Very nice function already:
--------------------------------------------------------------------------------------
Testing xN SSE2 Build.
sse2_ChirpData_ak:
NumDataPoints = 1024*1024
test_points = 32768
Timer Frequency in:
Hz = 3579545
MHz = 3.57955
GHz = 0.00358
Start Time = 1585115997106 Ticks
Stop Time = 1585116003199 Ticks
Duration in Ticks = 6093
Duration in seconds = 0.0017021716447
--------------------------------------------------------------------------------------
Inner loop executes 8192 times
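The duration line is just the tick difference divided by the timer frequency: 6093 ticks / 3579545 Hz ≈ 0.0017022 s. The sketch below reproduces that arithmetic with the figures printed above. The helper names are hypothetical. The points-per-iteration figure rests on an assumption: that the inner loop covers all 32768 test points, which the output does not state. Under that assumption each pass handles 32768 / 8192 = 4 points.

```c
#include <stdint.h>

/* Elapsed seconds = (stop - start) ticks / timer frequency in Hz. */
static double ticks_to_seconds(uint64_t start, uint64_t stop, uint64_t freq_hz)
{
    return (double)(stop - start) / (double)freq_hz;
}

/* Points handled per inner-loop pass, ASSUMING the loop spans all test points. */
static unsigned points_per_iteration(unsigned test_points, unsigned iterations)
{
    return test_points / iterations;
}
```

For example, `ticks_to_seconds(1585115997106ULL, 1585116003199ULL, 3579545ULL)` gives the 0.0017021716... s shown in the output.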
_heinz:
Measuring is the best way to try code and find the optimal variants. ;D
The loop construct in pulsefind.cpp is ready now, but not yet measured.
Today I will squeeze the case-construct code.
I still have some good ideas for eliminating code here and there... we will see...
Jason G:
--- Quote from: seti_britta on 07 Nov 2007, 11:47:04 am ---Measuring is the best way to try code and find the optimal variants. ;D
The loop construct in pulsefind.cpp is ready now, but not yet measured.
Today I will squeeze the case-construct code.
I still have some good ideas for eliminating code here and there... we will see...
--- End quote ---
Great! A pulsefind baseline will be good too. Underneath pulsefind, it seems my machine also always selects the AK folding routines and spends much of its time in the x2AL version. I am running VTune on the chirp one now to look for any P4-specific slowdowns; wickedly fast code, though :D
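Folding, in pulse finding, means summing a power series at a trial period so that a repeating pulse piles up in a single bin. The sketch below shows that generic technique (epoch folding). It is not a reconstruction of the AK or x2AL routines, whose internals aren't shown in this thread. `fold` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Generic epoch folding: out[j] = sum over whole periods m of data[j + m*period].
   `out` must hold `period` floats; samples past the last whole period are ignored. */
static void fold(const float *data, size_t n, size_t period, float *out)
{
    assert(period > 0 && period <= n);
    size_t periods = n / period;
    for (size_t j = 0; j < period; j++) {
        float sum = 0.0f;
        for (size_t m = 0; m < periods; m++)
            sum += data[j + m * period];
        out[j] = sum;
    }
}
```

For example, folding {0,1,0,0, 0,1,0,0, 0,1,0,0} at period 4 gives {0,3,0,0}: the pulse at offset 1 accumulates while the empty bins stay at zero.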
_heinz:
--- Quote from: j_groothu on 07 Nov 2007, 12:14:29 pm ---
I am running VTune on the chirp one now to look for any P4-specific slowdowns; wickedly fast code, though :D
--- End quote ---
I have a heavily modified chirpfft.cpp which we can try too.