AVX Optimized App Development


arkayn:
From the Q8200

Fredericx51:
And from an i7-2600, without and with BOINC (6.10.60)

Claggy:
Here's my E8500's J39 run (5 runs with BOINC and apps running, 5 runs with BOINC and apps shut down).

Claggy

Josef W. Segur:
So here's J40. I modified the "_b" AVX folding, but expect it will probably still be slower than the "_a" version. For the 4-float SIMD folding it was better to use non-SIMD code for the very shortest cases; with AVX it looks like it may be better just to handle all sizes as 8 floats, with masking at the end. Anyhow, I reduced my guess about how small is too small to be efficient on AVX.
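To make the "all sizes as 8 floats with masking at the end" idea concrete, here is a rough sketch in plain C. It is not the J40 code: the function name and the simple two-array add are made up for illustration, and the 8 lanes are modelled with a scalar loop so it runs on any machine. A real AVX build would presumably use something like `_mm256_maskload_ps` / `_mm256_maskstore_ps` for the partial final block.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* one AVX register holds 8 single-precision floats */

/* Hypothetical fold step: out[i] = a[i] + b[i] for i in [0, n).
   Every block is treated as 8 lanes. In the final block a bit mask
   switches off the lanes at or beyond n, so short arrays need no
   separate non-SIMD path. */
void fold_add_masked(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    for (size_t base = 0; base < n; base += LANES) {
        size_t remaining = n - base;
        /* Full block: all 8 lanes on. Tail block: only the low lanes. */
        unsigned mask = remaining >= LANES ? 0xFFu
                                           : (1u << remaining) - 1u;

        for (size_t lane = 0; lane < LANES; ++lane) {
            if (mask & (1u << lane)) {
                out[base + lane] = a[base + lane] + b[base + lane];
            }
        }
    }
}
```

The point of the design is that an 11-element fold costs two 8-wide steps, with the second one masked down to 3 lanes, instead of one SIMD step plus a scalar cleanup loop. Whether that wins in practice is exactly what the benchmark runs are meant to show.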

Also added SSE3 and SSE1 modified chirping based on AKv8. There are two variants for SSE1: one uses the Estrin method for the polynomials, the other Horner. Estrin has one fewer instruction, but Horner needs fewer registers. On my Pentium-M it's a wash; either one may be marginally faster on a single run. But perhaps on even older systems where SSE1 is the best capability it may make a difference, or perhaps some newer systems will also react in surprising ways.
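For anyone not familiar with the two schemes, here is a small scalar illustration on a generic cubic. The function names and the cubic itself are only for illustration; the actual chirp polynomials and their SSE1 instruction counts are not shown in this thread, so treat this as a sketch of the evaluation order only.

```c
#include <stddef.h>

/* Evaluate c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3 in two ways. */

/* Horner: one running accumulator, each step depends on the previous
   one. Few temporaries, but a long serial dependency chain. */
float poly3_horner(const float c[4], float x)
{
    float acc = c[3];
    acc = acc * x + c[2];
    acc = acc * x + c[1];
    acc = acc * x + c[0];
    return acc;
}

/* Estrin: split into independent halves that can be computed in
   parallel, then combine with x^2. More temporaries (registers),
   but a shorter dependency chain. */
float poly3_estrin(const float c[4], float x)
{
    float x2 = x * x;
    float lo = c[0] + c[1] * x;   /* independent of hi */
    float hi = c[2] + c[3] * x;   /* independent of lo */
    return lo + hi * x2;
}
```

Both return the same value up to rounding. The trade-off is between the extra live values Estrin needs and the serial chain Horner imposes, which may explain why the faster one varies from CPU to CPU.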

I've left the AVX chirping unchanged. Of the 6 tests on AVX-capable systems, "a" was chosen twice, "b" twice, and "c" twice. The largest difference between the slowest and fastest AVX version on one test was about 12%, so it's worth gathering more data.
Joe
Edit: Attachment deleted, newer version in later post.

arkayn:
First up the Q8200

Then the X4 630
