AVX Optimized App Development
Jason G:
Similar result here on the E8400 (of course). Darn, now I'm CPU shopping ::)
Josef W. Segur:
--- Quote from: arkayn on 28 Apr 2011, 09:44:19 pm ---Runs fine on my Q8200
...
--- End quote ---
Thanks, that's a better basis for comparison since it includes the SSE3 chirp, which 'most everyone will see. And although I'm not particularly concerned about the 13 lines of assembly code which check the CPU and OS to decide whether AVX is supported, confirmation that Win7 SP1 by itself isn't enough is good.
Joe
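For readers who haven't seen this kind of check, here is a minimal C sketch of the usual CPUID/XGETBV sequence. It is not the 13-line assembly routine from Ftst_v7, which isn't shown in this thread. The names `read_xcr0` and `avx_usable` are made up for illustration, and GCC or Clang on x86 is assumed (other targets just report 0). The point is that two things must both hold: the CPU has to advertise AVX, and the OS has to have enabled saving of the YMM registers. That is why a suitable OS on a non-AVX CPU still doesn't qualify.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read extended control register 0 (XCR0) via XGETBV.
 * Only safe to call after CPUID has reported OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical helper: returns 1 if AVX code may be run, else 0. */
int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    const unsigned int OSXSAVE = 1u << 27;  /* CPUID.1:ECX bit 27 */
    const unsigned int AVX     = 1u << 28;  /* CPUID.1:ECX bit 28 */

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                           /* leaf 1 unavailable */
    if ((ecx & (OSXSAVE | AVX)) != (OSXSAVE | AVX))
        return 0;                           /* CPU or OS support missing */

    /* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state;
     * the OS must have enabled both. */
    return (read_xcr0() & 0x6) == 0x6;
}
#else
/* Not an x86 GCC/Clang build: report AVX as unavailable. */
int avx_usable(void) { return 0; }
#endif
```

A dispatcher would call `avx_usable()` once at startup and only add the AVX variants to the test list when it returns 1.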
Josef W. Segur:
From dnolan via PM at NC, result on his i7 2600 w/W7 64 SP1:
--- Code: ---Ftst_v7 started.
Optimal function choices:
-------------------------------------------------------
name                          timing            error
-------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum            0.00010         0.00000  test
v_vGetPowerSpectrum           0.00005         0.00000  test
v_vGetPowerSpectrum2          0.00006         0.00000  test
v_vGetPowerSpectrumUnrolled   0.00005         0.00000  test
v_vGetPowerSpectrumUnrolled2  0.00007         0.00000  test
v_avxGetPowerSpectrum         0.00004        38.07197  test
v_vGetPowerSpectrumUnrolled   0.00005         0.00000  choice
v_ChirpData                   0.00444         0.00000  test
fpu_ChirpData                 0.01053         0.00000  test
fpu_opt_ChirpData             0.00444         0.00000  test
v_vChirpData_x86_64           0.05060         0.00000  test
sse1_ChirpData_ak             0.00590         0.00000  test
sse2_ChirpData_ak             0.00567         0.00000  test
sse3_ChirpData_ak             0.00556         0.00000  test
avx_ChirpData_a               0.00230         0.85637  test
avx_ChirpData_b               0.00231         0.85637  test
v_ChirpData                   0.00444         0.00000  choice
v_Transpose                   0.00270         0.00000  test
v_Transpose2                  0.00292         0.00000  test
v_Transpose4                  0.00149         0.00000  test
v_Transpose8                  0.00271         0.00000  test
v_pfTranspose2                0.00161         0.00000  test
v_pfTranspose4                0.00149         0.00000  test
v_pfTranspose8                0.00313         0.00000  test
v_vTranspose4                 0.00088         0.00000  test
v_vTranspose4np               0.00114         0.00000  test
v_vTranspose4ntw              0.00716         0.00000  test
v_vTranspose4x8ntw            0.00298         0.00000  test
v_vTranspose4x16ntw           0.00085         0.00000  test
v_vpfTranspose8x4ntw          0.00719         0.00000  test
v_avxTranspose8x4ntw          0.00299         0.00000  test
v_avxTranspose8x8ntw          0.00232   9696326.77324  test
v_vTranspose4x16ntw           0.00085         0.00000  choice
FPU opt folding               0.00204         0.00000  test
AK SSE folding                0.00045         0.00000  test
BH SSE folding                0.00043         0.00000  test
BH SSE folding                0.00043         0.00000  choice
Test duration 2.53 seconds
Ftst_v7 completed successfully.
--- End code ---
Nice speedups on the Chirp functions, but I obviously need to rework data shuffling.
Joe
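The "choice" lines in the log above look consistent with a simple rule: take the fastest variant whose error is acceptable. The AVX variants post the best timings but have nonzero errors, and none of them is chosen. Below is a minimal sketch of that rule in C. The actual selection logic and tolerance in Ftst_v7 aren't shown in this thread, so the `struct candidate` layout, the function name `pick_fastest_accurate`, and any threshold passed to it are assumptions for illustration only.

```c
#include <stddef.h>

/* One row of the benchmark table (hypothetical layout). */
struct candidate {
    const char *name;
    double timing;   /* seconds, lower is better */
    double error;    /* deviation from the reference result */
};

/* Return the index of the fastest candidate whose error does not exceed
 * max_error, or -1 if none qualifies. On equal timings the earlier
 * entry wins. */
int pick_fastest_accurate(const struct candidate *c, size_t n,
                          double max_error)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!(c[i].error <= max_error))
            continue;                       /* too inaccurate (or NaN) */
        if (best < 0 || c[i].timing < c[best].timing)
            best = (int)i;
    }
    return best;
}
```

Fed the ChirpData rows from the i7 2600 run with a small tolerance such as 1e-5, this skips avx_ChirpData_a and avx_ChirpData_b despite their 0.0023 timings and returns v_ChirpData. The log only prints five decimal places, so ties there (0.00444 for both v_ChirpData and fpu_opt_ChirpData) may not be real ties in the underlying measurements.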
Jason G:
--- Quote from: Josef W. Segur on 29 Apr 2011, 11:17:30 am ---Nice speedups on the Chirp functions, but I obviously need to rework data shuffling.
--- End quote ---
Numbered bottlecaps help with that for me. Good to see some hints that, with work, the architecture additions may perform very well.
Jason
Claggy:
--- Quote from: Jason G on 28 Apr 2011, 10:16:04 pm ---Similar result here on the E8400 (of course). Darn, now I'm CPU shopping ::)
--- End quote ---
This is what an E8500 @ 4.14GHz gets (with BOINC, v7 SETI Beta CPU apps, an NV SETI CUDA MB app and an ATI OpenCL SETI MB app running; I ran it 5 times):
--- Code: ---Ftst_v7 started.
Optimal function choices:
-------------------------------------------------------
name                          timing            error
-------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum            0.00013         0.00000  test
v_vGetPowerSpectrum           0.00006         0.00000  test
v_vGetPowerSpectrum2          0.00006         0.00000  test
v_vGetPowerSpectrumUnrolled   0.00005         0.00000  test
v_vGetPowerSpectrumUnrolled2  0.00006         0.00000  test
v_vGetPowerSpectrumUnrolled   0.00005         0.00000  choice
v_ChirpData                   0.03146         0.00000  test
fpu_ChirpData                 0.01685         0.00000  test
fpu_opt_ChirpData             0.02659         0.00000  test
v_vChirpData_x86_64           0.04977         0.00000  test
sse1_ChirpData_ak             0.00881         0.00000  test
sse2_ChirpData_ak             0.00886         0.00000  test
sse3_ChirpData_ak             0.00829         0.00000  test
sse3_ChirpData_ak             0.00829         0.00000  choice
v_Transpose                   0.00389         0.00000  test
v_Transpose2                  0.00476         0.00000  test
v_Transpose4                  0.00464         0.00000  test
v_Transpose8                  0.01212         0.00000  test
v_pfTranspose2                0.00397         0.00000  test
v_pfTranspose4                0.00477         0.00000  test
v_pfTranspose8                0.01263         0.00000  test
v_vTranspose4                 0.00396         0.00000  test
v_vTranspose4np               0.00585         0.00000  test
v_vTranspose4ntw              0.00690         0.00000  test
v_vTranspose4x8ntw            0.00649         0.00000  test
v_vTranspose4x16ntw           0.00532         0.00000  test
v_vpfTranspose8x4ntw          0.00568         0.00000  test
v_Transpose                   0.00389         0.00000  choice
FPU opt folding               0.00194         0.00000  test
AK SSE folding                0.00072         0.00000  test
BH SSE folding                0.00071         0.00000  test
BH SSE folding                0.00071         0.00000  choice
Test duration 4.21 seconds
Ftst_v7 completed successfully.
--- End code ---
Claggy