AVX Optimized App Development
Josef W. Segur:
We're now working toward releases using FFTW 3.3 and/or recent versions of IPP with AVX support to do the FFTs for both Astropulse v6 and SETI@home v7. At least some of the AVX routines developed in this thread should go in too, with some modifications.
Meanwhile, some AMD Bulldozer CPUs with AVX support have been released. I'd like to see some test runs of J45 (from my earlier post) on those CPUs, since AMD's AVX implementation is likely to respond differently than Intel's. Win 7 SP1 / Windows Server 2008 R2 SP1 or later is still needed; MS will certainly not backport the changes needed for OS support to Vista or XP.
The AMD optimization manuals do hint at some of the ways their implementation differs, so I might try some further variations if the tests indicate a need.
Joe
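The OS requirement mentioned above exists because AVX needs the operating system to save and restore the wider YMM registers on context switches. A minimal sketch of how a program might check for both CPU and OS support is shown here, assuming a GCC or Clang toolchain on x86 (it relies on `<cpuid.h>` and inline assembly; MSVC would need its own intrinsics). The function name `avx_usable` is hypothetical and is not taken from the test program in this thread.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read extended control register 0 (XCR0) with XGETBV.
   Only safe to call once CPUID has reported OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if both the CPU and the OS support AVX, otherwise 0.
   Hypothetical helper name, for illustration only. */
int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    const unsigned int osxsave = 1u << 27;  /* OS has enabled XSAVE/XGETBV */
    const unsigned int avx     = 1u << 28;  /* CPU implements AVX */
    if ((ecx & (osxsave | avx)) != (osxsave | avx))
        return 0;

    /* XCR0 bit 1 = XMM state, bit 2 = YMM state; the OS must enable both. */
    return (read_xcr0() & 0x6) == 0x6;
}
#else
/* Non-x86 targets: AVX is never available. */
int avx_usable(void) { return 0; }
#endif
```

On an OS without the needed support, the CPUID AVX bit can still be set while the XCR0 check fails, which is why both steps are needed.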
arkayn:
FX-4100
=========================================================
Ftst_v7_J45 started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.000165 0.00000 test
v_vGetPowerSpectrum 0.000064 0.00000 test
v_vGetPowerSpectrum2 0.000072 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000052 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000065 0.00000 test
v_avxGetPowerSpectrum 0.000089 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000052 0.00000 choice
v_ChirpData 0.009393 0.00000 test
fpu_ChirpData 0.017479 0.00000 test
fpu_opt_ChirpData 0.009286 0.00000 test
v_vChirpData_x86_64 0.053607 0.00000 test
sse1_ChirpData_ak 0.010594 0.00000 test
sse1_ChirpData_ak8e 0.007210 0.00000 test
sse1_ChirpData_ak8h 0.007588 0.00000 test
sse2_ChirpData_ak 0.007590 0.00000 test
sse2_ChirpData_ak8 0.004558 0.00000 test
sse3_ChirpData_ak 0.007020 0.00000 test
sse3_ChirpData_ak8 0.004675 0.00000 test
avx_ChirpData_a 0.003773 0.00000 test
avx_ChirpData_b 0.003815 0.00000 test
avx_ChirpData_c 0.004179 0.00000 test
avx_ChirpData_d 0.003993 0.00000 test
avx_ChirpData_a 0.003773 0.00000 choice
v_Transpose 0.009390 0.00000 test
v_Transpose2 0.004041 0.00000 test
v_Transpose4 0.004842 0.00000 test
v_Transpose8 0.008043 0.00000 test
v_pfTranspose2 0.003915 0.00000 test
v_pfTranspose4 0.003701 0.00000 test
v_pfTranspose8 0.007678 0.00000 test
v_vTranspose4 0.002108 0.00000 test
v_vTranspose4np 0.002115 0.00000 test
v_vTranspose4ntw 0.006964 0.00000 test
v_vTranspose4x8ntw 0.003505 0.00000 test
v_vTranspose4x16ntw 0.002523 0.00000 test
v_vpfTranspose8x4ntw 0.006799 0.00000 test
v_avxTranspose4x8ntw 0.003511 0.00000 test
v_avxTranspose4x16ntw 0.002197 0.00000 test
v_avxTranspose8x4ntw 0.006822 0.00000 test
v_avxTranspose8x8ntw_a 0.003503 0.00000 test
v_avxTranspose8x8ntw_b 0.003660 0.00000 test
v_vTranspose4 0.002108 0.00000 choice
FPU opt folding 0.003458 0.00000 test
AK SSE folding 0.000858 0.00000 test
BH SSE folding 0.000726 0.00000 test
JS AVX_a folding 0.000736 0.00000 test
JS AVX_c folding 0.000820 0.00000 test
BH SSE folding 0.000726 0.00000 choice
Test duration 4.63 seconds
Ftst_v7 completed successfully.
KarVi:
AMD FX-8150 @4.3GHz
=========================================================
Ftst_v7_J45 started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.000138 0.00000 test
v_vGetPowerSpectrum 0.000053 0.00000 test
v_vGetPowerSpectrum2 0.000059 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000042 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000054 0.00000 test
v_avxGetPowerSpectrum 0.000066 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000042 0.00000 choice
v_ChirpData 0.007723 0.00000 test
fpu_ChirpData 0.014326 0.00000 test
fpu_opt_ChirpData 0.007722 0.00000 test
v_vChirpData_x86_64 0.043058 0.00000 test
sse1_ChirpData_ak 0.006840 0.00000 test
sse1_ChirpData_ak8e 0.005882 0.00000 test
sse1_ChirpData_ak8h 0.005998 0.00000 test
sse2_ChirpData_ak 0.006243 0.00000 test
sse2_ChirpData_ak8 0.003763 0.00000 test
sse3_ChirpData_ak 0.005858 0.00000 test
sse3_ChirpData_ak8 0.003847 0.00000 test
avx_ChirpData_a 0.003160 0.00000 test
avx_ChirpData_b 0.003138 0.00000 test
avx_ChirpData_c 0.003387 0.00000 test
avx_ChirpData_d 0.003302 0.00000 test
avx_ChirpData_b 0.003138 0.00000 choice
v_Transpose 0.007775 0.00000 test
v_Transpose2 0.003264 0.00000 test
v_Transpose4 0.003892 0.00000 test
v_Transpose8 0.006481 0.00000 test
v_pfTranspose2 0.003249 0.00000 test
v_pfTranspose4 0.003022 0.00000 test
v_pfTranspose8 0.006095 0.00000 test
v_vTranspose4 0.001745 0.00000 test
v_vTranspose4np 0.001746 0.00000 test
v_vTranspose4ntw 0.005688 0.00000 test
v_vTranspose4x8ntw 0.003089 0.00000 test
v_vTranspose4x16ntw 0.002110 0.00000 test
v_vpfTranspose8x4ntw 0.005689 0.00000 test
v_avxTranspose4x8ntw 0.003090 0.00000 test
v_avxTranspose4x16ntw 0.001814 0.00000 test
v_avxTranspose8x4ntw 0.005711 0.00000 test
v_avxTranspose8x8ntw_a 0.003089 0.00000 test
v_avxTranspose8x8ntw_b 0.003108 0.00000 test
v_vTranspose4 0.001745 0.00000 choice
FPU opt folding 0.002837 0.00000 test
AK SSE folding 0.000670 0.00000 test
BH SSE folding 0.000603 0.00000 test
JS AVX_a folding 0.000613 0.00000 test
JS AVX_c folding 0.000682 0.00000 test
BH SSE folding 0.000603 0.00000 choice
Test duration 3.66 seconds
Ftst_v7 completed successfully.
Josef W. Segur:
I did a rough survey of Beta hosts running stock S@H v7 on Bulldozer and Sandy Bridge CPUs to see which chirp variants were chosen most often. Bulldozer hosts chose about 8% a, 34% b, 10% c, and 49% d. Sandy Bridge hosts chose about 13% a, 9% b, 9% c, and 69% d. That was for 277 results on Bulldozer and 296 on Sandy Bridge, so it may be at least roughly meaningful.
I'll attach a J46 version of the test with two added chirp variants which might be even better than d, which was obviously the previous best. I left out the other kinds of functions this time, since I haven't figured out any significant improvements for those. But each run of the program now does the chirp tests 3 times.
What I'm aiming at, short term, is one best AVX chirp function which can be put into the existing Lunatics CPU code for a targeted AVX build. Hopefully we'll be able to use some dispatch functionality in future to keep the number of different builds down, but that's not ready yet.
Edit: attachment removed; see later posts for the current chirp-only version.
Joe
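The dispatch idea mentioned above could look roughly like the following: one build carries several variants of a routine and picks one at startup from the detected CPU features. This is only a sketch under stated assumptions. The feature flags, the variant table, and the kernel signature (which multiplies complex samples by a precomputed rotation table) are all hypothetical, and the "unrolled" kernel is a plain-C stand-in for where SSE or AVX code would go.

```c
#include <stddef.h>

/* Hypothetical feature flags; a real build would fill these from CPUID. */
enum { FEAT_SSE2 = 1, FEAT_AVX = 2 };

/* Hypothetical kernel signature: multiply each complex sample (re, im)
   by a precomputed complex rotation (rot_re, rot_im), in place. */
typedef void (*chirp_fn)(float *re, float *im,
                         const float *rot_re, const float *rot_im, size_t n);

static void chirp_generic(float *re, float *im,
                          const float *rot_re, const float *rot_im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float r = re[i], m = im[i];
        re[i] = r * rot_re[i] - m * rot_im[i];
        im[i] = r * rot_im[i] + m * rot_re[i];
    }
}

/* Stand-in for a SIMD variant: same math, two samples per iteration. */
static void chirp_unrolled(float *re, float *im,
                           const float *rot_re, const float *rot_im, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        float r0 = re[i], m0 = im[i], r1 = re[i + 1], m1 = im[i + 1];
        re[i]     = r0 * rot_re[i]     - m0 * rot_im[i];
        im[i]     = r0 * rot_im[i]     + m0 * rot_re[i];
        re[i + 1] = r1 * rot_re[i + 1] - m1 * rot_im[i + 1];
        im[i + 1] = r1 * rot_im[i + 1] + m1 * rot_re[i + 1];
    }
    if (i < n)  /* odd-length tail */
        chirp_generic(re + i, im + i, rot_re + i, rot_im + i, n - i);
}

struct chirp_variant {
    const char *name;
    unsigned required;   /* feature flags this variant needs */
    chirp_fn fn;
};

/* Ordered from least to most demanding. */
static const struct chirp_variant variants[] = {
    { "generic",      0,                    chirp_generic  },
    { "sse2_standin", FEAT_SSE2,            chirp_unrolled },
    { "avx_standin",  FEAT_SSE2 | FEAT_AVX, chirp_unrolled },
};

/* Return the most demanding variant whose requirements are all met. */
const struct chirp_variant *select_chirp(unsigned features)
{
    const struct chirp_variant *best = &variants[0];
    for (size_t i = 1; i < sizeof variants / sizeof variants[0]; i++)
        if ((variants[i].required & features) == variants[i].required)
            best = &variants[i];
    return best;
}
```

Selecting by feature flags alone assumes the most demanding variant is also the fastest, which the timings in this thread show is not always true; a timing-based choice would be a reasonable alternative.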
arkayn:
FX-4100
=========================================================
Ftst_v7_J46_Chirponly started.
Ignored: j46.txt
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.009781 0.00000 test
fpu_ChirpData 0.017815 0.00000 test
fpu_opt_ChirpData 0.009309 0.00000 test
sse1_ChirpData_ak 0.008972 0.00000 test
sse1_ChirpData_ak8e 0.007175 0.00000 test
sse1_ChirpData_ak8h 0.007744 0.00000 test
sse2_ChirpData_ak 0.007798 0.00000 test
sse2_ChirpData_ak8 0.004583 0.00000 test
sse3_ChirpData_ak 0.007205 0.00000 test
sse3_ChirpData_ak8 0.004993 0.00000 test
avx_ChirpData_a 0.003887 0.00000 test
avx_ChirpData_b 0.004019 0.00000 test
avx_ChirpData_c 0.004198 0.00000 test
avx_ChirpData_d 0.004099 0.00000 test
avx_ChirpData_e 0.004393 0.00000 test
avx_ChirpData_f 0.003727 0.00000 test
avx_ChirpData_f 0.003727 0.00000 choice
Second run
v_ChirpData 0.009942 0.00000 test
fpu_ChirpData 0.017879 0.00000 test
fpu_opt_ChirpData 0.009913 0.00000 test
sse1_ChirpData_ak 0.008894 0.00000 test
sse1_ChirpData_ak8e 0.007540 0.00000 test
sse1_ChirpData_ak8h 0.007731 0.00000 test
sse2_ChirpData_ak 0.007974 0.00000 test
sse2_ChirpData_ak8 0.004620 0.00000 test
sse3_ChirpData_ak 0.007186 0.00000 test
sse3_ChirpData_ak8 0.004808 0.00000 test
avx_ChirpData_a 0.004083 0.00000 test
avx_ChirpData_b 0.003978 0.00000 test
avx_ChirpData_c 0.004161 0.00000 test
avx_ChirpData_d 0.004288 0.00000 test
avx_ChirpData_e 0.003972 0.00000 test
avx_ChirpData_f 0.003840 0.00000 test
avx_ChirpData_f 0.003840 0.00000 choice
Third run
v_ChirpData 0.009758 0.00000 test
fpu_ChirpData 0.018261 0.00000 test
fpu_opt_ChirpData 0.009494 0.00000 test
sse1_ChirpData_ak 0.009149 0.00000 test
sse1_ChirpData_ak8e 0.007363 0.00000 test
sse1_ChirpData_ak8h 0.007963 0.00000 test
sse2_ChirpData_ak 0.007715 0.00000 test
sse2_ChirpData_ak8 0.004633 0.00000 test
sse3_ChirpData_ak 0.007329 0.00000 test
sse3_ChirpData_ak8 0.004750 0.00000 test
avx_ChirpData_a 0.004010 0.00000 test
avx_ChirpData_b 0.004000 0.00000 test
avx_ChirpData_c 0.004277 0.00000 test
avx_ChirpData_d 0.004212 0.00000 test
avx_ChirpData_e 0.004129 0.00000 test
avx_ChirpData_f 0.003745 0.00000 test
avx_ChirpData_f 0.003745 0.00000 choice
Test duration 9.91 seconds
Ftst_v7 completed successfully.
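Since individual timings move around a little from run to run, one way to combine the three runs above is to take each variant's best (minimum) time and then choose the variant with the smallest one. The sketch below does that for the six AVX chirp timings from the FX-4100 output. The function name `pick_fastest` and the best-of-runs rule are assumptions made for illustration; the thread does not say how a final choice across runs would be made.

```c
#include <stddef.h>

/* AVX chirp timings from the FX-4100 J46 output above.
   Rows are runs 1-3, columns are variants a, b, c, d, e, f. */
static const double fx4100_avx[3 * 6] = {
    0.003887, 0.004019, 0.004198, 0.004099, 0.004393, 0.003727,
    0.004083, 0.003978, 0.004161, 0.004288, 0.003972, 0.003840,
    0.004010, 0.004000, 0.004277, 0.004212, 0.004129, 0.003745,
};

/* Return the index of the variant with the smallest best-of-runs time.
   timings is row-major: timings[run * n_variants + variant].
   Requires n_variants >= 1 and n_runs >= 1. */
size_t pick_fastest(const double *timings, size_t n_variants, size_t n_runs)
{
    size_t best = 0;
    double best_time = 0.0;

    for (size_t v = 0; v < n_variants; v++) {
        /* Best time this variant achieved in any run. */
        double t_min = timings[v];
        for (size_t r = 1; r < n_runs; r++) {
            double t = timings[r * n_variants + v];
            if (t < t_min)
                t_min = t;
        }
        if (v == 0 || t_min < best_time) {
            best = v;
            best_time = t_min;
        }
    }
    return best;
}
```

Calling `pick_fastest(fx4100_avx, 6, 3)` returns index 5, which is variant f, matching the choice reported in all three runs.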