AVX Optimized App Development
arkayn:
FX-4100
BOINC idle
=========================================================
Ftst_v7_J48a_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.008908 0.00000 test
fpu_ChirpData 0.017459 0.00000 test
fpu_opt_ChirpData 0.008598 0.00000 test
sse1_ChirpData_ak8e 0.007179 0.00000 test
sse2_ChirpData_ak8 0.004598 0.00000 test
sse3_ChirpData_ak8 0.004693 0.00000 test
avx_ChirpData_a 0.003840 0.00000 test
avx_ChirpData_b 0.003823 0.00000 test
avx_ChirpData_c 0.004082 0.00000 test
avx_ChirpData_d 0.004008 0.00000 test
avx_ChirpData_e 0.003937 0.00000 test
avx_ChirpData_f2 0.003716 0.00000 test
avx_ChirpData_f3 0.003702 0.00000 test
avx_ChirpData_f4 0.003782 0.00000 test
avx_ChirpData_f5 0.003708 0.00000 test
avx_ChirpData_f6 0.003687 0.00000 test
avx_ChirpData_fn 0.003996 0.00000 test
avx_ChirpData_f6 0.003687 0.00000 choice
Second run
v_ChirpData 0.008775 0.00000 test
fpu_ChirpData 0.017391 0.00000 test
fpu_opt_ChirpData 0.008710 0.00000 test
sse1_ChirpData_ak8e 0.007160 0.00000 test
sse2_ChirpData_ak8 0.004587 0.00000 test
sse3_ChirpData_ak8 0.004651 0.00000 test
avx_ChirpData_a 0.003827 0.00000 test
avx_ChirpData_b 0.003842 0.00000 test
avx_ChirpData_c 0.004111 0.00000 test
avx_ChirpData_d 0.004003 0.00000 test
avx_ChirpData_e 0.003927 0.00000 test
avx_ChirpData_f2 0.003724 0.00000 test
avx_ChirpData_f3 0.003698 0.00000 test
avx_ChirpData_f4 0.003698 0.00000 test
avx_ChirpData_f5 0.003685 0.00000 test
avx_ChirpData_f6 0.003682 0.00000 test
avx_ChirpData_fn 0.004034 0.00000 test
avx_ChirpData_f6 0.003682 0.00000 choice
Third run
v_ChirpData 0.008919 0.00000 test
fpu_ChirpData 0.017299 0.00000 test
fpu_opt_ChirpData 0.008712 0.00000 test
sse1_ChirpData_ak8e 0.007167 0.00000 test
sse2_ChirpData_ak8 0.004582 0.00000 test
sse3_ChirpData_ak8 0.004661 0.00000 test
avx_ChirpData_a 0.003819 0.00000 test
avx_ChirpData_b 0.003813 0.00000 test
avx_ChirpData_c 0.004114 0.00000 test
avx_ChirpData_d 0.003980 0.00000 test
avx_ChirpData_e 0.003898 0.00000 test
avx_ChirpData_f2 0.003759 0.00000 test
avx_ChirpData_f3 0.003696 0.00000 test
avx_ChirpData_f4 0.003692 0.00000 test
avx_ChirpData_f5 0.003704 0.00000 test
avx_ChirpData_f6 0.003698 0.00000 test
avx_ChirpData_fn 0.003895 0.00000 test
avx_ChirpData_f4 0.003692 0.00000 choice
Test duration 9.58 seconds
Ftst_v7 completed successfully.
=========================================================
i3-2120
BOINC idle
=========================================================
Ftst_v7_J48a_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004566 0.00000 test
fpu_ChirpData 0.012321 0.00000 test
fpu_opt_ChirpData 0.004345 0.00000 test
sse1_ChirpData_ak8e 0.005710 0.00000 test
sse2_ChirpData_ak8 0.004189 0.00000 test
sse3_ChirpData_ak8 0.004102 0.00000 test
avx_ChirpData_a 0.002084 0.00000 test
avx_ChirpData_b 0.002054 0.00000 test
avx_ChirpData_c 0.002103 0.00000 test
avx_ChirpData_d 0.001930 0.00000 test
avx_ChirpData_e 0.001936 0.00000 test
avx_ChirpData_f2 0.002078 0.00000 test
avx_ChirpData_f3 0.002079 0.00000 test
avx_ChirpData_f4 0.002053 0.00000 test
avx_ChirpData_f5 0.002058 0.00000 test
avx_ChirpData_f6 0.002103 0.00000 test
avx_ChirpData_fn 0.002185 0.00000 test
avx_ChirpData_d 0.001930 0.00000 choice
Second run
v_ChirpData 0.004545 0.00000 test
fpu_ChirpData 0.012302 0.00000 test
fpu_opt_ChirpData 0.004352 0.00000 test
sse1_ChirpData_ak8e 0.005705 0.00000 test
sse2_ChirpData_ak8 0.004183 0.00000 test
sse3_ChirpData_ak8 0.004084 0.00000 test
avx_ChirpData_a 0.002081 0.00000 test
avx_ChirpData_b 0.002047 0.00000 test
avx_ChirpData_c 0.002099 0.00000 test
avx_ChirpData_d 0.001930 0.00000 test
avx_ChirpData_e 0.001931 0.00000 test
avx_ChirpData_f2 0.002081 0.00000 test
avx_ChirpData_f3 0.002056 0.00000 test
avx_ChirpData_f4 0.002053 0.00000 test
avx_ChirpData_f5 0.002057 0.00000 test
avx_ChirpData_f6 0.002049 0.00000 test
avx_ChirpData_fn 0.002185 0.00000 test
avx_ChirpData_d 0.001930 0.00000 choice
Third run
v_ChirpData 0.004597 0.00000 test
fpu_ChirpData 0.012295 0.00000 test
fpu_opt_ChirpData 0.004325 0.00000 test
sse1_ChirpData_ak8e 0.005713 0.00000 test
sse2_ChirpData_ak8 0.004178 0.00000 test
sse3_ChirpData_ak8 0.004086 0.00000 test
avx_ChirpData_a 0.002077 0.00000 test
avx_ChirpData_b 0.002046 0.00000 test
avx_ChirpData_c 0.002098 0.00000 test
avx_ChirpData_d 0.001929 0.00000 test
avx_ChirpData_e 0.001934 0.00000 test
avx_ChirpData_f2 0.002077 0.00000 test
avx_ChirpData_f3 0.002056 0.00000 test
avx_ChirpData_f4 0.002051 0.00000 test
avx_ChirpData_f5 0.002050 0.00000 test
avx_ChirpData_f6 0.002109 0.00000 test
avx_ChirpData_fn 0.002182 0.00000 test
avx_ChirpData_d 0.001929 0.00000 choice
Test duration 8.87 seconds
Ftst_v7 completed successfully.
KarVi:
FX8150@4.5
=========================================================
Ftst_v7_J48a_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.007238 0.00000 test
fpu_ChirpData 0.013833 0.00000 test
fpu_opt_ChirpData 0.007087 0.00000 test
sse1_ChirpData_ak8e 0.005670 0.00000 test
sse2_ChirpData_ak8 0.003693 0.00000 test
sse3_ChirpData_ak8 0.003748 0.00000 test
avx_ChirpData_a 0.003090 0.00000 test
avx_ChirpData_b 0.003036 0.00000 test
avx_ChirpData_c 0.003295 0.00000 test
avx_ChirpData_d 0.003220 0.00000 test
avx_ChirpData_e 0.003145 0.00000 test
avx_ChirpData_f2 0.003021 0.00000 test
avx_ChirpData_f3 0.002997 0.00000 test
avx_ChirpData_f4 0.002983 0.00000 test
avx_ChirpData_f5 0.002976 0.00000 test
avx_ChirpData_f6 0.002961 0.00000 test
avx_ChirpData_fn 0.003203 0.00000 test
avx_ChirpData_f6 0.002961 0.00000 choice
Second run
v_ChirpData 0.007260 0.00000 test
fpu_ChirpData 0.013809 0.00000 test
fpu_opt_ChirpData 0.007061 0.00000 test
sse1_ChirpData_ak8e 0.005671 0.00000 test
sse2_ChirpData_ak8 0.003688 0.00000 test
sse3_ChirpData_ak8 0.003738 0.00000 test
avx_ChirpData_a 0.003092 0.00000 test
avx_ChirpData_b 0.003046 0.00000 test
avx_ChirpData_c 0.003294 0.00000 test
avx_ChirpData_d 0.003224 0.00000 test
avx_ChirpData_e 0.003141 0.00000 test
avx_ChirpData_f2 0.003026 0.00000 test
avx_ChirpData_f3 0.003007 0.00000 test
avx_ChirpData_f4 0.002989 0.00000 test
avx_ChirpData_f5 0.002971 0.00000 test
avx_ChirpData_f6 0.002952 0.00000 test
avx_ChirpData_fn 0.003204 0.00000 test
avx_ChirpData_f6 0.002952 0.00000 choice
Third run
v_ChirpData 0.007199 0.00000 test
fpu_ChirpData 0.013817 0.00000 test
fpu_opt_ChirpData 0.007057 0.00000 test
sse1_ChirpData_ak8e 0.005661 0.00000 test
sse2_ChirpData_ak8 0.003693 0.00000 test
sse3_ChirpData_ak8 0.003740 0.00000 test
avx_ChirpData_a 0.003092 0.00000 test
avx_ChirpData_b 0.003045 0.00000 test
avx_ChirpData_c 0.003293 0.00000 test
avx_ChirpData_d 0.003222 0.00000 test
avx_ChirpData_e 0.003143 0.00000 test
avx_ChirpData_f2 0.003030 0.00000 test
avx_ChirpData_f3 0.003001 0.00000 test
avx_ChirpData_f4 0.002982 0.00000 test
avx_ChirpData_f5 0.002973 0.00000 test
avx_ChirpData_f6 0.002966 0.00000 test
avx_ChirpData_fn 0.003040 0.00000 test
avx_ChirpData_f6 0.002966 0.00000 choice
Test duration 7.68 seconds
Ftst_v7 completed successfully.
Josef W. Segur:
Thanks, it's good to be sure the test allocations weren't causing the problem.
For J49 I've collapsed the f subvariants back into a single variant with the same prefetch as a through e (4 cache lines ahead). Even though that may not be the best setting, it makes comparison easier.
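For readers unfamiliar with the idea, here is a minimal sketch of prefetching a fixed distance ahead in a streaming loop. It is not the actual ChirpData code: `scale_prefetch` is a hypothetical stand-in kernel, the 64-byte cache line is an assumption, and `__builtin_prefetch` is the GCC/Clang builtin rather than whatever the real routines use.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64                      /* assumption: 64-byte lines */
#define PREFETCH_LINES   4                       /* "4 cache lines ahead" */
#define FLOATS_PER_LINE  (CACHE_LINE_BYTES / sizeof(float))
#define PREFETCH_FLOATS  (PREFETCH_LINES * FLOATS_PER_LINE)

/* Hypothetical stand-in for a chirp kernel: out[i] = in[i] * k.
 * Works through the input one cache line at a time and asks the CPU
 * to start fetching the line that is PREFETCH_LINES further on. */
void scale_prefetch(const float *in, float *out, size_t n, float k)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        /* Only prefetch addresses that are still inside the buffer. */
        if (i + PREFETCH_FLOATS < n)
            __builtin_prefetch(&in[i + PREFETCH_FLOATS], 0, 3);

        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; ++j)
            out[j] = in[j] * k;
    }
}
```

With 4-byte floats and the assumed line size, 4 lines ahead works out to 64 floats (256 bytes). Changing `PREFETCH_LINES` is the knob the old f2 through f6 subvariants were presumably exploring.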
Added test g, which loads data in two 128-bit chunks rather than full 256-bit chunks. That's a technique some Intel documents recommend, though it's not expected to make a large difference.
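A rough sketch of what a split load can look like with AVX intrinsics, assuming the technique is two unaligned 128-bit loads combined into one 256-bit register. The function name `load256_as_two_128` is made up for illustration, and the real variant g may be written differently. The AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`); otherwise a plain copy stands in so the sketch still builds.

```c
#include <assert.h>
#include <string.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Copy 8 floats from src to dst. On an AVX build the 256-bit value is
 * assembled from two 128-bit loads instead of one 256-bit load. */
void load256_as_two_128(const float *src, float *dst)
{
    assert(src != NULL && dst != NULL);
#if defined(__AVX__)
    __m128 lo = _mm_loadu_ps(src);        /* floats 0..3 */
    __m128 hi = _mm_loadu_ps(src + 4);    /* floats 4..7 */
    /* Place lo in the low half, then insert hi into the high half. */
    __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
    _mm256_storeu_ps(dst, v);
#else
    /* Fallback for non-AVX builds: same result, no SIMD. */
    memcpy(dst, src, 8 * sizeof(float));
#endif
}
```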
Added test h, which does TLB priming to eliminate delays when crossing page boundaries, and prefetches a whole page-sized block at once, like the Astropulse TWINDECHIRP. I have hopes that might make a significant difference.
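As an illustration only, the idea can be sketched like this: before working on one page, do a real load from the next page so its address translation is already in the TLB, then issue a prefetch for every cache line in that page. The 4096-byte page, the 64-byte line, and both function names are assumptions for the sketch, not taken from the test code or from TWINDECHIRP.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES       4096   /* assumption: 4 KiB pages */
#define CACHE_LINE_BYTES 64     /* assumption: 64-byte lines */

/* Touch one byte of the block (a real load, so the TLB entry is filled),
 * then prefetch each cache line in it. Returns the number of lines asked for. */
size_t prime_and_prefetch_page(const void *page, size_t bytes_left)
{
    assert(page != NULL);
    const volatile unsigned char *p = page;
    size_t span = bytes_left < PAGE_BYTES ? bytes_left : PAGE_BYTES;
    size_t lines = 0;

    if (span == 0)
        return 0;
    (void)p[0];                                   /* TLB priming load */
    for (size_t off = 0; off < span; off += CACHE_LINE_BYTES) {
        __builtin_prefetch((const char *)page + off, 0, 3);
        ++lines;
    }
    return lines;
}

/* Hypothetical consumer: sums the data one page-sized block at a time,
 * priming and prefetching the following block before working on this one. */
float sum_by_pages(const float *data, size_t n)
{
    const char *base = (const char *)data;
    const size_t total = n * sizeof(float);
    const size_t floats_per_page = PAGE_BYTES / sizeof(float);
    float sum = 0.0f;

    for (size_t start = 0; start < n; start += floats_per_page) {
        size_t next = (start + floats_per_page) * sizeof(float);
        if (next < total)
            prime_and_prefetch_page(base + next, total - next);

        size_t end = (start + floats_per_page < n) ? start + floats_per_page : n;
        for (size_t j = start; j < end; ++j)
            sum += data[j];
    }
    return sum;
}
```

Whether this helps depends heavily on how well the hardware prefetcher already covers the access pattern, so it is something to measure rather than assume.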
The sse3_ChirpData_ak8 variant didn't have prefetch, so it was often slower than sse2_ChirpData_ak8. I've put the prefetch in.
Although I've reviewed the changes to the AVX routines several times, they're significant enough that there's some risk of a crash if I missed something. I hope not.
Joe
arkayn:
FX-4100@3.6
BOINC idle
=========================================================
Ftst_v7_J49_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.008692 0.00000 test
fpu_ChirpData 0.017149 0.00000 test
fpu_opt_ChirpData 0.008383 0.00000 test
sse1_ChirpData_ak8e 0.007095 0.00000 test
sse2_ChirpData_ak8 0.004506 0.00000 test
sse3_ChirpData_ak8 0.004396 0.00000 test
avx_ChirpData_a 0.003789 0.00000 test
avx_ChirpData_b 0.003719 0.00000 test
avx_ChirpData_c 0.004055 0.00000 test
avx_ChirpData_d 0.003989 0.00000 test
avx_ChirpData_e 0.003886 0.00000 test
avx_ChirpData_f 0.003646 0.00000 test
avx_ChirpData_g 0.003612 0.00000 test
avx_ChirpData_h 0.004376 0.00000 test
avx_ChirpData_g 0.003612 0.00000 choice
Second run
v_ChirpData 0.008600 0.00000 test
fpu_ChirpData 0.017273 0.00000 test
fpu_opt_ChirpData 0.008386 0.00000 test
sse1_ChirpData_ak8e 0.007135 0.00000 test
sse2_ChirpData_ak8 0.004525 0.00000 test
sse3_ChirpData_ak8 0.004410 0.00000 test
avx_ChirpData_a 0.003801 0.00000 test
avx_ChirpData_b 0.003851 0.00000 test
avx_ChirpData_c 0.004085 0.00000 test
avx_ChirpData_d 0.003954 0.00000 test
avx_ChirpData_e 0.003857 0.00000 test
avx_ChirpData_f 0.003661 0.00000 test
avx_ChirpData_g 0.003586 0.00000 test
avx_ChirpData_h 0.004445 0.00000 test
avx_ChirpData_g 0.003586 0.00000 choice
Third run
v_ChirpData 0.008727 0.00000 test
fpu_ChirpData 0.017132 0.00000 test
fpu_opt_ChirpData 0.008475 0.00000 test
sse1_ChirpData_ak8e 0.007107 0.00000 test
sse2_ChirpData_ak8 0.004575 0.00000 test
sse3_ChirpData_ak8 0.004390 0.00000 test
avx_ChirpData_a 0.003800 0.00000 test
avx_ChirpData_b 0.003817 0.00000 test
avx_ChirpData_c 0.004079 0.00000 test
avx_ChirpData_d 0.003987 0.00000 test
avx_ChirpData_e 0.003887 0.00000 test
avx_ChirpData_f 0.003646 0.00000 test
avx_ChirpData_g 0.003586 0.00000 test
avx_ChirpData_h 0.004411 0.00000 test
avx_ChirpData_g 0.003586 0.00000 choice
Test duration 7.99 seconds
Ftst_v7 completed successfully.
i3-2120@3.3
=========================================================
Ftst_v7_J49_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004536 0.00000 test
fpu_ChirpData 0.012313 0.00000 test
fpu_opt_ChirpData 0.004323 0.00000 test
sse1_ChirpData_ak8e 0.005770 0.00000 test
sse2_ChirpData_ak8 0.004188 0.00000 test
sse3_ChirpData_ak8 0.004051 0.00000 test
avx_ChirpData_a 0.002107 0.00000 test
avx_ChirpData_b 0.002045 0.00000 test
avx_ChirpData_c 0.002098 0.00000 test
avx_ChirpData_d 0.001930 0.00000 test
avx_ChirpData_e 0.001941 0.00000 test
avx_ChirpData_f 0.002060 0.00000 test
avx_ChirpData_g 0.002071 0.00000 test
avx_ChirpData_h 0.002659 0.00000 test
avx_ChirpData_d 0.001930 0.00000 choice
Second run
v_ChirpData 0.004538 0.00000 test
fpu_ChirpData 0.012747 0.00000 test
fpu_opt_ChirpData 0.004351 0.00000 test
sse1_ChirpData_ak8e 0.005769 0.00000 test
sse2_ChirpData_ak8 0.004441 0.00000 test
sse3_ChirpData_ak8 0.004123 0.00000 test
avx_ChirpData_a 0.002079 0.00000 test
avx_ChirpData_b 0.002045 0.00000 test
avx_ChirpData_c 0.002101 0.00000 test
avx_ChirpData_d 0.001932 0.00000 test
avx_ChirpData_e 0.001932 0.00000 test
avx_ChirpData_f 0.002049 0.00000 test
avx_ChirpData_g 0.002067 0.00000 test
avx_ChirpData_h 0.002657 0.00000 test
avx_ChirpData_d 0.001932 0.00000 choice
Third run
v_ChirpData 0.004533 0.00000 test
fpu_ChirpData 0.012322 0.00000 test
fpu_opt_ChirpData 0.004320 0.00000 test
sse1_ChirpData_ak8e 0.005764 0.00000 test
sse2_ChirpData_ak8 0.004190 0.00000 test
sse3_ChirpData_ak8 0.004021 0.00000 test
avx_ChirpData_a 0.002085 0.00000 test
avx_ChirpData_b 0.002050 0.00000 test
avx_ChirpData_c 0.002098 0.00000 test
avx_ChirpData_d 0.001937 0.00000 test
avx_ChirpData_e 0.001938 0.00000 test
avx_ChirpData_f 0.002049 0.00000 test
avx_ChirpData_g 0.002071 0.00000 test
avx_ChirpData_h 0.002658 0.00000 test
avx_ChirpData_d 0.001937 0.00000 choice
Test duration 7.45 seconds
Ftst_v7 completed successfully.
KarVi:
FX8150@4.5
=========================================================
Ftst_v7_J49_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.007393 0.00000 test
fpu_ChirpData 0.013810 0.00000 test
fpu_opt_ChirpData 0.007195 0.00000 test
sse1_ChirpData_ak8e 0.005659 0.00000 test
sse2_ChirpData_ak8 0.003708 0.00000 test
sse3_ChirpData_ak8 0.003581 0.00000 test
avx_ChirpData_a 0.003086 0.00000 test
avx_ChirpData_b 0.003037 0.00000 test
avx_ChirpData_c 0.003292 0.00000 test
avx_ChirpData_d 0.003217 0.00000 test
avx_ChirpData_e 0.003131 0.00000 test
avx_ChirpData_f 0.002977 0.00000 test
avx_ChirpData_g 0.003055 0.00000 test
avx_ChirpData_h 0.003575 0.00000 test
avx_ChirpData_f 0.002977 0.00000 choice
Second run
v_ChirpData 0.007355 0.00000 test
fpu_ChirpData 0.013808 0.00000 test
fpu_opt_ChirpData 0.007272 0.00000 test
sse1_ChirpData_ak8e 0.005665 0.00000 test
sse2_ChirpData_ak8 0.003700 0.00000 test
sse3_ChirpData_ak8 0.003645 0.00000 test
avx_ChirpData_a 0.003090 0.00000 test
avx_ChirpData_b 0.003037 0.00000 test
avx_ChirpData_c 0.003290 0.00000 test
avx_ChirpData_d 0.003215 0.00000 test
avx_ChirpData_e 0.003135 0.00000 test
avx_ChirpData_f 0.002972 0.00000 test
avx_ChirpData_g 0.003060 0.00000 test
avx_ChirpData_h 0.003573 0.00000 test
avx_ChirpData_f 0.002972 0.00000 choice
Third run
v_ChirpData 0.007349 0.00000 test
fpu_ChirpData 0.013834 0.00000 test
fpu_opt_ChirpData 0.007261 0.00000 test
sse1_ChirpData_ak8e 0.005659 0.00000 test
sse2_ChirpData_ak8 0.003695 0.00000 test
sse3_ChirpData_ak8 0.003576 0.00000 test
avx_ChirpData_a 0.003093 0.00000 test
avx_ChirpData_b 0.003040 0.00000 test
avx_ChirpData_c 0.003294 0.00000 test
avx_ChirpData_d 0.003216 0.00000 test
avx_ChirpData_e 0.003132 0.00000 test
avx_ChirpData_f 0.002972 0.00000 test
avx_ChirpData_g 0.003054 0.00000 test
avx_ChirpData_h 0.003583 0.00000 test
avx_ChirpData_f 0.002972 0.00000 choice
Test duration 6.49 seconds
Ftst_v7 completed successfully.
Mine seems to like f, where arkayn's prefers g. Not very conclusive :)
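To put a number on how inconclusive that is, here is a small sketch that averages the three J49 runs for f and g on each AMD machine and reports g's gap relative to f in percent. The timings are copied from the logs above; the helper names `mean` and `percent_gap` are just for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* J49 timings for avx_ChirpData_f and avx_ChirpData_g, three runs each. */
static const double arkayn_f[3] = {0.003646, 0.003661, 0.003646};  /* FX-4100 */
static const double arkayn_g[3] = {0.003612, 0.003586, 0.003586};
static const double karvi_f[3]  = {0.002977, 0.002972, 0.002972};  /* FX8150 */
static const double karvi_g[3]  = {0.003055, 0.003060, 0.003054};

/* Arithmetic mean of n timings. */
double mean(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += t[i];
    return sum / (double)n;
}

/* Gap of a relative to b, in percent. Negative means a is faster. */
double percent_gap(const double *a, const double *b, size_t n)
{
    return 100.0 * (mean(a, n) - mean(b, n)) / mean(b, n);
}
```

`percent_gap(arkayn_g, arkayn_f, 3)` comes out at roughly -1.5 (g a little faster on the FX-4100), while `percent_gap(karvi_g, karvi_f, 3)` is just under +3 (g a little slower on the FX8150). Both gaps are small, and they point in opposite directions.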