AVX Optimized App Development
KarVi:
Yes.
Still, it is a bit puzzling that mine definitely prefers d8.
Perhaps my SRQ/UNB/L3 cache is taxed harder because more cores are asking for work?
But this test is single-threaded (as far as I know), so that shouldn't be the case. Maybe the clock speeds of the L3 or memory play a role here?
Anyhow, the differences are small, so I can live with whatever Josef chooses to work with.
Another thing: it is daunting to see an i7 @ 3.4 GHz completely dominate my 8150 @ 4.5 GHz. AMD has much catching up to do.
arkayn:
My i3 is also cruising past the FX computers.
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.004736 0.00000 test mintime= 0.003079
fpu_ChirpData 0.012392 0.00000 test mintime= 0.012347
fpu_opt_ChirpData 0.004540 0.00000 test mintime= 0.002796
sse1_ChirpData_ak8e 0.005779 0.00000 test mintime= 0.005765
sse2_ChirpData_ak8 0.004182 0.00000 test mintime= 0.004173
sse3_ChirpData_ak8 0.004011 0.00000 test mintime= 0.003991
avx_ChirpData_a 0.002091 0.00000 test mintime= 0.002079
avx_ChirpData_b 0.002050 0.00000 test mintime= 0.002033
avx_ChirpData_c 0.002109 0.00000 test mintime= 0.002099
avx_ChirpData_d 0.001937 0.00000 test mintime= 0.001930
avx_ChirpData_e 0.001919 0.00000 test mintime= 0.001915
avx_ChirpData_f 0.002059 0.00000 test mintime= 0.002043
avx_ChirpData_g 0.002114 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002664 0.00000 test mintime= 0.002657
avx_ChirpData_i 0.002322 0.00000 test mintime= 0.002216
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001919 0.00000 choice
Second run
v_ChirpData 0.004711 0.00000 test mintime= 0.003087
fpu_ChirpData 0.012465 0.00000 test mintime= 0.012372
fpu_opt_ChirpData 0.004542 0.00000 test mintime= 0.002788
sse1_ChirpData_ak8e 0.005808 0.00000 test mintime= 0.005765
sse2_ChirpData_ak8 0.004187 0.00000 test mintime= 0.004172
sse3_ChirpData_ak8 0.004033 0.00000 test mintime= 0.003997
avx_ChirpData_a 0.002120 0.00000 test mintime= 0.002079
avx_ChirpData_b 0.002092 0.00000 test mintime= 0.002032
avx_ChirpData_c 0.002111 0.00000 test mintime= 0.002100
avx_ChirpData_d 0.001945 0.00000 test mintime= 0.001933
avx_ChirpData_e 0.001928 0.00000 test mintime= 0.001918
avx_ChirpData_f 0.002057 0.00000 test mintime= 0.002042
avx_ChirpData_g 0.002103 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002668 0.00000 test mintime= 0.002656
avx_ChirpData_i 0.002222 0.00000 test mintime= 0.002214
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001928 0.00000 choice
Third run
v_ChirpData 0.004706 0.00000 test mintime= 0.003076
fpu_ChirpData 0.012670 0.00000 test mintime= 0.012353
fpu_opt_ChirpData 0.004944 0.00000 test mintime= 0.002788
sse1_ChirpData_ak8e 0.005822 0.00000 test mintime= 0.005767
sse2_ChirpData_ak8 0.004212 0.00000 test mintime= 0.004173
sse3_ChirpData_ak8 0.004047 0.00000 test mintime= 0.003995
avx_ChirpData_a 0.002284 0.00000 test mintime= 0.002082
avx_ChirpData_b 0.002036 0.00000 test mintime= 0.002034
avx_ChirpData_c 0.002104 0.00000 test mintime= 0.002100
avx_ChirpData_d 0.001941 0.00000 test mintime= 0.001931
avx_ChirpData_e 0.001917 0.00000 test mintime= 0.001916
avx_ChirpData_f 0.002052 0.00000 test mintime= 0.002042
avx_ChirpData_g 0.002077 0.00000 test mintime= 0.002072
avx_ChirpData_h 0.002668 0.00000 test mintime= 0.002657
avx_ChirpData_i 0.002220 0.00000 test mintime= 0.002213
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.001917 0.00000 choice
Test duration 8.06 seconds
Ftst_v7 completed successfully.
KarVi:
Just proves my point :)
It would be nice if Josef had the time for a more AMD-specific build, now that he is doing AMD-specific FMA4 anyway.
A build made specifically for BD, taking advantage of its strengths and avoiding (m)any of its weaknesses, would probably perform even better.
I'm not a programmer, but I have read a little of what Agner Fog has written about optimizing code, and it seems there are many do's and don'ts for Bulldozer, and they don't always correspond well with the do's and don'ts for the i-series.
I don't know how AMD-specific his approach is, and AMD-specific development would probably require access to a BD-based system, which I don't think he has. The current remote testing is a bit slow :)
Under the circumstances I actually believe he's doing an excellent job.
But I hope he keeps up his efforts. I'll keep an eye out and test all that I can to help him in his work.
Claggy:
i7-2600K @4.7GHz (Boinc running):
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.013263 0.00000 test mintime= 0.008009
fpu_ChirpData 0.016376 0.00000 test mintime= 0.015240
fpu_opt_ChirpData 0.010888 0.00000 test mintime= 0.004957
sse1_ChirpData_ak8e 0.006530 0.00000 test mintime= 0.006036
sse2_ChirpData_ak8 0.005245 0.00000 test mintime= 0.004845
sse3_ChirpData_ak8 0.005638 0.00000 test mintime= 0.005391
avx_ChirpData_a 0.003428 0.00000 test mintime= 0.002866
avx_ChirpData_b 0.003293 0.00000 test mintime= 0.003004
avx_ChirpData_c 0.003464 0.00000 test mintime= 0.002649
avx_ChirpData_d 0.003401 0.00000 test mintime= 0.003068
avx_ChirpData_e 0.003336 0.00000 test mintime= 0.002747
avx_ChirpData_f 0.003858 0.00000 test mintime= 0.002764
avx_ChirpData_g 0.003393 0.00000 test mintime= 0.002854
avx_ChirpData_h 0.004357 0.00000 test mintime= 0.003769
avx_ChirpData_i 0.003741 0.00000 test mintime= 0.003195
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_b 0.003293 0.00000 choice
Second run
v_ChirpData 0.011919 0.00000 test mintime= 0.005614
fpu_ChirpData 0.017020 0.00000 test mintime= 0.015932
fpu_opt_ChirpData 0.012288 0.00000 test mintime= 0.005588
sse1_ChirpData_ak8e 0.006986 0.00000 test mintime= 0.006516
sse2_ChirpData_ak8 0.005728 0.00000 test mintime= 0.005103
sse3_ChirpData_ak8 0.005483 0.00000 test mintime= 0.005053
avx_ChirpData_a 0.003443 0.00000 test mintime= 0.003055
avx_ChirpData_b 0.003343 0.00000 test mintime= 0.002985
avx_ChirpData_c 0.003370 0.00000 test mintime= 0.002868
avx_ChirpData_d 0.003293 0.00000 test mintime= 0.002583
avx_ChirpData_e 0.003045 0.00000 test mintime= 0.002483
avx_ChirpData_f 0.003491 0.00000 test mintime= 0.003050
avx_ChirpData_g 0.003368 0.00000 test mintime= 0.002979
avx_ChirpData_h 0.004322 0.00000 test mintime= 0.003799
avx_ChirpData_i 0.003393 0.00000 test mintime= 0.002930
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_e 0.003045 0.00000 choice
Third run
v_ChirpData 0.011498 0.00000 test mintime= 0.006496
fpu_ChirpData 0.017043 0.00000 test mintime= 0.016015
fpu_opt_ChirpData 0.012360 0.00000 test mintime= 0.006857
sse1_ChirpData_ak8e 0.006964 0.00000 test mintime= 0.006531
sse2_ChirpData_ak8 0.005478 0.00000 test mintime= 0.004993
sse3_ChirpData_ak8 0.005408 0.00000 test mintime= 0.005107
avx_ChirpData_a 0.003465 0.00000 test mintime= 0.003220
avx_ChirpData_b 0.003389 0.00000 test mintime= 0.002860
avx_ChirpData_c 0.003296 0.00000 test mintime= 0.002902
avx_ChirpData_d 0.002841 0.00000 test mintime= 0.002393
avx_ChirpData_e 0.003209 0.00000 test mintime= 0.002488
avx_ChirpData_f 0.003274 0.00000 test mintime= 0.002586
avx_ChirpData_g 0.003199 0.00000 test mintime= 0.002958
avx_ChirpData_h 0.003922 0.00000 test mintime= 0.003441
avx_ChirpData_i 0.003587 0.00000 test mintime= 0.003252
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_d 0.002841 0.00000 choice
Test duration 8.87 seconds
Ftst_v7 completed successfully.
=========================================================
i7-2600K @4.7GHz (Boinc suspended):
=========================================================
Ftst_v7_J54_Chirponly started.
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_ChirpData 0.003799 0.00000 test mintime= 0.002485
fpu_ChirpData 0.008702 0.00000 test mintime= 0.008688
fpu_opt_ChirpData 0.003864 0.00000 test mintime= 0.002353
sse1_ChirpData_ak8e 0.004217 0.00000 test mintime= 0.004188
sse2_ChirpData_ak8 0.003165 0.00000 test mintime= 0.003149
sse3_ChirpData_ak8 0.002986 0.00000 test mintime= 0.002965
avx_ChirpData_a 0.001482 0.00000 test mintime= 0.001472
avx_ChirpData_b 0.001644 0.00000 test mintime= 0.001640
avx_ChirpData_c 0.001485 0.00000 test mintime= 0.001483
avx_ChirpData_d 0.001398 0.00000 test mintime= 0.001375
avx_ChirpData_e 0.001535 0.00000 test mintime= 0.001533
avx_ChirpData_f 0.001587 0.00000 test mintime= 0.001578
avx_ChirpData_g 0.001639 0.00000 test mintime= 0.001632
avx_ChirpData_h 0.002034 0.00000 test mintime= 0.002018
avx_ChirpData_i 0.001738 0.00000 test mintime= 0.001735
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_d 0.001398 0.00000 choice
Second run
v_ChirpData 0.003811 0.00000 test mintime= 0.002484
fpu_ChirpData 0.008764 0.00000 test mintime= 0.008694
fpu_opt_ChirpData 0.003714 0.00000 test mintime= 0.002344
sse1_ChirpData_ak8e 0.004225 0.00000 test mintime= 0.004195
sse2_ChirpData_ak8 0.003157 0.00000 test mintime= 0.003152
sse3_ChirpData_ak8 0.002983 0.00000 test mintime= 0.002963
avx_ChirpData_a 0.001472 0.00000 test mintime= 0.001471
avx_ChirpData_b 0.001644 0.00000 test mintime= 0.001639
avx_ChirpData_c 0.001484 0.00000 test mintime= 0.001481
avx_ChirpData_d 0.001377 0.00000 test mintime= 0.001374
avx_ChirpData_e 0.001533 0.00000 test mintime= 0.001530
avx_ChirpData_f 0.001586 0.00000 test mintime= 0.001580
avx_ChirpData_g 0.001633 0.00000 test mintime= 0.001630
avx_ChirpData_h 0.002026 0.00000 test mintime= 0.002010
avx_ChirpData_i 0.001737 0.00000 test mintime= 0.001734
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_d 0.001377 0.00000 choice
Third run
v_ChirpData 0.003803 0.00000 test mintime= 0.002487
fpu_ChirpData 0.008781 0.00000 test mintime= 0.008689
fpu_opt_ChirpData 0.003718 0.00000 test mintime= 0.002357
sse1_ChirpData_ak8e 0.004211 0.00000 test mintime= 0.004175
sse2_ChirpData_ak8 0.003170 0.00000 test mintime= 0.003150
sse3_ChirpData_ak8 0.002982 0.00000 test mintime= 0.002965
avx_ChirpData_a 0.001477 0.00000 test mintime= 0.001472
avx_ChirpData_b 0.001644 0.00000 test mintime= 0.001640
avx_ChirpData_c 0.001492 0.00000 test mintime= 0.001482
avx_ChirpData_d 0.001377 0.00000 test mintime= 0.001374
avx_ChirpData_e 0.001535 0.00000 test mintime= 0.001532
avx_ChirpData_f 0.001589 0.00000 test mintime= 0.001575
avx_ChirpData_g 0.001633 0.00000 test mintime= 0.001630
avx_ChirpData_h 0.002043 0.00000 test mintime= 0.002011
avx_ChirpData_i 0.001739 0.00000 test mintime= 0.001733
avx_fma4_ChirpData_a not supported by system
avx_fma4_ChirpData_d4 not supported by system
avx_fma4_ChirpData_d6 not supported by system
avx_fma4_ChirpData_d8 not supported by system
avx_fma4_ChirpData_e not supported by system
avx_ChirpData_d 0.001377 0.00000 choice
Test duration 5.67 seconds
Ftst_v7 completed successfully.
Claggy
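For anyone comparing these logs by hand, here is a small Python sketch (not part of Ftst) that parses the AVX rows from the first "Boinc running" run above and ranks them two ways. The line layout is assumed from the log as posted (name, timing, error, "test mintime=", minimum time), and the function name `parse` is hypothetical. In this run the "choice" line matches the lowest value in the timing column rather than the lowest mintime, which may be one reason the pick moves around while other work is running.

```python
LOG = """\
avx_ChirpData_a 0.003428 0.00000 test mintime= 0.002866
avx_ChirpData_b 0.003293 0.00000 test mintime= 0.003004
avx_ChirpData_c 0.003464 0.00000 test mintime= 0.002649
avx_ChirpData_d 0.003401 0.00000 test mintime= 0.003068
avx_ChirpData_e 0.003336 0.00000 test mintime= 0.002747
avx_ChirpData_f 0.003858 0.00000 test mintime= 0.002764
avx_ChirpData_g 0.003393 0.00000 test mintime= 0.002854
avx_ChirpData_h 0.004357 0.00000 test mintime= 0.003769
avx_ChirpData_i 0.003741 0.00000 test mintime= 0.003195
avx_fma4_ChirpData_a not supported by system
"""

def parse(log):
    """Map function name -> (timing, mintime); skip unsupported rows."""
    rows = {}
    for line in log.splitlines():
        parts = line.split()
        # Timed rows have six fields: name, timing, error, "test", "mintime=", value.
        if len(parts) == 6 and parts[3] == "test" and parts[4] == "mintime=":
            rows[parts[0]] = (float(parts[1]), float(parts[5]))
    return rows

rows = parse(LOG)
by_timing = min(rows, key=lambda name: rows[name][0])
by_mintime = min(rows, key=lambda name: rows[name][1])
print("lowest timing: ", by_timing)
print("lowest mintime:", by_mintime)
```

With Boinc suspended the two columns are close enough that both rankings agree on avx_ChirpData_d in all three runs.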
Josef W. Segur:
Although there are still puzzles from the tests so far, with the attached J55 I've added another dimension to the tests. J54 and earlier have been doing full Mebisample chirping as needed before doing Gaussian, Pulse, and Triplet finding. For cases where that's not needed, AK_v8 becomes more cache-friendly by subdividing. So I modified all the chirp functions to support that, and J55 also tests at 128K and 32K. The timings ought to be about 1/8 and 1/32 of the full-length tests.
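As a quick check of those ratios, here is a small Python sketch (not part of the test program). It assumes "Mebisample" means 1,048,576 samples and that "128K" and "32K" mean 131,072 and 32,768 samples. The 0.001377 s figure is the avx_ChirpData_d timing from Claggy's Boinc-suspended run, used purely as an example, and `expected_timing` is a hypothetical helper name.

```python
FULL_LEN = 1024 * 1024  # one Mebisample (assumed: 2**20 samples)
SUB_LENS = {"128K": 128 * 1024, "32K": 32 * 1024}  # assumed sample counts

def expected_timing(full_time, sub_len, full_len=FULL_LEN):
    """Linear estimate: time scales with the number of samples chirped."""
    return full_time * sub_len / full_len

full_time = 0.001377  # avx_ChirpData_d, i7-2600K with Boinc suspended

for label, n in SUB_LENS.items():
    ratio = FULL_LEN // n
    print(f"{label}: 1/{ratio} of full length, "
          f"linear estimate {expected_timing(full_time, n):.6f} s")
```

The estimate is purely linear in length, so any cache benefit from the shorter blocks would show up as measured timings below these values.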
I do appreciate the testing, and am glad the Ivy Bridge system reacted like other Intel CPUs. Whatever form of dispatch is eventually used, keeping the number of code paths low will be more efficient.
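One possible shape for timing-based dispatch, as a rough Python sketch rather than the actual test code: every supported candidate is timed on the same data and the fastest one is kept. The name `pick_fastest`, the tuple layout, and the stand-in workloads are all hypothetical. Taking the minimum over a few repeats is a choice made for this sketch, not necessarily what Ftst does.

```python
import time

def pick_fastest(candidates, data, repeats=5):
    """Time every supported candidate and return (name, best_time).

    candidates: list of (name, func, supported) tuples (hypothetical layout).
    """
    best = None
    for name, func, supported in candidates:
        if not supported:
            continue  # analogous to "not supported by system" in the logs
        best_time = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best_time = min(best_time, time.perf_counter() - start)
        if best is None or best_time < best[1]:
            best = (name, best_time)
    return best

# Stand-in workloads, not real chirp code.
def slow_path(data):
    return sum(x * x for x in data)

def fast_path(data):
    return len(data)

def unsupported_path(data):
    raise RuntimeError("must never be called on this system")

candidates = [
    ("slow_path", slow_path, True),
    ("fast_path", fast_path, True),
    ("unsupported_path", unsupported_path, False),
]
choice = pick_fastest(candidates, list(range(100_000)))
print("choice:", choice[0])
```

Each extra entry in the candidate list adds timing work at startup and another path to maintain, which is the cost that a low number of code paths avoids.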
Joe