Hmm... so David is finally finishing his vector class? Or perhaps he's using __asm { } blocks, or maybe compiler intrinsics, which is what Alex K and I use (in his vector class he tended to favor __asm).
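For anyone who hasn't seen the intrinsics style, here is a rough sketch of what it looks like next to plain scalar code. This is not David's vector class or anything from the actual app: sum_floats is a made-up example function, and I'm assuming SSE is the target, with a scalar fallback if it isn't available.

```cpp
#include <cstddef>

// Use SSE intrinsics when the compiler says they are available;
// otherwise the function below falls back to the plain scalar loop.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

// Hypothetical example: add up n floats.
float sum_floats(const float* data, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#ifdef HAVE_SSE
    // Four running sums, one per SSE lane.
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        // Unaligned load, so the caller does not have to align the buffer.
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));
    }
    // Spill the four lanes and combine them.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole array when SSE is not available).
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The point of intrinsics over __asm is that the compiler still handles register allocation and scheduling, and the same source builds on compilers that don't accept inline asm blocks.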
Well, from my studies and profiling, I think profiling is the way to go. You think SIMD will help in some spot, and then you find that a 50/50 conditional branch that keeps getting mispredicted is what's really eating your cycles.
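To make the branch point concrete, here is a minimal sketch of the kind of comparison I mean. The function names and the time_ms helper are made up for illustration, not taken from the app. On random data the if in the first version goes either way about half the time; the second version does the same work with a mask and no branch in the loop body.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Branchy version: sums the elements that are >= t.
std::int64_t sum_above_branchy(const std::int32_t* v, std::size_t n, std::int32_t t) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] >= t) {
            s += v[i];
        }
    }
    return s;
}

// Branchless version: the comparison becomes an all-ones or all-zeros mask.
std::int64_t sum_above_branchless(const std::int32_t* v, std::size_t n, std::int32_t t) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // (v[i] >= t) is 0 or 1, so the mask is 0 or -1 (all bits set).
        const std::int64_t mask = -static_cast<std::int64_t>(v[i] >= t);
        s += static_cast<std::int64_t>(v[i]) & mask;
    }
    return s;
}

// Hypothetical timing helper: runs f once and returns elapsed milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Which one wins depends on the data and on what the compiler does with the loop (it may turn the branch into a conditional move or vectorize it on its own), so time both on real input, e.g. time_ms([&] { result = sum_above_branchy(buf, n, t); }), before deciding anything.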
He said people could post their own versions of functions, but his testing platform only covers 3 of them right now. One I've bypassed, one (smooth) I don't see being used in 5.15 (maybe in 5.18), and the other, pulse_find I think, is the one to shoot for.
It's easier to add new functions to test in my testing bench, though.