OK, here are the results from one of glennaxl's hosts, the one with 2 GPUs:

What was expected: higher load on the first 4 CPUs.
What was unexpected: sometimes a bigger load on the higher-numbered CPUs. Both groups are sometimes overloaded and sometimes not, which is strange.
EDIT:
Actually, because the CPU tasks run without affinity, the CPU assigned to a particular bench number can change from task to task. So the CPU results are completely expected! 4 cores always have a higher load than the other 4.
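To illustrate the point about missing affinity, here is a toy simulation, not a measurement. It assumes 8 logical CPUs and 4 heavier processes (numbers taken from the "4 cores vs. the other 4" observation); the function names and the random placement are purely hypothetical stand-ins for whatever the OS scheduler really does.

```python
import random

NUM_CPUS = 8    # assumption: the post compares 4 cores against the other 4
NUM_HEAVY = 4   # assumption: 4 processes carry the extra load


def place_without_affinity(rng):
    """Pick which CPUs end up running the heavy processes for one task.

    Without affinity nothing pins a process to a core, so this toy model
    simply draws NUM_HEAVY distinct CPUs at random.
    """
    return sorted(rng.sample(range(NUM_CPUS), NUM_HEAVY))


def simulate(num_tasks, seed=0):
    """Return the list of heavy-CPU sets, one per simulated task."""
    rng = random.Random(seed)
    return [place_without_affinity(rng) for _ in range(num_tasks)]


if __name__ == "__main__":
    for task_no, heavy in enumerate(simulate(5), start=1):
        print(f"task {task_no}: heavy CPUs {heavy}")
```

In this model the set of busy CPUs moves around from task to task, but there are always exactly 4 of them, which matches what the results show.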
It would be interesting to test the x4 build on the same host. There I would expect only 2 overloaded cores instead of 4.
The elapsed times for the GPU apps don't allow choosing the best app, IMO.
If additional tests on that host are possible, here is what I would love to have:
1) The benchmark script replaced with something newer; possible samples are attached to this post. The lack of CPU times for the GPU app is very sad.
2) test-wu6 can be excluded completely. It gets VLAR-killed anyway.
3) There is no need for so much CPU work now. The GPU was loaded for only ~350 seconds and the CPU for ~1600 seconds in total. If the CPU were loaded slightly longer than the GPU, that would be OK for my purposes and would save time for productive crunching (although there is nothing wrong with running all the test WUs on the CPU too).
4) Slightly changed experiment conditions:
a) a single GPU0 run, with the CPU not loaded at all.
b) a single GPU0 run with the CPU fully loaded, as here.
c) a separate (this is important) run for V12 with both GPUs and all CPUs loaded.
d) again, a separate run for V12b with both GPUs and all CPUs loaded.
e) a separate run for V12b x4 with both GPUs and all CPUs loaded.
Is it possible to perform these additional tests?
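To make the request easier to follow, here is a rough sketch of the run list and of the CPU workload arithmetic from point 3. The `Run` class, the `cpu_share_needed` helper and the 10% margin are my own illustrative assumptions, not part of any benchmark script; only the run labels, app names and the ~350 s / ~1600 s figures come from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str        # letter of the run in the list above
    app: str          # build under test; "unspecified" where none is named
    gpus: int         # number of GPUs in use
    cpu_loaded: bool  # whether the CPU is fully loaded during the run


# The five requested runs, transcribed from the list above.
RUNS = [
    Run("a", "unspecified", 1, False),
    Run("b", "unspecified", 1, True),
    Run("c", "V12", 2, True),
    Run("d", "V12b", 2, True),
    Run("e", "V12b x4", 2, True),
]


def cpu_share_needed(gpu_seconds, cpu_seconds, margin=1.1):
    """Fraction of the current CPU workload that keeps the CPU busy
    `margin` times as long as the GPU (margin=1.1 is an assumed value
    for "slightly longer"). Capped at 1.0, i.e. the full workload."""
    return min(1.0, gpu_seconds * margin / cpu_seconds)


if __name__ == "__main__":
    for run in RUNS:
        print(run)
    share = cpu_share_needed(350, 1600)
    print(f"CPU workload share needed: {share:.2f}")
```

With the times from this host, roughly a quarter of the current CPU workload would already keep the CPU busy a little longer than the GPU.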
[attachment deleted by admin]