How to test is certainly a difficult issue. Our methods using shortened WUs have in general worked out fairly well, but the extreme shortening used in the automatic installer tests is admittedly likely to choose something other than the strictly best version.
OTOH, testing with any single "typical" full-length WU is unlikely to be much better. The ideal method is to run one app on the project for a week or two while recording run time vs. angle range, then switch to another app and do the same. Comparing the two data sets afterwards gives a reliable estimate of their relative speed. Or you could run each app for about 35 days to get the RAC within 5% of its steady-state value and go by that.
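The 35-day figure follows from how slowly RAC settles. Here is a rough sketch, assuming RAC behaves like an exponentially weighted average with a one-week half-life, so the remaining gap to the steady-state value halves every 7 days. The function names are made up for this illustration. Under that assumption it takes about 30 days to close all but 5% of the gap. Validation delays on granted credit plausibly account for the extra few days.

```python
import math

# Assumption: RAC decays like an exponentially weighted average with a
# one-week half-life, so the gap to steady state halves every 7 days.
HALF_LIFE_DAYS = 7.0


def days_to_converge(tolerance: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Days until the remaining gap is `tolerance` of its initial size."""
    # Solve 0.5 ** (t / half_life) == tolerance for t.
    return half_life * math.log2(1.0 / tolerance)


def remaining_gap(days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Fraction of the initial gap still remaining after `days`."""
    return 0.5 ** (days / half_life)


if __name__ == "__main__":
    print(f"{days_to_converge(0.05):.1f} days to get within 5%")
    print(f"gap left after 35 days: {remaining_gap(35):.1%}")
```

With these assumptions the script prints about 30.3 days, and after 35 days roughly 3% of the gap is left.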
As to which is better on Core 2 systems, I'll just note that several of them were used in our pretesting, and the recommended version is based on those tests. Some of those systems are overclocked and/or tuned for best performance, but not all.
Joe