C-60 APU and Radeon HD6920

Jason G:

--- Quote from: Raistmer on 04 Jan 2012, 12:28:16 pm ---I would not talk about any convergence after these tests. 'Divergence' looks like a much more adequate term...
Look at the number of builds that DON'T run on these new AMD chips...
The whole x64 AKv8b2 set fails to even report an error...
--- End quote ---

Given that Intel-specific target chip libraries & builds were used by design, I find it surprising that they run at all (let alone better in some cases), since a static Intel build should run badly, if at all, on the wrong chip, even within Intel silicon, because the micro-architectural optimisation is on the heavy side. As for the instruction sets, I'm referring more to the fact that general SSE3 performs pretty well across the board on newer chips from both vendors, whereas on Core2, Athlon and Phenom I/II 'plain Intel SSE3' was never a good choice of code & libraries.

With juggling, we *should* find that a static build for each of x86 & x64, each with generically optimised SSE3 and SSE2, static IPP in both flavours & FFTW, gives a workable combination for most chips except Core2 & AVX. Obviously, with AVX availability being OS dependent, that would need to be tacked on with proper detection, and doing something similar for SSSE3 seems viable.
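
To illustrate what 'proper detection' might involve, here is a minimal sketch of the decision logic only, in Python for readability. It assumes the raw CPUID leaf 1 ECX/EDX values and the XCR0 register have already been read elsewhere (a real app would do that with compiler intrinsics); the function names and path labels are hypothetical. The point is that the AVX flag alone is not enough: the OS must also have enabled saving of the wider register state.

```python
# Bit positions in CPUID leaf 1 (ECX/EDX) and in XCR0.
ECX_SSE3 = 1 << 0
ECX_SSSE3 = 1 << 9
ECX_OSXSAVE = 1 << 27   # OS has enabled XSAVE/XGETBV
ECX_AVX = 1 << 28       # CPU implements AVX
EDX_SSE2 = 1 << 26
XCR0_SSE = 1 << 1       # OS saves XMM state
XCR0_AVX = 1 << 2       # OS saves YMM state


def avx_usable(ecx, xcr0):
    """AVX is usable only if the CPU has it AND the OS saves YMM state."""
    if not (ecx & ECX_AVX and ecx & ECX_OSXSAVE):
        return False
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


def pick_path(ecx, edx, xcr0):
    """Return the best code path label (hypothetical names), best first."""
    if avx_usable(ecx, xcr0):
        return "avx"
    if ecx & ECX_SSSE3:
        return "ssse3"
    if ecx & ECX_SSE3:
        return "sse3"
    if edx & EDX_SSE2:
        return "sse2"
    return "generic"
```

With this ordering, an AVX-capable chip on an OS that does not manage the wider registers simply falls through to the SSSE3 or SSE3 path instead of crashing.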

Jason

Mike:
I would really like to see how the FX performs with a 64 bit app.
I'm almost certain it would benefit more than an Intel, to be honest.

Mike

Jason G:

--- Quote from: Mike on 04 Jan 2012, 07:42:40 pm ---I would really like to see how the FX performs with a 64 bit app.
I'm almost certain it would benefit more than an Intel, to be honest.

Mike
--- End quote ---
I tend to agree there would be some benefit, though I would expect only around 10% from just changing bittage, solely because much of the hot code is already 128 bit vectorised SIMD anyway (and will be 256 bit once AVX support is added later). The peripheral non-vectorised CPU code (which would become 64 bit) can only have a marginal impact in this kind of application.
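
As a back-of-envelope check on that kind of estimate, here is a small Amdahl's-law sketch. The fraction and local speedup used below are made-up illustration values, not measurements from any of these apps.

```python
def overall_speedup(fraction_affected, local_speedup):
    """Amdahl's law: only `fraction_affected` of the runtime gets faster."""
    unaffected = 1.0 - fraction_affected
    return 1.0 / (unaffected + fraction_affected / local_speedup)


# Hypothetical: 30% of runtime is scalar code that runs 1.4x faster as 64 bit.
gain = overall_speedup(0.30, 1.4)
print(f"overall speedup: {gain:.3f}x")
```

Even a generous gain on the scalar part ends up below 10% overall when most of the time is spent in SIMD code that does not change.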

The technical challenge is working out how to approach that correctly, given that the 64 bit Intel libraries don't contain suitable generically optimised 64 bit variants. For a 64 bit build, at the moment it looks as though a worthy option to try will be bypassing the issues by using the newest FFTW, with automatic AVX support included, ironically built with the Intel compiler, & using generic arch:SSE3 optimisations for the core, with an arch:SSE2 path for some earlier chips. Where I talk about convergence is that it's looking as though i3-i7 might also work well under that arrangement, so only poking around as Raistmer's V7 updates are merged in will really point to the best methods.

All I can say with absolute certainty at this point is that separate builds for every chip, or even every class of chips, would not be a sustainable approach, though it worked well in the past. Even 'simply' rebuilding the full AKv8b2 set for the bugfix was far too much work for that amount of subtle difference. That knowledge could & should be embedded in one app per platform instead, as stock does.

Jason

Mike:
10% isn't too shabby IMHO.

Jason G:

--- Quote from: Mike on 04 Jan 2012, 08:20:26 pm ---10% isn't too shabby IMHO.
--- End quote ---

In micro-architectural optimisation terms it's a bucketload. For more impact, it's quite possible that newer FFTW may fly on that chip compared to Intel's x86 SSE3 (Pentium 4!!!!) library that appears to work best on it now. We'll just have to make the comparisons easy with switches or similar, then hard-wire defaults to suit the findings later.
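
A rough sketch of what such a switch could look like, in Python for brevity. The option name `--fft` and the backend labels are hypothetical, not existing app options; the default simply stands in for whatever the comparisons end up favouring.

```python
import argparse

# Hypothetical backend labels; the default would be hard-wired from findings.
FFT_BACKENDS = ("ipp-sse3", "fftw")
DEFAULT_BACKEND = "ipp-sse3"


def parse_args(argv):
    """Parse a comparison switch; with no switch the default is used."""
    parser = argparse.ArgumentParser(description="FFT backend switch (sketch)")
    parser.add_argument(
        "--fft",
        choices=FFT_BACKENDS,
        default=DEFAULT_BACKEND,
        help="FFT library to use for this run",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--fft", "fftw"])
    print("using FFT backend:", args.fft)
```

Testers can then run the same work unit with each setting and compare timings, and changing the default later is a one-line edit.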

Jason
