AVX Optimized App Development
Frizz:
--- Quote from: Raistmer on 14 Feb 2011, 04:53:54 pm ---@Frizz
Check your arithmetic.
SSE allows only 4 floats per register, not 8.
--- End quote ---
Darn ... I had 4 first, then later modified it to 8 ... got the number of registers confused with the number of floating-point values per register ;)
Point is:
- AVX (Intel flavour) doesn't double the number of operations, it only doubles the register width (128 -> 256 bits)
- AVX (AMD flavour) allows the 256-bit unit to be split into two 128-bit halves, so it effectively doubles the number of operations performed in parallel compared to SSE.
Raistmer:
Maybe; I haven't looked into the AVX ISA yet, I'm just reading and making corrections ;)
Jason G:
--- Quote from: Frizz on 14 Feb 2011, 04:57:57 pm ---
--- Quote from: Raistmer on 14 Feb 2011, 04:53:54 pm ---@Frizz
Check your arithmetic.
SSE allows only 4 floats per register, not 8.
--- End quote ---
Darn ... I had 4 first, then later modified it to 8 ;)
Point is:
- AVX (Intel flavour) doesn't double the number of operations, it only doubles the register width (128 -> 256 bits)
- AVX (AMD flavour) allows the 256-bit unit to be split into two 128-bit halves, so it effectively doubles the number of operations performed in parallel compared to SSE.
--- End quote ---
Which is what I am saying code dependencies prevent in legacy SSE code, unless the chip has a special magic loop unroller that changes the number of loop iterations.
Raistmer:
AFAIK outlaw has posted an AVX build on the SETI forums.
But I haven't seen any benchmarks so far... This "just rebuild" approach could at least give a starting point, but for now we don't even have that.
Frizz:
--- Quote from: Jason G on 14 Feb 2011, 05:04:54 pm ---Which is what I am saying code dependencies prevent in legacy SSE code, unless the chip has a special magic loop unroller that changes the number of loop iterations.
--- End quote ---
I am aware that the code needs (more) hand optimization, ifdefs for AVX, Intel, AMD, etc. ... and that we don't get this for free (the magic loop unroller you mentioned *g*).
Point is:
- It won't matter for Intel AVX (we still only have 4 operations in parallel)
- It might (will imho) matter for AMD AVX (we will have 8 operations in parallel)
No?