AVX Optimized App Development

Frizz:

--- Quote from: Raistmer on 14 Feb 2011, 04:53:54 pm ---@Frizz
Check your arithmetic.
SSE allows only 4 float instructions per register, not 8.

--- End quote ---

Darn ... I had 4 first, then later modified it to 8 ... got confused with number of registers vs. floating point numbers per register ;)

Point is:

- AVX (Intel flavour) doesn't double the number of operations - it only doubles the width of the registers (128 -> 256 bits)

- AVX (AMD flavour) allows splitting, so it effectively doubles the number of operations performed in parallel compared to SSE.
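
For reference, the 4-vs-8 arithmetic from the correction above can be written down as a minimal C sketch. The helper name `lanes_per_register` is made up for illustration. It only counts how many elements fit in one register (128 bits for SSE, 256 bits for AVX) and says nothing about how many of those lanes a particular Intel or AMD chip actually processes per cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many elements of elem_bytes bytes fit in one
   vector register that is register_bits wide.
   register_bits: 128 for SSE (XMM), 256 for AVX (YMM). */
static size_t lanes_per_register(size_t register_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0);                    /* avoid division by zero */
    return register_bits / (elem_bytes * 8);   /* 8 bits per byte */
}
```

With 4-byte floats, `lanes_per_register(128, sizeof(float))` gives 4 and `lanes_per_register(256, sizeof(float))` gives 8.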

Raistmer:
Maybe. I haven't looked into the AVX ISA yet, I'm just reading and making corrections ;)

Jason G:

--- Quote from: Frizz on 14 Feb 2011, 04:57:57 pm ---
--- Quote from: Raistmer on 14 Feb 2011, 04:53:54 pm ---@Frizz
Check your arithmetic.
SSE allows only 4 float instructions per register, not 8.

--- End quote ---

Darn ... I had 4 first, then later modified it to 8  ;)

Point is:

- AVX (Intel flavour) doesn't double the number of operations - it only doubles the width of the registers (128 -> 256 bits)

- AVX (AMD flavour) allows splitting, so it effectively doubles the number of operations performed in parallel compared to SSE.

--- End quote ---

Which is what I am saying code dependencies prevent in legacy SSE code, unless the chip has a special magic loop unroller that will change the number of loop iterations.
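
A rough sketch of the loop-iteration point, in plain C with no intrinsics so it builds anywhere. The function `add_in_chunks` is hypothetical, and the inner loop over `lanes` elements stands in for a single vector instruction. Legacy SSE code effectively has `lanes` fixed at 4 in the source, so the iteration count is baked in. Getting half as many iterations means rewriting or rebuilding the loop with a step of 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: a[i] += b[i] for n floats, processed in chunks
   of `lanes` elements. Each chunk stands in for one vector instruction.
   Returns the number of chunk iterations the main loop executed. */
static size_t add_in_chunks(float *a, const float *b, size_t n, size_t lanes)
{
    size_t iterations = 0;
    size_t i = 0;

    assert(lanes > 0);                      /* a zero step would never finish */

    for (; i + lanes <= n; i += lanes) {    /* main "vector" loop */
        for (size_t k = 0; k < lanes; ++k)  /* one vector op's worth of work */
            a[i + k] += b[i + k];
        ++iterations;
    }
    for (; i < n; ++i)                      /* scalar tail for leftovers */
        a[i] += b[i];

    return iterations;
}
```

For 16 floats this runs 4 iterations with `lanes` set to 4 and 2 iterations with `lanes` set to 8. The step is chosen by whoever wrote the loop, which is why the existing SSE code path does not get wider on its own.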

Raistmer:
AFAIK outlaw made an AVX build on the SETI forums.
But I haven't seen any benchmarks so far... This "just rebuild" approach could give us a starting point at least, but for now we don't even have that.

Frizz:

--- Quote from: Jason G on 14 Feb 2011, 05:04:54 pm ---Which is what I am saying code dependencies prevent in legacy SSE code, unless the chip has a special magic loop unroller that will change the number of loop iterations.

--- End quote ---

I am aware of the fact that the code needs (more) hand optimization, ifdefs for AVX, Intel, AMD, etc. ... and that we don't get this for free (the magic loop unroller that you mentioned *g*).
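
A minimal sketch of what such an ifdef could look like, assuming a compiler that predefines `__AVX__` and `__SSE__` when those instruction sets are enabled (GCC and Clang do this with `-mavx` / `-msse`). `VEC_LANES` and `sum_floats` are made-up names, and the per-lane accumulators are plain C standing in for real intrinsics. Telling Intel from AMD is not something the preprocessor can do, so that part would presumably need a runtime CPU check and is left out here.

```c
#include <stddef.h>

/* Pick the chunk width at compile time from the compiler's predefined macros. */
#if defined(__AVX__)
#define VEC_LANES 8   /* 256-bit registers: 8 floats */
#elif defined(__SSE__)
#define VEC_LANES 4   /* 128-bit registers: 4 floats */
#else
#define VEC_LANES 1   /* scalar fallback */
#endif

/* Hypothetical example: sum n floats using VEC_LANES independent partial
   sums, the way a vectorized reduction keeps one partial sum per lane. */
static float sum_floats(const float *x, size_t n)
{
    float acc[VEC_LANES] = {0};
    float total = 0.0f;
    size_t i = 0;

    for (; i + VEC_LANES <= n; i += VEC_LANES)
        for (size_t k = 0; k < VEC_LANES; ++k)
            acc[k] += x[i + k];

    for (size_t k = 0; k < VEC_LANES; ++k)   /* combine the partial sums */
        total += acc[k];
    for (; i < n; ++i)                       /* leftover elements */
        total += x[i];

    return total;
}
```

The same source then builds as a 4-wide or 8-wide variant depending on the compiler flags, which is roughly the "just rebuild" starting point. Note that changing the lane count changes the order of the float additions, so results can differ in the last bits between builds.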

Point is:

- It won't matter for Intel AVX (we still only have 4 operations in parallel)

- It might (will imho) matter for AMD AVX (we will have 8 operations in parallel)

No?
