And here is the initial part of the HD6950 curve for the ffa_block=ffa_block_fetch parameters. Again, it's clear that the minimal default params are not good for this GPU: the FFA kernels need a larger domain size. The red vertical line marks the point where each CU would have 1 wavefront. Actually, because the workgroup size of all kernels != 64, at that point some CUs will have no wavefronts at all and some will have 4 waves.
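To make the "some CUs idle, some with 4 waves" point concrete, here is a rough sketch of the arithmetic. It is not taken from the app's code: the 22 CU count for the HD6950, the workgroup size of 256 (which is what 4 waves of 64 work-items implies) and the round-robin placement of workgroups onto CUs are all assumptions for illustration, and the real hardware scheduler may place workgroups differently.

```python
import math

WAVEFRONT_SIZE = 64  # work-items per wavefront on this GPU generation


def waves_per_cu(num_cus, workgroup_size, total_waves=None):
    """Return a list with the number of wavefronts landing on each CU.

    Hypothetical helper. Assumes a whole workgroup always runs on one CU
    and that workgroups are handed out round-robin (an assumption, not
    the documented scheduler behaviour).
    """
    if total_waves is None:
        # The "red line" point: as many wavefronts as there are CUs.
        total_waves = num_cus
    # A workgroup is split into this many wavefronts.
    waves_per_group = math.ceil(workgroup_size / WAVEFRONT_SIZE)
    # Global size has to be a multiple of the workgroup size, so round up.
    num_groups = math.ceil(total_waves / waves_per_group)
    per_cu = [0] * num_cus
    for group in range(num_groups):
        per_cu[group % num_cus] += waves_per_group
    return per_cu


if __name__ == "__main__":
    layout = waves_per_cu(num_cus=22, workgroup_size=256)  # assumed values
    print("waves per CU:", layout)
    print("idle CUs:", layout.count(0))
```

Under these assumptions only a handful of CUs get work at the red line, each with 4 waves, and the rest stay empty. With a workgroup size of 64 the same function gives exactly 1 wave on every CU, which is the case the red line would describe.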