Loading APU to the limit: performance considerations

Raistmer:
The same data on a single graph for direct comparison.

At full CPU load the A10-5700 is more(!) than twice as slow as the i5-3470 on the PG MultiBeam set :-\
It is hardly half the price, nor does it draw half the power... Having 2 of these APUs and only a single Ivy Bridge, I feel disappointed.

Raistmer:
And the C-60 Loveland data.

The picture here is quite different (vs Trinity). CPU load scaling is almost linear, and the GPU almost doubles total device throughput/performance.
Load on the CPU part has practically no effect on the GPU part, and vice versa.
Though the C-60 is quite a weak device per se (it's a netbook APU), its CPU and GPU parts truly complement each other.

Raistmer:
Thanks to EdwardPF's hint about his own config, I'll continue exploring AMD's Bulldozer CPU architecture (with an FPU and L1 caches shared between each pair of "cores"), using the CPU part of the Trinity A10-5700 APU as the example.
To explore the influence of affinity, I will return to the older methodology, where the additional load comes not from background BOINC processes but from multiple bench instances.
Each bench instance will be pinned to a single logical CPU (different CPUs will be used for different runs).
To avoid overloading the system, I'll first test with only 2 simultaneous tasks. So in some runs the tasks will share the same "module" of 2 "cores", and in others they will be placed on cores in different modules.
Another change is Marco Franceschini's FFTW 3.3.5 x64 AVX DLL binary (thanks for it!), which was tested and does generate AVX codelets on this hardware (the "stock" x64 DLL generates only SSE2/register ones).
To keep the load continuous, each benchmark ended with a pair of renamed full BLC tasks (so the last PG VHAR could be partially paired with that BLC). As before, CPU throughput will be measured in PS_set/s units.
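
A minimal sketch of how the pairs of logical CPUs for such runs could be enumerated and sorted into "same module" and "different modules" groups. It assumes that logical CPUs 2k and 2k+1 belong to the same module; that numbering is an assumption and should be checked on the actual system. The function names and the bitmask helper are hypothetical and are not part of the bench tools used in this thread.

```python
from itertools import combinations

# Assumption: logical CPUs 2k and 2k+1 share one Bulldozer-style module.
CORES_PER_MODULE = 2


def module_of(cpu):
    """Return the index of the module that a logical CPU belongs to."""
    return cpu // CORES_PER_MODULE


def affinity_mask(cpu):
    """Return a bitmask with only the bit for the given logical CPU set."""
    return 1 << cpu


def pinning_plan(n_cpus=4):
    """Group every pair of logical CPUs by whether the pair shares a module."""
    plan = {"same_module": [], "different_modules": []}
    for a, b in combinations(range(n_cpus), 2):
        key = "same_module" if module_of(a) == module_of(b) else "different_modules"
        plan[key].append((a, b))
    return plan


if __name__ == "__main__":
    for group, pairs in pinning_plan(4).items():
        for a, b in pairs:
            # One bench instance per mask, e.g. passed to an affinity-setting tool.
            print(group, (a, b), hex(affinity_mask(a)), hex(affinity_mask(b)))
```

With 4 logical CPUs this gives 2 same-module pairs and 4 different-module pairs, so a run matrix can cover both placements.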

Raistmer:
I have finally processed all the data.
It's amazing how repeatable those benches were.
There are 3 clearly separated groups of results:
pinned to different modules, pinned to the same Bulldozer module, and not pinned (a superposition of those 2 states).
The size of the performance drop when both tasks are pinned to a single module doesn't allow calling the CPUs within one module "cores": they are too interdependent for that. The overall picture resembles a 2-core CPU with a kind of hyperthreading much more than a 4-core one.
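
The "superposition" reading of the unpinned group can be made concrete with a simple linear mixing model. This is only a sketch under an assumption not stated in the thread: that the unpinned throughput is a time-weighted average of the two pinned throughputs. The function name is hypothetical, and the numbers in the usage line are placeholders, not measured results.

```python
def same_module_fraction(t_diff, t_same, t_unpinned):
    """Estimate the share of time an unpinned pair spends in one module.

    Assumes (hypothetically) a linear mix:
        t_unpinned = (1 - f) * t_diff + f * t_same
    where t_diff and t_same are the throughputs (e.g. in PS_set/s) measured
    with the pair pinned to different modules and to the same module.
    """
    if t_diff <= t_same:
        raise ValueError("expected higher throughput when pinned to different modules")
    frac = (t_diff - t_unpinned) / (t_diff - t_same)
    # Clamp to [0, 1] so measurement noise cannot produce an impossible share.
    return min(1.0, max(0.0, frac))


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from these benches.
    print(same_module_fraction(t_diff=1.0, t_same=0.7, t_unpinned=0.9))
```

For comparison, if the scheduler placed 2 tasks on 4 logical CPUs uniformly at random, 2 of the 6 possible pairs would share a module, i.e. a fraction of 1/3.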

So, for the SETI project, Bulldozer-based AMD CPUs deliver badly impaired performance.

Mike:

Don't forget you have a first-generation APU based on Bulldozer.
Kaveri and Carrizo APUs have better management.
Same for FX CPUs.
Steamroller-based CPUs are better than Bulldozer.
