GPU/CPU performance dependence on AR value
Raistmer:
I did some extensive benchmarking of artificial tasks that differ only in AR value.
The GPU/CPU performance ratio depends on AR non-monotonically, as shown in the next picture:
That is, not only VLARs but all tasks with AR greater than 1.127 (to the precision of this experiment) suffer a performance drop when executed on the GPU, relative to their execution on the CPU.
The VLAR limit can also be refined from this graph.
It's worth doing GPU->CPU "rebranding" for low and high ARs, while doing CPU->GPU "rebranding" for midrange ARs (see the sketch below).
This can substantially increase host performance.
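A minimal sketch of that assignment rule, assuming the 1.127 upper bound measured in this experiment; the VLAR cutoff below is a placeholder that would need to be read off the graph:

```python
# Illustrative AR-based "rebranding" rule (a sketch, not an existing tool).
AR_VLAR = 0.05   # hypothetical VLAR cutoff -- refine from the graph
AR_HIGH = 1.127  # upper bound measured in this experiment

def preferred_device(ar: float) -> str:
    """Assign a task to CPU or GPU based on its AR value."""
    if ar < AR_VLAR or ar > AR_HIGH:
        return "CPU"  # low (VLAR) and high ARs run relatively better on CPU
    return "GPU"      # midrange ARs are the GPU's most effective range
```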
(For the performance ratio estimation, two metrics were used:
1) the ratio of CPU elapsed time to GPU elapsed time;
2) the ratio of CPU elapsed time to (GPU elapsed time + CPU time of the GPU task).
I think the second metric is more accurate, because when the GPU task actively uses the CPU it leaves the GPU idle and interferes with the task being processed on the CPU, lowering not only its own performance but the other task's performance too. That's why I sum the GPU task's total elapsed time and its CPU time.)
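Expressed as code, the two metrics look like this (a sketch; all times are assumed to be in the same units):

```python
def metric1(cpu_elapsed: float, gpu_elapsed: float) -> float:
    """Metric 1: CPU elapsed time / GPU elapsed time."""
    return cpu_elapsed / gpu_elapsed

def metric2(cpu_elapsed: float, gpu_elapsed: float,
            gpu_task_cpu_time: float) -> float:
    """Metric 2: also charges the GPU task for the CPU time it consumes,
    since that time stalls both the GPU and the CPU-side task."""
    return cpu_elapsed / (gpu_elapsed + gpu_task_cpu_time)
```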
Gecko_R7:
Your results show a valid reason for better task management & WU allocation/assignment between CPU & GPU.
A "legitimate" use for cherry-picking, so to speak, that would be a net (+) for the project as well as the user.
This is a good study & direction to explore further.
Josef W. Segur:
It would be good to be more specific about what CPU and GPU are being compared. I believe that the general shape of the curves might apply to all such pairs, but a slow CPU and fast GPU or vice versa would make a significant difference in the ratios. IOW, I think there could be systems where it is always better to do the work on GPU even if that requires most of the CPU power to support, and also systems where the GPU provides only some small productivity increment because both are used but GPU takes a negligible amount of CPU support.
Joe
Raistmer:
--- Quote from: Josef W. Segur on 17 Apr 2009, 04:47:11 pm ---It would be good to be more specific about what CPU and GPU are being compared. I believe that the general shape of the curves might apply to all such pairs, but a slow CPU and fast GPU or vice versa would make a significant difference in the ratios. IOW, I think there could be systems where it is always better to do the work on GPU even if that requires most of the CPU power to support, and also systems where the GPU provides only some small productivity increment because both are used but GPU takes a negligible amount of CPU support.
Joe
--- End quote ---
The CPU and GPU are listed in the picture footnotes.
Just duplicating that here:
Host: CPU Q9450 stock freq, GPU 9600GSO 705/1900/2220 MHz
Completely agreed, the absolute numbers will be different for different GPU/CPU pairs, but I think that for any given CPU/GPU pair one range of ARs will be better done on the GPU and another on the CPU.
Moreover, these AR ranges will be the same as for my config. That is, though the Y-axis values depend on the system used in the test, the X-axis depends only on the architectural differences between GPU and CPU themselves and on the current CPU and CUDA implementations of the SETI MB algorithm. One could renormalize the Y-axis to 1 and obtain a normalized GPU/CPU performance ratio picture.
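That renormalization is straightforward; a sketch, assuming the measured curve is stored as an AR -> ratio mapping:

```python
def normalize(ratios: dict[float, float]) -> dict[float, float]:
    """Rescale a measured GPU/CPU ratio curve so its peak equals 1,
    giving a system-independent shape (under the assumption above that
    the AR break points don't move between hosts)."""
    peak = max(ratios.values())
    return {ar: r / peak for ar, r in ratios.items()}
```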
About extreme cases:
1) The GPU is very slow, the CPU is very fast. One has the choice of not using the GPU at all, but if it is used, it's better to use it in its most effective range of ARs. This range separation will still apply.
2) The GPU(s) is(are) very fast, the CPU is very slow. Well, this scenario can lead to a situation where no tasks are assigned to the CPU at all; all the CPU does is feed the GPU(s). No rebranding is needed, but the CPU doesn't process any tasks of its own.
In all other cases, this AR separation between CPU and GPU should increase host performance.
Richard Haselgrove:
Can I urge a note of caution in re-branding VHAR tasks from GPU to CPU? You'll need to put some alternative means of cache size limitation in place too.
Everybody who's been around for more than a few months will know, by experience, that Matt is quite capable of serving up a 'shorty storm' - injudicious choice of tapes can mean that all, or a very high proportion, of tasks come into the VHAR category.
Consider what would happen. BOINC requests CUDA work: gets mainly VHAR. Re-brand those to the CPU queue: the CUDA queue is now light on work, and will be re-filled. Rinse and repeat.
You'll end up with a huge number of VHAR queuing for the CPU(s). BOINC will run them in High Priority, of course, but that's no guarantee that they can all be processed within 7 days. I'm not exactly certain whether CPU High Priority would inhibit CUDA work fetch for the project - I suspect not, and I think that most people here would argue that it shouldn't. So there would be no inhibition on runaway cache filling except splitter capacity and host daily quota. Anyone got a CPU that can process 1,000 tasks a day, even shorties?
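One possible guard, purely as a sketch (the 7-day figure mirrors the deadline mentioned above; nothing here is an existing BOINC or rebranding-tool feature): stop rebranding to the CPU once its queue already exceeds what it can clear before the deadline.

```python
DEADLINE_SECONDS = 7 * 24 * 3600  # 7-day deadline from the post above

def may_rebrand_to_cpu(cpu_queue_seconds: float,
                       cpu_usable_fraction: float) -> bool:
    """cpu_queue_seconds: estimated CPU time already queued;
    cpu_usable_fraction: share of wall time the CPU can give its own
    queue (less than 1.0 when some cores are busy feeding the GPU)."""
    return cpu_queue_seconds < DEADLINE_SECONDS * cpu_usable_fraction
```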
I'm seeing this in a limited way with VLARs. I was away over the holiday weekend, so I set my cache high so I could re-brand any VLARs before I left. 8 days later, one box still has over 50 VLARs from that initial batch slowly chugging through the CPU. They won't be a problem (03 May deadline, and I have plenty of spare firepower on the Quad if things ever get tight), but it's an example of the tight squeezes you can get into with re-branding.
Given the propensity we've seen here for SETIzens to download anything in sight with go-faster stripes on, and turn all the knobs up to 11 without reading the instruction manual first, some sort of limitation - especially for the slow CPU / fast GPU mis-matched hosts - would avoid a lot of disappointment.