It would be good to be more specific about what CPU and GPU are being compared. I believe that the general shape of the curves might apply to all such pairs, but a slow CPU and fast GPU or vice versa would make a significant difference in the ratios. IOW, I think there could be systems where it is always better to do the work on GPU even if that requires most of the CPU power to support, and also systems where the GPU provides only some small productivity increment because both are used but GPU takes a negligible amount of CPU support. Joe
...Joe, if you're watching, is there any way of knowing how many times these big pulse PoT arrays are processed during a run at different ARs? If there's a massive array at 40960, but it's only run once, the problem is much less than the arrays at 32768 which seem to run forever at VLAR....
FFTLen    Stepsize   NumCfft
------  ----------   -------
     8   17.072753        11
    16    8.536377        23
    32    4.268188        47
    64    2.134094        93
   128    1.067047       187
   256    0.533524       375
   512    0.266762       749
  1024    0.133381      1499
  2048    0.066690      2999
  4096    0.033345      5997
  8192    0.016673     11995
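The Stepsize column appears to scale exactly as 1/FFTLen: each doubling of the FFT length halves the step, and NumCfft roughly doubles to cover the same span. A minimal Python sketch (my reading of the table, taking only the first row as input) reproduces the column:

```python
# Reconstructing the Stepsize column under the assumption (mine, from
# inspecting the table) that stepsize scales exactly as 1/FFTLen.
BASE_FFTLEN = 8
BASE_STEPSIZE = 17.072753  # first row of the table above

def stepsize(fftlen: int) -> float:
    """Stepsize for a given FFT length, assuming exact 1/FFTLen scaling."""
    return BASE_STEPSIZE * BASE_FFTLEN / fftlen

for n in range(11):  # FFTLen 8 .. 8192
    fftlen = BASE_FFTLEN << n
    print(f"{fftlen:6d}  {stepsize(fftlen):10.6f}")
```

The printed values match the table to the six decimal places shown, which suggests the scaling really is a strict inverse relationship rather than something measured per run.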
FFTLen  AR<=0.05       AR=0.08         AR=0.16
------  -------------  --------------  -------------
     8  462(@40960)
    16  1035(@40960)   2070(@20480)
    32  1457(@32768)   4371(@20480)    8742(@10240)
    64  5859(@16384)   17577(@10240)   35154(@5120)
   128  23749(@8192)   71247(@5120)    142494(@2560)
   256  95625(@4096)   286875(@2560)   573750(@1280)
   512  382739(@2048)  1148217(@1280)  2296434(@640)
  1024  1.53e6(@1024)  4.6e6(@640)     9.2e6(@320)
  2048  6.14e6(@512)   1.84e7(@320)    3.68e7(@160)
  4096  2.46e7(@256)   7.37e7(@160)    1.47e8(@80)
  8192  9.83e7(@128)   2.95e8(@80)     5.9e8(@40)
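This table bears on the earlier question about how often the big pulse PoT arrays actually get touched. Taking the AR<=0.05 column at face value (my interpretation, which may be wrong: the first number is the total volume of pulse-PoT data processed at that FFT length, and the @ value is the PoT array length), a quick sketch shows where the VLAR pulse work concentrates:

```python
# AR<=0.05 column from the table above, as (FFTLen, elements, PoT_len).
# "elements" = total pulse-PoT data volume at that FFT length and
# "@" = PoT array length -- both interpretations are assumptions of mine.
vlar = [
    (8,        462, 40960),
    (16,      1035, 40960),
    (32,      1457, 32768),
    (64,      5859, 16384),
    (128,    23749,  8192),
    (256,    95625,  4096),
    (512,   382739,  2048),
    (1024,  1.53e6,  1024),
    (2048,  6.14e6,   512),
    (4096,  2.46e7,   256),
    (8192,  9.83e7,   128),
]

total = sum(n for _, n, _ in vlar)
for fftlen, n, potlen in vlar:
    print(f"FFTLen {fftlen:5d} (PoT @{potlen:5d}): {100 * n / total:8.4f}% of total")
```

If that reading is right, the massive @32768 and @40960 arrays account for only a tiny fraction of the total VLAR data volume, while the short @128 PoTs at FFTLen 8192 dominate; the big arrays may still be expensive per invocation, but they are not where the bulk of the data goes.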
Also, another thought. Several people (myself included) have observed and reported that VHAR tasks don't scale well on multi-core hosts: there's a big penalty for running 8 x VHAR on my dual Xeon (memory bus saturation, even with quad-channel FB-DIMMs, we think). Your efficiency cross-over point may be different if you measure with all CPU cores saturated with VHAR work - which would likely be the case after re-branding.
Raistmer, could you provide the raw data from which your chart was derived? I'd like to be able to correlate things better. Joe
That is, no such action should be taken for a 10-day cache at all. Better would be for such rebranding to happen on a regular basis, fairly often but in pretty small chunks, so as not to skew BOINC's work-amount estimate too much each time.
...In order to re-brand tasks, you have to shut down and restart the BOINC core client: and that's one of the most inefficient operations around....
The first one I tried, BOINC (as expected) immediately fetched more work to re-fill the gaps in the CUDA queue caused by the re-branding. I got mostly VLAR. So I re-branded that, and BOINC replaced it - with VHAR. So I re-branded that, and .... you get the picture.