It would be good to be more specific about which CPU and GPU are being compared. I believe the general shape of the curves might apply to all such pairs, but a slow CPU with a fast GPU (or vice versa) would make a significant difference in the ratios. In other words, I think there could be systems where it is always better to do the work on the GPU even if that requires most of the CPU power to support it, and also systems where the GPU provides only a small productivity increment: both are used, but the GPU takes a negligible amount of CPU support.
The CPU and GPU are listed in the picture footnotes.
Joe
...Yes, it is possible. First, here's a table which applies to all LAR pulse finding:
Joe, if you're watching, is there any way of knowing how many times these big pulse PoT arrays are processed during a run at different ARs? If there's a massive array at 40960, but it's only run once, the problem is much less than the arrays at 32768 which seem to run forever at VLAR.
...
FFTLen    Stepsize   NumCfft
     8   17.072753        11
    16    8.536377        23
    32    4.268188        47
    64    2.134094        93
   128    1.067047       187
   256    0.533524       375
   512    0.266762       749
  1024    0.133381      1499
  2048    0.066690      2999
  4096    0.033345      5997
  8192    0.016673     11995
FFTLen   AR<=0.05        AR=0.08         AR=0.16
------   -------------   -------------   -------------
     8   462(@40960)
    16   1035(@40960)    2070(@20480)
    32   1457(@32768)    4371(@20480)    8742(@10240)
    64   5859(@16384)    17577(@10240)   35154(@5120)
   128   23749(@8192)    71247(@5120)    142494(@2560)
   256   95625(@4096)    286875(@2560)   573750(@1280)
   512   382739(@2048)   1148217(@1280)  2296434(@640)
  1024   1.53e6(@1024)   4.6e6(@640)     9.2e6(@320)
  2048   6.14e6(@512)    1.84e7(@320)    3.68e7(@160)
  4096   2.46e7(@256)    7.37e7(@160)    1.47e8(@80)
  8192   9.83e7(@128)    2.95e8(@80)     5.9e8(@40)
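To make Joe's table machine-usable (e.g. as input to a rebranding heuristic), here's a minimal Python sketch encoding the AR<=0.05 column; the dict name and helper function are mine, but the numbers come straight from the table above:

```python
# Lookup from the AR<=0.05 column of the table above:
# FFTLen -> (pulse finds per task, PoT array length).
VLAR_PULSE_FINDS = {
    8:    (462,        40960),
    16:   (1035,       40960),
    32:   (1457,       32768),
    64:   (5859,       16384),
    128:  (23749,      8192),
    256:  (95625,      4096),
    512:  (382739,     2048),
    1024: (1_530_000,  1024),
    2048: (6_140_000,  512),
    4096: (24_600_000, 256),
    8192: (98_300_000, 128),
}

def pulse_work(fft_len):
    """Return (number of pulse finds, PoT length) at AR <= 0.05."""
    return VLAR_PULSE_FINDS[fft_len]

# The 32768-length PoT arrays run 1457 times per task, not once - which
# is why they dominate VLAR runtime far more than the 40960 arrays.
print(pulse_work(32))
```

This also answers the "how many times are these big arrays processed" question numerically: the counts grow roughly 4x per FFTLen doubling while the PoT length halves.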
Yes, multi-core (and multi-GPU) considerations can complicate this picture considerably.
Also, another thought. Several people (myself included) have observed and reported that VHAR tasks don't scale well on multi-core hosts: there's a big penalty for running 8 x VHAR on my dual Xeon (memory bus saturation, even with quad-channel FB-DIMMs, we think). Your efficiency cross-over point may be different if you measure with all CPU cores saturated with VHAR work - which would likely be the case after re-branding.
Sure, will mail it to you.
Raistmer, could you provide the raw data from which your chart was derived? I'd like to be able to correlate things better.
Joe
That is, no such action should be taken for a 10-day cache at all. And it would be better for such rebranding to occur on a regular basis, fairly often but in pretty small chunks, so as not to distort BOINC's work-amount estimate too much each time.
...
In order to re-brand tasks, you have to shut down and restart the BOINC core client: and that's one of the most inefficient operations around.
...
The first one I tried, BOINC (as expected) immediately fetched more work to re-fill the gaps in the CUDA queue caused by the re-branding. I got mostly VLAR. So I re-branded that, and BOINC replaced it - with VHAR. So I re-branded that, and .... you get the picture.
It's not necessary to rebrand all tasks; some limit can be set. In short, within the boundaries of the deadline such a script would improve performance. That is, our goal will be to write such a script and ensure that its operation will not lead to deadline misses (for a more or less conscientious user; even app_info can be a devastating weapon in the hands of a "no-clue" user...)
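The "limit plus deadline safety" idea might look like this in Python. Everything here is an illustrative assumption, not something from the thread: the cap of 5 tasks, the one-day slack margin, and the dict layout for parsed `<result>` entries are all mine.

```python
import time

def select_for_rebrand(results, max_rebrands=5, margin_secs=86400):
    """Pick at most max_rebrands CUDA results whose report_deadline
    leaves at least margin_secs of slack, so that moving them to the
    (slower on this host) CPU app cannot cause a deadline miss.

    `results` is a list of dicts with 'name', 'plan_class' and
    'report_deadline' (Unix time) keys, as parsed from client_state.xml.
    """
    now = time.time()
    candidates = [r for r in results
                  if r['plan_class'] == 'cuda'
                  and r['report_deadline'] - now > margin_secs]
    # Rebrand the tasks with the most slack first.
    candidates.sort(key=lambda r: r['report_deadline'], reverse=True)
    return candidates[:max_rebrands]
```

Capping the batch size is what keeps BOINC's debt/estimate bookkeeping from being distorted too much in one go, per the "small chunks" suggestion above.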
Do I understand correctly that only the yellow lines should be changed for CPU<->GPU rebranding?
<rubbish>
...
</rubbish>
<file_info>
<name>27dc08ab.32733.481.6.8.9</name>
<url>http://boinc2.ssl.berkeley.edu/sah/download_fanout/f1/27dc08ab.32733.481.6.8.9</url>
<md5_cksum>d8bf53ae5251691603446976bd9e757d</md5_cksum>
<nbytes>375323</nbytes>
</file_info>
<workunit>
<rsc_fpops_est>23780000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>237800000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>33554432.000000</rsc_memory_bound>
<rsc_disk_bound>33554432.000000</rsc_disk_bound>
<name>27dc08ab.32733.481.6.8.9</name>
<app_name>setiathome_enhanced</app_name>
<file_ref>
<file_name>27dc08ab.32733.481.6.8.9</file_name>
<open_name>work_unit.sah</open_name>
</file_ref>
</workunit>
...
<file>
<WU>
<file>
<WU>
...
<file_info>
<name>27dc08ab.32733.481.6.8.9_1_0</name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>65536</max_nbytes>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
<xml_signature>
b849d6e0adcc332ad1601a97d75f4d073ea633ae16b00663e6aa98bac0477c08
3742e0330aae2deee62f2406ddcd1020b3ff02e3cf6f7f77482a97dbc453a489
21fe18199095dda88f172da2d97b1d1cddff23272c832be8e44ba10b38212700
0e5ff950052f3a870c850bb3efa7cefcee57ce02ddcb6473d55526a34ba2dc4f
.
</xml_signature>
</file_info>
<result>
<report_deadline>1240773971</report_deadline>
<wu_name>27dc08ab.32733.481.6.8.9</wu_name>
<name>27dc08ab.32733.481.6.8.9_1</name>
<file_ref>
<file_name>27dc08ab.32733.481.6.8.9_1_0</file_name>
<open_name>result.sah</open_name>
</file_ref>
<platform>windows_intelx86</platform>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
</result>
...
<file>
<result>
<file>
<result>
...
<rubbish>
...
</rubbish>
Fred's script just replaces "608" with "603" in both the <workunit> and <result> sections (matching ones, of course), and deletes the <plan_class>cuda</plan_class> line completely - making it look exactly like a 603 directly allocated to the CPU by the server. That seems the simplest solution: but I'm intrigued by Josef's suggestion. That might be worth a look.
Yes, mine too.
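That recipe is simple enough to sketch in a few lines of Python. The function name and the assumption that you feed it the text of one matched `<result>` or `<workunit>` block are mine; the two edits themselves (608 -> 603, drop the plan_class line) are exactly as described above. Remember the core client must be shut down before touching client_state.xml.

```python
def rebrand_result(block):
    """Apply the 608 -> 603 rebranding recipe to the text of one
    <result> (or <workunit>) block taken from client_state.xml,
    making a CUDA task look like a CPU-allocated 6.03 task."""
    block = block.replace('<version_num>608</version_num>',
                          '<version_num>603</version_num>')
    # Delete the <plan_class>cuda</plan_class> line completely.
    block = '\n'.join(line for line in block.splitlines()
                      if '<plan_class>cuda</plan_class>' not in line)
    return block
```

A real script would first locate the matching `<workunit>` and `<result>` blocks by task name before rewriting them, and would only run while BOINC is stopped.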
Identify a WU for re-branding by <rsc_fpops_est> - this is a VHAR
How can the AR be matched with this field value? Does any table or formula exist?