Seti@Home optimized science apps and information

Optimized Seti@Home apps => Windows => Topic started by: Geek@Play on 23 Aug 2010, 11:27:32 am

Title: Question about flops
Post by: Geek@Play on 23 Aug 2010, 11:27:32 am
If I understand correctly, the SETI servers are attempting to calculate a flops value for each of our science apps and pass that value along with the work unit.

We are attempting to calculate our own flops value to enter into the app_info.xml file for each science app so that estimated crunch times display correctly.

Shouldn't these two flops values be the same, and why are we duplicating our efforts?
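
To make the question concrete: in a simplified model of the BOINC client, a task's runtime estimate is its rsc_fpops_est divided by the app_version's flops, multiplied by the project's duration correction factor (DCF). The sketch below assumes that model and nothing more; the function names are hypothetical and the numbers are made up. It shows the estimate and how a <flops> value can be backed out of one observed run so the estimate comes out right with DCF = 1.0.

```python
"""Simplified model of a client runtime estimate (assumption: the client uses
rsc_fpops_est / flops * DCF and nothing else). Names are hypothetical."""


def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float = 1.0) -> float:
    """Seconds the client expects a task to take: work / speed, corrected by DCF."""
    return rsc_fpops_est / flops * dcf


def flops_from_observed(rsc_fpops_est: float, elapsed_seconds: float) -> float:
    """The <flops> value that would have made the estimate exact at DCF = 1.0."""
    return rsc_fpops_est / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    fpops = 3.0e13      # rsc_fpops_est carried by the work unit
    observed = 1500.0   # seconds the task actually took on this host
    flops = flops_from_observed(fpops, observed)
    print(f"<flops>{flops:.0f}</flops>")
    print(estimated_runtime(fpops, flops))
```

A single observed task is noisy; averaging over several tasks of the same app would give a steadier figure.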
Title: Re: Question about flops
Post by: Jason G on 23 Aug 2010, 12:05:57 pm
AFAIK that's correct, though I mostly leave that kind of bizarre BOINC design exploration to Joe, who I'm sure will chip in. We all have our strengths and he's just good at that stuff ;)

There are apparently some server changes still to be applied which will display such a figure on the application details page. Once they're applied, you should be able to use the figure for each application as its <flops> entry. My understanding is that this should be a reasonable approximation for keeping your project DCF around 1.0, though I need a much more stable and independent characterisation for my needs down the road, not least to keep this GTX 480 fed stably.
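
A rough sketch of why an accurate <flops> keeps the DCF near 1.0. It models the DCF as quick to rise when a task overruns its estimate and slow to fall when a task finishes early. The 1.1 threshold, the 0.9/0.1 weights and the 0.01 to 100 clamp are assumptions chosen for illustration, not values taken from the BOINC source, and the function name is hypothetical.

```python
"""Simplified DCF model (assumption: rise fast, fall slowly; constants are
illustrative, not BOINC's)."""


def update_dcf(dcf: float, raw_estimate: float, actual: float) -> float:
    """Return the new DCF after one task finishes.

    raw_estimate is rsc_fpops_est / flops with no DCF applied;
    actual is the observed runtime in seconds.
    """
    raw_ratio = actual / raw_estimate          # the DCF that would have been exact
    if actual / (raw_estimate * dcf) > 1.1:    # underestimated: catch up at once
        new = raw_ratio
    else:                                      # overestimated or close: ease down
        new = 0.9 * dcf + 0.1 * raw_ratio
    return min(max(new, 0.01), 100.0)          # assumed clamp range


if __name__ == "__main__":
    # With an accurate <flops>, raw_estimate equals actual, so the DCF decays toward 1.0.
    dcf = 1.5
    for _ in range(50):
        dcf = update_dcf(dcf, raw_estimate=1000.0, actual=1000.0)
    print(round(dcf, 3))
```

The asymmetry reflects the usual reasoning that overestimating runtime is safer than underestimating it; with a wrong <flops>, the same rule simply settles the DCF at whatever ratio the error implies.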

The server-side changes should eventually keep things in the ballpark, but I don't expect them to cope particularly well with environmental changes such as hardware upgrades. All of that remains to be seen, I guess, and I'd like to be proven wrong. I've had an issue from the start with decoupling estimates from computational complexity models (as used in computer science) and replacing them with averages, but it should at least stabilise to something fit for purpose.

Jason

Title: Re: Question about flops
Post by: Josef W. Segur on 23 Aug 2010, 05:09:49 pm
For those running stock, the servers supply a complete app_version including a flops value, but running anonymous platform for a project makes the core client use app_version information from the app_info.xml rather than from the servers. A sched_reply just says which of the existing app_versions to use.

Obviously the core client could be modified to accept flops from the servers, but while that change was gradually being adopted there would be some clients which could make use of a server-specified flops and some which couldn't. There are already some server-side actions which depend on what version of the core client sent the request, so it's not impossible, just messy. Dr. Anderson decided that the server scaling should be applied to rsc_fpops_est, since that didn't require any core client change at all.
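
A sketch of the split described above, under stated assumptions: on anonymous platform the flops divisor comes from app_info.xml (falling back, as assumed here for CPU apps, to the host's CPU benchmark when no <flops> is given), so the only lever left to the server is a multiplier on rsc_fpops_est. The class, function and parameter names are hypothetical.

```python
"""Where the runtime-estimate inputs come from on anonymous platform
(simplified model; names are hypothetical)."""

from dataclasses import dataclass
from typing import Optional


@dataclass
class AppVersion:
    """Hypothetical stand-in for an <app_version> loaded from app_info.xml."""
    app_name: str
    flops: Optional[float]  # None models an entry without <flops>


def client_flops(av: AppVersion, benchmark_flops: float) -> float:
    """Divisor the client uses: the app_version's own <flops> if present,
    otherwise (assumed here, for CPU apps) the host's CPU benchmark."""
    return av.flops if av.flops is not None else benchmark_flops


def client_estimate(base_fpops: float, server_scale: float, av: AppVersion,
                    benchmark_flops: float, dcf: float) -> float:
    """Runtime estimate for an anonymous-platform host.

    The server cannot change av.flops, so its only lever is to multiply
    the work unit's rsc_fpops_est by server_scale before sending it.
    """
    rsc_fpops_est = base_fpops * server_scale
    return rsc_fpops_est / client_flops(av, benchmark_flops) * dcf


if __name__ == "__main__":
    mb = AppVersion("mb", flops=None)                 # no <flops> in app_info.xml
    print(client_estimate(4e13, 1.0, mb, 2e9, 1.0))   # unscaled work estimate
    print(client_estimate(4e13, 0.25, mb, 2e9, 1.0))  # server scaled fpops down 4x
```

Scaling rsc_fpops_est by 0.25 has the same effect on the estimate as quadrupling flops, which is why the server-side route needs no client change.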

I consider the <flops> entries in app_info.xml an opportunity to control how the core client interacts with the S@H servers. Had we been able to get those entries right before the servers started supplying scaled estimates, IMO the only remaining problems would be from rescheduling work CPU <-> GPU.  Well, a new optimized application with a huge speedup may cause some difficulty too...

Granted, it does also add to the difficulty of what we're doing here, because we're supplying binaries without requiring any kind of test to actually determine how the target system will respond to those binaries. The general intent of the anonymous platform mechanism was that users running a platform for which the project doesn't supply an application could build and test their own offline, after which a suitable app_info.xml could be written. We're outside that intent, so have to do what we can to provide a usable app_info.xml.

For those upgrading from stock, it might be reasonable to just parse the existing app_version sections in client_state.xml and put those into a generated app_info.xml with suitable modifications, including adjusting the flops upward by whatever factor we feel our applications are faster than stock on average. For those upgrading from earlier optimized versions, it may be possible to do something similar but it's difficult to adequately define all the variants which might need special handling.
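
A minimal Python sketch of the first suggestion, assuming client_state.xml parses as ordinary XML under a <client_state> root and that the wanted app names are known in advance. It copies the matching stock <app_version> blocks into a new <app_info> root and multiplies each <flops> by an assumed average speedup. A usable app_info.xml also needs <app> and <file_info> sections and <file_ref> entries pointing at the optimized binaries; those are deliberately left out, and the app names and numbers in SAMPLE are illustrative only.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable

# Illustrative client_state.xml excerpt; real files contain much more.
SAMPLE = """<client_state>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <flops>2000000000.000000</flops>
  </app_version>
  <app_version>
    <app_name>astropulse_v505</app_name>
    <version_num>505</version_num>
    <flops>2000000000.000000</flops>
  </app_version>
</client_state>"""


def build_app_info(client_state_xml: str, app_names: Iterable[str],
                   speedup: float) -> ET.Element:
    """Return an <app_info> element holding scaled copies of matching app_versions."""
    wanted = set(app_names)
    state = ET.fromstring(client_state_xml)
    app_info = ET.Element("app_info")
    for av in state.iter("app_version"):
        if av.findtext("app_name") not in wanted:
            continue                      # not one of the apps being replaced
        new_av = copy.deepcopy(av)        # leave the parsed client_state untouched
        flops = new_av.find("flops")
        if flops is not None and flops.text:
            # Assumed average speedup of the optimized app over stock.
            flops.text = f"{float(flops.text) * speedup:.6f}"
        app_info.append(new_av)
    return app_info


if __name__ == "__main__":
    root = build_app_info(SAMPLE, ["setiathome_enhanced"], speedup=1.5)
    print(ET.tostring(root, encoding="unicode"))
```

Filtering by app_name is a simplification that assumes no other attached project uses the same app names. A single speedup factor per app is also a compromise: the real gain varies with CPU, GPU and task type.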
                                                                                      Joe
Title: Re: Question about flops
Post by: Geek@Play on 23 Aug 2010, 07:45:01 pm
Silly me.............I thought there was a simple answer!   8)
Title: Re: Question about flops
Post by: Jim_S on 26 Aug 2010, 03:22:16 pm
Is there any need to change anything if you are only doing CPU crunching with an OPTI APP? :P
Title: Re: Question about flops
Post by: Josef W. Segur on 26 Aug 2010, 09:48:45 pm
If you're only doing CPU MB and no Astropulse, the system will work fine. That actually applies to any project where you're using only a single application: the DCF can be reasonably stable without flops. But even doing both MB CPU and Astropulse CPU at S@H is likely to be somewhat problematic. The relative rsc_fpops_est values produced by the splitters were calibrated at SETI Beta before the first release of Astropulse, so running the combination without flops makes AP estimates high. Then, for hosts which have done more than 10 MB tasks but fewer than 10 AP tasks, the unscaled AP estimates make the ratio even worse. The long AP estimates inhibit work fetch, a nuisance magnified by 3-day outages.
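
The effect can be illustrated with hypothetical numbers, assuming a single project-wide DCF applied to both applications. Suppose MB raw estimates come out 2x too long and AP raw estimates 4x too long. A DCF settled by MB work fixes MB but still leaves AP at twice its real runtime; once MB's rsc_fpops_est is server-scaled and the DCF drifts back toward 1.0, the still-unscaled AP estimate is 4x too long.

```python
"""One project-wide DCF shared by two apps whose raw estimates are wrong by
different factors. All numbers are hypothetical."""


def shown_estimate(raw_estimate: float, dcf: float) -> float:
    """Estimate the client displays: raw estimate times the shared DCF."""
    return raw_estimate * dcf


# Hypothetical: MB raw estimates are 2x too long, AP raw estimates 4x too long.
mb_raw, mb_actual = 7200.0, 3600.0
ap_raw, ap_actual = 160000.0, 40000.0

# A DCF settled by a stream of MB tasks ends up at the MB ratio...
dcf = mb_actual / mb_raw
# ...which is right for MB but leaves AP estimated at twice its real runtime.
ap_shown = shown_estimate(ap_raw, dcf)
print(ap_shown / ap_actual)

# After the server scales MB's rsc_fpops_est, MB's raw estimate is already
# right, so the DCF drifts back toward 1.0; AP's rsc_fpops_est is still unscaled.
dcf_after_scaling = 1.0
print(shown_estimate(ap_raw, dcf_after_scaling) / ap_actual)
```

No single DCF value can be right for both apps here, which is the gap that per-app <flops> entries close.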

For those of us using relatively modest hardware, even the AP situation can be handled fairly easily when it arises by editing the AP task's rsc_fpops_est in client_state.xml to make the runtime estimate about right (in effect doing what the servers will do once they have 10 validations averaged). That's how I intend to deal with it on my Win98SE host. It's running an old BOINC to stay compatible with my Win95 host, so neither supports flops in an app_version.
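
A small sketch of the arithmetic for that hand edit. It assumes an older client without app_version flops estimates runtime as rsc_fpops_est / p_fpops * DCF, where p_fpops is the host's floating-point benchmark from <host_info> and the DCF is the project's <duration_correction_factor>, both read from client_state.xml. The function name and host figures are hypothetical. BOINC should be shut down before client_state.xml is edited, since the client rewrites that file while running.

```python
"""Pick an rsc_fpops_est that makes an older client show a target runtime.
Assumption: estimate = rsc_fpops_est / p_fpops * DCF."""


def fpops_for_target(target_seconds: float, p_fpops: float, dcf: float) -> float:
    """Invert the assumed estimate formula to solve for rsc_fpops_est."""
    return target_seconds * p_fpops / dcf


if __name__ == "__main__":
    # Hypothetical host: 1.0e9 benchmark, DCF 0.5, AP task expected to take 50 hours.
    value = fpops_for_target(50 * 3600, 1.0e9, 0.5)
    print(f"<rsc_fpops_est>{value:.6f}</rsc_fpops_est>")
```

The target runtime would come from the host's own history, such as the elapsed times of previously completed AP tasks.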
                                                                                    Joe