When corrupted results get validated...
Richard Haselgrove:
Could I ask all Lunatics to review http://setiathome.berkeley.edu/forum_thread.php?id=62573 ?
The old problem of false -9 overflows, caused by outdated applications running on Fermi GPUs, is still with us, and still polluting the database with junk science.
Back in June when this thread was started, the major problem was the stock applications. We got the project to clean up its act, and the stock apps are now working properly for the 'set-and-forget' types.
Which means that the remaining problems are attributable, almost exclusively, to third-party applications: in the linked thread, that means V12 vlar-autokill.
As I've written in that thread, vlar-autokill was a fit and proper app in its time, and I have no criticism of Raistmer for releasing it. But it's now an embarrassment.
So, how can we cork the beastie back into its bottle? Unfortunately, I can't think of a way - short of the nuclear option of the project blocking all anonymous platform apps. Maybe some bespoke programming could be put into the scheduler to selectively block known bad app/hardware pairings, but I can't imagine project staff being happy about diverting scarce development time into doing that.
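To make the "selectively block known bad app/hardware pairings" idea concrete, here is a minimal sketch of what such a check might look like. It is not BOINC scheduler code: the names `BLOCKED_PAIRINGS`, `HostRequest` and `may_send_work`, and the identifier strings, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical blocklist of (application, GPU family) pairs known to
# produce junk results. The one entry mirrors the pairing discussed in
# this thread; the exact identifier strings are assumptions.
BLOCKED_PAIRINGS = {("v12_vlar_autokill", "fermi")}


@dataclass(frozen=True)
class HostRequest:
    """What a host reports when asking for work (illustrative fields only)."""
    app_name: str
    gpu_family: str


def may_send_work(request: HostRequest) -> bool:
    """Return False when the host's app/hardware pairing is on the blocklist."""
    key = (request.app_name.lower(), request.gpu_family.lower())
    return key not in BLOCKED_PAIRINGS
```

The same app on older hardware would still pass, so only the bad combination is refused rather than every anonymous platform app.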
But I think there are two things we ought to consider doing.
The first is to be much harder on the ill-informed message board "advisors" - people like Sutaru and skildude - who advocate optimised applications as the cure-all for everything, but consistently fail to pass on the associated responsibility for understanding and long-term monitoring.
And secondly, how about building a 'suicide pill' into the apps themselves? Maybe in the first place for Beta apps - nobody should run a Beta app for longer than, say, one month: and if they are still actively testing after that time, an enforced re-install isn't too much of a problem.
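As a rough illustration of the one-month 'suicide pill', a build could compare the current date against a date stamped in at release time. This is only a sketch under assumptions: `BUILD_DATE`, `BETA_LIFETIME`, `beta_expired` and `check_or_exit` are hypothetical names, the date is made up, and the check trusts the system clock, which a determined user could change.

```python
from datetime import date, timedelta

# Hypothetical values: the release date would be stamped in at build time.
BUILD_DATE = date(2010, 12, 1)
BETA_LIFETIME = timedelta(days=30)  # "say, one month"


def beta_expired(today, build_date=BUILD_DATE, lifetime=BETA_LIFETIME):
    """Return True once the build is older than its allowed lifetime."""
    return today > build_date + lifetime


def check_or_exit(today=None):
    """Refuse to run an expired Beta build, forcing a re-install."""
    if beta_expired(today or date.today()):
        raise SystemExit("This Beta build has expired; please install a current release.")
```

An app would call `check_or_exit()` once at startup, before processing any task.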
The trouble is that I don't think that anything short of a physical block (suicide pills are common for trialware) will catch the sort of users I've linked in that thread - no message board activity, no team membership. And I'm sorry, but NO: I'm not going to start sending out unsolicited PMs and emails.
The thing that worries me most of all is that I can't see users like that coming here and collecting optimised apps on their own (even though they would at least see the warnings if they did). I'm beginning to wonder how many re-hosting websites there might be out there - overclockers, BOINC team sites, that sort of thing - which might be distributing Lunatics apps with no 'best practice' advice whatsoever.
Postscript: while previewing, I saw that the previous post in this thread concerned the very same host 5293938 that also featured today. So that's over four months the problem has continued unchecked.
Jason G:
Perhaps moving to a new version/plan class, and disabling work for all existing ones, is an option. That would force a stock app update, with a matching plan class update for newer opt apps. I don't know enough about the BOINC app distribution mechanism to know if and how that would work.
I wouldn't mind developing an autoupdater for future production releases here. There will be those that will still circumvent both the stock and opt updates anyway, some 'legitimately', some to defiantly run what they want regardless. The science process itself needs to catch these with the validation and quota mechanisms (and the subsequent science process, of course), since user-specific configuration and 'jiggering' might be considered as having similar destructive potential to anything from a cosmic ray bit-flip to a massive hardware failure. That goes for any app, not just GPUs. I'm sure there are brand machines that just shouldn't crunch at all, and people that just should not be allowed near computers. Unfortunately we're not the PC police, though maybe we should be ;)
Promoting the use of outdated, known-buggy builds, old drivers and outright 'jiggering' has gone on in the past. Especially when directed toward inexperienced users, I've always found it more than a bit frustrating, and I had to put a stop to it on one specific occasion here. In that instance a massive argument ensued and only ended with me banning the user to think about it, which sadly escalated the argument, forcing the Admin's hand (not mine) to permanently delete the user's account. Along with security concerns, that also resulted, in part, in the tightening of beta participation requirements and restrictions here to a more select group.
While we aren't the computer police, we don't have to put up with bad advice here, and can do our best to correct faulty advice where we spot it, and try to come up with ways to encourage doing the right thing. Unfortunately, in the case of problems inherited from the Fermi incompatibility, I don't see a lot of ways to encourage that other than simply making newer releases better, more widely compatible, AND faster, which is proving to be quite a long road.
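For readers unfamiliar with the idea of a quota catching bad hosts, the following is a toy sketch of one possible rule: halve a host's daily task allowance on each invalid result and let it recover on valid ones. The function `update_quota`, the halving/doubling rule and the cap of 100 are assumptions for illustration, not a description of how BOINC actually computes quotas.

```python
def update_quota(quota: int, result_valid: bool, max_quota: int = 100) -> int:
    """Adjust a host's daily task quota after one validated result.

    Hypothetical rule: double on a valid result (up to max_quota),
    halve on an invalid one (never below 1).
    """
    if result_valid:
        return min(max_quota, quota * 2)
    return max(1, quota // 2)


# A host returning nothing but junk quickly drops to a trickle of work.
quota = 100
for _ in range(5):
    quota = update_quota(quota, result_valid=False)
```

A rule like this only helps if the junk results are actually marked invalid; when two faulty hosts validate against each other, as in the cases this thread is about, the quota never drops.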
Jason
Raistmer:
I agree, we are not the computer police, nor M$, and sometimes our development time is scarce too, btw ;)
Effectively disabling malfunctioning participants is BOINC's (I repeat, BOINC's, not the project staff's) prerogative. We need a framework for doing common things with it, not just the bloatware that new BOINC versions resemble more and more. Sometimes I seriously think about writing a Perl script to process all tasks in a directory without BOINC, and launching BOINC only for network communications.
If plan class/version limits are not effective, new means should be integrated into BOINC, IMO. It's impossible to create an app that will work on every piece of hardware, including hardware that doesn't even exist yet, where some idiots would like it to run. I truly can't understand how someone can use a non-Fermi-compatible app on a Fermi GPU if it gives errors almost constantly. Maybe people just never look at the results page?...
About the "suicide pill": if it's implemented as a library with an interface that is easy enough to use, I'm ready to include it in my builds. But I have no intention of developing such a thing.
P.S. And about bad advice... It's a real problem, IMHO, but recently I've just got tired of arguing with bad advice. I just try to give a more correct answer to the original poster, without getting into discussions with the uneducated but active ones. Life is short....
Jason G:
--- Quote from: Raistmer on 30 Dec 2010, 12:36:22 pm ---About the "suicide pill": if it's implemented as a library with an interface that is easy enough to use, I'm ready to include it in my builds. But I have no intention of developing such a thing.
--- End quote ---
I'll 'consider' it as a possibility for a future release, though it doesn't, of course, solve either the outdated releases or the intentionally 'jiggered' environments, so I'm approaching the whole idea with some scepticism (probably a good thing).
--- Quote ---I truly can't understand how someone can use a non-Fermi-compatible app on a Fermi GPU if it gives errors almost constantly. Maybe people just never look at the results page?...
--- End quote ---
Yes, partly that. And now add to that certain people espousing overriding the stock app with stock Cuda23 (via app_info), insisting that's the fastest... got it? (The problem: logical conclusions evolving from that may include that v12 VLArKill would be a good idea to use on Fermi... IT ISN'T, for anyone reading: don't do it!)
Raistmer:
BTW, if I recall right, there was a similar problem with the Einstein project and one of Akosf's opt builds (not sure I've reproduced the nickname right; Akos). It failed to process new data correctly.
The same problem also appeared at least once on MW, again with a third-party app.
What did they (the project admins) do in those cases? Maybe the SETI project can learn from it?