
When corrupted results get validated...


sunu:

--- Quote from: Jason G on 31 Dec 2010, 06:40:11 pm ---Yes, the more I think about it, falling back to the slowest, most reliable & proven possible code for -9's and other obvious problems seems like the best way (for the moment) to enforce some kind of sanity.  I don't mind the extra work for that kind of development, so will gear up in that direction as I move toward adding performance improvements we already isolated.

Jason

--- End quote ---

I don't think I like it. So if we're in the middle of a high-AR storm, the optimized app will be slower even than the stock app, since the work will be done twice? Unless I misunderstood.

perryjay:
Sunu,
as I understand it, a bad -9 overflow only runs for a few seconds. What is being talked about is falling back to the CPU to try running it again, just like the tasks that give out-of-memory messages. Though I am probably wrong about that. It will only affect -9s, and will keep a faulty machine from sending in hundreds of them. Those of us with clean-running machines shouldn't have any problem with this approach.

Jason G:

--- Quote from: sunu on 31 Dec 2010, 07:17:56 pm ---I don't think I like it. So if we're in the middle of a high-AR storm, the optimized app will be slower even than the stock app, since the work will be done twice? Unless I misunderstood.

--- End quote ---

Lol, no, I wouldn't bother going to the effort if it were going to make regular crunching slower  ;).  I would of course just throw a hard error code instead (which likewise avoids contaminating the results, but damages quota & wastes crunch time in another way).

For the most part we're really talking about properly handling situations that shouldn't ever occur on properly configured, intact hardware.  The genuine -9's are the exception, for which at most the one CFFT pair where the overflow appeared, rather than the whole task, would be reprocessed (fractions of a second, rather than hundreds of seconds).
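To make that concrete, here is a minimal, purely illustrative C++ sketch of the idea as described above: run the fast path normally, and only when a single CFFT pair trips a sanity check, re-run that one pair through a slow reference path. This is not the actual MultiBeam code; every type and function name (CfftPair, run_fast_cfft, run_reference_cfft, looks_pathological) and the overflow threshold of 30 signals are assumptions made for illustration.

--- Code: ---
// Hypothetical sketch only -- not SETI@home MultiBeam source.
#include <cstdio>
#include <vector>

struct CfftPair { int fft_len; int chirp_index; };   // one chirp / FFT-length combination
struct SignalCounts { int spikes; int pulses; };

// Stand-ins for the optimized and the slow-but-trusted analysis of one CFFT pair.
SignalCounts run_fast_cfft(const CfftPair&)      { return {1, 0}; }
SignalCounts run_reference_cfft(const CfftPair&) { return {1, 0}; }

// Crude sanity check: a genuinely noisy task can overflow, but a single pair
// suddenly producing a flood of signals is suspicious.
bool looks_pathological(const SignalCounts& c) { return c.spikes + c.pulses > 8; }

int main() {
    std::vector<CfftPair> plan = {{8, 0}, {16, 0}, {16, 1}};
    int total_signals = 0;
    const int overflow_limit = 30;   // assumed "-9 result_overflow" threshold

    for (const CfftPair& pair : plan) {
        SignalCounts c = run_fast_cfft(pair);
        if (looks_pathological(c)) {
            // Redo only this pair on the slow, proven path (fractions of a second),
            // instead of reprocessing the whole task or throwing a hard error.
            c = run_reference_cfft(pair);
        }
        total_signals += c.spikes + c.pulses;
        if (total_signals >= overflow_limit) {
            std::puts("genuine -9 overflow: reporting and stopping early");
            return 0;
        }
    }
    std::puts("task completed normally");
    return 0;
}
--- End code ---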

Miep:
Who's keeping the list?

[Edit: Oh sorry, all already there ::) missed the latest list over the holidays... - still wondering about the 6.02 though]

I think I found two more hosts with V12 on a GTX460 after a complaint of inconclusives against GPU on NC.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5305178
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5257703

Pulling some more from the database; probably duplicates from when we last checked.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5293938
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5472266
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5149058


Also, host 5508489 is running '6.02'? http://setiathome.berkeley.edu/result.php?resultid=1766879380
And doing inconclusives against x32f - just found another host with 6.02. Ouch.

Also quite a few very different counts between x32f and 6.09 - how often should that happen?! I'd better stop looking through inconclusives now...

Richard Haselgrove:
Joe Segur posted a list in Number Crunching - I've linked to it from the new thread. I think all your Fermis are already known, though 6.02 is a new (or newly identified) problem.

Edit - the app details for http://setiathome.berkeley.edu/host_app_versions.php?hostid=5508489 indicate it's actually running stock v6.03. Do I vaguely remember that Eric forgot to bump the internal version number on that build, just as stock v6.10 Fermi reports v6.09 in stderr_txt? In any event, although the host clearly has problems, it isn't a misuse of anonymous platform that's causing it.
