Author Topic: When corrupted results get validated...  (Read 70994 times)

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: When corrupted results get validated...
« Reply #90 on: 31 Dec 2010, 07:17:56 pm »
Yes, the more I think about it, falling back to the slowest, most reliable & proven code possible for -9's and other obvious problems seems like the best way (for the moment) to enforce some kind of sanity.  I don't mind the extra work for that kind of development, so I will gear up in that direction as I move toward adding the performance improvements we already isolated.

Jason

I don't think I like it. So if we are in the middle of a high AR storm, the optimized app will be slower even than the stock app, since the work will be done twice? Unless I didn't understand well.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: When corrupted results get validated...
« Reply #91 on: 31 Dec 2010, 07:34:09 pm »
Sunu,
as I understand it, a bad -9 overflow only runs for a few seconds. What is being talked about is falling back to the CPU to try running it again, just like those that give out-of-memory messages. Though I am probably wrong about that. It will only affect -9s and will keep a faulty machine from sending in hundreds of them. Those of us with clean-running machines shouldn't have any problem with this approach.

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: When corrupted results get validated...
« Reply #92 on: 31 Dec 2010, 07:35:47 pm »
I don't think I like it. So if we are in the middle of a high AR storm, the optimized app will be slower even than the stock app, since the work will be done twice? Unless I didn't understand well.

Lol, no, I wouldn't bother going to the effort if it was going to make regular crunching slower  ;).  Otherwise I would of course just throw a hard error code instead (which likewise avoids contaminating the results, but damages quota & wastes crunch time in another way).

For the most part we're really talking about properly handling situations that shouldn't ever occur on properly configured, intact hardware.  The genuine -9's are the exception: for those, at most the one whole CFFT pair where the overflow appeared, rather than the whole task, would be reprocessed (fractions of a second, rather than hundreds of seconds).
« Last Edit: 31 Dec 2010, 07:47:36 pm by Jason G »
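
A minimal sketch of the fallback idea described above, not the actual app code. It assumes a task can be split into independent CFFT pairs, that a suspected overflow shows up as the signal count passing a fixed limit, and that two interchangeable code paths exist. The names `process_with_fallback`, `fast`, `reliable` and the value of `MAX_SIGNALS` are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed per-task signal limit, used here only for illustration.
MAX_SIGNALS = 30


def process_with_fallback(
    pairs: Iterable[int],
    fast: Callable[[int], List[str]],
    reliable: Callable[[int], List[str]],
    limit: int = MAX_SIGNALS,
) -> Tuple[List[str], bool]:
    """Run every CFFT pair on the fast path; redo only a suspect pair."""
    signals: List[str] = []
    for pair in pairs:
        found = fast(pair)
        if len(signals) + len(found) > limit:
            # Suspected overflow: reprocess this one pair with the
            # slow, trusted path instead of rerunning the whole task.
            found = reliable(pair)
        signals.extend(found)
        if len(signals) > limit:
            # The trusted path also overflows, so treat it as genuine.
            return signals[:limit], True
    return signals, False
```

Only the pair where the overflow appears is processed twice, so a clean host pays nothing extra, and a genuine -9 costs one additional pair rather than a second full run. This sketch only catches overflows; a fast path that silently reports too few signals would need a different check.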

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: When corrupted results get validated...
« Reply #93 on: 07 Jan 2011, 05:30:16 am »
Who's keeping the list?

[Edit: Oh sorry, all already there ::) missed the latest list over the holidays... - still wondering about the 6.02 though]

I think I found two more hosts with V12 on a GTX460 after a complaint of inconclusives against GPU on NC.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5305178
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5257703

pulling some more from the database, probably duplicates from when we last checked.
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5293938
5472266
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5149058


Also host 5508489 is running '6.02' ?????? http://setiathome.berkeley.edu/result.php?resultid=1766879380
And getting inconclusives against x32f - just found another host with 6.02. Ouch.

Also quite a few very different counts between x32f and 6.09 - how often should that happen?! I'd better stop looking through inconclusives now...
« Last Edit: 07 Jan 2011, 05:43:08 am by Miep »
The road to hell is paved with good intentions

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #94 on: 07 Jan 2011, 05:49:34 am »
Joe Segur posted a list in number crunching - I've linked from the new thread. I think all your Fermis are already known, though 6.02 is a new (or newly identified) problem.

Edit - the app details for http://setiathome.berkeley.edu/host_app_versions.php?hostid=5508489 indicate it's actually running stock v6.03. Do I vaguely remember that Eric forgot to bump the internal version number on that build, just as stock v6.10 Fermi reports v6.09 in stderr_txt? In any event, although the host clearly has problems, it isn't a mis-use of anonymous platform that's causing it.
« Last Edit: 07 Jan 2011, 05:54:56 am by Richard Haselgrove »

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #95 on: 07 Jan 2011, 05:55:18 am »
Also quite a few very different counts between x32f and 6.09 - how often should that happen?! I'd better stop looking through inconclusives now...

Hehe :)
Usually we all stop looking for inconclusives right after an app release.... And maybe that's very bad practice :)

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #96 on: 07 Jan 2011, 05:58:36 am »
And more seriously - we have some fancy statistics from the SETI servers, but a few very important pieces are missing completely.
For example, counters that describe the inconclusive and invalid rates per host per app version.
If we had those, we could do app "profiling" at quite a different level of quality.
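
A rough sketch of what such counters could look like, assuming task outcomes were available as simple `(host_id, app_version, outcome)` records. The SETI servers expose no such feed as far as this thread says, so the record layout, the outcome labels and the function name `outcome_rates` are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[int, str, str]  # (host_id, app_version, outcome), assumed layout


def outcome_rates(records: Iterable[Record]) -> Dict[Tuple[int, str], dict]:
    """Tally outcomes per (host, app version) and turn them into rates."""
    counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for host_id, app_version, outcome in records:
        counts[(host_id, app_version)][outcome] += 1

    rates = {}
    for key, tally in counts.items():
        total = sum(tally.values())
        rates[key] = {
            "total": total,
            # Counter returns 0 for outcomes that were never seen.
            "inconclusive_rate": tally["inconclusive"] / total,
            "invalid_rate": tally["invalid"] / total,
        }
    return rates
```

Keying on the pair rather than on the host alone is what would separate a bad build from a bad machine: one host running two app versions gets two independent rows.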

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: When corrupted results get validated...
« Reply #97 on: 07 Jan 2011, 06:04:33 am »
Joe Segur posted a list in number crunching - I've linked from the new thread. I think all your Fermis are already known, though 6.02 is a new (or newly identified) problem.

Edit - the app details for http://setiathome.berkeley.edu/host_app_versions.php?hostid=5508489 indicate it's actually running stock v6.03. Do I vaguely remember that Eric forgot to bump the internal version number on that build, just as stock v6.10 Fermi reports v6.09 in stderr_txt? In any event, although the host clearly has problems, it isn't a mis-use of anonymous platform that's causing it.

Yes, thanks Richard, saw your reply there, that's when I amended my post here.

'That build' has a problem then - there were quite a few CPU to GPU inconclusives over multiple hosts showing up with 6.02 on CPU - crosschecking

OK, difficult to say what it's valid against, with results being purged so quickly atm, but hosts with this build have difficulties against 6.09 and x32f - I've seen valids against V12 :P
Also valids against 6.09 ::). Should have opened a new thread...
« Last Edit: 07 Jan 2011, 06:35:41 am by Miep »

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #98 on: 07 Jan 2011, 06:56:23 am »
Isn't that what we're already talking about in http://lunatics.kwsn.net/gpu-crunching/08jn10ad-4151-19449-3-10-56-test-case.0.html ? (development area link, not available to all)

Offline Miep

  • Global Moderator
  • Knight who says 'Ni!'
  • *****
  • Posts: 964
Re: When corrupted results get validated...
« Reply #99 on: 07 Jan 2011, 07:40:37 am »
If that's stock 6.03 with dodgy stderr showing wrong version number... maybe?

Most of the inconclusives are GPU -9 and some diverging signal reports, plus a few where the signals reported match, so is it something the validator checks that isn't in stderr?
Altogether lots of inconclusives from that corner :(

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: When corrupted results get validated...
« Reply #100 on: 08 Jan 2011, 12:13:21 am »
If that's stock 6.03 with dodgy stderr showing wrong version number... maybe?

Yes, Richard recalled correctly; you need to look a few lines above where it says "Application version   SETI@home Enhanced v6.03" to know the actual version number.

IIRC the only difference between 6.02 and 6.03 was an SSE folding variant which had to be commented out because it sometimes crashed.
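
As an illustration of reading the version from the right place, here is a small sketch that pulls the number out of the "Application version" line of a task page saved as text, instead of trusting whatever the build prints in stderr. The page layout is assumed from the line quoted above, and the helper name `actual_app_version` is made up for this example.

```python
import re
from typing import Optional

# Matches e.g. "Application version   SETI@home Enhanced v6.03"
APP_VERSION_RE = re.compile(r"Application version\s+(.+?)\s+v(\d+\.\d+)")


def actual_app_version(task_page_text: str) -> Optional[str]:
    """Return the version from the 'Application version' line, if present."""
    match = APP_VERSION_RE.search(task_page_text)
    return match.group(2) if match else None
```

For a page whose header says v6.03 while stderr says 6.02, this returns "6.03", because only the header line is matched.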

Quote
Most of the inconclusives are GPU -9 and some diverging signal reports, plus a few where the signals reported match, so is it something the validator checks that isn't in stderr?
Altogether lots of inconclusives from that corner :(

Yes, even when running the intended software, the CUDA cards sometimes produce false result_overflow cases. For that matter, some CPU processing does too, though that's fairly rare. I'll attach an archive with text copies of a WU page and its five task detail pages which is mind-boggling and illustrative of the weird things which can happen.

Most inconclusives get resolved with a correct result being assimilated. This thread is about cases which are exceptions to that rule, plus cases where both of the first two results are almost certainly wrong but agree.

The only thing the Validator looks for in stderr is "result_overflow" and that's only used to set a flag when the canonical result is assimilated. Aside from that, stderr could be a quote from Nietzsche and it would make no difference to validation. It's some details of the signals in the uploaded result file which are checked by the Validator.
                                                                                           Joe
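
To make the stderr point concrete, a toy sketch (not the real Validator source) of the only use of stderr described above: a plain substring test that sets a flag. The function name `overflow_flag` is hypothetical.

```python
def overflow_flag(stderr_text: str) -> bool:
    """True if stderr mentions result_overflow; nothing else is inspected."""
    return "result_overflow" in stderr_text
```

Everything else in stderr is ignored, so a wrong version banner there cannot by itself turn a result inconclusive. The comparison that decides validity works on the signals in the uploaded result file, which this sketch does not attempt to model.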
