Author Topic: When corrupted results get validated...  (Read 59101 times)

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
When corrupted results get validated...
« on: 02 Jun 2010, 05:52:39 pm »
... and valid results get thrown out of the window...

See this workunit http://setiathome.berkeley.edu/workunit.php?wuid=609263674

My result was the only valid result among all that garbage, and it was marked as invalid  :(

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #1 on: 02 Jun 2010, 05:58:35 pm »
Hm... it looks like our hope that an incorrect overflow gives fairly random pulses was not fulfilled :(
So such a disrupted GPU state is even more dangerous for the project than was thought before!

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: When corrupted results get validated...
« Reply #2 on: 02 Jun 2010, 06:12:44 pm »
Wish I hadn't seen that. I've got a couple of pendings waiting for a match that look a lot like that.

Offline Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 3111
    • My computers at Seti Beta
Re: When corrupted results get validated...
« Reply #3 on: 03 Jun 2010, 04:14:30 pm »
Here's one on my host: workunit.php?wuid=618018953  :(

Claggy

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: When corrupted results get validated...
« Reply #4 on: 05 Jun 2010, 10:25:19 am »

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: When corrupted results get validated...
« Reply #5 on: 05 Jun 2010, 10:46:03 am »
Unfortunately there are many. Yesterday I had another one, but today it must have been deleted from the database and I can't post the link.

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: When corrupted results get validated...
« Reply #6 on: 05 Jun 2010, 12:11:02 pm »
I think they are trying to keep it a secret!! Looks like as soon as we post a link to one of them they erase it from the database. ;D Must be a conspiracy.

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #7 on: 05 Jun 2010, 12:18:43 pm »
Could people spotting / reporting this problem please check and report the hardware involved?

I have a horrible feeling that people who just throw a Fermi card into a host and attach, are being issued with the stock Cuda23 application and immediately start trashing WUs.

But we need robust reports from reliable witnesses....

Offline Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • *****
  • Posts: 8980
Re: When corrupted results get validated...
« Reply #8 on: 05 Jun 2010, 12:32:33 pm »
...I have a horrible feeling that people who just throw a Fermi card into a host and attach, are being issued with the stock Cuda23 application and immediately start trashing WUs....
  From the few I saw, that was the case (470 & 480 wingmen trashing & validating against one another).  I do suspect that was the 'errors with 2.3' situation that Eric alluded to a while back (around Fermi release IIRC), which might suggest some sort of flag raised somewhere that let him know, like the noisy WUs figure that used to show somewhere (?).  Further conjecturing (& hoping), some double -9 intercept may be in place, explaining why results are removed sooner than the normal 24 hour assimilation/deletion period.  If that's the case, I hope they put those through again for reprocessing.

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #9 on: 05 Jun 2010, 12:59:37 pm »
Eric's comment (he actually said cuda24, which we're taking to be a typing error) was made on 20 May - well, actually late afternoon 19 May in his time zone - in the course of a conversation with David and me about Fermi issues at Beta. It came just after the corrected Fermi 6.10 application version was loaded for stock download at Beta.

The 'noisy WUs' figure is still showing on the Science status page. Since it's "science", I assume it's driven off the validated results transferred from the BOINC to the science database. Historically, it's been 'about 5%'. Last time I looked, it was down to 1.2%, which I took as a compliment to the Radar removal team. Now, it's showing as 4.8%, which probably reflects the scale of the "pseudo -9" problem.

I'm coming to the conclusion that nobody saw this one coming. I certainly hadn't thought about it until this afternoon, and yet I've been working closely with David / Eric / Jason on BOINC+Fermi issues. Even when I told David (much to his surprise) that the Fermi card wouldn't run the cuda23 app at Beta (during the quota overflow discussion), the penny didn't drop that the situation was already building up at Main.

I have now suggested - on boinc_dev, which is the wrong mailing list, but the only one we've got in the absence of an official seti_technical channel - that 6.10_fermi should be installed as a stock application at Main. I think that's the only sensible way to rescue the situation.

Let's hope that no eager young project puppy runs into the lab this afternoon and loads a pristine box of tapes.....

Offline sunu

  • Alpha Tester
  • Knight who says 'Ni!'
  • ***
  • Posts: 771
Re: When corrupted results get validated...
« Reply #10 on: 05 Jun 2010, 01:13:52 pm »
Could people spotting / reporting this problem please check and report the hardware involved?

I have a horrible feeling that people who just throw a Fermi card into a host and attach, are being issued with the stock Cuda23 application and immediately start trashing WUs.

But we need robust reports from reliable witnesses....

Well, some of the reported workunits have a fermi card involved while others don't. The workunit from the first post here had 3-4 corrupted results, only one, if I remember correctly, was from a fermi with a cuda23 app. The other workunit I mentioned above had two 2xx cards involved with massive amounts of corrupted -9 results.

Offline Josef W. Segur

  • Janitor o' the Board
  • Knight who says 'Ni!'
  • *****
  • Posts: 3112
Re: When corrupted results get validated...
« Reply #11 on: 05 Jun 2010, 03:48:53 pm »
Suggestion: make copies of the WU and Task detail pages before BOINC purges them. Even better would be to find examples at SETI Beta where purging is disabled, and probably file deletion too. I started looking there, but no luck in finding any. Besides I got distracted by some of the nonsensical credit granting, one of Tetsuji's hosts recently did a set of reissued 0.448 tasks on 6.09 CUDA 23 with claims of 94.12 and grants ranging from 5.87 to 52.01  :o
Joe

Offline perryjay

  • Knight Templar
  • ****
  • Posts: 427
Re: When corrupted results get validated...
« Reply #12 on: 05 Jun 2010, 03:58:59 pm »
Sorry, I didn't notice what my wingmen were running but I'm pretty sure the WUs were showing as 6.09s. I'm running a GT9500  on my vista X86 machine.


edit: Forgot to add I'm running the renamed cudart32_30_14 and cufft32_30_14 DLLs with the 197.45 driver.
« Last Edit: 05 Jun 2010, 04:11:11 pm by perryjay »

Offline Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • *****
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #13 on: 05 Jun 2010, 04:07:02 pm »
Suggestion: make copies of the WU and Task detail pages before BOINC purges them. Even better would be to find examples at SETI Beta where purging is disabled, and probably file deletion too. I started looking there, but no luck in finding any. Besides I got distracted by some of the nonsensical credit granting, one of Tetsuji's hosts recently did a set of reissued 0.448 tasks on 6.09 CUDA 23 with claims of 94.12 and grants ranging from 5.87 to 52.01  :o
                                                                               Joe

Unfortunately, searching at Beta probably won't turn up many errors, because I've been leaning on David to get them fixed, and this particular problem (issuing work associated with a non-Fermi app, to a Fermi-equipped host) should no longer happen at Beta. There are just a few still visible on 12316.

All that's left is - why am I still trapped by yesterday's quota?
Code:
Max tasks per day 153
Number of tasks today 273

- and why is credit so erratic?

Offline Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • *****
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #14 on: 05 Jun 2010, 05:00:40 pm »
Suggestion: make copies of the WU and Task detail pages before BOINC purges them. Even better would be to find examples at SETI Beta where purging is disabled, and probably file deletion too. I started looking there, but no luck in finding any. Besides I got distracted by some of the nonsensical credit granting, one of Tetsuji's hosts recently did a set of reissued 0.448 tasks on 6.09 CUDA 23 with claims of 94.12 and grants ranging from 5.87 to 52.01  :o
                                                                               Joe
[offtopic]
Credit granting on Beta is absolutely screwed. If you look at granting for AP tasks, it is even more obvious.
[/offtopic]

And on topic: because the first WU listed in this thread did not have two Fermi GPUs, this problem is not only Fermi-related (unfortunately). It looks like _any_ invalid overflow has some probability of being validated  :(
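
A toy sketch of how that can happen, assuming a simplified quorum validator. This is not the actual SETI@home validator code: `results_agree`, `validate`, the tolerance, the signal values and the host names are all hypothetical, chosen only to show that two corrupted overflow results that happen to report the same signals can form a quorum and push out the one good result.

```python
def results_agree(a, b, tol=0.01):
    """Hypothetical comparison: same signal count, and each pair of
    sorted signal values matches within a relative tolerance."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tol * max(abs(x), abs(y), 1.0)
               for x, y in zip(sorted(a), sorted(b)))


def validate(results, min_quorum=2):
    """Pick the largest group of mutually agreeing results.

    Returns (valid, invalid) lists of result names, or ([], []) when
    no group reaches the quorum. Assumed rule, for illustration only.
    """
    best = []
    for signals in results.values():
        group = [name for name, other in results.items()
                 if results_agree(signals, other)]
        if len(group) > len(best):
            best = group
    if len(best) < min_quorum:
        return [], []
    invalid = [name for name in results if name not in best]
    return best, invalid


# One honest result and two corrupted overflows reporting 30 identical
# bogus signals each (made-up numbers).
results = {
    "good_host": [12.5, 40.1, 77.3],
    "bad_gpu_a": [1.0] * 30,
    "bad_gpu_b": [1.0] * 30,
}
valid, invalid = validate(results)
print("valid:", valid)
print("invalid:", invalid)
```

In this toy model the two corrupted results validate against each other and the good result is the one marked invalid. If corrupted overflows produced random pulses, they would not agree with each other and no such quorum would form.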

 
