Topic: When corrupted results get validated...  (Read 70549 times)

Claggy

  • Alpha Tester
  • Knight who says 'Ni!'
  • Posts: 3111
Re: When corrupted results get validated...
« Reply #15 on: 05 Jun 2010, 05:29:44 pm »
I had a look for hosts matched with my E8500 / 9800GTX+ / HD5700 that were producing inconclusive/invalid work:

GeneralFrost hostid=5356245 NVIDIA GeForce GTX 470 (1248MB) driver: 19775

Arles hostid=5355863 NVIDIA GeForce GTX 480 (1503MB) driver: 25715

Balmer hostid=5384948 [2] NVIDIA GeForce GTX 480 (1503MB) driver: 19775

djwhu hostid=5424576 NVIDIA GeForce GTX 480 (1503MB) driver: 19775

Andrew Bazhaw hostid=5423129 NVIDIA GeForce GTX 480 (1503MB) driver: 19775

Ollie hostid=5371034 NVIDIA GeForce GTX 480 (1503MB) driver: 19741

smithwr3 hostid=5293938 [2] NVIDIA GeForce GTX 480 (1493MB) driver: 25715

Chris hostid=5423967 NVIDIA GeForce GTX 480 (1503MB) driver: 25715

Anonymous hostid=4946291 [2] NVIDIA GeForce GTX 275 (895MB) driver: 19745

D. McQueen hostid=4846359 NVIDIA GeForce GTX 260 (895MB) driver: 19713

Rory Isenberg hostid=5255297 NVIDIA GeForce GTX 260 (877MB) driver: 19745

NEG hostid=1931164 [2] NVIDIA GeForce 9600 GT (495MB) driver: 19745

Michael Sangs hostid=5354486 [3] NVIDIA GeForce GTX 295 (895MB) driver: 19107

Tim Lee hostid=5301365 [2] NVIDIA GeForce 9800 GTX/9800 GTX+ (1024MB) driver: 19621

and there were a few more non-Fermi hosts.

Claggy  >:(

Edit: another five Fermis:

Bittkau hostid=5336843 NVIDIA GeForce GTX 480 (1503MB) driver: 19741

William hostid=5414447 NVIDIA GeForce GTX 470 (1248MB) driver: 19775

Anonymous hostid=5419662 NVIDIA GeForce GTX 470 (1248MB) driver: 19745

Setiman hostid=5227589 NVIDIA GeForce GTX 470 (1248MB) driver: 19775

basti84 hostid=5391741 NVIDIA GeForce GTX 470 (1248MB) driver: 25715

Edit 2: added more Fermis:

Aaron Danbury hostid=5373696 NVIDIA GeForce GTX 480 (1503MB) driver: 25715

Anonymous hostid=5025277 NVIDIA GeForce GTX 480 (1503MB) driver: 19741

Edit 3: added more Fermis:

My9t5Talon hostid=5419671 NVIDIA GeForce GTX 470 (1248MB) driver: 19775

simi_id hostid=5419256 NVIDIA GeForce GTX 480 (1503MB) driver: 19741
« Last Edit: 06 Jun 2010, 01:31:40 pm by Claggy »

Jason G

  • Construction Fraggle
  • Knight who says 'Ni!'
  • Posts: 8980
Re: When corrupted results get validated...
« Reply #16 on: 05 Jun 2010, 05:32:07 pm »
Quote from: Claggy
> ...
> Claggy  >:(

I think we should send David A. around to tell them off!  :o


Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #17 on: 05 Jun 2010, 05:39:07 pm »
And the most important thing I see in this list: they are not only FERMI GPUs!
But there can still be 2 independent problems:
1) a corrupted GPU state on pre-FERMI GPUs that produces random pulses, and
2) a programming error in the early CUDA app that produces non-random but invalid pulses on FERMI GPUs.
The second one will always pass into the database if 2 FERMI GPUs with the broken app are paired together.
But the first most probably should not pass the Validator (our database Guardian has in some sense become a Blind Guardian ;D ;D ;D )
The problem with the broken app for FERMI is resolvable. But whether 1) lets invalid results into the database too or not is a hard question that needs more evidence IMO. If yes....  ::)
« Last Edit: 05 Jun 2010, 05:46:52 pm by Raistmer »
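
Raistmer's distinction can be illustrated with a toy model of a quorum-of-two validator. This is a minimal sketch under stated assumptions, not the actual SETI@home validator: `results_agree`, `quorum_of_two`, the tolerance and the three "apps" are all hypothetical. It shows why a deterministic bug slips through when two hosts running the same broken app are paired, while random corruption almost never matches another result.

```python
import random

# Toy model (hypothetical): a result is a tuple of detected signal values.
# Two results "agree" if they report the same number of signals and each
# sorted pair matches within a tolerance. The real validator is more involved.
TOLERANCE = 0.01  # hypothetical comparison tolerance


def results_agree(a, b, tol=TOLERANCE):
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tol for x, y in zip(sorted(a), sorted(b)))


def quorum_of_two(a, b):
    """Return the canonical result if both replicas agree, else None (inconclusive)."""
    return a if results_agree(a, b) else None


TRUE_SIGNALS = (1.25, 3.5, 7.75)  # hypothetical "correct" answer


def good_app():
    return TRUE_SIGNALS


def broken_app():
    # Deterministic bug (hypothetical output): every host running this app
    # returns the SAME wrong answer.
    return (0.0,) * 30


def corrupted_gpu(rng):
    # Random corruption: each run produces a different spurious signal list.
    return tuple(rng.uniform(0, 10) for _ in range(rng.randint(1, 30)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Two broken-app hosts paired together: the wrong answer is validated.
    print("broken + broken ->", quorum_of_two(broken_app(), broken_app()) is not None)
    # Broken app paired with a good app: inconclusive, another replica is needed.
    print("broken + good   ->", quorum_of_two(broken_app(), good_app()) is not None)
    # Two randomly corrupted results: agreement is vanishingly unlikely.
    print("random + random ->", quorum_of_two(corrupted_gpu(rng), corrupted_gpu(rng)) is not None)
```

In this model only the broken-plus-broken pairing reaches the database; a mismatch merely makes the workunit inconclusive and triggers another replica.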

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #18 on: 05 Jun 2010, 05:39:51 pm »
Of the Fermi cards (the first 8 in Claggy's list), seven are running the stock v6.09_cuda23 application.

Just one - smithwr3 - is using an app_info, and he's got one of Raistmer's v12 builds. I don't know enough about the std_err to be able to tell whether it's the special one he did for Fermi, and since smithwr3 hasn't posted in the forums since he was struggling with a Mac almost 4 years ago, there's not much to go on.

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #19 on: 05 Jun 2010, 05:45:59 pm »
Quote from: Richard Haselgrove
> whether it's the special one he did for Fermi, and since smithwr3 hasn't posted in the forums since he was struggling with a Mac almost 4 years ago, there's not much to go on.

It's very unlikely he took that version. And even so, IMO Jason has already shown that the initial CUDA MB code has a programming error that is "silent" on pre-FERMI GPUs but leads to invalid computations on FERMI. I used the same codebase and just rebuilt the app with the new SDK. That is, V12 is in no way FERMI compatible.

perryjay

  • Knight Templar
  • Posts: 427
Re: When corrupted results get validated...
« Reply #20 on: 05 Jun 2010, 07:01:02 pm »
Just checked my validation inconclusives and found three or four where my wingman turned in .01. I should be OK on most because the third wingmen are running on their CPUs. I followed up on the .01 wingmen and all seem to be running either a 470 or 480. They are also getting a lot of .01 credit claims validated. Again, tracing out their wingmen, they are also running 470/480s. I wonder just how much we are missing because the third man turns in good also. :o

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #21 on: 05 Jun 2010, 08:09:00 pm »

Quote from: Richard Haselgrove
> Of the Fermi cards (the first 8 in Claggy's list), seven are running the stock v6.09_cuda23 application.


And of the five additional Fermis that Claggy has listed, every one is running stock v6.09_cuda23.

Raistmer is absolutely right to say that there are two distinct problems:

1) Random state corruption of older cards
2) Fermi cards running incompatible applications

The point is, the second problem could be solved at a stroke by deploying the v6.10 Fermi app which has been tested - and has passed the test - at Beta.

That's an incredibly easy solution, and would remove, on Claggy's figures, a hugely significant part of the problem.

The random failures would remain, to be dealt with as we understand the problem further. But that problem has existed for months, without reaching critical mass. If we remove the Fermi co-validators, it should remain insignificant: but the rise of the Fermi means we can't ignore problem #2 any longer.
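
Richard's point about the rise of the Fermi can be put as a back-of-envelope calculation. This is a sketch under a hypothetical assumption, not project data: each workunit goes to two hosts picked independently at random, and a fraction `f` of returned results come from Fermi hosts running the incompatible stock app. Ignoring third replicas, a bad result then reaches the database only when both replicas come from such hosts, with probability f squared, so doubling the Fermi share roughly quadruples the rate.

```python
# Back-of-envelope sketch (hypothetical model): two replicas per workunit,
# hosts drawn independently, fraction f of results from broken-app hosts.
# A bad result is validated only when BOTH replicas come from such hosts.


def bad_pair_rate(f: float) -> float:
    """Probability that both replicas of a workunit come from broken-app hosts."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    return f * f


if __name__ == "__main__":
    # Illustrative fractions only; these are not measured values.
    for f in (0.01, 0.02, 0.05, 0.10):
        print(f"f = {f:.2f} -> bad pairs ~ {bad_pair_rate(f):.4f}")
```

The quadratic growth is why a problem that stays negligible while broken-app hosts are rare becomes significant as they spread.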

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #22 on: 06 Jun 2010, 03:59:18 am »
Sure. I thought last night's project maintenance would bring 6.10 to SETI main; still not?

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #23 on: 06 Jun 2010, 05:58:27 am »
No sign of it. But something's going on: looking at Claggy's list, only one (djwhu, 5424576) is still blowing away significant numbers of WUs - and incidentally confirming that mid-AR tasks suffer the same fate. The latest addition (Aaron Danbury, 5373696) has done a few, but with the new work supply, it would normally have many more. So it seems that they may have put some sort of limiter into the system, but it's not obvious what.

David has got the message (off-list response), but hasn't got a reply from Eric yet.

And the problem is about to get worse - Fermi GTX 465s have landed in the shops, and are already being discounted: I was offered one for £215.99 in a mailshot. Won't interest the hard-core crunchers, but will certainly attract a few into the fit-and-forget segment.

_heinz

  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 2117
Re: When corrupted results get validated...
« Reply #24 on: 06 Jun 2010, 10:21:27 am »
It is worth mentioning that I have had no work for the Fermi application since yesterday.
All WUs coming in are for CPU only.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think it is blocked till the situation is solved.
 :)

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #25 on: 06 Jun 2010, 10:28:21 am »
Quote from: _heinz
> It is worth mentioning that I have had no work for the Fermi application since yesterday.
> All WUs coming in are for CPU only.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> I think it is blocked till the situation is solved.
>  :)

Hard but probably correct decision from Berkeley's side. I hope they will be able to solve this soon, because all they need is to follow Richard's suggestion about 6.10 on main.
It doesn't look very hard to do, actually.

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #26 on: 06 Jun 2010, 10:37:24 am »
It's not blocked, because I've got four new tasks today. But I think it may be very severely throttled.

Raistmer, people on the main board are still recommending your V12b_FERMI. Has anybody actually tested it, and posted any results? If not, I've just downloaded a copy and I'll run a bench when the 470 is free (GPUGrid task due to finish in ~2 hours). If it doesn't work, as you suggested last night, I suggest you remove the download archive.

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #27 on: 06 Jun 2010, 12:09:24 pm »
Quote from: Richard Haselgrove
> It's not blocked, because I've got four new tasks today. But I think it may be very severely throttled.
>
> Raistmer, people on the main board are still recommending your V12b_FERMI. Has anybody actually tested it, and posted any results? If not, I've just downloaded a copy and I'll run a bench when the 470 is free (GPUGrid task due to finish in ~2 hours). If it doesn't work, as you suggested last night, I suggest you remove the download archive.

AFAIK Todd Hebert tested it and found it incompatible with FERMI.
That was at the beginning of the corresponding thread.
Later this info was somehow modified... So it's better to do a short test and close this topic completely. I will certainly remove it if it is indeed not compatible.

Richard Haselgrove

  • Messenger Pigeon
  • Knight who says 'Ni!'
  • Posts: 2819
Re: When corrupted results get validated...
« Reply #28 on: 06 Jun 2010, 01:18:19 pm »
Ran the bench with a range of Joe's full-length WUs. One of them worked, but three failures: given the ARs involved, I think this needs withdrawing, pronto. Could you remove the archive, please, and get the Mods to lock your thread after a suitable explanation?

While it was running, I looked through Todd Hebert's posts. He says a couple of times that he got the files from you, and even posts an app_info for MB_6.09_CUDA_V12b_FERMI.exe (message 990059). But I don't see anywhere where he posts, or even describes, a test result advising a change of direction. Yet other people, like ScimanStev, describe downloading files from Todd which - it turns out - have the stock app included.

I can't say I'm very impressed by the integrity of this process.

Raistmer

  • Working Code Wizard
  • Volunteer Developer
  • Knight who says 'Ni!'
  • Posts: 14349
Re: When corrupted results get validated...
« Reply #29 on: 06 Jun 2010, 01:29:11 pm »
Quote from: Richard Haselgrove
> Ran the bench with a range of Joe's full-length WUs. One of them worked, but three failures: given the ARs involved, I think this needs withdrawing, pronto. Could you remove the archive, please, and get the Mods to lock your thread after a suitable explanation?
>
> While it was running, I looked through Todd Hebert's posts. He says a couple of times that he got the files from you, and even posts an app_info for MB_6.09_CUDA_V12b_FERMI.exe (message 990059). But I don't see anywhere where he posts, or even describes, a test result advising a change of direction. Yet other people, like ScimanStev, describe downloading files from Todd which - it turns out - have the stock app included.
>
> I can't say I'm very impressed by the integrity of this process.

OK, then I will recommend using the stock 6.10 from Beta.

EDIT: done.

« Last Edit: 06 Jun 2010, 01:36:49 pm by Raistmer »

 
