Seti is down again

perryjay:
Naw, it must be my fault, I was taking too many of their precious work units.  :-\

Richard Haselgrove:
Ears burning (again)? Those rednecks deserved it - and they got what they were asking for (fermis blocked from downloading uncrunchable work).

Joe, did you manage to get any of the current work-in-progress code before it went down? And did you get to see my theory that a host which has not reached quota gets its allowance reset at midnight, but a host which has reached quota stays blocked indefinitely - as if the over-quota rejection happens before the on-contact quota reset that David was talking about? That seems to be where part of the problem lies.
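To make the theory concrete, here is a minimal sketch of the suspected ordering problem. It is not the actual BOINC scheduler code: the names `Host`, `DAILY_QUOTA`, `reset_if_new_day`, `request_suspected` and `request_expected` are all hypothetical, and the quota value of 100 is only an illustrative figure.

```python
from dataclasses import dataclass
from datetime import date

DAILY_QUOTA = 100  # hypothetical per-host daily limit, for illustration only


@dataclass
class Host:
    sent_today: int = 0
    last_reset: date = date(2010, 6, 11)


def reset_if_new_day(host: Host, today: date) -> None:
    """On-contact reset: clear the counter the first time a host calls in on a new day."""
    if today > host.last_reset:
        host.sent_today = 0
        host.last_reset = today


def request_suspected(host: Host, today: date) -> bool:
    """Suspected order: the over-quota rejection runs before the reset."""
    if host.sent_today >= DAILY_QUOTA:
        return False  # rejected here, so the reset below is never reached
    reset_if_new_day(host, today)
    host.sent_today += 1
    return True


def request_expected(host: Host, today: date) -> bool:
    """Expected order: reset first, then check the quota."""
    reset_if_new_day(host, today)
    if host.sent_today >= DAILY_QUOTA:
        return False
    host.sent_today += 1
    return True
```

With the suspected ordering, a host that ended the previous day at quota is refused on every later contact, while a host that stayed under quota is reset normally. Swapping the two steps removes the difference.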

Claggy:
Janice (her handle was something else) reported in one of the news feeds that after receiving cuda23 6.09 work, she was now receiving cuda 6.08 work, which she hadn't seen before.
I did have a look at her cuda host, but because of the database display problem I couldn't confirm that.
I'm wondering if the error people were getting is because they didn't have Cuda and Cuda23 in their app_info's (or even Cuda_fermi).

Claggy

Brodo:
I think we are suffering a "Murphy Attack" with 3 different issues:
1) the Fermi Problem
2) Bugs in the new Server code or a crude stop gap effort to block the Fermi problem
3) A batch of bad work units that slipped through the checking process, possibly due to the noise detecting problem Matt referred to in his last post to the SAH fora.

After reading that V6.08/V6.09 users with non-Fermi cards were also getting -9 errors I checked back through my results. I found that about 1 in 4 CUDA tasks on all crunchers were finishing with this error. The bad units were all VHAR's from the 24mr10ac.16936.20517.xxx, 24mar10ac.16936.19699.xxx and 24mr10ab.3036.25016.xxx series and .4 AR's from the 24mar10aa.3010.xxx series. I noticed units from these series were also erroring out when crunched on the CPU. Units from different series downloaded at the same time crunched OK on both CPU and GPU.

I also think Richard is correct about daily WU numbers not resetting as they should. Over the last few days the only machines of mine that have been able to get new work are the ones that crunch less than 100 units per day total. My "power crunchers" have gotten nothing for 3 or 4 days.

Brodo

Josef W. Segur:

--- Quote from: Richard Haselgrove on 12 Jun 2010, 07:19:35 pm ---...
Joe, did you manage to get any of the current work-in-progress code before it went down? And did you get to see my theory that a host which has not reached quota gets its allowance reset at midnight, but a host which has reached quota stays blocked indefinitely - as if the over-quota rejection happens before the on-contact quota reset that David was talking about? That seems to be where part of the problem lies.
--- End quote ---

Essentially I have all the available source code which might be pertinent. The checkout which failed was on a section of the S@H repository which is most probably unused for anything which would affect us, and hadn't been changed in more than 2 years.

However, BOINC supplies a generic set of sources for use by the projects, as well as the client side binaries we use. That's all under LGPL. S@H provides application binaries under GPL, so they are obligated to have the sources available. They do not provide binaries of the BOINC server side code, and are free to modify the BOINC code in any way they want without providing sources. From observation in normal times, IMO they almost always use the generic BOINC code with whatever set of config.xml parameters seem appropriate. These aren't normal times and I think they're using patched server code which is someplace between the "server stable" branch and the trunk of BOINC, and quite likely some code tweaks which are in neither. They certainly wouldn't want the broken credit granting at main, but much of the secondary related changes would help with the CUDA problems. It's a situation for Sherlock Holmes, not something which just requires reading the right document.

I did see your analysis on the quota situation. IIRC it makes sense. Whether I can spot anything in the code which would give that effect if only partially active I don't know; D.A.'s brand of C++ is difficult for me to follow.
                                                                                   Joe
