
Seti is down again


msattler:

--- Quote from: Geek@Play on 06 Oct 2010, 01:13:39 pm ---I can't log into my account... and can't make any posts on the forums.

Never mind... I got logged into my account now.

--- End quote ---
Might have been things just getting resynched.....

msattler:
Jeff just posted the following in the Technical News thread.........

"It's been a painful week, but with some progress.

The server run before last was cut short by our upload space filling up. That was fixed by the bruno migration and we started the last server run a bit early.

But a crash of our primary boinc db machine, mork, got the secondary db server, jocelyn, out of sync. That meant that all of the read-only queries had to go to mork instead of jocelyn. This overwhelmed mork and I turned off web access just so the server run could continue. Then mork crashed again Monday evening. Ouch.

Yesterday, we did our normal backup of mork and are recovering jocelyn from that today. The forums are up, but result viewing is disabled at the moment. We need to clear the back end queues ahead of the next server run and mork resources are needed for that.

Mork's tendency to crash seems to have accelerated. Perhaps this is secondary to the cooling crisis we had a couple of weeks ago. Actually, "crash" is not the correct term. It simply hangs and requires a power cycle to boot. Fortunately, we have mork on a networked power strip and can power cycle it remotely. Upon boot, there are no footprints whatsoever as to the cause of the hang. This sounds like hardware. So today we are going to bring mork down to swap out all of the memory and remove a couple of unused components in a desperate attempt to fix the problem. The forums of course will be down during this operation."


So, be prepared for some more downtime this afternoon.  Hopefully all goes well with the mork RAM transplant.

Meow meow for now.

Cosmic_Ocean:
I've actually been thinking about this for a little while now. I think it's been confirmed that there is in fact a gigabit line going up the hill to SSL now (I don't know if that includes the Hurricane Electric link as well), but SAH is still limiting its usage to 100 Mbit/s? If they could increase that to, say, 150 or 200 Mbit/s, downloads would go through faster and the pipe wouldn't be maxed out for as long. Another option would be putting the scheduler on a separate VLAN so there is no packet contention when the pipe is maxed out by downloads. Observations over the past few years have shown that most of the problems with ghosts, or just failed transfers, happen when there are a ton of downloads taking place.

Maybe one possible idea is to cycle the scheduler and download processes. Run something like 15 or 20 minutes with the scheduler enabled but downloads disabled, so hosts get work assigned but can't download it yet. Then turn the scheduler off, let downloads run for an hour or two, and repeat until the pipe is no longer maxed out. It's a thought.
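Purely as an illustration of the scheduler/download cycling idea, here is a minimal Python sketch of the alternating schedule. Everything in it is hypothetical: `plan_cycles`, `Phase`, and the `pipe_saturated` check are made-up names, not part of the actual BOINC server tools, and the 15-minute and 90-minute defaults are just picked from the ranges suggested in the post.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str     # "scheduler" (work assigned, no downloads) or "downloads"
    minutes: int  # how long this phase runs


def plan_cycles(pipe_saturated: Callable[[], bool],
                scheduler_minutes: int = 15,
                download_minutes: int = 90,
                max_cycles: int = 24) -> List[Phase]:
    """Hypothetical planner: alternate scheduler-only and download-only
    phases for as long as the (assumed) saturation check reports True."""
    phases: List[Phase] = []
    for _ in range(max_cycles):
        # Stop cycling once the pipe is no longer maxed out.
        if not pipe_saturated():
            break
        # Scheduler on, downloads off: hosts get work assigned but can't fetch it yet.
        phases.append(Phase("scheduler", scheduler_minutes))
        # Scheduler off, downloads on: the assigned work drains through the pipe.
        phases.append(Phase("downloads", download_minutes))
    return phases
```

For example, feeding it saturation readings of True, True, False produces two full scheduler/download cycles (four phases) before it stops. The `max_cycles` cap is there so a pipe that never drains doesn't keep the cycle going forever.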

My area of expertise is networking, and it seems like there are probably several ways to fix, or at the very least smooth out, some of the rough edges.

No doubt that Oscar will make a marked/noticeable improvement over the current setup, but it seems unlikely that one server will fix everything, especially since most of the problems are more network-related than software-related.

Just my opinion.

msattler:
Bandwidth is NOT the problem right now.....
It's server stability......most notably mork's lack of it.
If everything hangs together on the server side, we can live with the bandwidth currently available.

The problem has been with the servers crashing Seti-side, not the bandwidth.  It may contribute to the ghost problem, but until the servers are all stable, that cannot be established.

Hang in there folks, it's gonna be a bumpy ride.   LOL.

perryjay:
    Dan has contacted HP and Sun to see if they can give us a deep discount on a machine that could replace mork, hopefully deep enough that we can purchase it on what's remaining from the donations made in the Number Crunching threads.



Good luck Dan, that would be great, sort of like two for the price of one. I hope it doesn't cut into the amount of RAM and hard drives you guys were going to get with Oscar though. At least not too much. If it does, let us know so we can wake up Kittyman to start another drive.

Eric posted the first part of this and the second was my reply in tech news. Sure would be great to have two new servers, they sure are needed.
As to the bandwidth, the line they ran up was for the whole lab; I guess they didn't run the Hurricane link while they were at it. That's another thing that would really help.
