February 28, 2006 - 21:15 UTC
We had a planned outage today to remove a couple more items from the server closet (the Classic SETI@home data server and several large, heavy disk arrays that contained the old science database). To do so safely, we wanted to power down several important machines so they wouldn't accidentally get bumped and go down ungracefully.
The Bay Area is having a rough winter, and a storm today brought lightning which knocked out power to the entire campus, including our lab, around 8am. Most of the servers went down without a hitch. And with the power off anyway we went ahead and cleaned up the closet as planned. We can now get behind the racks again without painful contortion.
Powering the entire network back up is painful: servers need to come up in a set order, and many hidden mounting issues that only get tickled by a reboot come to light. Plus some drives needed fsck'ing. Everything eventually booted up just fine, except for the master science database.
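To illustrate why the order matters: if each machine mounts disks exported by other machines, a workable power-up sequence is one where every host boots after the hosts it mounts from. Below is a minimal sketch of that idea in Python. The host names and mount dependencies are hypothetical placeholders, not the actual lab layout, and the use of the standard library's `graphlib` is just one convenient way to compute such an order.

```python
from graphlib import TopologicalSorter

# Hypothetical hosts (illustration only): each host maps to the set of
# hosts whose exported filesystems it needs mounted at boot.
MOUNT_DEPS = {
    "fileserver": set(),
    "science-db": {"fileserver"},
    "splitter": {"fileserver", "science-db"},
    "scheduler": {"science-db"},
}

def boot_order(deps):
    """Return hosts ordered so each one boots after the hosts it mounts from.

    Raises graphlib.CycleError if two hosts depend on each other, which is
    exactly the kind of hidden mounting issue a full restart exposes.
    """
    return list(TopologicalSorter(deps).static_order())

order = boot_order(MOUNT_DEPS)
print(order)
```

A circular dependency (host A mounting from B while B mounts from A) has no valid order at all, so the sketch reports it as an error instead of guessing.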
One of the fibre channel loops disappeared on this particular server. Bad cable? Bad GBIC? Not sure just yet, as the terminal wasn't working well enough to give us all the boot diagnostics. We hooked up a laptop and fought with HyperTerminal to see these messages, but by the time we got that working the machine booted just fine for no apparent reason... but all the metadevices needed to be resynced. This resync could take up to 24 hours, during which the master science database will be down. That means no splitting and no assimilating, and we'll probably run out of work to send before too long. Oh well.