Yeah - it does look like we are getting there...
Today was a "normal" Tuesday outage to back up the mysql database. You may have noted the result table sizes have dropped considerably since we turned on the "resend-lost-results." Hopefully this solved a lot of the ghost workunit problems people have been wondering about forever. If the database can handle it, no reason to leave that setting as is. A lot of people also noticed the server status page line "Results returned and awaiting validation" should really read "Results returned and awaiting validation as long as all the other back-end queues are zero." So most of the time this reads correctly, but if there's a large backlog somewhere this can be quite misleading. It's a painful query to get exactly what we want all the time, so fixing this is low priority.
Meanwhile, after the outage we started the splitters up (though there were some initial configuration snags that required a quick shut down and restart). Actual new work is being generated and sent.
So here we are.
<sound of champagne cork>
Well, not so fast. I'd say we're "at the light at the end of the tunnel" as far as the public side is concerned, but there is still major cleanup on the inside before we're fully out of the tunnel. Some agenda items include:
1. Getting oscar up to speed: Right now it's operating pretty much as fast as thumper (which seems disappointing at first), though without using any CPU or disk i/o (which means it's able to do a LOT MORE if we tell it to). That's because informix is configured exactly as it was on thumper, so there are some artificial bottlenecks in place. We're collecting stats to understand what knobs to turn, and then we'll really crank them up.
2. Converting thumper to it's new role as internal file server: Remember that our main internal file server (which houses a bunch of important, heavy-random-access data and accounts) is as much of a crashy liability as mork was. So this conversion still needs to take place, but can happen over time while we're live.
3. Basic electrical stuff: Jeff and I tried to move as much around as possible, but there's still some server closet power issues to address.
4. All the tiny specks of sysadmin revolving around replacing old servers with new ones (dangling mounts, dead entries in /etc/hosts, zillions of scripts referring to now-defunct paths, etc.).
I'm also busy revving up the engine to start sending out the annual end-of-the-year news/funding drive mass e-mail. I know many of you already donated in some form or another (thank you!) but this sort of thing needs to happen. I apologize for any redundancy on this front.
- Matt
The splitters are running and work is being created
I have set all projects except MW to NNT and Im preparing for the chaos that will be the grab for wu's once things are turned back on.
It will be good to get back to seti - might have to get another 470 to celebrate
At least MW is keeping the room warm, 2 wu's per card and GPU's at a nice toasty 80-85C...