Major Problem

What I've been dealing with yesterday & today :eek:

HP MSA 2312fc with a 12-bay expansion shelf, 24 disks in total:
Running 3 vdisks:
vdisk01 is 11 disks RAID 5
vdisk02 is 5 disks RAID 5
vdisk03 is 5 disks RAID 5
3 hot spare disks

This MSA hosts all my VMware servers, 23 of them to be exact.
I have 3 physical ESX servers & one physical backup server.

The whole company is down, all 23 VMware servers down. Why, you ask?
All this from a power cut. The UPSs kicked in and shut the lot down as they should do, then when the power came back on I brought the UPSs & everything else back up.
Then....

The MSA has put vdisks 1 & 2 offline in a degraded state, & vdisk 3 is in a critical state.
All ESX Servers are fine & so is the Backup Server.
I have managed to copy all the virtual server image files off vdisks 1 & 2 to 3 USB HDs (took what, 10 hours or so), but vdisk 3 is inaccessible. That's the 4 remaining servers I cannot lose, as the current backup for these is over 6 weeks old due to backup failures and incomplete backups. The servers in question are the company's accounts for itself & its customers, including payroll, so a big one to lose or revert back 6 weeks. Another is a server that hosts a custom database package with, let's just say, about 12 years of data on it. The other two are minimal as I could rebuild them no problem, minor loss: one's a proxy server & the other is a web server that is currently backed up to a 3rd-party web provider.

Been on the dog & bone to HP on & off for the past two days. Tried a few things, & the diags from HP say there is a communication issue between one of the controllers & the midplane in the MSA, not the expansion shelf.
They say the arrays will be intact (they think, as it's unlikely that more than 4 or 5 drives can blow in one go)???? I'm unsure, as none of the hot spares kicked in at all. One of the major issues is that we can only gather logs from one of the controllers; that one reports as OK, just degraded arrays. I could bring these online and boot most of the servers..... but for how long is unsure, and if another drive goes then it's GAME OVER for those arrays.

Now waiting for an HP engineer & parts. They say they are bringing a new chassis, which includes the midplane board...... I asked what if it's the controller? They said unlikely?????

It's painful just sitting here waiting for this part to tip up when I have the MDs, Chief Exec & every member of staff I see all asking me how long.......?
I'm also thinking, what if HP swap this part and it doesn't work?

I have locked myself in the server room LOL

STRESSED IS NOT THE WORD.... WOOOSARRR..... WOOOSARRRR :(
JUST THOUGHT I'D LET ALL THIS OUT, FEEL A BIT BETTER NOW.
 
HP just rang & have said 4pm now instead of 3pm..... another hour to kill :-(
Just had a butty from the local supermarket delivered by one of the girls..... awww bless, thanks, I thought, until she asked me why she can't get her emails & how long it will be.
OMG..... Shhhhh. I've told the powers that be that HP are bringing X part, but I think it's this, that & the other. I said at least it's progress, and with this new part we should be able to gather logs from the other controller and pinpoint the failure.......
 
Oo-err.

Wait for HP to get there and replace what they feel is appropriate - it gives you a get-out if it's not completely successful.
It's never nice having drives fail, let alone losing data.

You need to look at getting your backup solution sorted. 6 weeks is unacceptable.

Already got my own get-out for the backups not working: 2 months ago I installed a trial of Veeam Backup, as Backup Exec 12.5 was proving not to be as good or stable. Veeam ran great and worked every time. I put the offer to the powers that be about buying Veeam; they came back with "it's too expensive, let's just stick to what we have for now"...... Sod's law, 6 weeks later a power cut & the MSA dies.........

Dare I say I told you so, as with the Veeam backups I could have had most servers back online & running within about 3-4 hours, once the MSA was fixed of course. Now it's a drag-and-drop import of the virtual server images one by one, which will take an age from the USB drives they're on.
 
How you getting on with this?

I managed to get all the virtual servers apart from 2 back online fully yesterday.
I thought we'd be OK: I pulled the servers from the corrupt array over to a good one and, like I said, all but 2 booted fine.

The bad news is one of the failed servers was the MAIN DC as well as the FILE server, so all the users' data....

I'm currently rebuilding and going to have to restore AD etc. from the backups that are 6 weeks old, then chop and change whatever data I can salvage. The other failed server isn't that important anyhow; it could be redone in a few hours, which I'll do once I've got the major one fixed.

Still on with it, and got some help too now, as I've not slept much since Monday :o
 