RAID help

I'm in need of some advice. I have a PowerEdge 2800 with a PERC 4 RAID controller.

It's a single logical drive with 3 partitions across 5 discs. One disc died completely and has been pulled out; 4 discs remain, 3 of which are showing as "degraded". The server is still accessible, but it will blue screen if you try and access the C: drive too much.

My dilemma is this: can I swap the drives out one at a time to get the controller to rebuild the array itself, or do I cut my losses, back up the data (been doing that all day anyway) and rebuild with a fresh array and new discs?
 
That should work, although you'll experience poor performance during a rebuild, especially on the older PERC 4.

Rebuild afresh if you can - much less hassle and it won't take as long.
 
I'm not concerned about the degraded performance; the company has been offline for most of the week, as this job was only handed to me this morning.

I need to be absolutely certain it will work, because I could potentially lose a lot of time on this. That's why I've been swaying towards the full rebuild: I know for sure that will work, but it's a much bigger job than just rebuilding the array.

Do I do the drives one at a time? Surely with RAID 5 the data is split across them all, and if I remove 3 of the 4 it will just fall over, won't it?
 
Don't risk it - if something goes wrong while a disk is rebuilding then it's all kaput. Back up, rebuild the array, and test it to make sure there isn't a controller issue causing this. Then put your data back on.
Really you shouldn't have the sysroot partition on the same RAID 5 as everything else. Common practice is to have a RAID 1 mirror for the OS and RAID 5 for the data area, ideally on different controllers.
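
To see why a second failure is fatal, remember RAID 5 parity is just an XOR across each stripe. Here's a rough sketch of the maths in Python (purely an illustration with made-up blocks, not what the PERC firmware actually does):

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length byte strings position by position - RAID 5's parity op.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe on a 5-disc array: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Lose any ONE block and the others rebuild it:
survivors = data[1:] + [parity]   # pretend disc 0 has failed
assert xor_blocks(survivors) == data[0]

# Lose TWO blocks and there's not enough information left:
# XORing data[2], data[3] and parity gives data[0] ^ data[1],
# which is neither missing block. That's why a second drive
# dropping out mid-rebuild kills the whole array.
```

One disc's worth of redundancy is all you get, which is exactly why a rebuild on an array with three more dodgy discs is such a gamble.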
 
If you're replacing with new disks and it all goes wrong during a rebuild, you'll be able to put the old disks back in and get back to the state you're in.

As Skid said, you can't do that if something goes wrong during a rebuild.
 
Lol iaind, that's a very confusing statement.

Do you mean "you can't do that if something goes wrong during a reinstall"?

I'm gonna try and rebuild the array with the replacement discs first thing tomorrow, and if it fails I shall start from scratch. It just seems really weird that a RAID 5 can stay alive with 1 failed, 3 degraded and 1 OK.

Oh, and for the record, I didn't build the server to start with, lol.

Ooo, one other thing: Dell replaced one of the 146s with a 300, am I right in thinking the PERC will just use 146GB of the 300GB drive and ignore the rest?
 
Yes, you're right in your thinking; it will ignore the rest of the space.

I had to replace a failed drive in a PE2600: it was a 73GB drive and I replaced it with a 146GB drive (a different speed too), and it worked OK, it just wasted the additional space.
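
For anyone wondering about the numbers with the mismatched 300: a hardware array can only use the capacity of its smallest member, so RAID 5 usable space works out as (number of discs - 1) × smallest disc. A quick back-of-the-envelope sketch in Python (disc sizes taken from this thread):

```python
def raid5_usable_gb(disk_sizes_gb):
    # Every member is truncated to the smallest disc, and one disc's
    # worth of capacity across the set is given over to parity.
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)

# Four original 146GB discs plus Dell's 300GB replacement:
print(raid5_usable_gb([146, 146, 146, 146, 300]))  # 584 - the 300 acts as a 146
```

The remaining 154GB on the 300 just sits there unused unless you rebuild the array from scratch with bigger discs all round.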
 
Back up and rebuild.

If you don't already have something that'll do the job then grab a demo copy of Acronis True Image Echo Server. Image the whole machine, rebuild the array from scratch and restore the backup. Job done.

We use DPM to back up data regularly and the associated SRT utility to take a bare-metal image of the servers once a week. One of them is actually a PE2800 with a PERC 4 in it! If I ever lost the array or even a single drive off it, I'd immediately bin the whole thing, rebuild it (possibly with fresh disks) and restore from backup.

Bit confused by your drive arrangement. You said there are 5 disks and one of them has died, but only 3 of the remaining 4 are showing as degraded? Is the fourth drive there at all? If not, that would suggest it was a 4-drive array with a cold standby drive. If it is there, then gawd knows what's happened, as you can't have part of an array showing as degraded.
 
0:0 - degraded
0:1 - fine
0:2 - degraded
0:3 - degraded
0:4 - dead (removed)

0:0 keeps dropping off the array if you try and access the C: drive too much. I'm going to see if I can True Image it, as that would save me SO much trouble.

I think 0:4 was set up to be the hot spare or something.
 
Stupid thing just won't rebuild - not surprising when 4 out of 5 discs are dead or dying :(

Trying a True Image before I cut my losses, but I'm not holding out much hope.

/edit The drive just drops offline and the image fails, bugger.
 
No, I did mean you can't go back to where you are if something goes wrong during a rebuild. If the rebuild goes pear-shaped, you're left with some new disks and a knackered array.

If you had taken them all out and started a reinstall with the new disks, and the reinstall went wrong, you could have got back to the way it was just by putting the old disks back in.

Why would you bother posting to ask for advice and then not take it? You had 3 people saying a reinstall is the better option and none saying that an array rebuild is a better option...
 
Because an array rebuild would have been the easier option, had it worked.

Thanks for the help anyway.

/edit I couldn't "go back to the way it was" because I'll be using the good drive from the current array to build the new one.
 
Isn't "degraded" the state of the array rather than the physical disks? Hence when you replace the dead disk and rebuild the array, it won't be degraded any more?
 
Slime,

Exactly what I was thinking.

Sp00n, have you been given all the info on how the server was set up?

Are you sure 0:1 is the OS drive on its own?

The other drives would show degraded because the array itself is in a degraded state, i.e. the hot spare (if 0:4 was the hot spare) is now missing?
 
I tried to rebuild the array many times; OpenManage stated 0:0, 0:2 and 0:3 were degraded, and it said something like "pre-empt fail" in the logs somewhere.

0:1 wasn't the OS on its own, as it was a RAID 5 spanned across all the discs; it's just very peculiar how it was kind of working, but not really.

Since then I have backed up everything, installed new discs, and now I'm doing a fresh SBS install. The company isn't "that" big and it's just not worth my time faffing about trying to re-create the domain.
 

If you have 5 disks and are building from scratch, take the time to do it properly. Have a 3-drive RAID 5 for the data store and a RAID 1 mirror with the OS and paging space on it. That way, if this happens again, your OS and domain setup will be intact.

Though it sounds like you've got yourself into quite a mess. One thought that does pop up: do you not have daily backups of this server? Could you not rebuild and then ghost from that? At worst you've lost 24 hours, but everything will be up and working as it was.
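
To put rough numbers on that split, assuming five 146GB discs like the ones in this thread (same back-of-the-envelope style as earlier):

```python
def raid1_usable_gb(disk_sizes_gb):
    # RAID 1 mirror: you get one copy's worth, i.e. the smallest disc.
    return min(disk_sizes_gb)

def raid5_usable_gb(disk_sizes_gb):
    # RAID 5: smallest disc times (N - 1); one disc's worth holds parity.
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)

# 2-disc mirror for OS + paging, 3-disc RAID 5 for the data store.
# Each array can survive one disc failure independently of the other.
print(raid1_usable_gb([146, 146]))       # 146GB for the OS
print(raid5_usable_gb([146, 146, 146]))  # 292GB for data
```

You trade a bit of total space for keeping the OS and the data store on separate arrays, so losing one array doesn't take the domain setup down with it.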
 