RAID help

I'm in need of some advice. I have a PowerEdge 2800 with a PERC 4 RAID controller.

It's a single logical drive with 3 partitions across 5 discs. One disc died completely and has been pulled out; 4 discs remain, 3 of which are showing as "degraded". The server is still accessible, but it will blue screen if you try and access the C: drive too much.

My dilemma is this: can I swap the drives out one at a time to get the controller to rebuild the array itself, or do I cut my losses, back up the data (been doing that all day anyway) and rebuild with a fresh array and new discs?
 
That should work, although you'll experience poor performance during a rebuild, especially on the older PERC 4.

Rebuild afresh if you can - much less hassle and it won't take as long.
 
I'm not concerned about the degraded performance; the company has been offline for most of the week, as this job was only handed to me this morning.

I need to be absolutely certain it will work, because I could potentially lose a lot of time on this. That's why I've been swaying towards the full rebuild: I know for sure that will work, but it's a much bigger job than just rebuilding the array.

Do I do the drives one at a time? Surely with RAID 5 the data is split across them all, and if I remove 3 of the 4 it will just fall over, won't it?
 
Don't risk it - if something goes wrong while a disk is rebuilding then it's all kaput. Back up, rebuild the array, and test it to make sure there isn't a controller issue causing this. Then put your data back on.
Really you shouldn't have the sysroot partition on the same RAID 5 as everything else. Common practice is to have a RAID 1 mirror for the OS and RAID 5 for the data area, ideally on different controllers.
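
To see why a second failure is fatal, remember RAID 5 parity is just an XOR across each stripe. Here's a rough sketch of the maths in Python (purely an illustration with made-up blocks, not what the PERC firmware actually does):

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length byte strings position by position - RAID 5's parity op.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe on a 5-disc array: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Lose any ONE block and the others rebuild it:
survivors = data[1:] + [parity]   # pretend disc 0 has failed
assert xor_blocks(survivors) == data[0]

# Lose TWO blocks and there's not enough information left:
# XORing data[2], data[3] and parity gives data[0] ^ data[1],
# which is neither missing block. That's why a second drive
# dropping out mid-rebuild kills the whole array.
```

One disc's worth of redundancy is all you get, which is exactly why a rebuild on an array with three more dodgy discs is such a gamble.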
 
If you're replacing with new disks and it all goes wrong during a rebuild, you'll be able to put the old disks back in and get back to the state you're in.

As Skid said, you can't do that if something goes wrong during a rebuild.
 
Lol iaind, that's a very confusing statement.

Do you mean "you can't do that if something goes wrong during a reinstall"?

I'm gonna try and rebuild the array with the replacement discs first thing tomorrow, and if it fails I shall start from scratch. It just seems really weird that a RAID 5 can stay alive with 1 failed, 3 degraded and 1 OK.

Oh, and for the record, I didn't build the server to start with, lol.

Ooo, one other thing: Dell replaced one of the 146s with a 300, am I right in thinking the PERC will just use 146GB of the 300GB drive and ignore the rest?
 
Yes, you're right in your thinking; it will ignore the rest of the space.

I had to replace a failed drive in a PE2600: it was a 73GB drive and I replaced it with a 146GB drive (a different speed too), and it worked OK, it just wasted the additional space.
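
For anyone wondering about the numbers with the mismatched 300: a hardware array can only use the capacity of its smallest member, so RAID 5 usable space works out as (number of discs - 1) × smallest disc. A quick back-of-the-envelope sketch in Python (disc sizes taken from this thread):

```python
def raid5_usable_gb(disk_sizes_gb):
    # Every member is truncated to the smallest disc, and one disc's
    # worth of capacity across the set is given over to parity.
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)

# Four original 146GB discs plus Dell's 300GB replacement:
print(raid5_usable_gb([146, 146, 146, 146, 300]))  # 584 - the 300 acts as a 146
```

The remaining 154GB on the 300 just sits there unused unless you rebuild the array from scratch with bigger discs all round.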
 
Back up and rebuild.

If you don't already have something that'll do the job then grab a demo copy of Acronis True Image Echo Server. Image the whole machine, rebuild the array from scratch and restore the backup. Job done.

We use DPM to back up data regularly and the associated SRT utility to take a bare-metal image of the servers once a week. One of them is actually a PE2800 with a PERC 4 in it! If I ever lost the array or even a single drive off it, I'd immediately bin the whole thing, rebuild it (possibly with fresh disks) and restore from backup.

Bit confused by your drive arrangement. You said there are 5 disks and one of them has died, but only 3 of the remaining 4 are showing as degraded? Is the fourth drive there at all? If not, that would suggest it was a 4-drive array with a cold standby drive. If it is there, then gawd knows what's happened, as you can't have part of an array showing as degraded.
 
0:0 - degraded
0:1 - fine
0:2 - degraded
0:3 - degraded
0:4 - dead (removed)

0:0 keeps dropping off the array if you try and access the C: drive too much. I'm going to see if I can True Image it, as that would save me SO much trouble.

I think 0:4 was set up to be the hot spare or something.
 
Stupid thing just won't rebuild - not surprising when 4 out of 5 discs are dead or dying :(

Trying a True Image before I cut my losses, but I'm not holding out much hope.

/edit The drive just drops offline and the image fails, bugger.
 
No, I did mean you can't go back to where you are if something goes wrong during a rebuild. If the rebuild goes pear-shaped, you're left with some new disks and a knackered array.

If you had taken them all out and started a reinstall with the new disks, and the reinstall went wrong, you could have got back to the way it was just by putting the old disks back in.

Why would you bother posting to ask for advice and then not take it? You had 3 people saying a reinstall is the better option and none saying that an array rebuild is a better option...
 
Because an array rebuild would have been the easier option, had it worked.

Thanks for the help anyway.

/edit I couldn't "go back to the way it was" because I'll be using the good drive from the current array to build the new one.
 
Isn't "degraded" the state of the array rather than the physical disks? Hence when you replace the dead disk and rebuild the array, it won't be degraded any more?
 
Slime,

Exactly what I was thinking.

Sp00n, have you been given all the info on how the server was set up?

Are you sure 0:1 is the OS drive on its own?

The other drives would show degraded because the array itself is in a degraded state, i.e. the hot spare (if 0:4 was the hot spare) is now missing?
 
I tried to rebuild the array many times; OpenManage stated 0:0, 0:2 and 0:3 were degraded, and it said something like "pre-empt fail" in the logs somewhere.

0:1 wasn't the OS on its own, as it was a RAID 5 spanned across all the discs; it's just very peculiar how it was kind of working, but not really.

Since then I have backed up everything, installed new discs, and now I'm doing a fresh SBS install. The company isn't "that" big and it's just not worth my time faffing about trying to re-create the domain.
 

If you have 5 disks and are building from scratch, take the time to do it properly. Have a 3-drive RAID 5 for the data store and a RAID 1 mirror with the OS and paging space on it. That way, if this happens again, your OS and domain setup will be intact.

Though it sounds like you've got yourself into quite a mess. One thought that does pop up: do you not have daily backups of this server? Could you not rebuild and then ghost from that? At worst you've lost 24 hours, but everything will be up and working as it was.
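
To put rough numbers on that split, assuming five 146GB discs like the ones in this thread (same back-of-the-envelope style as earlier):

```python
def raid1_usable_gb(disk_sizes_gb):
    # RAID 1 mirror: you get one copy's worth, i.e. the smallest disc.
    return min(disk_sizes_gb)

def raid5_usable_gb(disk_sizes_gb):
    # RAID 5: smallest disc times (N - 1); one disc's worth holds parity.
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)

# 2-disc mirror for OS + paging, 3-disc RAID 5 for the data store.
# Each array can survive one disc failure independently of the other.
print(raid1_usable_gb([146, 146]))       # 146GB for the OS
print(raid5_usable_gb([146, 146, 146]))  # 292GB for data
```

You trade a bit of total space for keeping the OS and the data store on separate arrays, so losing one array doesn't take the domain setup down with it.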
 