RAID10 array wants to re-build every few hours!

Hi there,

I have a RAID10 array made up of 4 x 500GB Western Digital Caviar SE16 SATA-II (WD5000AAKS) disk drives. Roughly once a day, my RAID controller spontaneously decides that the RAID volume needs to be re-built; I have no option but to let it rebuild. The SMART status of all four disks is "NORMAL". On one occasion, the RAID controller claimed that the 4th disk in the array was broken and that I should replace the disk. But, after a few minutes, it decided that the fourth disk was absolutely fine and started rebuilding the array.

My RAID controller hardware is the built-in RAID function of the Intel P35 chipset. My motherboard is a Gigabyte P35-DQ6. My RAID software is the Intel Matrix Storage Console (v7.6). As far as I'm aware, I'm running the latest BIOS and drivers. I'm running WinXP32 SP2.

I've tried running the Western Digital diagnostic programme but it can't see the individual drives in the RAID array and I don't have the option of disabling the RAID array (i.e. I have 1TB of data on my RAID array that I need to keep on my RAID array).

Sometimes the computer will freeze for a few seconds while I'm using it and immediately after the computer comes back to life, the RAID controller software will say that it needs to re-build. I don't know if the freezing causes the RAID controller to get confused or if the RAID controller causes the freeze!

Any help would be really, REALLY appreciated! I need my computer to be healthy again!

I have two hunches: the first is that my fourth drive really is dying and that I should replace it. My second hunch is that my cheapo-PSU is doing something ugly and should be replaced.

Unless anyone has any better ideas, my plan of action would be to remove the fourth drive from the array and run an exhaustive set of tests on the drive.

Thanks loads,
Jack
 
When I ran WD's Data Lifeguard Diagnostics, I didn't have to disable RAID.

Anyway, if it won't work with RAID enabled (I'm not sure why, as mine does), simply disable RAID in the BIOS and boot into the program. Once the check on the drives is complete, re-enable RAID and reboot, and your array should simply pick up where it left off.

Well, that's how it works on my mainboard.
 
Is it always the same disk that is being reported as the problem drive?

I'd be tempted to run diagnostics on each of the disks anyway. It's a total pain in the rear, but at least you can pull them out of the array one at a time to run them.
 
Hi guys,

Thanks loads for the replies. I'm backing up my data right now... once the data is backed up then I'll do as you suggest - I'll disable RAID in my BIOS and then I'll test the drives.

Thanks,
Jack
 
Disabling RAID entirely is a bit extreme; you should be able to disconnect one drive at a time, move it onto the JMicron controller, test it and put it back into the main array. You might need to rebuild each time, but surely that can be done online.
 
Thanks loads for the reply... that's what I'll do...

Here's my planned workflow:

1) back up!
2) turn off PC, disconnect drive 1 and connect it to the JMicron controller
3) turn on PC, run WD diagnostics
4) turn off PC, re-connect drive 1 to the Intel RAID controller
5) turn on PC, allow RAID array to rebuild
6) repeat steps 2-5 for all disks!

Does that sound sensible? Actually, I'll probably start with drive4 seeing as the finger of suspicion points to drive4.

Thanks,
Jack
 
Like I said, on my board it's fine. I disable it, boot the PC, the three drives on the array are detected like normal drives, do what needs to be done, re-enable it and then NVidia RAID picks up the array straight away.

Don't see why that is extreme. Far easier in my eyes than faffing about disconnecting drives.

But then again I'm not sure if his board will allow it.
 
What you suggest might work, then again it might not. If it does then great, it'll be far quicker and less disruptive than my method. However, we're talking about a different controller, so you can't make any assumptions.

RAID10 can cope with losing a disk, so it makes sense to take advantage of that while each disk is tested. It's a trade-off between reducing the risk in the process and the time it takes.
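
To make the "can cope with losing a disk" point concrete, here's a rough Python sketch of a four-disk RAID10 modelled as two mirrored pairs striped together. The pairing (disk1 with disk2, disk3 with disk4) is purely an assumption for illustration; the Intel controller decides the real layout, so check the Matrix Storage Console before relying on which disks are partners.

```python
from itertools import combinations

# Assumption: four disks arranged as two mirrored pairs, striped together.
# This pairing is hypothetical; the real controller chooses the layout.
MIRROR_PAIRS = [("disk1", "disk2"), ("disk3", "disk4")]


def array_survives(missing):
    """Return True if every mirrored pair still has at least one disk present."""
    missing = set(missing)
    return all(any(d not in missing for d in pair) for pair in MIRROR_PAIRS)


all_disks = [d for pair in MIRROR_PAIRS for d in pair]

# Pulling any single disk for testing leaves the array degraded but usable.
for disk in all_disks:
    state = "survives" if array_survives([disk]) else "lost"
    print(f"{disk} removed -> {state}")

# Two missing disks are only survivable if they come from different pairs.
for a, b in combinations(all_disks, 2):
    state = "survives" if array_survives([a, b]) else "lost"
    print(f"{a} + {b} removed -> {state}")
```

The takeaway under these assumptions: any one disk can be out for testing, but while it is out, the array is lost if its mirror partner also fails. That is the risk side of the trade-off, and why the backup comes first.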
 
Hi there...

backups are complete... I've removed disk4 and I'm testing it now...

It was quite a faff digging out the cable but I agree with RPStewart - I *know* that removing a single drive at a time shouldn't kill my RAID array.

Thanks,
Jack
 