Real world RAID5 risk

Soldato
Joined
28 Dec 2003
Posts
16,296
Ok so many of us will have read the endless articles about how RAID 5 is completely unsafe and inadvisable nowadays due to the sheer size of disks coupled with their URE rate and the resulting risk of a second drive failure during rebuild.

In reality though...?

Scenario I have is a RAID 1 mirror of two 12TB drives and I need more space. The serious temptation is to add a third drive and convert to RAID 5, as this doubles my effective storage while still protecting me against a single drive failure.

So what are the increased risks here?

The same risks of second drive failure during rebuild also apply to RAID 1 of course, so I presume the reason RAID 5 is targeted is due to the fact that, the larger the array, the more drives have to be fully read without errors for the array to be rebuilt.

In the case of a three drive RAID 5 array however, surely the risk of secondary failure is only doubled as two drives need to be read rather than one? When the drives already have a URE rate of 10^15, the risk is already pretty minimal so is there really that much additional risk by switching to RAID 5? Yes I know the risk is increased but it's a question of weighing that against the benefit of doubling effective capacity for the cost of a single additional drive.
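
To put rough numbers on the "only doubled" question, here is a minimal sketch. It assumes the spec sheet figure means one URE per 10^15 bits read and that bit errors are independent and evenly spread, which real drives do not strictly obey. The names `URE_PER_BIT`, `DRIVE_BYTES` and `p_ure` are made up for illustration.

```python
import math

URE_PER_BIT = 1e-15   # assumption: 1 unrecoverable read error per 10^15 bits read
DRIVE_BYTES = 12e12   # one 12TB drive, decimal terabytes

def p_ure(drives_read, drive_bytes=DRIVE_BYTES, rate=URE_PER_BIT):
    """Chance of at least one URE while reading `drives_read` full drives."""
    bits = drives_read * drive_bytes * 8
    # 1 - (1 - rate)**bits, written with log1p/expm1 so the tiny rate is not lost
    return -math.expm1(bits * math.log1p(-rate))

mirror = p_ure(1)  # RAID 1 rebuild reads the one surviving drive
raid5 = p_ure(2)   # 3-drive RAID 5 rebuild reads both surviving drives
print(f"RAID 1: {mirror:.1%}  RAID 5: {raid5:.1%}")
```

Under those assumptions this comes out at roughly 9% for the mirror and 17% for the three-drive RAID 5, so the risk is a little less than doubled, but it starts from a figure that is not negligible for 12TB drives.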
 
The primary risk with RAID5 is at resilver time. Funnily enough, it works surprisingly well if you have a bunch of disks bought at different times, as you're less likely to have a second one go soon after the first.

I personally run ZRAID5 and am planning on moving to unraid or snap-raid when I finish my current server.

When it comes to RAID1 I've never thought it's a good idea personally, other than maybe running the boot disk like that to save some time if that goes. RAID isn't a replacement for backups, and RAID 1 is verging on the cost of just having a separate box with a full backup.
 
When the drives already have a URE rate of 10^15, the risk is already pretty minimal

Actually, no. Remember that that URE is bits, not bytes. If you have a 10 TB drive (10^13 bytes) then that's approx 2.5x 10^15 bits so you can expect that drive to have two or three errors on it.
 
The same risks of second drive failure during rebuild also apply to RAID 1 of course, so I presume the reason RAID 5 is targeted is due to the fact that, the larger the array, the more drives have to be fully read without errors for the array to be rebuilt.

RAID1 (or 10) rebuilds are a lot quicker, due to simply copying from 1 drive to the other. Yes the data is still at risk, but it's at risk for a lot less time.

A URE does still affect a RAID1 remirror but only to the extent that only the "bit" of data covered by the URE is damaged, whereas during a RAID5 rebuild the whole of a stripe is compromised and cannot be reconstructed.
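
A toy illustration of why a parity stripe is all-or-nothing, assuming a simple 3-drive layout with two data chunks and one XOR parity chunk per stripe. The helpers `xor_blocks` and `rebuild` are hypothetical and not taken from any real RAID implementation; actual controllers and software RAID differ in whether they abort the rebuild or just flag the bad stripe and carry on.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: two data chunks plus their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks([d0, d1])

def rebuild(survivors):
    """Rebuild the lost chunk from the surviving chunks of the same stripe.

    A chunk that hit a URE is represented as None (an assumption of this sketch).
    """
    if any(chunk is None for chunk in survivors):
        raise IOError("stripe unrecoverable: a surviving chunk could not be read")
    return xor_blocks(survivors)

# Drive 0 has died: its chunk comes back from d1 and parity.
recovered = rebuild([d1, parity])
print(recovered == d0)
```

Calling `rebuild([None, parity])` raises, because parity on its own says nothing about `d0` without `d1`. In a mirror the same unreadable sector only costs the data in that sector.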

When it comes to RAID1 I've never thought it's a good idea personally, other than maybe running the boot disk like that to save some time if that goes. RAID isn't a replacement for backups, and RAID 1 is verging on the cost of just having a separate box with a full backup.

No version of RAID is a replacement for backups - it is purely for availability, to keep your data accessible in the event of a failure. Our servers at work all use RAID1 or RAID10. Not as a backup (as we have duplicate servers for that), but to minimize downtime.
 
Actually, no. Remember that that URE is bits, not bytes. If you have a 10 TB drive (10^13 bytes) then that's approx 2.5x 10^15 bits so you can expect that drive to have two or three errors on it.

I think your maths is a bit wrong.

2.5x 10^15 bits is actually about 31 times more than 10^13 bytes, which is only 8x 10^13 bits

10^15 bits is 113.7TB (proper binary terabytes, not this decimal garbage, I just refuse to use TiB :p )
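
For anyone wanting to check the unit conversion, a quick sketch, again assuming the spec means one URE per 10^15 bits read. The variable names are just for illustration.

```python
URE_INTERVAL_BITS = 10**15              # assumed spec: one URE per this many bits read

bytes_per_ure = URE_INTERVAL_BITS / 8   # bits -> bytes
tb_decimal = bytes_per_ure / 10**12     # decimal terabytes
tib_binary = bytes_per_ure / 2**40      # binary terabytes (TiB)

drive_bits = 10 * 10**12 * 8            # a 10 TB drive expressed in bits
expected_ures = drive_bits / URE_INTERVAL_BITS

print(f"{tb_decimal:.0f} TB decimal, {tib_binary:.1f} TiB binary")
print(f"{expected_ures:.2f} expected UREs per full read of a 10 TB drive")
```

So on the spec sheet figure, a full read of a 10 TB drive averages well under one error, not two or three.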
 
A URE does still affect a RAID1 remirror but only to the extent that only the "bit" of data covered by the URE is damaged, whereas during a RAID5 rebuild the whole of a stripe is compromised and cannot be reconstructed.

Ah ok that's interesting! So you're saying that, if an error is encountered whilst reading the remaining drive(s) to reconstruct the array, single bit errors in a mirror would just be replicated across to the other drive but an error in a parity array would kill the entire rebuild?
 