Hard drive dying, proof for warranty?

krooton · 19 Aug 2019 at 15:46

My 4TB WD Red, which has 1 month left of its 3 year warranty, has started to kick up some read errors in UnRaid.

It is my parity drive, and I have a new one on the way (wanted to add to the array anyway), but I was just wondering what I should be doing to test/prove the failures?

Just some SMART tests and the built in Windows scandisk, or are there more 'invasive' tests to show bad sectors and whatnot? Free of course!

Thanks

Steampunk · 19 Aug 2019 at 15:55

Download the Data Lifeguard software WD use to test drives. They'll probably ask you to do this anyway, then you can send the errors with the drive RMA.

FreeStream · 19 Aug 2019 at 15:56

What does SMART show?

Tried swapping the SATA cable to confirm it’s not that?

Interestingly enough, I had to RMA a Red Pro parity drive because when I was preclearing it to be sold it threw some errors too. WD just accepted the SMART errors, no proof needed!

Try running the preclear script on it for three cycles set to erase and clear to see what it does...

pcfarrar · 19 Aug 2019 at 16:07

I've returned loads of drives to WD; you don't need proof. Just complete the online RMA, put "read errors" as the reason and send the drive off.

krooton · 19 Aug 2019 at 16:17

Thanks all!

Glad I noticed this before the warranty expired!

krooton · 24 Aug 2019 at 20:53

Well this is weird, Unraid was still showing periodic read errors (649 in total this morning), so I did a full scan with the WD Lifeguard Diagnostics (took all day) and it shows no issues.
Chkdsk also showed no errors.

Is the drive not dying then? (I can return my replacement if so as I haven't opened it yet).

Steampunk · 24 Aug 2019 at 21:14

Chkdsk isn't a good tool for this, because it checks for filesystem errors rather than surface errors. It's possible that errors will only show if you do a write, and then the drive will be forced to remap bad/weak blocks to the spare area. You might have to do a full write across the whole disk, and see if it does any remaps. Then wait to see if more bad blocks appear. There's likely an option somewhere in Lifeguard to do a surface test that will overwrite the data (make sure you have a backup first).

Some tools like Hard Disk Sentinel can do a read and then re-write of the block with the same data to see if it's weak under writing. This allows you to keep your data intact as far as possible. I've done this sort of thing before, and sometimes the disk stabilises once those weak blocks have been remapped, but more often than not if the drive is failing more bad blocks will come up with further use and you'll need to replace the drive anyway.

You've got a replacement, I'd just put that into use and not worry about messing around to try and get a bit more life out of what is likely a failing drive.
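To show what I mean by read-then-rewrite, here is a rough sketch in Python. It is not how Hard Disk Sentinel actually does it (I don't know its internals); the function name `refresh_blocks` and the 4096-byte block size are made up for illustration, and it runs against an in-memory buffer rather than a real disk.

```python
import io

def refresh_blocks(dev, size, block_size=4096):
    """Read each block, write the same bytes back, then re-read and compare.

    `dev` is any seekable binary file-like object. Returns the offsets of
    blocks that did not read back identically after the rewrite.
    """
    changed = []
    for offset in range(0, size, block_size):
        n = min(block_size, size - offset)
        dev.seek(offset)
        original = dev.read(n)      # read the existing data
        dev.seek(offset)
        dev.write(original)         # rewrite the same data in place
        dev.seek(offset)
        if dev.read(n) != original: # verify the rewrite
            changed.append(offset)
    return changed

# In-memory stand-in for a disk (assumption: purely for demonstration).
data = bytes(range(256)) * 40
fake_disk = io.BytesIO(data)
print(refresh_blocks(fake_disk, len(data)))  # offsets that failed verification
print(fake_disk.getvalue() == data)          # contents are unchanged
```

On a real drive a tool would also have to get past the operating system's cache so that the re-read really comes from the platters, which this sketch does not attempt.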

Avalon · 25 Aug 2019 at 05:37

krooton said:
It is my parity drive, and I have a new one on the way (wanted to add to the array any way)

krooton said:
Is the drive not dying then? (I can return my replacement if so as I haven't opened it yet).

So which is it then? Unraid doesn't throw errors because it feels like it and the parity drive is kind of important/will take the largest hit in terms of writes. If you want to be sure of the drive's condition, put the new one in and pre-clear the old one - pre-clearing will write each sector and read it again over multiple passes.
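In case the write-then-read idea isn't clear, here is a minimal sketch of it in Python. This is not the preclear script itself; the function name `write_verify_pass`, the patterns and the block size are my own assumptions, and it works on an in-memory buffer so nothing on disk gets touched.

```python
import io

def write_verify_pass(dev, size, pattern, block_size=4096):
    """Fill the first `size` bytes of `dev` with `pattern`, then read back.

    `dev` is any seekable binary file-like object. Returns the offsets of
    blocks whose contents did not match what was written.
    """
    # Build one block's worth of the repeating pattern.
    block = (pattern * (block_size // len(pattern) + 1))[:block_size]

    # Write phase: cover the whole range.
    dev.seek(0)
    for offset in range(0, size, block_size):
        dev.write(block[: min(block_size, size - offset)])

    # Read-back phase: compare every block against the pattern.
    bad = []
    dev.seek(0)
    for offset in range(0, size, block_size):
        n = min(block_size, size - offset)
        if dev.read(n) != block[:n]:
            bad.append(offset)
    return bad

# 64 KiB in-memory stand-in for the drive (assumption: demo only).
fake_disk = io.BytesIO(bytes(64 * 1024))
for cycle, pattern in enumerate([b"\x00", b"\xff", b"\xaa"], start=1):
    bad_offsets = write_verify_pass(fake_disk, 64 * 1024, pattern)
    print(f"pass {cycle}: {len(bad_offsets)} mismatched blocks")
```

Running several passes with different patterns is the point: a weak sector might hold one pattern fine and fail on another.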

krooton · 25 Aug 2019 at 09:00

Avalon said:
So which is it then? Unraid doesn't throw errors because it feels like it and the parity drive is kind of important/will take the largest hit in terms of writes. If you want to be sure of the drive's condition, put the new one in and pre-clear the old one - pre-clearing will write each sector and read it again over multiple passes.

I'll give that a try, thanks.

Steampunk said:
Chkdsk isn't a good tool for this, because it checks for filesystem errors rather than surface errors. It's possible that errors will only show if you do a write, and then the drive will be forced to remap bad/weak blocks to the spare area. You might have to do a full write across the whole disk, and see if it does any remaps. Then wait to see if more bad blocks appear. There's likely an option somewhere in Lifeguard to do a surface test that will overwrite the data (make sure you have a backup first).

Some tools like Hard Disk Sentinel can do a read and then re-write of the block with the same data to see if it's weak under writing. This allows you to keep your data intact as far as possible. I've done this sort of thing before, and sometimes the disk stabilises once those weak blocks have been remapped, but more often than not if the drive is failing more bad blocks will come up with further use and you'll need to replace the drive anyway.

You've got a replacement, I'd just put that into use and not worry about messing around to try and get a bit more life out of what is likely a failing drive.

The check is more for sending it back for warranty replacement, but if there isn't actually any visible issue with the drive, surely WD will refuse the replacement?

Steampunk · 25 Aug 2019 at 09:15

krooton said:
The check is more for sending it back for warranty replacement, but if there isn't actually any visible issue with the drive, surely WD will refuse the replacement?

The drive will still have logged the number of remaps it's had to do, that will probably be enough for WD.

krooton · 25 Aug 2019 at 09:23

Steampunk said:
The drive will still have logged the number of remaps it's had to do, that will probably be enough for WD.

Would that be reallocated sector in the smart test?

That is showing as 0, raw read error rate is 1392, and multi zone error rate is 14.

Everything else is 0/normal in the test results.

Steampunk · 25 Aug 2019 at 09:37

krooton said:
Would that be reallocated sector in the smart test?

That is showing as 0, raw read error rate is 1392, and multi zone error rate is 14.

Everything else is 0/normal in the test results.

Those numbers aren't always a straight count. The drive manufacturers say the raw values mean special things to them. These are the SMART attributes for a drive that has 124 reported sector remaps, but nowhere can you see the number 124, unless you happen to know that 7C is 124 in hex.

Code:

5,Reallocated Sectors Count,10,97,97,OK,00000000007C,0,Enabled
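
If you want to decode that yourself, a few lines of Python will do it. The column order below (id, name, threshold, value, worst, status, raw data) is my assumption based on that one row, and `parse_smart_row` is just a name I made up; other tools lay their exports out differently.

```python
def parse_smart_row(row):
    """Split one comma-separated SMART attribute row and decode the raw field."""
    fields = row.strip().split(",")
    # Assumed column order, inferred from the single row quoted above:
    # id, name, threshold, value, worst, status, raw data (hex), ...
    attr_id, name, threshold, value, worst, status, raw_hex = fields[:7]
    return {
        "id": int(attr_id),
        "name": name,
        "threshold": int(threshold),
        "value": int(value),
        "worst": int(worst),
        "status": status,
        "raw": int(raw_hex, 16),  # raw field is hexadecimal in this export
    }

row = "5,Reallocated Sectors Count,10,97,97,OK,00000000007C,0,Enabled"
attr = parse_smart_row(row)
print(attr["name"], "raw =", attr["raw"])  # Reallocated Sectors Count raw = 124
```

Note that the normalised value (97) is still well above the threshold (10), which is why the status reads OK even with 124 remaps logged.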

Don't stress about it, just send the failing drive back.

FreeStream · 25 Aug 2019 at 09:52

krooton said:
Would that be reallocated sector in the smart test?

That is showing as 0, raw read error rate is 1392, and multi zone error rate is 14.

Everything else is 0/normal in the test results.

Five passes of the preclear script using the erase option.

If it passes then the errors are due to cable/ram/cpu.

krooton · 25 Aug 2019 at 13:35

Thanks guys, put in the new drive and am busy rebuilding parity (6 hours to go), then I will mount the old drive and do a bunch of preclears and see what's what.

krooton · 2 Sep 2019 at 08:06

5 preclears with no issues, but I did realise that, like an idiot, I had a parity check running daily, lol. Oh well, the new drive just means I've trebled my storage space.