Occasional file corruption - how to trace source?

Hi folks,

Not sure if this is the right forum for this; mods please move if necessary.

I transfer up to 1TB of data between a server and a workstation each day. The files are large (~100MB) ASCII grids (simulation result files). Approximately 0.01-0.1% of these files seem to become corrupted at random; typically a few unexpected characters appear somewhere in a grid and cause our processing scripts to fail.
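To see exactly where a grid went bad, it can help to record the byte offset and value of each unexpected character instead of only noting that a script failed. Below is a minimal Python sketch. It assumes the data rows contain only numbers (digits, sign, decimal point, exponent marker) separated by whitespace. The function name and the `header_lines` parameter are hypothetical, and the character set will need adjusting for the actual grid format:

```python
import re

# Assumption: data rows hold only numbers separated by whitespace.
# This pattern matches any byte that is NOT a digit, e/E, +, -, '.' or whitespace.
DISALLOWED = re.compile(rb"[^0-9eE+\-.\s]")

def find_unexpected_chars(path, header_lines=0, max_hits=20):
    """Return (line_no, column, byte_offset, byte_value) for each unexpected byte.

    Lines up to `header_lines` are skipped (hypothetical: set this to the
    number of text header lines your grid format uses)."""
    hits = []
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no > header_lines:
                for m in DISALLOWED.finditer(line):
                    pos = m.start()
                    hits.append((line_no, pos + 1, offset + pos, line[pos]))
                    if len(hits) >= max_hits:
                        return hits
            offset += len(line)
    return hits
```

For example, `find_unexpected_chars("run_042.asc", header_lines=6)` (hypothetical file name and header length) returns a list of `(line, column, byte_offset, byte_value)` tuples. Logging these offsets across several bad files shows whether the damage lands at random positions or near particular boundaries.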

A few notes:
- errors only appear once files have been transferred from the server to the workstation (i.e. we have not detected any errors or corruption on the server side)
- if I zip a set of known-good files on the workstation and then unzip the archive, a file in the unzipped copy is sometimes corrupted
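One way to pin down which stage introduces the corruption is to checksum every file where it is produced and re-check it at each later stage (after the transfer, after zip/unzip). Below is a minimal Python sketch using SHA-256 from the standard library. The function names are hypothetical, and the manifest file should be written outside the directory being hashed so it does not hash itself:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so 100MB grids need not fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest):
    """Record 'digest  relative/path' for every file under root (run on the server)."""
    root = Path(root)
    with open(manifest, "w") as out:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                out.write(f"{sha256_of(p)}  {p.relative_to(root).as_posix()}\n")

def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose digest differs (run on the workstation)."""
    root = Path(root)
    bad = []
    with open(manifest) as fh:
        for line in fh:
            digest, rel = line.rstrip("\n").split("  ", 1)
            target = root / rel
            if not target.is_file() or sha256_of(target) != digest:
                bad.append(rel)
    return bad
```

Running `write_manifest` on the server and `verify_manifest` on the workstation lists the files that changed in transit or after arrival. The `digest  path` line layout is intended to match what GNU `sha256sum` writes, so `sha256sum -c` run from the data directory should also be able to check the manifest, although file names containing unusual characters may need care.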

The second point makes me think the fault lies not with the network but with the workstation (i.e. when the files are being written to the workstation drive). However, the workstation passes memtest and there are no SMART errors showing for the workstation drives.
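Since the zip/unzip result points at the workstation itself, a write-read-verify loop on the workstation drive can check whether data survives a purely local round trip, with the network out of the picture. Below is a minimal Python sketch (Python 3.9+ for `randbytes`); the function name and defaults are hypothetical. One caveat: an immediate read-back is likely to be served from the operating system's page cache, so this mostly exercises RAM and the write path. Exercising reads from the disk itself would need a variant that keeps the files and re-hashes them after a reboot or a cache flush.

```python
import hashlib
import os
import random
import tempfile

def write_read_verify(directory, size=100 * 1024 * 1024, rounds=10, seed=0):
    """Write pseudo-random data, fsync it, read it back and compare SHA-256 digests.

    Returns the list of round numbers whose read-back digest did not match."""
    failures = []
    rng = random.Random(seed)  # deterministic data so a failing run can be repeated
    for r in range(rounds):
        data = rng.randbytes(size)
        expected = hashlib.sha256(data).hexdigest()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push the data to the drive
            with open(path, "rb") as fh:
                actual = hashlib.sha256(fh.read()).hexdigest()
            if actual != expected:
                failures.append(r)
        finally:
            os.remove(path)  # temporary test file is always cleaned up
    return failures
```

With the defaults this writes roughly 1GB per call. Running it in a loop overnight, pointed at the same drive the grids are written to, gives a corruption rate that can be compared with the 0.01-0.1% seen on real files.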

Any suggestions as to how I might trace the source of the error? I'm pretty certain it has to be hardware-related...
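When both a known-good copy and a corrupted copy of the same file are available, the pattern of the damage can hint at its source. As a rough rule of thumb, not a definitive test, isolated bytes that differ by exactly one bit are more consistent with a memory or bus bit flip, while damage confined to aligned 512- or 4096-byte regions is more consistent with a storage or filesystem problem. Below is a minimal Python sketch; the function name and the 4096-byte block size are assumptions:

```python
def classify_differences(good: bytes, bad: bytes, block: int = 4096) -> dict:
    """Compare a known-good copy with a corrupted copy of the same file.

    Returns the differing byte offsets, how many of them differ by exactly
    one bit, and the indices of the `block`-sized aligned regions they fall in."""
    if len(good) != len(bad):
        raise ValueError("copies differ in length; compare them with a diff tool instead")
    offsets = [i for i, (a, b) in enumerate(zip(good, bad)) if a != b]
    # XOR isolates the changed bits; a single set bit means a single bit flip.
    single_bit = sum(1 for i in offsets if bin(good[i] ^ bad[i]).count("1") == 1)
    blocks = sorted({i // block for i in offsets})
    return {"offsets": offsets, "single_bit_flips": single_bit, "blocks": blocks}
```

In an ASCII grid a single flipped bit often turns a digit into a letter or punctuation, for example `'0'` (0x30) becomes `'p'` (0x70) when bit 6 flips. Calling it as `classify_differences(Path("good.asc").read_bytes(), Path("bad.asc").read_bytes())` (hypothetical file names) compares two whole files; the pure-Python byte loop is slow on 100MB files but workable for occasional use.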

Thanks
 
The workstation is non-ECC; we are considering switching to ECC in future.

What is odd is that this is a relatively recent issue: the workstation has run the same work package happily for a year, and the errors have only just started creeping in.

I've just checked, and the disk (a 6TB Seagate Enterprise job) does have a few reallocated sectors (raw value of 40). That is not enough to trigger a SMART warning, but perhaps the drive is starting to wobble?
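With a reallocated sector count that is non-zero but below threshold, the trend is generally more telling than the absolute number, so it may be worth logging the relevant raw values daily. Below is a minimal Python sketch that runs `smartctl -A` from smartmontools and extracts attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). The function names and the `/dev/sda` device path are hypothetical, and the parser assumes the usual ten-column attribute table, whose layout can vary between drives and smartctl versions. smartctl normally needs root privileges.

```python
import subprocess

# Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
WATCH = (5, 197, 198)

def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value string) from `smartctl -A` table rows."""
    attrs = {}
    for line in text.splitlines():
        # At most 10 fields; the last keeps raw values such as "34 (Min/Max 20/45)".
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs

def watched_counts(attrs):
    """Leading integer of the raw value for each watched attribute that is present."""
    return {attrs[i][0]: int(attrs[i][1].split()[0]) for i in WATCH if i in attrs}

def read_smart(device="/dev/sda"):
    # smartctl's exit status is a bit mask that can be non-zero on a healthy
    # drive, so the return code is deliberately not checked here.
    out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout
    return watched_counts(parse_smart_attributes(out))
```

Appending the dictionary returned by `read_smart()` to a log once a day makes it easy to see whether the reallocated count is stable or climbing.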
 
The drive is a 6TB Seagate Enterprise Nearline (ST6000NM0024). I will swap it out and try that. ECC memory will mean a new workstation, so that will happen when we next procure.

Thanks folks
 