Why do my drives vanish when writing a lot of data to them?

GeX

Hi all.

I have:

Gigabyte 965P-DS3 (rev 3)
2x 1TB Samsung HD103UJs
2x WD 300GB drives in RAID0

The array holds my OS, Win7 Pro (x64). When copying large amounts of data from one 1TB drive to the other, it'll work for a while and then the drive will vanish from the system. The only way to get it back is to power cycle the system.

It doesn't matter which drive I copy from or to: it is always the destination drive that vanishes.

In Event Viewer I get this error under Disk:

The device, \Device\Harddisk1\DR1, is not ready for access yet.

and this one under atapi:

The driver detected a controller error on \Device\Ide\IdePort4.

Obviously the numbers in the errors change to reflect which drive has vanished.

So far I have set the system back to default clocks and aimed a large fan at the chipset area, as it was rather warm.

Below is the SMART data from the two drives.

Code:
----------------------------------------------------------------------------
CrystalDiskInfo 3.5.6 (C) 2008-2010 hiyohiyo
                                Crystal Dew World : http://crystalmark.info/
----------------------------------------------------------------------------

    OS : Windows 7  [6.1 Build 7600] (x64)
  Date : 2010/05/16 19:59:52

-- Controller Map ----------------------------------------------------------
 + Standard Dual Channel PCI IDE Controller [ATA]
   + ATA Channel 0 (0)
     - SAMSUNG HD103UJ ATA Device
   + ATA Channel 1 (1)
     - SAMSUNG HD103UJ ATA Device
 + Standard AHCI 1.0 Serial ATA Controller [ATA]
   + ATA Channel 0 (0)
     - PIONEER DVD-RW  DVR-112D ATA Device
   - ATA Channel 1 (1)
   - ATA Channel 4 (4)
   - ATA Channel 5 (5)
 + JMicron JMB36X Controller [SCSI]
   - GRAID  SCSI Disk Device
 + Virtual CloneDrive [SCSI]
   - ELBY CLONEDRIVE SCSI CdRom Device
   - ELBY CLONEDRIVE SCSI CdRom Device

-- Disk List ---------------------------------------------------------------
 (1) SAMSUNG HD103UJ : 1000.2 GB [1-3-0, pd1]
 (2) SAMSUNG HD103UJ : 1000.2 GB [2-4-0, pd1]

----------------------------------------------------------------------------
 (1) SAMSUNG HD103UJ
----------------------------------------------------------------------------
           Model : SAMSUNG HD103UJ
        Firmware : 1AA01113
   Serial Number : S13PJ1CQ727059
       Disk Size : 1000.2 GB (8.4/137.4/1000.2)
     Buffer Size : 32767 KB
     Queue Depth : 32
    # of Sectors : 1953523055
   Rotation Rate : Unknown
       Interface : Serial ATA
   Major Version : ATA/ATAPI-7
   Minor Version : ATA8-ACS version 3b
   Transfer Mode : SATA/300
  Power On Hours : 6172 hours
  Power On Count : 785 count
     Temparature : 28 C (82 F)
   Health Status : Good
        Features : S.M.A.R.T., APM, AAM, 48bit LBA, NCQ
       APM Level : 0000h [OFF]
       AAM Level : FE00h [OFF]

-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
01 100 100 _51 000000000000 Read Error Rate
03 _77 _77 _11 000000001E28 Spin-Up Time
04 _99 _99 __0 000000000320 Start/Stop Count
05 100 100 _10 000000000000 Reallocated Sectors Count
07 100 100 _51 000000000000 Seek Error Rate
08 100 100 _15 000000000000 Seek Time Performance
09 _99 _99 __0 00000000181C Power-On Hours
0A 100 100 _51 000000000000 Spin Retry Count
0B 100 100 __0 000000000000 Recalibration Retries
0C _99 _99 __0 000000000311 Power Cycle Count
0D 100 100 __0 000000000000 Soft Read Error Rate stab
B7 100 100 __0 000000000000 Unknown
B8 __1 __1 _99 000000000176 End-to-End Error
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
BC 100 100 __0 000000000000 Command Timeout
BE _71 _65 __0 00001D1D001D Airflow Temperature
C2 _72 _64 __0 00001D1C001C Temperature
C3 100 100 __0 0000000038E2 Hardware ECC recovered
C4 100 100 __0 000000000000 Reallocation Event Count
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Uncorrectable Sector Count
C7 100 100 __0 00000000000B UltraDMA CRC Error Count
C8 100 100 __0 000000000005 Write Error Rate
C9 253 253 __0 000000000000 Soft Read Error Rate

----------------------------------------------------------------------------
 (2) SAMSUNG HD103UJ
----------------------------------------------------------------------------
           Model : SAMSUNG HD103UJ
        Firmware : 1AA01113
   Serial Number : S13PJ1CQ727058
       Disk Size : 1000.2 GB (8.4/137.4/1000.2)
     Buffer Size : 32767 KB
     Queue Depth : 32
    # of Sectors : 1953525168
   Rotation Rate : Unknown
       Interface : Serial ATA
   Major Version : ATA/ATAPI-7
   Minor Version : ATA8-ACS version 3b
   Transfer Mode : SATA/300
  Power On Hours : 6167 hours
  Power On Count : 783 count
     Temparature : 28 C (82 F)
   Health Status : Good
        Features : S.M.A.R.T., APM, AAM, 48bit LBA, NCQ
       APM Level : 0000h [OFF]
       AAM Level : FE00h [OFF]

-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
01 100 100 _51 000000000000 Read Error Rate
03 _77 _77 _11 000000001E3C Spin-Up Time
04 _99 _99 __0 00000000031B Start/Stop Count
05 100 100 _10 000000000000 Reallocated Sectors Count
07 253 253 _51 000000000000 Seek Error Rate
08 100 100 _15 000000000000 Seek Time Performance
09 _99 _99 __0 000000001817 Power-On Hours
0A 100 100 _51 000000000000 Spin Retry Count
0B 100 100 __0 000000000000 Recalibration Retries
0C _99 _99 __0 00000000030F Power Cycle Count
0D 100 100 __0 000000000000 Soft Read Error Rate stab
B7 100 100 __0 000000000000 Unknown
B8 __1 __1 _99 000000000096 End-to-End Error
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
BC 100 100 __0 000000000000 Command Timeout
BE _72 _66 __0 00001C1C001C Airflow Temperature
C2 _72 _65 __0 00001D1C001C Temperature
C3 100 100 __0 000000099097 Hardware ECC recovered
C4 100 100 __0 000000000000 Reallocation Event Count
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Uncorrectable Sector Count
C7 100 100 __0 000000000001 UltraDMA CRC Error Count
C8 100 100 __0 000000000002 Write Error Rate
C9 253 253 __0 000000000000 Soft Read Error Rate

Does anyone have any idea why this might be happening?

OK, I've run HD Tune, and for B8 (which it lists as an unknown attribute) it says Failed. The threshold is 99; the data is 374 on one drive and 150 on the other.

B8 translates to: 184 End-to-End Error, the number of parity errors during transfer between the cache RAM and the host.
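For anyone wanting to check the numbers, here is a rough Python sketch of how the hex columns in the CrystalDiskInfo dump above map to what HD Tune shows. The sample rows are copied from drive 1; the function names (`parse_smart_line`, `failing`) are made up for illustration, and the "failed" rule (current value at or below a non-zero threshold) is the usual convention, assumed here rather than taken from either tool.

```python
# Three rows copied from the drive 1 S.M.A.R.T. table above.
SMART_LINES = """\
05 100 100 _10 000000000000 Reallocated Sectors Count
B8 __1 __1 _99 000000000176 End-to-End Error
C7 100 100 __0 00000000000B UltraDMA CRC Error Count
"""


def parse_smart_line(line):
    """Split one 'ID Cur Wor Thr Raw Name' row and decode the numbers."""
    attr_id, cur, wor, thr, raw, name = line.split(None, 5)
    return {
        "id": int(attr_id, 16),           # B8 hex -> 184 decimal
        "current": int(cur.strip("_")),   # '_' is only padding in the dump
        "worst": int(wor.strip("_")),
        "threshold": int(thr.strip("_")),
        "raw": int(raw, 16),              # raw values are printed in hex
        "name": name.strip(),
    }


def failing(attrs):
    """Assumed rule: failed if current <= threshold and threshold is non-zero."""
    return [a for a in attrs
            if a["threshold"] > 0 and a["current"] <= a["threshold"]]


attrs = [parse_smart_line(l) for l in SMART_LINES.splitlines() if l.strip()]
for a in failing(attrs):
    print(f'{a["id"]} {a["name"]}: current {a["current"]}, '
          f'threshold {a["threshold"]}, raw {a["raw"]}')
```

Run against those rows, only B8 is flagged: ID 184, with raw value 0x176 = 374, which matches the HD Tune figure. The same conversion on drive 2's raw value (0x96) gives 150.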
 
I'd start by checking that all the cables are connected OK. Might even be worth swapping them for new ones and/or using different ports on the board.
 
I've switched to different ports and it's still doing it. New cables are next on the list :)
 
My guess would be that the data cable is duff. It's good enough for the drive to be detected, but as you say, you're getting "parity errors during transfer between the cache RAM and the host", which sounds like a dodgy cable.
 
How hot is the chipset getting? If it's getting too hot because of the strain you're putting on it, it might be throwing a wobbler.
 
I suspected the chipset too, so I downclocked and pointed a very large fan at it, and it still did the same :(
 
I had this problem with a large sixteen-disk array, and it was definitely caused by a dud SATA cable. It really was a pain because the affected drive then logged SMART errors, which you can't reset, despite having no faults and working faultlessly for two years after replacing the cable.
 