# Has sector 88 of both Raid 1 disks failed? [MOSTLY SOLVED]

## Fog_Watch

Hello,

My Proliant DL380 G5 boots off an md raid 1 of two Samsung HD204UI 2TB disks.  The disks are attached via eSATA cables to a PCI-X Silicon Image 3124 controller.  

A couple of times /dev/md1 has not come up clean, and on one occasion the physical volume that sits on /dev/md1 had disappeared and needed re-creating.  The following relates to my attempts to resolve these issues.

dmesg  does not look nice:

 *Quote:*   

> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> 
> ata2.00: irq_stat 0x00020002, device error via D2H FIS
> 
> ata2.00: failed command: READ DMA
> ...

 

smartctl concurs; smartctl -a /dev/sda:

 *Quote:*   

> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> 
> # 1  Extended offline    Completed: read failure       90%      4207         88
> 
> 

 and smartctl -a /dev/sdb:

 *Quote:*   

> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> 
> # 1  Extended offline    Completed: read failure       90%      1819         88
> 
> 
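
To compare the two disks without reading the logs by eye, the LBA_of_first_error column can be pulled out of each self-test log. This is only a rough sketch: the function name first_error_lbas is made up for illustration, and it assumes the log layout quoted above (entries starting with "#", LBA in the last column, "-" when no error was found).

```shell
#!/bin/sh
# Hypothetical helper: print the LBA_of_first_error value of every self-test
# log entry that recorded an error. Reads smartctl output on stdin.
first_error_lbas() {
    awk '
        # Log entries start with "#". The last field is the failing LBA,
        # or "-" when the test completed without error, so keep numbers only.
        /^[[:space:]]*#/ && $NF ~ /^[0-9]+$/ { print $NF }
    '
}

# Example usage (needs root and smartmontools; not run here):
#   smartctl -l selftest /dev/sda | first_error_lbas
#   smartctl -l selftest /dev/sdb | first_error_lbas
```

If both commands print the same number, the two logs agree on where the first read error was found.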

 

The problem does not appear to be caused by any funky security features; hdparm -I /dev/sda | tail -n 14 shows:

 *Quote:*   

> Security: 
> 
> 	Master password revision code = 65534
> 
> 		supported
> ...

 

Does this really mean that sector 88 of both disks has failed?  If so, what do you think the cause of this might be?

Last edited by Fog_Watch on Tue Dec 04, 2012 2:48 am; edited 1 time in total

----------

## Ant P.

It's not outside the realm of possibility... if they're both similar disks and they've always been used as mirrors, they'll have mostly identical wear patterns. If they've ever been through a power failure it's possible the heads had an accident at the same place too. Could also both be from a manufacturing batch with a recurring defect.

----------

## Fog_Watch

gdisk -l /dev/sda | tail -n 5

 *Quote:*   

> Number  Start (sector)    End (sector)  Size       Code  Name
> 
>    1            2048            6143   2.0 MiB     EF02  BIOS boot
> 
>    2            6144        10491903   5.0 GiB     8200  Swap
> ...

 

The above suggests to me that sector 88 is unused, since it lies below sector 2048, where partition 1 starts. So is "end_request: I/O error, dev sda, sector 88" nothing to worry about?
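
The same check can be done mechanically by testing the sector number against each row of the partition table. A minimal sketch, assuming the gdisk -l table layout quoted above; the function name partition_for_sector is hypothetical. Note that "unallocated" here only means the sector is not inside any listed partition, not that nothing on the disk ever reads it.

```shell
#!/bin/sh
# Hypothetical helper: report which partition, if any, contains a given LBA.
# Reads `gdisk -l` output on stdin; the LBA is the first argument.
partition_for_sector() {
    awk -v lba="$1" '
        # Partition rows begin with three numbers: number, start, end.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            if (lba + 0 >= $2 + 0 && lba + 0 <= $3 + 0) {
                print "partition " $1
                found = 1
            }
        }
        # No row matched: the sector is outside every listed partition.
        END { if (!found) print "unallocated" }
    '
}

# Example usage (needs root; not run here):
#   gdisk -l /dev/sda | partition_for_sector 88
```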

----------

## Fog_Watch

This is what I then did:

Panic,

Get two new drives and get those old ones with the dodgy sector 88 off my server,

Think,

Use Samsung's estools.  This confirmed the problem with sector 88.  I then used estools to do a 12-hour low-level format, after which estools no longer reported a problem with sector 88.

smartctl concurs: Before; after.

Who knows what caused the original sector 88 problem.  And why were Reallocated_Event_Count and Current_Pending_Sector both 0?  I would have thought that if there was a problem there would have been some reallocations going on.

Anyway, the estools reformat seems to have done the trick.
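
For anyone wanting to watch the two counters mentioned above before and after such a reformat, something like the following could pull their raw values out of the SMART attribute table. It is a sketch only: the function name realloc_counters is invented for illustration, and it assumes the usual ten-column smartctl -A layout with the raw value in the last column.

```shell
#!/bin/sh
# Hypothetical helper: print the raw values of Reallocated_Event_Count and
# Current_Pending_Sector from `smartctl -A` output read on stdin.
realloc_counters() {
    awk '
        # Assumed columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        $2 == "Reallocated_Event_Count" || $2 == "Current_Pending_Sector" {
            print $2 "=" $10
        }
    '
}

# Example usage (needs root and smartmontools; not run here):
#   smartctl -A /dev/sda | realloc_counters
```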

----------

## salahx

Reallocation only occurs on writes. To force a drive to reallocate a bad sector, use hdparm's --write-sector option on the problem sector, with great caution: the option is described as VERY DANGEROUS in the manpage for a reason, as any data in the sector will be lost.
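
As a rough illustration of that sequence, the sketch below only prints the hdparm commands instead of running them, so nothing is written until you copy a line out by hand. The function name print_sector_rewrite_cmds and the /dev/sdX placeholder are hypothetical; --read-sector, --write-sector and --yes-i-know-what-i-am-doing are the hdparm options involved.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, but do NOT execute, the hdparm commands
# for inspecting and then overwriting a single sector.
print_sector_rewrite_cmds() {
    dev=$1
    lba=$2
    # Refuse anything that is not a plain non-negative integer.
    case $lba in
        ''|*[!0-9]*) echo "invalid sector: $lba" >&2; return 1 ;;
    esac
    # Reading first shows whether the sector is really unreadable.
    echo "hdparm --read-sector $lba $dev"
    # hdparm refuses --write-sector without the confirmation flag; the write
    # zeroes the sector, destroying whatever data it held.
    echo "hdparm --yes-i-know-what-i-am-doing --write-sector $lba $dev"
}

# Example usage (prints two command lines, touches no disk):
#   print_sector_rewrite_cmds /dev/sdX 88
```

On a RAID 1 member, double-check the device name and the sector number before running the printed write command, since it goes straight to the disk underneath md.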

----------

