# RAID block consistency check: which block is correct?

## halfgaar

Hi,

The Gentoo wiki page on software raid says this:

 *Quote:*   

> Normally, RAID passively detects bad blocks. If a read error occurs, the data is reconstructed from the rest of the array, and the bad block is rewritten. If the block can not be rewritten, the defective disk is kicked out of the active array.
> 
> Once the defective drive is replaced, reconstruction will cause all blocks of the remaining drives to be read. If this process runs across a previously undetected bad block on the remaining drives, another drive will be marked as failed, making RAID5 unusable. The larger the disks, the higher the odds that passive bad block detection will be inadequate. Therefore, with today's large disks it is important to actively perform data scrubbing on your array.
> 
> With a modern (>=2.6.16) kernel, this command will initiate a data consistency and bad block check, reading all blocks, checking them for consistency, and attempting to rewrite inconsistent blocks and bad blocks. 

 

In a RAID1, how does it know which of the two blocks is the right one when it finds a mismatch?

----------

## John R. Graham

RAID 1 doesn't compare two good blocks, nor does any RAID level.  RAID 1 normally uses the block from the first disk and, in the event of a hardware disk read error, takes data from the second.  (Note that read load balancing across the mirrors makes this a little more complicated, but it's still a good conceptual model.)  Since read errors can sometimes be corrected by merely re-writing, that's what RAID 1 tries next, using data from the good block on the mirrored drive.  If the re-write fails, then the disk is marked bad and "...kicked out of the active array."  Clear?    :Smile: 

- John

----------

## halfgaar

I'm not talking about normal operations, I'm talking about when you do 

```
echo check >> /sys/block/mdX/md/sync_action
```

It will do an integrity check and report mismatches, which are counted in the file /sys/block/mdX/md/mismatch_cnt.
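For illustration, here is a minimal POSIX shell sketch of that check-then-read cycle. It assumes it runs as root and is handed the array's md sysfs directory (`/sys/block/md0/md` below, where `md0` is a hypothetical name); the function names are made up for this example. The kernel's md documentation describes `mismatch_cnt` as a count of sectors, not blocks. On RAID1 the check seems to count whole resync windows (commonly 64 KiB, i.e. 128 sectors), so a count of 128 could well be a single differing region.

```shell
# Sketch only: function names are hypothetical, not part of mdadm or the kernel.
# $1 is the array's md sysfs directory, e.g. /sys/block/md0/md (md0 is hypothetical).

# Print mismatch_cnt and the amount of data it covers (rounded down to KiB).
# md reports mismatch_cnt in 512-byte sectors.
md_mismatch_report() {
    md_dir=$1
    cnt=$(cat "$md_dir/mismatch_cnt") || return 1
    printf '%s sectors (%s KiB) mismatched\n' "$cnt" "$((cnt * 512 / 1024))"
}

# Start a read-only consistency check and wait until md reports the array idle.
md_check_and_wait() {
    md_dir=$1
    echo check > "$md_dir/sync_action" || return 1
    # sync_action reads back "check" while the scrub runs and "idle" afterwards
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep 10
    done
    md_mismatch_report "$md_dir"
}
```

Usage: `md_check_and_wait /sys/block/md0/md`. The check is meant to run on a live array, though on large disks it can take hours.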

The Debian docs say this:

 *Quote:*   

> If, however, while reading, a read error occurs, the check will trigger the normal response to read errors which is to generate the 'correct' data and try to write that out - so it is possible that a 'check' will trigger a write.
> ...

 

Hence my question: where will it read the "correct" data from when a RAID1 array has inconsistent blocks? I'm not talking about a situation where there are damaged blocks, merely mismatched ones.

I have an array with 128 mismatches and I'm curious whether that has consequences. I know that grub's savedefault can cause this, but this array isn't on the partition that holds grub's stage2.

----------

## aidanjt

RAID1 provides no guarantees about data integrity.  If there's a block mismatch, the array is flagged dirty and MD will do what John mentioned to attempt to clean the array.  MD isn't concerned with whether the data is correct (as in what you actually wrote), only with the array being in sync.

----------

## halfgaar

Well, I have an array with 128 mismatched blocks which is not marked as broken. So what does that mean?

----------

## eccerr0r

If you have mismatched blocks in RAID1, likely it means

1- you shut down uncleanly (crash/reboot); this is most likely the culprit.  mdraid never got a chance to update both disks.

2- you mounted one disk alone and updated it without updating the other disk (oops, user error)

3- a bug in the mdraid software

4- the hard drive returned wrong data (unlikely, thanks to the drive's ECC checking)

It's up to you to figure out which disk holds the correct copy of the mismatched RAID1 blocks and copy it to the other disk.  If you just want md to choose one or the other, then just sync them... which means you may be copying corrupt data to the other disk...
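The "just sync them" option corresponds to md's `repair` action, written to the same `sync_action` file used for `check`. Below is a hedged sketch assuming it runs as root and receives the array's md sysfs directory (e.g. `/sys/block/md0/md`, with `md0` hypothetical); the function name is made up. As far as I know, on RAID1 md simply rewrites the other mirrors from one copy it picks, which is exactly the "may be copying corrupt data" caveat.

```shell
# Sketch only: md_repair is a hypothetical helper, not an mdadm command.
# $1 is the array's md sysfs directory, e.g. /sys/block/md0/md (md0 is hypothetical).
# On RAID1, md rewrites the other mirrors from one copy it picks (in practice,
# as far as I know, the first mirror that reads successfully). It does NOT
# decide which copy holds the data you actually wrote.
md_repair() {
    md_dir=$1
    # Refuse to start while another sync action (check, resync, ...) is running
    current=$(cat "$md_dir/sync_action") || return 1
    if [ "$current" != "idle" ]; then
        echo "array busy: $current" >&2
        return 1
    fi
    echo repair > "$md_dir/sync_action"
}
```

After the repair finishes (`sync_action` reads `idle` again), another `check` should normally report a `mismatch_cnt` of 0. That only means the mirrors now agree, not that the surviving copy was the good one.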

----------

## halfgaar

But the thing is, it's not marked as failed, even though I forced a check. Apparently, the driver doesn't consider it a problem. All your hypotheses are protected against by the driver, which would give a notice when you try to activate the array.

----------

## eccerr0r

 *halfgaar wrote:*   

> But the thing is, it's not marked as failed, even though I forced a check. Apparently, the driver doesn't consider it a problem. All your hypotheses are protected against by the driver, which would give a notice when you try to activate the array.

 

The thing is, it's _NOT_ a hardware failure.  md only registers a failure when it detects a bad disk: when a bad sector appears or a disk does not show up or cannot be read.

Inconsistency is usually user failure.  The user induced the error and the mdraid check feature found the difference.  Two inconsistent disks are not a fatal error to the md driver; it just means it's now up to you to find out which one holds the 'correct' data.

Contrary to what you seem to assume about the mdraid tools, NONE of the four possibilities are covered by mdraid.  md was meant to cover hardware failures, not software or user errors.  Only the first scenario (unclean shutdown) should have been detected by md, as a nonfatal condition, but most people would ignore the warning because md WILL usually assemble the array anyway, simply warning that the two disks may differ due to the unclean shutdown.  (I know this because I have crashed my md RAID5 multiple times and it assembled it anyway, giving a rightful warning.)  The others will likely not be detected because they may not touch the md superblock.

I have been blessed in that, as far as I know, I have not seen corruption caused by an unclean shutdown of my RAID5 array.  I believe the md driver does what it can to keep inconsistencies no worse than if a single disk had suffered a crash, but not all cases are covered (caches).

The direct answer to the original question is... it doesn't know and never will.  Only the user can tell what the 'correct' data should be when there is a consistency issue.  Even with a 3-way or wider RAID1, best-of voting is insufficient to tell what the correct data is.  It will just pick one copy (even if it's wrong) and write it to all mirror disks.  Usually it picks the "master" disk, which is typically "correct", but there's no guarantee that this is the case.
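If you do want to make that call yourself, the first step is finding where the mirrors disagree. Below is a rough sketch using only `cmp` and `awk`. It assumes the array is stopped, that the two arguments are the member devices (or image copies of them), and that 64 KiB regions are a reasonable granularity; the function name is hypothetical. Regions covering md's own metadata will normally differ by design (each member's superblock records its own slot), so focus on differences inside the data area, whose start `mdadm --examine` reports.

```shell
# Sketch only: mirror_diff_regions is a hypothetical helper.
# $1, $2: the two RAID1 members or image files of them (array stopped!)
# $3: region size in bytes, default 65536 (64 KiB)
# Prints the index of every region in which the two copies differ.
mirror_diff_regions() {
    a=$1
    b=$2
    region=${3:-65536}
    # cmp -l prints "<1-based byte offset> <octal byte in a> <octal byte in b>"
    # for each differing byte; map each offset to its 0-based region index
    # and collapse consecutive duplicates (cmp reports offsets in order).
    cmp -l "$a" "$b" 2>/dev/null |
        awk -v r="$region" '{ print int(($1 - 1) / r) }' |
        uniq
}
```

Each printed index N covers bytes N*65536 through N*65536+65535 of the member. Deciding which copy of such a region is correct (for example, by working out which file it belongs to) is still up to you.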

----------

