# random read errors with mdraid??? [SOLVED]

## dobbs

Alright, I need a second opinion from a kernel guru.

I dd'ed my windows partition (115GiB) into a file, and then this happened:

```
dobbs@bender ~ $ sudo !-1

sudo cmp -l /dev/sda4 /mnt/storage/tempstore/windows.part 

    51796599  40   0

 16039693943 274 234

 29991661943 201 241

 66805234167 164 124

 69818277623 115 155

 73482455671 202 242

 94468409719 377 337

 95529264119 260 220

 96286320375  17  57

103653245047   6  46

103809902583 173 133

105303325815  40   0

106056683383 163 123

107211539063 112 152

109386836727 215 255

109386836855 104 144

117111876599 210 250

120390354551 312 352

121743028727 365 325

dobbs@bender ~ $ sudo cmp -l /dev/sda4 /mnt/storage/tempstore/windows.part 

Password: 

   390982263 144 104

  9640181367  54  14

  9640181623 262 222

 29991661943 201 241

 31463156343 256 216

 37555086327 346 306

 56837503223  51  11

 69818277623 115 155

 73482455671 202 242

 80509345527 175 135

 80509345655 162 122

 94666073719 343 303

101261748087 151 111

103393197431 344 304

103454269047 251 211

103454269175  56  16

103653245047   6  46

105992555639 150 110

107211539063 112 152

109386836727 215 255

109386836855 104 144

109549263351  56  16

109549263479 363 323

110002149239  52  12

114666473079 167 127

114671000439 171 131

117111876599 210 250

117340243959 376 336

117340244215 276 236

120390354551 312 352

dobbs@bender ~ $ 
```

What's worrisome is that some of the errors repeat (write errors?) while others don't (read errors?).  Worse, none of the operations printed any error messages to dmesg, /var/log/messages, or stderr.  The dd operation reported success with no errors, and the cmp runs reported no read errors either.  Shouldn't the block layer catch checksum mismatches in a case like this?
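A quick sanity check on the read path itself (my own sketch, not something I ran yet): hash the same source twice before trusting either copy.  If two reads of untouched data disagree, the reads themselves are unstable and no comparison against the copy means anything.  A small temp file stands in for the real partition here so the demo is fast; on the real hardware you'd hash /dev/sda4 itself.

```shell
# Sketch: read the same data twice and compare hashes.  A difference here
# means the reads themselves are unstable.  A temp file stands in for the
# real partition so this runs quickly.
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=1M count=4 2>/dev/null
h1=$(md5sum < "$f")
h2=$(md5sum < "$f")
[ "$h1" = "$h2" ] && echo "reads stable" || echo "reads UNSTABLE"
rm -f "$f"
```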

The destination, /mnt/storage/, is a reiser3 partition on a RAID 5 md array.  smartctl doesn't show any hardware errors on the underlying devices, and /proc/mdstat shows a good array.  fsck says the filesystem is fine, though fsck wouldn't notice single-byte corruption inside file data anyway.  Kernel is gentoo-sources-3.2.1-r2.

I'm not worried about /dev/sda -- it has yet to exhibit any other symptoms, it's a young drive, I'm not writing anything to it, and it's the less complex of the two setups.  That leaves the RAID-5 array.  It's an old array I set up years ago, but the underlying devices don't report any errors.

So is mdraid just not reliable?  It looks like I'm getting both read AND write errors from that layer.  The lack of error detection seems absurd.  And I just realized every one of those errors is a difference of octal 40...
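For reference, each `cmp -l` line is a 1-based byte offset followed by the two differing bytes in octal, so XORing a pair shows exactly which bit flipped:

```shell
# cmp -l prints: <1-based byte offset> <octal byte in file1> <octal byte in file2>
# XOR one sample pair from the listing above to see the flipped bit:
a=$((8#377)); b=$((8#337))
printf 'xor=%o\n' $((a ^ b))   # octal 40 = decimal 32: a single flipped bit
```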

Did I just hit a bug in the raid456 module or what?!

*Last edited by dobbs on Fri Apr 06, 2012 6:43 am; edited 1 time in total*

----------

## NeddySeagoon

dobbs,

You can't usefully dd anything from a mounted filesystem because you will have open files.  If that's what you did, throw away the image and start again.

With read errors on a single drive in a raid5 array, you won't notice. Any n-1 from n drives works.

If you suspect the raid array do 

```
echo "check" > /sys/block/mdX/md/sync_action
```

where X is the md node you want to check.
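A sketch (my wording, not part of the original advice) of reading the result afterward: while the check runs, sync_action reads "check", and when it goes back to "idle" the mismatch count is in mismatch_cnt next to it.  This loop reports every md array present and is a no-op on a box without any.

```shell
# Report the check state and mismatch count for each md array, if any exist.
for md in /sys/block/md*/md; do
    [ -d "$md" ] || continue                       # no md arrays: skip
    printf '%s: action=%s mismatch_cnt=%s\n' "${md%/md}" \
        "$(cat "$md/sync_action")" "$(cat "$md/mismatch_cnt")"
done
echo "scan complete"
```

A non-zero mismatch_cnt means parity disagreed with the data somewhere during the check.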

A real, totally failed read will put

```
[231200.568383] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[231200.568389] ata6.00: irq_stat 0x40000001

[231200.568402] ata6.00: cmd 25/00:08:e8:99:04/00:00:c0:00:00/e0 tag 0 dma 4096 in

[231200.568405]          res 51/40:08:e8:99:04/00:00:c0:00:00/e0 Emask 0x9 (media error)

[231200.575646] ata6.00: configured for UDMA/133

[231200.575666] ata6: EH complete
```

or something like it in dmesg as the kernel resets the interface.  If the drive has several goes at the read, you may get something like this from smartctl:

```
SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   140   140   051    Pre-fail  Always       -       18654

  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1166

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       104

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6409

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       103

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       44

193 Load_Cycle_Count        0x0032   102   102   000    Old_age   Always       -       295050

194 Temperature_Celsius     0x0022   126   110   000    Old_age   Always       -       24

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       263

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       63

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   188   166   000    Old_age   Offline      -       3355

```

The meaning of the RAW numbers varies from vendor to vendor, so check yours.  The important numbers here are

```
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       263
```

so the drive has not reallocated any sectors yet, but it's considering reallocating 263.  The dmesg and smartctl -a output above are real, from a dead drive I'm about to get RMAed.  I'm having ddrescue work hard on it first.
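Those two attributes are easy to watch mechanically.  A sketch that pulls them out of saved `smartctl -A` output with awk (the sample lines are the ones quoted above):

```shell
# Extract attribute name and raw value for the two reallocation attributes
# from a captured SMART attribute table.
awk '$2 ~ /Reallocated_Event_Count|Current_Pending_Sector/ { print $2, $NF }' <<'EOF'
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       263
EOF
```

On a live system you would feed it `smartctl -A /dev/sdX` instead of the here-document.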

Write errors would cause an immediate Reallocated_Event; if the drive had no spare sectors left, you would get an I/O error instead.

----------

## dobbs

 *NeddySeagoon wrote:*   

> You can't usefully dd anything from a mounted filesystem because you will have open files.  If thats what you did throw away the image and start afgain.

 

Right.  I should have explicitly stated that both /dev/sda4 and the windows.part file were never mounted during this debacle.  Sorry about that.  It's what I meant when I said I wasn't writing anything to /dev/sda (which is blatantly false anyway; I'm just not writing to /dev/sda4).

 *NeddySeagoon wrote:*   

> With read errors on a single drive in a raid5 array, you won't notice. Any n-1 from n drives works.

 

Which is why I'm confused and frightened.  The system obviously isn't detecting any "errors"; bit 6 (octal 40) just happens to get flipped occasionally.  Given that the partition and the file should be inert, this is an impossible[1] situation.  Specifically, this is a situation I hoped to avoid by constructing the RAID 5 array, and now it looks (to me) like the raid layer is introducing these errors.

1. This event exceeds my improbability threshold.

As for smartctl, one drive has one reallocated sector, but it's had it for over a year (I've been keeping an eye on it for a while).  Zero pending reallocations across all drives.  I don't believe the underlying drives are the source of the corruption.

What brand was your drive there?

Addendum: The array check found zero mismatches.  I will re-copy the partition yet again to try to reproduce the problem.

----------

## NeddySeagoon

dobbs,

My drive is a WD20EARS.  That's a green 2TB drive.  I have five in raid5 and two have died over the last few weeks.

The first one was obvious - mega iowaits.  When I replaced that, the resync failed because another drive (the one I showed above) has 6 bad blocks.

Bit flipping sounds like dud RAM.  Data read from the HDD into its RAM is CRC-protected.  Across the raid set, it's parity-protected.

If your resync did not produce any errors, your data is self-consistent within the raid set.  That does not mean it's correct, just that all the members of the raid agree on what it is.  Those two things taken together rule out bit flipping on the disks.

If your drives are SATA, the data interface is serial; that only one bit gets flipped during data transmission over a serial link is well beyond my incredulity threshold.  That only leaves the motherboard and its component parts.

Time to boot into memtest86+ and run a few cycles.

Errors found in memtest86 do not always point to RAM.  It's only likely to be RAM if you get the same error at the same address every time.

----------

## dobbs

Sorry to hear about losing the drives, Neddy.  I've been wary of drive reliability since we passed the 500GB mark.  I think that's when "perpendicular recording" became common.  Possibly just me being paranoid, though.  I do need to replace these three drives for various reasons: they're only 320GB, two of them are PATA, they have more than 45,000 operating hours...  Like I said, this array is old. :)  Unfortunately, I don't know what to purchase anymore.

 *Quote:*   

> Bit flipping sounds like dud RAM.  Data read from the HDD into its RAM is CRC-protected.  Across the raid set, it's parity-protected.
> 
> If your resync did not produce any errors, your data is self-consistent within the raid set.  That does not mean it's correct, just that all the members of the raid agree on what it is.  Those two things taken together rule out bit flipping on the disks.

 

Yeah, that's why I was considering an mdraid software bug.  I was grasping at straws.  A possible RAM issue didn't occur to me; I would have expected other system stability issues.  I'm guessing the faulty region of RAM lies outside kernel memory, and the data buffers land in it under heavy load.  Does that make sense, or am I way off?

I did eliminate mdraid as the culprit, though.  Freed up another drive and duplicated the partition:

```
dobbs@bender ~ $ sudo fdisk -l /dev/sd[ab] | grep -E "sda4|sdb1"

/dev/sda4   *   238774095   477173440   119199673    7  HPFS/NTFS/exFAT

/dev/sdb1            2048   238401393   119199673    7  HPFS/NTFS/exFAT

dobbs@bender ~ $ sudo dd if=/dev/sda4 of=/dev/sdb1 bs=32M

3637+1 records in

3637+1 records out

122060465152 bytes (122 GB) copied, 2242.62 s, 54.4 MB/s

dobbs@bender ~ $ sudo cmp -l /dev/sda4 /dev/sdb1

Password:

          253594999 377 337

          302277623  47   7

          388563063  40   0

          457392375 252 212

          617962103 165 125

          710643831 156 116

          781120759 253 213

          823862263 243 203

          866853367 154 114

         1238579191 141 101

         1238581623  40   0

         1312984567 242 202

         1313322999 270 230

         1482857335 170 130

         1977688311  40   0

         2081347575 376 336

         2120394615  40   0

         2161162231  43   3

         2212050039 173 133

         2263106423  42   2

         2501622135 277 237

         2534076919 355 315

         2565879927 375 335

         2747989111  40   0

         2837622903  41   1

         3005169271  40   0

         3063135095 370 330

         3083515127 163 123

...and lots more

```

sda and sdb are both SATA; my RAID 5 array spans sd[def].  Same issue, same bit, getting worse...  I don't know the significance, but the byte offset mod 128 is always 119.
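The mod-128 pattern is easy to confirm straight from the cmp output; a sketch using the first two offsets from the listing above:

```shell
# Compute offset mod 128 for each cmp -l line (field 1 is the byte offset).
awk '{ printf "%d mod 128 = %d\n", $1, $1 % 128 }' <<'EOF'
253594999 377 337
302277623  47   7
EOF
# both print: ... mod 128 = 119
```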

I'm trying to read one of these errors with hdparm, but neither the left nor the right byte value reported by cmp appeared at the indicated byte offset.  It's possible my math is wrong, but I've checked it three times now.
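One off-by-one worth double-checking in that math: `cmp -l` offsets are 1-based, and `hdparm --read-sector` wants an absolute LBA, not a partition-relative one.  A worked example using the sda4 start sector from the fdisk output above and the first mismatch offset:

```shell
# Convert a cmp -l byte offset (relative to the partition) to an absolute LBA.
start=238774095                      # first sector of sda4 (512-byte sectors, per fdisk)
off=253594999                        # 1-based byte offset reported by cmp -l
lba=$(( start + (off - 1) / 512 ))
echo "$lba"                          # → 239269397
# then: sudo hdparm --read-sector $lba /dev/sda
```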

I will memtest the system while I leave town for the weekend.  Thanks for the insight, Neddy!

Update: After 18 completed passes, memtest (memtest86+ 4.2) showed zero errors.  I'm back to not knowing where the issue lies.  Regardless, I do have more RAM on order.  We'll see if replacing the RAM solves it.

----------

## dobbs

Yep.  Replacing the RAM resolved the issue.  Marking solved.

----------

## NeddySeagoon

dobbs,

I bet putting your old RAM back in would work too.  That's called 'wiping the contacts'.  It reduces the contact resistance between the plugged-in parts and is usually good for 12 to 18 months.

Oh, I lost 3 DVDs at most, as I have two one-block errors and one four-block error, all in the area where my DVD rips are stored.

The raid5 is back and WD replaced 2 nine month old drives under warranty.

----------

## kimmie

Neddy,

That load cycle count in your smartctl output looks a little high. Do you know about the nasty head-unloading behaviour of WD20EARS under linux, and how to cure it with WDIDLE.exe? I have some of these drives in RAID5 too... they needed to be spanked before they kept their heads in the right place.

Anyway, if you can't find this utility and you need it, drop me a PM.

----------

## NeddySeagoon

kimmie,

I'm aware of the head-unloading-every-eight-seconds issue now.  I wasn't when I set up the raid.

I understand that WDIDLE.exe needs to be run under Windows, and Windows (or even getting those drives near a box with a GUI) is out of the question.

I'm using 

```
hdparm -S 252 /dev/...
```

which sets the idle timeout to an hour, but I don't think it's the same thing.

hdparm has an option to set the idle3 timeout, but it's not widely tested, so I have not used it.

----------

## kimmie

Just needs DOS... I had to make a FreeDOS boot floppy and boot that. I'm guessing you could convince FreeDOS to redirect console to serial if you cared enough.

----------

## NeddySeagoon

kimmie,

The drives are in an HP Microserver.  There is no floppy interface and no PATA interface.

It's USB or (e)SATA.

Hmm - I wonder if I could remaster a SystemRescueCD image to put on a USB pen drive, so WDIDLE.exe (and FreeDOS) was one of its image tools.

I can at least test that the floppy boots on another box before I make the ISO

----------

## dobbs

 *NeddySeagoon wrote:*   

> I bet putting your old RAM back in would work too.  That's called 'wiping the contacts'.  It reduces the contact resistance between the plugged-in parts and is usually good for 12 to 18 months.

 

I got around to trying that.  While the problem isn't as severe, it's still there:

```
ubuntu@ubuntu:/mnt$ sudo cmp -l storage/tempstore/windows.part /dev/sdc4

 55485640375 370 330

 58497711927 120 160

 93697501719 116 156

ubuntu@ubuntu:/mnt$ 
```

Still in the sixth bit, but the offsets mod 128 are now 55 and 23 instead of always 119.  Offset mod 256 is 23 for all three, but the sample set is too small.  Different kernel (LiveUSB in this case), memory capacity, and physical arrangement, so I'm not going to explore that.

The obvious explanation for fewer discrepancies is that the system has twice the RAM, so the bad bit(s?) isn't used as frequently.  Also, the whole RAM subsystem is operating slightly slower.  The "bad" RAM can run at 5ns latency (CAS 4 at 800MHz), while the new RAM needs at least 5.5ns latency (CAS 6 at 1067MHz).  My motherboard actually runs the RAM at 800MHz and CAS 6 (7.5ns) when both sets are installed, so they're not really operating at their peak.  Might help, might not; that's all conjecture to me.

On a sadder note, the original boot disk died abruptly shortly after configuring the boot array.  I don't know how or why it died; the SMART status was always clean while I investigated the RAM problem.  Now the system won't POST with the drive connected (tried different SATA cables, ports, basic debug procedure).  Unfortunately, I was absent when it happened.  Coincidentally, it's a WD3200KS with a manufacture date of "01 APR 2006", and it died the night of 01 APR 2012.  I kinda want to call Western Digital and ask them if it's just a prank...

----------

