# strange issues with raid6 (file corruption or kernel oops)

## matt2kjones

Hello,

I have a raid6 array with a damaged hard drive.  However, when a write error occurs on the array, it doesn't fail the hard drive; instead, one of two things happens:

If I'm using kernel 3.18.12, it logs I/O errors to dmesg and the file being written to the array ends up corrupt.  The array does not fail the disk, as it should, so I end up with tons of corrupt files.

If I'm using any 4.x kernel (I have tried both 4.0.9 and 4.1.12), then when a write error occurs I get a kernel oops logged to dmesg and all I/O to the array hangs.  I have to forcefully reboot the server, because a ton of processes get stuck in state D, and the discs are never marked as failed.

Here is the output from dmesg of a write error when it occurs on kernel version 3.18.12:

```
[  172.679073] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1052672 starting block 5172953088)

[  172.679076] Buffer I/O error on device md4, logical block 5172953088

[  172.679078] Buffer I/O error on device md4, logical block 5172953089

[  172.679078] Buffer I/O error on device md4, logical block 5172953090

[  172.679079] Buffer I/O error on device md4, logical block 5172953091

[  172.679080] Buffer I/O error on device md4, logical block 5172953092

[  172.679081] Buffer I/O error on device md4, logical block 5172953093

[  172.679082] Buffer I/O error on device md4, logical block 5172953094

[  172.679082] Buffer I/O error on device md4, logical block 5172953095

[  172.679083] Buffer I/O error on device md4, logical block 5172953096

[  172.679084] Buffer I/O error on device md4, logical block 5172953097

[  172.983977] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1576960 starting block 5172953216)

[  173.489071] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 2101248 starting block 5172953344)

[  174.330710] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 2625536 starting block 5172953472)

[  175.123257] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 3149824 starting block 5172953600)

[  175.406390] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 3674112 starting block 5172953728)

[  175.608958] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 4198400 starting block 5172953856)

[  175.968224] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 4722688 starting block 5172953984)

[  176.130072] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 5246976 starting block 5172954112)

[  176.215623] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 6819840 starting block 5172954240)

[  177.925267] EXT4-fs warning: 6 callbacks suppressed

[  177.925270] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 1052672 starting block 5172955136)

[  177.925271] buffer_io_error: 2038 callbacks suppressed

[  177.925272] Buffer I/O error on device md4, logical block 5172955136

[  177.925274] Buffer I/O error on device md4, logical block 5172955137

[  177.925275] Buffer I/O error on device md4, logical block 5172955138

[  177.925276] Buffer I/O error on device md4, logical block 5172955139

[  177.925276] Buffer I/O error on device md4, logical block 5172955140

[  177.925277] Buffer I/O error on device md4, logical block 5172955141

[  177.925278] Buffer I/O error on device md4, logical block 5172955142

[  177.925279] Buffer I/O error on device md4, logical block 5172955143

[  177.925280] Buffer I/O error on device md4, logical block 5172955144

[  177.925280] Buffer I/O error on device md4, logical block 5172955145

[  178.642566] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 1576960 starting block 5172955264)

[  179.078914] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 2101248 starting block 5172955392)

[  179.976324] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 2625536 starting block 5172955520)

[  180.782833] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 3149824 starting block 5172955648)

[  181.333570] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 3674112 starting block 5172955776)

[  181.820475] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 4198400 starting block 5172955904)

[  183.171425] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 33554432 size 4722688 starting block 5172956032)

[  183.171428] buffer_io_error: 886 callbacks suppressed

[  183.171429] Buffer I/O error on device md4, logical block 5172956032

[  183.171431] Buffer I/O error on device md4, logical block 5172956033

[  183.171432] Buffer I/O error on device md4, logical block 5172956034

[  183.171433] Buffer I/O error on device md4, logical block 5172956035

[  183.171434] Buffer I/O error on device md4, logical block 5172956036

[  183.171435] Buffer I/O error on device md4, logical block 5172956037

[  183.171436] Buffer I/O error on device md4, logical block 5172956038

[  183.171436] Buffer I/O error on device md4, logical block 5172956039

[  183.171437] Buffer I/O error on device md4, logical block 5172956040

[  183.171438] Buffer I/O error on device md4, logical block 5172956041

```
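Every warning in the log above refers to the same inode, with the starting block advancing on each failed write. As a rough sketch for summarising such a log, the following groups the failed writes by device and inode. The regular expression is written only against the 3.18.12 message format shown here, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the ext4_end_bio warning as it appears in the 3.18.12 log above.
EXT4_WARNING = re.compile(
    r"EXT4-fs warning \(device (?P<dev>\w+)\): ext4_end_bio:\d+: "
    r"I/O error (?P<err>-?\d+) writing to inode (?P<inode>\d+) "
    r"\(offset (?P<offset>\d+) size (?P<size>\d+) starting block (?P<block>\d+)\)"
)


def summarize_ext4_write_errors(dmesg_text):
    """Group failed writes by (device, inode) and list their starting blocks."""
    failures = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = EXT4_WARNING.search(line)
        if match:
            key = (match["dev"], int(match["inode"]))
            failures[key].append(int(match["block"]))
    return dict(failures)


# Three lines copied from the log above; the "Buffer I/O error" line is ignored.
sample = """\
[  172.679073] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1052672 starting block 5172953088)
[  172.679076] Buffer I/O error on device md4, logical block 5172953088
[  172.983977] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 361872033 (offset 25165824 size 1576960 starting block 5172953216)
"""

print(summarize_ext4_write_errors(sample))
```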

Here is sample output from dmesg when a write error occurs on version 4.0.9 or 4.1.12:

```

[  158.138253] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120

[  158.138391] IP: [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]

[  158.138482] PGD 24ff59067 PUD 24fe43067 PMD 0

[  158.138646] Oops: 0000 [#1] SMP

[  158.138758] Modules linked in: ipv6 binfmt_misc joydev x86_pkg_temp_thermal coretemp kvm_intel kvm microcode pcspkr video i2c_i801 thermal acpi_cpufreq fan battery rtc_cmos backlight processor thermal_sys xhci_pci button xts gf128mul aes_x86_64 cbc sha256_generic scsi_transport_iscsi multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony led_class hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd usbhid xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common megaraid_sas megaraid_mbox megaraid_mm megaraid sx8

[  158.141809]  DAC960 cciss mptsas mptfc scsi_transport_fc mptspi scsi_transport_spi mptscsih mptbase sg

[  158.142226] CPU: 0 PID: 2017 Comm: md4_raid6 Not tainted 4.1.12-gentoo #1

[  158.142272] Hardware name: Supermicro X10SAT/X10SAT, BIOS 2.0 04/21/2014

[  158.142323] task: ffff880254267050 ti: ffff880095afc000 task.ti: ffff880095afc000

[  158.142376] RIP: 0010:[<ffffffffa024cc1f>]  [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]

[  158.142493] RSP: 0018:ffff880095affc18  EFLAGS: 00010202

[  158.142554] RAX: 000000000000000d RBX: ffff880095cfac00 RCX: 0000000000000002

[  158.142617] RDX: 000000000000000d RSI: 0000000000000000 RDI: 0000000000001040

[  158.142682] RBP: ffff880095affcf8 R08: 0000000000000003 R09: 00000000cd920408

[  158.142745] R10: 000000000000000d R11: 0000000000000007 R12: 000000000000000d

[  158.142809] R13: 0000000000000000 R14: 000000000000000c R15: ffff8802161f2588

[  158.142873] FS:  0000000000000000(0000) GS:ffff88025ea00000(0000) knlGS:0000000000000000

[  158.142938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  158.143000] CR2: 0000000000000120 CR3: 0000000253ef4000 CR4: 00000000001406f0

[  158.143062] Stack:

[  158.143117]  0000000000000000 ffff880254267050 00000000000147c0 0000000000000000

[  158.143328]  ffff8802161f25d0 0000000effffffff ffff8802161f3670 ffff8802161f2ef0

[  158.143537]  0000000000000000 0000000000000000 0000000000000000 0000000c00000000

[  158.143747] Call Trace:

[  158.143805]  [<ffffffffa024dea3>] handle_active_stripes.isra.37+0x225/0x2aa [raid456]

[  158.143873]  [<ffffffffa024e31d>] raid5d+0x363/0x40d [raid456]

[  158.143937]  [<ffffffff814315bc>] ? schedule+0x6f/0x7e

[  158.143998]  [<ffffffff81372ae7>] md_thread+0x125/0x13b

[  158.144060]  [<ffffffff81061b00>] ? wait_woken+0x71/0x71

[  158.144122]  [<ffffffff813729c2>] ? md_start_sync+0xda/0xda

[  158.144185]  [<ffffffff81050609>] kthread+0xcd/0xd5

[  158.144244]  [<ffffffff8105053c>] ? kthread_create_on_node+0x16d/0x16d

[  158.144309]  [<ffffffff81434f92>] ret_from_fork+0x42/0x70

[  158.144370]  [<ffffffff8105053c>] ? kthread_create_on_node+0x16d/0x16d

[  158.144432] Code: 8c 0f d0 01 00 00 48 8b 49 10 80 e1 10 74 0d 49 8b 4f 48 80 e1 40 0f 84 c2 0f 00 00 31 c9 41 39 c8 7e 31 48 8b b4 cd 50 ff ff ff <48> 83 be 20 01 00 00 00 74 1a 48 8b be 38 01 00 00 40 80 e7 01

[  158.147700] RIP  [<ffffffffa024cc1f>] handle_stripe+0xdc0/0x1e1f [raid456]

[  158.147801]  RSP <ffff880095affc18>

[  158.147859] CR2: 0000000000000120

[  158.147916] ---[ end trace 536b72bd7c91f068 ]---

```

Things that I have tried:

- Disabled queuing on all drives
- Disabled write cache on all drives
- Built a minimal kernel that doesn't contain SATA drivers for any controller other than the one I'm using
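For reference, queuing is usually disabled per disk by writing 1 to its `queue_depth` attribute in sysfs. A minimal sketch, assuming the standard `/sys/block/<disk>/device/queue_depth` path; the function name and the `sysfs_root` parameter are hypothetical and exist only so the example can be dry-run against a temporary directory instead of real hardware.

```python
import tempfile
from pathlib import Path


def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Write `depth` to the queue_depth attribute of `disk` (e.g. "sdb").

    A depth of 1 effectively turns off command queuing for that disk.
    On a real system this needs root; `sysfs_root` is only for dry runs.
    """
    attr = Path(sysfs_root) / "block" / disk / "device" / "queue_depth"
    attr.write_text(f"{depth}\n")
    # Read the value back so the caller sees what is now stored there.
    return attr.read_text().strip()


# Dry run against a fake sysfs tree, so nothing on the host is touched.
with tempfile.TemporaryDirectory() as root:
    fake_device = Path(root) / "block" / "sdb" / "device"
    fake_device.mkdir(parents=True)
    (fake_device / "queue_depth").write_text("31\n")
    print(set_queue_depth("sdb", 1, sysfs_root=root))
```

On the real machine the call would be `set_queue_depth("sdb", 1)` for each member disk, run as root.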

The drives are connected to two LSI PCI-Express SAS controllers.  These controllers don't support hardware RAID; they are set up as JBOD.

Any ideas?  I can obviously replace the faulty disk to stop this from happening, but I don't want to do that until this is fixed, because if a drive fails in the future and I don't notice, I could end up with corrupt files.

My /proc/mdstat:

```
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]

md2 : active raid1 sdk2[0] sdl2[1]

      16760832 blocks super 1.2 [2/2] [UU]

md4 : active raid6 sdc1[0] sdp1[13] sdo1[12] sdn1[11] sdm1[10] sdj1[9] sdb1[8] sdg1[15] sdi1[6] sdh1[5] sda1[14] sdf1[3] sde1[2] sdd1[1]

      23440588800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]

      bitmap: 2/15 pages [8KB], 65536KB chunk

md1 : active raid1 sdk1[0] sdl1[1]

      1048512 blocks [2/2] [UU]

md3 : active raid1 sdk3[0] sdl3[1]

      1935556672 blocks super 1.2 [2/2] [UU]

      bitmap: 2/15 pages [8KB], 65536KB chunk

unused devices: <none>
```

My mdadm --detail /dev/md4:

```
/dev/md4:

        Version : 1.2

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)

   Raid Devices : 14

  Total Devices : 14

    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Nov  6 11:44:14 2015

          State : clean

 Active Devices : 14

Working Devices : 14

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 512K

           Name : livecd:4

           UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

         Events : 4122

    Number   Major   Minor   RaidDevice State

       0       8       33        0      active sync   /dev/sdc1

       1       8       49        1      active sync   /dev/sdd1

       2       8       65        2      active sync   /dev/sde1

       3       8       81        3      active sync   /dev/sdf1

      14       8        1        4      active sync   /dev/sda1

       5       8      113        5      active sync   /dev/sdh1

       6       8      129        6      active sync   /dev/sdi1

      15       8       97        7      active sync   /dev/sdg1

       8       8       17        8      active sync   /dev/sdb1

       9       8      145        9      active sync   /dev/sdj1

      10       8      193       10      active sync   /dev/sdm1

      11       8      209       11      active sync   /dev/sdn1

      12       8      225       12      active sync   /dev/sdo1

      13       8      241       13      active sync   /dev/sdp1
```

Thanks

----------

## frostschutz

Can you post mdadm --detail for /dev/md*, mdadm --examine for /dev/sd*, and tune2fs -l /dev/md4?

Your issue is strange because it actually reports the I/O error on md4. With a bad disk it should report the I/O error on /dev/sdx instead. It's a RAID with double redundancy, so a bad disk should not cause I/O errors on the md device until you have a triple failure.
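To make the double-redundancy point concrete: RAID6 gives up two members' worth of space to parity, which is why it can lose any two disks. A small sketch using the figures from the posted mdadm --detail output for md4 (14 devices, Used Dev Size 1953382400 KiB); the function names are made up for illustration.

```python
def raid6_usable_kib(raid_devices, used_dev_size_kib):
    """Usable capacity: two members' worth of space holds the P and Q parity."""
    return (raid_devices - 2) * used_dev_size_kib


def raid6_still_readable(failed_devices):
    """RAID6 can reconstruct data as long as at most two members are missing."""
    return failed_devices <= 2


# Figures from the posted `mdadm --detail /dev/md4` output.
print(raid6_usable_kib(14, 1953382400))   # 23440588800, matching "Array Size"
print(raid6_still_readable(1))            # True: one bad disk is survivable
print(raid6_still_readable(3))            # False: a triple failure is not
```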

So it may be your issue is something different after all, such as a filesystem that believes itself to be larger than the device it's on, or some other structural / logical problem rather than a hardware one.
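A quick way to test the "filesystem larger than the device" idea against the numbers already posted is to convert the array size into filesystem blocks and compare it with the failing block numbers from dmesg. This is only a sketch: the 4096-byte block size is an assumption (the usual ext4 default) and should be confirmed from the tune2fs -l output, and the function names are hypothetical.

```python
ARRAY_SIZE_KIB = 23440588800     # "Array Size" from mdadm --detail /dev/md4
FS_BLOCK_SIZE = 4096             # ASSUMPTION: ext4 default, check tune2fs -l
HIGHEST_FAILED_BLOCK = 5172956041  # largest logical block in the posted dmesg


def device_size_in_fs_blocks(array_size_kib, fs_block_size):
    """Number of filesystem blocks that fit on the md device."""
    return array_size_kib * 1024 // fs_block_size


def block_is_on_device(block, array_size_kib, fs_block_size):
    """True if the logical block number lies inside the md device."""
    return block < device_size_in_fs_blocks(array_size_kib, fs_block_size)


print(device_size_in_fs_blocks(ARRAY_SIZE_KIB, FS_BLOCK_SIZE))
print(block_is_on_device(HIGHEST_FAILED_BLOCK, ARRAY_SIZE_KIB, FS_BLOCK_SIZE))
```

Under that block-size assumption the failing blocks fall inside the device, which would point away from a size mismatch; the Block count reported by tune2fs -l is still the authoritative figure to compare.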

The kernel panic you should probably take to the raid mailing list (try the latest stable kernel first, in case it was already fixed somewhere).

----------

## matt2kjones

Thanks for the reply.

If I re-enable NCQ on the discs, the errors in the log are reported against the disk itself (/dev/sdb, for example), but since I set queue_depth to 1, they are reported against the raid device.

Here is all the info you requested:

mdadm --detail for /dev/md*:

```
/dev/md1:

        Version : 0.90

  Creation Time : Fri May 22 18:38:44 2015

     Raid Level : raid1

     Array Size : 1048512 (1024.11 MiB 1073.68 MB)

  Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)

   Raid Devices : 2

  Total Devices : 2

Preferred Minor : 1

    Persistence : Superblock is persistent

    Update Time : Fri Nov  6 12:30:49 2015

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

           UUID : 3021e831:c6f0b96b:cb201669:f728008a

         Events : 0.24

    Number   Major   Minor   RaidDevice State

       0       8      161        0      active sync   /dev/sdk1

       1       8      177        1      active sync   /dev/sdl1

/dev/md2:

        Version : 1.2

  Creation Time : Fri May 22 18:39:20 2015

     Raid Level : raid1

     Array Size : 16760832 (15.98 GiB 17.16 GB)

  Used Dev Size : 16760832 (15.98 GiB 17.16 GB)

   Raid Devices : 2

  Total Devices : 2

    Persistence : Superblock is persistent

    Update Time : Fri Oct 30 11:21:19 2015

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

           Name : livecd:2

           UUID : c841b565:9ce84038:33926cee:e78f907a

         Events : 17

    Number   Major   Minor   RaidDevice State

       0       8      162        0      active sync   /dev/sdk2

       1       8      178        1      active sync   /dev/sdl2

/dev/md3:

        Version : 1.2

  Creation Time : Fri May 22 18:41:13 2015

     Raid Level : raid1

     Array Size : 1935556672 (1845.89 GiB 1982.01 GB)

  Used Dev Size : 1935556672 (1845.89 GiB 1982.01 GB)

   Raid Devices : 2

  Total Devices : 2

    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Nov  6 12:33:29 2015

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

           Name : livecd:3

           UUID : cd185b80:08a5a8bf:fb3016b7:45891977

         Events : 5592

    Number   Major   Minor   RaidDevice State

       0       8      163        0      active sync   /dev/sdk3

       1       8      179        1      active sync   /dev/sdl3

/dev/md4:

        Version : 1.2

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)

   Raid Devices : 14

  Total Devices : 14

    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Nov  6 12:30:52 2015

          State : clean

 Active Devices : 14

Working Devices : 14

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 512K

           Name : livecd:4

           UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

         Events : 4128

    Number   Major   Minor   RaidDevice State

       0       8       33        0      active sync   /dev/sdc1

       1       8       49        1      active sync   /dev/sdd1

       2       8       65        2      active sync   /dev/sde1

       3       8       81        3      active sync   /dev/sdf1

      14       8        1        4      active sync   /dev/sda1

       5       8      113        5      active sync   /dev/sdh1

       6       8      129        6      active sync   /dev/sdi1

      15       8       97        7      active sync   /dev/sdg1

       8       8       17        8      active sync   /dev/sdb1

       9       8      145        9      active sync   /dev/sdj1

      10       8      193       10      active sync   /dev/sdm1

      11       8      209       11      active sync   /dev/sdn1

      12       8      225       12      active sync   /dev/sdo1

      13       8      241       13      active sync   /dev/sdp1
```

mdadm --examine for /dev/sd*:

```
/dev/sda:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sda1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x9

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 7e11b910:f5624a24:38ed2418:7e309fd0

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

       Checksum : 818939e0 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 4

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdb:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdb1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 23903a8f:b96bfb6e:04f35623:0c35676e

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : f3b5cb95 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 8

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdc:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdc1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 10911088:dadaf2c5:19a09b0a:91d51505

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : c1768935 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 0

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdd1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : aa6811d5:10d5679f:0c559636:ffceb688

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 97880968 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 1

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sde1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : dd21b987:7f344fee:05ba94e7:2e5e82c9

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 8ae11634 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 2

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdf:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdf1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 81fc58c1:bd831960:ffbbc225:efff592c

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : d750e3d0 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 3

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdg:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdg1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x9

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 60cdfa5c:246ba2a4:5368f531:b10580ac

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

       Checksum : 43d44e41 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 7

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdh:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdh1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 38403e44:8bf2a98f:cb3d98b7:10969838

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 27daae45 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 5

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdi:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdi1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x9

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : e8953848:8a01645f:de181342:376666ba

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

       Checksum : 7d7be37 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 6

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdj:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdj1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : c1f36d7f:aa57e669:d7597f75:07b62e66

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 28c402f4 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 9

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdk:

   MBR Magic : aa55

Partition[0] :      2097152 sectors at         2048 (type fd)

Partition[1] :     33554432 sectors at      2099200 (type fd)

Partition[2] :   3871375536 sectors at     35653632 (type fd)

/dev/sdk1:

          Magic : a92b4efc

        Version : 0.90.00

           UUID : 3021e831:c6f0b96b:cb201669:f728008a

  Creation Time : Fri May 22 18:38:44 2015

     Raid Level : raid1

  Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)

     Array Size : 1048512 (1024.11 MiB 1073.68 MB)

   Raid Devices : 2

  Total Devices : 2

Preferred Minor : 1

    Update Time : Fri Nov  6 12:30:49 2015

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

       Checksum : e321120 - correct

         Events : 24

      Number   Major   Minor   RaidDevice State

this     0       8      161        0      active sync   /dev/sdk1

   0     0       8      161        0      active sync   /dev/sdk1

   1     1       8      177        1      active sync   /dev/sdl1

/dev/sdk2:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : c841b565:9ce84038:33926cee:e78f907a

           Name : livecd:2

  Creation Time : Fri May 22 18:39:20 2015

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 33521664 (15.98 GiB 17.16 GB)

     Array Size : 16760832 (15.98 GiB 17.16 GB)

    Data Offset : 32768 sectors

   Super Offset : 8 sectors

   Unused Space : before=32680 sectors, after=0 sectors

          State : clean

    Device UUID : a28b0b55:8a027224:ba4b1ca0:b84661dc

    Update Time : Fri Oct 30 11:21:19 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : ed26c709 - correct

         Events : 17

   Device Role : Active device 0

   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdk3:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : cd185b80:08a5a8bf:fb3016b7:45891977

           Name : livecd:3

  Creation Time : Fri May 22 18:41:13 2015

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 3871113392 (1845.89 GiB 1982.01 GB)

     Array Size : 1935556672 (1845.89 GiB 1982.01 GB)

  Used Dev Size : 3871113344 (1845.89 GiB 1982.01 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=48 sectors

          State : clean

    Device UUID : 41b24115:a618293c:e4f20ee0:2af72266

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:34:31 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : c08f6a12 - correct

         Events : 5592

   Device Role : Active device 0

   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdl:

   MBR Magic : aa55

Partition[0] :      2097152 sectors at         2048 (type fd)

Partition[1] :     33554432 sectors at      2099200 (type fd)

Partition[2] :   3871375536 sectors at     35653632 (type fd)

/dev/sdl1:

          Magic : a92b4efc

        Version : 0.90.00

           UUID : 3021e831:c6f0b96b:cb201669:f728008a

  Creation Time : Fri May 22 18:38:44 2015

     Raid Level : raid1

  Used Dev Size : 1048512 (1024.11 MiB 1073.68 MB)

     Array Size : 1048512 (1024.11 MiB 1073.68 MB)

   Raid Devices : 2

  Total Devices : 2

Preferred Minor : 1

    Update Time : Fri Nov  6 12:30:49 2015

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

       Checksum : e321132 - correct

         Events : 24

      Number   Major   Minor   RaidDevice State

this     1       8      177        1      active sync   /dev/sdl1

   0     0       8      161        0      active sync   /dev/sdk1

   1     1       8      177        1      active sync   /dev/sdl1

/dev/sdl2:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : c841b565:9ce84038:33926cee:e78f907a

           Name : livecd:2

  Creation Time : Fri May 22 18:39:20 2015

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 33521664 (15.98 GiB 17.16 GB)

     Array Size : 16760832 (15.98 GiB 17.16 GB)

    Data Offset : 32768 sectors

   Super Offset : 8 sectors

   Unused Space : before=32680 sectors, after=0 sectors

          State : clean

    Device UUID : 6558b051:61dbd3fa:296798ee:2e82dcf0

    Update Time : Fri Oct 30 11:21:19 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 2324c38b - correct

         Events : 17

   Device Role : Active device 1

   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdl3:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : cd185b80:08a5a8bf:fb3016b7:45891977

           Name : livecd:3

  Creation Time : Fri May 22 18:41:13 2015

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 3871113392 (1845.89 GiB 1982.01 GB)

     Array Size : 1935556672 (1845.89 GiB 1982.01 GB)

  Used Dev Size : 3871113344 (1845.89 GiB 1982.01 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=48 sectors

          State : clean

    Device UUID : 93c52ef1:1fc77f86:a37016c3:8bbe6b63

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:34:31 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : c72370ff - correct

         Events : 5592

   Device Role : Active device 1

   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdm:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdm1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : f6fe2e35:6d4fdccf:bde20ad0:a21b7f9c

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : d556f46e - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 10

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdn:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdn1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : 4d95468f:26b94d0c:9fc8db13:7ab51494

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : a7467438 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 11

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdo:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdo1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x1

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : d3f2f2a7:ccb804fa:15b8dce3:25928566

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors

       Checksum : 501b9d88 - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 12

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdp:

   MBR Magic : aa55

Partition[0] :   3907027120 sectors at         2048 (type fd)

/dev/sdp1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x9

     Array UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64

           Name : livecd:4

  Creation Time : Thu May 21 09:36:16 2015

     Raid Level : raid6

   Raid Devices : 14

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)

     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)

  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)

    Data Offset : 262144 sectors

   Super Offset : 8 sectors

   Unused Space : before=262056 sectors, after=176 sectors

          State : clean

    Device UUID : c891d33f:47ac354c:ad47f2ea:832e7cd1

Internal Bitmap : 8 sectors from superblock

    Update Time : Fri Nov  6 12:30:52 2015

  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

       Checksum : ac39644e - correct

         Events : 4128

         Layout : left-symmetric

     Chunk Size : 512K

   Device Role : Active device 13

   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
```

tune2fs -l /dev/md4

```
tune2fs 1.42.13 (17-May-2015)

Filesystem volume name:   <none>

Last mounted on:          /mnt/DataArray

Filesystem UUID:          68d335b1-4d92-4945-ab5b-e7416f346468

Filesystem magic number:  0xEF53

Filesystem revision #:    1 (dynamic)

Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

Filesystem flags:         signed_directory_hash

Default mount options:    user_xattr acl

Filesystem state:         clean

Errors behavior:          Continue

Filesystem OS type:       Linux

Inode count:              366260224

Block count:              5860147200

Reserved block count:     586014

Free blocks:              606915714

Free inodes:              360358423

First block:              0

Block size:               4096

Fragment size:            4096

Group descriptor size:    64

Blocks per group:         32768

Fragments per group:      32768

Inodes per group:         2048

Inode blocks per group:   128

RAID stride:              128

RAID stripe width:        1536

Flex block group size:    16

Filesystem created:       Fri May 22 22:18:26 2015

Last mount time:          Fri Nov  6 12:30:52 2015

Last write time:          Fri Nov  6 12:30:52 2015

Mount count:              18

Maximum mount count:      -1

Last checked:             Fri Jul 17 10:54:53 2015

Check interval:           0 (<none>)

Lifetime writes:          88 TB

Reserved blocks uid:      0 (user root)

Reserved blocks gid:      0 (group root)

First inode:              11

Inode size:               256

Required extra isize:     28

Desired extra isize:      28

Journal inode:            8

Default directory hash:   half_md4

Directory Hash Seed:      9a96b45d-93ee-4faf-b081-74a7ebe2b0b4

Journal backup:           inode blocks
```

Thanks for the help

----------

## frostschutz

Maybe your issue has something to do with the bad block log, which is a relatively new feature in MD. A drive might get a bad block recorded in this log instead of being kicked from the array.

But /dev/sda1, /dev/sdg1, /dev/sdi1 and /dev/sdp1 all claim to have "bad blocks present", and that probably shouldn't be the case; it shouldn't affect this many disks.

Do the disks all pass a 'smartctl -t long' self-test?

 *Quote:*   

> 
> 
> man md
> 
>    BAD BLOCK LIST
> ...

 

Maybe that's your issue, the md itself has bad blocks, hence the I/O errors on md.

Please also run mdadm --examine-badblocks on all of the member devices.

But none of this explains the kernel panic you're getting, so you should still take it to the RAID mailing list, so one of the developers can take a peek at it.
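
A rough way to collect that information is to loop over the member partitions and summarise each bad block log. The sketch below assumes the output format that `mdadm --examine-badblocks` prints later in this thread (`<sector> for <count> sectors`). The helper name `summarise_bbl` is made up for illustration and is not part of mdadm.

```shell
#!/bin/sh
# Summarise `mdadm --examine-badblocks` output read from stdin.
# Only lines of the form "<sector> for <count> sectors" are counted;
# the "Bad-blocks on /dev/...:" header and blank lines are skipped.
summarise_bbl() {
    awk '$2 == "for" && $4 == "sectors" { n++; total += $3 }
         END { printf "%d entries, %d sectors\n", n, total }'
}

# Usage (as root; the device list is the four members named above):
#   for dev in /dev/sda1 /dev/sdg1 /dev/sdi1 /dev/sdp1; do
#       printf '%s: ' "$dev"
#       mdadm --examine-badblocks "$dev" 2>&1 | summarise_bbl
#   done
```

Identical counts on several members would be a hint that the entries were recorded together rather than by independent media failures.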

----------

## matt2kjones

I did notice while pasting the logs that I had badblocks on multiple discs.  But surely the raid array should take the approach of degrading if a write fails.  I assume that if a write fails and it records it in the badblock list, that it will use another part of the disk to write that data?

It's also strange that on 3.18.12 I have I/O errors and on 4.0.9 / 4.1.12 I get a kernel oops, as if the condition is being handled differently.

I will post this to the kernel raid mailing list as well.

----------

## frostschutz

 *Quote:*   

> But surely the raid array should take the approach of degrading if a write fails.

 

It depends. Failing 8TB worth of disk for a single bad sector may not always be what you want. If you have a single bad sector on 3 different disks, but the sectors are in different places on each disk, you can still use them for rebuilding with intact disks. As long as you're smart enough to actually replace disks that have bad sectors, your RAID survives where, without the bad block log, it would have failed already.

My own RAID setup is a bit older, from before the bad block log, but I took the same approach after a fashion. I use a split RAID. Instead of making one big terabyte array, I use smaller partitions (250G each, so 4 partitions per terabyte per disk), and build an independent array for each set of partitions, which are later joined back together using LVM. That way my RAID too survives multiple single bad sectors on different disks as long as they're 250G apart, since a single bad sector only degrades the 250G partition it was in and not the whole disk.

The way I understood it the bad block log implements my split RAID idea on an actual block-level resolution.

But I don't have personal experience with that system yet, my disks refuse to die on me.  :Laughing: 
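
The split RAID layout described above could be sketched roughly as follows. This is a dry run that only prints commands. The disk names (sda to sdd), the four partitions per disk, RAID level 5, the array names md11 to md14 and the volume group name `vg_split` are all assumptions chosen for the example, not details of the actual setup.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for a split-RAID layout.
# Assumptions: four disks sda..sdd, each already partitioned into four
# equal slices (partitions 1-4); RAID level 5; the md and VG names are
# invented for this example.
plan_split_raid() {
    disks="sda sdb sdc sdd"
    pvs=""
    for part in 1 2 3 4; do
        members=""
        for d in $disks; do
            members="$members /dev/${d}${part}"
        done
        # One independent array per set of same-numbered partitions.
        echo "mdadm --create /dev/md1${part} --level=5 --raid-devices=4$members"
        pvs="$pvs /dev/md1${part}"
    done
    # Join the small arrays back together with LVM.
    echo "pvcreate$pvs"
    echo "vgcreate vg_split$pvs"
}

plan_split_raid
```

With this layout a bad sector on one disk degrades only the one small array that contains it, so the other slices of that disk keep their redundancy.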

----------

## matt2kjones

Makes sense.

I have posted my issues to the linux-raid kernel mailing list, and linked them to this thread for more information.

I have issued the smartctl tests on all drives.  That is going to take 4 hours so I will come back with the results from them later.  I am expecting some of the drives to fail, in which case I would have thought that linux raid would degrade the array, unless obviously the badblocks list can work around the errors - but if that was the case I shouldn't have corrupt files / kernel oops.

Thanks for all your help.

Matt

----------

## frostschutz

and what does the --examine-badblocks look like?

If my theory was right it should show the same blocks bad on 3 disks and that block should translate to the sector ext4 was complaining about.

Since it's incredibly unlikely for the same block to go bad on three disks, maybe a controller issue triggered it. You'd have to check your logs for old messages if you have them.

----------

## NeddySeagoon

matt2kjones,

When you get a write fail on a single drive in a raid set the drive will attempt to reallocate the failed sector.

This is internal to the drive. The kernel is not involved.

Similarly with a read fail.  The drive will want to reallocate the sector but can't because it can't read it.

Events like this are recorded in the drives internal SMART log.  Take a look with smartmontools. 

Drive level errors look like

```
[415787.257222] ata1.00: exception Emask 0x0 SAct 0xfff000 SErr 0x0 action 0x0

[415787.257229] ata1.00: irq_stat 0x40000008

[415787.257243] ata1.00: cmd 60/08:60:08:d4:f4/00:00:bd:00:00/40 tag 12 ncq 4096 in

[415787.257246]          res 41/40:00:08:d4:f4/00:00:bd:00:00/40 Emask 0x409 (media error) <F>

[415787.267041] ata1.00: configured for UDMA/133

[415787.267075] ata1: EH complete
```

 in dmesg.

What do your SMART logs look like?

In particular, these parameters

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       1

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
```

The Reallocated_Sector_Ct being not zero is not a cause for concern.  That's how drives hide bad sectors from the operating system.

Current_Pending_Sectors are a bad thing.  That's a count of the blocks the drive has tried to read and can't.

On a single drive filesystem, that data is probably lost.  On a raid set, it can be reconstructed from the redundant data.

That tells me I need to run a repair on that raid set nowish or even sooner.  That should force that pending sector to be reconstructed from the other members of the set.

Your problems appear to be related to the filesystem on the raid set itself, rather than the individual members of the set.
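
To pull just those two attributes out of each drive's SMART table, a small filter along these lines may help. It assumes the column layout shown above, where RAW_VALUE is the tenth field; the function name is made up for illustration.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw values of
# attribute 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector).
# Field 10 is RAW_VALUE in the table layout quoted above.
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 { printf "%s=%s\n", $2, $10 }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_sector_counts
# A repair pass on the array can then be requested through sysfs:
#   echo repair > /sys/block/md4/md/sync_action
```

A non-zero Current_Pending_Sector on a member is the case where the repair pass is worth starting promptly.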

----------

## frostschutz

 *matt2kjones wrote:*   

> I have posted my issues to the linux-raid kernel mailing list, and linked them to this thread for more information.

 

Do you have a link for your mail in the mailinglist archives? I can't find it...

----------

## matt2kjones

Yeah, I'm not sure what's happening.

I subscribed to the linux-raid mailing list, and everything went fine,  I am now receiving mails sent to that list.

I posted a message to the list and I get no response back, however if I send a command like "help" to the list, I do get a reply, so not sure why my message isn't being posted to the list.

----------

## matt2kjones

OK, I have managed to post to the kernel mailing list using a different email address.

I have the output from mdadm --examine-badblocks.  I am only listing drives here which have anything in the list:

/dev/sda1:

```
Bad-blocks on /dev/sda1:

          1938038928 for 512 sectors

          1938039440 for 512 sectors

          1938977144 for 512 sectors

          1938977656 for 512 sectors

          3303750816 for 512 sectors

          3303751328 for 512 sectors

          3313648904 for 512 sectors

          3313649416 for 512 sectors

          3313651976 for 512 sectors

          3313652488 for 512 sectors

          3418023432 for 512 sectors

          3418023944 for 512 sectors

          3418024456 for 512 sectors

          3418024968 for 512 sectors

          3418037768 for 512 sectors

          3418038280 for 512 sectors

          3418038792 for 512 sectors

          3418039304 for 512 sectors

          3418112520 for 512 sectors

          3418113032 for 512 sectors

          3418113544 for 512 sectors

          3418114056 for 512 sectors

          3418114568 for 512 sectors

          3418115080 for 512 sectors

          3418124808 for 512 sectors

          3418125320 for 512 sectors

          3418165768 for 512 sectors

          3418166280 for 512 sectors

          3418187272 for 512 sectors

          3418187784 for 512 sectors

          3418213224 for 512 sectors

          3418213736 for 512 sectors

          3418214248 for 512 sectors

          3418214760 for 512 sectors

          3418215272 for 512 sectors

          3418215784 for 512 sectors

          3420607528 for 512 sectors

          3420608040 for 512 sectors

          3420626984 for 512 sectors

          3420627496 for 512 sectors

          3448897824 for 512 sectors

          3448898336 for 512 sectors

          3458897888 for 512 sectors

          3458898400 for 512 sectors

          3519403992 for 512 sectors

          3519404504 for 512 sectors

          3617207456 for 512 sectors

          3617207968 for 512 sectors

```

/dev/sdg1:

```
Bad-blocks on /dev/sdg1:

          1938038928 for 512 sectors

          1938039440 for 512 sectors

          1938977144 for 512 sectors

          1938977656 for 512 sectors

          3303750816 for 512 sectors

          3303751328 for 512 sectors

          3313648904 for 512 sectors

          3313649416 for 512 sectors

          3313651976 for 512 sectors

          3313652488 for 512 sectors

          3418023432 for 512 sectors

          3418023944 for 512 sectors

          3418024456 for 512 sectors

          3418024968 for 512 sectors

          3418037768 for 512 sectors

          3418038280 for 512 sectors

          3418038792 for 512 sectors

          3418039304 for 512 sectors

          3418112520 for 512 sectors

          3418113032 for 512 sectors

          3418113544 for 512 sectors

          3418114056 for 512 sectors

          3418114568 for 512 sectors

          3418115080 for 512 sectors

          3418124808 for 512 sectors

          3418125320 for 512 sectors

          3418165768 for 512 sectors

          3418166280 for 512 sectors

          3418187272 for 512 sectors

          3418187784 for 512 sectors

          3418213224 for 512 sectors

          3418213736 for 512 sectors

          3418214248 for 512 sectors

          3418214760 for 512 sectors

          3418215272 for 512 sectors

          3418215784 for 512 sectors

          3420607528 for 512 sectors

          3420608040 for 512 sectors

          3420626984 for 512 sectors

          3420627496 for 512 sectors

          3448897824 for 512 sectors

          3448898336 for 512 sectors

          3458897888 for 512 sectors

          3458898400 for 512 sectors

          3519403992 for 512 sectors

          3519404504 for 512 sectors

          3617207456 for 512 sectors

          3617207968 for 512 sectors

```

/dev/sdi1:

```
Bad-blocks on /dev/sdi1:

          1938977144 for 512 sectors

          1938977656 for 512 sectors
```

/dev/sdp1:

```
Bad-blocks on /dev/sdp1:

          1938038928 for 512 sectors

          1938039440 for 512 sectors

          3303750816 for 512 sectors

          3303751328 for 512 sectors

          3313648904 for 512 sectors

          3313649416 for 512 sectors

          3313651976 for 512 sectors

          3313652488 for 512 sectors

          3418023432 for 512 sectors

          3418023944 for 512 sectors

          3418024456 for 512 sectors

          3418024968 for 512 sectors

          3418037768 for 512 sectors

          3418038280 for 512 sectors

          3418038792 for 512 sectors

          3418039304 for 512 sectors

          3418112520 for 512 sectors

          3418113032 for 512 sectors

          3418113544 for 512 sectors

          3418114056 for 512 sectors

          3418114568 for 512 sectors

          3418115080 for 512 sectors

          3418124808 for 512 sectors

          3418125320 for 512 sectors

          3418165768 for 512 sectors

          3418166280 for 512 sectors

          3418187272 for 512 sectors

          3418187784 for 512 sectors

          3418213224 for 512 sectors

          3418213736 for 512 sectors

          3418214248 for 512 sectors

          3418214760 for 512 sectors

          3418215272 for 512 sectors

          3418215784 for 512 sectors

          3420607528 for 512 sectors

          3420608040 for 512 sectors

          3420626984 for 512 sectors

          3420627496 for 512 sectors

          3448897824 for 512 sectors

          3448898336 for 512 sectors

          3458897888 for 512 sectors

          3458898400 for 512 sectors

          3519403992 for 512 sectors

          3519404504 for 512 sectors

          3617207456 for 512 sectors

          3617207968 for 512 sectors

```

It seems very odd that I have 3 drives with badblocks at the same locations? Something looks very wrong there :/

I have also unmounted the filesystem and run fsck on this array, and nothing is wrong.

As for the extended tests with smartmontools, 1 drive out of the set had a "read error" 60%, all other discs passed.

----------

## frostschutz

Yup, I'm not sure if that's how bad blocks are supposed to work. In your case it seems to have resulted in a "raid that never fails" which is not particularly useful if that in turn leaves the filesystem or system to deal with the mess or even crash...

I wish the bad blocks feature would be more exposed, say in /proc/mdstat instead of showing [UUU] or [U_U] it could do something like [BBB] for disks with known bad blocks, and mdadm monitor should send you mails about it if it doesn't already.

RAID survival depends on detecting errors early, and replacing disks immediately; if the bad block log is designed to hide errors from you then it would be better to go without this feature. (even though it is a nice idea depending on the implementation, as I mentioned earlier in this thread)

 *Quote:*   

> It seems very odd that I have 3 drives with badblocks at the same locations? Something looks very wrong there :/

 

Kernel panic aside, it explains why the read errors show on /dev/mdX rather than a specific /dev/sdX.

As for how those bad blocks came to be you'd have to check your logs if you have them, maybe some controller jitter...

I'm not sure what the mailing list will recommend; I would probably attempt recovery by clearing bad block log on the disks that passed the SMART long selftest and then replace the drive that failed.

The problem is that you can't see when those sectors were added to the log. If it was a controller fluke then it probably all happened at the same time. But if one disk got its bad block entries earlier than the others, that disk would be less likely to have good data in those sectors, so you should only clear the log on the disks that have good data, or simply try several combinations (once you've determined what is stored in those locations using filefrag).
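
To work out which filesystem blocks sit in the stripe behind a bad block entry, the arithmetic can be sketched as below. The geometry comes from the `--examine` and `tune2fs` output earlier in the thread (Data Offset 262144 sectors, 512K chunks, 14-device raid6 so 12 data chunks per stripe, 4096-byte ext4 blocks). The assumption that bad block sectors are counted from the start of the member partition, data offset included, is not confirmed anywhere in the thread, and the function name is made up.

```shell
#!/bin/sh
DATA_OFFSET=262144      # sectors, "Data Offset" from --examine
CHUNK_SECTORS=1024      # 512K chunk in 512-byte sectors
DATA_DISKS=12           # raid6 on 14 devices: 14 minus 2 parity chunks
SECTORS_PER_FSBLOCK=8   # 4096-byte ext4 blocks

# Print the first and last ext4 block of the stripe that contains the
# given member-device sector. Assumption: the sector is relative to the
# start of the member partition, i.e. it includes the data offset.
bbl_sector_to_fs_blocks() {
    row=$(( ($1 - DATA_OFFSET) / CHUNK_SECTORS ))
    per_stripe=$(( CHUNK_SECTORS * DATA_DISKS / SECTORS_PER_FSBLOCK ))
    first=$(( row * per_stripe ))
    last=$(( first + per_stripe - 1 ))
    echo "$first $last"
}

bbl_sector_to_fs_blocks 3448897824   # prints: 5172953088 5172954623
```

Under that assumption the entry `3448897824 for 512 sectors` lands in the stripe that starts at ext4 block 5172953088, which is the logical block in the first `Buffer I/O error` line of the opening post. A block in the range could then be looked up with `debugfs -R "icheck <block>" /dev/md4`, or compared against `filefrag -v` output for a suspect file.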

----------

## matt2kjones

Thanks again for the reply.

I think I read somewhere that if you use metadata version 0.9, the badblock functionality isn't enabled.

This array is split over two controllers (two 8-port SAS cards), and one of those drives with badblocks is on a different controller to the others, so I don't think it would be a controller error, unless there was a power glitch or something, which could be possible, although the array and server are attached to a UPS.

I can actually destroy this array.  This server contains backups of our live, master server which uses hardware raid10 with many more discs.  So I can easily destroy this array and re-create it with good discs and see if the problem goes away.  The main reason I am looking to resolve it without destroying the data is so that I can understand why it's happened, and how to get around it in the future if it happens again.

I will probably go down the route you suggest and clear the badblock logs for all the drives and replace the known faulty drive (we have lots of unopened spares here).

One question, if you don't mind?  If a drive has a write error, then the block is added to the badblocks list and if the write to the badblocks list fails, then the drive is set as faulty, I understand that.  But what happens if the drive successfully writes the badblock to the badblock list? Do I only have one copy of that data elsewhere on the array? What happens if I have drive A with badblocks, then drives B and C fail? Theoretically I can recover the array, but I assume that the data in those badblocks would be lost.

----------

## frostschutz

 *matt2kjones wrote:*   

> I think I read somewhere that if you use metadata version 0.9, the badblock functionality isn't enabled.

 

Don't use 0.90 metadata for anything.

The badblock list can be enabled or disabled as you like (--update=bbl or --update=no-bbl when assembling, something like that).

 *Quote:*   

> Do I only have one copy of that data elsewhere on the array?

 

Yes, that block is no longer redundant (or in case of RAID6, less redundant than it should be).

 *Quote:*   

> What happens if I have drive A with badblocks then Drive B and C fail.

 

It's dead... (at least the bad blocks are gone for good in that case, if actually bad on the drives and not just md-believes-so, in which case they might just have outdated data)

----------

## DingbatCA

I am having what I think is the same issue.  After 50+ hours of troubleshooting, I ordered in 2 new LSI controller cards.  I think the problem is with the mvsas card/driver.  matt2kjones, can you give us the output of lspci?  What type of drives are you using?  Mine are all 3TB WD Greens.

I have 10 drives in question and they keep failing.  The drive(s) gets a sector marked as "pending"  bad.  From there I can use hdparm --write-sector to toggle the exact sector in question.  The drive says the sector is fine.  I have gone as far as running SMART long tests, and secure erase.  The drives always come back healthy.  Every test I can run on the drives shows they are in good health.

```
root@MediaNAS:~# lspci | grep SATA

00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)

00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)

03:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)

04:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)

05:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)

06:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
```

Yes, in this case it is pointing to one drive.  But the drive and sectors seem to move around.

```
[ 3815.448942] md/raid:md1: read error not correctable (sector 865360712 on sdl).

[ 3815.448949] md/raid:md1: read error not correctable (sector 865360720 on sdl).

[ 3815.448952] md/raid:md1: read error not correctable (sector 865360728 on sdl).

[ 3815.448955] md/raid:md1: read error not correctable (sector 865360736 on sdl).

[ 3815.448957] md/raid:md1: read error not correctable (sector 865360744 on sdl).

[ 3815.448960] md/raid:md1: read error not correctable (sector 865360752 on sdl).

[ 3815.448963] md/raid:md1: read error not correctable (sector 865360760 on sdl).

[ 3815.448966] md/raid:md1: read error not correctable (sector 865360768 on sdl).

[ 3815.448969] md/raid:md1: read error not correctable (sector 865360776 on sdl).

[ 3815.448971] md/raid:md1: read error not correctable (sector 865360784 on sdl).
```

----------

## matt2kjones

Hi DingbatCA,

My issue seems to be that I have multiple drives with blocks all in the same area.  If I take a drive out of the array with no badblocks, then add it back in, the badblocks from the other drives propagate to the badblocks list on the drive added back in.  Not sure if this is meant to happen.

I have different cards to you:

lspci |grep SAS

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

Also, on my system, I'm not actually getting any errors on the harddrives themselves, only on the raid array as a whole, which makes me think that no read/write errors are actually happening on any of the drives, and that the badblock list is faulty somehow.

----------

## matt2kjones

OK, Changes I have done since my last post.

I have failed, removed and re-added 3 drives, one at a time.

/dev/sdp - This had the full list of badblocks above.  When it re-added, it had none, when the sync completed, the badblock list was full again.

/dev/sda - Same as above

/dev/sdi - This drive only had two entries in the badblock list prior to removal - After a full sync, it had the full list... same as sdp and sda.

I have also switched to the latest mainline kernel 4.3.0

Since I have done these two things, writes have been considerably faster, and I haven't had any dmesg errors yet (written over 400GB so far).

So I'm not sure whether taking the drives out of the array, and adding them back in, 1 at a time has fixed the issue, or whether the badblocks implementation is broken in earlier kernels, and it works correctly in 4.3.0

I plan to fill all the free space (about 6TB) to see if I have any write errors - If not, I assume this is fixed.

----------

## matt2kjones

This issue seems to be resolved.

Wrote over 4TB of data to the array this morning, and finally hit an I/O error on /dev/sdd.

The drive was marked as faulty, and the array degraded.

```
[77502.279233] sd 0:0:3:0: attempting task abort! scmd(ffff8801ef40b6c0)

[77502.279237] sd 0:0:3:0: [sdd] CDB: opcode=0x85 85 08 0e 00 d5 00 01 00 00 00 4f 00 c2 00 b0 00

[77502.279239] scsi target0:0:3: handle(0x000c), sas_address(0x4433221103000000), phy(3)

[77502.279240] scsi target0:0:3: enclosure_logical_id(0x500605b008924a60), slot(1)

[77502.279241] scsi target0:0:3: enclosure level(0x0000),connector name()

[77502.333188] sd 0:0:3:0: task abort: SUCCESS scmd(ffff8801ef40b6c0)

[77502.713979] blk_update_request: I/O error, dev sdd, sector 2064

[77502.713982] md: super_written gets error=-5

[77502.713985] md/raid:md4: Disk failure on sdd1, disabling device.

               md/raid:md4: Operation continuing on 13 devices.

```

Seems to be working as expected now.

The only thing I can imagine has fixed it was removing and adding all the drives with badblocks.  I'm guessing that the array was in some sort of error state, maybe from a broken implementation of badblocks when using an earlier kernel.

I also upgraded to kernel 4.3.0, rather than using the latest kernel from the portage tree, so that may have something to do with it also.

Thanks to everyone that helped, especially frostschutz who replied to every post.

Cheers!

----------

## matt2kjones

Spoke too soon...

After writing about 6TB of data I have hit buffer I/O errors again:

```
[158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235712)

[158219.456487] Buffer I/O error on device md4, logical block 4955235584

[158219.456490] Buffer I/O error on device md4, logical block 4955235585

[158219.456491] Buffer I/O error on device md4, logical block 4955235586

[158219.456491] Buffer I/O error on device md4, logical block 4955235587

[158219.456492] Buffer I/O error on device md4, logical block 4955235588

[158219.456493] Buffer I/O error on device md4, logical block 4955235589

[158219.456494] Buffer I/O error on device md4, logical block 4955235590

[158219.456495] Buffer I/O error on device md4, logical block 4955235591

[158219.456496] Buffer I/O error on device md4, logical block 4955235592

[158219.456497] Buffer I/O error on device md4, logical block 4955235593

[158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235456)

[158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235200)

[158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234944)

[158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234688)

[158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234432)

[158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 123995503 (offset 0 size 8388608 starting block 4970080384)
```

What's interesting, though, is that if I remove a drive with no entries in its badblocks list and then add it back, once it has synced that drive has the same badblocks list as all the others.

I now have 5 drives in the array with the same badblocks list.  I am sure that if I take each drive out one by one, and add them all back in, every drive would have the same badblocks list.  Should the badblocks list be replicating like this?

I can't even remove the badblocks feature, because according to the man pages, if the badblocks list contains any entries, it can't be removed  :Sad: 

The next question I have: do the entries in the badblocks list map to locations on a physical device, or to locations within the array on the md device?  If they map to bad blocks within the array, that would explain why the list is propagated, and it would also mean those badblocks could be passed to ext4 to get it to avoid using that area of the filesystem.
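One way to check whether the lists really are identical across members is to compare them programmatically. The following is a minimal sketch, not a definitive tool. It assumes the kernel exposes each member's list as a per-device `bad_blocks` attribute under `/sys/block/<md>/md/dev-<member>/`, with one `start length` pair (in sectors) per line; the function names and the `md4` default are only illustrative.

```python
from pathlib import Path
from typing import Dict, List, Tuple

BadRange = Tuple[int, int]  # (first sector, length in sectors)


def parse_bad_blocks(text: str) -> List[BadRange]:
    """Parse lines of the form '<start> <length>' into a sorted list."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        ranges.append((int(fields[0]), int(fields[1])))
    return sorted(ranges)


def group_identical(lists: Dict[str, List[BadRange]]) -> List[List[str]]:
    """Group member names whose bad-block lists are exactly the same."""
    groups: Dict[Tuple[BadRange, ...], List[str]] = {}
    for dev, ranges in lists.items():
        groups.setdefault(tuple(ranges), []).append(dev)
    return [sorted(devs) for devs in groups.values()]


def read_member_lists(md_name: str = "md4") -> Dict[str, List[BadRange]]:
    """Read every member's list from sysfs (path layout is an assumption)."""
    lists = {}
    for path in Path("/sys/block", md_name, "md").glob("dev-*/bad_blocks"):
        member = path.parent.name[len("dev-"):]
        lists[member] = parse_bad_blocks(path.read_text())
    return lists
```

Calling `group_identical(read_member_lists("md4"))` on the affected machine would show which members share a list. It does not by itself answer whether the sector numbers are relative to the member device or to the array.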

----------

## krinn

If you want sector 2 of drive 1 to hold the same content as sector 2 of drives 2, 3, ..., you are duplicating sector contents. You then have no choice but to accept that if any one drive has a dead sector 3, sector 3 is marked dead on all drives (dead sector count == total number of distinct dead sectors across all drives).

The other way is to duplicate files instead. That is more flexible (you can use compression; dead sector count == the largest dead sector count on any single drive; file contents are the same on all disks, but sector contents are not), but the complexity of handling it has a big impact on performance.

With software RAID you can duplicate logical sectors; with hardware RAID you can only duplicate hardware sectors (because to know the logical sectors, you must know the partition). So a software RAID array can combine different partitions from different disks, while hardware arrays can only be made from whole disks.

----------

## frostschutz

If the same blocks are bad on 3+ disks, the data for those blocks is gone (or at least considered gone by mdadm), so a sync won't get the data for those blocks back.

So after syncing, the newly synced disks don't have valid data for these blocks either, and thus the blocks are bad on them too, in a way.

You might have to turn off the bad block log to get rid of this issue (and remove disks that were not previously part of the raid, as those will be guaranteed to have wrong data in those blocks).

Please note my own experience with the bbl is very limited, hence I suggested the mailing list, ...

You can enable or disable the bad block log with the `--update=bbl` / `--update=no-bbl` options of `mdadm --assemble`; check the mdadm manpage for details.
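To illustrate the point about the same blocks being bad on three or more disks, here is a small sketch that takes per-member bad-block lists and reports the sector ranges marked bad on more members than the array has parity devices (two for raid6). It is an illustration under assumptions: all members are taken to share the same data offset, so equal sector numbers refer to the same stripe, and ranges within one member's list are taken not to overlap. The function name and the input layout are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

BadRange = Tuple[int, int]  # (first sector, length in sectors)


def unrecoverable_ranges(
    lists: Dict[str, List[BadRange]], parity_devices: int = 2
) -> List[BadRange]:
    """Return ranges that are bad on more than `parity_devices` members."""
    # Turn every range into a +1 event at its start and a -1 event at its end.
    events = []
    for ranges in lists.values():
        for start, length in ranges:
            events.append((start, 1))
            events.append((start + length, -1))
    events.sort()

    result = []
    depth = 0          # number of members on which the current sector is bad
    open_start = None  # start of the range currently exceeding the limit
    for pos, group in groupby(events, key=lambda e: e[0]):
        depth += sum(delta for _, delta in group)
        if depth > parity_devices and open_start is None:
            open_start = pos
        elif depth <= parity_devices and open_start is not None:
            result.append((open_start, pos - open_start))
            open_start = None
    return result
```

An empty result would mean no sector is marked bad on more than two members, so raid6 parity should still cover every listed block.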

----------

## matt2kjones

Hi,

I was going to remove the badblocks log, but according to the man page, if there is anything stored in the badblocks log you can't remove it, i.e. you can only remove the badblocks log if there are currently no badblocks logged on that drive.

So it seems that I am stuck in an error state that I can't get out of.  mdadm copies the badblocks to every drive that I remove and re-add.  The badblocks are not passed down to the filesystem level, so I can't even get ext4 to avoid them to prevent corruption, and I can't remove the badblocks list from any of the hard drives.

I could fail and replace each disc with a new disc, one at a time, and would still have an array that is unusable.

I have posted this thread and additional information to the kernel mailing list and haven't had any replies. As there is so little information on mdadm badblocks on the internet, I'm going to have to destroy the array and start fresh, rebuilding the data from the master server, because I can't spend all of next week on this as well (I have spent two weeks trying to get it operating so far). When I re-create the array I will leave badblocks on - I guess it got into this state due to an early broken implementation.

Thanks for all the help

----------

## frostschutz

 *matt2kjones wrote:*   

> you can only remove the badblocks log if there are currently no badblocks logged on that drive.

 

That sucks.

You could patch-remove that check from mdadm source code though. Or edit the metadata directly, although that also involves updating the metadata checksum.

Or re-create in place, but that's probably the most dangerous choice of all because it's so easy to get wrong. You can't rely on default values (defaults change over time), so if you do re-create you have to specify everything (metadata version, data offset, raid level, chunk size, layout, disk order, ...). Do that with `--assume-clean`, and if you added any disks after blocks were already marked as bad (for example, if you replaced the drive that was actually faulty), you should specify that disk as 'missing' so it can sync in later with the "original" data of those "bad" blocks, just in case it's relevant for anything (and I guess it is, as otherwise you wouldn't have hit the errors yourself).
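Since a re-create needs every parameter spelled out, it may help to record the current values from each member before touching anything. The sketch below parses the `Key : Value` lines printed by `mdadm --examine` and picks out the fields listed above. The field names in `WANTED` are an assumption about what 1.x metadata output contains and may differ between mdadm versions, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict

# Field names assumed to appear in `mdadm --examine` output (not guaranteed).
WANTED = (
    "Version", "Raid Level", "Raid Devices", "Data Offset",
    "Chunk Size", "Layout", "Device Role",
)


def parse_examine(text: str) -> Dict[str, str]:
    """Turn 'Key : Value' lines into a dictionary; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def geometry(fields: Dict[str, str]) -> Dict[str, str]:
    """Keep only the parameters a re-create would need, flagging gaps."""
    return {name: fields.get(name, "<missing>") for name in WANTED}


def examine(device: str) -> Dict[str, str]:
    """Run `mdadm --examine` on one member (needs root) and parse the result."""
    out = subprocess.run(
        ["mdadm", "--examine", device],
        check=True, capture_output=True, text=True,
    ).stdout
    return geometry(parse_examine(out))
```

Saving the result of `examine()` for every member, together with each member's device role, gives a written record of the values to pass explicitly instead of relying on defaults.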

----------

## matt2kjones

This server acts as a backup that we can quickly grab files off, or a server we can switch over to if our master fails, so I am in a position where I can just destroy the array and re-create it.

Would have been nice to find a way out of this situation other than starting clean though.

I was thinking of stopping the array, then using dd to write zeros to the location of the badblocks list on each drive (I'm not worried about the data in those locations), then forcing a check on the array. But I guess I would have run into issues doing that, and it seemed like a lot of work for something that probably wouldn't have worked.

Again, thanks. I really appreciate all the help you've given.

----------

## matt2kjones

OK,

So it has been a few weeks since I re-created the array with all the same drives.

I have written about 80TB to the array through our backup process, which deletes old files and copies new ones.

So far, using the same set of discs, no I/O errors, nothing dodgy in dmesg, and no bad blocks on any of the drives.

Not really sure what to make of it...

----------

