# mdadm - stuck reshape operation

## Cephas

Hi,

I have a 3 disk RAID 5 array that I tried to add a 4th disk to.

```
> mdadm --add /dev/md6 /dev/sdb1

> mdadm --grow --raid-devices=4 /dev/md6
```

This operation started successfully and proceeded until it hit 51.1%:

```
> cat /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]

md6 : active raid5 sda1[0] sdb1[5] sdf1[3] sde1[4]

      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

      [==========>..........]  reshape = 51.1% (998533632/1953382400) finish=9046506.1min speed=1K/sec

      bitmap: 0/15 pages [0KB], 65536KB chunk
```

It has been sitting on the same 998533632 position for days. I've tried a few reboots, but it never progresses.

Stopping the array, or trying to start the logical volume on it, hangs.

Altering the min / max speed parameters has no effect. 
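For reference, these are the knobs I mean (the values below are just examples, not necessarily what I used):

```shell
# Global md sync/reshape speed limits, in KB/sec:
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max

# Per-array overrides for md6:
echo 50000  > /sys/block/md6/md/sync_speed_min
echo 200000 > /sys/block/md6/md/sync_speed_max
```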

When I reboot and reassemble the array, the indicated speed steadily drops to almost 0.

```
> mdadm --assemble /dev/md6 --verbose --uuid 90c2b5c3:3bbfa0d7:a5efaeed:726c43e2
```

I haven't tried anything more drastic than a reboot yet.

Below is as much information as I can think to provide at this stage. Please let me know what else I can do. 

I'm happy to change kernels, kernel config, or anything else required to get better info.

Kernel: 4.4.3

mdadm 3.4

```
> ps aux | grep md6

root      5041 99.9  0.0      0     0 ?        R    07:10 761:58 [md6_raid5]

root      5042  0.0  0.0      0     0 ?        D    07:10   0:00 [md6_reshape]
```

This is consistent: 100% CPU on the raid thread, but not the reshape thread.

My guess is that the sync and reshape operations are entangled somehow.
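One way I can think of to check this (a sketch; it assumes the thread name from the `ps` output above and a kernel with `/proc/<pid>/stack` support):

```shell
# Dump the kernel stack of the spinning md thread (run as root)
# to see which kernel function it is looping in.
pid=$(pgrep -x md6_raid5 || true)
if [ -n "$pid" ]; then
    cat "/proc/$pid/stack"
else
    echo "md6_raid5 not running here"
fi
```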

```
> mdadm --detail --verbose /dev/md6

/dev/md6:

        Version : 1.2

  Creation Time : Fri Aug 29 21:13:52 2014

     Raid Level : raid5

     Array Size : 3906764800 (3725.78 GiB 4000.53 GB)

  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)

   Raid Devices : 4

  Total Devices : 4

    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Apr 27 07:10:07 2016

          State : clean, reshaping

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 512K

 Reshape Status : 51% complete

  Delta Devices : 1, (3->4)

           Name : Alpheus:6  (local to host Alpheus)

           UUID : 90c2b5c3:3bbfa0d7:a5efaeed:726c43e2

         Events : 47975

    Number   Major   Minor   RaidDevice State

       0       8        1        0      active sync   /dev/sda1

       4       8       65        1      active sync   /dev/sde1

       3       8       81        2      active sync   /dev/sdf1

       5       8       17        3      active sync   /dev/sdb1

```

```
> iostat

Linux 4.4.3-gentoo (Alpheus)    04/27/2016      _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           1.84    0.00   24.50    0.09    0.00   73.57

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn

sda               0.02         2.72         1.69     128570      79957

sdb               0.01         0.03         1.69       1447      79889

sdd               3.85         2.27        56.08     106928    2646042

sde               0.02         2.73         1.69     128610      79961

sdf               0.02         2.72         1.69     128128      79961

sdc               4.08         5.44        56.08     256899    2646042

md0               2.91         7.62        55.08     359714    2598725

dm-0              0.00         0.03         0.00       1212          0

dm-1              0.00         0.05         0.00       2151          9

dm-2              2.65         6.52         3.42     307646     161296

dm-3              0.19         1.03        51.66      48377    2437420

md6               0.00         0.02         0.00       1036          0
```

```
> dmesg

[ 1199.426995] md: bind<sde1>

[ 1199.427779] md: bind<sdf1>

[ 1199.428379] md: bind<sdb1>

[ 1199.428592] md: bind<sda1>

[ 1199.429260] md/raid:md6: reshape will continue

[ 1199.429274] md/raid:md6: device sda1 operational as raid disk 0

[ 1199.429275] md/raid:md6: device sdb1 operational as raid disk 3

[ 1199.429276] md/raid:md6: device sdf1 operational as raid disk 2

[ 1199.429277] md/raid:md6: device sde1 operational as raid disk 1

[ 1199.429498] md/raid:md6: allocated 4338kB

[ 1199.429807] md/raid:md6: raid level 5 active with 4 out of 4 devices, algorithm 2

[ 1199.429810] RAID conf printout:

[ 1199.429811]  --- level:5 rd:4 wd:4

[ 1199.429812]  disk 0, o:1, dev:sda1

[ 1199.429814]  disk 1, o:1, dev:sde1

[ 1199.429816]  disk 2, o:1, dev:sdf1

[ 1199.429817]  disk 3, o:1, dev:sdb1

[ 1199.429993] created bitmap (15 pages) for device md6

[ 1199.430297] md6: bitmap initialized from disk: read 1 pages, set 0 of 29807 bits

[ 1199.474604] md6: detected capacity change from 0 to 4000527155200

[ 1199.474611] md: reshape of RAID array md6

[ 1199.474613] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.

[ 1199.474614] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.

[ 1199.474617] md: using 128k window, over a total of 1953382400k.
```

```
> lsblk

NAME                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT

sda                             8:0    0  1.8T  0 disk

└─sda1                          8:1    0  1.8T  0 part

  └─md6                         9:6    0  3.7T  0 raid5

sdb                             8:16   0  1.8T  0 disk

└─sdb1                          8:17   0  1.8T  0 part

  └─md6                         9:6    0  3.7T  0 raid5

sdc                             8:32   0  2.7T  0 disk

├─sdc1                          8:33   0   16M  0 part

└─sdc2                          8:34   0  2.7T  0 part

  └─md0                         9:0    0  2.7T  0 raid1

    ├─vg--mirror-swap         253:0    0    4G  0 lvm   [SWAP]

    ├─vg--mirror-boot         253:1    0  256M  0 lvm   /boot

    ├─vg--mirror-root         253:2    0  256G  0 lvm   /

    └─vg--mirror-data--mirror 253:3    0  2.5T  0 lvm   /data/mirror

sdd                             8:48   0  2.7T  0 disk

├─sdd1                          8:49   0   16M  0 part

└─sdd2                          8:50   0  2.7T  0 part

  └─md0                         9:0    0  2.7T  0 raid1

    ├─vg--mirror-swap         253:0    0    4G  0 lvm   [SWAP]

    ├─vg--mirror-boot         253:1    0  256M  0 lvm   /boot

    ├─vg--mirror-root         253:2    0  256G  0 lvm   /

    └─vg--mirror-data--mirror 253:3    0  2.5T  0 lvm   /data/mirror

sde                             8:64   0  1.8T  0 disk

└─sde1                          8:65   0  1.8T  0 part

  └─md6                         9:6    0  3.7T  0 raid5

sdf                             8:80   0  1.8T  0 disk

└─sdf1                          8:81   0  1.8T  0 part

  └─md6                         9:6    0  3.7T  0 raid5
```

Thanks for any pointers.

----------

## frostschutz

There are no I/O errors in dmesg? What kind of disks are you using, and are they SATA/USB/...? Can you show `smartctl -a` output and run a `-t long` self-test for all disks?

Maybe one disk has problems without reporting errors properly, and is instead spending lots of time on internal error correction... that is one case where software RAID fails, because it relies on disks reporting errors properly.
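Something like this would print the commands to run, using the member disks from your first post (adjust the device names as needed):

```shell
# md6 member disks, taken from the mdstat output above.
disks="/dev/sda /dev/sdb /dev/sde /dev/sdf"
for d in $disks; do
    echo "smartctl -a $d"        # attribute dump: look at reallocated/pending sectors
    echo "smartctl -t long $d"   # long self-test; read results later with -l selftest
done
```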

I'm running RAID-5 on a 4.5.x kernel without problems, so maybe that's also worth a try, just in case there was a kernel bug at some point; I don't know.

What's your stripe cache set to? http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html
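For example (a sketch only; 8192 pages is just a guess at a sane value if you have a few GB of RAM to spare):

```shell
# stripe_cache_size is per-array, in pages (4 KiB) per member device;
# the default is often a low 256, which can bottleneck resync/reshape.
f=/sys/block/md6/md/stripe_cache_size
if [ -w "$f" ]; then
    cat "$f"           # current value
    echo 8192 > "$f"   # bigger stripe cache, at the cost of more RAM
else
    echo "no writable $f on this machine"
fi
```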

----------

## deck2

Hello, 

Did you happen to find a solution for this?  I observe the exact same issue. 

I had a drive legitimately fail during reshape, which caused mdadm to lock up with full CPU usage. I subsequently stopped the array and rebooted. Upon trying to start the array, the md127_raid5 process immediately spikes to 100% usage, and the array becomes completely unresponsive. Other processes involving the array show blocking on IO.

I've tried pretty much everything, including failing out the faulty drive. Whenever I try to start the array, the process locks up.

```

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 --verbose --backup-file=/tmp/grow_md127.bak

mdadm: looking for devices for /dev/md127

mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 3.

mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 2.

mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 1.

mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 4.

mdadm: /dev/md127 has an active reshape - checking if critical section needs to be restored

mdadm: No backup metadata on /tmp/grow_md127.bak

mdadm: accepting backup with timestamp 1467397557 for array with timestamp 1467774201

mdadm: backup-metadata found on device-4 but is not needed

mdadm: no uptodate device for slot 0 of /dev/md127

mdadm: added /dev/sdc1 to /dev/md127 as 2

mdadm: added /dev/sdb1 to /dev/md127 as 3

mdadm: added /dev/sde1 to /dev/md127 as 4

mdadm: added /dev/sdd1 to /dev/md127 as 1

mdadm: /dev/md127 has been started with 4 drives (out of 5).

```

```

[ 2662.922539] md: bind<sdc1>

[ 2662.926020] md: bind<sdb1>

[ 2662.926333] md: bind<sde1>

[ 2662.926564] md: bind<sdd1>

[ 2662.960617] raid6: sse2x1   gen()  3484 MB/s

[ 2662.977603] raid6: sse2x1   xor()  3673 MB/s

[ 2662.994612] raid6: sse2x2   gen()  5652 MB/s

[ 2663.011601] raid6: sse2x2   xor()  5978 MB/s

[ 2663.028613] raid6: sse2x4   gen()  7156 MB/s

[ 2663.045601] raid6: sse2x4   xor()  3837 MB/s

[ 2663.045603] raid6: using algorithm sse2x4 gen() 7156 MB/s

[ 2663.045604] raid6: .... xor() 3837 MB/s, rmw enabled

[ 2663.045606] raid6: using intx1 recovery algorithm

[ 2663.051916] async_tx: api initialized (async)

[ 2663.058743] xor: measuring software checksum speed

[ 2663.068598]    prefetch64-sse: 13312.000 MB/sec

[ 2663.078597]    generic_sse: 12720.000 MB/sec

[ 2663.078599] xor: using function: prefetch64-sse (13312.000 MB/sec)

[ 2663.139818] md: raid6 personality registered for level 6

[ 2663.139822] md: raid5 personality registered for level 5

[ 2663.139823] md: raid4 personality registered for level 4

[ 2663.140432] md/raid:md127: reshape will continue

[ 2663.140454] md/raid:md127: device sdd1 operational as raid disk 1

[ 2663.140455] md/raid:md127: device sde1 operational as raid disk 4

[ 2663.140456] md/raid:md127: device sdb1 operational as raid disk 3

[ 2663.140457] md/raid:md127: device sdc1 operational as raid disk 2

[ 2663.140983] md/raid:md127: allocated 5432kB

[ 2663.142246] md/raid:md127: raid level 5 active with 4 out of 5 devices, algorithm 2

[ 2663.142251] RAID conf printout:

[ 2663.142253]  --- level:5 rd:5 wd:4

[ 2663.142254]  disk 1, o:1, dev:sdd1

[ 2663.142256]  disk 2, o:1, dev:sdc1

[ 2663.142257]  disk 3, o:1, dev:sdb1

[ 2663.142258]  disk 4, o:1, dev:sde1

[ 2663.142436] created bitmap (15 pages) for device md127

[ 2663.142682] md127: bitmap initialized from disk: read 1 pages, set 2 of 29807 bits

[ 2663.735434] md127: detected capacity change from 0 to 6000790732800

[ 2663.735465] md: reshape of RAID array md127

[ 2663.735477] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.

[ 2663.735483] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.

[ 2663.735498] md: using 128k window, over a total of 1953382400k.

[ 2663.989420]  md127: p1

```

```
ps aux | grep md127

root      4300 99.7  0.0      0     0 ?        R    23:03  15:05 [md127_raid5]

root      4301  0.0  0.0      0     0 ?        D    23:03   0:00 [md127_reshape]

root      4334  0.0  0.0   4900   712 pts/0    D+   23:03   0:00 mdadm -S /dev/md127

```

```
mdadm -D /dev/md127

/dev/md127:

        Version : 1.2

  Creation Time : Sun May 18 16:54:52 2014

     Raid Level : raid5

     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)

  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)

   Raid Devices : 5

  Total Devices : 4

    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jul  5 23:03:21 2016

          State : clean, degraded, reshaping 

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 128K

 Reshape Status : 94% complete

  Delta Devices : 1, (4->5)

           Name : rza.eth0.net:0  (local to host rza.eth0.net)

           UUID : 9d5d1606:414b51f8:b5173999:7239c63f

         Events : 345118

    Number   Major   Minor   RaidDevice State

       0       0        0        0      removed

       1       8       49        1      active sync   /dev/sdd1

       2       8       33        2      active sync   /dev/sdc1

       4       8       17        3      active sync   /dev/sdb1

       5       8       65        4      active sync   /dev/sde1

```

```

uname -r

4.5.7-200.fc23.x86_64

```

```

mdadm -V

mdadm - v3.3.4 - 3rd August 2015

```

Sorry for posting cross-distro.

----------

## Cephas

I ended up starting from scratch unfortunately. 

Array just had tv shows and movies on it, so everything was recoverable over time.

Wiped the superblocks and re-created. 

Posted in the mdadm kernel group a bit. nothing worked.

Tried latest mdadm from source, plus just about every funky rebuild command I could find. 

Sorry.

----------

## deck2

Thanks for the reply.  I'll post back for future reference if I figure something out.

----------

## frostschutz

Try a SystemRescueCD ... your mdadm is old, for one, and I'm not sure what to make of the Fedora kernel.

If the kernel task keeps hanging in 100% CPU even with the current stable kernel, it's probably something for the linux-raid mailing list.

----------

## deck2

Thanks frostschutz, I've seen your name come up a bunch in the various forums, your posts on mdadm are very helpful. 

I've tried a couple of different kernels / versions of mdadm. I don't think the version I am using is too old; it's 3.3.4, which is only superseded by 3.4, which Neil claims could still be 'buggy'.

I tried going back to a couple older kernels, and the behaviour is the same. 

The only additional piece of info is that in some cases I have observed md127_reshape run very briefly: the count of reshaped blocks increments slightly right when the array first assembles. I think this supports Cephas' suspicion that md127_raid5 and md127_reshape are getting locked into some manner of IO contention. In theory I could restart my PC 5000 times and the reshape might complete.

My next step is to try to figure out how to either manually start the reshape without raid5 running, or find a way to block raid5 from running when the array is assembled. 
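For the record, this is roughly what I plan to try: mdadm has an `--assemble ... --freeze-reshape` flag (present in recent 3.x releases, if I understand the man page right) that should start the array without letting the reshape resume. Printed here rather than run, since it obviously needs the real array present:

```shell
# Assemble without resuming the reshape (device names from my earlier attempt):
echo "mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 \
  --backup-file=/tmp/grow_md127.bak --freeze-reshape"

# Then inspect, and unfreeze when ready:
echo "cat /sys/block/md127/md/sync_action         # should say 'frozen'"
echo "cat /sys/block/md127/md/reshape_position    # where the reshape is parked"
echo "echo idle > /sys/block/md127/md/sync_action # unfreeze; reshape should restart"
```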

I am also going to try building 3.4 from source and see what happens. 

I'll try posting on the mailing list also and see if there are any other suggestions there. 

thanks

----------

