# Raid5 unusable after reboot (help)

## Muddy

Ok, here goes.

I've been messing with this for two days straight now and I'm at the end of my rope.

I have 4x SATA drives on a sil3114 PCI add-on card that I've set up in a raid5 group on /dev/md2 using mdadm.

I've done this four or five times now and it always ends the same way: I reboot to test and make sure the raid group is ok, and the kernel (dmesg) complains the superblock is bad.

Here is the scan after this last setup. (My other two raid groups are raid1 and part of an LVM group; this raid5 is not.)

```
mustang etc # mdadm --detail --scan

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=1ee59ab3:7a3f2a33:65f30751:42108948

ARRAY /dev/md1 level=raid1 num-devices=2 UUID=af96a87c:e43f81a0:1446f685:c5adeb47

ARRAY /dev/md2 level=raid5 num-devices=4 UUID=eb8b16ac:cd5f09be:b0a196d0:33fa32e7

```

and here is the current status (pre-reboot):

```
mustang etc # mdadm --detail /dev/md2

/dev/md2:

        Version : 00.90.03

  Creation Time : Tue Jan 16 14:52:59 2007

     Raid Level : raid5

     Array Size : 234444288 (223.58 GiB 240.07 GB)

    Device Size : 78148096 (74.53 GiB 80.02 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 2

    Persistence : Superblock is persistent

    Update Time : Tue Jan 16 21:08:48 2007

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 64K

           UUID : eb8b16ac:cd5f09be:b0a196d0:33fa32e7

         Events : 0.2

    Number   Major   Minor   RaidDevice State

       0       8        1        0      active sync   /dev/sda1

       1       8       17        1      active sync   /dev/sdb1

       2       8       33        2      active sync   /dev/sdc1

       3       8       49        3      active sync   /dev/sdd1

```

and here is my current mdadm.conf file (in /etc)

```

# paste this inside

DEVICE          /dev/hde*

DEVICE          /dev/hdg*

ARRAY           /dev/md0 devices=/dev/hde1,/dev/hdg1

# paste this inside

DEVICE          /dev/hdb*

DEVICE          /dev/hdc*

ARRAY           /dev/md1 devices=/dev/hdb1,/dev/hdc1

# paste this inside

DEVICE         /dev/sda*

DEVICE         /dev/sdb*

DEVICE         /dev/sdc*

DEVICE         /dev/sdd*

ARRAY          /dev/md2 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

```
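(Side note: identifying arrays by UUID instead of device lists tends to be more robust against devices shuffling names between boots. An equivalent fragment, using the UUIDs from the --detail --scan output above:)

```
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md0 UUID=1ee59ab3:7a3f2a33:65f30751:42108948
ARRAY /dev/md1 UUID=af96a87c:e43f81a0:1446f685:c5adeb47
ARRAY /dev/md2 UUID=eb8b16ac:cd5f09be:b0a196d0:33fa32e7
```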

I've double-checked all four drives: they are set up as type fd partitions, and the raid group has been formatted with reiserfs. I've mounted it under /temp to verify it can read/write, and it can just fine.

I've also checked over my kernel, but my thinking is: if this can go this far, get set up, and let me mount it and read/write data, wouldn't the kernel be ok?

So any ideas as to what/where to look I'm listening.

Regards, 

Muddy

----------

## Muddy

Tried it again, this time with no mdadm.conf file to see what would happen, same thing.

Here is the output from dmesg:

 *Quote:*   

> 
> 
> Linux version 2.6.18-gentoo-r6 (root@mustang) (gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)) #2 SMP Wed Jan 10 01:13:37 EST 2007
> 
> (removed extra output)
> ...

 

The more I read over the log/dmesg files, the more I'm thinking that if I could tell mdadm to NOT autodetect the raid arrays and just read the config file, that would work.
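(For the record, the autodetection in question happens in the kernel: at boot it scans type-fd partitions for 0.90 superblocks and assembles whatever it finds. It can be switched off with the raid=noautodetect kernel parameter so that only mdadm and the config file do the assembling. A grub.conf sketch, where the kernel image name and root= value are placeholders for your own:)

```
title Gentoo Linux (md autodetect off)
root (hd0,0)
kernel /boot/kernel-2.6.18-gentoo-r6 root=/dev/md0 raid=noautodetect
```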

----------

## user123

Try to assemble the array manually instead.

Stop any running arrays using mdadm -S /dev/md0 (for example).

Use mdadm -E /dev/hda1 (also an example) to examine the md-superblock on /dev/hda1.

Do this for every partition you know is part of an array and figure out which devices belong to which array.

Say you've found out /dev/hda1, /dev/hdc1, /dev/hde1 are part of your raid 5 array.

Then just assemble it using mdadm --assemble /dev/md0 /dev/hda1 /dev/hdc1 /dev/hde1.
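The examine step lends itself to a small loop; here's a sketch of just the text handling (the mdadm calls themselves need root and real member devices, so they're shown as comments):

```
# Pull the array UUID out of "mdadm -E" output for one device.
parse_md_uuid() {
    awk '/^ *UUID :/ {print $3; exit}'
}

# On the live system, for each suspected member:
#   mdadm -E /dev/sda1 | parse_md_uuid
# then assemble every partition that reports the same UUID, e.g.:
#   mdadm --assemble /dev/md0 /dev/hda1 /dev/hdc1 /dev/hde1
```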

----------

## Muddy

Did that, all looks good. However, here is something odd: I ran monitor on the new raid5 array, and check this out.

```
 # mdadm --monitor /dev/md2

Jan 17 13:49:19: SparesMissing on /dev/md2 unknown device

```


I don't have spares configured.

```
# mdadm --detail /dev/md2

/dev/md2:

        Version : 00.90.03

  Creation Time : Wed Jan 17 11:40:03 2007

     Raid Level : raid5

     Array Size : 234444288 (223.58 GiB 240.07 GB)

    Device Size : 78148096 (74.53 GiB 80.02 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 2

    Persistence : Superblock is persistent

    Update Time : Wed Jan 17 13:39:36 2007

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 64K

           UUID : f43616a6:b3a2de12:713ef095:8b48b649

         Events : 0.6

    Number   Major   Minor   RaidDevice State

       0       8        1        0      active sync   /dev/sda1

       1       8       17        1      active sync   /dev/sdb1

       2       8       33        2      active sync   /dev/sdc1

       3       8       49        3      active sync   /dev/sdd1
```

any ideas what that message is referring to?

----------

## user123

It might be something in /etc/mdadm.conf that confuses mdadm --monitor.

Just a wild guess though.

----------

## Muddy

No, I'm at a loss at this point.

I've even tried to create the raid group using the newer metadata specs: 1.0, 1.1 and 1.2.

For whatever reason I can't create the raid group at all unless I use the default 0.90.

Then, no matter what I try, upon reboot the raid group fails to come up, with bad-superblock errors in dmesg.

Here is the current config file:

```
# cat /etc/mdadm.conf

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=1ee59ab3:7a3f2a33:65f30751:42108948

ARRAY /dev/md1 level=raid1 num-devices=2 UUID=af96a87c:e43f81a0:1446f685:c5adeb47

ARRAY /dev/md2 level=raid5 num-devices=4 metadata=0.90 UUID=b7baefd6:8ae1bd7f:8212f91b:66a91381

```

----------

## Muddy

Still can't get this working.

Just for kicks I made it raid0, and it worked fine.

Tried raid6 and it failed on reboot as well.

Any ideas on whether it's something in the kernel?

----------

## user123

Try upgrading to 2.6.19.2.

There's a data corruption bug in dm/dm-crypt with canceled BIOs, if I recall correctly, in earlier versions, which was fixed in 2.6.19.

The fix was backported to 2.6.18.6 as well.

----------

## Muddy

Went to 2.6.19-gentoo-r2, created the raid5 group again, formatted it with reiserfs, updated the mdadm.conf file and rebooted.

...only to end up with this, again:

```
md: invalid superblock checksum on sda1

md: sda1 has invalid sb, not importing!

md: invalid superblock checksum on sdb1

md: sdb1 has invalid sb, not importing!

md: invalid superblock checksum on sdc1

md: sdc1 has invalid sb, not importing!

md: invalid superblock checksum on sdd1

md: sdd1 has invalid sb, not importing!

md: md2 stopped.

md: invalid superblock checksum on sdb1

md: sdb1 has invalid sb, not importing!

md: md_import_device returned -22

md: invalid superblock checksum on sdc1

md: sdc1 has invalid sb, not importing!

md: md_import_device returned -22

md: invalid superblock checksum on sdd1

md: sdd1 has invalid sb, not importing!

md: md_import_device returned -22

md: invalid superblock checksum on sda1

md: sda1 has invalid sb, not importing!

md: md_import_device returned -22

```

```

ReiserFS: md2: warning: sh-2006: read_super_block: bread failed (dev md2, block 2, size 4096)

ReiserFS: md2: warning: sh-2006: read_super_block: bread failed (dev md2, block 16, size 4096)

ReiserFS: md2: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md2

```

*sigh* 

Making it again... but this time I will stop it and restart it before rebooting, to see if it comes back together.

```
 mdadm --detail /dev/md2                                                                                                 Tue Jan 23 12:59:50 2007

/dev/md2:

        Version : 00.90.03

  Creation Time : Tue Jan 23 12:53:07 2007

     Raid Level : raid5

     Array Size : 234444288 (223.58 GiB 240.07 GB)

    Device Size : 78148096 (74.53 GiB 80.02 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 2

    Persistence : Superblock is persistent

    Update Time : Tue Jan 23 12:53:07 2007

          State : clean, degraded, recovering

 Active Devices : 3

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 1

         Layout : left-symmetric

     Chunk Size : 64K

 Rebuild Status : 6% complete

           UUID : 2db141f2:e2acc70c:8b2a3fed:48ee2ef6

         Events : 0.1

    Number   Major   Minor   RaidDevice State

       0       8        1        0      active sync   /dev/sda1

       1       8       17        1      active sync   /dev/sdb1

       2       8       33        2      active sync   /dev/sdc1

       4       8       49        3      spare rebuilding   /dev/sdd1

```

Last edited by Muddy on Tue Jan 23, 2007 5:47 pm; edited 1 time in total

----------

## user123

1) Does it work if you skip the whole mdadm.conf part and assemble it manually on boot?

2) Are you sure you don't have a hardware error?

Faulty disks, faulty memory, a faulty or not-fully-plugged-in PCI IDE controller, or something?

You can test the memory by emerging memtest86, adding it to grub.conf and then booting it via grub.
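A typical grub.conf stanza for that looks something like the following (the memtest.bin path depends on where the ebuild puts it, so check /boot after emerging):

```
title memtest86
root (hd0,0)
kernel /boot/memtest86/memtest.bin
```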

----------

## Muddy

Like I said, I have 2x raid1 groups using LVM and they're flawless.

The only thing I can think is that maybe the sil3114 PCI SATA card or one of the drives is goofed... however, I've run the dd zero-disk pass on them so many times now without error that I'd guess it would have shown itself by now.

----------

## user123

Weird.. and you created your LVM volume like this (assuming md2 is your raid5 array)?

```
pvcreate /dev/md2
vgcreate vgraid5 /dev/md2
```

..and not:

```
pvcreate /dev/hda2   # assuming /dev/hda2 is part of the raid5 array "md2"
pvcreate /dev/hdc2   # assuming /dev/hdc2 is part of the raid5 array "md2"
pvcreate /dev/hde2   # assuming /dev/hde2 is part of the raid5 array "md2"
```

Because in the latter case I could understand your corruption problems..

----------

## Muddy

Well, the lvm2 groups are on md0 and md1, putting those two together.

Aside from that, yeah, similar to what you typed.

md2, the raid5 group that is killing me, is/was going to be a standalone volume, separate from the lvm2 stuff.

The more I think about it the more I'm thinking udev is screwing me.

The UUID for the raid5 group changes each time I make it, should that be happening?

----------

## user123

Hmm.. Very odd.  :Smile: 

What if you do something like,

```
# Zap old superblocks to minimize the risk of md getting confused
for part in sda1 sdb1 sdc1 sdd1; do mdadm --zero-superblock /dev/$part ; done

# Zap partition tables and re-read them
for drive in sda sdb sdc sdd; do
  dd if=/dev/zero of=/dev/$drive count=1
  blockdev --rereadpt /dev/$drive
done

# Try to create the array, this time not using partitions, and storing the
# superblock 4K into the drive instead (-e 1.2) -- just to make sure the
# problem doesn't have to do with a read/write problem at the end of the
# disk or something
mdadm --create /dev/md2 -l5 -n4 -e 1.2 /dev/sdb /dev/sda /dev/sdd /dev/sdc

# Stop the array
mdadm -S /dev/md2

# Try assembling it again
mdadm -A /dev/md2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# Create reiserfs on /dev/md2
mkreiserfs -q /dev/md2

# Stop the array again
mdadm -S /dev/md2

# Reboot and see if the filesystem is still there after manually
# reassembling the device
reboot

# ..and then when the system comes up again:
#   mdadm -A /dev/md2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Don't forget to remove the corresponding lines from /etc/mdadm.conf

----------

## user123

Also..

I think you said it never worked when you rebooted.

But it worked if you manually stopped the array and re-assembled it again.

Doesn't that indicate that there's something in either the shutdown or boot scripts that messes with those devices?

How about trying to flush the buffers with the sync command, waiting a few secs and then doing a hard poweroff; boot into single-user mode, then try to manually reassemble the array and see if it gives any errors.

At least that would rule out the shutdown and init scripts.

----------

## Muddy

I've not been able to re-assemble after a boot.

Waiting on it to finish the current sync, then trying some of what you said.

Will update when I know more.

----------

## Muddy

lmao, oh man I've got some screwed up stuff.

Finished building the array on /dev/sd[abcd]1 and then ran mdadm -S /dev/md2.

Then I ran the mdadm -A /dev/md2 and it failed, but for kicks I just arrowed up and hit enter a bunch of times, and then the freak show started.

Every few tries it would actually build, with only one drive (random); once it even built with two... but no more.

I'm at a total loss.

 :Shocked: 

----------

## Muddy

Cleaned all four drives, with no partitions at all this time, and ended up having to use the --force option on creation:

```

mdadm --create --force --verbose -e 1.2 /dev/md2 --level=5 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

but it's building

```
# mdadm --detail /dev/md2

/dev/md2:

        Version : 01.02.03

  Creation Time : Tue Jan 23 19:44:01 2007

     Raid Level : raid5

     Array Size : 234451968 (223.59 GiB 240.08 GB)

    Device Size : 156301312 (74.53 GiB 80.03 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 2

    Persistence : Superblock is persistent

    Update Time : Tue Jan 23 19:44:01 2007

          State : clean, resyncing

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 64K

 Rebuild Status : 0% complete

           Name : 2

           UUID : 64ca03e9:56bd6c9e:1b2cf477:9fe18e19

         Events : 0

    Number   Major   Minor   RaidDevice State

       0       8        0        0      active sync   /dev/sda

       1       8       16        1      active sync   /dev/sdb

       2       8       32        2      active sync   /dev/sdc

       3       8       48        3      active sync   /dev/sdd

```

----------

## Muddy

No go, same crap.

Here is the state after the sync, before I did the mdadm -S on /dev/md2:

```
# mdadm --detail /dev/md2

/dev/md2:

        Version : 01.02.03

  Creation Time : Tue Jan 23 19:44:01 2007

     Raid Level : raid5

     Array Size : 234451968 (223.59 GiB 240.08 GB)

    Device Size : 156301312 (74.53 GiB 80.03 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 2

    Persistence : Superblock is persistent

    Update Time : Tue Jan 23 21:50:38 2007

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

         Layout : left-symmetric

     Chunk Size : 64K

           Name : 2

           UUID : 64ca03e9:56bd6c9e:1b2cf477:9fe18e19

         Events : 2

    Number   Major   Minor   RaidDevice State

       0       8        0        0      active sync   /dev/sda

       1       8       16        1      active sync   /dev/sdb

       2       8       32        2      active sync   /dev/sdc

       3       8       48        3      active sync   /dev/sdd

mustang ~ # cat /proc/mdstat 

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 

md2 : active raid5 sdd[3] sdc[2] sdb[1] sda[0]

      234451968 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

      

md1 : active raid1 hdc1[1] hdb1[0]

      30018112 blocks [2/2] [UU]

      

md0 : active raid1 hdg1[1] hde1[0]

      117218176 blocks [2/2] [UU]

      

unused devices: <none>
```

and the output from /var/log/messages

```

Jan 23 19:44:01 mustang RAID5 conf printout:

Jan 23 19:44:01 mustang --- rd:4 wd:4

Jan 23 19:44:01 mustang disk 0, o:1, dev:sda

Jan 23 19:44:01 mustang disk 1, o:1, dev:sdb

Jan 23 19:44:01 mustang disk 2, o:1, dev:sdc

Jan 23 19:44:01 mustang disk 3, o:1, dev:sdd

Jan 23 19:44:01 mustang md: resync of RAID array md2

Jan 23 19:44:01 mustang md: minimum _guaranteed_  speed: 1000 KB/sec/disk.

Jan 23 19:44:01 mustang md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.

Jan 23 19:44:01 mustang md: using 128k window, over a total of 78150656 blocks.

Jan 23 19:44:01 mustang mdadm: NewArray event detected on md device /dev/md2

Jan 23 20:10:02 mustang mdadm: Rebuild20 event detected on md device /dev/md2

Jan 23 20:35:03 mustang mdadm: Rebuild40 event detected on md device /dev/md2

Jan 23 21:00:04 mustang mdadm: Rebuild60 event detected on md device /dev/md2

Jan 23 21:26:04 mustang mdadm: Rebuild80 event detected on md device /dev/md2

Jan 23 21:50:38 mustang md: md2: resync done.

Jan 23 21:50:38 mustang RAID5 conf printout:

Jan 23 21:50:38 mustang --- rd:4 wd:4

Jan 23 21:50:38 mustang disk 0, o:1, dev:sda

Jan 23 21:50:38 mustang disk 1, o:1, dev:sdb

Jan 23 21:50:38 mustang disk 2, o:1, dev:sdc

Jan 23 21:50:38 mustang disk 3, o:1, dev:sdd

Jan 23 21:50:38 mustang mdadm: RebuildFinished event detected on md device /dev/md2

mustang ~ # 

```

Then I did the mdadm -S command

```
 # mdadm -S /dev/md2

mdadm: stopped /dev/md2

```

and tried to restart

```
 mdadm -A /dev/md2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

mdadm: failed to add /dev/sdb to /dev/md2: Invalid argument

mdadm: failed to add /dev/sdc to /dev/md2: Invalid argument

mdadm: failed to add /dev/sdd to /dev/md2: Invalid argument

mdadm: failed to add /dev/sda to /dev/md2: Invalid argument

mdadm: /dev/md2 assembled from 0 drives - not enough to start the array.

```

from the log when I tried the restart

```
Jan 23 22:00:37 mustang md: md2 stopped.

Jan 23 22:00:37 mustang md: unbind<sdd>

Jan 23 22:00:37 mustang md: export_rdev(sdd)

Jan 23 22:00:37 mustang md: unbind<sdc>

Jan 23 22:00:37 mustang md: export_rdev(sdc)

Jan 23 22:00:37 mustang md: unbind<sdb>

Jan 23 22:00:37 mustang md: export_rdev(sdb)

Jan 23 22:00:37 mustang md: unbind<sda>

Jan 23 22:00:37 mustang md: export_rdev(sda)

Jan 23 22:00:37 mustang mdadm: DeviceDisappeared event detected on md device /dev/md2

Jan 23 22:00:56 mustang md: md2 stopped.

Jan 23 22:00:57 mustang md: invalid superblock checksum on sdb

Jan 23 22:00:57 mustang md: sdb has invalid sb, not importing!

Jan 23 22:00:57 mustang md: md_import_device returned -22

Jan 23 22:00:57 mustang md: invalid superblock checksum on sdc

Jan 23 22:00:57 mustang md: sdc has invalid sb, not importing!

Jan 23 22:00:57 mustang md: md_import_device returned -22

Jan 23 22:00:57 mustang md: invalid superblock checksum on sdd

Jan 23 22:00:57 mustang md: sdd has invalid sb, not importing!

Jan 23 22:00:57 mustang md: md_import_device returned -22

Jan 23 22:00:57 mustang md: invalid superblock checksum on sda

Jan 23 22:00:57 mustang md: sda has invalid sb, not importing!

Jan 23 22:00:57 mustang md: md_import_device returned -22

Jan 23 22:01:28 mustang md: md2 stopped.

```

any ideas?

----------

## drescherjm

Do you have a recent version of mdadm installed?

----------

## Muddy

 *drescherjm wrote:*   

> Do you have a recent version of mdadm installed?

 

mdadm-2.5.2

----------

## drescherjm

I thought that might be the problem, as a lot of updates/changes have occurred to Linux software raid over the last couple of kernels.

At home I am using that same version of mdadm with a 2.6.18 kernel, but that is raid1. At work we have many systems using a few different kernels and software raid 1, 5 and 6, but most of the systems still have sys-fs/mdadm-1.12.0 (I just checked), which is no longer in portage and which I assume has no support for raid5 reshaping.

----------

## Muddy

Yea, I've been pulling out what's left of my hair for a while now.

My best friend suggested genkernel, so giving that a shot to see if it helps.

----------

## Muddy

genkernel did not do anything different, will do some other (outside Gentoo) trials and report back.

----------

## Laitr Keiows

 *Muddy wrote:*   

> and tried to restart
> 
> ```
>  mdadm -A /dev/md2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
> 
> ...

 

What if you try to create them again, with --create?

----------

## Muddy

Can't; it crabs and complains about the devices being part of a raid array, even if I dd-zero the superblock. I have to do a full dd zero over the entire drive (all four) and then try again.
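(For what it's worth, a full zero of each drive shouldn't be necessary: the superblock locations themselves can be overwritten with two targeted dd writes. A sketch, demonstrated on a scratch file standing in for a member disk; on a real member you'd target /dev/sdX and take the size from blockdev --getsize64:)

```
# Scratch file standing in for one member disk (4 MB here).
img=$(mktemp)
dd if=/dev/urandom of="$img" bs=1M count=4 2>/dev/null

size=$(stat -c %s "$img")

# v1.1/1.2 superblocks sit at or near the start of the device; 0.90 sits
# in the last 64K-aligned block. Zeroing both regions covers either format.
dd if=/dev/zero of="$img" bs=4k count=2 conv=notrunc 2>/dev/null
dd if=/dev/zero of="$img" bs=64k seek=$(( size / 65536 - 1 )) count=1 conv=notrunc 2>/dev/null
```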

*edit*

Decided to run a full emerge world to make sure everything is up to date.

On my last attempt, I found I could re-assemble the array if I ran chmod 777 on all the drives and the /dev/md2 device.

Then, upon reboot, it did not work at all.

Now I'm thinking it's something to do with udev screwing with things, so I'm running a full update on the box to ensure the next attempts will be unaffected by software or config problems.

----------

