# mdadm raid5 failure

## Kein

I recently set up a raid5 array using mdadm with four 1 TB disks and no spares.  It was working fine until I rebooted my computer this morning.  mdadm still recognizes all the drives correctly and reports no problems when doing

```
mdadm -A /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

or

```
mdadm -A /dev/md0 -u<uid>
```

The results of mdadm -E /dev/sd*1 and of cat /proc/mdstat both look fine.

However, when I try to mount /dev/md0, it fails with a "no superblock" message.

fdisk -l reports

```
Disk /dev/md0: 3000.6 GB, 3000606523392 bytes
2 heads, 4 sectors/track, 732569952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table
```

and the results of fsck.jfs /dev/md0 are

```
fsck.jfs /dev/md0
fsck.jfs version 1.1.13, 17-Jul-2008
processing started: 9/23/2009 10.48.18
Using default parameter: -p
The current device is:  /dev/md0
Block size in bytes:  4096
Filesystem size in blocks:  732569952
**Phase 0 - Replay Journal Log
logredo failed (rc=-268).  fsck continuing.
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
Duplicate block references have been detected in Metadata.  CANNOT CONTINUE.
```

Can someone please help me?  I have a TB of data on there that I really don't want to lose...

----------

## py-ro

Do you have Large Block Device support activated in your kernel?

Py

----------

## Kein

I believe so; I've been able to restart before.  I am using the genkernel/initrd method because I couldn't make my own kernel work for some reason on this box.  Is there any way to check if it's enabled?

----------

## Mike Hunt

```
grep LBDAF /usr/src/linux/.config
```

```
Symbol: LBDAF [=y]
Prompt: Support for large (2TB+) block devices and files
  Defined at block/Kconfig:26
  Depends on: BLOCK && !64BIT
  Location:
    -> Enable the block layer (BLOCK [=y])
```
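Worth noting from that search result: "Depends on: BLOCK && !64BIT" means the option only exists on 32-bit kernels; a 64-bit kernel always supports large block devices, so an empty grep there is normal. A quick check, assuming the usual /usr/src/linux symlink:

```shell
# LBDAF depends on !64BIT, i.e. it only exists on 32-bit kernels; 64-bit
# kernels always support >2TB block devices. So check the word size first.
uname -m   # x86_64 means 64-bit: an empty LBDAF grep is expected
grep 'CONFIG_64BIT=y' /usr/src/linux/.config || echo "32-bit config (or no .config found)"
```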

----------

## Kein

Hmmm, grep LBDAF /usr/src/linux/.config returns nothing; the symbol never appears in my .config.

----------

## richard.scott

The /dev/md0 device has been assembled OK... you now need to mount it?

The "Disk /dev/md0 doesn't contain a valid partition table" message just means that the device you are listing doesn't have a partition table... IMHO it's a common thing to see for /dev/md devices, as I have it on my system too.

What does "cat /proc/mdstat" show?

You should see something like this (albeit for RAID1 rather than RAID5, but you'll get the idea):

```
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      96256 blocks [2/2] [UU]
```

Also, try mount /dev/md0 /mnt/gentoo and then look in /mnt/gentoo.

----------

## Kein

I thought every device had to have a partition table, but I guess I was wrong about that.  Still, no luck mounting things.

```
~ $ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
```

```
# mount /dev/md0 /cd1
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
```

dmesg after I try to mount shows:

```
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sda1>
raid5: device sda1 operational as raid disk 0
raid5: device sdd1 operational as raid disk 3
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: allocated 4216kB for md0
raid5: raid level 5 set md0 active with 4 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
 md0: unknown partition table
```

It's interesting to note that fsck thinks the superblock is valid:

```
# fsck.jfs -nv /dev/md0
fsck.jfs version 1.1.13, 17-Jul-2008
processing started: 9/23/2009 16.11.12
The current device is:  /dev/md0
Open(...READONLY...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes:  4096
Filesystem size in blocks:  732569952
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
Secondary file/directory allocation structure (2) is not a correct redundant copy of primary structure.
Duplicate reference to 1 block(s) beginning at offset 419430400 found in file system object IA16.
Duplicate reference to 4 block(s) beginning at offset 419430404 found in file system object IA16.
Duplicate reference to 4 block(s) beginning at offset 419430412 found in file system object IA16.
Duplicate reference to 4 block(s) beginning at offset 419432800 found in file system object IA16.
Duplicate reference to 4 block(s) beginning at offset 419465324 found in file system object IA16.
```

Many more duplicate reference entries follow after that.
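For what it's worth, the sizes above all agree, so the filesystem geometry is at least consistent with the array (a sanity check, not proof the data is intact):

```shell
# /proc/mdstat reports 2930279808 blocks of 1 KiB; fsck.jfs reports
# 732569952 blocks of 4 KiB; fdisk reports 3000606523392 bytes.
echo $(( 2930279808 / 4 ))    # 732569952 -- matches fsck.jfs
echo $(( 732569952 * 4096 ))  # 3000606523392 -- matches fdisk
```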

----------

## py-ro

You are missing Large Block Device support.

Activate it in your kernel and reformat your md device.

Py

----------

## Kein

I actually think I do have large block device support.  I looked around in menuconfig and I have the option set in there; the .config file has CONFIG_LBD=y, which is what menuconfig said it should be.

Also, I've formatted and used this drive for about a month... I don't think my kernel would suddenly drop LBD support.
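One possible explanation for the earlier empty grep: the symbol was renamed from CONFIG_LBD to CONFIG_LBDAF around kernel 2.6.31, so which spelling appears depends on the kernel version. A grep covering both (path assumed to be the usual /usr/src/linux):

```shell
# CONFIG_LBD (pre-2.6.31) and CONFIG_LBDAF (2.6.31+) are the same option
# under two names; match either spelling:
grep -E '^CONFIG_LBD(AF)?=' /usr/src/linux/.config || echo "neither symbol set"
```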

----------

## energyman76b

How about running fsck to fix stuff?
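If you do, a cautious order would be: image the disks first (dd or ddrescue), dry-run, then repair. A sketch; the --omit_journal_replay flag is from the jfsutils fsck.jfs man page and seems relevant since logredo already failed:

```shell
# Read-only dry run first: -n repairs nothing.
fsck.jfs -n -v /dev/md0

# Only after imaging the members: force a full check, and since journal
# replay (logredo) failed, skip it:
fsck.jfs -f --omit_journal_replay /dev/md0
```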

----------

## Akkara

Is there any chance that the device names (sd{a..d}) might have gotten transposed during one of the mounts, and that corrupted some things?  Like if, on boot, the BIOS detects the drives in a different order, which then assigns different device names from the usual.  I *think* RAID would probably detect this kind of error, but I don't know RAID well enough to say whether it could be a problem.
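For what it's worth, each member's md superblock records the array UUID and its own slot, so mdadm should reassemble correctly even if the BIOS enumerates the disks in a different order. Comparing what the superblocks claim against the [n] numbers in /proc/mdstat would confirm it (device names as earlier in the thread; the grep pattern tries to cover both 0.90 and 1.x metadata output):

```shell
# Print each member's identity as recorded in its own superblock; the slot
# ("this ..." for 0.90 metadata, "Device Role" for 1.x) should match the
# [n] numbers in /proc/mdstat regardless of enumeration order.
for d in /dev/sd[abcd]1; do
  echo "== $d"
  mdadm -E "$d" | grep -iE 'uuid|this|device role' || true
done
```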

----------

