# SOLVED Lost raid 5 device, reiserfs.  Other problems too.

## gustafson

Hi, I came to work this morning and all my network services were inexplicably down.  Thinking it might have been network issues, and since no one was logged on, I decided to take the opportunity to reboot the system.  This is when it all began to crumble!!!

In my diagnostic efforts, I found this (output of `grep md4 /var/log/messages`):

```
Sep 25 21:52:16 spaceshipone md: created md4
Sep 25 21:52:16 spaceshipone raid5: not enough operational devices for md4 (3/4 failed)
Sep 25 21:52:16 spaceshipone raid5: failed to run raid set md4
Sep 25 21:52:16 spaceshipone md: md4 stopped.
Sep 25 21:52:16 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 25 21:52:16 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 25 21:52:16 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:14:21 spaceshipone md: created md4
Sep 26 11:14:21 spaceshipone raid5: not enough operational devices for md4 (3/4 failed)
Sep 26 11:14:21 spaceshipone raid5: failed to run raid set md4
Sep 26 11:14:21 spaceshipone md: md4 stopped.
Sep 26 11:14:21 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:14:21 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:14:21 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:17:38 spaceshipone md: created md4
Sep 26 11:17:38 spaceshipone raid5: not enough operational devices for md4 (3/4 failed)
Sep 26 11:17:38 spaceshipone raid5: failed to run raid set md4
Sep 26 11:17:38 spaceshipone md: md4 stopped.
Sep 26 11:17:38 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:17:38 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:17:38 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:19:54 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:19:54 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:19:54 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:26:45 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:26:45 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:26:45 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:37:59 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:37:59 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:37:59 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
Sep 26 11:39:11 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 2, size 4096)
Sep 26 11:39:11 spaceshipone ReiserFS: md4: warning: sh-2006: read_super_block: bread failed (dev md4, block 16, size 4096)
Sep 26 11:39:11 spaceshipone ReiserFS: md4: warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md4
```

When attempting to mount it:

```
mount: wrong fs type, bad option, bad superblock on /dev/md4,
       missing codepage or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
```

Note the whole system is RAIDed, so I don't think it is a RAID problem. Output of `cat /proc/mdstat`:

```
Personalities : [raid0] [raid1] [raid5]
md1 : active raid0 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      224512 blocks 32k chunks

md2 : active raid0 sdd5[3] sdc5[2] sdb5[1] sda5[0]
      10023936 blocks 32k chunks

md3 : active raid5 sdd6[3] sdc6[2] sdb6[1] sda6[0]
      387607872 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]

md5 : active raid5 sdd8[3] sdc8[2] sdb8[1] sda8[0]
      387583872 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
```

Probably worthless given what I learned later (above), but I also ran reiserfsck:

```
reiserfsck 3.6.19 (2003 www.namesys.com)
*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************
Will read-only check consistency of the filesystem on /dev/md4
Will put log info to 'stdout'
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
bread: Cannot read the block (2): (Invalid argument).
reiserfs_open: bread failed reading block 2
bread: Cannot read the block (16): (Invalid argument).
reiserfs_open: bread failed reading block 16
reiserfs_open: the reiserfs superblock cannot be found on /dev/md4.
Failed to open the filesystem.
If the partition table has not been changed, and the partition is
valid  and  it really  contains  a reiserfs  partition,  then the
superblock  is corrupted and you need to run this utility with
--rebuild-sb.
```

I'm afraid to run `--rebuild-sb` since I did this once before and lost my data.

I also tried debugreiserfs:

```
debugreiserfs 3.6.19 (2003 www.namesys.com)
bread: Cannot read the block (2): (Invalid argument).
reiserfs_open: bread failed reading block 2
bread: Cannot read the block (16): (Invalid argument).
reiserfs_open: bread failed reading block 16
reiserfs_open: the reiserfs superblock cannot be found on /dev/md4.
debugreiserfs: can not open reiserfs on "/dev/md4": no filesystem found
```

That's all the diagnostic info I could come up with. Does anybody have a recommendation on how to proceed? There is lots of data on this disk. Is there even the slightest chance that this is recoverable? The hard disks can't have all failed, or else I wouldn't be typing now.

*Last edited by gustafson on Mon Sep 26, 2005 8:28 pm; edited 1 time in total*

----------

## Dlareh

This does NOT appear to be a reiserfs problem. Do NOT run any reiserfs tools like `--rebuild-sb`.

You need to try to force the assembly. Something like:

```
mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
```

or:

```
mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7
```

possibly with the `-f` (force) option or something of that nature. See the mdadm man page for details.

Once you get at least 3 U's  (4-1=3) in /proc/mdstat, then you can run reiserfsck.

Be sure you only mount things ro (`mount -o ro /dev/md4 /mnt/point`) until you have verified the integrity of your data. It is possible that you will get better results depending on which partition you leave out of the array.

Once you are satisfied that everything is working, you can rebuild the fourth partition if it is still down.
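One way to guess which partition to leave out is to compare the md superblocks on each component first. A sketch, with device names taken from this thread (see `mdadm --examine` in the man page):

```shell
# Compare the per-device md superblocks; a lower "Events" count
# usually marks the stale partition(s).
for part in /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7; do
    echo "== $part =="
    mdadm --examine "$part" | grep -E 'Update Time|Events|State'
done

# Then stop any half-assembled array and retry with the three
# freshest members, e.g. leaving sdd7 out:
mdadm --stop /dev/md4
mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7
```

These commands touch real block devices, so run them only on the machine in question.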

----------

## overkll

From what you've posted, it looks like md4 never started successfully at boot. Since md4 never started, the reiserfs filesystem cannot be accessed.

```
Sep 25 21:52:16 spaceshipone raid5: not enough operational devices for md4 (3/4 failed)
```

You can reboot to see if it was a fluke, or try manually starting (NOT creating!) md4 using mkraid or mdadm commands. If you are successful in starting md4, then you can mount the fs.
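For the record, the start-vs-create distinction matters. A rough sketch (device names assumed from the log above):

```shell
# SAFE: --assemble reads the existing md superblocks from the
# partitions and starts the array they describe.
mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7

# DANGEROUS on an array you are trying to recover: --create writes
# brand-new superblocks and can destroy the old layout.
# mdadm --create /dev/md4 --level=5 --raid-devices=4 ...   # don't!
```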

----------

## gustafson

Here is more diagnostic info from /var/log/messages.

```
Sep 25 21:52:16 spaceshipone md: created md4
Sep 25 21:52:16 spaceshipone md: bind<sda7>
Sep 25 21:52:16 spaceshipone md: bind<sdb7>
Sep 25 21:52:16 spaceshipone md: bind<sdc7>
Sep 25 21:52:16 spaceshipone md: bind<sdd7>
Sep 25 21:52:16 spaceshipone md: running: <sdd7><sdc7><sdb7><sda7>
Sep 25 21:52:16 spaceshipone md: kicking non-fresh sdd7 from array!
Sep 25 21:52:16 spaceshipone md: unbind<sdd7>
Sep 25 21:52:16 spaceshipone md: export_rdev(sdd7)
Sep 25 21:52:16 spaceshipone md: kicking non-fresh sdb7 from array!
Sep 25 21:52:16 spaceshipone md: unbind<sdb7>
Sep 25 21:52:16 spaceshipone md: export_rdev(sdb7)
Sep 25 21:52:16 spaceshipone md: kicking non-fresh sda7 from array!
Sep 25 21:52:16 spaceshipone md: unbind<sda7>
Sep 25 21:52:16 spaceshipone md: export_rdev(sda7)
Sep 25 21:52:16 spaceshipone raid5: device sdc7 operational as raid disk 2
Sep 25 21:52:16 spaceshipone raid5: not enough operational devices for md4 (3/4 failed)
Sep 25 21:52:16 spaceshipone RAID5 conf printout:
Sep 25 21:52:16 spaceshipone --- rd:4 wd:1 fd:3
Sep 25 21:52:16 spaceshipone disk 2, o:1, dev:sdc7
Sep 25 21:52:16 spaceshipone raid5: failed to run raid set md4
Sep 25 21:52:16 spaceshipone md: pers->run() failed ...
Sep 25 21:52:16 spaceshipone md: do_md_run() returned -22
Sep 25 21:52:16 spaceshipone md: md4 stopped.
Sep 25 21:52:16 spaceshipone md: unbind<sdc7>
Sep 25 21:52:16 spaceshipone md: export_rdev(sdc7)
```

I've been trying things in mdadm to no avail.

```
# mdadm /dev/md4
/dev/md4: is an md device which is not active
/dev/md4: is too small to be an md component.
```

More suggestions would be helpful, I'll post if I make any progress.

----------

## overkll

*Quote:*

> I've been trying things in mdadm to no avail.
>
> ```
> # mdadm /dev/md4
> ...
> ```

As Dlareh said, try the command:

```
mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
```

----------

## gustafson

With apologies to Dlareh (I missed that he posted... though I don't know how; haste makes waste).

```
# mdadm --assemble --force /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
mdadm: forcing event count in /dev/sda7(0) from 102106 upto 102116
mdadm: forcing event count in /dev/sdb7(1) from 102106 upto 102116
mdadm: /dev/md4 has been started with 3 drives (out of 4).
```

Now I have the following from cat /proc/mdstat:

```
md4 : active raid5 sda7[0] sdc7[2] sdb7[1]
      387607872 blocks level 5, 32k chunk, algorithm 0 [4/3] [UUU_]
```

An attempt at the reiserfsck gives:

```
# reiserfsck /dev/md4
reiserfsck 3.6.19 (2003 www.namesys.com)
*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************
Will read-only check consistency of the filesystem on /dev/md4
Will put log info to 'stdout'
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Sep 26 13:34:50 2005
###########
Replaying journal..
Reiserfs journal '/dev/md4' in blocks [18..8211]: 0 transactions replayed
Checking internal tree../  2 (of   2)/110 (of 110)/107 (of 148)block 31195144: The level of the node (63489) is not correct, (1) expected
 the problem in the internal node occured (31195144), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Mon Sep 26 13:37:38 2005
###########
```

Should I go ahead and do the `reiserfsck --rebuild-tree`? It seems the logical next step, but I'm sure there are options for forcing the assembly with different disk arrangements. Thanks,

----------

## Dlareh

Stop md4, and reassemble with only a, b, and d.

Hopefully you will get better results.

--rebuild-tree is extremely dangerous. If fscks of md4 started with other combinations of three partitions still recommend --rebuild-tree, attempt to mount md4 ro and back everything up (`rsync -a` to somewhere else). Assuming this works, you can try --rebuild-tree, but you may end up just remaking the filesystem.
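A sketch of that back-everything-up step; the mount point and destination here are placeholders, not paths from this system:

```shell
mkdir -p /mnt/md4
mount -o ro /dev/md4 /mnt/md4      # read-only: nothing on md4 changes
rsync -a /mnt/md4/ /backup/md4/    # trailing slashes: copy contents
umount /mnt/md4
```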

----------

## gustafson

No luck there:

```
# mdadm --assemble /dev/md4 /dev/sda7 /dev/sdb7 /dev/sdd7
mdadm: /dev/md4 assembled from 2 drives - not enough to start the array.
```

----------

## overkll

Take care of the RAID issue before attempting any fsck. Try adding the missing partition to the md4 array with:

```
mdadm /dev/md4 -a /dev/sdd7
```

If that doesn't work, try removing sdd7 then adding it:

```
mdadm /dev/md4 -r /dev/sdd7 -a /dev/sdd7
```
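If the re-add takes, md will start a resync onto sdd7; progress can be watched with something like:

```shell
# Rebuild progress shows up as a percentage in /proc/mdstat:
cat /proc/mdstat

# Or with more detail (state, failed/spare devices):
mdadm --detail /dev/md4

# Poll until the array shows [4/4] [UUUU] again:
watch -n 5 cat /proc/mdstat
```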

----------

## gustafson

OK, some progress here. It looks like the raid array has been restored after `mdadm /dev/md4 -a /dev/sdd7`:

```
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5]
md1 : active raid0 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      224512 blocks 32k chunks

md2 : active raid0 sdd5[3] sdc5[2] sdb5[1] sda5[0]
      10023936 blocks 32k chunks

md3 : active raid5 sdd6[3] sdc6[2] sdb6[1] sda6[0]
      387607872 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]

md4 : active raid5 sdd7[3] sda7[0] sdc7[2] sdb7[1]
      387607872 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]

md5 : active raid5 sdd8[3] sdc8[2] sdb8[1] sda8[0]
      387583872 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]

unused devices: <none>
```

reiserfsck does the same thing as before. I attempted to `mount -ro` and the following occurred:

```
# mount -ro  /dev/md4 /mnt/point
mount: wrong fs type, bad option, bad superblock on /dev/md4,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
```

As a result, I can't back up (using rsync) before doing the --rebuild-tree. Is there anything else, or should I just bite the bullet and do it?

----------

## overkll

Is md4 already mounted? If not, try mounting normally with the fstab-defined mount point. If it's already mounted, you should be able to back up your data on md4 to another location/disk/partition.

Have you rebooted to see if the fix worked and md4 is assembled and mounted as normal?

Personally I try to avoid an fsck rebuild-tree if possible, especially without a backup.

----------

## Dlareh

What was the message you got when you added sdd? Did it rebuild sdd using info from the other three? I think you should have stopped the array and started it /without/ sdc like I suggested.

Have fun with --rebuild-tree if you dare, but be afraid, be very afraid.  I've seen it destroy reiserfs partitions that were still mountable.

----------

## gustafson

Dlareh:

I tried all combinations of starting the array with only three disks. They each failed with only two functional disks, or, if they started, they had the same error from reiserfsck.

Regarding the rest:

I was finally able to mount the md4 array by adding the ro option to the fstab file, and doing a mount -a.  I don't know why this worked when other attempts didn't, though at the moment it isn't my concern.  Right now, I'm copying all the files elsewhere, and then I'm going to mkreiserfs on the md4 array so I can start over with a clean filesystem.  

I did spot check a few text files and they looked ok. There are a lot of big binary files (FEA simulations) which I haven't checked yet; maybe I'll get lucky there too.

It is going to take a while to do a thorough check of the files, eventually I'll report back.  For now, thanks to both of you for your help.  I continue to be impressed by the knowledge and helpfulness of the gentoo community.

----------

## overkll

You're welcome!  Glad to help!
