# RAID 5 two-disk failure

## bousket

There is an English version two posts after this one.

Hello,

I'm a new mdadm user. About a year ago I created two RAID 5 arrays (among others). Three days ago, my PC would no longer boot because it could not mount my RAID partitions.

These two arrays are no longer listed in the output of:

```
cat /proc/mdstat
```

The first array, /dev/md3, is built from the physical partitions /dev/sd[a-d]3.

The second array, /dev/md4, is built from the physical partitions /dev/sd[a-d]5.

Running

```
mdadm --examine /dev/sd[a-d]3
```

showed that /dev/sdb3 had been marked as a faulty disk shortly after the array was set up, but at the time I did not know that mdadm reports failures by email, so I ran for quite a while with no redundancy. When an error then occurred on a second disk (partition /dev/sda3), the array could no longer run... the difference in event counts between partition a and partitions c/d was 8.

After some googling, I found a site suggesting that the data could be recovered by re-creating the array with the three remaining disks, then rebuilding the whole thing by re-adding the first one:

```
mdadm --create --verbose /dev/md3 --level=5 --raid-devices=4 /dev/sda3 missing /dev/sdc3 /dev/sdd3
```

Since the event counts differed slightly, I expected to lose a little data, but I hoped to recover most of it.

I then ran

```
mdadm --misc --readonly /dev/md3
```

to switch this array to read-only.

The problem is that when I try to mount this partition, I get the error message:

```
mount: wrong fs type, bad option, bad superblock on /dev/md3
...
try dmesg | tail or so
```

and that command gives me:

```

VFS: Can't find ext3 filesystem on dev md3

```

I have not posted the output of the

```
mdadm --examine /dev/...
```

commands mentioned at the beginning, because it seems that the

```
mdadm --create ...
```

command erased the mdadm metadata on each of the disks. That metadata is now blank.

The partition I tested these commands on is my /home partition. If I lose that data, it is not a big deal. The other partition, however, contains all my data, which I absolutely do not want to lose!

I tested my four disks with the Hitachi utility and they appear to be healthy.

I would like to recover my /home so that I can then apply the right procedure to my data partition while being sure not to lose the data (assuming the second disk that failed is actually healthy, as I hope).

Here is the information for the array I have not touched (before the mdadm --create command erased them, the event counts of each disk in the /home array were the same as here):

```

/dev/sda5:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : b08d3587:abf1dee1:75757fd8:d260933c

  Creation Time : Thu Aug 23 16:59:48 2007

     Raid Level : raid5

  Used Dev Size : 134793280 (128.55 GiB 138.03 GB)

     Array Size : 404379840 (385.65 GiB 414.08 GB)

   Raid Devices : 4

  Total Devices : 3

Preferred Minor : 4

    Update Time : Tue Sep 16 19:12:55 2008

          State : clean

 Active Devices : 3

Working Devices : 3

 Failed Devices : 1

  Spare Devices : 0

       Checksum : e52974a7 - correct

         Events : 54628

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     2       8        5        2      active sync   /dev/sda5

   0     0       8       37        0      active sync   /dev/sdc5

   1     1       8       53        1      active sync   /dev/sdd5

   2     2       8        5        2      active sync   /dev/sda5

   3     3       0        0        3      faulty removed

/dev/sdb5:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : b08d3587:abf1dee1:75757fd8:d260933c

  Creation Time : Thu Aug 23 16:59:48 2007

     Raid Level : raid5

  Used Dev Size : 134793280 (128.55 GiB 138.03 GB)

     Array Size : 404379840 (385.65 GiB 414.08 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 4

    Update Time : Thu Sep  4 20:20:38 2008

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

       Checksum : e5180bb0 - correct

         Events : 482

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     3       8       21        3      active sync   /dev/sdb5

   0     0       8       37        0      active sync   /dev/sdc5

   1     1       8       53        1      active sync   /dev/sdd5

   2     2       8        5        2      active sync   /dev/sda5

   3     3       8       21        3      active sync   /dev/sdb5

/dev/sdc5:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : b08d3587:abf1dee1:75757fd8:d260933c

  Creation Time : Thu Aug 23 16:59:48 2007

     Raid Level : raid5

  Used Dev Size : 134793280 (128.55 GiB 138.03 GB)

     Array Size : 404379840 (385.65 GiB 414.08 GB)

   Raid Devices : 4

  Total Devices : 3

Preferred Minor : 4

    Update Time : Tue Sep 16 20:29:59 2008

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 2

  Spare Devices : 0

       Checksum : e52986ee - correct

         Events : 54636

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     0       8       37        0      active sync   /dev/sdc5

   0     0       8       37        0      active sync   /dev/sdc5

   1     1       8       53        1      active sync   /dev/sdd5

   2     2       0        0        2      faulty removed

   3     3       0        0        3      faulty removed

/dev/sdd5:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : b08d3587:abf1dee1:75757fd8:d260933c

  Creation Time : Thu Aug 23 16:59:48 2007

     Raid Level : raid5

  Used Dev Size : 134793280 (128.55 GiB 138.03 GB)

     Array Size : 404379840 (385.65 GiB 414.08 GB)

   Raid Devices : 4

  Total Devices : 3

Preferred Minor : 4

    Update Time : Tue Sep 16 20:29:59 2008

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 2

  Spare Devices : 0

       Checksum : e5298700 - correct

         Events : 54636

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     1       8       53        1      active sync   /dev/sdd5

   0     0       8       37        0      active sync   /dev/sdc5

   1     1       8       53        1      active sync   /dev/sdd5

   2     2       0        0        2      faulty removed

   3     3       0        0        3      faulty removed

```
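The Events counters in the dumps above are the key diagnostic: a member whose counter lags the others was ejected from the array earlier, and the size of the lag hints at how stale it is. As a sketch (not part of the original post), the counters can be compared mechanically from saved `mdadm --examine` output; the heredoc below just replays the values quoted above:

```shell
# Sketch: pull the per-member Events counter out of "mdadm --examine"
# output and report how far each member lags behind the most recent one.
# The heredoc replays the values quoted in this post; in practice, pipe
# in the real output of `mdadm --examine /dev/sd[a-d]5`.
report=$(awk '
  /^\/dev\// { dev = $1; sub(":$", "", dev) }
  /Events :/ { ev[dev] = $NF + 0; if ($NF + 0 > max) max = $NF + 0 }
  END { for (d in ev) printf "%s events=%d lag=%d\n", d, ev[d], max - ev[d] }
' <<'EOF'
/dev/sda5:
         Events : 54628
/dev/sdb5:
         Events : 482
/dev/sdc5:
         Events : 54636
/dev/sdd5:
         Events : 54636
EOF
)
echo "$report"
```

Run against the real output, this immediately shows that /dev/sdb5 (Events: 482) dropped out long before the other members, while sda5 lags the c/d pair by only 8 events.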

Last edited by bousket on Sat Sep 20, 2008 2:41 pm; edited 1 time in total

----------

## bousket

PS: thanks to everyone who had the courage to read this post  :Very Happy: 

Anyone have an idea?

----------

## bousket

Translation of my first post:

Hi,

I'm a new mdadm user. I have two RAID 5 arrays: /dev/md3 and /dev/md4. My Gentoo doesn't boot because it can't mount these two partitions.

When I type "cat /proc/mdstat", I see my other arrays but not the two RAID 5 ones.

See the bottom of my first post for the "mdadm --examine /dev/sd[a-d]5" output.

I found a possible solution (http://blog.akilles.org/2008/05/13/exciting-days-with-md-raid5-disk-crash/). I tested it on /dev/md3:

```
mdadm --create --verbose /dev/md3 --level=5 --raid-devices=4 /dev/sda3 missing /dev/sdc3 /dev/sdd3

mdadm --misc --readonly /dev/md3
```

to get read-only access. Now, when I try to mount the partition, I get:

```
mount: wrong fs type, bad option, bad superblock on /dev/md3
...
try dmesg | tail or so
```

and in dmesg:

```

VFS: Can't find ext3 filesystem on dev md3

```

The mdadm metadata seems to have been erased (?!) by this command.

I've tested my physical drives with the Hitachi utilities and they are fine.

I'd like to know how I can get this mdadm metadata back (if that's possible) so I can recover my first array (/dev/md3).

And what can I do to recover my second array? (I think the disks are fine. I want to save the data using the three disks a-c-d and rebuild the array with b afterwards.)

Thanks for any ideas!

----------

## eccerr0r

What exactly happened?

It sounds like you are setting up two additional new RAIDs and autodetect is not working, not that two hard drives died (which is what the subject suggests).

You can use mdadm --assemble to make the kernel aware of RAIDs it didn't auto-assemble.

However, you ran --create, which may have blown away the parity/data ordering it had saved before...

So what happened? Was there anything you did that made the volumes disappear (or was it just a reboot?)

----------

## bousket

Just a reboot.

In fact, I have two RAID 5 arrays; I set them up last year. The first one holds my /home directory and the second a data directory.

The week before it happened, I did nothing (no installation, update, config change, etc.), just used my PC to watch films, listen to music and so on. Then one day, when I booted, my system refused to mount my RAID partitions.

Here is my /var/log/messages from when it happened:

```

Sep 16 20:22:57 localhost ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Sep 16 20:22:57 localhost ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0

Sep 16 20:22:57 localhost res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

Sep 16 20:22:57 localhost ata1.00: status: { DRDY }

Sep 16 20:23:02 localhost ata1: port is slow to respond, please be patient (Status 0xd0)

Sep 16 20:23:07 localhost ata1: device not ready (errno=-16), forcing hardreset

Sep 16 20:23:07 localhost ata1: hard resetting link

Sep 16 20:23:07 localhost ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Sep 16 20:23:37 localhost ata1.00: qc timeout (cmd 0xec)

Sep 16 20:23:37 localhost ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)

Sep 16 20:23:37 localhost ata1.00: revalidation failed (errno=-5)

Sep 16 20:23:37 localhost ata1: failed to recover some devices, retrying in 5 secs

Sep 16 20:23:42 localhost ata1: hard resetting link

Sep 16 20:23:43 localhost ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Sep 16 20:24:13 localhost ata1.00: qc timeout (cmd 0xec)

Sep 16 20:24:13 localhost ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)

Sep 16 20:24:13 localhost ata1.00: revalidation failed (errno=-5)

Sep 16 20:24:13 localhost ata1: failed to recover some devices, retrying in 5 secs

Sep 16 20:24:18 localhost ata1: hard resetting link

Sep 16 20:24:18 localhost ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Sep 16 20:24:48 localhost ata1.00: qc timeout (cmd 0xec)

Sep 16 20:24:48 localhost ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)

Sep 16 20:24:48 localhost ata1.00: revalidation failed (errno=-5)

Sep 16 20:24:48 localhost ata1.00: disabled

Sep 16 20:24:54 localhost ata1: port is slow to respond, please be patient (Status 0xff)

Sep 16 20:24:55 localhost ata1: soft resetting link

Sep 16 20:24:55 localhost ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Sep 16 20:24:55 localhost ata1: EH complete

Sep 16 20:24:55 localhost sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00

Sep 16 20:24:55 localhost end_request: I/O error, dev sda, sector 42989735

Sep 16 20:24:55 localhost end_request: I/O error, dev sda, sector 42989735

Sep 16 20:24:55 localhost md: super_written gets error=-5, uptodate=0

Sep 16 20:24:55 localhost raid5: Disk failure on sda3, disabling device. Operation continuing on 2 devices

Sep 16 20:24:55 localhost RAID5 conf printout:

Sep 16 20:24:55 localhost --- rd:4 wd:2

Sep 16 20:24:55 localhost disk 0, o:1, dev:sdc3

Sep 16 20:24:55 localhost disk 1, o:1, dev:sdd3

Sep 16 20:24:55 localhost disk 2, o:0, dev:sda3

Sep 16 20:24:55 localhost RAID5 conf printout:

Sep 16 20:24:55 localhost --- rd:4 wd:2

Sep 16 20:24:55 localhost disk 0, o:1, dev:sdc3

Sep 16 20:24:55 localhost disk 1, o:1, dev:sdd3

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 734510

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 1146883

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:24:55 localhost mdadm[3064]: Fail event detected on md device /dev/md3, component device /dev/sda3

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 734501

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:24:55 localhost Aborting journal on device md3.

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 1478

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:24:55 localhost ------------[ cut here ]------------

Sep 16 20:24:55 localhost WARNING: at fs/buffer.c:1183 mark_buffer_dirty+0x20/0x70()

Sep 16 20:24:55 localhost Modules linked in: nvidia(P)

Sep 16 20:24:55 localhost Pid: 2625, comm: kjournald Tainted: P         2.6.25-gentoo-r6 #1

Sep 16 20:24:55 localhost [<c011ba5b>] warn_on_slowpath+0x40/0x4f

Sep 16 20:24:55 localhost [<c045f600>] schedule+0x4e3/0x5a1

Sep 16 20:24:55 localhost [<c022be8c>] __generic_unplug_device+0x11/0x1c

Sep 16 20:24:55 localhost [<c022c47f>] generic_unplug_device+0x15/0x20

Sep 16 20:24:55 localhost [<c045faec>] __wait_on_bit+0x50/0x58

Sep 16 20:24:55 localhost [<c0175073>] sync_buffer+0x0/0x33

Sep 16 20:24:55 localhost [<c0175073>] sync_buffer+0x0/0x33

Sep 16 20:24:55 localhost [<c045fb53>] out_of_line_wait_on_bit+0x5f/0x67

Sep 16 20:24:55 localhost [<c012ad49>] wake_bit_function+0x0/0x3c

Sep 16 20:24:55 localhost [<c0175023>] __wait_on_buffer+0x16/0x18

Sep 16 20:24:55 localhost [<c01770f6>] sync_dirty_buffer+0x6b/0x9b

Sep 16 20:24:55 localhost [<c023bb9d>] __percpu_counter_add+0x4f/0x6e

Sep 16 20:24:55 localhost [<c0174c32>] mark_buffer_dirty+0x20/0x70

Sep 16 20:24:55 localhost [<c01bc519>] __journal_unfile_buffer+0x8/0x11

Sep 16 20:24:55 localhost [<c01bc5e3>] journal_refile_buffer+0x35/0x51

Sep 16 20:24:55 localhost [<c01bde2a>] journal_commit_transaction+0x4e4/0xb92

Sep 16 20:24:55 localhost [<c012ad1c>] autoremove_wake_function+0x0/0x2d

Sep 16 20:24:55 localhost [<c012316f>] try_to_del_timer_sync+0x44/0x4a

Sep 16 20:24:55 localhost [<c01c0776>] kjournald+0xa4/0x1c2

Sep 16 20:24:55 localhost [<c012ad1c>] autoremove_wake_function+0x0/0x2d

Sep 16 20:24:55 localhost [<c01c06d2>] kjournald+0x0/0x1c2

Sep 16 20:24:55 localhost [<c012ac5a>] kthread+0x38/0x5e

Sep 16 20:24:55 localhost [<c012ac22>] kthread+0x0/0x5e

Sep 16 20:24:55 localhost [<c0104693>] kernel_thread_helper+0x7/0x10

Sep 16 20:24:55 localhost =======================

Sep 16 20:24:55 localhost ---[ end trace a4ec2ab0158fc65e ]---

Sep 16 20:24:55 localhost EXT3-fs error (device md3) in ext3_reserve_inode_write: Journal has aborted

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 0

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:24:55 localhost EXT3-fs error (device md3) in ext3_dirty_inode: Journal has aborted

Sep 16 20:24:55 localhost journal commit I/O error

Sep 16 20:24:55 localhost ext3_abort called.

Sep 16 20:24:55 localhost EXT3-fs error (device md3): ext3_journal_start_sb: Detected aborted journal

Sep 16 20:24:55 localhost Remounting filesystem read-only

Sep 16 20:24:55 localhost Buffer I/O error on device md3, logical block 0

Sep 16 20:24:55 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 1

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 720896

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 720897

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 720898

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 720904

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 721022

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 727042

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 1146883

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:25:26 localhost Buffer I/O error on device md3, logical block 1278049

Sep 16 20:25:26 localhost lost page write due to I/O error on md3

Sep 16 20:29:03 localhost sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00

Sep 16 20:29:03 localhost end_request: I/O error, dev sda, sector 312576563

Sep 16 20:29:03 localhost end_request: I/O error, dev sda, sector 312576563

Sep 16 20:29:03 localhost md: super_written gets error=-5, uptodate=0

Sep 16 20:29:03 localhost raid5: Disk failure on sda5, disabling device. Operation continuing on 2 devices

Sep 16 20:29:03 localhost RAID5 conf printout:

Sep 16 20:29:03 localhost --- rd:4 wd:2

Sep 16 20:29:03 localhost disk 0, o:1, dev:sdc5

Sep 16 20:29:03 localhost disk 1, o:1, dev:sdd5

Sep 16 20:29:03 localhost disk 2, o:0, dev:sda5

Sep 16 20:29:03 localhost RAID5 conf printout:

Sep 16 20:29:03 localhost --- rd:4 wd:2

Sep 16 20:29:03 localhost disk 0, o:1, dev:sdc5

Sep 16 20:29:03 localhost disk 1, o:1, dev:sdd5

Sep 16 20:29:03 localhost Buffer I/O error on device md4, logical block 1604

Sep 16 20:29:03 localhost lost page write due to I/O error on md4

Sep 16 20:29:03 localhost Aborting journal on device md4.

Sep 16 20:29:03 localhost Buffer I/O error on device md4, logical block 1545

Sep 16 20:29:03 localhost lost page write due to I/O error on md4

Sep 16 20:29:03 localhost mdadm[3064]: Fail event detected on md device /dev/md4, component device /dev/sda5

Sep 16 20:29:05 localhost ext3_abort called.

Sep 16 20:29:05 localhost EXT3-fs error (device md4): ext3_journal_start_sb: Detected aborted journal

Sep 16 20:29:05 localhost Remounting filesystem read-only

Sep 16 20:29:11 localhost (bousket-3660): starting (version 2.22.0), pid 3660 user 'bousket'

Sep 16 20:29:11 localhost EXT3-fs error (device md3): ext3_find_entry: reading directory #1219229 offset 0

Sep 16 20:29:11 localhost (bousket-3660): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0

Sep 16 20:29:11 localhost (bousket-3660): Resolved address "xml:readwrite:/home/bousket/.gconf" to a read-only configuration source at position 1

Sep 16 20:29:11 localhost (bousket-3660): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2

Sep 16 20:29:11 localhost (bousket-3660): None of the resolved addresses are writable; saving configuration settings will not be possible

Sep 16 20:29:11 localhost (bousket-3660): No writable configuration sources successfully resolved. May be unable to save some configuration changes

Sep 16 20:29:11 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:11 localhost EXT3-fs error (device md3): ext3_find_entry: reading directory #1219229 offset 0

Sep 16 20:29:16 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:20 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:27 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:36 localhost Buffer I/O error on device md4, logical block 94633986

Sep 16 20:29:36 localhost lost page write due to I/O error on md4

Sep 16 20:29:41 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:41 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:41 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:41 localhost (bousket-3660): Failed to open saved state file: Failed: Failed to open gconfd logfile; won't be able to restore listeners after gconfd shutdown (Read-only file system)

Sep 16 20:29:41 localhost (bousket-3660): GConf server is not in use, shutting down.

Sep 16 20:29:41 localhost (bousket-3660): Could not open saved state file '/home/bousket/.gconfd/saved_state.tmp' for writing: Read-only file system

Sep 16 20:29:41 localhost (bousket-3660): Exiting

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:41 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=358721, block=720932

Sep 16 20:29:53 localhost EXT3-fs error (device md3): ext3_get_inode_loc: unable to read inode block - inode=361624, block=721022

```

After reading that log, my understanding is that there was a problem with /dev/sda3 (I don't know why) and mdadm decided to disable that drive. But it turns out the array had been running on only three disks for a long time (I don't know why the first one dropped out). So the array failed with only two disks left.

I didn't know mdadm sends emails to warn the user of problems, so I never noticed the first failure, which happened a long time ago.

I checked /proc/mdstat but it didn't list my RAID 5 arrays, only my swap (which is RAID 0). I decided to check my config files (just in case), and mdadm.conf was empty (?!). I restored it from a backup I had on another partition, but mdadm still didn't detect my arrays. I ran the Hitachi utility and my disks are fine.

I just want to recover the data using the last two healthy disks (/dev/sdc3 and /dev/sdd3) plus /dev/sda3, which was the last to fail, because I think they are all good. After that, I'll try to rebuild the array with /dev/sdb3, the one mdadm disabled a long time ago.

----------

## bousket

OK. I've run

```

mdadm --assemble /dev/md4

mdadm --manage /dev/md3 --stop

mdadm --assemble /dev/md3

```

Now, cat /proc/mdstat gives me:

```

Personalities : [raid0] [raid6] [raid5] [raid4] 

md2 : active raid0 sdd2[1] sdc2[0] sdb2[3] sda2[2]

      2088192 blocks 64k chunks

      

md3 : inactive sdb3[3](S)

      5245120 blocks

       

md4 : inactive sdc5[0](S) sdb5[3](S) sda5[2](S) sdd5[1](S)

      539173120 blocks

       

unused devices: <none>

```

There is only one disk in /dev/md3 (why?), and /dev/md4 shows up but two of its disks are down. What can I do about these two arrays?

Thanks for your help   :Razz: 

----------

## eccerr0r

I think for --assemble you need to specify all the disks.

However, things don't look very good... To maintain a RAID you need to be very careful and watch for disks being offlined. sdb3, though originally part of the array, may be hopelessly out of date and desynced from the other two remaining disks, and it seems sda3 recently failed (?); the I/O errors are not a good sign.

You should run "mdadm --assemble /dev/md/X device1 device2 device3 device4" to assemble the array.

Are you sure sda is still good? Out of curiosity, are these PATA disks with converters or true SATA disks?
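A concrete sketch of that explicit form, using this thread's device names. Note that `--force` is an extra flag beyond the advice above: it tells mdadm to accept members with stale event counts, which can bring back a degraded array but risks serving outdated data, so the script only prints the command for review rather than running it:

```shell
# Sketch only: build and print the explicit assemble command for review.
# --force (my addition, not part of the advice above) lets mdadm accept
# members whose event counts disagree; use it only if you accept the risk.
md=/dev/md4
members="/dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5"
cmd="mdadm --assemble --force $md $members"
echo "$cmd"   # run this as root once you are satisfied it is right
```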

----------

## bousket

Yup, my disks are true SATA disks.

In fact, all four disks are set up in /etc/mdadm.conf, but when I ran "mdadm --create ...", the event counts of the three disks (a-c-d) were reset. That's why I get only sdb3 when I auto-assemble (it has a higher event count than the others). When I assemble manually, it's OK.

Now I have two inactive arrays:

The first has three disks and no ordering information. If I know the initial order of the disks in the array, is there a way to assemble it, and software (like testdisk?) to get my data back?

The second has two disks out of four. Same question: how do I force --assemble to include the third disk (sda3) along with the first two (sdc3, sdd3), so that I get an active array I can read?

Thanks

----------

## eccerr0r

Ahh... okay, I see what you're saying now. It looks like the command you used to (force-)create the array was indeed the right thing to do, as long as it was the exact same command: same stripe size, same stripe order, etc. as when you created it the first time (and hoping the defaults are identical, if you didn't take note of them).

If you actually less /dev/md3, does the volume look like it still has your data on it? If you had text files on it, look for something recognizable (uncompressed kernel source code, etc.) and see whether it's intact. If not, you likely have the order or size wrong. And since you ran --create, the old data has likely been overwritten by now.

----------

## bousket

Thanks for your help  :Smile: 

I'm at work, so I can't test right now, but when I tried to mount the partition this weekend, I got something like "bad fs, no superblock on the partition". I'll check this evening whether I can less /dev/md3, but I think I'm stuck unless I get the array metadata back.

Is there a way to get the superblock back? (I've seen on some forums that ext3 keeps backup copies of the superblock in several places on the disk...)

----------

## eccerr0r

The problem is that the superblock is likely still there (but that's just an assumption). What needs to be done is to make sure the array is set up the same as before, so that the superblock lookup points to the right place.

Failing that assumption (meaning the first superblock was destroyed), you could use other copies of the superblock, but that seems more like an unusual case where multiple things failed (granted, you did have two out of four disks tossed out of the array). If you man e2fsck and look for the -b option, there's some information about that... though you should first make sure your assembly is correct.
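As a supplement to `man e2fsck`: on a filesystem created with default options (sparse_super enabled), backup superblocks live in block groups 1 and powers of 3, 5 and 7, and with a 4k block size each group spans 32768 blocks. Under those assumptions, the candidate locations for `e2fsck -b` can be computed rather than guessed:

```shell
# Sketch: candidate backup-superblock locations for "e2fsck -b", assuming
# the common mke2fs defaults: 4k block size (32768 blocks per group) and
# sparse_super (backups only in groups 1 and powers of 3, 5, 7).
blocks_per_group=32768
candidates=""
for group in 1 3 5 7 9 25 27 49; do
  candidates="$candidates $((group * blocks_per_group))"
done
echo "candidate blocks for e2fsck -b:$candidates"
```

The first few come out as 32768, 98304, 163840, 229376, which matches the list mke2fs prints at creation time; `mke2fs -n /dev/md3` (the -n makes it a dry run that creates nothing) would report the real locations for that filesystem.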

----------

## bousket

As for the order, I'm sure it's correct, because I used the same disk order for both RAID 5 arrays, and the ordering information on /dev/md4 is still intact.

I tried "less -f /dev/md3" (-f because less asked for it) and there was no text I could read, only the same character everywhere (strange?). How long is the header of a disk (partition information, etc.)?

I can't mount /dev/md3 because of the lost superblock (I suppose). I'll look into the -b option in the e2fsck man page.

----------

## ianw1974

From what I know, using --create was not the right thing to do. That creates a new array; it does not assemble the faulty one. If you know you have a faulty disk and need to assemble and scan an array, an entry in /etc/mdadm.conf would look something like:

```
ARRAY /dev/md3 devices=missing,/dev/sdb3,/dev/sdc3,/dev/sdd3
```

for example. Then, with such an entry in place (of course, I'm omitting some other lines like DEVICE), you'd just run:

```
mdadm --assemble --scan
```

I listed the line above so you can see how you'd activate the array without /dev/sda3, assuming that was the failed partition in my example.

With RAID 5 you can only afford to lose one disk. If you lose more than one member of the array, you've lost the lot. Also, by using --create to make a new array, you'd overwrite the data you originally had there.

----------

## NeddySeagoon

bousket,

If your RAID set won't assemble, it's game over. Provided it assembles, there is some chance you can salvage something.

Do not run e2fsck; that often makes a bad situation worse.

Attempt to mount the filesystem read-only, using some of the alternate superblocks.

Read man e2fsck for hints on the block numbers where superblocks can be found. Note that mount wants the location of the superblock in different units from those used by e2fsck. Getting it wrong is harmless; the filesystem just won't mount.

----------

## bousket

I agree there's little chance I'll get my data back. But I have to try every solution before giving up! There are too many things (personal and work) I don't want to lose. My array can't assemble because only 2 of the 4 disks in the RAID 5 array are usable, so I have to create a new array over the old one and then recover as many files as possible with data recovery software (...this is the only solution I've found on the net...).

To eccerr0r (and the others  :Smile:  ):

I tried "e2fsck /dev/md3" and got the message: Superblock invalid. (I saw your post too late, NeddySeagoon.)

So, I've tested

```
e2fsck -b 8193 /dev/md3
```

and I obtain

```
Device or resource busy while trying to open /dev/md3.
Filesystem mounted or opened exclusively by another program?
```

I had disabled the mount in /etc/fstab, and mount says it isn't mounted.

What does this mean? I'll keep searching, but do you have any idea?

I've also noticed another strange thing. Here is the result of "cat /proc/mdstat":

```

md2 (raid0 swap partition    ): sdd2[1] sdc2[0] sdb2[3]               sda2[2]

md3 (raid5 I want to get back): sdd3[3] sdc3[2] sdb3[1] (now missing) sda3[0]

```

In fact, when I insert an IDE disk, my motherboard sometimes swaps the first two disks with the other two (I don't know why, but in that case I have to boot from the third disk to get GRUB instead of the first). Looking at the md2 report, I think it's possible that the disk order is now different from when I created the RAID.

Does mdadm always list the disks by decreasing letter (d-c-b-a), with the indexes giving their order in the array? If so, it would explain why I can't read my superblock. And then, would the command

```
mdadm --create [level, devices] /dev/sdc3 /dev/sdd3 /dev/sda3 missing (/dev/sdb3 when I get my data back)
```

be enough to get the same order as md2?

Thanks everyone for your time and your help.

----------

## NeddySeagoon

bousket,

If e2fsck didn't do anything, there is no harm done.

It's unlikely you would have had a backup superblock at 8193, as that only applies to ext2/3 filesystems with a 1k block size. You would either have had to force that (you would remember) or have made a small filesystem.

It's much more likely that you have a 4k block size and a backup superblock at 32768.
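One detail worth spelling out, since mount and e2fsck disagree on units: `e2fsck -b` takes the location in filesystem blocks, while mount's `sb=` option takes it in 1024-byte units. So the backup at block 32768 on a 4k filesystem becomes `sb=131072` for mount. A quick arithmetic check (the device and mount point are placeholders):

```shell
# e2fsck -b wants the backup superblock location in filesystem blocks;
# mount's sb= option wants the same location in 1024-byte units.
fs_block=32768      # backup superblock location, in filesystem blocks
block_size=4096     # assumed 4k block size
sb_for_mount=$(( fs_block * block_size / 1024 ))
echo "e2fsck -b $fs_block /dev/md3"
echo "mount -o ro,sb=$sb_for_mount /dev/md3 /mnt/recovery"
```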

----------

## eccerr0r

 *bousket wrote:*   

> But I ve seen another strange thing. Here is the result of "cat /proc/mdstat":
> 
> ```
> 
> md2 (raid0 swap partition    ): sdd2[1] sdc2[0] sdb2[3]               sda2[2]
> ...

 

Ahh... this may be the biggest clue yet: you need /proc/mdstat to show the same order as before (indicating the mdadm --create setup is identical to the original). See if you have a log somewhere of what the order was before, and see if you can get it to list the same. Again, make sure the stripe size and stripe order are the same as well. I have to say that /proc/mdstat may only be a hint, but hopefully it's enough of a clue to set up the array again.

My RAIDs look like this for multiple RAID personalities:

```
md1 : active raid5 hdg2[3] hde2[2] hdc2[1] hda2[0]

      23470656 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

      

md2 : active raid5 hdg3[3] hde3[2] hdc3[1] hda3[0]

      313122624 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

      

md3 : active raid5 hdg4[3] hde4[2] hdc4[1] hda4[0]

      14289600 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

      

md0 : active raid1 hdg1[3] hde1[2] hdc1[1] hda1[0]

      256896 blocks [4/4] [UUUU]

```

As far as I can discern, it's always listed in reverse order of how it was created. Also, I'm not sure the numbers in [] carry over from RAID level to RAID level, but you could try. You'll have to match the drive and the number in [] to what they were before, hoping that data is available...
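To avoid eyeballing the order: the number in [] after each member is its slot in the array, so the original device order can be recovered mechanically from any saved /proc/mdstat line. A sketch using the md2 line bousket quoted above:

```shell
# Sketch: sort the members of a /proc/mdstat line by their [slot] number
# to recover the device order used at creation time. The sample line is
# the md2 (raid0) line quoted earlier in the thread.
line="md2 : active raid0 sdd2[1] sdc2[0] sdb2[3] sda2[2]"
order=$(printf '%s\n' $line | grep '\[' |
        sed 's/\(.*\)\[\([0-9]*\)\]/\2 \1/' | sort -n | awk '{print $2}')
echo "slot order:" $order
```

The result, sdc2 sdd2 sda2 sdb2, matches the slot assignments in the --examine dumps earlier in the thread (sdc=0, sdd=1, sda=2, sdb=3).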

Another hint: make sure you have all the working disks in the machine again, and hope that the disk ordering by the BIOS and the kernel is identical to what it was before. I have to watch out for this as well; the 4 SATA ports on the motherboard of my new RAID aren't labeled, and I bet I'll run into a similar issue when replacing disks...  :Sad: 

And yes, we can only pray that your data is recoverable... raid5 with two bad disks usually means it's gone to heaven.

----------

## bousket

Hi,

I've tried the other order but I get the same thing. I can read some short pieces of text but no complete file. I don't have much time these days, but I will try some data recovery software. I hope it will work  :Smile:  Thanks all, and if you have any additional ideas, I'll take them!

----------

## eccerr0r

If the files you can see are no larger than the stripe size you set, then you likely have the order or layout different from when the array was created. Also watch out: not all files are contiguous, so you'll need to check multiple locations within the volume to see whether the stripe size and order are correct...

But yes, rebuilding from a RAID failure where more than the number of redundant disks have failed is not easy  :Sad: 

----------

