# raid0 problems

## insaner

hello! 

sorry, I am really new to Linux, but I've read a lot over the last few days and I think I have a real problem, at least one I cannot solve by myself.

I got confronted with a server that contains several SCSI hard disks with RAID arrays. After an `emerge -aDv world`, compiling a new kernel, and a reboot, the raid0 array "md1" is not working anymore.

Here is what I tried:

```
tuxsurf mnt # mount
/dev/md0 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw,noatime)
none on /dev/shm type tmpfs (rw)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
nfsd on /proc/fs/nfs type nfsd (rw,noexec,nosuid,nodev)
tuxsurf mnt #
```

---------------------------> md1 is not listed, so I looked in fstab:

```
/dev/md1                /mnt/md1        ext3            noatime         0 3
```

------------------------------> the raid should be mounted at startup, but it is not. I try it manually:

```
tuxsurf mnt # mount /dev/md1 /mnt/md1
mount: /dev/md1: can't read superblock
tuxsurf mnt #
```

--------------------------> after some reading I figured that this is bad. I run fsck:

```
tuxsurf mnt # fsck /dev/md1
fsck 1.40.9 (27-Apr-2008)
e2fsck 1.40.9 (27-Apr-2008)
fsck.ext3: Invalid argument while trying to open /dev/md1
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
tuxsurf mnt #
```

------------------------------> it tells me that repairing the superblock could help. I check for backup superblocks:

```
tuxsurf dev # mke2fs -n /dev/sdg1
mke2fs 1.40.9 (27-Apr-2008)
Warning: 256-byte inodes not usable on older systems
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
18317312 inodes, 73240327 blocks
3662016 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
2236 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616
tuxsurf dev # mke2fs -n /dev/sdf1
mke2fs 1.40.9 (27-Apr-2008)
Warning: 256-byte inodes not usable on older systems
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
4489216 inodes, 17956645 blocks
897832 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
548 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424
tuxsurf dev # mke2fs -n /dev/sde1
mke2fs 1.40.9 (27-Apr-2008)
Warning: 256-byte inodes not usable on older systems
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
4489216 inodes, 17956645 blocks
897832 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
548 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424
tuxsurf dev #
```
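(Side note: the backup locations that `mke2fs -n` prints follow the standard ext2/ext3 `sparse_super` rule. With 4096-byte blocks there are 32768 blocks per group, and backup superblocks sit in group 1 and in every group whose number is a power of 3, 5 or 7. A small sketch, assuming that layout, reproduces the lists above:)

```python
def backup_superblocks(total_blocks, blocks_per_group=32768):
    """Backup superblock locations for ext2/3 with sparse_super:
    group 1 plus groups whose number is a power of 3, 5 or 7."""
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g * blocks_per_group < total_blocks:
            groups.add(g)
            g *= base
    return [g * blocks_per_group for g in sorted(groups)]

# /dev/sdg1 from the session above has 73240327 blocks
print(backup_superblocks(73240327))
```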

----------------------> there seem to be some backups. I try to stop the raid array:

```
tuxsurf mnt # mdadm --stop /dev/md1
mdadm: stopped /dev/md1
tuxsurf mnt #
```

--------------------> looks like it is stopped. So why overwrite any superblocks before first trying to rebuild the raid0? I try:

```
tuxsurf dev # mdadm --create /dev/md1 --level=0 --raid-devices=3 /dev/sdg1 /dev/sdf1 /dev/sde1
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid0 devices=3 ctime=Sat Nov 10 15:17:07 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid0 devices=3 ctime=Sat Nov 10 15:17:07 2007
mdadm: /dev/sde1 appears to contain an ext2fs file system
    size=436614208K  mtime=Thu Aug 21 17:39:41 2008
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid0 devices=3 ctime=Sat Nov 10 15:17:07 2007
Continue creating array? n
mdadm: create aborted.
tuxsurf dev #
```

--------------> this tells me that something is running and that I should stop it first. But I thought I had already stopped it with the `mdadm --stop` command? I also don't understand why sde1 appears as part of a raid array AND as an ext2fs file system. ->

```
tuxsurf mnt # mdadm --query --detail /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
```

------> it is not active but still running? I don't get that either. Anyway, now I try one of those backup superblocks:

```
tuxsurf mnt # e2fsck -f -b 32768 /dev/sde1
e2fsck 1.40.9 (27-Apr-2008)
e2fsck: Device or resource busy while trying to open /dev/sde1
Filesystem mounted or opened exclusively by another program?
tuxsurf mnt #
```

---------> I have no more ideas. Anyone who can help? Thanks a lot!

----------

## HeissFuss

It sounds like the array may not have assembled properly on boot.  Can you give us the output of the following?

```
grep -v ^# /etc/mdadm.conf
cat /proc/mdstat
fdisk -l
```

Also, you won't find valid ext3 filesystems on the individual disks since this is raid0.
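To illustrate why (a toy sketch, not a tool from this thread, assuming equal-size members in a single stripe zone): raid0 rotates fixed-size chunks across the member disks, so only member 0 holds the ext3 superblock at its native offset, and no single member holds a complete filesystem. That is also why mdadm reported above that sde1 "appears to contain an ext2fs file system" - sde1 is member 0 and carries the array's first chunk.

```python
CHUNK = 64 * 1024  # 64 KiB chunk size, as used by this array

def raid0_member(offset, n_disks=3, chunk=CHUNK):
    """Map a byte offset in the raid0 array to (disk index, offset
    on that disk).  Simplified: one zone, equal-size members."""
    chunk_no, within = divmod(offset, chunk)
    disk = chunk_no % n_disks          # chunks rotate across disks
    disk_chunk = chunk_no // n_disks   # position of the chunk on that disk
    return disk, disk_chunk * chunk + within

# The ext3 superblock lives at byte 1024 of the array -> disk 0,
# but data one chunk later already sits on the second disk:
print(raid0_member(1024))
print(raid0_member(65536))
```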

----------

## insaner

thx for your quick reply!

### Contents of grep -v ^# /etc/mdadm.conf ###

I think mdadm was never installed on the system, even when the raid was still working. While searching for a raid tool to do some diagnostics I found mdadm and emerged it just yesterday. I didn't make any changes to the conf file, so `grep -v ^# /etc/mdadm.conf` produces no output, because everything is commented out. I have no idea how my predecessor built the raid. Maybe the config got lost when I cleaned the system two days ago, but I don't think so.

### cat /proc/mdstat ###

```
tuxsurf etc # cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md0 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      209334720 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
tuxsurf etc #
```

--------------> md0 is another raid, which is and always was running fine.
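If you want to check from a script whether md1 came up at all, the array lines in /proc/mdstat are easy to pick apart. A rough sketch (mine, not a tool from this thread):

```python
import re

def parse_mdstat_line(line):
    """Parse an 'mdX : active LEVEL member[slot] ...' line from
    /proc/mdstat into (array name, raid level, member devices)."""
    m = re.match(r"(md\d+) : active (\S+) (.+)", line)
    if not m:
        return None  # not an active-array line
    name, level, rest = m.groups()
    members = [re.sub(r"\[\d+\]$", "", dev) for dev in rest.split()]
    return name, level, members

print(parse_mdstat_line("md0 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]"))
```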

### fdisk -l ###

```
tuxsurf etc # fdisk -l 
Disk /dev/sda: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x100f59dc

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          12       96358+  83  Linux
/dev/sda2              13         255     1951897+  82  Linux swap / Solaris
/dev/sda3             256        8942    69778327+  fd  Linux raid autodetect

Disk /dev/sdb: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x1ce375f8

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          12       96358+  83  Linux
/dev/sdb2              13         255     1951897+  82  Linux swap / Solaris
/dev/sdb3             256        8942    69778327+  fd  Linux raid autodetect

Disk /dev/sdc: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5c397a76

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1          12       96358+  83  Linux
/dev/sdc2              13         255     1951897+  82  Linux swap / Solaris
/dev/sdc3             256        8942    69778327+  fd  Linux raid autodetect

Disk /dev/sdd: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x79d7651d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1          12       96358+  83  Linux
/dev/sdd2              13         255     1951897+  82  Linux swap / Solaris
/dev/sdd3             256        8942    69778327+  fd  Linux raid autodetect

Disk /dev/sde: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        8942    71826583+  fd  Linux raid autodetect

Disk /dev/sdf: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1        8942    71826583+  fd  Linux raid autodetect

Disk /dev/sdg: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1       36472   292961308+  83  Linux

Disk /dev/md0: 214.3 GB, 214358753280 bytes
2 heads, 4 sectors/track, 52333680 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table
tuxsurf etc #
```

thanks again for your help!

----------

## insaner

### additional information - I got a backup of an old mdstat file from when the raid was still running, before the system update; it looks like this: ###

```
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty] 
md1 : active raid0 sdg1[2] sdf1[1] sde1[0]
      436614208 blocks 64k chunks
md0 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
      209334720 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
```

### ....and I found a file called "raidtab"; not sure if this is helpful: ###

```
tuxsurf etc # cat raidtab 
raiddev /dev/md0
  raid-level 5
  nr-raid-disks 4
  nr-spare-disks 0
  persistent-superblock 1
  parity-algorithm left-symmetric
  chunk-size 64
  device /dev/sda3
  raid-disk 0
  device /dev/sdb3
  raid-disk 1
  device /dev/sdc3
  raid-disk 2
  device /dev/sdd3
  raid-disk 3
raiddev /dev/md1
  raid-level 0
  nr-raid-disks 3
  nr-spare-disks 0
  persistent-superblock 1
  chunk-size 64
  device /dev/sde1
  raid-disk 0
  device /dev/sdf1
  raid-disk 1
  device /dev/sdg1
  raid-disk 2
tuxsurf etc #
```

### and an example from /var/log ###

```
Aug 27 17:17:30 localhost kernel: md: Autodetecting RAID arrays.
Aug 27 17:17:30 localhost kernel: md: Scanned 6 and added 6 devices.
Aug 27 17:17:30 localhost kernel: md: autorun ...
Aug 27 17:17:30 localhost kernel: md: considering sdf1 ...
Aug 27 17:17:30 localhost kernel: md:  adding sdf1 ...
Aug 27 17:17:30 localhost kernel: md:  adding sde1 ...
Aug 27 17:17:30 localhost kernel: md: sdd3 has different UUID to sdf1
Aug 27 17:17:30 localhost kernel: md: sdc3 has different UUID to sdf1
Aug 27 17:17:30 localhost kernel: md: sdb3 has different UUID to sdf1
Aug 27 17:17:30 localhost kernel: md: sda3 has different UUID to sdf1
Aug 27 17:17:30 localhost kernel: md: created md1
Aug 27 17:17:30 localhost kernel: md: bind<sde1>
Aug 27 17:17:30 localhost kernel: md: bind<sdf1>
Aug 27 17:17:30 localhost kernel: md: running: <sdf1><sde1>
Aug 27 17:17:30 localhost kernel: md1: setting max_sectors to 128, segment boundary to 32767
Aug 27 17:17:30 localhost kernel: raid0: looking at sdf1
Aug 27 17:17:30 localhost kernel: raid0:   comparing sdf1(71826496) with sdf1(71826496)
Aug 27 17:17:30 localhost kernel: raid0:   END
Aug 27 17:17:30 localhost kernel: raid0:   ==> UNIQUE
Aug 27 17:17:30 localhost kernel: raid0: 1 zones
Aug 27 17:17:30 localhost kernel: raid0: looking at sde1
Aug 27 17:17:30 localhost kernel: raid0:   comparing sde1(71826496) with sdf1(71826496)
Aug 27 17:17:30 localhost kernel: raid0:   EQUAL
Aug 27 17:17:30 localhost kernel: raid0: FINAL 1 zones
Aug 27 17:17:30 localhost kernel: raid0: too few disks (2 of 3) - aborting!
Aug 27 17:17:30 localhost kernel: md: do_md_run() returned -12
Aug 27 17:17:30 localhost kernel: md: md1 stopped.
```
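One plausible reading of this log (an illustration of mine, not from the thread): boot-time md autodetection only scans partitions whose type is 0xfd (Linux raid autodetect). In the fdisk output above, sdg1 has type 83 (plain Linux), so the kernel never considered it, found only 2 of the 3 raid0 members, and aborted:

```python
RAID_AUTODETECT = 0xFD  # partition type the kernel's md autodetect scans

# md1's members and their partition type ids, from the fdisk -l output above
md1_members = {"sde1": 0xFD, "sdf1": 0xFD, "sdg1": 0x83}

def autodetect_candidates(parts):
    """Return the partitions boot-time md autodetect would pick up."""
    return [name for name, ptype in parts.items() if ptype == RAID_AUTODETECT]

found = autodetect_candidates(md1_members)
print("%d of %d members found" % (len(found), len(md1_members)))
```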

----------

## cyrillic

*insaner wrote:*

> From raidtab:
> raiddev /dev/md1
>   raid-level 0
> ...

 

You won't be able to start a raid0 array with any of the disks missing.

What happened to /dev/sdg ?

----------

## HeissFuss

Yep, there's something up with sdg1.

For one thing, that device is 300GB vs 73.5GB for the other two.  The partition type is also wrong.  Could it be that a disk is missing?

Can you give us the output of the following:

```
mdadm --examine /dev/sde1 /dev/sdf1 /dev/sdg1
```

----------

## insaner

good question about sdg1 - I don't really know what's going on with it:

```
tuxsurf mnt # mdadm --examine /dev/sde1 /dev/sdf1 /dev/sdg1 /dev sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 6e865607:2a60b8bb:d754480a:57e633f6
  Creation Time : Sat Nov 10 15:17:07 2007
     Raid Level : raid0
  Used Dev Size : 71826496 (68.50 GiB 73.55 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Update Time : Sat Nov 10 15:17:07 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4016ec9 - correct
         Events : 0.3
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1
   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       81        1      active sync   /dev/sdf1
   2     2       8       97        2      active sync   /dev/sdg1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 6e865607:2a60b8bb:d754480a:57e633f6
  Creation Time : Sat Nov 10 15:17:07 2007
     Raid Level : raid0
  Used Dev Size : 71826496 (68.50 GiB 73.55 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Update Time : Sat Nov 10 15:17:07 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4016edb - correct
         Events : 0.3
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       81        1      active sync   /dev/sdf1
   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       81        1      active sync   /dev/sdf1
   2     2       8       97        2      active sync   /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 6e865607:2a60b8bb:d754480a:57e633f6
  Creation Time : Sat Nov 10 15:17:07 2007
     Raid Level : raid0
  Used Dev Size : 71826496 (68.50 GiB 73.55 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Update Time : Sat Nov 10 15:17:07 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4016eed - correct
         Events : 0.3
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       97        2      active sync   /dev/sdg1
   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       81        1      active sync   /dev/sdf1
   2     2       8       97        2      active sync   /dev/sdg1
mdadm: No md superblock detected on /dev.
mdadm: No md superblock detected on sde1.
tuxsurf mnt #
```

-----> when I try to mount sdg1:

```
tuxsurf mnt # mount -t ext3 /dev/sdg1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdg1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
tuxsurf mnt #
```

----------

## HeissFuss

So the present sdg1 is actually a raid device.  The majority of that device is being wasted, though.

If you set the partition type of /dev/sdg1 to Linux raid autodetect, you should be able to assemble the raid:

```
mdadm -A /dev/md1 /dev/sde1 /dev/sdf1 /dev/sdg1
```

----------

## insaner

thx! that will help - but will it destroy the existing data?

----------

## HeissFuss

No, it's safe to change the partition type.

----------

## insaner

sorry for not answering earlier, I was afk over the weekend. I am happy now, everything seems to work fine thanks to HeissFuss!

To conclude this topic, here are the final steps of the solution.

I used fdisk to change the settings for sdg1:

```
fdisk
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-36471, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-36471, default 36471): 
Using default value 36471
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
quit
```

```
tuxsurf ~ # mdadm -A /dev/md1 /dev/sde1 /dev/sdf1 /dev/sdg1
mdadm: /dev/md1 has been started with 3 drives.
tuxsurf ~ # mount /dev/md1 /mnt/md1
```

---------> md1 is mounted and all data seems to be there again.

