# raid device invalid after boot... ?

## FuzzyOne

2 SATA drives as RAID1 on an ASUS P4C800 (Promise controller), Gentoo 2004.0, 2.6.3-gentoo-r1 kernel with RAID support compiled in.  Manual creation of the RAID works fine:

```
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 245111706kB, raid superblock at 245111616kB
disk 1: /dev/sdc1, 245111706kB, raid superblock at 245111616kB
```

```
cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
      245111616 blocks [2/2] [UU]
      [>....................]  resync =  1.4% (3450672/245111616) finish=410.4min speed=9811K/sec
```

Manual mounting and disk access are OK, but after I reboot the RAID is invalid:

```
autodetecting RAID... trying md0:  invalid
/dev/md0 is not a RAID0 or LINEAR
```

(which is weird because it's configured as RAID1)

And when I try to access it:

```
/dev/md0: Invalid argument
mount: /dev/md0: can't read superblock
```

But I can recreate the RAID manually (mkraid) just fine, without any data loss.

/etc/raidtab:

```
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 32
persistent-superblock 1
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
```

What am I missing?

----------

## toofastforyahuh

I have almost the same problem on ASUS SK8V, 2.6.6 vanilla and 2.6.5-gentoo-r1 on amd64.  I also have RAID compiled in and can build and manually mount the RAID1 just fine, but bootup fails in the init process.

Specifically the init script brings me to the error (type password or control D) prompt when raidstart fails.

raidstart does not like /dev/md0 during init.  raidstart -a produces "/dev/md0: Invalid argument".  I can do a raidstop /dev/md0 OK, but I cannot subsequently do a raidstart -a.

If I mv /etc/raidtab /etc/raidtab.old I can eventually get through the boot process.  At this point if I login and mv /etc/raidtab.old /etc/raidtab and try raidstart -a it works and I can mount it manually!

What's wrong with the init process?  Thanks!

----------

## toofastforyahuh

I should also note that this is a RAID 1 array with 2 SATA drives on the Promise controller.  The array is *not* defined in BIOS because this is software RAID.  Also mkraid needed the --really-force option to work, but I had no other problems during the procedure.  I followed the TLDP software RAID howto and also this howto:

[url]http://www.siliconvalleyccie.com/linux-adv/raid.htm[/url]

I even used -c -c during the ext3 formatting to make sure the drives worked OK.

The drives appear as /dev/sdc1 and /dev/sdd1.  It's strange because this appears to work manually but the init scripts die on it.

I should note that the log says things like:

md: raidstart(pid 7965) used deprecated START_ARRAY ioctl.  This will not be supported beyond 2.6

Basically the init scripts call raidstart /dev/md0 during boot, but for some reason it fails then, with:

/dev/md0:  Invalid argument

However, if I disable RAID during boot and then restore my /etc/raidtab and /etc/fstab afterward and raidstart manually it works perfectly.  What gives?

----------

## senter

have you set the partition type to fd, Linux raid autodetect, on both partitions ?
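For reference, checking and setting the type from the command line looks roughly like this (a sketch; sdc/sdd are the device names used in this thread, substitute your own, and be careful: sfdisk writes to the partition table):

```
# print the partition type Id of partition 1: "fd" is Linux raid autodetect
sfdisk --print-id /dev/sdc 1
sfdisk --print-id /dev/sdd 1

# if it prints something else (e.g. 83), change it to fd
sfdisk --change-id /dev/sdc 1 fd
sfdisk --change-id /dev/sdd 1 fd
```

The kernel only autodetects arrays whose member partitions carry type fd and a persistent superblock.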

----------

## toofastforyahuh

 *senter wrote:*   

> have you set the partition type to fd, Linux raid autodetect, on both partitions ?

 

Yes, I followed the procedure here to the letter

[url]http://www.siliconvalleyccie.com/linux-adv/raid.htm[/url]

/dev/sdc1 and /dev/sdd1 are fd type in fdisk. 

```
# fdisk -l | grep sd
Disk /dev/sdc: 160.0 GB, 160041885696 bytes
/dev/sdc1               1       19457   156288352   fd  Linux raid autodetect
Disk /dev/sdd: 160.0 GB, 160041885696 bytes
/dev/sdd1               1       19457   156288321   fd  Linux raid autodetect
```

Also my raidtab has the persistent-superblock set to 1.  I did compile raid into the kernel, so I shouldn't (and can't) load any raid1 modules.

My raidtab is exceedingly simple:

```
raiddev /dev/md0
     raid-level 1
     nr-raid-disks      2
     nr-spare-disks     0
     chunk-size 4
     persistent-superblock      1
     device     /dev/sdc1
     raid-disk  0
     device     /dev/sdd1
     raid-disk  1
```

mkraid /dev/md0 did not work.  It aborted without any logfile messages.  I had to use mkraid --really-force /dev/md0 for it to work.

dmesg output (from a normal, nonraid bootup) shows nothing odd regarding SCSI or RAID personalities:

```
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   generic_sse:  7264.000 MB/sec
raid5: using function: generic_sse (7264.000 MB/sec)
raid6: int64x1   1980 MB/s
raid6: int64x2   2992 MB/s
raid6: int64x4   3199 MB/s
raid6: int64x8   2058 MB/s
raid6: sse2x1    1234 MB/s
raid6: sse2x2    2347 MB/s
raid6: sse2x4    3152 MB/s
raid6: using algorithm sse2x4 (3152 MB/s)
md: raid6 personality registered as nr 8
md: multipath personality registered as nr 7
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
sata_promise version 0.92
scsi2 : sata_promise
  Vendor: ATA       Model: ST3160023AS       Rev: 1.02
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write through
 /dev/scsi/host1/bus0/target0/lun0: p1
Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0,  type 0
  Vendor: ATA       Model: ST3160023AS       Rev: 1.02
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdd: drive cache: write through
 /dev/scsi/host2/bus0/target0/lun0: p1
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg3 at scsi2, channel 0, id 0, lun 0,  type 0
```

At this point--if I restore raidtab and add the fstab line--I can raidstart /dev/md0 and mount it as per my fstab without problem.  It works fine.

But if I reboot with this fstab line and raidtab, init fails at this script:

/etc/init.d/checkfs

```
                                if [ "${retval}" -gt 0 -a -x /sbin/raidstart ]
                                then
                                        /sbin/raidstart "${i}"
                                        retval=$?
                                fi
........
                                if [ "${retval}" -gt 0 ]
                                then
                                        rc=1
                                        eend ${retval}
                                else
                                        ewend ${retval}
                                fi
                        fi
                done
                # A non-zero return means there were problems.
                if [ "${rc}" -gt 0 ]
                then
                        echo
                        eerror "An error occurred during the RAID startup"
                        eerror "Dropping you to a shell; the system will reboot"
                        eerror "when you leave the shell."
                        echo; echo
                        /sbin/sulogin ${CONSOLE}
                        einfo "Unmounting filesystems"
                        /bin/mount -a -o remount,ro &>/dev/null
                        einfo "Rebooting"
```

And indeed, if I give my root password and try to raidstart /dev/md0 (or even raidstart -a) at the bash prompt I get:

/dev/md0:  Invalid argument

So the behavior is strange.  I can start the RAID manually, but init scripts cannot do it automatically.  This happens both on my 2.6.5-gentoo-r1 (with initrd) and 2.6.6 kernels (without a ramdisk).  The system is amd64 on ASUS SK8V motherboard, with both drives on the Promise controller, but there is no conflicting Promise RAID array set up in BIOS.

Also, I am not trying to boot from the RAID.  The boot disk is a 3rd hard drive on another controller.

----------

## Donny

I have exactly the same "errors" as FuzzyOne and toofastforyahuh, on an ASUS K8V SE Deluxe.

The RAID works fine (I installed it following the documents from TLDP).

But after boot it fails on /dev/md0.

When I comment out raiddev /dev/md0 in /etc/raidtab it boots without a problem, except there's no RAID  :Shocked: 

So after starting the RAID manually again, all is working fine.

Does anyone know what I'm missing or doing wrong?

genkernel -> 2.6.3-gentoo-r2

2004.0

raidtab is the same as FuzzyOne

I get the error (type password or control D) prompt when raidstart fails on boot.

```
fdisk -l

Disk /dev/hda: 41.1 GB, 41110142976 bytes
16 heads, 63 sectors/track, 79656 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        63     31720+  83  Linux
/dev/hda2            64      1056    500472   82  Linux swap
/dev/hda3          1057     79656  39614400   83  Linux

Disk /dev/sda: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1     14946 120053713+  fd  Linux raid autodetect

Disk /dev/sdb: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1     14946 120053713+  fd  Linux raid autodetect
```

```
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6]
md0 : active raid0 sda1[0] sdb1[1]
      240107264 blocks 32k chunks
unused devices: <none>
```

```
cat /etc/raidtab
raiddev /dev/md0
raid-level 0
nr-raid-disks 2
persistent-superblock 1
chunk-size 32
device /dev/sda1
raiddisk 0
device /dev/sdb1
raid-disk 1
```

```
/dev/hda1               /boot           ext3            noauto,noatime          1 2
/dev/hda3               /               reiserfs        noatime                 0 1
/dev/hda2               none            swap            sw                      0 0
/dev/cdroms/cdrom0      /mnt/cdrom      auto            noauto,ro,user          0 0
none                    /proc           proc            defaults                0 0
none                    /dev/shm        tmpfs           defaults                0 0
/dev/md0                /raid           reiserfs        noatime                 0 1
```

Thanks for reading all this.

----------

## toofastforyahuh

I still have this problem.

Did anything about software raid change in 2.6.x?  Do the raidtools need updating?

Is it OK to have /dev/md0 instead of /dev/md/0?

I'm really at a loss here.

----------

## Donny

I used mdadm rather than raidtools, but got the same error.

My feeling is that it has something to do with /dev/md0, but what exactly, I really have no idea. I am lost.

----------

## Donny

Does anyone have an idea in what direction to search to solve this problem?

----------

## toofastforyahuh

I think I fixed it!

First, let me preface this by saying I find the Gentoo init process confusing compared to ye olde /etc/rc.d/rc3.d, and when combined with the black magic of devfs/sysfs/every_other_fs I just get confused to no end.  I have no idea when devices become available, when/where they are called, when/where they get symlinked, etc.  And therein lies the problem.

It appears the scripts were trying to initialize the RAID before my SCSI devices were even set up!

From dmesg:

```
st: Version 20040403, fixed bufsize 32768, s/g segs 256
i2c /dev entries driver
md: raidstart(pid 4751) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6
md: could not lock unknown-block(291,1456).
md: could not import unknown-block(291,1456)!
md: autostart unknown-block(5,74672) failed!
```

If I log in from where the init scripts died (namely, the checkfs script), I see /dev/scsi is totally empty.  It is as if Gentoo were doing things out of order, trying to set up a SCSI software RAID before it had set up the SCSI devices.

Given that it was 1 AM here I didn't want to hunt down and rewrite any of the init scripts to reorder them myself.  I just found a quick hack, and that is to use the /etc/modules.autoload.d/kernel-2.6 to load the sata_promise module (I am using the onboard Promise chip for my software RAID).

Again, here is where I don't understand the gentoo init procedure.  Somehow sata_promise and a whole ton of other modules (USB, Firewire, etc) get loaded automagically and I only had a few entries in my modules.autoload, none of which are disk related.  But by putting sata_promise in that file it forced the module to get loaded before the raidstart gets called in checkfs.  That appears to set up the /dev/scsi/... devices.
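For anyone trying the same workaround, it's a one-line addition to the autoload file (path as on the Gentoo 2.6 setup discussed here; sata_promise is the driver for the onboard Promise chip, adjust for your controller):

```
# /etc/modules.autoload.d/kernel-2.6
# force the SATA driver to load early, so the RAID member devices
# exist before checkfs calls raidstart
sata_promise
```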

BUT that isn't enough.  Although /dev/scsi is populated with the various devices, for some oddball reason the symlinks to /dev/sdc1 and /dev/sdd1 did not exist during init either.  My other 2 SCSI devices (which are USB flash card readers) did show up as /dev/sda1 and /dev/sdb1.

So my next step was to replace /dev/sdc1 and /dev/sdd1 in my /etc/raidtab with the actual SCSI devices.  And it seems to work fine.

My current raidtab is now:

```
raiddev /dev/md0
     raid-level   1
     nr-raid-disks   2
     nr-spare-disks   0
     chunk-size   32
     persistent-superblock   1
#     device   /dev/sdc1
     device   /dev/scsi/host0/bus0/target0/lun0/part1
     raid-disk   0
#     device   /dev/sdd1
     device   /dev/scsi/host1/bus0/target0/lun0/part1
     raid-disk   1
```

Hope this helps.

----------

## Donny

Glad you fixed it toofastforyahuh   :Very Happy: 

I fixed mine too, by loading the raid0 and md modules on boot.

Hope it helps someone.

----------

## toofastforyahuh

Somehow, not long after my last post in this thread, I was able to get the /dev/sdb1 and /dev/sdc1 symlinks to work.  I honestly don't remember how.  Probably some more udev nonsense.

Then for months my software RAID worked great--or so I thought.  Except for whatever reason the second drive was not being added by the kernel at boot.  I still don't know how that happened either, since both drives were added fine when the RAID was first set up.

The solution in this case was to add the missing drive again with raidhotadd, and now the RAID appears to work correctly with both drives even after a reboot.  (They are brand new drives and no, it was not a drive failure.)
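For reference, re-adding a dropped mirror half with the raidtools looks roughly like this (a sketch; /dev/md0 and /dev/sdd1 are example names from this thread, check /proc/mdstat first to see which member is actually missing):

```
raidhotadd /dev/md0 /dev/sdd1
cat /proc/mdstat    # the resync progress bar should appear, ending in [UU]
```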

It's amazing how complicated this seemingly simple task of setting up md is.  Just emerge the raidtools (or mdadm), set up /etc/raidtab, load the kernel modules, make the RAID, and it should just work... but there's always some gremlin throwing a wrench into the works.

----------

## blais

hi

I have the very same problem using a P4P800 with two 120GB IDE drives and RAID0. I won't bother repeating my logfiles; they're the same as toofastforyahuh's.

this is then probably not a SCSI issue.

I don't have a solution for it.  When I boot from the RAID (/dev/md0 as /), it stops with an error, leaving / mounted read-only and the maintenance prompt at the console.

could it be that I have to let the drives "resync" before mounting?

----------

## blais

oops  i mean RAID-1 in my message above.

I used to have these in RAID-0 and it worked fine.

Now the problem started when I switched to RAID-1 (and yes, i did recreate the fs)

----------

## MagicITX

Can anyone help with this?  My setup is different but the problem is the same.  I have:

/dev/md1  RAID-1 with 2 drives

/dev/md2  RAID-1 with 2 drives

/dev/md0  RAID-0 with /dev/md1 and /dev/md2

md1 and md2 start fine but init fails at md0.  

From the recovery shell I see that dmesg ends with "md: md0 stopped".  If I run 'mdadm -As /dev/md0' it tells me 'mdadm: no devices found for /dev/md0'.

I can recreate the array with:

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

The message I get here is a little weird.  It says:

```
mdadm: /dev/md1 appears to be part of a raid array:
    level=0 devices=2 ctime=Sat Mar 19 05:56:33 2005
mdadm: /dev/md2 appears to be part of a raid array:
    level=0 devices=2 ctime=Sat Mar 19 05:56:33 2005
Continue creating array?
```

I answer "y"es and it says 'mdadm: array /dev/md0 started.'  What makes this weird is the md1 and md2 arrays are level 1, not level 0 as reported.  

At this point if I cat /proc/mdstat I get:

```
Personalities : [raid0] [raid1]
md0 : active raid0 md2[1] md1[0]
      586066944 blocks 64k chunks
md2 : active raid1 sdc1[0] sdd1[1]
      293033536 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
      293033536 blocks [2/2] [UU]
```

My /etc/mdadm.conf file has:

```
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
DEVICE /dev/sdd1
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md2 devices=/dev/sdc1,/dev/sdd1
ARRAY /dev/md0 devices=/dev/md1,/dev/md2
```

So like the other posts in this thread, I have a RAID array that won't start during init or afterward but can be recreated from the recovery shell.

Any ideas?

----------

## MagicITX

Some other posts suggested this could be due to udev.  That isn't the problem in my case.  I get the same error with devfs.

----------

## Phk

 *MagicITX wrote:*   

> Can anyone help with this?

 

I am no good help, since i'm still troubled with my own setup....

But, forget that kind of approach, try this one:

 :Arrow:  (Which is the way i've installed my system)

and this one:

 :Arrow:  [HOWTO] HPT, Promise, Medley, Intel, Nvidia RAID Dualboot

The second is very important, since it tells you to use the "gen2dmraid" boot CD, which automatically mounts your RAIDed partitions.

Good luck!

----------

## MagicITX

Thanks for the feedback.  My problem was a little different but I've been able to solve it.

The problem was in /etc/mdadm.conf.  When mdadm starts up an array it looks through mdadm.conf for the devices to use.  Previously my file contained this:

```
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
DEVICE /dev/sdd1
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md2 devices=/dev/sdc1,/dev/sdd1
ARRAY /dev/md0 devices=/dev/md1,/dev/md2
```

The RAID-1 arrays md1 and md2 would start because the devices they use are declared with DEVICE lines.  However, when it got to md0 it wouldn't start, because md1 and md2 were not DEVICE entries.  I changed mdadm.conf to this and it now works.

```
DEVICE /dev/sda1
DEVICE /dev/sdb1
DEVICE /dev/sdc1
DEVICE /dev/sdd1
DEVICE /dev/md1
DEVICE /dev/md2
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md2 devices=/dev/sdc1,/dev/sdd1
ARRAY /dev/md0 devices=/dev/md1,/dev/md2
```

If you think about it, it makes sense.  To build an array out of arrays, the inner arrays need ARRAY entries to define them and then DEVICE entries so they can be used in further arrays.
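This rule can even be checked mechanically.  Below is a small sketch (my own helper, not part of mdadm) that scans an mdadm.conf and warns when an ARRAY lists a member that no DEVICE line declares, which is exactly the trap described above:

```shell
# check_mdadm_conf: warn when an ARRAY in an mdadm.conf file uses a member
# that no DEVICE line declares. Illustrative helper, not part of mdadm.
check_mdadm_conf() {
    conf="$1"
    # space-delimited list of everything declared with DEVICE
    devices=" $(awk '$1 == "DEVICE" { print $2 }' "$conf" | tr '\n' ' ') "
    awk '$1 == "ARRAY" { print $2, $3 }' "$conf" | while read -r array spec; do
        # turn devices=/dev/a,/dev/b into "/dev/a /dev/b"
        for m in $(printf '%s' "$spec" | sed 's/^devices=//; s/,/ /g'); do
            case "$devices" in
                *" $m "*) ;;  # declared with a DEVICE line, fine
                *) echo "WARN: $array uses $m but no DEVICE line declares it" ;;
            esac
        done
    done
}
```

Run against the first config above it prints two warnings for /dev/md0; against the fixed one it prints nothing.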

----------

## Phk

Yeah, it makes sense  :Wink: 

Glad you worked it out!! I'm still in a mess....

If you want to know/help, visit my issues page..... ----> HERE

I'm posting the new problem in 30 minutes or so.

See us!

----------

