# [SOLVED?] EVMS: not a valid root device; Start udevd w/ldap

## Vieri

Hello,

I installed Gentoo on an Intel EM64T using the 2005.1-r1 AMD64 Universal CD with the default 2.6.12-r10.

I compiled it with genkernel --menuconfig --evms2 all

I selected ramdisk support, raid and device mapper.

Grub.conf contained:

title=INF-BL07 64bit EM64T NOCONA 2.6.12-r10 SCSI EVMS RAID1

root (hd0,0)

kernel /kernel-genkernel-x86_64-2.6.12-gentoo-r10 root=/dev/ram0 init=/linuxrc mem=4096M ramdisk=8192 doscsi vga=0 real_root=/dev/evms/root udev doevms2

initrd /initramfs-genkernel-x86_64-2.6.12-gentoo-r10

This system booted fine.

However, I made the "mistake?" of updating the whole system:

emerge --update --deep --newuse system

emerge --update --deep --newuse world

When I rebooted this system with the 2.6.12-r10 kernel, it "hung" on "Starting udev...".

If I pressed CTRL-C, the init process resumed but "hung again" on "Cleaning /tmp...".

So I commented out some lines in /etc/init.d/bootmisc (especially:

mkdir -p /tmp/.{ICE,X11}-unix

chown 0:0 /tmp/.{ICE,X11}-unix

chmod 1777 /tmp/.{ICE,X11}-unix

[[ -x /sbin/restorecon ]] && restorecon /tmp/.{ICE,X11}-unix

and all the >/dev/nulls)

and that allowed the system to boot. (I don't understand why)

I supposed that upgrading the whole system (baselayout 1.11.14-r5) without recompiling a recent kernel could have caused udev to "hang" so I recompiled the current 2.6.15-r5 kernel:

genkernel --menuconfig --evms2 all

I selected ramdisk support, raid and device mapper.

I updated grub.conf but when I rebooted I got these messages:

>> Activating udev OK

>> Activating EVMS OK

Determining root device...

Block device /dev/evms/root is not a valid root device

The root block device is unspecified or not detected.

Specify device to boot or "shell" for a shell.

So I "shelled" and noticed that even though evms_activate yields no errors/warnings, ls /dev/evms/ only lists "dm" and "dm/control".

Does anyone know why I can't see the /dev/evms/root or /dev/evms/.nodes/sda devices?

How could I debug? Any suggestions?

Last edited by Vieri on Tue Feb 28, 2006 6:53 pm; edited 1 time in total

----------

## jschellhaass

What type of SCSI card are you using to boot?  

I believe genkernel only puts SATA drivers in the initrd. You can try compiling the disk controller driver into the kernel instead of building it as a module.

jeff

----------

## Vieri

The SCSI cards are LSI 1020 Ultra320 (integrated, one channel).

Two SCSI disks are connected via a PERC 4/im RAID controller.

Will double-check whether the LSI is built into the kernel (actually, genkernel worked fine for 2.6.12 - the problem has arisen for 2.6.15, oddly).
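A minimal sketch of such a check, assuming the kernel sources (and their .config) live under /usr/src/linux and that the options of interest share a common prefix. The function name show_kernel_opts is made up for illustration; CONFIG_FUSION is used as the example prefix because the thread later points at the Fusion drivers.

```shell
#!/bin/sh
# Report how kernel options matching a prefix are set in a .config file.
# Hypothetical helper; usage:
#   show_kernel_opts /usr/src/linux/.config CONFIG_FUSION
show_kernel_opts() {
    config_file="$1"
    prefix="$2"
    # "=y" means built in, "=m" means module; disabled options show up
    # as "# CONFIG_X is not set".
    grep -E "^(# )?${prefix}[A-Z0-9_]*(=| is not set)" "$config_file"
}
```

If an option shows up as "=m" or "is not set", the driver is not built into the kernel image, so the root disk is only visible at boot if the module makes it into the initramfs and gets loaded there.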

----------

## Vieri

I tried installing Gentoo with the new 2006.0 amd64 image on a Dell PowerEdge 1855 EM64T. This system only has a USB CD drive. 2006.0 and 2005.1 could not find/detect it. 2005.1-r1 detected it as /dev/sr0 and booted just fine.

It seems that the enhancements made to 2005.1-r1 were not propagated to 2006.0...

----------

## Vieri

Just in case someone has the same problem, here's how I pinpointed mine (thanks to the evms mailing list).

- EVMS did not detect the disk because the SCSI adapter driver was not built into the kernel (I wrongly assumed 2.6.15 had more or less the same defaults as 2.6.12; if you have the same system, enable the FUSION drivers in the kernel)

- the endless "Starting udevd..." was due to my "special" configuration, and I suppose quite a few users may be in this situation. Authentication on my system is done via LDAP, so nsswitch.conf contained references to ldap. Somehow, the latest stable udev tries to resolve a tss user/group and udevd hangs on that lookup.

To fix this problem there are 2 or 3 quick solutions:

- upgrade to an unstable udevd (untested and may not work, but the developers are aware of this problem)

- edit /etc/nsswitch.conf and remove ldap. Of course that's not a permanent solution unless you change your authentication scheme. But at least you will be able to boot ok.

- edit /etc/udev/rules.d/50-udev.rules and comment out the entry for the tss user/group (search for KERNEL=="tpm)

There's a Gentoo bug report on this issue: https://bugs.gentoo.org/show_bug.cgi?id=99564

I think the latest udev ebuild was marked stable too soon (LDAP environments weren't tested?).

[EDIT1]:

upgrading to an unstable udevd does not solve the issue (as of Feb 28th 2006)

[EDIT2]:

Commenting out the line

KERNEL=="tpm*",   NAME="%k", OWNER="tss", GROUP="tss", MODE="0600"

is a quick solution to avoid udev eternal lookups.
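A minimal sketch of that edit as a shell function, assuming GNU sed and that the rule starts at the beginning of the line exactly as quoted above. The function name disable_tpm_rule is made up for illustration.

```shell
#!/bin/sh
# Comment out the tpm rule in a udev rules file, keeping a .bak copy.
# Hypothetical helper; usage:
#   disable_tpm_rule /etc/udev/rules.d/50-udev.rules
disable_tpm_rule() {
    rules_file="$1"
    # Prefix the line that starts with KERNEL=="tpm with '#'.
    # A line that is already commented out no longer matches the
    # ^KERNEL anchor, so running this twice does not add a second '#'.
    sed -i.bak '/^KERNEL=="tpm/ s/^/#/' "$rules_file"
}
```

Note that the .bak copy is overwritten on every run, and that an etc-update which replaces 50-udev.rules will bring the rule back.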

However, there's another step that also blocks the init process: "Cleaning /tmp" in

/etc/init.d/bootmisc

on the line

chown 0:0 /tmp/.{ICE,X11}-unix

If I comment that line out then the system boots ok (not a definite solution though).

System is EM64T (amd64 iso), latest udev and latest baselayout.

nsswitch.conf needs ldap in my case.

Last edited by Vieri on Tue Feb 28, 2006 5:57 pm; edited 1 time in total

----------

## skyPhyr

Hi Vieri,

Great timing - I've been battling with this exact same problem:- https://forums.gentoo.org/viewtopic-p-3146322.html

I can confirm these fixes also work for me.

Cheers,

Alan.

----------

## Vieri

Glad it could help someone.

Strangely, this udev "bug" I mentioned above was reported 6 months ago.

Hope they at least put a big ewarn for ldap users.

----------

## sedorox

This is weird... I have 2 machines... one is my new ldap test box.. the other is my 'production' box... Today (spring break, yay!) I booted up both. The 'test' box came up just fine, however, the 'production' box didn't. It hung at the udev thing. Your solution (commenting out the TPM device) did the trick.

Here's the kicker.... Both boxes have ldap (as server) and have the entries in nsswitch.conf... Both have udev-85... (I did notice -86 is out.. still gotta test). But one box had a problem, and one didn't...

Funky.....

----------

## Vieri

Do both boxes have the same sys-apps/baselayout version?

----------

## sedorox

 *Vieri wrote:*   

> Do both boxes have the same sys-apps/baselayout version?

 

Actually, yes, they are the same:

'test' box:

1.12.0_pre16-r1

'production' box:

1.12.0_pre16-r1

Thought I should do updates.. there are updates to both udev and baselayout....

----------

## twam

Same problem here with sys-apps/baselayout-1.12.0_pre16-r3 on 2 machines: an EM64T and a Pentium-M. :/

----------

## net

 :Evil or Very Mad:  The same problem here after the last emerge -uD world yesterday.

(system stable x86 : Linux sk-srv 2.6.14-hardened-r5 #1 PREEMPT Wed Feb 1 22:17:18 CET 2006 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux)

As a workaround I removed ldap from nsswitch.conf
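A rough sketch of that workaround as a shell function, assuming GNU sed and that only the passwd, group and shadow lines list ldap as a source. The function name strip_ldap_sources is made up for illustration.

```shell
#!/bin/sh
# Remove "ldap" from the passwd/group/shadow lines of an nsswitch.conf,
# keeping the original as <file>.bak.
# Hypothetical helper; usage:
#   strip_ldap_sources /etc/nsswitch.conf
strip_ldap_sources() {
    nss_file="$1"
    # Drop the word "ldap" together with the whitespace in front of it;
    # other databases (hosts, services, ...) are left untouched.
    sed -i.bak -E \
        '/^(passwd|group|shadow):/ s/[[:space:]]+ldap([[:space:]]|$)/\1/' \
        "$nss_file"
}
```

Copying the .bak file back restores the LDAP lookups once the machine is up.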

Any idea about that ?

It's not a big problem at this time, but I'm working on ldap, so it has to work in the future.

Regards

----------

## sedorox

I've developed some other problems on my test box.. (yea.. the tpm bug finally appeared) but now a few things lag on start.. and I have problems when slapd starts.. it tries to bind to itself.. and other stuff... nsswitch.conf related (looking for users), so I'm hesitant about upgrading my production box. However, I think I'm going to do it package by package, and see what breaks it...

----------

## BernieKe

putting the following in /etc/ldap.conf fixed the udev problem for me:

```
bind_policy soft
```

----------

## sedorox

Ok.. here's the thing... I upgraded my 'production' box slowly, and it isn't baselayout. It's sys-auth/nss_ldap.

The system was running: 239-r1

As soon as I upgraded to 249 I started having issues

Mine is when slapd starts, it tries to bind to itself (isn't this a bad thing?), and of course the same goes for udev, apache, and other things that start before slapd does.

The only fix I've found, besides downgrading, was the 'bind_policy soft' thingy.

Granted, I don't know which version broke this, but at least we know what package it is... Maybe I should file a bug report? (tho I don't know what to report)

----------

## Ausdonky

Hi guys..

After having spent the last 4 hours thinking that my semi-production box had farked itself after a forced reboot (I was getting segfaults from udev?!) I managed to find out that there was nothing wrong with it?! It was ldap.. I managed to boot the bugger and then re-enable ldap in the nsswitch.conf file, but of course this was just a temp fix. Anyway.. after re-enabling ldap I rebooted to see if it still had issues, but this time it just hung on udevd. In a fit of rage I gave the keyboard a good whack and then out of habit hit Ctrl-C, and to my amazement it booted! I would assume that this will cause udev to not load devices after the point I break at, but it will get you to a shell to fix it if you need to (rather than having to boot a livecd or similar)

btw I applied the patch as per above to the udev.rules file and this worked great. I also tried setting the bind_policy to soft but this didn't seem to work..

HTH

Andrew

----------

## cantao

I've had the same problem, as described here:

https://forums.gentoo.org/viewtopic-t-448608-highlight-.html

and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).

I know it's something that can be easily sent to oblivion by a bad etc-update, but nice hack anyway  :Smile: 

Thanks a lot, Cantão!

----------

## sedorox

 *cantao wrote:*   

> 
> 
> and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).
> 
> 

 

This works.. however I found when starting other services (like ldap itself) or apache... etc.. that they try to bind to ldap.. and since, for some reason, it's one of the last things to be started, they fail, so I needed the ldap.conf setting... I wish I knew exactly what caused this in the first place.. it was working so fine until that one package update...

----------

## McManus

 *sedorox wrote:*   

>  *cantao wrote:*   
> 
> and commenting out the appropriate line on /etc/udev/rules.d/50-udev.rules worked flawlessly. No need to mess with /etc/ldap.conf (yes, I'm using ldap also).
> 
>  
> ...

 

I am experiencing exactly the same thing.  Any ideas, short of removing ldap support?

----------

## sedorox

 *McManus wrote:*   

> 
> 
> I am experiencing exactly the same thing.  Any ideas, short of removing ldap support?

 

Sorry it took me a while to get back to you.... here is what I have changed in my ldap.conf that has seemed to work:

```
bind_policy soft
nss_reconnect_tries 3
```

I also still have the tpm device commented out in /etc/udev/rules.d/50-udev.rules

----------

## MorpheuS.Ibis

i am just kind of a n00b in this but i also use LDAP and udev...

what about making nsswitch.conf a symlink and changing it from the local initscript (/etc/conf.d/local.start)? local starts at the end of the boot process, so the network connection should be up and also the LDAP server (if you have it on that machine). also, change the symlink back when stopping the system (/etc/conf.d/local.stop)...

this actually kind of needs to have the local initscript at its disposal (so you don't mess with traffic shaping or something like that when experimenting with LDAP), so maybe creating a separate initscript for it (a copied and slightly edited local should be sufficient) would be a good idea. but there is one more thing.... it's too simple to work, but why not give it a try?   :Wink: 
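An untested sketch of that symlink idea, assuming two prepared files next to nsswitch.conf. The names nsswitch.conf.files and nsswitch.conf.ldap, the ETC_DIR variable and the function use_nsswitch are all made up for illustration.

```shell
#!/bin/sh
# Point the nsswitch.conf symlink at one of two prepared variants.
# Hypothetical helper; usage:
#   use_nsswitch ldap    (e.g. from /etc/conf.d/local.start)
#   use_nsswitch files   (e.g. from /etc/conf.d/local.stop)
use_nsswitch() {
    variant="$1"
    etc_dir="${ETC_DIR:-/etc}"       # overridable for trying it out safely
    target="nsswitch.conf.$variant"
    if [ ! -f "$etc_dir/$target" ]; then
        echo "missing $etc_dir/$target" >&2
        return 1
    fi
    # -n replaces an existing symlink instead of following it.
    ln -sfn "$target" "$etc_dir/nsswitch.conf"
}
```

One caveat: the switch back to the files-only variant happens only on a clean shutdown, so after a crash or power loss the next boot would still see the LDAP variant and could hang on udev again.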

----------

