# dbus and ldap == hangs at boot

## cantao

Hi Friends!

I have a laptop that is used as a desktop most of the time, connected to a LAN. Users authenticate through LDAP and their /home directories are mounted over NFS.

Everything works fine, but if I try to use the laptop disconnected from the LAN, the boot hangs when starting DBUS. According to the few references I have found, the problem seems to be that DBUS tries to look something up through LDAP (via nss_ldap). As the laptop is not connected, there is no LDAP server to answer, so it hangs forever.

Setting bind_policy to soft in /etc/ldap.conf did not help either.

Any hints?

Thanks a lot in advance, Cantão!

----------

## ellingsw

I have a desktop machine that is configured to authenticate users via LDAP and /home is NFS mounted.  My machine is connected to the network all the time, however, no ethernet devices are up by the time D-BUS attempts to start.

During boot my machine hangs for a bit after it displays "Cleaning /tmp directory", then it hangs indefinitely when it gets to "Starting D-BUS system messagebus".  And no, my problem with cleaning /tmp is not due to the tpm devices line in /etc/udev/rules.d/50-udev.rules, because I already commented it out.

As of right now, I've been waiting for my desktop to boot for about 5 hours.  I can't tell you which version of D-BUS is installed because I cannot get my machine booted far enough to look.

Also, one other thing that really bothers me is that if the LDAP server is unavailable I cannot even log in as root on my desktop.

----------

## ellingsw

I was finally able to boot my system to a point where I could disable dbus and reboot.  Some services failed to start because they were unable to find the dbus service, but at least I can log in now.

I currently have dbus-0.60-r4 installed, which was installed on March 19.  I performed an emerge sync, updated to dbus-0.61-r1, then rebooted; however, this did not fix the issue.

----------

## cantao

Hi ellingsw!

Sorry for the delay in responding! It's Sunday *and* I was in the middle of a power outage.

I'm currently physically far from the laptop in question, but what I can tell you now is that the "Cleaning /tmp" issue simply vanished with -- I guess -- the last baselayout update. But the D-BUS issue is really annoying.

I have created a local (non-LDAP) user to be used when disconnected from the net, and took the machine to a presentation at a client's office. My surprise? The laptop simply got stuck at the D-BUS thing. Very uncomfortable to have 5 pairs of eyes staring blankly at you :D

Anyway, I have to fix this @*#($@%&$ for a new presentation on Wednesday. I'll try the latest ~x86 dbus to see what happens. I'll keep in touch in case of success.

It's worth mentioning that my /etc/nsswitch.conf is set to "files ldap". To my understanding, the system should look *first* in local files, then in LDAP if the information (users, groups, whatever) is not available locally. Am I correct about this?

Best regards and good luck, Cantão!

----------

## .:chrome:.

Set this in /etc/ldap.conf:

```
bind_policy soft
```

----------

## cantao

Hi k.gothmog!

Yes, I tried that (it's in the original post). It didn't work and had a side effect: I could no longer ssh into the machine. I had to revert it.

Thanks a lot, Cantão!

----------

## UberLord

 *ellingsw wrote:*   

> During boot my machine hangs for a bit after it displays "Cleaning /tmp directory"

 

It's not hanging, it's working out the order to start services. Due to doing a complex topological sort in bash it's a bit slow. We're trying to rectify this for baselayout-1.13 (tsort is not an option for the curious btw)

----------

## cantao

Hi UberLord!

 *Quote:*   

> It's not hanging, it's working out the order to start services. Due to doing a complex topological sort in bash it's a bit slow. We're trying to rectify this for baselayout-1.13 (tsort is not an option for the curious btw)

 

Yep, the problem here seems to be the D-BUS thing. The question is: if I have "files ldap" in nsswitch.conf, and the messagebus user is local (/etc/passwd, /etc/group), why should the service look into LDAP?

Worse, why can't ellingsw log in as root without LDAP (presuming root is local too)?

Cheers, Cantão!

----------

## UberLord

 *cantao wrote:*   

> Yep, the problem here seems to be the D-BUS thing. The question is: if I have "files ldap" in nsswitch.conf, and the messagebus user is local (/etc/passwd, /etc/group), why should the service look into LDAP?

 

Depends on how dbus works. If it's set to enumerate users/groups then it will look in LDAP regardless of order in nsswitch.conf
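To see the difference between the two kinds of lookup on a given box, something like the following can help. It is only a sketch, assuming glibc's `getent` is installed; the function names `lookup_user` and `count_users` are made up for illustration.

```shell
#!/bin/sh
# Keyed lookup: NSS tries each source on the "passwd:" line in order and
# stops at the first one that knows the name. A user that exists in
# /etc/passwd should therefore be answered by "files" without touching LDAP.
lookup_user() {
    getent passwd "$1"
}

# Enumeration: NSS walks every source on the "passwd:" line, so with
# "files ldap" this has to contact the LDAP server whatever the order is.
count_users() {
    getent passwd | wc -l
}
```

With the network unplugged, timing `getent passwd messagebus` against a bare `getent passwd` should show which of the two is the one that stalls.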

 *Quote:*   

> Worse, why can't ellingsw log in as root without LDAP (presuming root is local too)?

 

I would assume that the rc process hasn't completed, which disables login until it has.

----------

## cantao

Hi UberLord!

 *Quote:*   

> Depends on how dbus works. If it's set to enumerate users/groups then it will look in LDAP regardless of order in nsswitch.conf 

 

Hmm... Very good guess. Upstream problem, perhaps? I'm going to do some research :)

Cheers, Cantão!

----------

## UberLord

It will also do an LDAP lookup if it searches for a user/uid not in /etc/passwd

----------

## cantao

 *UberLord wrote:*   

> It will also do an LDAP lookup if it searches for a user/uid not in /etc/passwd

 

Exactly. But, AFAIK, messagebus is the user related to D-BUS, and it's a local user.

Cheers, Cantão!

----------

## ellingsw

I found the command "chown 0:0 /tmp/.{ICE,X11}-unix" was re-introduced into /etc/init.d/bootmisc after I had already changed it to "chown root:root /tmp/.{ICE,X11}-unix"---I can't remember the bug number.  I have since updated to baselayout-1.12.4-r7 and the command is now commented out by default in the init script.  I have not rebooted yet to see if my problem with "Cleaning /tmp" is fixed or at least does not delay as long.

According to the documentation on nsswitch.conf you are correct, cantao, but it does not appear to work that way.  Of course, this does apply to calls to the getpwent functions.  I don't know if there are other functions that implement password `db' lookups and which programs would use them; however, I doubt /bin/login would use them.

I have the following in /etc/nsswitch.conf:

```
passwd:      files ldap
shadow:      files ldap
group:       files ldap
```

I do not have the bind_policy option in /etc/ldap.conf and according to nss_ldap(5) the default policy is hard_open.  I can try the "bind_policy soft" option but I doubt it will help.  For one, it doesn't work for cantao.  And two, IIRC, login reports an invalid user id and password for root instead of waiting for the LDAP timeout.  root is a local account in /etc/passwd.

UberLord, I am presented with a login prompt so I know the rc process has completed.  I can sit in front of the screen and watch my system go through the boot process but still not be able to login as root at the login prompt if the ethernet interface fails to come up.

----------

## UberLord

 *ellingsw wrote:*   

> UberLord, I am presented with a login prompt so I know the rc process has completed.  I can sit in front of the screen and watch my system go through the boot process but still not be able to login as root at the login prompt if the ethernet interface fails to come up.

 

Is that because the root password on LDAP is different from how it is locally or something?

----------

## ellingsw

 *UberLord wrote:*   

> Is that because the root password on LDAP is different from how it is locally or something?

 

What?!  What difference would root's password in LDAP make if LDAP is unavailable because the network interface is down?  I AM typing in the correct password for the local root account.

As for "Cleaning /tmp".  After updating to baselayout-1.12.4-r7 and rebooting, I do not see a noticeable delay during boot when "Cleaning /tmp".

The "bind_policy soft" option has an unfortunate consequence that makes it unusable.  If bind_policy is set to soft, no user that exists in LDAP can ssh into the box.  When a user tries to ssh in, sshd authenticates the user---and displays /etc/issue in my case---then the user is immediately disconnected by sshd.

Here is proof for reference:

~ $> ssh hostname

This is a private system. If you do not have

an account on this system please disconnect

now.  Connections are logged and any attempt

to hack into this system will be reported to

the appropriate authorities.

Connection to hostname closed by remote host.

Connection to hostname closed.

==> /var/log/auth <==

Sep 10 18:03:28 hostname sshd[24836]: Accepted publickey for username from 192.168.2.101 port 3489 ssh2

Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session opened for user username by (uid=0)

Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable

Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001

Sep 10 18:03:28 hostname sshd[24836]: syslogin_perform_logout: logout() returned an error

Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session closed for user username

==> /var/log/error <==

Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable

Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001

==> /var/log/messages <==

Sep 10 18:03:28 hostname sshd[24836]: Accepted publickey for username from 192.168.2.101 port 3489 ssh2

Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session opened for user username by (uid=0)

Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable

Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001

Sep 10 18:03:28 hostname sshd[24836]: syslogin_perform_logout: logout() returned an error

Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session closed for user username
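The `fatal: login_get_lastlog` line is the telling part: right after nss_ldap reports the server as unavailable, sshd cannot map uid 1001 back to an account and ends the session. To check by hand whether a uid resolves through NSS at a given moment, something like this sketch may help. It assumes `getent` is installed, and the function name `uid_resolves` is just for illustration.

```shell
#!/bin/sh
# Succeeds if NSS can map the given numeric uid to a passwd entry.
uid_resolves() {
    getent passwd "$1" >/dev/null
}

# Example: uid 0 (root) is expected to resolve from the local files.
if uid_resolves 0; then
    echo "uid 0 resolves"
else
    echo "uid 0 does NOT resolve"
fi
```

Running the same check for an LDAP-only uid (1001 in my case) with and without "bind_policy soft" should show whether the lookup itself is what fails.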

----------

## cantao

Well, as ellingsw said, bind_policy soft is a shot in the foot, at least for remote admin :)

I have just given up. I'll duplicate accounts locally on the laptop and free it from LDAP.

Thanks to everybody, Cantão!

----------

## mattsk

I'm having similar problems. Have you tried using the timelimit or bind_timelimit in /etc/ldap.conf?  Or have you tried downgrading versions of nss_ldap?

In my case, if the ldap service isn't started yet (or has been stopped for some reason) then the system is *really* slow. I've worked out that it's the system trying to connect to the ldap server for a user or group search, and I also have

```
files ldap
```

set for the appropriate services in /etc/nsswitch.conf. When this happens, I can't even log in as root at the console - it times out after 60 seconds.

I am reasonably certain that the problem lies with nss_ldap. I recently updated it, and the problems started after that. It's been a while since I've been forced to, but I'm fairly sure I was previously able to log in as root at the console if the ldap server (which resides on the same computer, as it happens) was down for whatever reason. In addition to this, commands run at the command line sometimes took a long time to complete (even ls) - but these symptoms would go away if I removed the 'ldap' keyword on the line entries in /etc/nsswitch.conf. In fact I have to do this to be able to restart the ldap server (since it seems to do an ldap search for the ldap user). In one case so far I even had to reboot the machine into single user mode (done by editing the grub boot parameters on the fly) to edit the nsswitch.conf file just so I could boot.  I haven't tried rebooting since, but I suspect I'll have to do it again.

During my struggles so far, I've tried setting the soft bind option - and it stopped ssh logins from happening for me too (I'd like to know why that is - the debug slapd logs I pored through didn't offer up any clue that I could see). And setting:

```
timelimit 30
bind_timelimit 30
```

in /etc/ldap.conf doesn't seem to have made a difference. Part of the problem seems to stem from the fact that the default behaviour of nss_ldap is to never stop trying to connect to the ldap server.

The only real solution I have when

So with this in mind, you may want to try the above settings (maybe with smaller timeouts), or downgrade your nss_ldap and see if either fixes the problem. I can't remember which version I upgraded from - but the emerge.log seems to indicate that it was 226. I'm currently using 249. I also just realised that I'm running version 239-r1 on one of my other servers, so I'll do some tests and see how that computer behaves when the ldap service is down.

I'd like to know *why* it insists on still checking the ldap service. Fortunately, in my case, it's quite rare for the ldap server to be offline when that computer is online.

----------

## mattsk

Update: my other server running version 239-r1 doesn't skip a beat when the ldap server is down. If I call groups <user> when the ldap server is up I get the groups for that user, and if I call it when the server is down, I get an "unknown user" error.

The servers have identical /etc/nsswitch.conf entries for "passwd:", "group:", and "shadow:".

I just downgraded to that version on the main and all seems to be well. No pauses, and no need to edit nsswitch.conf to restart the ldap server.

I've temporarily solved the problem by putting

```
>=sys-auth/nss_ldap-249
```

in /etc/portage/package.mask

I know this is all a month and a half after you gave up, but I hope this is still helpful.

This seems to be an nss_ldap bug, but I can't find anything else about it on the net so far. Does anybody know if it goes away in later versions? (I notice that versions 250, 250-r1, 252, and 253 are all masked with the ~x86 keyword.)

----------

## cantao

Hi Mattsk!

Thanks a lot for your reply.

Tomorrow I'll have the laptop in my hands, so I can give the various nss_ldap versions a try. I'll see what happens with the unstable versions and I'll post the results here.

 *mattsk wrote:*   

> I can't remember which version I upgraded from - but the emerge.log seems to indicate that it was 226. I'm currently using 249

 

Here's a great package:

```
emerge -v genlop
```

and then:

```
genlop nss_ldap
```

It should tell you the history behind your versions of nss_ldap (or any other package, indeed).

Thanks a lot, Cantão!

----------

## cantao

It only gets worse...

I could not check the laptop (it was travelling), but I performed several updates on some machines (monolithic Xorg -> non-monolithic Xorg, openssl and gnutls), all by the book, with several revdep-rebuilds to make sure everything was ok.

And then, these updated machines are hanging at DBUS also, even connected to the LAN and with the LDAP server working fine.

I'm trying to boot them to re-emerge dbus and nss_ldap. Let's see what happens.

Cheers, Cantão!

----------

## cantao

 *Quote:*   

> I'm trying to boot them to re-emerge dbus and nss_ldap. Let's see what happens.

 

Nope, that's not the problem. Checking /var/log/messages and the init scripts, I discovered that DBUS is trying to connect to LDAP before net.eth0 is up. After a zillion attempts, it gives up and continues.
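A rough way to confirm this from the logs is to count the nss_ldap failures and look at the first one. This is just a sketch: it assumes syslog writes to /var/log/messages and that the failure text matches the `nss_ldap: could not search LDAP server` message from ellingsw's log above; the function names are made up.

```shell
#!/bin/sh
# Count the lines in a log file that report nss_ldap failing to reach the server.
count_nss_ldap_failures() {
    grep -c 'nss_ldap: could not search LDAP server' "$1"
}

# Print the first such line, to see which process triggered it and when.
first_nss_ldap_failure() {
    grep -m 1 'nss_ldap: could not search LDAP server' "$1"
}
```

Calling `first_nss_ldap_failure /var/log/messages` and comparing its timestamp with the moment net.eth0 comes up should show who is asking too early.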

Updating baselayout as an attempt...

Cheers, Cantão!

----------

## blubbi

Now I ran into the same problem.

System is stable and up-to-date.

Only thing that is ~x86 is udev to solve the nss_ldap and udev problem (a user called tss causes a ldap lookup) described here

https://bugs.gentoo.org/show_bug.cgi?id=99564

I worked around this problem in the following way.

I don't have any local users so I added all system users to /etc/ldap.conf:

```
# Append a comma-separated list of every user in /etc/passwd to /etc/ldap.conf
echo "nss_initgroups_ignoreusers $(cut -d: -f1 /etc/passwd | xargs | sed -e 's/ /,/g')" >> /etc/ldap.conf
```
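A variation on the same idea that lists only system accounts and prints the line instead of appending it, so it can be checked before it goes into /etc/ldap.conf. This is a sketch: the UID cutoff of 1000 is an assumption (adjust it to your UID_MIN), and the function name `ignoreusers_line` is made up.

```shell
#!/bin/sh
# Print an nss_initgroups_ignoreusers line built from a passwd-format file.
# Only accounts with a numeric UID below the cutoff (default 1000, an
# assumption) are included.
ignoreusers_line() {
    passwd_file=${1:-/etc/passwd}
    cutoff=${2:-1000}
    users=$(awk -F: -v max="$cutoff" \
        '$3 ~ /^[0-9]+$/ && $3 < max { print $1 }' "$passwd_file" | paste -sd, -)
    printf 'nss_initgroups_ignoreusers %s\n' "$users"
}
```

Once the output looks right, `ignoreusers_line >> /etc/ldap.conf` (as root) adds it.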

My nsswitch.conf looks like this:

passwd: files compat ldap

shadow: files compat ldap

group: files compat ldap

Now the system boots and I can login as root if there is no access to the LDAP.

But still, there are famd, nscd and kdm_config, which try to connect to LDAP. I have no clue what they are trying to look up in LDAP. Okay, nscd tries to cache the users from LDAP, but what about kdm_config and famd?

regards

blubbi

----------

## cantao

Hi Friends!

In fact, I gave up on this... I recreated all users locally on the laptop, without any mention of LDAP in PAM or nsswitch.conf. This way I can use the NFS-mounted /home if the LAN is up, or a local user if the laptop is disconnected.

Not the most elegant solution, but it worked anyway.

Thanks to all, Cantão!

----------

## bunder

i currently have this problem as well.  i believe i'm just going to hardmask the new openldap/nss_ldap/udev versions until they fix these bootup/login bugs.  tbh, this has been a problem for quite some time, and i can't believe that it hasn't been fixed properly.  bind_policy soft and nss_reconnect only make this problem worse.

cheers

----------

## ellingsw

Heads Up!

I had to reboot today because my system was not responding to user input from the console even though I could login remotely.  Upon boot, the problem with my machine hanging indefinitely at boot when it gets to "Starting D-BUS system messagebus" is back.

I was able to get the system booted by restarting and performing an interactive boot... skipping dbus of course when it asked if I wanted to start it.  After the system and NIC were up, I was able to start dbus without a problem.

I have not determined the cause of it this time but I'll see if I can figure it out.

The latest version of nss_ldap on my machine is 249 and was installed on Oct 14, 2006.  My last reboot was Nov 12, 2006, which I did not have a problem with.

I have synced 4 times since then and performed at least 2 updates, with an emerge world in there somewhere after upgrading to gcc 4.1.1-r1.

Since Nov 12, the following packages, which are used during boot, have been updated:

baselayout (1.12.6)

udev (103)

dbus (0.62-r2)

There are other packages that have been updated but I doubt they are related to this issue.

----------

## nbensa

```
nbensa@zeddmore ~ $ cat /etc/portage/package.mask
=sys-auth/nss_ldap-254
=sys-auth/nss_ldap-253
=sys-auth/nss_ldap-252
=sys-auth/nss_ldap-250-r1
=sys-auth/nss_ldap-250
=sys-auth/nss_ldap-249
```

Yes, that's right: 

```
nbensa@zeddmore ~ $ eix nss_ldap
[I] sys-auth/nss_ldap
     Available versions:  239-r1 [m]249 [m](~)250 [m](~)250-r1 [m](~)252 [m]253
     Installed versions:  239-r1(01:00:55 PM 12/25/2006)(-debug)
     Homepage:            http://www.padl.com/OSS/nss_ldap.html
     Description:         NSS LDAP Module
```

So unless you really need features only available in >nss_ldap-239-r1, I suggest downgrading to =nss_ldap-239-r1

----------

## ellingsw

OK, but as I stated:

 *ellingsw wrote:*   

> The latest version of nss_ldap on my machine is 249 and was installed on Oct 14, 2006.  My last reboot was Nov 12, 2006, which I did not have a problem with.

 

My machine was working with nss_ldap-249 and nss_ldap-249 is still installed.

One thing I did notice was /etc/dbus-1/system.d/hal.conf was referencing UID 0 and thought the problem might be the same as the /etc/init.d/bootmisc issue.  I changed the reference to root but it did not help.

I thought I would try to monitor the logs while dbus is trying to start with the network interface down but, after letting it sit there for a couple of hours, nothing was reported in the logs.

----------

## surnu

I just added this to /etc/init.d/dbus:

depend() {

   after nscd dns net

}

I don't understand why there is a dns entry, because I don't want to install a BIND server on my workstation; net seems to be the better option.

Now my workstation doesn't waste time trying to connect to an unreachable LDAP server.

Also, I use nss_ldap-253, which seems to be better than the older ones.

----------

## ellingsw

I thought there were other ways to specify dependencies but couldn't remember till I ran "/etc/init.d/dbus help".

I put the following in /etc/conf.d/dbus:

```
RC_NEED="net"
```

I haven't rebooted yet but this should take care of the problem.
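For anyone scripting this across several machines, here is a small sketch that adds the line only when no RC_NEED setting is present yet. The function name `ensure_rc_need` is made up, and it deliberately leaves an existing RC_NEED line alone rather than trying to merge values.

```shell
#!/bin/sh
# Append RC_NEED="net" to the given conf.d file unless the file already
# contains an RC_NEED assignment.
ensure_rc_need() {
    conf=$1
    if ! grep -q '^RC_NEED=' "$conf" 2>/dev/null; then
        printf 'RC_NEED="net"\n' >> "$conf"
    fi
}
```

Run as root, `ensure_rc_need /etc/conf.d/dbus` can be repeated safely since a second call changes nothing.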

----------

## UberLord

 *surnu wrote:*   

> I don't understand why there is a dns entry, because I don't want to install a BIND server on my workstation

 

Because LDAP may need DNS to resolve the LDAP server?

Besides, that's not forcing a dns server on you. It's just saying IF you have one installed and in a runlevel then dbus is to be started after it.

----------

## UberLord

 *ellingsw wrote:*   

> I thought there were other ways to specify dependencies but couldn't remember till I ran "/etc/init.d/dbus help".
> 
> I put the following in /etc/conf.d/dbus:
> 
> ```
> ...
> ```

 

Yes, that should work nicely.

----------

