# net.eth0, net.eth1 stuck in starting state, but are started

## furlongm

I have a problem with my network: net.eth0 and net.eth1 never seem to reach the started state, even though they are up and fully functional. I wouldn't mind, except that services that depend on them, like sshd, never get started. On startup I get:

```
* WARNING:  icecream is scheduled to start when net.eth0 or net.eth1 has started.
* WARNING:  sshd is scheduled to start when net.eth0 or net.eth1 has started.
```

rc-status shows that icecream and sshd don't get started:

```
Runlevel: default
 gpm                                [ started  ]
 xdm                                [ started  ]
 dbus                               [ started  ]
 famd                               [ started  ]
 hald                               [ started  ]
 lisa                               [ started  ]
 sshd                               [ stopped  ]
 cupsd                              [ started  ]
 ivman                              [ started  ]
 local                              [ started  ]
 mysql                              [ started  ]
 icecream                           [ stopped  ]
 vixie-cron                         [ started  ]
 alsasound                          [ started  ]
```

even when ifconfig shows that both eth0 and eth1 are up:

```
eth0      Link encap:Ethernet  HWaddr 00:12:3F:EA:A4:A6
          inet addr:192.168.1.148  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1451 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2177 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:861225 (841.0 Kb)  TX bytes:280844 (274.2 Kb)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:13:CE:50:C0:2A
          inet addr:192.168.1.107  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:524 errors:1 dropped:1 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:24796 (24.2 Kb)  TX bytes:1317 (1.2 Kb)
          Interrupt:19 Base address:0x8000 Memory:dfcfd000-dfcfdfff
```

/etc/init.d/net.eth0 status shows that it is stuck in the starting state (same for eth1):

```
~ # /etc/init.d/net.eth1 status
 * status:  starting
```

I cannot start icecream or sshd manually until I have restarted (stopped and started) both net.eth0 and net.eth1; only then can I start them by hand.
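
For anyone trying to reproduce this, here is a minimal sketch (not part of the original report) that filters `rc-status`-style output for services that are not in the `started` state. The function name `not_started` is made up for illustration, and the parsing assumes the `name [ state ]` layout shown in the output above.

```shell
# Print "service: state" for every service whose state is not "started".
# Reads rc-status-style lines ("name   [ state ]") on standard input.
not_started() {
    awk '/\[ *[a-z]+ *\]/ {
        state = $0
        sub(/.*\[ */, "", state)   # drop everything up to the opening bracket
        sub(/ *\].*/, "", state)   # drop the closing bracket and what follows
        if (state != "started") print $1 ": " state
    }'
}

# Possible use on a live system (assumes rc-status is in PATH):
#   rc-status | not_started
```

Lines without a bracketed state, such as the `Runlevel:` header, are ignored.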

The relevant lines from /etc/conf.d/rc are as follows, and seem to follow advice given in other posts:

```
RC_PARALLEL_STARTUP="no"
RC_HOTPLUG="yes"
RC_COLDPLUG="yes"
RC_PLUG_SERVICES=""
RC_NET_STRICT_CHECKING="no"
```

Has anyone else seen similar behaviour, or is there a workaround for this? I've searched through previous forum posts and have tried various values for the above variables in /etc/conf.d/rc, but the problem does not go away.

Any ideas?

Thanks,

Marcus.

----------

## UberLord

Does /etc/init.d/net.eth0 restart fix it?

----------

## furlongm

 *UberLord wrote:*   

> Does /etc/init.d/net.eth0 restart fix it?

 

No. Another odd thing is that I have to run net.eth0 restart twice to get it to restart. Same for net.eth1. Output is below.

Once net.eth0 and net.eth1 have both been restarted one time, I can manually start the services. If both haven't been restarted it tells me that they are scheduled to start as before.

```
~ # /etc/init.d/net.eth0 restart
 * Stopping eth0
 *   Loading networking modules for eth0
 *     modules: apipa arping ccwgroup tuntap macchanger macnet rename netplugd iwconfig wpa_supplicant essidnet iptunnel ifconfig iproute2 system dhcpcd ip6to4
 *   Bringing down eth0
 *     Stopping dhcpcd on eth0 ...                                  [ ok ]
 *     Stopping netplug on eth0 ...                                 [ ok ]
 *     Shutting down eth0 ...                                       [ ok ]
 * WARNING:  net.eth0 has already been started.
~ # /etc/init.d/net.eth0 restart
 * Stopping eth0
 *   Loading networking modules for eth0
 *     modules: apipa arping ccwgroup tuntap macchanger macnet rename netplugd iwconfig wpa_supplicant essidnet iptunnel ifconfig iproute2 system dhcpcd ip6to4
 *   Bringing down eth0
 *     Shutting down eth0 ...                                       [ ok ]
 * Starting eth0
 *   Loading networking modules for eth0
 *     modules: apipa arping ccwgroup tuntap macchanger macnet rename netplugd iwconfig essidnet iptunnel iproute2 system dhcpcd ip6to4
 *       netplugd provides plug
 *       iwconfig provides wireless
 *       iproute2 provides interface
 *       dhcpcd provides dhcp
 *   Configuring eth0 for MAC address 00:12:3F:EA:A4:A6 ...         [ ok ]
 *   Starting netplug on eth0 ...                                   [ ok ]
 *     Backgrounding ...
~ # rc-status
Runlevel: default
 gpm                                [ started  ]
 xdm                                [ started  ]
 dbus                               [ started  ]
 famd                               [ started  ]
 hald                               [ started  ]
 lisa                               [ started  ]
 sshd                               [ stopped  ]
 cupsd                              [ started  ]
 ivman                              [ started  ]
 local                              [ started  ]
 mysql                              [ started  ]
 icecream                           [ stopped  ]
 vixie-cron                         [ started  ]
 alsasound                          [ started  ]
```

----------

## UberLord

Not entirely sure what's happening there.

You could try baselayout-1.13.0_alpha6 which may cure it though.

Might be an idea to reboot when changing between 1.12 and 1.13 at this time.

----------

## mambro

I have the same problem...  :Sad:   Anything new?

----------

## furlongm

 *UberLord wrote:*   

> Not entirely sure what's happening there.
> 
> You could try baselayout-1.13.0_alpha6 which may cure it though.
> 
> Might be an idea to reboot when changing between 1.12 and 1.13 at this time.

 

Just got around to trying this now. baselayout-1.13.0_alpha10 does indeed fix the problem, but introduces a few other problems that make it unusable for now (no gettys; mysql not starting; wireless disconnecting every 10 minutes only if wired is connected, otherwise fine).

----------

## UberLord

 *furlongm wrote:*   

> Just got around to trying this now. baselayout-1.13.0_alpha10 does indeed fix the problem, but introduces a few other problems that make it unusable for now (no gettys; mysql not starting; wireless disconnecting every 10 minutes only if wired is connected, otherwise fine).

 

Interesting.

mysql loads fine on my sparc64 box (although that's gentoo freebsd)

Wireless disconnecting every 10 minutes isn't a baselayout issue

The gettys one is sure interesting though. Do you mean that you don't get to the login prompt?

----------

## furlongm

 *UberLord wrote:*   

>  *furlongm wrote:*   Just got around to trying this now. baselayout-1.13.0_alpha10 does indeed fix the problem, but introduces a few other problems that make it unusable for now (no gettys; mysql not starting; wireless disconnecting every 10 minutes only if wired is connected, otherwise fine). 
> 
> Interesting.
> 
> mysql loads fine on my sparc64 box (although that's gentoo freebsd)
> ...

 

mysql is actually starting; the init script just reports that it failed. Further attempts to run /etc/init.d/mysql start|stop|restart give no output apart from "starting" or "stopping".

The wireless issue seems to be a dhcpcd or kernel issue: it gets TKIP replays, but if and only if eth0 (wired) is connected and running dhcpcd too.

As for the gettys, boot seems to get stuck on /etc/init.d/local, so it never gets to run the gettys. /etc/conf.d/local.start looks like this:

```
echo 1024 > /proc/sys/dev/rtc/max-user-freq
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "0x7fffffff" > /proc/sys/kernel/shmmax
hdparm -qW1 -qA1 -qM254 /dev/sda
dmesg > "/var/log/dmesgs/`uname -r`-`date`"
```

but that doesn't seem to be where it hangs. A snippet of the output from ps axf shows that it's getting stuck elsewhere:

```
 6325 ?        Ss     0:00 /sbin/netplugd -i eth0 -P -p /var/run/netplugd.eth0.pid -c /dev/null
 6825 ?        S      0:00  \_ /bin/sh /etc/netplug.d/netplug eth0 in
 6827 ?        S      0:00      \_ /bin/bash /sbin/runscript.sh /etc/init.d/net.eth0 --quiet start
 6857 ?        S      0:00          \_ /bin/bash /sbin/runscript.sh /etc/init.d/net.eth0 --quiet start
 7170 ?        S      0:10              \_ /sbin/dhcpcd eth0
 6691 ?        Ss     0:00 /sbin/wpa_supplicant -Dwext -c/etc/wpa_supplicant.conf -W -B -ieth1 -P/var/run/wpa_supplicant-eth1.pid
 6705 ?        Ss     0:00 /bin/wpa_cli -a/etc/wpa_supplicant/wpa_cli.sh -p/var/run/wpa_supplicant -ieth1 -P/var/run/wpa_cli-eth1.pid -B
 7291 ?        S      0:00  \_ /bin/sh /etc/wpa_supplicant/wpa_cli.sh eth1 CONNECTED
 7305 ?        S      0:00      \_ /bin/bash /sbin/runscript.sh /etc/init.d/net.eth1 --quiet start
 7377 ?        S      0:00          \_ /bin/bash /sbin/runscript.sh /etc/init.d/net.eth1 --quiet start
 7883 ?        S      0:00              \_ /sbin/dhcpcd -m 2000 eth1
 7264 ?        Ss     0:00 /bin/bash /sbin/rc default
 8849 ?        S      0:00  \_ /bin/bash /sbin/rc default
 8850 ?        S      0:00      \_ /bin/bash /sbin/runscript.sh /etc/init.d/local start
 8893 ?        S      0:00          \_ cat /lib/rcscripts/init.d/exclusive/net.eth0
```

```
# ls -l /lib/rcscripts/init.d/exclusive
total 0
prw-r--r-- 1 root root 0 2006-12-20 12:50 local
prw-r--r-- 1 root root 0 2006-12-20 12:49 net.eth0
prw-r--r-- 1 root root 0 2006-12-20 12:49 net.eth1
```

The cat is from function wait_service() in /lib/rcscripts/sh/rc-services.sh (line 112). If I kill /etc/init.d/local start, I get the gettys.
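
The stuck `cat` is consistent with how named pipes behave: opening a FIFO for reading blocks until some process opens it for writing. The sketch below only illustrates that behaviour and is not the baselayout code. `fifo_read_demo` and the `net.demo` file name are hypothetical, and it assumes GNU `timeout` is available.

```shell
# Show that reading a FIFO blocks until a writer opens it.
# Prints "released" if the read returned, "blocked" if it timed out.
fifo_read_demo() {
    with_writer=$1                      # "yes" simulates the service finishing
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/net.demo"              # hypothetical stand-in for exclusive/net.eth0

    if [ "$with_writer" = yes ]; then
        # Opening the FIFO for writing and closing it again wakes the reader.
        ( sleep 1; : > "$dir/net.demo" ) &
    fi

    # Without a writer, cat never gets past open(); timeout kills it after 3s.
    if timeout 3 cat "$dir/net.demo" >/dev/null; then
        echo released
    else
        echo blocked
    fi
    wait
    rm -rf "$dir"
}

# To list leftover FIFOs in the directory shown above:
#   find /lib/rcscripts/init.d/exclusive -type p
```

If the process that is supposed to open the FIFO for writing never does so, the reader waits indefinitely, which would match a `cat` that is still there hours after boot.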

One other thing: shutdown is dodgy too with this baselayout; it hangs halfway through.

----------

## UberLord

Very good information.

What version of dhcpcd?

----------

## furlongm

 *UberLord wrote:*   

> Very good information.
> 
> What version of dhcpcd?

 

The current stable version 2.0.5-r1

----------

## UberLord

Try with 3.0.7, it's much better. And more importantly, the default timeout is 20 seconds and not 60.

----------

## furlongm

 *UberLord wrote:*   

> Try with 3.0.7, it's much better. And more importantly, the default timeout is 20 seconds and not 60.

 

You believe it's a timeout issue? FWIW, the above ps output was taken some hours after booting the machine. I'll try out 3.0.7 and report back anyway.

----------

## UberLord

Ah, you didn't by any chance emerge dhcpcd with the debug use flag? That could cause that. Luckily, that's also been removed in the 3.x code  :Smile: 

I've also posted alpha10-r2 to fix an issue with the new postgresql init script - it may fix your mysql issue, it may not.
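
One way to check which USE flags an installed package was built with is to look at the `USE` file that Portage records for it. A minimal sketch, assuming the usual `/var/db/pkg/<category>/<package>-<version>/USE` layout; `has_use_flag` is a hypothetical helper name, not a Portage tool.

```shell
# Succeed if FLAG appears as a whole word in the given USE file.
has_use_flag() {
    use_file=$1
    flag=$2
    # One flag per line, then require an exact, literal full-line match
    # so that "debug" does not match "debugger" or "-debug".
    tr -s ' \t' '\n\n' < "$use_file" | grep -qxF -- "$flag"
}

# Possible use (the path layout is an assumption, see above):
#   for f in /var/db/pkg/net-misc/dhcpcd-*/USE; do
#       has_use_flag "$f" debug && echo "built with debug: $f"
#   done
```

`emerge -pv dhcpcd` also shows the flags that would be used on the next build.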

----------

## furlongm

 *UberLord wrote:*   

> Ah, you didn't by any chance emerge dhcpcd with the debug use flag? That could cause that. Luckily, that's also been removed in the 3.x code 

 

Aha! Yes I did.. Could that also be the cause of the original problem?

 *UberLord wrote:*   

> I've also posted alpha10-r2 to fix an issue with the new postgresql init script - it may fix your mysql issue, it may not.

 

Ok, I'll sync and report back when I get a chance to reboot.

----------

## ValenceParadigm

I'll be trying dhcpcd 3.0.7 shortly, but is there a problem with 3.0.6?

I've been experiencing bizarre behavior from dhcp lately on the laptop.  

Tonight it wouldn't lease an address from an unencrypted wireless AP that connected fine when I booted up with Win(bl)ows.  dhcpcd actually told me it was "timing out" when I tried it with the -d flag.  Earlier this week I was getting scenarios where all the DHCP information on my (wired) eth0 would just disappear.  A quick "dhcpcd -n eth0" fixed things, until the machine sat idle for a minute or two.

Uberlord:  You also mentioned the default time-out on dhcpcd is now 20 and not 60.  A thought for what's going wrong in my case perhaps? 

-=VP=-

----------

## UberLord

 *furlongm wrote:*   

>  *UberLord wrote:*   Ah, you didn't by any chance emerge dhcpcd with the debug use flag? That could cause that. Luckily, that's also been removed in the 3.x code  
> 
> Aha! Yes I did.. Could that also be the cause of the original problem?

 

YES!

When emerged with the debug USE flag, dhcpcd never forks into the background so it can put the debug stuff on the screen.
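
To illustrate why that matters to the init script: a caller that runs a command in the foreground cannot continue until the command exits, whereas a command that detaches lets the caller carry on at once. This is only a sketch with `sleep` standing in for the daemon; `caller_delay` is a made-up name, and it assumes `date +%s` is supported.

```shell
# Print how many whole seconds the caller is held up by a command.
#   mode "fg": run in the foreground, like a daemon that never forks
#   mode "bg": run detached in the background, like a daemon that forks
caller_delay() {
    mode=$1; shift
    start=$(date +%s)
    if [ "$mode" = bg ]; then
        "$@" >/dev/null 2>&1 &
    else
        "$@" >/dev/null 2>&1
    fi
    end=$(date +%s)
    echo $((end - start))
}

# caller_delay fg sleep 2   # the caller waits for the full two seconds
# caller_delay bg sleep 2   # the caller returns almost immediately
```

With a real daemon in place of `sleep`, the foreground case never returns, so whatever started it stays in the "starting" state.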

----------

## UberLord

 *ValenceParadigm wrote:*   

> I'll be trying dhcpcd 3.0.7 shortly, but is there a problem with 3.0.6?

 

There are no fixes as such in 3.0.7.

However, large chunks of code have been rewritten so that it compiles fine on alternative archs such as SPARC, which address their memory differently from x86.

This was only pointed out when a fellow dev applied some insane code quality CFLAGS to dhcpcd - which are now used by default.

 *Quote:*   

> Tonight it wouldn't lease an address from an unencrypted wireless AP that connected fine when I booted up with Win(bl)ows.  dhcpcd actually told me it was "timing out" when I tried it with the -d flag.  Earlier this week I was getting scenarios where all the DHCP information on my (wired) eth0 would just disappear.  A quick "dhcpcd -n eth0" fixed things, until the machine sat idle for a minute or two.

 

Sounds like encryption stopped dhcpcd from working correctly on the wireless.

On the wired it sounds like there is an error renewing the IP address. Now, we follow RFC 2131 exactly here and it may be that the router is not understanding a RENEW style REQUEST. dhcpcd -n just wakes dhcpcd up again and goes through the loop. Run it with the -d flag and email me all the relevant logs of this happening.

----------

## furlongm

 *UberLord wrote:*   

>  *furlongm wrote:*    *UberLord wrote:*   Ah, you didn't by any chance emerge dhcpcd with the debug use flag? That could cause that. Luckily, that's also been removed in the 3.x code  
> 
> Aha! Yes I did.. Could that also be the cause of the original problem? 
> 
> YES!
> ...

 

Yep, that was the problem alright. Thanks for your help!

Before I revert to baselayout-1.12.6, here are a few more little things I found with 1.13.0-alpha10-r2.

The hanging on reboot solved itself; I'm guessing it was related to the dhcp problem.

dm-crypt:

```
 * Setting up dm-crypt mappings ...
/lib/rcscripts/addons/dm-crypt-start.sh: line 146: dm-crypt-execute-volumes: command not found                           [ok]
```

mysql:

```
* Starting mysql (/etc/mysql/my.cnf)
 * MySQL NOT started (1)                                            [ !! ]
 * ERROR:  mysql failed to start
```

although ps shows that it's running (is it meant to be running twice?)

```
 8667 ?        Ssl    0:00 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/var/lib/mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock
 8673 ?        Ssl    0:00 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --basedir=/usr --datadir=/var/lib/mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock
```

Also, I get an error from start-stop-daemon when shutting down, saying that audio-entropyd hasn't got a pid file in /var/run.
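
For the pid file questions (mysql and audio-entropyd), a small sketch that reports whether a pid file exists and whether the PID in it belongs to a live process. `pidfile_status` is a hypothetical helper, not something baselayout or start-stop-daemon provides.

```shell
# Report whether the PID stored in a pid file refers to a live process.
pidfile_status() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "no pid file"; return 1; }
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "invalid pid file"; return 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process exists
    # (and whether we are allowed to signal it).
    if kill -0 "$pid" 2>/dev/null; then
        echo "running ($pid)"
    else
        echo "stale ($pid)"
        return 1
    fi
}

# Possible use, with the pid file path taken from the ps output above:
#   pidfile_status /var/run/mysqld/mysqld.pid
```

Comparing the reported PID with the two mysqld entries in `ps` would show which of them the pid file actually tracks.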

----------

## UberLord

 *furlongm wrote:*   

> dm-crypt:
> 
> ```
>  * Setting up dm-crypt mappings ...
> 
> ...

 

Known error

https://bugs.gentoo.org/show_bug.cgi?id=157590

 *Quote:*   

> 
> 
> mysql:
> 
> ```
> ...

 

I'll look into this once I find a spare box I can put the hated mysql on.

Which version of mysql is that specifically?

 *Quote:*   

> Also, I get an error from start-stop-daemon when shutting down, saying that audio-entropyd hasn't got a pid file in /var/run.

 

Then its init script is in error - I shall look into this also.

Please open bugs for mysql and audio-entropyd on https://bugs.gentoo.org

You can assign them to me if you wish.

----------

## furlongm

 *UberLord wrote:*   

> 
> 
> I'll look into this once I find a spare box I can put the hated mysql on.
> 
> Which version of mysql is that specifically?
> ...

 

5.0.26-r1

----------

## ValenceParadigm

 *UberLord wrote:*   

> There are no fixes as such in 3.0.7.
> 
> However, large chunks of code have been rewritten so that it compiles fine on alternative archs such as SPARC, which address their memory differently from x86.
> 
> This was only pointed out when a fellow dev applied some insane code quality CFLAGS to dhcpcd - which are now used by default.
> ...

 

Okay that is way kewl - you dev guys are unreal.

 *UberLord wrote:*   

> Sounds like encryption stopped dhcpcd from working correctly on the wireless.

 

I wish that were the case.  The AP is completely unencrypted. Despite my recommendations, the person responsible for this router is satisfied that restricting wireless access by MAC address alone is sufficient.

When I got home the other night I tried it out on my home AP and it seems to work fine.  It could be something with wireless tools, because I'm running WPA at home here.  There's been a lot that's changed with wpa_supplicant too and authenticating against some APs is impossible now. Without wpa_gui, I'm having to learn how to do what I want to with wpa_cli.

 *UberLord wrote:*   

> 
> 
> On the wired it sounds like there is an error renewing the IP address. Now, we follow RFC 2131 exactly here and it may be that the router is not understanding a RENEW style REQUEST. dhcpcd -n just wakes dhcpcd up again and goes through the loop. Run it with the -d flag and email me all the relevant logs of this happening.

 

I'll be headed out to that location again today - I'll see what I can capture.  You could be right about the router being complete crap.  You have to believe me when I say that this office is so completely cursed in a technology sense.  I saw a brand new mainboard completely stop working the instant the machine was brought into the office.   Take it to another location...works fine.  It's like the binary Bermuda triangle. 

Update (2006-12-23 06:30z):

The problem with the wired connection is completely the router's fault.  No software fault at all (my bad!).  It's a D-Link 2310, and has an issue with rebooting itself.  When coupled with ifplugd, that makes for a bit of a lag between ifplugd shutting down eth0 and subsequently bringing it up again and leasing an address.  Thus the dropping of the IP information.

As for the leasing of addresses on a completely unencrypted wireless connection, that's a different topic.  I'm now pretty much convinced that it has nothing to do with dhcpcd.   I'm going to try a few more tests on known working systems before I post next, most likely in its own thread if I find anything significant.

Sorry for the red herring, all.

-=VP=-

----------

