# pppd connection and route issues with lost connectivity

## mondjef

I have a strange issue with the PPPoE connection on my Gentoo router box: the connection seems to get reset at the ISP end and comes back up, but something is still messed up afterwards.  All clients on the network lose internet access, and so does the Gentoo host itself.  I can see that my ppp0 interface is active, but I cannot ping any IP address or host name on the internet.  When I try the command 'route -n' it spits out the first couple of routes, then takes forever to spit out the remaining route entries (like 3-5 mins).  I can resolve this by doing a /etc/init.d/net.ppp0 restart: the interface is reinitialized, 'route -n' prints all routes to the console in a fraction of a second again, and all connectivity is restored.  I am at a loss on how to troubleshoot this....

Background: I had ADSL service for a number of years through a TD-W9970 modem bridged to this Gentoo router box, without any issues whatsoever.  I then switched to cable internet with another provider, using the ISP-provided modem in bridge mode (DHCP), absolutely hated the connection quality, and decided to go back to DSL with the previous provider...only this time on VDSL and at a higher speed than before.

Setup:  VDSL service through a TP-Link TD-W9970 modem which is set to bridge mode.  The Gentoo router box has two NICs: one (eth0) is part of a bridge (br0) which is used for the LAN side, and the other (eth1) is configured as WAN and establishes the connection to the DSL service through the bridged modem.

Configuration:

/etc/conf.d/net

```
config_eth1="null"
config_ppp0=( "ppp" )
link_ppp0="eth1"
plugins_ppp0="pppoe"
username_ppp0="username"
password_ppp0="password"
pppd_ppp0="
defaultroute
usepeerdns
persist
mtu 1492
holdoff 0
maxfail 0
noauth
"
rc_net_ppp0_need="net.eth1"
```

ifconfig

```
br0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet 192.168.0.1  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::21b:21ff:fe3d:eb49  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:3d:eb:49  txqueuelen 1000  (Ethernet)
        RX packets 105310327  bytes 31885573514 (29.6 GiB)
        RX errors 0  dropped 1757  overruns 0  frame 0
        TX packets 211194618  bytes 237049811863 (220.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:eff:fe5b:4076  prefixlen 64  scopeid 0x20<link>
        ether 02:42:0e:5b:40:76  txqueuelen 0  (Ethernet)
        RX packets 2979585  bytes 391973280 (373.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3806654  bytes 478101015 (455.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::21b:21ff:fe3d:eb49  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:3d:eb:49  txqueuelen 1000  (Ethernet)
        RX packets 107830427  bytes 34019995390 (31.6 GiB)
        RX errors 0  dropped 2290  overruns 0  frame 0
        TX packets 211225736  bytes 237967429532 (221.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  memory 0xfdac0000-fdae0000

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::224:1dff:fe21:376e  prefixlen 64  scopeid 0x20<link>
        ether 00:24:1d:21:37:6e  txqueuelen 1000  (Ethernet)
        RX packets 202514579  bytes 239218702935 (222.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 89107333  bytes 25614639891 (23.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 4482554  bytes 616316515 (587.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4482554  bytes 616316515 (587.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ppp0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1492
        inet 172.83.166.119  netmask 255.255.255.255  destination 206.47.14.1
        ppp  txqueuelen 3  (Point-to-Point Protocol)
        RX packets 14185467  bytes 14990181258 (13.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4818845  bytes 1593547290 (1.4 GiB)
        TX errors 0  dropped 23778 overruns 0  carrier 0  collisions 0
```

Do not ask why I have the br0 bridge, as I can't quite remember.  I previously thought I could do away with it, tried, and then realized that I still need it.  It is not the issue regardless.

route -n

```
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         0.0.0.0         0.0.0.0         U     4029   0        0 ppp0
127.0.0.0       127.0.0.1       255.0.0.0       UG    0      0        0 lo
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 br0
206.47.14.1     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
```

I could not find anything in the logs that I thought would be useful.  Everything on the bridged modem seems fine and shows dsl connection is established/line is synced when this happens. Any pointers on where to look or if I have anything incorrectly configured?  I get a disconnect about once every couple of days where I have to manually restart the interface.

----------

## user

Hi mondjef

hanging "route -n" is strange, same for "ip route" (sys-apps/iproute2)?

You can add "debug" to pppd_ppp0 for more log verbosity 

and try "holdoff 3" (there was an old bug with holdoff 0)
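
For reference, a minimal sketch of what those two changes could look like in /etc/conf.d/net, assuming the rest of pppd_ppp0 stays exactly as in the original post. Only the "holdoff" value and the added "debug" line differ. Per the pppd man page, "debug" makes pppd log the control packets it sends and receives through syslog (facility daemon, level debug), so whether they show up depends on the local syslog configuration.

```shell
# /etc/conf.d/net (fragment, sketch only)
# Same options as in the original post, with two changes:
#   holdoff 0 -> holdoff 3   (wait 3 seconds before re-initiating the link)
#   debug                    (log PPP control packets via syslog)
pppd_ppp0="
defaultroute
usepeerdns
persist
mtu 1492
holdoff 3
maxfail 0
noauth
debug
"
```

After editing, the interface needs a restart (/etc/init.d/net.ppp0 restart) for pppd to pick up the new options.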

----------

## mondjef

 *user wrote:*   

> Hi mondjef
> 
> hanging "route -n" is strange, same for "ip route" (sys-apps/iproute2)?
> 
> You can add "debug" to pppd_ppp0 for more log verbosity 
> ...

 

Hi, thanks for the tips.

I changed the holdoff as you had suggested, but no change.

A correction to my original post, though....the command 'route' hangs, but 'route -n' and the other command you suggested, 'ip route', do not hang during this issue.  Full disclosure, I have no idea what the -n flag does for the route command but the documentation says it displays the routing table in full numeric form.  Not sure how or why it would make a difference as to why this command would hang or not in my case though.

I did some more testing by rebooting the modem.  When the modem comes back online and syncs the DSL line, there is absolutely no activity from the pppoe side on the gentoo box to re-establish the ppp0 connection.  Actually, the ppp0 interface never goes away and just keeps its previously assigned dynamic public IP address.  Of course, after the modem has been rebooted this IP is no longer valid and all internet connectivity is lost.  To recover, I restart the ppp0 interface manually; it receives a new IP address, the routing table is updated and all internet connectivity is restored.  It is as if ppp0/pppd does not notice there was a loss of connectivity on that interface at all, and thus is not aware it needs to do anything to make things right.

This gentoo box has my firewall and traffic shaping for my entire network, and I would rather not have to fiddle with it or switch to the router features on the tp-link vdsl modem.  I am thinking of maybe working around this issue by taking the vdsl modem out of bridge mode and having it establish the pppoe connection directly with the ISP, but I am not sure how to then connect this to eth1 and configure it so it can reach the internet connection of the modem while all traffic on this interface goes through my firewall as it does now and bypasses any firewall or dhcp servers on the modem.

----------

## mike155

 *mondjef wrote:*   

> Full disclosure, I have no idea what the -n flag does for the route command but the documentation says it displays the routing table in full numeric form.  Not sure how or why it would make a difference as to why this command would hang or not in my case though.

 

Well, "route -n" prints the routing table. "route -n" usually doesn't hang.

"route" (without -n) also prints the routing table, but it tries to convert IP addresses to names. In order to do that, it makes calls to the DNS stub resolver. If the DNS stub resolver can't send queries to DNS servers, it looks like "route" is hanging. But, in reality, it's the DNS stub resolver that waits for answers from DNS servers.

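A rough way to confirm that it is the resolver and not the routing table that stalls could look like the sketch below. It assumes "getent" (glibc) and "timeout" (coreutils) are installed; the helper name check_resolver is made up for illustration. Given an IP address, "getent hosts" does a reverse lookup through the system resolver, which is close to what "route" without -n attempts for each entry. If this helper reports a failure while "route -n" and "ip route" still return instantly, the delay comes from name resolution.

```shell
#!/bin/sh
# check_resolver NAME_OR_IP [SECONDS]
# Hypothetical helper: query the system resolver for NAME_OR_IP and give up
# after SECONDS (default 5) instead of waiting for the full resolver timeout.
check_resolver() {
    name="$1"
    secs="${2:-5}"
    if timeout "$secs" getent hosts "$name" >/dev/null 2>&1; then
        echo "resolver OK: $name"
    else
        echo "resolver failed or timed out: $name"
    fi
}

# Example use while the connection is in the broken state
# (206.47.14.1 is the ppp0 peer address from the ifconfig output above):
#   check_resolver 206.47.14.1 5
#   route -n
```
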
----------

## mondjef

 *mike155 wrote:*   

>  *mondjef wrote:*   
> > Full disclosure, I have no idea what the -n flag does for the route command but the documentation says it displays the routing table in full numeric form.  Not sure how or why it would make a difference as to why this command would hang or not in my case though.
> 
> Well, "route -n" prints the routing table. "route -n" usually doesn't hang.
> 
> "route" (without -n) also prints the routing table, but it tries to convert IP addresses to names. In order to do that, it makes calls to the DNS stub resolver. If the DNS stub resolver can't send queries to DNS servers, it looks like "route" is hanging. But, in reality, it's the DNS stub resolver that waits for answers from DNS servers.

 

Yes thanks mike155, that makes sense now!  I run pihole in docker container on the same gentoo box and use pihole for both dhcp and dns configured to use cloudflare so when the route is broken any dns calls would fail/time out and that would explain why route command appears to hang while route -n does not at all.

I wish I could find a solution to my issue nevertheless, though....  :(

----------

## pa4wdh

 *Quote:*   

> When I try the command 'route -n' it spits out the first couple routes then takes for ever to spit out the remaining route entries (like 3-5 mins).

 

This sounds like a problem with DNS and waiting for timeouts.

In your pppd_ppp0 you have usepeerdns. This means pppd will overwrite /etc/resolv.conf with the DNS servers provided via the ppp link, which probably isn't what you want; I guess you want to keep using your pihole. Removing usepeerdns from your pppd_ppp0 might solve your problem, but don't forget to configure /etc/resolv.conf the way you want it to be.

----------

## mondjef

 *Quote:*   

> This sounds like a problem with DNS and waiting for timeouts.

 

Yes, while this is the cause of the route command hanging, it is not the cause of the ppp0 connection issues.  Pinging ip addresses does not work either when in this state which takes DNS out of the picture.  The ultimate issue is the route becomes no longer valid and the ppp0 connection is not re-established and the route never gets updated which would restore the DNS.

 *Quote:*   

> In your pppd_ppp0 you have usepeerdns. This means pppd will overwrite /etc/resolv.conf with the DNS servers provided via the ppp link, which probably isn't what you want; I guess you want to keep using your pihole. Removing usepeerdns from your pppd_ppp0 might solve your problem, but don't forget to configure /etc/resolv.conf the way you want it to be.

 

This is only for the host (the gentoo box itself); all other requests from the network are routed through pihole, which listens on the br0 bridge interface that is bridged to eth0.  I can remove this setting and add the name server 192.168.0.1 (ip of br0), with no change to my original issue.  The reason I don't run it this way is that pihole runs in a docker container which fires up later in the boot process, and there are other processes on the host machine that look for dns before that, ntp being one of them.

----------

## pingtoo

First, a clarification: I don't know much about ppp/pppoe, so I could be entirely wrong.

I found this quite interesting so I did some research this morning. I think in your case the gentoo router's pppd has no way to know the link went down, so it retains the original IP configuration. Therefore all packets through the ppp0 interface go nowhere, and this propagates to the upper layers; DNS, for example, hangs because it is waiting for a timeout.

A workaround would be to turn on "lcp-echo-failure 3" and "lcp-echo-interval 10".

These settings should cause pppd to terminate the connection if your ISP drops it.

However, because openrc does not monitor service availability, you will need to use another service monitoring tool to restart pppd.
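
As a sketch of how the two options would fit into the configuration from the first post (assuming every other line of pppd_ppp0 is left unchanged): "lcp-echo-interval 10" makes pppd send an LCP echo-request to the peer every 10 seconds, and "lcp-echo-failure 3" makes it treat the peer as dead after 3 echo-requests go unanswered, so a silent drop should be noticed after roughly 30 seconds.

```shell
# /etc/conf.d/net (fragment, sketch only)
# Original options plus LCP echo keepalives so pppd can detect a dead peer.
pppd_ppp0="
defaultroute
usepeerdns
persist
mtu 1492
holdoff 0
maxfail 0
noauth
lcp-echo-interval 10
lcp-echo-failure 3
"
```

According to the pppd man page, "persist" tells pppd not to exit when a connection is terminated but to try to reopen it, and "maxfail 0" removes the limit on consecutive failed attempts. With both already set, pppd may well redial on its own once the echo failure is detected, without an external monitor.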

----------

## mondjef

I have added both the lcp-echo-failure and lcp-echo-interval options as pingtoo suggested, as they look promising and I think I have seen them in other configurations I have stumbled across on the net before.  I am not quite sure whether, when these options detect that the line is down, pppd will terminate or will restart the ppp0 interface.  I will monitor over the next few days and see what happens and will report back...fingers crossed.

----------

## mondjef

After adding the settings suggested by pingtoo and monitoring for a week now I can say that they seem to be making a difference.  Last night my ISP dropped my connection and the connection was re-established with a new ip address and all seems to be ok.

----------

