# Funky network delays [solved...?]

## Solra Bizna

<background>

My network is on the 192.168.0.0/22 subnet. I have a "primary" router (an AirPort Extreme*) which routes to 192.168.0.0/22, a secondary router (an AirPort Express) which bridges between two floors of this building, and a tertiary router (my Gentoo box) which routes to the same 192.168.0.0/22 as the AirPort Extreme. Machines that want to use this box for routing use 192.168.0.254 as their router. It's DMZ'ed on my primary router, so I can use it to forward ports, etc. This setup was working, until about 12 hours ago.

One of my servers has been threatening to fail for some time (cheap hardware). Yesterday afternoon (almost exactly 24 hours ago, in fact) its boot hard disk finally died. I managed, however, to add an extra hard disk to my tertiary router, copy everything from the dead server to the extra disk, and hack up a quick initscript to chroot into the extra disk and start a few needed servers. After a little tweaking, I had everything working (from an external standpoint) exactly as it had before the failure. Unfortunately...

</background>

All network stuff involving that router has an inexplicably high delay. Listing the routing table on it takes six to eight seconds per entry, and the same goes for iptables entries (I have over 20). Additionally, after about five minutes of operating normally (aside from the above), the router suddenly stops forwarding packets. Rebooting the router causes it to forward packets again for a few minutes, but then I'm back to square one.

```
sbizna ~$ cat /proc/sys/net/ipv4/ip_forward
1
sbizna ~$ time sudo route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     *               255.255.252.0   U     0      0        0 eth0
loopback        localhost       255.0.0.0       UG    0      0        0 lo
default         192.168.0.1     0.0.0.0         UG    0      0        0 eth0

real    0m20.013s
user    0m0.006s
sys     0m0.007s
```

I am totally stumped.

The machine is a 1.6GHz Pentium IV with 512MB of RAM. It is completely responsive in every way not involving the network.

After it stops forwarding IP, I can still get to the services running directly on the box, as long as I don't mind waiting a while for the initial connection.

```
(from my laptop)
sbizna ~$ time ssh 192.168.0.254 true

real    0m10.373s
user    0m0.121s
sys     0m0.024s
sbizna ~$ ping -c 5 192.168.0.254
PING 192.168.0.254 (192.168.0.254) 56(84) bytes of data.
64 bytes from 192.168.0.254: icmp_seq=1 ttl=64 time=1.86 ms
64 bytes from 192.168.0.254: icmp_seq=2 ttl=64 time=3.40 ms
64 bytes from 192.168.0.254: icmp_seq=3 ttl=64 time=4.03 ms
64 bytes from 192.168.0.254: icmp_seq=4 ttl=64 time=1.88 ms
64 bytes from 192.168.0.254: icmp_seq=5 ttl=64 time=2.30 ms

--- 192.168.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 1.867/2.696/4.031/0.872 ms
```

My partition setup (in case something is relevant here):

```
sbizna ~$ mount
/dev/hda2 on / type ext3 (rw,noatime)
/dev/hda5 on /etc type reiserfs (rw,nosuid,nodev,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw)
/dev/hda3 on /mnt/readyroom type reiserfs (rw,nodev,noatime)
/dev/hda6 on /mnt/tenforward type reiserfs (rw,nosuid,nodev,noatime,usrquota)
/mnt/readyroom/usr on /usr type none (rw,bind)
/mnt/readyroom/var on /var type none (rw,bind)
/mnt/readyroom/tmp on /tmp type none (rw,bind)
/mnt/tenforward/home on /home type none (rw,bind)
/mnt/tenforward/mysql on /var/lib/mysql type none (rw,bind)
/dev/hdb1 on /mnt/sickbay type ext3 (rw,noatime,usrquota)
/usr/portage on /mnt/sickbay/usr/portage type none (rw,bind)
/dev on /mnt/sickbay/dev type none (rw,bind)
/sys on /mnt/sickbay/sys type none (rw,bind)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,devmode=0664,devgid=85)
none on /mnt/sickbay/proc type proc (rw,noexec,nosuid,nodev,noatime)
```

The services I'm running inside the chroot are: metalog (0.7), sshd (OpenSSH_3.9p1, port changed to avoid conflict), apache2 (2.0.50, port changed), mysql (4.0.20, port changed), postfix (2.1.5-r2), atalk (for afpd 1.6.4).

-:sigma.SB

*If it were up to me, the Gentoo box would be the only router, which would save me an awful lot of configuration troubles and random failures...

Edit: It appears to be a case of a badly mistimed change in my ISP's policy regarding incoming TCP packets. ISP switch time.

Last edited by Solra Bizna on Fri Jan 06, 2006 12:43 am; edited 2 times in total

----------

## NeddySeagoon

Solra Bizna,

Do

```
 ifconfig -a
```

I bet you have loads of errors on one of your interfaces, so things are being resent.

----------

## jamapii

Maybe there's some kind of loop in the network?

Can you make an ascii art to make it clearer?

----------

## Solra Bizna

Okay, now that I've given my ISP a talking-to, they're not blocking incoming traffic, and everything's working. However, I can't rest, as there's still the funky delay on getting iptables/route stuff, and about 1/3 of the HTTP connections from inside the LAN to outside get dropped partway through.  :Confused: 

 *NeddySeagoon wrote:*   

> Solra Bizna,
> 
> Do
> 
> ```
> ...

 

```
sbizna ~$ sudo ifconfig -a
dummy0    Link encap:Ethernet  HWaddr 2A:0A:FE:FA:E8:E2
          BROADCAST NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 00:02:A5:F2:7A:A2
          inet addr:192.168.0.254  Bcast:192.168.0.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5257 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4859 errors:0 dropped:0 overruns:0 carrier:0
          collisions:6 txqueuelen:1000
          RX bytes:604810 (590.6 Kb)  TX bytes:1311211 (1.2 Mb)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:848 errors:0 dropped:0 overruns:0 frame:0
          TX packets:848 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:178576 (174.3 Kb)  TX bytes:178576 (174.3 Kb)
```

 *jamapii wrote:*   

> Maybe there's some kind of loop in the network? 
> 
> Can you make an ascii art to make it clearer?

 Here's my attempt:

```
WAN <--> modem <--> Primary Router <--> Ground Floor Wired Machines
                        #   #
                        #   ########### Ground Floor Wireless Machines
                        #
                  Secondary Router <--> Second Floor Wired Machines
                        #           \-> Geordi (tertiary router)
                        #
                        ############### Second Floor Wireless Machines

### = wireless connection
<-> = wired connection
```

That's only the physical routes, though. For example, if Arkady (my personal laptop) is connected to wireless on the second floor and makes a connection to outside, the packets are routed as follows: Arkady ## Secondary -> Geordi -> Secondary ## Primary -> WAN. Packets within the LAN simply follow the shortest physical route.

I know it's not terribly efficient, but it suffices. And the logistics of the situation seem to demand more use of wireless than would otherwise be prudent...if only we had another NIC and a few more long Ethernet cables :/

-:sigma.SB

----------

## think4urs11

 *Solra Bizna wrote:*   

> inet addr:192.168.0.254  Bcast:192.168.0.255  Mask:255.255.252.0

 

Maybe that's not the real problem with your setup, but these settings are inconsistent.

Bcast and Mask do not match. As you are using a /22, the bcast should be 192.168.3.255, shouldn't it?
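The arithmetic behind this can be checked with a minimal sketch using Python's standard `ipaddress` module. The address, mask, and reported broadcast are the ones from the quoted `ifconfig` line; the variable names are only illustrative.

```python
import ipaddress

# Address and netmask as reported for eth0 in the quoted ifconfig output.
iface = ipaddress.ip_interface("192.168.0.254/255.255.252.0")

# The network this interface belongs to, and the broadcast address
# that follows from the netmask.
network = iface.network
expected_bcast = network.broadcast_address

# The broadcast address ifconfig actually reported.
reported_bcast = ipaddress.ip_address("192.168.0.255")

print(network)                           # 192.168.0.0/22
print(expected_bcast)                    # 192.168.3.255
print(reported_bcast == expected_bcast)  # False
```

The reported 192.168.0.255 is what a /24 (mask 255.255.255.0) would give, not a /22.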

----------

## Solra Bizna

Indeed, they are inconsistent.

```
laforge ~ # grep eth0 /etc/conf.d/net
config_eth0=( "192.168.0.254 netmask 255.255.252.0" )
routes_eth0=( "default gw 192.168.0.1" )
```

Fortunately, no machine is actually outside of 192.168.0.0/24 at the moment, so this hasn't caused any problems.

Speaking of problems, the weird network delays and cutoffs I've been experiencing have magically disappeared.  :Confused: 

-:sigma.SB

----------

## sasq

 *NeddySeagoon wrote:*   

> Do
> 
> ```
>  ifconfig -a
> ```
> ...

 

I have errors on my eth0 iface, and HTTP requests from my LAN PCs are sent extremely slowly. On the router itself everything is OK; the net works at the speed of light :Smile:

My router has two ifaces: eth0 for LAN, and ppp0 for Internet [ADSL modem].

Here's my output from ifconfig -a:

```
eth0      Link encap:Ethernet  HWaddr 00:0E:2E:33:93:F8
          inet addr:10.0.0.1  Bcast:10.255.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:239233 errors:35269 dropped:0 overruns:0 frame:0
          TX packets:276742 errors:0 dropped:0 overruns:8 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25773042 (24.5 Mb)  TX bytes:199567085 (190.3 Mb)
          Interrupt:11 Base address:0xe400
          ....

ppp0      Link encap:Point-to-Point Protocol
          inet addr:83.30.221.45  P-t-P:213.25.2.188  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1492  Metric:1
          RX packets:281908 errors:0 dropped:0 overruns:0 frame:0
          TX packets:240655 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:201954649 (192.5 Mb)  TX bytes:21064352 (20.0 Mb)
```

The question is:

How do I deal with these errors on eth0? Do they have something to do with the MTU?
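To put a number on those errors, here is a rough Python sketch that pulls the RX counters out of an `ifconfig` line and computes errors relative to received packets. The function name `rx_error_ratio` and the regular expression are illustrative, and the ratio assumes the `packets` counter does not already include the errored frames.

```python
import re


def rx_error_ratio(ifconfig_text: str) -> float:
    """Return RX errors divided by RX packets for the first RX line found."""
    match = re.search(r"RX packets:(\d+) errors:(\d+)", ifconfig_text)
    if match is None:
        raise ValueError("no 'RX packets:... errors:...' line found")
    packets, errors = int(match.group(1)), int(match.group(2))
    if packets == 0:
        return 0.0
    return errors / packets


# The eth0 RX line from the ifconfig output above.
eth0_rx = "RX packets:239233 errors:35269 dropped:0 overruns:0 frame:0"
ratio = rx_error_ratio(eth0_rx)
print(f"{ratio:.1%}")  # 14.7%
```

For comparison, ppp0 in the same output reports zero RX errors.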

----------

## NeddySeagoon

sasq,

It's either hardware (the card or cable), or the system cannot handle the data rate on the interface and data is being lost.

Rarely, it's a bug in the kernel module, but such bugs normally stop all data, so they don't get released.

Try substituting hardware, if you have some spare.

----------

## sasq

No, it's not a hardware problem, I'm sure.

Why?

Because on that same hardware I had an NND Linux router and everything was OK. Since I installed Gentoo on the same box, none of my LAN PCs can upload anything, and HTTP requests are sent very slowly or never :Razz: I remember this is already the second time that I can't use Gentoo as a router/masquerade, and the problem is the same :Sad:

I think there may be a problem with the MTU, because my ISP [Polish Telecommunication :Razz:] sets a very strange MTU on its ADSL modem interface: 1492 instead of the standard 1500. Maybe this is the problem: packets coming in through the eth0 iface [from the LAN] are sized 1500 and cannot be forwarded through ppp0, which is set to 1492. The other way is OK, because smaller packets from ppp0 can be sent through eth0. Connections established from the router itself are OK and everything goes fast.

But I don't know :J You are the masters :Smile:

----------

## NeddySeagoon

sasq,

It's not MTU. If your MTU is really as you say, the 1500-byte packets will be split into two.

This makes it slow, because the overhead is doubled and there is the work involved in the splitting.

It will still work without errors.
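The split described here can be worked through with a short Python sketch. It assumes plain IPv4 with a 20-byte header and no options, and that the Don't Fragment flag is not set (with DF set, a router drops the packet and sends back an ICMP "fragmentation needed" message instead of splitting it). The function name `fragment_sizes` is made up for illustration.

```python
from typing import List


def fragment_sizes(packet_len: int, mtu: int, header_len: int = 20) -> List[int]:
    """Return the total length (header + payload) of each IPv4 fragment."""
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation needed
    payload = packet_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(header_len + chunk)
        payload -= chunk
    return sizes


# A full 1500-byte packet from eth0 forwarded over ppp0 (MTU 1492).
print(fragment_sizes(1500, 1492))  # [1492, 28]
```

So each full-size packet becomes one 1492-byte fragment plus a 28-byte one that carries only 8 bytes of payload behind its own 20-byte header.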

----------

## sasq

All right, maybe it's not MTU, but my problem is still there :Sad:

Why do packets from the LAN to the internet go slowly or not at all? Where do those errors come from? It's so weird that I don't even know where to look for the cause :Confused:

----------

## NeddySeagoon

sasq,

Try swapping the network cable or network card.

----------

## sasq

OK, I'll try, but as I mentioned above, all the hardware remains the same; nothing was changed. The only change is replacing NND Linux with Gentoo Linux.

----------

## PaulBredbury

Check the DNS, regarding the B.ROOT-SERVERS.NET change in Jan-2006.
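One way to act on this hint: plain `route` and `iptables -L` resolve addresses to names unless given `-n`, so slow or failing DNS can leave them waiting while the CPU stays idle. That would fit the `time sudo route` output at the top of the thread (20 s real, almost no user/sys time), but the link is an assumption, not something the thread confirms. A minimal Python sketch for timing a reverse lookup follows; the helper name `timed_reverse_lookup` and its `resolver` parameter are illustrative.

```python
import socket
import time


def timed_reverse_lookup(address: str, resolver=socket.gethostbyaddr):
    """Return (seconds_elapsed, hostname_or_None) for one reverse lookup."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        hostname = resolver(address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return time.monotonic() - start, hostname


# Example use on the router itself (does real DNS queries, so it may block):
#   for addr in ("192.168.0.1", "192.168.0.254"):
#       print(addr, timed_reverse_lookup(addr))
```

If each lookup takes several seconds, comparing `time route` against `time route -n` should show the same gap.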

----------

