# ARP problem

## Mike81

Hi,

Well, previously I thought it was just a Gentoo problem, but now I can reproduce it on both Debian and Gentoo. It is not easy to reproduce: an empty ARP table is not enough, I also have to wait ~10 min without any traffic to the gateway. I'll update the thread:

We are experiencing the following problem in our network:

The systems haven't sent any packets to the router for a while (I am logged in to these systems via SSH over the LAN).

If you look at the ARP table now, it looks like this:

```
gentoo-test ~ # arp -a -n
? (192.168.5.10) at 8c:89:a5:XX:XX:XX [ether] on eth0
```

192.168.5.10 is the Windows system I am connected from via SSH.
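To check from a script whether the gateway currently has a resolved entry (instead of eyeballing `arp -a -n`), Linux exposes the IPv4 ARP cache as text in `/proc/net/arp`. Below is a minimal parser, assuming the usual six-column layout of that file; the function names are hypothetical, and the flag values are the `ATF_COM`/`ATF_PERM` bits from `<net/if_arp.h>`. Note that `/proc/net/arp` does not show the NUD state (REACHABLE, STALE, etc.); `ip neigh show` does.

```python
from typing import Dict, List

ATF_COM = 0x02   # entry is complete: the MAC address has been resolved
ATF_PERM = 0x04  # permanent (static) entry


def parse_proc_net_arp(text: str) -> List[Dict[str, str]]:
    """Parse the contents of /proc/net/arp into one dict per entry."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore malformed lines
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        entries.append({"ip": ip, "flags": flags, "mac": mac, "dev": dev})
    return entries


def has_complete_entry(text: str, ip: str) -> bool:
    """True if `ip` has an entry whose ATF_COM flag is set (MAC known)."""
    return any(
        e["ip"] == ip and int(e["flags"], 16) & ATF_COM
        for e in parse_proc_net_arp(text)
    )
```

On a Linux host, `has_complete_entry(open("/proc/net/arp").read(), "192.168.5.254")` would report whether the router's entry is currently resolved.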

When you then ping google.com, for example, it takes a while (and fails at first):

```
gentoo-test ~ # ping google.com
PING google.com (173.194.35.128) 56(84) bytes of data.
From gentoo-test.intern (192.168.5.147): icmp_seq=1 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=2 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=3 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=4 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=5 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=6 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=7 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=8 Destination Host Unreachable
From gentoo-test.intern (192.168.5.147): icmp_seq=9 Destination Host Unreachable
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=10 ttl=58 time=16.5 ms
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=11 ttl=58 time=15.8 ms
64 bytes from muc03s01-in-f0.1e100.net (173.194.35.128): icmp_seq=12 ttl=58 time=16.1 ms
^C
--- google.com ping statistics ---
12 packets transmitted, 3 received, +9 errors, 75% packet loss, time 11128ms
rtt min/avg/max/mdev = 15.849/16.167/16.525/0.313 ms, pipe 4
```

While pinging, tcpdump on gentoo-test captures the following:

```
12:55:28.710420 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:29.711018 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:30.709549 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:31.739028 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:32.737369 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:33.751372 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:34.781021 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:35.779318 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:36.779004 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:37.822902 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821268 00:0c:29:XX:XX:XX > ff:ff:ff:ff:ff:ff, ARP, length 42: Request who-has 192.168.5.254 tell 192.168.5.147, length 28
12:55:38.821894 bc:05:43:XX:XX:XX > 00:0c:29:XX:XX:XX, ARP, length 60: Reply 192.168.5.254 is-at bc:05:43:XX:XX:XX, length 46
```
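The capture can be quantified with a short script. This sketch assumes tcpdump's default `HH:MM:SS.ffffff` timestamps and the link-level (`-e`-style) line format shown above; it counts the ARP requests for one address and measures the time from the first request to the first reply. `arp_resolution_delay` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# One tcpdump ARP line: timestamp, link-level addresses, then either
# "Request who-has <ip> tell <ip>" or "Reply <ip> is-at <mac>".
ARP_LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) .*ARP, .*?: "
    r"(?P<kind>Request who-has|Reply) (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def arp_resolution_delay(
    lines: Iterable[str], target_ip: str
) -> Tuple[int, Optional[float]]:
    """Count ARP requests for target_ip and return the seconds from the
    first request to the first reply (None if no reply was seen)."""
    requests = 0
    first_request: Optional[datetime] = None
    for line in lines:
        m = ARP_LINE.match(line.strip())
        if m is None or m.group("ip") != target_ip:
            continue  # not an ARP line, or about a different address
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        if m.group("kind") == "Reply":
            if first_request is not None:
                return requests, (ts - first_request).total_seconds()
        else:
            requests += 1
            if first_request is None:
                first_request = ts
    return requests, None
```

Fed the capture above, it reports 11 requests and about 10.1 s until the reply, which roughly matches the nine "Destination Host Unreachable" lines before the first successful ping. Since the requests go out about once per second, the host appears to keep retrying normally; the delay seems to lie in getting an answer from 192.168.5.254. Timestamps that cross midnight are not handled.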

We see this on almost every system in our network (although it is hard to make sure there is no traffic for a while). Once the ARP entry for the router has timed out, it takes some time to get it back.

So, for example, you can be sure that the first connection attempt to any server on the internet will fail with an error like "no route to host". The second and all further attempts succeed until the ARP entry is removed again (by the normal timeout after a period of inactivity).
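On Linux, how long an entry stays valid and how the kernel retries resolution are governed by per-interface neighbour tunables under `/proc/sys/net/ipv4/neigh/<interface>/` (described in the kernel's `ip-sysctl` documentation). Comparing them between the Gentoo and Debian hosts might help narrow things down; for instance, the roughly one-second spacing of the requests in the capture is consistent with the usual `retrans_time_ms` default of 1000. The sketch below just reads a few of them; the function name and the selection of parameters are assumptions, and `root` is a parameter only so it can be pointed at a test directory.

```python
from pathlib import Path
from typing import Dict, Optional

# A selection of Linux neighbour (ARP) tunables that exist per interface.
PARAMS = (
    "base_reachable_time_ms",  # base for how long an entry counts as REACHABLE
    "gc_stale_time",           # seconds before a stale entry may be purged
    "retrans_time_ms",         # delay between resolution retries
    "mcast_solicit",           # broadcast/multicast probes before giving up
    "ucast_solicit",           # unicast probes when re-validating an entry
)


def read_neigh_params(
    dev: str = "eth0", root: str = "/proc/sys/net/ipv4/neigh"
) -> Dict[str, Optional[int]]:
    """Read the tunables for one interface; missing/unparsable files -> None."""
    base = Path(root) / dev
    values: Dict[str, Optional[int]] = {}
    for name in PARAMS:
        try:
            values[name] = int((base / name).read_text().strip())
        except (FileNotFoundError, ValueError):
            values[name] = None
    return values
```

Running `read_neigh_params("eth0")` on each affected host and diffing the results is a quick way to rule out non-default settings.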

So my question is:

What could be the reason for that?

On most systems we never noticed the problem, because there is "always" traffic between the system and the router, so the ARP entry never times out. But when we make sure there is no traffic for ~10-15 min, the next attempt takes some time. We are currently trying to reproduce it on some Windows systems, too.

What could it be?

Switch?

Router?

What could we do to prevent it? Set a static ARP entry? Ping the gateway periodically so the entry never times out? That sounds like a bad hack...

Could somebody confirm that this isn't normal, i.e. that an idle system should get the ARP entry back within milliseconds?
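For reference, the "ping the gateway" workaround amounts to sending some traffic to 192.168.5.254 more often than its entry would expire, so the kernel keeps using (and re-validating) the neighbour entry. This sketch swaps ICMP for a one-byte UDP datagram, because raw ICMP sockets usually need root; the port (9, the discard service), the interval, and the function names are all assumptions. It only hides the symptom and does not explain why the router answers so late.

```python
import socket
import time


def send_keepalive(gateway: str, port: int = 9) -> int:
    """Send one single-byte UDP datagram to `gateway`; return bytes sent.

    Sending any IPv4 packet to an on-link address makes the kernel use
    (and, if necessary, re-resolve via ARP) its neighbour entry for it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(b"\x00", (gateway, port))


def keepalive_loop(gateway: str, interval_s: float = 60.0, rounds: int = 0) -> int:
    """Send a keepalive every `interval_s` seconds; run forever if rounds == 0.

    Returns the number of datagrams sent (only reached when rounds > 0).
    """
    sent = 0
    while rounds == 0 or sent < rounds:
        send_keepalive(gateway)
        sent += 1
        if rounds == 0 or sent < rounds:
            time.sleep(interval_s)  # no sleep after the final round
    return sent
```

The static alternative from the question would be along the lines of `ip neigh replace 192.168.5.254 lladdr <router-mac> dev eth0 nud permanent` (iproute2), with the router's real MAC address filled in.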

----------

## AngelKnight

What is 192.168.5.254? The default gateway, or otherwise an entry in the forwarding table of an affected host? Is it directly attached, Layer-2-wise, to the hosts that exhibit the problem? Do frames have to be switched through a bridge for the affected hosts to reach this address? Does the holder of this IP address have any reason to be tardy about responding to ARP requests?

Lots of questions, I know. I don't mean to badger you; I just want to get all of the obvious first-line ones out of the way up front. Ideally this helps you think through the problem. It may (secondarily) give someone reading this thread enough information to solve it.
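Part of the "directly attached" question, namely whether 192.168.5.254 lies inside the affected host's IP subnet at all, can be checked with Python's standard `ipaddress` module. This is a minimal sketch assuming a /24 prefix, since the netmask isn't stated in the thread; `is_on_link` is a hypothetical name, and it only answers the Layer-3 question, not whether bridges or switches sit between host and router.

```python
import ipaddress


def is_on_link(host_interface: str, target: str) -> bool:
    """Return True if `target` falls inside the subnet of `host_interface`,
    given in CIDR form such as "192.168.5.147/24"."""
    iface = ipaddress.ip_interface(host_interface)
    return ipaddress.ip_address(target) in iface.network
```

Under the /24 assumption, `is_on_link("192.168.5.147/24", "192.168.5.254")` returns `True`, so the host should ARP for the router directly rather than via another hop.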

----------

