# Routing Problem. Arp Flux? Flux-related? Or the opposite?

## Auz

I get to use Gentoo at work (Linux 2.6.39-gentoo-r3 #1 SMP Mon Aug 8 14:51:45 BST 2011), but since one of the fixes suggested so far is "I can install Windows 7 instead...", I could do with a little outside help.

The network is set up as follows: a private network in the office with a Check Point firewall to the outside world, plus a VPN to our colo. My problem is that my box always goes via the firewall to reach one particular machine at the colo.

As far as I can tell, the routes are set correctly:

```
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.140.97   0.0.0.0         UG    203    0        0 eth1
127.0.0.0       127.0.0.1       255.0.0.0       UG    0      0        0 lo
169.254.0.0     0.0.0.0         255.255.0.0     U     204    0        0 vboxnet0
172.17.140.0    0.0.0.0         255.255.252.0   U     203    0        0 eth1
```

Reaching most machines at the colo works, for example:

```
traceroute to 10.60.2.40 (10.60.2.40), 30 hops max, 60 byte packets
 1  172.17.140.96  0.235 ms  0.225 ms  0.219 ms
 2  10.99.2.2  2.104 ms  2.102 ms  2.096 ms
 3  10.60.2.40  7.234 ms  7.236 ms  7.230 ms
```

But one in particular goes the wrong way:

```
traceroute to 10.60.2.58 (10.60.2.58), 30 hops max, 60 byte packets
 1  172.17.140.100  0.254 ms  0.325 ms  0.393 ms
 2  80.169.33.169  2.020 ms  2.018 ms  2.649 ms
 3  80.169.31.173  5.673 ms  6.039 ms  6.038 ms
 4  80.169.31.173  6.032 ms !H * *
```

The only thing I can find to suspect is Arp Flux, as .96 and .97 share the same MAC address:

```
? (172.17.140.96) at 00:13:72:40:0f:36 [ether] on eth1
? (172.17.140.97) at 00:13:72:40:0f:36 [ether] on eth1
```
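If you want to check the ARP cache for this kind of overlap rather than eyeballing it, a small Python sketch like the one below groups lines in the `arp -a` format shown above by MAC address and reports any MAC claimed by more than one IP. It assumes exactly that line format; the name `shared_macs` is just illustrative. It only lists the overlap and says nothing about whether the overlap is what sends traffic the wrong way.

```python
import re
from collections import defaultdict

# Lines in the format printed by `arp -a`, copied from the post above.
ARP_OUTPUT = """\
? (172.17.140.96) at 00:13:72:40:0f:36 [ether] on eth1
? (172.17.140.97) at 00:13:72:40:0f:36 [ether] on eth1
"""

# Capture the IP inside the parentheses, the MAC after "at" and the interface after "on".
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9A-Fa-f:]{17}) \[\w+\] on (?P<iface>\S+)"
)


def shared_macs(arp_text):
    """Return {(iface, mac): [ip, ...]} for every MAC claimed by more than one IP."""
    by_mac = defaultdict(list)
    for line in arp_text.splitlines():
        m = ARP_LINE.search(line)
        if m:
            by_mac[(m["iface"], m["mac"].lower())].append(m["ip"])
    # Keep only the MACs that more than one address resolves to.
    return {key: ips for key, ips in by_mac.items() if len(ips) > 1}


if __name__ == "__main__":
    for (iface, mac), ips in shared_macs(ARP_OUTPUT).items():
        print(f"{mac} on {iface} is claimed by {', '.join(ips)}")
```

On a live box you could feed it the output of `arp -an` instead of the pasted text.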

There are times when I can reach the machine in question, usually after a reboot and possibly once after I cleared the ARP cache, but only briefly before the issue reasserts itself.

Any help or pointers on where to go with this would be appreciated.

----------

## NeddySeagoon

Auz,

> The only thing I can find to suspect is Arp Flux, as .96 and .97 share the same MAC address

That's badly broken. It is a requirement of basic networking that MAC addresses on any network must be unique.

That applies to the internet too.

That it's not your problem is shown by:

```
traceroute to 10.60.2.58 (10.60.2.58), 30 hops max, 60 byte packets
 1  172.17.140.100  0.254 ms  0.325 ms  0.393 ms
 2  80.169.33.169  2.020 ms  2.018 ms  2.649 ms
 3  80.169.31.173  5.673 ms  6.039 ms  6.038 ms
 4  80.169.31.173  6.032 ms !H * *
```

The packet is sent to the correct next hop; after that it's out of your hands, as you no longer influence the route.

----------

## Auz

Thanks for the quick reply...

> That's badly broken. It is a requirement of basic networking that MAC addresses on any network must be unique.

I'd agree... there's apparently some rationale they have for it, and nobody else is running into problems with it, so persuading them to change might be fun.

Meanwhile...

> That it's not your problem is shown by
>
> ```
> traceroute to 10.60.2.58 (10.60.2.58), 30 hops max, 60 byte packets
> ...
> ```

It's the first hop that's wrong. It should go to 172.17.140.96 and then, if the target is outside, on to .100, e.g. for bbc.co.uk:

```
traceroute to bbc.co.uk (212.58.241.131), 30 hops max, 60 byte packets
 1  172.17.140.96  0.207 ms  0.199 ms  0.201 ms
 2  172.17.140.100  0.351 ms  0.434 ms  0.510 ms
 3  80.169.33.169  4.569 ms  4.581 ms  4.590 ms
 4  80.169.31.173  6.568 ms  6.576 ms  8.289 ms
 5  195.66.224.103  10.530 ms  10.529 ms  10.541 ms
 6  212.58.238.129  10.556 ms  10.488 ms  10.528 ms
 7  212.58.241.131  10.482 ms  10.415 ms  10.356 ms
```

And when things are working (as now, after a reboot):

```
traceroute to 10.60.2.58 (10.60.2.58), 30 hops max, 60 byte packets
 1  172.17.140.96  0.185 ms  0.188 ms  0.185 ms
 2  10.99.2.2  1.986 ms * *
 3  10.60.2.58  2.082 ms  2.186 ms  2.174 ms
```

----------

## NeddySeagoon

Auz,

I should have read your post more carefully. Sorry about that.

```
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.140.97   0.0.0.0         UG    203    0        0 eth1
127.0.0.0       127.0.0.1       255.0.0.0       UG    0      0        0 lo
169.254.0.0     0.0.0.0         255.255.0.0     U     204    0        0 vboxnet0
172.17.140.0    0.0.0.0         255.255.252.0   U     203    0        0 eth1
```

```
traceroute to 10.60.2.58 (10.60.2.58), 30 hops max, 60 byte packets
 1  172.17.140.100
```

How does 172.17.140.100 get to be a next hop to anywhere for you?

It's not one of the gateways listed in your routing table.

- For 172.17.140.0/22 you don't need a gateway.
- For 127.0.0.0/8 the gateway is 127.0.0.1 (localhost).
- For 169.254.0.0/16 you don't use a gateway.
- Everything else goes to 172.17.140.97 as your next hop.

172.17.140.100 is not mentioned.

I wonder how that gets into your routing table, or how it gets used as a next hop if it's not there?
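The table walk above can be modelled in a few lines of Python. This is a rough sketch, assuming plain longest-prefix matching with the metric as a tie-breaker; it deliberately ignores policy routing and anything the kernel may have cached or learned outside this table, so it only shows what the table by itself says. The name `next_hop` is illustrative.

```python
import ipaddress

# The routing table from the first post: (destination/prefix, gateway, metric, iface).
# A gateway of None means the network is directly connected (no next hop needed).
ROUTES = [
    ("0.0.0.0/0", "172.17.140.97", 203, "eth1"),
    ("127.0.0.0/8", "127.0.0.1", 0, "lo"),
    ("169.254.0.0/16", None, 204, "vboxnet0"),
    ("172.17.140.0/22", None, 203, "eth1"),
]


def next_hop(dest, routes=ROUTES):
    """Pick the route with the longest matching prefix (lowest metric breaks ties).

    Returns (next_hop_ip, iface); for on-link routes the next hop is dest itself.
    """
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(net), gw, metric, iface)
        for net, gw, metric, iface in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not candidates:
        raise LookupError(f"no route to {dest}")
    # Longest prefix first; among equal prefixes, the lower metric wins.
    _net, gw, _metric, iface = max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))
    return (gw or dest, iface)


if __name__ == "__main__":
    for dest in ("10.60.2.40", "10.60.2.58", "212.58.241.131", "172.17.140.100"):
        print(dest, "->", next_hop(dest))
```

Run as-is, every off-subnet destination in the thread, 10.60.2.58 included, resolves to 172.17.140.97, and 172.17.140.100 only ever comes out as an on-link destination, never as a gateway.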

----------

## Auz

> I wonder how that gets into your routing table, or how it gets used as a next hop if it's not there?

I don't know... any suggestions as to where to look? I've run tcpdump, but I'm not sure what to look for.

----------

## NeddySeagoon

Auz,

You could try a dirty hack.

Your system should never communicate with 172.17.140.100 directly.

Add a host route to direct traffic for 172.17.140.100 to 172.17.140.97.

It's a bit odd that the two next hops towards the internet are in the same subnet as each other, and in the subnet you are on, as shown by:

```
traceroute to bbc.co.uk (212.58.241.131), 30 hops max, 60 byte packets
 1  172.17.140.96  0.207 ms  0.199 ms  0.201 ms
 2  172.17.140.100  0.351 ms  0.434 ms  0.510 ms
```

172.17.140.96 is your odd gateway that shares a MAC address with 172.17.140.97, which is in your routing table.

172.17.140.100 is a gateway to the outside world.

Better yet might be a static route to 10.0.0.0/8 via 172.17.140.96, or to whatever network and netmask you need for your application.
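To see how the two suggested routes would change the table lookup, here is a Python sketch of the routing table with both proposals added: the /32 host route sending 172.17.140.100 to .97, and 10.0.0.0/8 via .96. Both routes are suggestions from this thread, not something known to be configured. The metric of 0 on them and the helper name `next_hop` are assumptions for illustration, and the lookup is plain longest-prefix matching with no route cache or policy routing modelled.

```python
import ipaddress

# Original table plus the two proposed routes (hypothetical; metric 0 is an assumption).
ROUTES = [
    ("0.0.0.0/0", "172.17.140.97", 203, "eth1"),
    ("127.0.0.0/8", "127.0.0.1", 0, "lo"),
    ("169.254.0.0/16", None, 204, "vboxnet0"),
    ("172.17.140.0/22", None, 203, "eth1"),
    ("172.17.140.100/32", "172.17.140.97", 0, "eth1"),  # the "dirty hack" host route
    ("10.0.0.0/8", "172.17.140.96", 0, "eth1"),          # static route towards the colo
]


def next_hop(dest, routes=ROUTES):
    """Longest-prefix match, lowest metric on ties; on-link routes return dest itself."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gw, metric, iface)
        for net, gw, metric, iface in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        raise LookupError(f"no route to {dest}")
    _net, gw, _metric, iface = max(matches, key=lambda r: (r[0].prefixlen, -r[2]))
    return (gw or dest, iface)


if __name__ == "__main__":
    for dest in ("10.60.2.58", "172.17.140.100", "212.58.241.131"):
        print(dest, "->", next_hop(dest))
```

With these entries, 10.60.2.58 now matches the /8 and goes to .96, while internet traffic still uses the default route via .97. On a box with iproute2 the routes themselves would be added with something like `ip route add 10.0.0.0/8 via 172.17.140.96 dev eth1` and `ip route add 172.17.140.100/32 via 172.17.140.97 dev eth1`. The sketch only models which table entry a destination lookup picks; whether the host route stops traffic from being handed to .100 depends on how .100 is being chosen as a next hop in the first place, which the thread hasn't established.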

----------

