# Intermittent iptables error

## dclark13

Hi,

I use pound (http://www.apsis.ch/pound/index_html) to load balance web traffic among several different back-end web servers.  Recently I've noticed some issues in the pound logs where a connection to the back-end server timed out.  After looking into it a bit, it seems that once in a while (say ~200 times per day out of 2,000,000+ requests) iptables on the back-end server blocks a packet from the pound server.  I say this because I can see entries like the following in the logs on the back-end web servers:

```
Nov  4 03:10:11 wwwut3 [1280662.309643] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65011 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
Nov  4 03:10:14 wwwut3 [1280665.307411] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65012 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
Nov  4 03:10:20 wwwut3 [1280671.307415] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65013 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
Nov  4 03:10:32 wwwut3 [1280683.307406] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65014 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
Nov  4 03:10:56 wwwut3 [1280707.307406] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65015 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
Nov  4 03:11:44 wwwut3 [1280755.307410] RULE 5 -- DENY IN=eth0 OUT= SRC=166.70.134.196 DST=166.70.134.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=65016 DF PROTO=TCP SPT=33942 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
```

which, to my reading, shows traffic being blocked that would normally be allowed by the firewall:

```
iptables --list --numeric

Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
...
Cid43519296.0  tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 state NEW
...
RULE_5     all  --  0.0.0.0/0            0.0.0.0/0
...

Chain Cid43519296.0 (1 references)
target     prot opt source               destination
ACCEPT     all  --  166.70.134.196       0.0.0.0/0
ACCEPT     all  --  166.70.134.197       0.0.0.0/0
...

Chain RULE_5 (3 references)
target     prot opt source               destination
LOG        all  --  0.0.0.0/0            0.0.0.0/0           LOG flags 0 level 6 prefix `RULE 5 -- DENY '
DROP       all  --  0.0.0.0/0            0.0.0.0/0
```

I've swapped out the machine running pound on the front-side to no effect, and the issue occurs intermittently on all of the back-end web servers.

Does anyone have any suggestions as to what to look for to help in troubleshooting this?  Any help would be much appreciated.

----------

## Hu

Based on the little bit of iptables you have provided, it would appear that sometimes the SYN packet has state INVALID, causing it not to match any of your stateful rules, and thereby allowing it to fall through to the deny rule.
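One way to test that theory (a sketch only; the rule position, chain, and log prefix here are my assumptions and need adapting to your actual ruleset) is to log INVALID packets explicitly before they can fall through to the catch-all deny:

```shell
# Hypothetical diagnostic: insert a rule at the top of INPUT that logs any
# packet conntrack classifies as INVALID, with a distinct prefix so these
# entries are easy to tell apart from the RULE 5 denies (run as root).
# LOG is non-terminating, so the packet still continues down the chain.
iptables -I INPUT 1 -m state --state INVALID -j LOG --log-prefix "INVALID-PKT "
```

If the timestamps of those entries line up with your RULE 5 denies, that would support the INVALID-state explanation.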

----------

## dclark13

Thanks, Hu.  Okay, that makes sense, then - so iptables is not behaving improperly.  What would cause the SYN packet state to be INVALID, though?  If I understand correctly, the sender would be at fault there, so the front-end load balancer is sometimes sending bad packets?  It's not hardware related, since I swapped out the machine there and still experienced the same issues, so ... kernel?  I do have a 2.6.24 kernel on those particular machines (the back-end web servers run 2.6.29 and 2.6.30).

----------

## Hu

In this context, INVALID refers to the logical state of the packet, not to its specific contents.  There are other matches for packets which are not well-formed.  According to man iptables:

> Possible states are INVALID meaning that the packet could not be identified for some reason which includes running out of memory and ICMP errors which don't correspond to any known connection

Since you said this is intermittent, and you have a high enough traffic load that you decided to use a load balancer, I would guess that very high traffic levels cause your backend systems to run low on memory for connection tracking.

I have not examined the netfilter code to determine what constitutes a low memory condition.  Depending on the design of netfilter, it may reach a low memory condition while there exists free RAM in the system.  For example, netfilter might have a hard limit on the number of simultaneous conntrack records it retains.  If I had to guess, I would say that a SYN packet that arrives while there is insufficient memory to create a new conntrack for it might be classified as INVALID when the filter table processes it.
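A quick way to gauge how close you are to any such limit (assuming the nf_conntrack sysctl interface is available on your kernels; older kernels exposed the same counters under the ip_conntrack_* names) would be something like:

```shell
# Compare the current number of tracked connections against the configured
# maximum; if count approaches max, conntrack is the likely suspect.
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# Any overflow complaints from the conntrack code would land in the kernel
# ring buffer, so that is worth checking too.
dmesg | grep -i conntrack
```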

----------

## dclark13

There does seem to be a correlation with traffic levels - looking through the logs I can see a growth in errors throughout the day, peaking at our highest levels of traffic and then falling back off in late evening.  For instance, over a 7-day period there were twice as many errors at noon as there were at 6:00AM.
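For reference, the per-hour tally can be pulled straight out of syslog with something like the following (the log path is an assumption - adjust to wherever those kernel log lines land on your systems):

```shell
# Count the RULE 5 denies per hour-of-day: field 3 of the syslog line is the
# HH:MM:SS timestamp, so keep only the HH part and tally occurrences.
grep 'RULE 5 -- DENY' /var/log/messages* \
  | awk '{print $3}' | cut -d: -f1 | sort | uniq -c
```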

However, everything I read about max conntrack entries leads me to think that I would see a logged message like:

```
nf_conntrack: table full, dropping packet.
```

or something along those lines if there were too many simultaneous connections.  But I don't see anything like that.  Plus I've been at the command line on a web server when one of the hiccups occurred and was actively checking the values of net.netfilter.nf_conntrack_max and net.netfilter.nf_conntrack_count from sysctl, and the server was nowhere near the limit (nor is it even close to using most of its RAM, in general).

Also, our web pages are pretty heavily database-oriented.  I'd think that if there was a problem with too many conntrack entries, that I'd see similar problems communicating with the database once in a while, but I never do.

I have to say my inclination is that the problem is with the load balancing software itself in some circumstances.  If you have any other thoughts or suggestions for what to examine though, I'd love to hear them.

Thanks for all the help!

----------

## Hu

Yes, I would also expect that the steps you describe should have revealed a problem if my prior theory was right.  Unfortunately, I have no further guesses based on the information available, nor any suggestions about areas to investigate further.

----------

## Bones McCracker

Have you checked to see if a conntrack table overflow condition is occurring upstream of your web servers (for example, on the firewall/router or on the pound proxy machine)?

----------

## dclark13

Bones McCracker,

Yeah, I believe I did.  That is, there are no errors logged on the pound load balancers from nf_conntrack.  So far I've not heard of any other way to look for that particular problem.

----------

## luispa

Under these conditions, I guess you have to capture traffic on both sides during peak hours (when you get the most errors) so you can isolate the problem; that would probably help with the troubleshooting.  As for how to tell whether it's a conntrack overflow, I'm not sure how to do that.
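A minimal capture sketch (the interface, addresses, and file names come from the logs quoted earlier and are assumptions to adapt):

```shell
# On the pound machine: capture the balancer side of the HTTP conversation
# to the affected backend; -s 0 grabs full packets, -w writes a pcap file.
tcpdump -i eth0 -s 0 -w pound-side.pcap 'host 166.70.134.194 and tcp port 80'

# On the back-end web server: capture the same traffic as it arrives.
tcpdump -i eth0 -s 0 -w backend-side.pcap 'host 166.70.134.196 and tcp port 80'
```

Comparing the two captures around the timestamps of the RULE 5 entries should show whether the SYN leaves the balancer looking normal, and whether anything on arrival (retransmissions, reused source ports, odd sequence numbers) stands out.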

Luis

----------

