# SOLVED - nf_conntrack: table full, dropping packet.

## Sum1

The last 36 hours are a blur.

My users and I start noticing steadily decreasing speed and responsiveness when surfing the net.

I run some pings on our WAN-facing NIC and see packet loss of 8% to 15%.

I call my ISP, AT&T (business-class DSL).

They agree the situation is really bad; they see all the dropped packets too and conclude it must be our old DSL modem.

They send a technician out, give us a new modem, and move our DSL line to another spot on the remote terminal.

After the technician leaves, I test the modem by connecting a single computer directly to it.

The download speed is fine (about 4-5 Mbps), and the upload speed is sweet too (about 620 kbps).

I think it's solved, so I reconnect my Gentoo router, which has 3 NICs: 1 WAN-facing and 2 for the LAN subnets.

To my dread, the download speed is solid, but, crud, the upload speed is only 30 to 60 kbps. Mind you, that's not 30-60 kBps (kilobytes per second); I mean 30-60 kbps (kilobits per second). The rated upload speed is 512-768 kbps.

The only evidence of a problem I can find is pages and pages of errors like these in my `/var/log/messages`:

Nov 30 15:51:39 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:51:44 superaguri printk: 226 messages suppressed.

Nov 30 15:51:44 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:51:49 superaguri printk: 223 messages suppressed.

Nov 30 15:51:49 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:51:57 superaguri printk: 239 messages suppressed.

Nov 30 15:51:57 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:51:59 superaguri printk: 143 messages suppressed.

Nov 30 15:51:59 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:04 superaguri printk: 204 messages suppressed.

Nov 30 15:52:04 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:09 superaguri printk: 319 messages suppressed.

Nov 30 15:52:09 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:16 superaguri printk: 319 messages suppressed.

Nov 30 15:52:16 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:20 superaguri printk: 191 messages suppressed.

Nov 30 15:52:20 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:32 superaguri printk: 280 messages suppressed.

Nov 30 15:52:32 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:32 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:34 superaguri printk: 171 messages suppressed.

Nov 30 15:52:34 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:39 superaguri printk: 235 messages suppressed.

Nov 30 15:52:39 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:44 superaguri printk: 320 messages suppressed.

Nov 30 15:52:44 superaguri nf_conntrack: table full, dropping packet.

Nov 30 15:52:51 superaguri printk: 205 messages suppressed.

Nov 30 15:52:51 superaguri nf_conntrack: table full, dropping packet.

Superaguri is the host name of my router box.

Please help point me in the right direction.

I've been googling nf_conntrack errors, but I can't figure out what might apply in my situation.

All my iptables conntrack options are built as modules.

I'm using Gentoo Sources 2.6.22-r5.

Performance was great right up until 48 hours ago, when it suddenly fell off.

I didn't make any changes to my firewall -- it's a very basic setup.

Both subnets on the LAN get the same poor upload performance.

It seems that if any of the NIC hardware were going bad, performance would be equally erratic in both directions, but that's not the case.

So it takes forever to request pages because of the very low upload speed, but once a request finally gets out to the net, websites and files download fine.

Thank you for your patience and for reading.

*Last edited by Sum1 on Tue Dec 04, 2007 3:33 pm; edited 1 time in total*

----------

## Hu

A brief reading of the kernel source suggests that this error means the kernel has run out of slots for tracking connections.  A NAT device must track every connection it is translating, so that all packets belonging to a given virtual circuit get mangled the same way.  If I am right, your system has suddenly begun tracking so many connections that the table has filled up.  Have you recently introduced any new software on the LAN?  If you capture traffic on the internal interface, do you see a volume of traffic consistent with the number of active systems?  What is the output of `wc /proc/net/ip_conntrack`?
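A rough way to check the condition described above is to compare the number of tracked connections with the table's limit. The sketch below is a minimal Python version of that check. It assumes the legacy paths `/proc/net/ip_conntrack` (one entry per line, which is what `wc` counts) and `/proc/sys/net/ipv4/netfilter/ip_conntrack_max`; both vary by kernel version and loaded modules, so treat them as assumptions.

```python
from pathlib import Path

# Assumed paths (legacy ip_conntrack names); newer kernels typically use
# /proc/net/nf_conntrack and /proc/sys/net/netfilter/nf_conntrack_max.
CONNTRACK_TABLE = Path("/proc/net/ip_conntrack")
CONNTRACK_MAX = Path("/proc/sys/net/ipv4/netfilter/ip_conntrack_max")


def count_entries(text: str) -> int:
    """Count tracked connections; the table holds one entry per line."""
    return sum(1 for line in text.splitlines() if line.strip())


def usage_percent(entries: int, maximum: int) -> float:
    """Return how full the connection tracking table is, in percent."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * entries / maximum


def main() -> None:
    try:
        entries = count_entries(CONNTRACK_TABLE.read_text())
        maximum = int(CONNTRACK_MAX.read_text().strip())
    except OSError as err:
        # Not running on a router with these proc files available.
        print(f"cannot read conntrack data: {err}")
        return
    print(f"{entries} of {maximum} entries in use "
          f"({usage_percent(entries, maximum):.1f}%)")
    if entries >= maximum:
        print("table is full: new connections may be dropped")


if __name__ == "__main__":
    main()
```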

----------

## Sum1

Hu, thank you so much for the thoughtful reply.

I will probably go to work again on Sunday to research this and will report back with what I find.

Best regards.

----------

## Sum1

Results:

superaguri vvt # wc /proc/net/ip_conntrack

15325 291162 3173369 /proc/net/ip_conntrack

I can't think of any new software on the LAN, and I don't believe the firewall is set up to do any specific connection tracking.

I will try sniffing traffic on the WAN-facing NIC to see whether any patterns emerge.

----------

## Sum1

Uggh, I'm very worried.

It seems like we are constantly getting scanned and probed.

I see the same addresses popping up over a period of days:

ABTS-NCR-Dynamic-133.170.163.122.airtelbroadband.in

wca10.libero.it.61514

45.136.217.87.dynamic.jazztel.es.58513

p54BF6CAE.dip.t-dialin.net

host-41.234.3.181.tedata.net

And sometimes I see incoming pings and requests racing through a series of port numbers on our WAN-facing NIC.

Please help if you can.  I've never confronted anything like this on my Gentoo router box before.

Thx.

11:25:41.698177 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 190.40.240.206: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698182 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > c-98-200-47-162.hsd1.tx.comcast.net.3076: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698332 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > ABTS-NCR-Dynamic-133.170.163.122.airtelbroadband.in: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698337 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 190.43.189.125.8883: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698407 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 190.40.240.206.52247: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698411 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > wca10.libero.it.61514: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698415 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 45.136.217.87.dynamic.jazztel.es.58513: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698432 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 190.43.93.129.54396: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698448 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > wca10.libero.it: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698452 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 77.108.88.202.2472: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698469 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > p54BF6CAE.dip.t-dialin.net: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698492 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 45.136.217.87.dynamic.jazztel.es: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698496 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > p54BF6CAE.dip.t-dialin.net.22253: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698500 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > host-41.234.3.181.tedata.net: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698516 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 190.43.185.78.18754: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698533 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 190.43.93.129: ICMP echo request, id 512, seq 1280, length 40

11:25:41.698537 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.27718 > 168.187.181.2.25179: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:25:41.698785 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 77.108.88.202: ICMP echo request, id 512, seq 1280, length 40

11:27:00.517424 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > c-98-200-47-162.hsd1.tx.comcast.net: ICMP echo request, id 512, seq 1280, length 40

11:27:00.517523 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 168.187.181.2: ICMP echo request, id 512, seq 1280, length 40

11:27:00.517778 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 190.40.240.206: ICMP echo request, id 512, seq 1280, length 40

11:27:00.517931 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > c-98-200-47-162.hsd1.tx.comcast.net.3076: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.517936 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > ABTS-NCR-Dynamic-133.170.163.122.airtelbroadband.in: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518008 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 190.43.189.125.8883: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518012 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 190.40.240.206.52247: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518016 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > wca10.libero.it.61514: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518033 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 45.136.217.87.dynamic.jazztel.es.58513: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518050 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 190.43.93.129.54396: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518054 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > wca10.libero.it: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518071 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 77.108.88.202.2472: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518087 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > p54BF6CAE.dip.t-dialin.net: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518096 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 45.136.217.87.dynamic.jazztel.es: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518115 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > p54BF6CAE.dip.t-dialin.net.22253: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518119 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > host-41.234.3.181.tedata.net: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518123 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 190.43.185.78.18754: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518142 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 190.43.93.129: ICMP echo request, id 512, seq 1280, length 40

11:27:00.518158 PPPoE  [ses 0x127a] IP 134.adsl.snet.net.41010 > 168.187.181.2.25179: S 3217589126:3217589126(0) win 64512 <mss 1412,nop,nop,sackOK>

11:27:00.518162 PPPoE  [ses 0x127a] IP 134.adsl.snet.net > 77.108.88.202: ICMP echo request, id 512, seq 1280, length 40

----------

## Hu

 *Sum1 wrote:*   

> Results:
> 
> superaguri vvt # wc /proc/net/ip_conntrack
> 
> 15325 291162 3173369 /proc/net/ip_conntrack
> ...

 

Those numbers are far too high.  The first number from `wc` is the line count, and each line of /proc/net/ip_conntrack is one tracked connection, so you have more than 15000 connections being tracked by a single NAT device.  The firewall must do connection tracking for NAT to work properly.  If it did not, responses from the Internet could not be sent to the correct client on the LAN.
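To see why a NAT box has to keep this state, here is a toy Python model of a translation table. It is not how the kernel implements conntrack: the class name `ToyNat`, the fixed capacity, the sequential port allocation, and the documentation-range addresses are all illustrative assumptions. It shows the two points above: replies can only be routed back if the outbound connection was recorded, and once the table is at capacity, new connections are dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class ToyNat:
    """Toy model of a NAT box's connection table (not the kernel's design)."""
    public_ip: str
    capacity: int
    next_port: int = 40000
    # public port -> (LAN endpoint, remote endpoint)
    table: Dict[int, Tuple[Endpoint, Endpoint]] = field(default_factory=dict)

    def outbound(self, lan: Endpoint, remote: Endpoint) -> Optional[Endpoint]:
        """Translate a LAN -> Internet packet; None means it was dropped."""
        for port, (known_lan, known_remote) in self.table.items():
            if known_lan == lan and known_remote == remote:
                return (self.public_ip, port)  # already tracked: same mapping
        if len(self.table) >= self.capacity:
            return None  # the toy's "table full, dropping packet"
        port = self.next_port
        self.next_port += 1
        self.table[port] = (lan, remote)
        return (self.public_ip, port)

    def inbound(self, public_port: int, remote: Endpoint) -> Optional[Endpoint]:
        """Map an Internet -> LAN reply back to the LAN host, if tracked."""
        entry = self.table.get(public_port)
        if entry is None or entry[1] != remote:
            return None  # untracked: no way to know which LAN host it is for
        return entry[0]
```

For example, with `capacity=2`, two distinct LAN connections are translated and their replies find their way back, while a third distinct connection gets `None`.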

I would prefer a numeric-only capture (`tcpdump -n`), but the way I read that, you are the one doing the scanning.  Your system is 134.adsl.snet.net, correct?  This could indicate that a machine on your LAN has become infected and begun probing other hosts on the Internet.  That behavior could account for the large number of tracked connections, as well as the capture output that asserts that your network is initiating the traffic.
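Reading the direction of a capture is easier with numeric output. The minimal sketch below assumes IPv4 lines from `tcpdump -n` in the `IP src > dst:` form shown above, and counts how many distinct destinations each source contacts. A single address fanning out to many unrelated hosts is the scanning pattern described here. The function names and the `threshold` value are hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches "IP src > dst:" in `tcpdump -n` output; each side is a dotted
# IPv4 address, optionally followed by ".port" (ICMP lines have no port).
FLOW = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > (\d+\.\d+\.\d+\.\d+)(?:\.\d+)?:"
)


def fan_out(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each source address to the distinct destinations it contacted."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = FLOW.search(line)
        if match:
            src, dst = match.groups()
            seen[src].add(dst)
    return dict(seen)


def likely_scanners(lines: Iterable[str], threshold: int) -> Dict[str, int]:
    """Return sources that contacted at least `threshold` distinct hosts."""
    return {
        src: len(dsts)
        for src, dsts in fan_out(lines).items()
        if len(dsts) >= threshold
    }
```

It can be fed the saved text output of a capture, for example one taken with `tcpdump -n -i` on the LAN-side interface. Run against the internal interface, the offending host shows up under its LAN address rather than the router's public one.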

Regardless of whether you are being probed or are probing others, the answer is a set of iptables rules to DROP the offending traffic until the problem can be resolved properly.  If I am right that you have an infected machine on the LAN, you should see roughly equal volumes of traffic when you sniff the internal versus external interfaces.  In that case, identify the IP address of the infected machine and run `iptables -t nat -I PREROUTING 1 -s infected-machine -j DROP` to discard all traffic from that machine as soon as it enters the system.

This will not resolve your problem with filling up the connection tracking table.  To fix that, you need to use the NOTRACK target in the raw table.  Unfortunately, most people do not build the raw table because it is rarely needed.  If you have it, or if you are willing to build it, run `iptables -t raw -I PREROUTING 1 -s infected-machine -j NOTRACK; iptables -t raw -I PREROUTING 2 -s infected-machine -j DROP`.  This should mark the packet as untracked and then drop it.
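One way to find the infected machine's address is to tally the conntrack table by who opened each connection. Each entry in /proc/net/ip_conntrack lists the original direction first and the reply direction second, so the first `src=` field on a line should be the originator. This is a minimal sketch under that assumption; the function name `top_originators` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# First "src=" on a conntrack line belongs to the original direction.
SRC = re.compile(r"\bsrc=(\S+)")


def top_originators(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count conntrack entries by originating address, busiest first."""
    counts = Counter()
    for line in lines:
        match = SRC.search(line)  # search() stops at the first src= field
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

On the router, passing the lines of /proc/net/ip_conntrack (for example, an open file object) should put a scanning LAN host at or near the top. Newer kernels expose /proc/net/nf_conntrack instead; its lines carry the same `src=` fields, so the same function should work there too.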

If I am wrong about the problem being on the inside, then you need a set of firewall rules to discard the traffic from the external systems.  Start with allowing traffic for loopback connections and established connections, and dropping everything else.
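The baseline described above (allow loopback and established traffic, drop everything else) could look roughly like the sketch below. It builds the iptables commands as argument lists and, by default, only prints them for review. Assumptions: the chains start empty, the `conntrack` match with `--ctstate` is available, the LAN interface names `eth1` and `eth2` are hypothetical, and LAN hosts are allowed to reach the router and start outbound connections (without that, the LAN would lose Internet access). It is a starting point, not a complete firewall.

```python
import shlex
import subprocess
from typing import List, Sequence


def baseline_rules(lan_ifs: Sequence[str]) -> List[List[str]]:
    """Build a minimal default-drop filter ruleset as iptables argv lists."""
    rules = [
        # Loopback traffic is always allowed.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        # Packets belonging to connections already in the conntrack table.
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        ["iptables", "-A", "FORWARD", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    # Assumption: LAN hosts may reach the router and open new outbound
    # connections; tighten this for your own network.
    for lan_if in lan_ifs:
        rules.append(["iptables", "-A", "INPUT", "-i", lan_if, "-j", "ACCEPT"])
        rules.append(["iptables", "-A", "FORWARD", "-i", lan_if, "-j", "ACCEPT"])
    # Default-drop policies go last, after the ACCEPT rules exist.
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    rules.append(["iptables", "-P", "FORWARD", "DROP"])
    return rules


def apply(rules: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them (requires root)."""
    for argv in rules:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply(baseline_rules(["eth1", "eth2"]))  # hypothetical LAN interfaces
```

Setting the DROP policies last means that, when the list is applied from a LAN host over SSH, the rules that keep that session alive are already in place before anything is dropped.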

----------

## Sum1

Hu,

Thanks for your response.

Your assessment is absolutely correct.

I'm embarrassed to admit it, but while stressed and juggling multiple tasks, I read the tcpdump results completely backwards.

I identified the offending machine, and as soon as I disconnected its Cat 5 cable, all network functions returned to normal.

The machine was so thoroughly compromised that the offending software on the Windows client was invisible or strategically "mis-named" in the system process monitor.  Very tricky stuff.  I stayed until midnight, shredded the hard drive on the offending Windows client box, and reinstalled.

I should have followed my instincts on this, since it involved a new employee who came with her own computer and server, both of which were added to our network.

By focusing mostly on the WAN-facing traffic, I missed seeing that the source of the traffic was inside.

Thanks again for your knowledgeable help.

I also want to say a big thank you to Lumato and Kale on the Gentoo IRC channel.  They were really helpful in working through many of the settings in `/proc/sys/net/ipv4/netfilter/*.*`.

Sum

----------

## Hu

Good to hear this is resolved.  Was the infected machine owned by the new employee?  You said you shredded the hard disk, which I would not expect to get away with if the machine was not owned by the company.  Under the circumstances, I think shredding the disk was exactly the right move.  Any infection advanced enough that it even attempts to hide should be assumed to be so invasive that you will never dig out all the pieces.  Nuking it from orbit was the only way to be sure.

If you have not already, it may be worth having a talk with this woman about basic computer security.  Her machine got infected once, so it could happen again.  A bit of preventive work now may save you from facing this situation again in the future.  If your office is big enough that you have a Windows domain, I suggest joining her system to the domain and using Group Policy to ensure that all the systems stay up to date on Windows patches.

Do not be embarrassed about misreading the output.  Anyone can make a mistake under pressure, and next time you will know what to look for.

----------

