# What causes TCP checksum errors? [SOLVED]

## msutton

I am running iptables as my primary firewall.

INPUT and FORWARD always jump to a chain called Firewall.

OUTPUT is set to always ACCEPT.

Inside the Firewall chain I have:

```
-A Firewall -m state --state RELATED,ESTABLISHED -j ACCEPT
```

and I have a NAT rule:

```
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -j MASQUERADE
```

Now, when I am browsing the web, I get a whole bunch of DROP lines in my log, like this:

```
Sep  9 16:21:20 [kernel] [IPTABLES DROP] : IN=eth1 OUT= MAC=<MAC> SRC=<SRC> DST=<MY EXT IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=12331 DF PROTO=TCP SPT=443 DPT=3837 WINDOW=17520 RES=0x00 ACK URGP=0
```

This is of course an example of HTTPS, but it also happens with HTTP, rsync, and when I connect with Gaim.

Why is that rule not working all the time?

It isn't dropping all ESTABLISHED/RELATED traffic, or I would not be able to connect to anything, because my iptables rule set is DROP-based.

Any help would be appreciated.

*Last edited by msutton on Thu Sep 22, 2005 7:21 pm; edited 3 times in total*

----------

## buzzin

I think you need a rule for the NEW state, e.g.:

```
iptables -A block -m state --state NEW ! -i "$EXTIF" -j ACCEPT
```

IMHO it is better to make a separate chain for the state rules. Below is an example script you could adapt; it works OK for me.

```
#!/bin/sh

echo "   enabling forwarding.."
echo "1" > /proc/sys/net/ipv4/ip_forward

# external (Internet-facing) interface
EXTIF="eth0"
# internal (LAN) interface
INTIF="eth1"

echo "   clearing any existing rules and setting default policy.."
iptables --flush
iptables -P INPUT ACCEPT
iptables -F INPUT
iptables -P OUTPUT ACCEPT
iptables -F OUTPUT
iptables -P FORWARD DROP
iptables -F FORWARD
iptables -t nat -F

## Create chain which blocks new connections, except if coming from inside.
# -X fails harmlessly on the first run, when the chain does not exist yet.
iptables -X block 2>/dev/null
iptables -N block
iptables -A block -m state --state INVALID -j DROP
iptables -A block -m state --state ESTABLISHED,RELATED -j ACCEPT
# NEW connections are accepted unless they arrive on the external interface.
iptables -A block -m state --state NEW ! -i "$EXTIF" -j ACCEPT
iptables -A block -j DROP

## Jump to that chain from the INPUT, FORWARD and OUTPUT chains.
iptables -A INPUT -j block
iptables -A FORWARD -j block
iptables -A OUTPUT -j block

echo "   Enabling SNAT (MASQUERADE) functionality on $EXTIF"
iptables -t nat -A POSTROUTING -o "$EXTIF" -j MASQUERADE
```

----------

## msutton

I did as you said and I still get the same thing.

It just seems odd that it is hit-and-miss like that.

Do you know of anything else I could check?

----------

## buzzin

Did you try the above script?

Make sure you identify traffic by the interface it arrives on (-i), not by its source IP (-s), as source IPs can be faked.
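A minimal sketch of the difference, assuming buzzin's `block` chain and `EXTIF` variable from the script above. The `allow_new_from_inside` function and the `IPTABLES` dry-run variable are hypothetical names used only for illustration, not part of iptables:

```shell
#!/bin/sh
# Hypothetical dry-run hook: set IPTABLES="echo iptables" to print the
# command instead of running it. EXTIF follows buzzin's script.
IPTABLES=${IPTABLES:-iptables}
EXTIF=${EXTIF:-eth0}

allow_new_from_inside() {
    # Match on the interface the packet arrived on. A forged LAN source
    # address arriving on $EXTIF still fails this test.
    $IPTABLES -A block -m state --state NEW ! -i "$EXTIF" -j ACCEPT
}

# By contrast, a source-based rule trusts whatever address the packet claims,
# so a forged 192.168.1.x packet arriving on $EXTIF would match it:
#   $IPTABLES -A block -m state --state NEW -s 192.168.1.0/24 -j ACCEPT
```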

----------

## msutton

OK, I loaded the above script.

Now the log says those packets are INVALID (I added logging before the rules).

So why would those packets be invalid if I am browsing that site?

Seems like iptables is not tracking connections correctly.

```
Sep 10 13:00:56 [kernel] [IPTABLES INVALID] : IN=eth1 OUT= MAC=<MAC> SRC=<SRC IP> DST=<MY IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=26190 DF PROTO=TCP SPT=873 DPT=33069 WINDOW=57920 RES=0x00 ACK URGP=0 OPT (0101080A0F5AAE8503F720C4)
```

This one is logged when I start an rsync with an rsync server: iptables says the returned packets are INVALID.
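Logging INVALID packets before the rules, as described here, could look roughly like the sketch below. It assumes the chain already drops or rejects INVALID packets further down (as buzzin's `block` chain does). The prefix and options mirror the `[IPTABLES INVALID]` LOG rule in the `iptables -L` listing later in the thread. The function name and the `IPTABLES` dry-run variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical dry-run hook: IPTABLES="echo iptables" prints instead of running.
IPTABLES=${IPTABLES:-iptables}

log_invalid_first() {
    chain=${1:-block}
    # LOG does not end rule traversal, so inserting it at position 1 records
    # INVALID packets and then lets the chain's existing INVALID DROP/REJECT
    # rule handle them as before.
    $IPTABLES -I "$chain" 1 -m state --state INVALID \
        -j LOG --log-prefix "[IPTABLES INVALID] : " \
        --log-tcp-options --log-ip-options
}
```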

----------

## buzzin

Strange, not sure what's up.

What kernel are you using? Also, can you post the output of `iptables --list -v`, please?

Maybe try another kernel and then re-emerge iptables?
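The `-v` flag matters here. Without it, the in/out interface columns and the per-rule packet counters are not shown, so rules restricted with `-i`/`-o` look as if they match everything. A small sketch of a zero-then-list workflow follows. Reset the counters, reproduce the problem (for example, start an rsync), then list them to see which rule the packets hit. The function names and the `IPTABLES` dry-run variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical dry-run hook: IPTABLES="echo iptables" prints instead of running.
IPTABLES=${IPTABLES:-iptables}

reset_counters() {
    # Zero the packet/byte counters in all chains of the filter table.
    $IPTABLES -Z
}

show_counters() {
    # -v: counters plus in/out interfaces; -n: no DNS lookups;
    # --line-numbers: number each rule so it can be referred to later.
    $IPTABLES -L -v -n --line-numbers
}
```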

----------

## msutton

Kernel: Linux gentoo 2.6.12-gentoo-r10

Output of `iptables -L`:

```
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Matt       all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Matt       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain Matt (2 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
LOG        all  --  anywhere             anywhere            state INVALID LOG level warning tcp-options ip-options prefix `[IPTABLES INVALID] : '
REJECT     all  --  anywhere             anywhere            state INVALID reject-with icmp-port-unreachable
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            state NEW
ACCEPT     all  --  <Friends Static IP>  anywhere
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:3390
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:smtp
LOG        all  --  anywhere             anywhere            LOG level warning tcp-options ip-options prefix `[IPTABLES DROP] : '
REJECT     all  --  anywhere             anywhere            reject-with icmp-port-unreachable
```

When I download a file, I notice it logs a DROP 3-4 times every second.

----------

## msutton

I downgraded to 2.6.11-r11, re-emerged iptables, and still get the same behavior.

I am at a loss.

It does not happen on another Gentoo box I have access to.

Is there any setting in /proc I could check for this problem?

----------

## msutton

Could it be because the packets are received out of order?

Congestion at a switch?

I added that log line to my other two Gentoo firewalls, which are on the same T1 with different IPs, and they are logging INVALID packets too, for packets that should be considered ESTABLISHED.

----------

## CriminalMastermind

hmm... interesting

i wonder what would happen if you used SNAT instead of MASQUERADE.

changing... *msutton wrote:*   

> -A POSTROUTING -s 192.168.1.0/255.255.255.0 -j MASQUERADE

 

from your first post to...

```
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -o $YOUR_EXTERNAL_INTERFACE -j SNAT --to-source $YOUR_EXTERNAL_IP
```

you may want to give that a shot and see what happens. i'm not sure anything will change, but it's worth a try.

hope that helps

----------

## msutton

I changed from MASQUERADE to SNAT and I still get the same behavior.

=(

I cannot figure out what is the matter.

I run a Gentoo firewall at home too. I added the INVALID logging there and have yet to see any INVALID connections at home.

----------

## msutton

After further investigation with Ethereal and tcpdump:

Every time a packet is logged as INVALID, the packet has a bad checksum.

What causes a bad checksum?

There are three firewall machines on the T1, all receiving these packets. All are Athlon XP 2800+ boxes with a gig of RAM and 3Com 3c905C NICs.

Could it be the T1 dropping packets, thus making the checksum invalid?

I would not think it is a slow or faulty NIC, since it happens on all three.

Any insight would be helpful.

----------

## CriminalMastermind

 *msutton wrote:*   

> With further investigation using ethereal and tcpdump
> 
> Every time a packet is logged invalid the packet has a bad checksum. 

 

wow. sounds like you've done some homework. it's nice to see people put a good amount of effort into their problems.

 *msutton wrote:*   

> What causes a bad checksum?

 

corrupt data?

are these drops on your external/internet interface?

are you sure when they arrive on your external interface they have a bad checksum?

from what i remember, ip has a checksum, and tcp also has a checksum... which one is failing?

one thing you could try, if you are sure the checksum is already bad when the packet gets to you, is rebooting anything between you and the router where the bad packets are arriving (i.e. any cable modem, switch or hub). if that doesn't work, and you can, try rebooting the router.

i'm just guessing at where i think the problem could be.

hope something there helps.
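Both the IPv4 header checksum and the TCP checksum use the same one's-complement "Internet checksum" from RFC 1071. The TCP one additionally covers a pseudo-header containing the source and destination addresses. The sketch below shows the arithmetic on a list of 16-bit words. The function name and hex-word input format are illustrative assumptions, and building a real pseudo-header is left out:

```shell
#!/bin/sh
# Sketch of the RFC 1071 Internet checksum. Arguments are 16-bit words in
# hex (e.g. 0xf203). Not a packet parser: callers must supply the words.
inet_checksum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + $word ))
    done
    # Fold carries out of the low 16 bits back in (one's-complement add).
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xffff) + (sum >> 16) ))
    done
    # The checksum is the one's complement of the folded sum.
    printf '0x%04x\n' $(( ~sum & 0xffff ))
}
```

For the RFC 1071 sample words, `inet_checksum 0x0001 0xf203 0xf4f5 0xf6f7` prints `0x220d`. Running it again with `0x220d` appended prints `0x0000`, which is how a receiver validates a packet. A single flipped bit anywhere changes that result, and this is the kind of mismatch that gets reported as a bad checksum.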

----------

## msutton

The drops are on the external interface.

The T1 uplink was on a 5-port Linksys switch. I moved it to a Cisco switch on its own VLAN and still have the problem, which rules out the switch.

After reading an enormous amount of info on the net, the only explanations I can see are that the T1 router is corrupting the packet header when it sends it to me, or that the uplink Cat 5 is bad.

I still need to replace the uplink Cat 5 from the router to the switch. It is just a pain because of the drop-down ceilings and having to move ceiling tiles. I really need to do it after hours, so I can just string it along the floor and test before getting the ladder out.

Here is why I believe it could be the T1 router, from what I understand after reading around. The router should do a TCP checksum by itself, and if the packet is corrupt it should not accept it and should ask for a retransmit. It is not asking for a retransmit, though. It is passing the packet on, and it adds its own IP header when it does, so it could be the one corrupting it.

First I will power-cycle the router, then re-wire from the router to the switch and from the switch to the firewall, and then change the NIC.

I will do this over the weekend. If this doesn't work, my next plan is to call the telco to check their router.

From what I read on the net, TCP checksum errors making it all the way to the destination are very, very rare, because a corrupt packet should not be passed on.

But tcpdump says the protocol is TCP and shows the packet info, with the packet ID and a failed checksum.

And when I look up the packet ID of the INVALID entry in the iptables log, it matches the one in tcpdump.

How would I check the IP checksum?

----------

## buzzin

Wow, sorry to see this is really turning into an epic headache for you.

Ethereal should let you inspect those checksums.
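tcpdump can also do the check itself. In verbose mode it verifies the IP header checksum and, when the whole segment was captured, the TCP checksum, and it flags mismatches in its output. The exact wording of the flag differs between tcpdump versions. `-s 0` matters because older tcpdump versions capture only the first few dozen bytes by default, which is too little to verify a TCP checksum. The function name and the `TCPDUMP` dry-run variable are hypothetical, and `eth1` is just the external interface from this thread:

```shell
#!/bin/sh
# Hypothetical dry-run hook: TCPDUMP="echo tcpdump" prints instead of running.
TCPDUMP=${TCPDUMP:-tcpdump}

verify_checksums() {
    iface=${1:-eth1}
    # -n: no name lookups; -v: verbose, including checksum verification;
    # -s 0: capture whole packets; "tcp": only TCP traffic.
    $TCPDUMP -n -v -s 0 -i "$iface" tcp
}
```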

----------

## CriminalMastermind

 *msutton wrote:*   

> The T1 uplink was on a 5 port linksys switch and I moved it to a Cisco Switch on its own VLAN and still have the problems ruling out the switch.

 

makes sense to me.

 *msutton wrote:*   

> The only thing after reading an enormous amount of info on the net is that the T1 router is corrupting the packet header when it sends it to me

 

ya, that is why i suggested a reboot if possible.

 *msutton wrote:*   

> or the uplink cat 5 is bad.

 

i'm not sure. i wouldn't think a bad cable would behave like this. most bad cables i've seen either don't work, or start and stop working if you wiggle them by the connector. i guess i could see the shielding on the cable having a rip somewhere and noise being introduced, or some equipment the cable is run past really misbehaving and radiating lots of noise, but i've never seen that. i seem to remember ethernet having a pretty high tolerance for noise, so i think this is pretty unlikely. i'm not too sure on any of this, though. i'd make a tin-foil helmet and wear it before approaching anything generating enough noise to interfere with ethernet.

another thing is ethernet cable should only be so long before it has to go into a hub or some other device.  i don't remember the length, it's pretty far, but if you are running it a long way, you may want to google around for the maximum cable length and see if you are over it or close to it.  fyi, it is different for different speeds of ethernet.  gig-e is actually pretty short, again from what i remember.

if you wanted to be 100% sure about the cable, i think i've played with a fancy cable checker that did a frequency sweep to make sure the cable will handle everything ethernet will throw at it and that it's not too long. i think it was kind of expensive, but i don't really know. you could try to get your hands on one and check the cable while it's in place.

 *msutton wrote:*   

> And the reason I believe it could be the T1 router, after reading the net this is what I understand, is that the router should do a tcp checksum by itself and if the packet is corrupt then it should not accept it and should ask for a retransmit. And since it is not asking for a retransmit and it passing it on and when it passes it on it adds its own IP header it could be corrupting.

 

yep, rebooting the router just before you is where my money is. again, good to see you have done your homework, but i should point out a correction: i'm pretty sure routers don't check the tcp checksum, they check the ip checksum. i don't think routers know or care about tcp/udp at all, just ip.

 *buzzin wrote:*   

> ethereal should let you inspect those checksums

 

yes, i second that.  ethereal is your friend.  if you don't have X11 installed on any of these boxes, i know it is possible to use tcpdump with some flags to capture the packets and then you can transfer the log to a computer with ethereal and open them for viewing from there.  i don't remember how to do this, but you should be able to google for it.

also, what happens if you ping the router before you?  do you start getting packet loss?  getting back replies with bad ip checksums?

i'm not sure what that would prove, but it sounds like a good thing to try.

it sure sounds like you are having fun.  hope something i've said helped.
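The capture-on-one-box, analyse-on-another approach mentioned above can be sketched as follows. `-w` writes raw packets to a file instead of decoding them, and `-s 0` keeps whole packets so the checksums can still be verified later. The function name, the default file path, the `tcp port 873` (rsync) filter and the `TCPDUMP` dry-run variable are all illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical dry-run hook: TCPDUMP="echo tcpdump" prints instead of running.
TCPDUMP=${TCPDUMP:-tcpdump}

capture_to_file() {
    iface=${1:-eth1}
    file=${2:-/tmp/eth1-capture.pcap}
    # Capture full rsync packets on the external interface into a pcap file.
    $TCPDUMP -n -s 0 -i "$iface" -w "$file" tcp port 873
}

# Afterwards: copy the file to a machine with X (e.g. with scp) and open it
# with "ethereal -r FILE", or read it back locally with "tcpdump -n -v -r FILE".
```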

----------

## mjensen42

Hey, are you sure the checksum failures are on the TCP level, not lower in the protocol stack?  The kinds of symptoms you describe are somewhat consistent with ethernet devices trying to communicate with mis-matched speed and/or duplex settings -- but that would cause a checksum error on the packet itself, not just the TCP payload.

----------

## msutton

OK, I reset the router.

I still get INVALIDs, but now it does not complain about the checksum.

For example, rsyncing (which generates INVALIDs more than anything):

eth0 is internal.

eth1 is external.

This is iptables saying packet ID 46968 is INVALID:

```
Sep 16 22:36:57 [kernel] [IPTABLES INVALID] : IN=eth1 OUT= MAC=<Mac> SRC=<RSYNC SERVER> DST=<MY EXT IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=50 ID=46968 DF PROTO=TCP SPT=873 DPT=57082 SEQ=3987671666 ACK=169538902 WINDOW=58400 RES=0x00 ACK URGP=0
```

This is tcpdump for packet ID 46968:

```
22:36:57.818118 IP (tos 0x0, ttl  50, id 46968, offset 0, flags [DF], length: 1500) <RSYNC SERVER>.rsync > <MY EXT IP>.57082: . 14144952:14146412(1460) ack 29488 win 58400
```

Can you see anything unusual in this, or do you need more detail out of tcpdump?

If you need more detail please let me know what flags to use.
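Matching packets between the iptables log and tcpdump by IP ID, as done by hand above, can be scripted with two small filters. This is a sketch. The function names are hypothetical, and the patterns assume the log and tcpdump line formats shown in this thread. IP IDs are only 16 bits and are chosen per sending host, so compare timestamps as well before concluding two lines are the same packet:

```shell
#!/bin/sh
# Read netfilter LOG lines on stdin, print the IP ID ("ID=NNNN") of each.
log_ip_id() {
    # The leading space in " ID=" keeps other fields from matching.
    sed -n 's/.* ID=\([0-9][0-9]*\) .*/\1/p'
}

# Read tcpdump -v lines on stdin, print the IP ID ("id NNNN,") of each.
tcpdump_ip_id() {
    sed -n 's/.*, id \([0-9][0-9]*\),.*/\1/p'
}

# Example (log path is an assumption):
#   grep 'IPTABLES INVALID' /var/log/messages | log_ip_id
```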

----------

## msutton

I moved the firewall into the telco closet where the T1 comes in.

I made new cables and am still getting the errors, which rules out the Cat 5.

Now, how can I set the duplex and speed manually from Linux for a 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78) Ethernet controller?

With ethtool I get this:

```
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 24
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x00000001 (1)
        Link detected: yes
```

I want to force it to 10 Mb/s first, then 10 Mb/s half duplex, and see if these INVALIDs go away.
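With ethtool this is done through `-s` with `speed`, `duplex` and `autoneg`. Forcing one end while the other still auto-negotiates can itself cause a duplex mismatch, because the auto-negotiating side usually falls back to half duplex. Ideally both ends are set the same way. The function names and the `ETHTOOL` dry-run variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical dry-run hook: ETHTOOL="echo ethtool" prints instead of running.
ETHTOOL=${ETHTOOL:-ethtool}

force_link() {
    iface=$1 speed=$2 duplex=$3   # e.g. eth1 10 half
    # Turn auto-negotiation off and pin speed (Mb/s) and duplex.
    $ETHTOOL -s "$iface" speed "$speed" duplex "$duplex" autoneg off
}

restore_autoneg() {
    # Go back to auto-negotiation.
    $ETHTOOL -s "$1" autoneg on
}
```

For example, `force_link eth1 10 full`, restart the interface and test, then `force_link eth1 10 half`.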

----------

## msutton

OK, I changed the speed and duplex.

After each change I restarted the interface.

The link only works at 100 Mb/s full duplex, nothing else.

----------

## msutton

I did some more reading and ran:

```
echo 255 > /proc/sys/net/ipv4/netfilter/ip_conntrack_log_invalid
```

Now I get this in my log file, which confirms my earlier findings:

```
Sep 19 16:24:52 [kernel] ip_ct_tcp: bad TCP checksum IN= OUT= SRC=<SRC IP> DST=<My IP> LEN=1500 TOS=0x00 PREC=0x00 TTL=113 ID=6318 DF PROTO=TCP SPT=80 DPT=4148 SEQ=3946414554 ACK=1679195815 WINDOW=16861 RES=0x00 ACK URGP=0
```

Any other suggestions to fix the TCP checksum errors?
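The value written to `ip_conntrack_log_invalid` is an IP protocol number: 6 limits the logging to TCP and 255 logs every protocol. On later kernels, where `ip_conntrack` was replaced by `nf_conntrack`, the same knob lives at `/proc/sys/net/netfilter/nf_conntrack_log_invalid`. A sketch that tries both paths follows. The function name and the `PROC` override, which exists only so the function can be tested outside the real `/proc`, are hypothetical:

```shell
#!/bin/sh
# Hypothetical override for testing; normally this stays /proc.
PROC=${PROC:-/proc}

enable_conntrack_invalid_log() {
    proto=${1:-255}   # 6 = TCP only, 255 = all protocols
    for f in "$PROC/sys/net/ipv4/netfilter/ip_conntrack_log_invalid" \
             "$PROC/sys/net/netfilter/nf_conntrack_log_invalid"; do
        # Use the first path that exists and is writable (needs root on /proc).
        if [ -w "$f" ]; then
            echo "$proto" > "$f"
            echo "$f"
            return 0
        fi
    done
    return 1
}
```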

----------

## buzzin

Maybe try another network card that uses a different kernel driver?

----------

## CriminalMastermind

sorry for the late reply,

 *buzzin wrote:*   

> maybe try another network card which uses a different kernel driver?

 

i think the chances are pretty slim that the problem is the network driver. i don't think network drivers know anything about the layers above ethernet. they may have some knowledge of ip, but i'm pretty sure they don't know about udp or tcp.

 *msutton wrote:*   

> Any other suggestions to fix the TCP checksum errors?

 

not really. you could check that they are leaving their source host ok, if you had access to it, but i doubt it is sending tcp packets with bad checksums. i think i've seen tcp performance checkers somewhere... but again, i don't think that will help you.

i'm pretty sure routers don't have any knowledge of tcp/udp or anything above the ip level, so i don't think they will be looking at anything above the ip level. that means any of the routers the packet passes through could be corrupting it as it goes through their buffers.

the only way i could think of figuring out which router is corrupting the packets would be if you notice a pattern, like the same hosts always giving you a corrupt packet once in a while, and then using traceroute to find the routers they have in common. that doesn't sound like anything resembling fun, and i think it has an extremely low probability of success. even if you did find what you thought was the guilty router, i don't know how you could go about proving it had a problem.

this may be one of those cases where things seem not to be working, yet they are. ip and tcp have checksums built into them for a reason: data does get corrupted once in a while along the trip. i've never looked at this, so i couldn't comment on how much traffic ends up corrupted, but i'd look into the percentage of corrupted traffic versus good traffic you get back. i don't know where you could find what an acceptable percentage is, but if it seems low, perhaps this is just the way things are.

hope that helped.
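A rough way to get that percentage is to count the INVALID log lines over some period and compare them with the packets received on the external interface over the same period. The RX packet count can be read twice from `/proc/net/dev` and subtracted. This is a sketch. The function names are hypothetical, and the parsing assumes the usual `iface: rx_bytes rx_packets ...` layout of `/proc/net/dev`:

```shell
#!/bin/sh
# Print the received-packet counter for an interface from /proc/net/dev
# (second argument lets you point it at a saved copy of the file).
rx_packets() {
    awk -v ifc="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)           # strip leading padding
        n = index(line, ":")
        if (n && substr(line, 1, n - 1) == ifc) {
            split(substr(line, n + 1), f, " ")
            print f[2]                      # field 1 = rx bytes, 2 = rx packets
        }
    }' "${2:-/proc/net/dev}"
}

# Percentage of BAD packets out of TOTAL, with two decimals.
percent() {
    awk -v bad="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", (total ? 100 * bad / total : 0) }'
}
```

For example, `percent 3 1000` prints `0.30`.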

----------

## msutton

Last night I figured it out.

It is the Cisco IAD 2431-8FXS.

The Ethernet port on the back flips in and out of full and half duplex.

And when you unplug the cable from it, all the lights stay on even though nothing is plugged in.

I believe the port is borked.

I did swap the card for a Realtek, and the Realtek would log the duplex changes and the mismatches, whereas the 3Com 3c905C would not.

I guess the logging is just a driver thing.

But hopefully this will get squared away soon and my internet will be up at full speed again.

Thanks for all your help guys.
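A port that flips between half and full duplex can also be caught without relying on the driver to log it, by polling ethtool and printing a line whenever the reported speed or duplex changes. This is a sketch. The function names, the one-second default interval and the `ETHTOOL` override are hypothetical, and the parsing assumes ethtool's `Speed:` / `Duplex:` lines as shown earlier in the thread:

```shell
#!/bin/sh
# Hypothetical override: point ETHTOOL at another command for testing.
ETHTOOL=${ETHTOOL:-ethtool}

link_state() {
    # Reduce ethtool output to e.g. "100Mb/s Full".
    $ETHTOOL "$1" 2>/dev/null |
        awk -F': ' '/Speed:/ { s = $2 } /Duplex:/ { d = $2 } END { print s, d }'
}

watch_link() {
    iface=${1:-eth1} last=
    while :; do
        now=$(link_state "$iface")
        if [ "$now" != "$last" ]; then
            # Timestamp each change so it can be lined up with the INVALID log.
            echo "$(date '+%b %e %T') $iface $now"
            last=$now
        fi
        sleep "${2:-1}"
    done
}
```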

----------

