IPTables_Friend - 10:59 am on Nov 10, 2007 (gmt 0)
As this was one of only a couple of places where this problem was being discussed, and as no solution or proposed solution had been published, I wanted to post my improvements to AlexK's information and what I believe was my root cause (testing is still in progress).
The problem starts out in a straightforward way;
- User level observation is an intermittently failing protocol;
But the problem is hard to diagnose;
- System level observation is an INVALID packet match;
iptables -A INPUT -m state --state INVALID -j LOG \
    --log-level info --log-prefix "INVALID (input):"
iptables -A INPUT -m state --state INVALID -j DROP
xx xx xx:xx:xx localhost kernel: INVALID (input):
IN=eth1 OUT= MAC= SRC=xx.xx.xx.xx DST=xx.xx.xx.xx
LEN=561 TOS=0x0C PREC=0x20 TTL=56 ID=2945 PROTO=TCP
SPT=34920 DPT=80 WINDOW=183 RES=0x00 ACK PSH URGP=0
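Matching a logged packet to a capture is easier if individual fields are pulled out of the LOG line first. The following is a minimal sketch, assuming a POSIX shell; log_field is a hypothetical helper name (it is not part of iptables), and the sample is the log entry above joined onto one line.

```shell
# Hypothetical helper: print the value of KEY from a netfilter LOG line.
# Usage: log_field KEY "log line"
log_field() {
    key=$1
    line=$2
    # Put each whitespace-separated token on its own line,
    # then print what follows "KEY=" for the first matching token.
    printf '%s\n' "$line" | tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Sample taken from the log entry above, joined onto a single line.
sample='INVALID (input): IN=eth1 OUT= MAC= SRC=xx.xx.xx.xx DST=xx.xx.xx.xx LEN=561 TOS=0x0C PREC=0x20 TTL=56 ID=2945 PROTO=TCP SPT=34920 DPT=80 WINDOW=183 RES=0x00 ACK PSH URGP=0'

log_field ID "$sample"    # IP identification field
log_field SPT "$sample"   # TCP source port
```

The ID value is the IP identification field, which tcpdump -vvv prints as "id", so the same packet can be looked up in both outputs.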
To the point that it took me a week of on-again off-again testing before I got to an indication of the cause;
- Packet level observation is a TCP packet that reportedly contains an incorrect checksum;
tcpdump -i eth1 -s1500 -vvv port 80 | grep -i incorrect
xx:xx:xx.577564 IP (tos 0x2c, ttl 56, id 2945, offset 0,
flags [none], proto: TCP (6), length: 561)
sourcehost.34920 > desthost.http: P,
cksum 0x39fd (incorrect (-> 0xfc9e),
1:522(521) ack 1 win 183
Like AlexK, I wasn't prepared to believe that this was typical hacker crap. Unlike AlexK, I found that I could intermittently reproduce the problem using one of my own Internet connections (so I could confirm his and my theory, and create a somewhat slow and painful testing platform -- but slightly faster than "wait-n-see").
So the real piece of missing information for AlexK (and for me originally) was that the INVALID state match doesn't just match "out of state" packets; it also matches packets with other bad attributes, including an incorrect TCP checksum. This isn't commonly discussed, but it is noted around the traps.
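On kernels that expose connection tracking settings under /proc/sys/net/netfilter, two sysctls relate to this behaviour: nf_conntrack_checksum (whether conntrack verifies checksums, treating failures as INVALID) and nf_conntrack_log_invalid (whether conntrack itself logs invalid packets for a given protocol). The following is a rough sketch for inspecting them, assuming such a kernel; show_conntrack_checks is a hypothetical function name, and older kernels may use different names or lack these entries entirely.

```shell
# Hypothetical helper: report the conntrack settings relevant to INVALID matches.
# Assumption: the kernel exposes them under /proc/sys/net/netfilter.
show_conntrack_checks() {
    for key in nf_conntrack_checksum nf_conntrack_log_invalid; do
        path="/proc/sys/net/netfilter/$key"
        if [ -r "$path" ]; then
            # nf_conntrack_checksum: non-zero means checksums are verified
            # nf_conntrack_log_invalid: 0 means off, 6 means log invalid TCP
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            printf '%s: not available on this kernel\n' "$key"
        fi
    done
}

show_conntrack_checks
```

This only reads the values. Changing them needs root, and turning off checksum verification would hide the symptom rather than fix whatever is corrupting the packets.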
In my case, I saw this problem on all of the hosts on my public network. This is odd as my iptables scripts have evolved across the past 10 years, with this particular install being a slightly improved version again - but with no substantial changes. And this was certainly something that I hadn't seen before.
While testing is still running, in my case, due to the common effect of the problem across multiple hosts and multiple services, and due to the incorrect TCP checksum seen at the host interface, I concluded that;
The second conclusion was going to be the easiest to test, as I didn't have alternative parts for these boxes, not lying around anyway.
Upstream I had a small router-firewall appliance of a sort I'd not used before; I typically deploy iptables boxes on hardened Linux platforms as these seem to be the most robust. This particular router-firewall had a couple of "native" security features which I'd taken advantage of - something the vendor called "Port Scan and DOS Protection". The Port Scan detection was working, so I left both features running.
However, given the TCP checksum issue, I concluded that the Denial of Service Protection was probably trying to do its own version of SYN Cookies (or something similar), which was requiring packets to be re-assembled in-line, before reaching the hosts; thus being the most likely cause of my problem.
As it turns out, disabling this service has, so far, proven to resolve my INVALID packet issue.
I hope that this information, along with AlexK's original thread, remains available to help other future surfers who are attempting to resolve this very intricate fault.
AlexK, thanks for posting the detail you did; you helped to shorten my root-cause analysis. If you've been waiting for two years to resolve this problem, then I'm glad I could help (and sorry I wasn't here sooner ;-))
[edited by: encyclo at 11:12 am (utc) on Nov. 10, 2007]