1.3.14: Numerous CLOSE_WAIT connections may cause out-of-con

rsaddey
Posts: 6
Joined: Mon Jun 13, 2011 7:24 am
Location: Berlin, Germany

1.3.14: Numerous CLOSE_WAIT connections may cause out-of-con

Post by rsaddey »

I'm using Gargoyle 1.3.14 with the following setup:

1. ISP
2. ADSL modem
3. Gateway - DIR-655, WAN=PPPoE, LAN=192.168.48.1 (DHCP server enabled)
4. Gateway - Gargoyle 1.3.14 on WRT-54GL v1.1, WAN=192.168.48.2 (static)


Symptoms:

1. The TCP NAT table within the DIR-655 contains numerous connections in CLOSE_WAIT state belonging to the Gargoyle (i.e. 192.168.48.2:*).
2. The DIR-655 SPI firewall frequently complains about invalid FIN:ACKs and RST:ACKs (i.e. they do not refer to valid connections).


Analysis:

1. DST IP+Port within FIN:ACKs and RST:ACKs on the DIR-655 refer to connections in CLOSE_WAIT initiated from the Gargoyle router.
2. SRC IPs within FIN:ACKs and RST:ACKs received by the DIR-655 from the Gargoyle router do NOT state the WAN IP of the Gargoyle, but instead reflect the IPs of arbitrary clients within the NAT of the Gargoyle (i.e. 192.168.1.*).
3. Googling for gargoyle close_wait reveals links to Gargoyle sources (e.g. http://www.gargoyle-router.com/gargoyle ... sh/uip.cpp) which contain the following comment: "CLOSED and LISTEN are not handled here. CLOSE_WAIT is not implemented, since we force the application to close when the peer sends a FIN (hence the application goes directly from ESTABLISHED to LAST_ACK)"
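
Observation 2 can be checked mechanically. The following is a minimal Python sketch, not a tool used in this thread. It assumes the Gargoyle LAN is 192.168.1.0/24 (the post only says 192.168.1.*) and that the source addresses have already been extracted from the DIR-655 log or a packet capture. The function name and the sample addresses are hypothetical.

```python
import ipaddress

# Assumption: the Gargoyle LAN is 192.168.1.0/24 (the post only says 192.168.1.*).
GARGOYLE_LAN = ipaddress.ip_network("192.168.1.0/24")


def leaked_sources(src_ips):
    """Return the source IPs that lie inside the Gargoyle LAN.

    Seen on the upstream side, such an address means the packet left the
    Gargoyle without source NAT applied.
    """
    return [ip for ip in src_ips if ipaddress.ip_address(ip) in GARGOYLE_LAN]


# Hypothetical sample of source addresses seen on the DIR-655 side.
seen = ["192.168.48.2", "192.168.1.101", "192.168.48.2", "192.168.1.57"]
print(leaked_sources(seen))
```

Any address this returns should, with working NAT, have appeared as 192.168.48.2 instead.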


Deductions:

1. It appears that Gargoyle does not yet fully implement the RFC 793 state transitions.
2. When the remote peer closes a connection, Gargoyle prematurely purges the connection (instead of marking it CLOSE_WAIT).
3. At the time the NAT client sends its FIN:ACK, Gargoyle no longer has any NAT mapping for the connection and chooses to forward the FIN:ACK unchanged (i.e. IP + ports from INSIDE its NAT).
4. The peer receiving the invalid forwarded FIN:ACK is unable to identify the connection the FIN:ACK applies to. Thus the connection remains in CLOSE_WAIT at the peer. In this setup the peer is the upstream DIR-655, which logs the invalid FIN:ACK and applies its normal TCP in-session timeout (i.e. two hours), thus filling its NAT table with numerous CLOSE_WAIT entries.
5. As the peer did not receive the FIN:ACK, no ACK will be sent by the peer nor received by the NATted client, eventually causing the client to send a final RST:ACK in desperation.
6. This RST:ACK is also forwarded by Gargoyle to the peer (again with SRC fields indicating IP+port from WITHIN the NAT), which has no other option than to ignore (and log) the invalid RST:ACK.
7. Thus the peer (in this setup both the DIR-655 and the ultimate peer within the Internet) will never receive a confirmation that the connection has been closed.
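
The chain of deductions above can be illustrated with a toy model. This is a sketch of the hypothesised behaviour only, not actual Gargoyle, OpenWRT, or netfilter code. The class, its methods, and the port numbers are all made up for illustration.

```python
# Toy model of the behaviour hypothesised above; NOT real router code.

class ToyNat:
    def __init__(self, wan_ip, purge_on_remote_fin):
        self.wan_ip = wan_ip
        self.purge_on_remote_fin = purge_on_remote_fin
        self.mappings = {}  # (lan_ip, lan_port) -> wan_port

    def add_mapping(self, lan_ip, lan_port, wan_port):
        """A LAN client opens a connection through the NAT."""
        self.mappings[(lan_ip, lan_port)] = wan_port

    def remote_fin(self, lan_ip, lan_port):
        """The remote peer closes first (its FIN passes through the NAT)."""
        if self.purge_on_remote_fin:
            # Hypothesised bug: the mapping is dropped instead of being
            # kept until the LAN client has sent its own FIN.
            self.mappings.pop((lan_ip, lan_port), None)

    def forward_outbound(self, lan_ip, lan_port):
        """Return the (src_ip, src_port) the upstream router would see."""
        wan_port = self.mappings.get((lan_ip, lan_port))
        if wan_port is None:
            return (lan_ip, lan_port)  # no mapping: forwarded unchanged
        return (self.wan_ip, wan_port)


for purge in (True, False):
    nat = ToyNat("192.168.48.2", purge_on_remote_fin=purge)
    nat.add_mapping("192.168.1.101", 50000, 40000)
    nat.remote_fin("192.168.1.101", 50000)
    # The client's FIN:ACK goes out after the remote FIN.
    print(purge, nat.forward_outbound("192.168.1.101", 50000))
```

With the premature purge, the client's FIN:ACK leaves with its LAN address. With the mapping retained, it leaves with the WAN address, and the upstream router can match it to the connection.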


TCP connection terminations initiated by remote peers can only reach the CLOSED state through timeouts, possibly causing excessive resource consumption.

behappy
Posts: 84
Joined: Thu Mar 31, 2011 5:06 pm

Re: 1.3.14: Numerous CLOSE_WAIT connections may cause out-of-con

Post by behappy »

I changed the conntrack timeouts, following a guide from the internet.

Change or add the file /etc/sysctl.conf.
My example:
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=1200
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent=30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv=30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait=30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close=10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait=30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack=30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream=180
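
To check whether shorter timeouts actually shrink the table, the conntrack entries can be counted per TCP state. This is a minimal Python sketch. It assumes the legacy /proc/net/ip_conntrack line layout, where a tcp line starts with the protocol name, protocol number, remaining seconds, and state. Newer kernels use a different file and layout. The function name and sample lines are made up.

```python
from collections import Counter


def count_tcp_states(lines):
    """Count conntrack entries per TCP state.

    Assumed layout of a tcp line (legacy ip_conntrack format):
    proto-name, proto-number, seconds-left, state, then address fields.
    """
    states = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "tcp":
            states[fields[3]] += 1
    return states


# Made-up sample lines in the assumed format.
sample = [
    "tcp      6 25 CLOSE_WAIT src=192.168.1.101 dst=203.0.113.5 sport=50000 dport=80 "
    "src=203.0.113.5 dst=192.168.48.2 sport=80 dport=50000 [ASSURED] use=1",
    "tcp      6 12 CLOSE_WAIT src=192.168.1.57 dst=203.0.113.9 sport=50010 dport=443 "
    "src=203.0.113.9 dst=192.168.48.2 sport=443 dport=50010 [ASSURED] use=1",
    "tcp      6 1100 ESTABLISHED src=192.168.1.57 dst=203.0.113.9 sport=50011 dport=443 "
    "src=203.0.113.9 dst=192.168.48.2 sport=443 dport=50011 [ASSURED] use=1",
    "udp      17 170 src=192.168.1.57 dst=203.0.113.53 sport=40000 dport=53 "
    "src=203.0.113.53 dst=192.168.48.2 sport=53 dport=40000 use=1",
]
print(dict(count_tcp_states(sample)))
```

On a router, the same function could be fed the open file instead of the sample list, provided the file exists and matches the assumed layout.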

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: 1.3.14: Numerous CLOSE_WAIT connections may cause out-of-con

Post by pbix »

rsaddey,

Some interesting analysis here. I would suggest you take up this matter over at openwrt.org. The code in question here is, to my knowledge, part of OpenWRT and not modified by us intentionally.

Gargoyle is mostly a user interface on top of OpenWRT. We do have some specific functionality but I am not aware of anything involving the connection table state machine.

The code you reference in your post is part of the fone flash utility, which is used to upgrade firmware on some routers and would not be active when the router was actually running, so I do not believe it is related to your issue.
Linksys WRT1900ACv2
Netgear WNDR3700v2
TP Link 1043ND v3
TP-Link TL-WDR3600 v1
Buffalo WZR-HP-G300NH2
WRT54G-TM

rsaddey
Posts: 6
Joined: Mon Jun 13, 2011 7:24 am
Location: Berlin, Germany

Re: 1.3.14: Numerous CLOSE_WAIT connections may cause out-of-con

Post by rsaddey »

Thanks pbix,

the CLOSE_WAIT problem is mitigated by two things: CLOSE_WAIT only occurs if the remote peer initiates the close, and some routers allow the CLOSE_WAIT timeout to be overridden.

A typical use case for Gargoyle covers no more than some tens of local clients and does not include local servers (beware of P2P though). With this use case most TCP connection closes will be initiated from local clients, where there's no problem at all.

With my setup, I have never encountered more than a few hundred connections in CLOSE_WAIT at the upstream router. My upstream router is a DIR-655, which is said to be capable of handling more than 20 thousand simultaneous connections (see http://www.smallnetbuilder.com/lanwan/r ... simul-conn).
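
A rough back-of-the-envelope check of these figures can be sketched in Python. It uses the two-hour timeout from the first post and the 20 thousand figure quoted above. It assumes a steady rate of remote-initiated closes, so that the number of lingering entries is roughly rate times timeout. The function names and the example rate are hypothetical.

```python
TIMEOUT_S = 2 * 60 * 60   # two-hour timeout mentioned in the first post
CAPACITY = 20_000         # simultaneous connections quoted for the DIR-655


def steady_state_entries(closes_per_second, timeout_s=TIMEOUT_S):
    """Lingering CLOSE_WAIT entries at a steady close rate (rate x timeout)."""
    return closes_per_second * timeout_s


def rate_to_fill(capacity=CAPACITY, timeout_s=TIMEOUT_S):
    """Close rate at which lingering entries alone would reach capacity."""
    return capacity / timeout_s


# Hypothetical rate: one remote-initiated close every 20 seconds.
print(round(steady_state_entries(1 / 20)))
print(round(rate_to_fill(), 2))
```

Under these assumptions, one remote-initiated close every 20 seconds leaves a few hundred lingering entries, while filling the table would take a sustained rate of nearly three such closes per second.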

So for me, I don't expect the CLOSE_WAITs to have any perceivable effect at all :D

Thanks again,
Reiner
