pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

TCP session connection tracking bug?

Thu Jun 23, 2016 5:31 pm

I have been debugging a problem that was partly caused by some other equipment, but a MikroTik
router was involved in it in the following way:

The router contains a Forward ruleset that allows Established/Related, drops Invalid, and allows New
connections in one direction.  Not too unusual.  It is not a NAT environment, pure routing.
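
A forward ruleset of the kind described above could look roughly like the sketch below. This is not the poster's actual configuration: the interface name "ether2-lan" and the final catch-all drop rule are assumptions made for illustration.

```routeros
/ip firewall filter
# accept packets that belong to an existing tracking entry
add chain=forward connection-state=established,related action=accept
# drop packets that do not match any tracking entry
add chain=forward connection-state=invalid action=drop
# allow new connections in one direction only ("ether2-lan" is a hypothetical interface name)
add chain=forward connection-state=new in-interface=ether2-lan action=accept
# assumed catch-all: drop everything else
add chain=forward action=drop
```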

What I have observed: some TCP sessions that have no traffic are ticking down from the
"TCP unacked" timeout instead of the "TCP established" timeout set in the connection tracking.
This timeout is only 5 minutes by default, the established timeout being one day.

After those 5 minutes the tracking entry is removed. Some time later the remote end closes
the idle connection with a FIN ACK, which the router drops as "invalid" (correct). The local side never sees
the connection being closed and assumes it is still open. When the local side wants to send more data, a new
tracking entry is created by the router and the data is sent. However, there is another firewall further down the
path (not MikroTik) which has seen that unanswered FIN ACK and has by now deleted its tracking entry, so it drops
this data as invalid. It does not send any reply to inform the sender about this. The local system
sees a dead TCP connection and complains.

I have worked around this, first by sending a TCP RESET on TCP traffic that would be dropped
as invalid (this makes the local system realize something is wrong and re-establish the session),
but then I noticed the root cause of the problem, increased the "TCP unacked" timeout and now
the session is neatly closed.

However, the root cause of this problem is a misclassification of an idle session.
I sort of recall having seen this before: SSH sessions that died when not used for some time.
In the case I was researching it was a connection from a Linux system to a Windows system, but in the
SSH case it was from Linux to Linux.
What does "TCP Unacked" mean anyway?  There is another timer for "TCP retransmit", but this
is not involved (I changed the timeout values to see from which one it was ticking down).
It is not documented on the wiki.
Could it be that this one misfires when a "TCP Keepalive" is sent by one side and not the other?
 
andriys
Forum Guru
Posts: 1353
Joined: Thu Nov 24, 2011 1:59 pm
Location: Kharkiv, Ukraine

Re: TCP session connection tracking bug?

Fri Jun 24, 2016 5:07 pm

> What does "TCP Unacked" mean anyway?
I guess that means connection tracking code hasn't seen (or has missed) some TCP handshake messages (i.e. ACK or SYN+ACK). Just a guess based on the state name, though.

Anyways, what you describe sounds like a bug. What board do you see this on? Is it a multicore one? If it is, it might be a process/thread synchronization problem.
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Fri Jun 24, 2016 5:16 pm

The router is an RB951G-2HnD so I would not expect that.
I find it difficult to obtain documentation about what those different states actually are, not only in the
MikroTik wiki but also in generic Linux documentation.   Usually the names are listed and you are supposed
to derive the function yourself.   Maybe this weekend I will find time to look in the kernel source to see what it
is supposed to do.

This issue does not occur on all sessions, not even on all sessions between two specific endpoints, but it does
happen all the time for a certain kind of session.  In this case it is a session from a printer to a follow-you print
server.  The printer connects to the server when the MiFare ID card is scanned and it basically runs an
HTTP/1.1 session with Keep-Alive over a nonstandard port number (2939).  After some session info has been
exchanged the TCP connection remains intact (the server has sent a reply and the client has read it).  I think the
connection tracking should consider it "Established" at that time, with a 1-day timeout, but the tracking entry
was ticking down from a 5-minute timeout instead.  When the idle timeout of the server is > 5 minutes, this
results in problems (that took me quite some time to debug).
However, I don't yet understand why this happens to this specific session.  Other TCP sessions in the same
router, e.g. its BGP session or the Web management, are registered correctly.
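
For anyone who wants to check whether their own sessions are affected, the sketch below shows one way to compare the configured timeouts with the timeout a tracked connection is actually counting down from. The port filter uses 2939 only because that is the port mentioned above; substitute the port of the session you are investigating.

```routeros
# show the configured connection tracking timeouts (tcp-established-timeout, tcp-unacked-timeout, ...)
/ip firewall connection tracking print

# list tracked TCP connections to port 2939 with their remaining timeout
/ip firewall connection print detail where protocol=tcp dst-address~":2939"
```

An idle session whose remaining timeout stays below the "TCP unacked" value rather than counting down from the "TCP established" value would match the behaviour described in this thread.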
 
andriys
Forum Guru
Posts: 1353
Joined: Thu Nov 24, 2011 1:59 pm
Location: Kharkiv, Ukraine

Re: TCP session connection tracking bug?

Fri Jun 24, 2016 5:29 pm

> it does happen all the time for a certain kind of session.
Have you tried wiresharking the session? Can you share the dump?
> it basically runs an HTTP/1.1 session with Keep-Alive over a nonstandard port number (2939).
Do you have any special firewall rules for this? What's the rule?
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Fri Jun 24, 2016 6:39 pm

Yes, I have some traces but they do not reveal anything noticeable except what I wrote above.
The same device also makes other TCP sessions, and those do not show this behaviour.

The firewall rules in the router were as shown in the first post.  Later, to work around the
problem, I added an extra rule for invalid traffic that, in the case of protocol TCP, does not simply drop
the traffic but rejects it with TCP RESET.  Then the user-level problem was solved.  When
I identified what was going on, I increased the Unacked timeout from 00:05:00 to 00:30:00
and now there is no invalid traffic being counted anymore, indicating that the sessions are
closed in the expected way.
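
The exact rule was not posted in this thread. A minimal sketch of what such a rule might look like is given below, assuming it lives in the forward chain and is placed above the generic "drop invalid" rule (rule order matters, since the first matching rule wins).

```routeros
/ip firewall filter
# answer invalid TCP packets with a TCP RESET instead of silently dropping them;
# this rule must be moved above the existing drop rule for connection-state=invalid
add chain=forward protocol=tcp connection-state=invalid action=reject reject-with=tcp-reset
```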
 
Mosin
just joined
Posts: 3
Joined: Sun Sep 11, 2016 9:29 pm

Re: TCP session connection tracking bug?

Sun Sep 11, 2016 9:39 pm

I have the same problem with an RB2011UiAS-2HnD-IN; idle RDP sessions seem to be dropped after the TCP unacked timeout.
Could you please share your TCP RESET extra rule?
 
R1CH
Forum Veteran
Posts: 928
Joined: Sun Oct 01, 2006 11:44 pm

Re: TCP session connection tracking bug?

Mon Sep 12, 2016 12:03 am

A TCP RESET will terminate the connection regardless, just on your terms instead of when the router decides to. This is a bug in recent RouterOS versions which needs to be fixed by MikroTik. The best workaround for now is to set the TCP unacked timeout to 1 day. This makes it the same as the normal established timeout and avoids interrupted connections.

The unacked timeout is supposed to be for data that is sent and hasn't been acknowledged. Somehow the RouterOS kernel isn't seeing the ACK for the data, so it uses the wrong timeout and drops connections much sooner than it's supposed to.
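
As a sketch, the timeout workaround discussed in this thread comes down to a single setting. The value 1d follows the suggestion above; 30m is the value pe1chl reported using.

```routeros
/ip firewall connection tracking
# raise the unacked timeout from the 5-minute default to match the established timeout
set tcp-unacked-timeout=1d
# verify the new value
print
```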
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Mon Sep 12, 2016 11:35 am

I don't think an RDP connection would be sufficiently idle for 5 minutes to trigger this problem, so it is probably
something else. In any case you can increase the TCP Unacked timeout in the connection tracking. I
have increased it from 5 to 30 minutes, but if that still causes trouble you could increase it to 1 day
like the TCP Established timeout.

However, it is a bug (probably a kernel bug) and something needs to be done about it.
 
Mosin
just joined
Posts: 3
Joined: Sun Sep 11, 2016 9:29 pm

Re: TCP session connection tracking bug?

Mon Sep 12, 2016 12:02 pm

Thank you pe1chl
I also thought this could not be the cause for RDP, but after increasing the "TCP unacked timeout" the problem seems solved. I will wait some more time before drawing conclusions.
I had already increased the unacked timeout before reading this post, after noticing the strange reduction of the RDP connection timeout from 24h to 5 minutes.
I asked you to share your extra rule because I was hoping to learn something more.
 
Mosin
just joined
Posts: 3
Joined: Sun Sep 11, 2016 9:29 pm

Re: TCP session connection tracking bug?

Thu Sep 15, 2016 3:12 pm

> I don't think an RDP connection would be sufficiently idle for 5 minutes to trigger this problem....
If you minimize the RDP window, the connection becomes idle.
 
drees
just joined
Posts: 22
Joined: Tue Sep 20, 2016 9:39 pm

Re: TCP session connection tracking bug?

Sat Oct 08, 2016 12:36 am

I'm seeing the same issue, RB951G-2HnD running ROS 6.37.1 firmware 3.33.

It's acting as a router + AP with the AP bridged with the LAN ports.

A Linux machine NATed behind the router (connected via ethernet) SSHes out to a remote machine on the internet. If the SSH connection remains idle long enough (not sure exactly, but probably 5-10 minutes?), the Timeout drops from 23 hours to 5 minutes. If that timeout then expires, the connection is closed with an SSH error "Write failed: broken pipe" and I have to reconnect.

If I generate a little bit of traffic, that's enough to get the Timeout bumped up to the TCP Unacked Timeout. A few directory listings, for example.

If I generate more traffic on the SSH session (running top for 15 seconds seems to be enough), that's enough to get the Timeout bumped up to 24 hours (TCP Established Timeout) again.

Any ideas? It's pretty annoying to have your SSH sessions die, and setting the TCP Unacked Timeout higher seems like it could have other side effects.
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Sat Oct 08, 2016 11:37 am

The only side effect of increasing that timer is that some TCP connections that have died may stay in the
tracking table a bit longer. That should not cause issues.
When you want to solve your SSH problems and can access the server side config, add something like this
to /etc/ssh/sshd_config:

ClientAliveInterval 240
ClientAliveCountMax 6

This sends a dummy packet over the line that keeps the connection open, also for other routers that may be in the path.
On the client side, OpenSSH has the corresponding ServerAliveInterval option in ssh_config; setting TCPKeepAlive alone does not seem to fix it.
 
drees
just joined
Posts: 22
Joined: Tue Sep 20, 2016 9:39 pm

Re: TCP session connection tracking bug?

Sat Oct 08, 2016 11:45 pm

I rolled the RB951G-2HnD router/AP back to 6.34.6 on the bugfix branch and the bug does not seem to exist there, though the behavior is slightly different.

Initially the connection timeout will start at 2 days, then it will drop down to around 1 day. Sometimes it will jump back up to around 2 days.

Kinda weird, but at least you don't have to worry about the connections dropping any more.
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Sun Oct 09, 2016 11:44 am

Yes, there has been a recent change, but I do not remember in which version.
I thought it was before 6.34.6, but I am not sure.

Anyway, the behaviour is a bit random. For some connections no problem at all, for others this problem occurs.
 
drees
just joined
Posts: 22
Joined: Tue Sep 20, 2016 9:39 pm

Re: TCP session connection tracking bug?

Wed Oct 12, 2016 8:51 am

I sent support an email referencing this thread, and they asked if I had any fasttrack rules while also asking for my supout.rif file.

Based on the question, I disabled my fasttrack rule and now it appears that the connection timeouts work as expected on 6.37.1.
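
For readers who want to repeat this test, a fasttrack rule can be located and disabled roughly as sketched below. This assumes the rule is in the filter table and uses the standard fasttrack-connection action; it disables every rule with that action.

```routeros
/ip firewall filter
# show any fasttrack rules that are present
print where action=fasttrack-connection
# disable them (re-enable later with "enable" and the same find expression)
disable [find where action=fasttrack-connection]
```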
 
drees
just joined
Posts: 22
Joined: Tue Sep 20, 2016 9:39 pm

Re: TCP session connection tracking bug?

Sat Nov 05, 2016 9:17 am

Based on my testing, v6.38rc24 fixes the issue completely. I can't tell any difference in session tracking with fasttrack on or off now.
 
pe1chl
Forum Guru
Topic Author
Posts: 6787
Joined: Mon Jun 08, 2015 12:09 pm

Re: TCP session connection tracking bug?

Sat Nov 05, 2016 10:57 am

> Based on my testing, v6.38rc24 fixes the issue completely. I can't tell any difference in session tracking with fasttrack on or off now.
OK, that is good to hear. In the upcoming weeks I will have the opportunity to test another setup in which I experience this bug and that I can
update to an RC version, to see if it works in my case too.
 
haj3s29a
newbie
Posts: 25
Joined: Sun Jul 05, 2020 5:02 pm

Re: TCP session connection tracking bug?

Sat Aug 01, 2020 8:32 pm

I have a similar issue on my Chateau C12 LTE. Since the latest ROS v7.1beta1, I cannot connect to my SSH server. I always get the following error:
packet_write_wait: Connection to x.x.x.x port 22: Broken pipe
I do not see any dropped packets in the log.

UPDATE

It is an issue with fasttrack vs normal firewall processing.
