Hi!
I believe I found a bug in at least the 5.xx releases (tested with 5.14 and 5.20). Could someone please verify my findings? The setup is as follows:
VoIP Phone (h.323) - Switch - Mikrotik Router - Switch - VoIP Gateway
Here is the packet flow:
- A call (media) is initiated by the gateway to the phone:
Source Destination Protocol Info
Gateway Phone TCP src port 4139, dst port 1720, SYN
Phone Gateway TCP src port 1720, dst port 4139, SYN, ACK
Gateway Phone TCP src port 4139, dst port 1720, ACK
- Now they talk for a little while and then hang up:
Source Destination Protocol Info
Gateway Phone TCP src port 4139, dst port 1720, FIN, ACK
Phone Gateway TCP src port 1720, dst port 4139, FIN, ACK
At this point the connection tracking entry on the MikroTik is in the time-wait state, which is governed by tcp-time-wait-timeout (default: 10s). From the wiki: "maximal amount of time connection tracking entry will survive after having seen connection termination request (FIN) just after connection request (SYN) or having seen another termination request (FIN) from connection release initiator".
- Now we call again immediately, and the gateway uses the same source port, which leads to the same connection tuple (source IP, source port, destination IP, destination port):
Source Destination Protocol Info
Gateway Phone TCP src port 4139, dst port 1720, SYN
Phone Gateway TCP src port 1720, dst port 4139, SYN, ACK
Gateway Phone TCP src port 4139, dst port 1720, ACK
But now the phone cannot get normal packets back to the gateway until the tcp-time-wait-timeout has expired.
This is what you see after the initial TCP handshake:
Source Destination Protocol Info
Gateway Phone TCP src port 4139, dst port 1720, h323: setup openLogicalChannel
Phone Gateway TCP src port 1720, dst port 4139, ACK
Phone Gateway TCP src port 1720, dst port 4139, h323: connect openLogicalChannel (filtered by MikroTik)
Phone Gateway TCP src port 1720, dst port 4139, resent, h323: connect openLogicalChannel (filtered by MikroTik)
Phone Gateway TCP src port 1720, dst port 4139, resent, h323: connect openLogicalChannel (filtered by MikroTik)
Phone Gateway TCP src port 1720, dst port 4139, resent, h323: connect openLogicalChannel
The last resend happens later than the time of the FIN packet plus tcp-time-wait-timeout seconds, and it reaches the gateway. I decreased the tcp-time-wait-timeout to 1 second and the problem went away, as I was not able to dial that fast. The same was true when I deactivated connection tracking.
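To make the timing easier to follow, here is a toy model of the behaviour I observed. It is only a sketch of my assumption about what happens, not RouterOS or netfilter code: the class ToyConntrack and its methods are made up for illustration, and the only real values are the 4-tuple and the 10s default.

```python
TIME_WAIT_TIMEOUT = 10.0  # seconds; the default mentioned above


class ToyConntrack:
    """Hypothetical model of the observed behaviour, not RouterOS code."""

    def __init__(self, time_wait_timeout=TIME_WAIT_TIMEOUT):
        self.time_wait_timeout = time_wait_timeout
        # Maps (src IP, src port, dst IP, dst port) to the time at which
        # the time-wait entry for that tuple expires.
        self.time_wait_until = {}

    def on_close(self, tuple4, now):
        """Both FINs have been seen: keep the entry in time-wait."""
        self.time_wait_until[tuple4] = now + self.time_wait_timeout

    def reply_data_passes(self, tuple4, now):
        """Assumed behaviour: reply data on a reused tuple is dropped
        for as long as the old time-wait entry is still alive."""
        expires_at = self.time_wait_until.get(tuple4)
        if expires_at is not None and now < expires_at:
            return False
        # Entry expired (or never existed): forget it and let data through.
        self.time_wait_until.pop(tuple4, None)
        return True


call = ("gateway", 4139, "phone", 1720)
ct = ToyConntrack()
ct.on_close(call, now=100.0)               # first call hangs up at t=100
print(ct.reply_data_passes(call, 103.0))   # redial 3s later: dropped
print(ct.reply_data_passes(call, 110.5))   # after FIN time + timeout: passes
```

With time_wait_timeout set to 1.0 the blocked window becomes too short to hit by redialling manually, which matches what I saw.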
I searched through the netfilter mailing list and found an earlier bug there that looks similar, even if it is too old:
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg23313.html
With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles >> :
When a connection is >>closed actively<<, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY >>accept<< a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
[…]
Update: I found the following in RFC 1122 (http://tools.ietf.org/html/rfc1122):
When a connection is closed actively, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY accept a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
(1) assigns its initial sequence number for the new
connection to be larger than the largest sequence
number it used on the previous connection incarnation,
and
(2) returns to TIME-WAIT state if the SYN turns out to be
an old duplicate.
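Condition (1) can be checked by hand from the absolute sequence numbers in a trace. Below is a small sketch of that comparison, assuming the usual 32-bit wraparound arithmetic for TCP sequence numbers. The function names are my own, and condition (2) is not modelled.

```python
def seq_after(a: int, b: int) -> bool:
    """True if 32-bit sequence number a is later than b, allowing wraparound."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000


def satisfies_condition_1(new_isn: int, largest_prev_seq: int) -> bool:
    """RFC 1122 condition (1): the initial sequence number of the new
    connection must be larger than the largest sequence number used on
    the previous incarnation of the connection."""
    return seq_after(new_isn, largest_prev_seq)


# Illustrative numbers only, not taken from my traces.
print(satisfies_condition_1(new_isn=2000, largest_prev_seq=1000))
print(satisfies_condition_1(new_isn=5, largest_prev_seq=0xFFFFFFF0))
```

The second call covers the case where the sequence space wrapped around between the two calls, so a plain "greater than" on the raw numbers would give the wrong answer.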
I verified that the initial sequence number of the SYN packet is larger than the last one from the gateway, and that the sequence number of the SYN,ACK packet is larger than the last one from the phone. For this check I disabled the “relative sequence numbers” setting in Wireshark.
I can also provide packet traces taken in front of and behind the router.