mccreigh
just joined
Topic Author
Posts: 8
Joined: Sun Apr 27, 2014 9:01 pm

Obsolete connection table entries

Mon Jan 15, 2018 12:42 pm

Fellow MikroTik enthusiasts --

My ISP assigns me a single dynamic public IPv4 address, and I run a small Asterisk SIP server on my LAN. In my MikroTik router I run a once-per-minute script that looks into the router's DHCP client and updates a dynamic DNS server if the public IPv4 address has changed.
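
For readers following along, here is a minimal sketch of what such a once-per-minute check might look like. It is not the script from this post: the interface name "ether1", the global variable lastWanIp, and the update URL are placeholders assumed for illustration, and a real dynamic DNS provider will have its own update URL format.

```
# Sketch only: "ether1", lastWanIp and the URL below are assumed placeholders.
:global lastWanIp
:local wanIf "ether1"

# The DHCP client reports its lease as "a.b.c.d/nn"; keep only the address part.
:local cidr [:tostr [/ip dhcp-client get [find interface=$wanIf] address]]
:local curIp [:pick $cidr 0 [:find $cidr "/"]]

:if ($curIp != $lastWanIp) do={
    # Hypothetical update endpoint; substitute the provider's real one.
    /tool fetch url=("https://ddns.example.invalid/update?ip=" . $curIp) keep-result=no
    :set lastWanIp $curIp
    :log info ("WAN address changed to " . $curIp)
}
```

Run from the scheduler once per minute, a script of this shape only contacts the DNS service when the leased address differs from the one seen on the previous run.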

My problem is (and has been for years) that if the public IPv4 address changes, there's enough outbound chatter to my SIP partners that the now-stale associated UDP/5060 connection table entries never time out, and so traffic for "existing" SIP connections exits my WAN interface with an obsolete source address and is presumably discarded immediately by my ISP.

I once imagined that a packet would never exit a src-NAT interface with a source address not currently assigned to the interface, but I was wrong. Experiment shows that it still happens, at least if the interface address was bound by the DHCPv4 client, the binding subsequently changed, and there was enough continuing traffic to keep the obsolete connection table entry alive.

The standard SIP UDP ports are dest-NAT'ed to the SIP server on my LAN, so the router's connection table shouldn't be necessary for SIP, but it is quite useful for other things. As I understand it, it can be either on or off but there are no finer-grain settings.

Over the years I've tried various workarounds with mixed success. For example, in my once-per-minute script I've tried removing all SIP connections involving my Asterisk server, like this

/ip firewall connection remove [find src-address="192.168.1.10:5060" protocol=udp];
/ip firewall connection remove [find reply-src-address="192.168.1.10:5060" protocol=udp];

where 192.168.1.10:5060 is the SIP port of the Asterisk server on my LAN.

A couple of years ago, the last time I looked at it carefully, those statements sometimes, although rarely, generated errors. So I surrounded them with

:do {
    # Remove tracked UDP connections to or from the Asterisk SIP port.
    /ip firewall connection remove [find src-address="192.168.1.10:5060" protocol=udp];
    /ip firewall connection remove [find reply-src-address="192.168.1.10:5060" protocol=udp];
} on-error={
    :log info ("Couldn't remove :5060 UDP connections");
};

That worked better, but I still wasn't sure it worked perfectly.

Has anyone else faced this problem and found a completely reliable solution/workaround?

Many thanks,
Ed McCreight
 
sid5632
Member
Posts: 379
Joined: Fri Feb 17, 2017 6:05 pm

Re: Obsolete connection table entries

Mon Jan 15, 2018 3:32 pm

Yes, I've had stuck/stale connection table entries, once with 5060/udp.
They're rare enough that I just kill the connections manually.
 
pe1chl
Forum Guru
Posts: 6240
Joined: Mon Jun 08, 2015 12:09 pm

Re: Obsolete connection table entries

Mon Jan 15, 2018 5:03 pm

It is apparently a well known problem (I never see it because here it is not customary to change IP address on fixed interfaces)
but the workaround is to take down (disable) the external interface and bring it back up (enable).
Then the connection table is flushed of entries referring to the old address.
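
A minimal sketch of that workaround as a script, assuming the external interface is named "ether1" (a placeholder) and that a two-second pause is long enough; neither detail comes from this thread:

```
# Sketch only: "ether1" and the 2s delay are assumptions.
:local wanIf "ether1"

# Take the external interface down, wait briefly, then bring it back up.
/interface disable [find name=$wanIf]
:delay 2s
/interface enable [find name=$wanIf]
```

Note that this interrupts all traffic on the interface for the duration of the delay, not just the stale connections.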
 
sid5632
Member
Posts: 379
Joined: Fri Feb 17, 2017 6:05 pm

Re: Obsolete connection table entries

Tue Jan 16, 2018 3:56 am

I should have made it clear that I'm on a static IP address.
Taking the whole interface down is a bit drastic just to clear one stuck connection, don't you think?
 
pe1chl
Forum Guru
Posts: 6240
Joined: Mon Jun 08, 2015 12:09 pm

Re: Obsolete connection table entries

Tue Jan 16, 2018 2:48 pm

sid5632 wrote:
I should have made it clear that I'm on a static IP address.

My reply was to the original poster, who has a dynamic address.
 
sindy
Forum Guru
Posts: 4220
Joined: Mon Dec 04, 2017 9:19 pm

Re: Obsolete connection table entries

Tue Jan 16, 2018 6:23 pm

Can you look in the log when the "remove connection" yields an error, to see why the deletion failed? It may be a valid reason that can be handled properly, or it may be a bug that needs reporting. If manual removal is possible, the first variant is more likely.
Instead of writing novels, post /export hide-sensitive. Use find&replace in your favourite text editor to systematically replace all occurrences of each public IP address potentially identifying you by a distinctive pattern such as my.public.ip.1.
 
sid5632
Member
Posts: 379
Joined: Fri Feb 17, 2017 6:05 pm

Re: Obsolete connection table entries

Tue Jan 16, 2018 8:47 pm

sid5632 wrote:
I should have made it clear that I'm on a static IP address.

pe1chl wrote:
My reply was to the original poster, who has a dynamic address.

You should have included a quote then, to make it obvious.
 
mccreigh
just joined
Topic Author
Posts: 8
Joined: Sun Apr 27, 2014 9:01 pm

Re: Obsolete connection table entries

Sat Jan 20, 2018 1:40 pm

Hello sindy --

My "worker" script runs every couple of minutes. On average it hangs about once a day
within the statement

/ip firewall connection remove [find reply-src-address="192.168.1.10:5060" protocol=udp];

even though that statement is inside a do {} block with an on-error clause. (I know exactly
where it's hanging because my worker script is heavily decorated with
debugging statements, so I can see in a global variable a kind of
program counter within the script.)

When that hang happened yesterday, the following line was posted to the log:

Jan/19/2018 10:11:23 script,error script error: no such item (4)

I assume that occasionally during the execution of the statement, a row of the connection table
(I guess row 4 in the present example) times out and disappears
between the "find" and the "remove", so the "remove" can
no longer find it. There are many ways of rewriting that statement, but every one I can think of
has an atomicity problem between "find" and "remove".
The error happens rarely enough in my setup that it wouldn't be a problem
if only the on-error clause trapped the error. Which it doesn't. Instead the script just hangs.
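
One rewrite that narrows the window, though it cannot close it, is to remove the entries one at a time and give each removal its own on-error clause. The following is a sketch under the assumption that on-error does trap the "no such item" error when it wraps a single remove; as described above, that did not hold for the combined statement on my router, so treat it as something to test rather than a known fix.

```
# Sketch only: assumes on-error traps a failed single-item remove.
:foreach c in=[/ip firewall connection find reply-src-address="192.168.1.10:5060" protocol=udp] do={
    :do {
        /ip firewall connection remove $c
    } on-error={
        # The entry most likely timed out between the find and this remove.
        :log debug "SIP connection entry vanished before it could be removed"
    }
}
```

With this shape, one entry expiring early would at worst skip that entry instead of abandoning the rest of the list.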

When I first noticed a couple of years ago that that on-error clause wasn't working,
I worked around the bug by writing a periodically-
executing supervisor script to supervise the worker script. When the supervisor script notices that
the worker script has made no progress for several invocations of
the supervisor script, the supervisor script does a

/system script job remove [find name="worker"];
/system script run "clean_up";

to get things working again, which happened eleven minutes after the error message above.
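
A rough sketch of a supervisor of that kind follows. The global names workerHeartbeat, workerLastSeen and workerStalls, and the threshold of three checks, are hypothetical; it assumes the worker script increments workerHeartbeat each time it finishes a run. The two recovery statements are the ones quoted above.

```
# Sketch only: the three global names and the threshold are assumptions.
:global workerHeartbeat
:global workerLastSeen
:global workerStalls
:if ([:typeof $workerStalls] = "nothing") do={ :set workerStalls 0 }

# Count consecutive checks in which the worker's heartbeat has not moved.
:if ($workerHeartbeat = $workerLastSeen) do={
    :set workerStalls ($workerStalls + 1)
} else={
    :set workerStalls 0
}
:set workerLastSeen $workerHeartbeat

# After several checks without progress, kill the worker job and clean up.
:if ($workerStalls >= 3) do={
    :log warning "worker script appears hung; restarting"
    /system script job remove [find name="worker"];
    /system script run "clean_up";
    :set workerStalls 0
}
```

The supervisor has to run less often than the worker for an unchanged heartbeat to mean anything.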
 
sebastia
Forum Guru
Posts: 1796
Joined: Tue Oct 12, 2010 3:23 am
Location: Antwerp, BE

Re: Obsolete connection table entries

Sun Jan 21, 2018 2:55 am

Hi

Do you use as action "src-nat" or "masquerade"? I remember reading somewhere that in case of masq, the conntrack gets auto-cleared... Not for src-nat though
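
For reference, the two rule forms being compared look roughly like this. The interface name "ether1" and the address 203.0.113.10 are placeholders, and only one of the two rules would normally be configured:

```
/ip firewall nat
# masquerade: the source address is taken from the outgoing interface,
# so the rule itself never names an address.
add chain=srcnat out-interface=ether1 action=masquerade

# src-nat: the source address is written into the rule and stays fixed
# until the rule is edited.
add chain=srcnat out-interface=ether1 action=src-nat to-addresses=203.0.113.10
```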
 
mccreigh
just joined
Topic Author
Posts: 8
Joined: Sun Apr 27, 2014 9:01 pm

Re: Obsolete connection table entries

Sun Jan 21, 2018 2:43 pm

Hello sebastia --

Interesting. I never thought there might be a difference. As it happens, I have been using masquerade.

It appears that the obsolete connection table entries are being
kept alive by periodic registrations from the Asterisk server on my LAN to other SIP servers,
my SIP-trunking suppliers, on the WAN. Those packets on the LAN have unchanging LAN source IP addresses/ports and unchanging WAN
destination IP addresses/ports, so I imagine that after the routing tables have chosen the outbound interface,
the connection table mechanism just finds a row that matches both LAN src ip/port and WAN dest ip/port, updates that row's timer, gives the
packet the indicated new source IP and port for use on the WAN, and ships the packet out the already-decided interface
without checking whether the new source IP might still be appropriate for that interface.

There's a similar problem if you have two WANs and your scripts decide that the primary one is not working as well as it should
and they want to shift new SIP connections to the secondary one. My script does that by changing
routing priorities, which changes the outgoing interface.
But then the connection table gets hold of the packet, finds a match as described above, and slaps the source address of the
primary connection on the packet before sending
it to the secondary ISP. If the secondary ISP is any good, he discards the packet.

Sure seems like a longstanding bug to me. Am I wrong?

An obvious-seeming fix would be to check the packet after the connection table match for a source
address consistent with the interface if the interface is masq'ed. Inconsistent source address?
Remove that connection table entry and run the packet through the connection table again.
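
Until something like that exists in the router itself, the same idea can be approximated from a script: remember the previous WAN address and, when it changes, remove every connection entry whose reply destination is still the old address. This is a sketch only; "ether1" and the global prevWanIp are assumed names, and it assumes the connection table's reply-dst-address field can be matched with a regular expression on the "address:port" string.

```
# Sketch only: "ether1" and prevWanIp are assumed names.
:global prevWanIp
:local cidr [:tostr [/ip dhcp-client get [find interface="ether1"] address]]
:local curIp [:pick $cidr 0 [:find $cidr "/"]]

:if (([:typeof $prevWanIp] != "nothing") && ($prevWanIp != $curIp)) do={
    # Entries translated to the old address expect replies addressed to it.
    :foreach c in=[/ip firewall connection find reply-dst-address~("^" . $prevWanIp . ":")] do={
        :do {
            /ip firewall connection remove $c
        } on-error={
            :log debug "stale entry already gone"
        }
    }
}
:set prevWanIp $curIp
```

Unlike removing everything that touches port 5060, this only touches entries that refer to the obsolete address, whatever the protocol.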
 
sebastia
Forum Guru
Posts: 1796
Joined: Tue Oct 12, 2010 3:23 am
Location: Antwerp, BE

Re: Obsolete connection table entries

Sun Jan 21, 2018 10:58 pm

I remember MikroTik personnel explaining why a CCR with a few thousand PPPoE clients using masquerade is not scalable when a few drop every second. In that case study, it was explained that when a PPPoE interface with masquerade dies, the system needs to examine all connection tracking data and discard the related entries.

See slide 25: https://mum.mikrotik.com/presentations/ ... 948376.pdf

Coming back to your issue, you could use that, and disable the failing interface to force clearing its conn track entries.

What is happening now:
conn track is still unaware that your link is down and using that info
but routing has already changed to fall-back route
=> the 2 aren't aligned

Ps: slide 27+ are also applicable here
 
acruhl
Member
Posts: 363
Joined: Fri Jul 03, 2015 7:22 pm

Re: Obsolete connection table entries

Mon Jan 22, 2018 5:55 am

mccreigh wrote:
and so traffic for "existing" SIP connections exits my WAN interface with an obsolete source address and is presumably discarded immediately by my ISP.

Sorry, off topic but I couldn't resist. I found a whole range of RFC1918 addresses my ISP (or some other ISP customer?) replies to, and it shouldn't happen. It's not a bad idea to block RFC1918 on the way out of your outbound interface.
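
A minimal sketch of such a block, assuming the outbound interface is "ether1" and using an address list named "rfc1918" (both names are placeholders):

```
/ip firewall address-list
add list=rfc1918 address=10.0.0.0/8
add list=rfc1918 address=172.16.0.0/12
add list=rfc1918 address=192.168.0.0/16

/ip firewall filter
# Drop forwarded traffic that would leave the WAN toward a private destination.
add chain=forward out-interface=ether1 dst-address-list=rfc1918 action=drop comment="no RFC1918 out the WAN"
```

If the modem or ISP equipment in front of the router is managed over a private address, that address would need an accept rule placed before this one.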

This is not happening here, but spoofing of public IP addresses (inserting a bogus but routable address in the source) is one of the biggest problems on the internet. And for whatever reason most providers aren't looking for them. Routers mostly only care about where traffic is going, not where it "comes from".
Stuff.
