How to flush connections on a failover route change?

lurker888 · June 23, 2025, 5:56pm

@jonmansey: That doesn’t work. The removal stops at the first error, so although the error is suppressed, the deletion is only partial.

BartoszP · June 23, 2025, 6:27pm

That is because on-error is applied to “do”, not to “remove”. When remove fails, do fails, and on-error catches the last and only error.

jonmansey · June 23, 2025, 6:37pm

I used this to catch the error of there being no UDP connections in the table, which was causing my script to end prematurely on:

remove [find protocol="udp"]

lurker888 · June 23, 2025, 7:31pm

@BartoszP: I don’t know if you’re agreeing or not, but let’s clarify.

The problem with the “remove [find]” construct lies in the following. It is executed like this:

A list of things to delete is made (find)
This list is then passed to remove, which removes the listed entries serially

There may be connections that have timed out (or for tcp, ended) sometime between the list generation and the removal.

In this case, when “remove” encounters such an item, it throws a “no such item” error.

The problem is that “remove” stops at this point, although the result of “find” may contain further entries to remove.

My workaround applies “remove” individually and catches each error without stopping the iteration. I think anav has the other common one, where only ( timeout > 60s ) items are deleted, and assumes that these won’t disappear during the process.

To illustrate:

[admin@xxx] /ip/firewall/connection> print count-only where protocol=udp              
3328
[admin@xxx] /ip/firewall/connection> :do { remove [find protocol=udp]; } on-error={ };
[admin@xxx] /ip/firewall/connection> print count-only where protocol=udp              
3281

So depending on luck either all, most or a few connections are deleted.
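The two workarounds mentioned above can be sketched roughly as follows. This is a minimal sketch, not a tested script: the protocol=udp filter is only an example, and the exact form of the timeout comparison (60s) is an assumption based on the description above.

```
# Variant 1: remove entries one at a time, so that an entry which has
# already vanished only fails its own iteration, not the whole run.
:foreach c in=[/ip firewall connection find where protocol=udp] do={
    :do { /ip firewall connection remove $c } on-error={}
}

# Variant 2: only touch entries with more than 60s left, on the assumption
# that these will not expire between "find" and "remove".
# The 60s threshold and its syntax are assumptions.
:do { /ip firewall connection remove [/ip firewall connection find where protocol=udp and timeout>60s] } on-error={}
```

Variant 1 costs one command per entry, but it does not depend on guessing how long the removal will take.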

EDIT and PS. I would really love it if there was an ignore-errors or such argument specifically for removing these. The only other example where this would come in handy that I’m aware of is dynamic (with timeout) address list entries.

Maybe it could be the default behavior to attempt to remove all entries, and still return an error when it’s done.
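Until something like that exists, the same per-entry pattern should also apply to dynamic address list entries. A minimal sketch, assuming a hypothetical list named "temp-block":

```
# "temp-block" is a made-up list name for illustration.
# Each entry is removed in its own command, so an entry that timed out
# in the meantime only fails its own iteration.
:foreach e in=[/ip firewall address-list find where list="temp-block" and dynamic=yes] do={
    :do { /ip firewall address-list remove $e } on-error={}
}
```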

anserk · June 24, 2025, 1:54am

The foreach loop is the key since each iteration is an individual command. The suggestion from @lurker888 can be simplified a bit. Both commands produce the same result:

:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :onerror e in={ :put $i; /file/get $i } do={} }

:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :put $i; /file/get $i } on-error={}

The only benefit of the more complex structure, I think, is having the error condition stored in a variable for later use.

:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :onerror e in={ :put $i; /file/get $i} do={ :put $e } }

jaclaz · June 24, 2025, 11:36am

You do understand how this sounds a lot like Vogon’s poetry to most readers, don’t you?

Can you post an example?

If I get this right you propose to add mangle marks (even if not strictly needed depending on the failover method chosen) like “to_ISP1” or “to_ISP2” to be able to filter the packets and then selectively remove the (old ISP) existing connections.

How exactly one could do the above?

lurker888 · June 24, 2025, 7:36pm

I for one happen to like the Vogon philosophy.

You might have gathered from our interactions that I have a bit of an unorthodox view of a few things.

First of all, I think that trying for “fast” failover in these scenarios is a needless and futile effort. In my experience many of the failed ISP connections don’t fail “correctly”, they just become lossier and lossier, become intermittent, etc. I think the main goal should be solid failover: that is, we accept the fact that video/voice calls will be interrupted, websites will stutter or need reloading, what we want is to be able to resume whatever we were doing with a minimal interruption and with no manual intervention necessary.

As an aside, personally - while I respect the wisdom of the forum and its members - when failover is needed, I use some sort of script with quite wide tolerances. I think that having a minute of connectivity problems and then resuming things is better than having a situation where some packet loss can lead to rapid back-and-forth toggling.

Most things nowadays handle failover - by this I mean connections just breaking on them for whatever reason - pretty well. This is mostly because of two things: they retry, and when they do they use ephemeral ports, which allows the firewall to recognize them as new connections. (Another aside: the 24h TCP timeout may have made sense in 1995, but today anything that wants a permanent connection has keepalive.)

There are some particular things that do not do this, and these are the ones we have problems with. Many of these are services, which either because of their specification or old-school (but correct) coding reuse the same ports. These are few and far between:

Wireguard, at least on Mikrotik. The mobile/$favorite_os clients have a concept of a “failed” connection and re-initialize their sockets
NTP (usually not a problem)
SIP phones, etc.

I think it’s best to handle the exception specifically: if this sort of traffic is part of the network, you usually know about it at the network level. It makes sense to identify them somehow: mark these (in the appropriate mangle chain) and delete these selectively. Or identify them by the ports they use, source addresses, protocols, etc. Filtering by marks is usually faster.
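As a rough sketch of the "identify them by the ports they use" option, without any mangle marks: the example below assumes SIP phones talking to UDP port 5060, and that the connection table shows dst-address as address:port so that a regular expression on the port suffix matches. Both are assumptions to adapt.

```
# Remove only connections towards UDP port 5060 (assumed SIP traffic),
# one entry at a time so that already-expired entries do not abort the run.
:foreach c in=[/ip firewall connection find where protocol=udp and dst-address~":5060\$"] do={
    :do { /ip firewall connection remove $c } on-error={}
}
```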

I just think it’s better to delete selectively, rather than excessively. For one, it’s faster and we don’t totally unnecessarily delete connections. When you have a VPN for tunneling things, it’s even harmful to be needlessly thorough.

But if you remove connections, at least do it correctly.

Anyway, that was my Vogon treatise on the issue.

jaclaz · June 24, 2025, 9:47pm

Yep, a very good philosophical explanation of WHY one would want (or would want not) to kill connections, and not particularly Vogonish.

Too bad my question was more (actually only) about HOW to mark and kill (selected) connections, but I know, resistance is useless.

lurker888 · June 24, 2025, 11:17pm

Simply:

# All my devices connect to a central point for management via WG.
# On a WAN change, I'll want to re-nat them.
/ip firewall mangle add chain=prerouting action=mark-connection new-connection-mark=renat connection-mark=no-mark protocol=udp src-port=13231

# I want to remove connections marked "renat"
# Of course I should use one of the error-tolerant formulations by anserk, but just for semantics:
/ip firewall connection remove [ find connection-mark=renat ]

My remark about fasttrack was simply that this solution can be used on fasttracked connections as well. The marks are applied when the first packet of the connection traverses the firewall, still in the connection-state=new state, and persist even after being fasttracked.
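To check that the mark is actually being applied before relying on it, the marked connections can be counted with the same print form used earlier in the thread (a small sketch; "renat" is the mark name from the rule above):

```
/ip firewall connection print count-only where connection-mark=renat
```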

jaclaz · June 25, 2025, 7:56am

Don’t you anyway need a timeout like “where timeout>60” or some “on error” construct?

lurker888 · June 25, 2025, 7:59am

Exactly. As I’ve been thoughtful enough to write just above the command.

BartoszP · June 25, 2025, 8:06am

Just thinking:

If the failover route goes via its own distinct interface and the main route has its own interface, then what is the problem with just disabling the main one? Isn’t ROS killing/flushing all connections assigned to that interface by itself?

lurker888 · June 25, 2025, 8:14am

Good question. The kernel behavior changed several times. For interface down, currently yes; for loss of address, change of route or pref src, no.

jaclaz · June 25, 2025, 9:28am

So - as often happens - it is more *something like* the actual command to use.

I think most of the common failover setups (at least those I have seen on the Forum) actually use a separate “upstream” interface (towards the ISP), so if (temporarily) disabling the one currently without connectivity is enough to kill existing connections automatically, it seems to me very easy to implement (if using netwatch).
But I believe one could also use the netwatch mechanism in addition to (plainer) recursive routing.
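Something along these lines might do it. This is only a sketch under several assumptions: the interface name ether1-wan, the probe host 192.0.2.1, the interval and the delay are all hypothetical, and the probe host is assumed to be routed only via that uplink.

```
# When the probe host stops answering, bounce the (hypothetical) main WAN
# interface so that RouterOS drops the connections tied to it.
/tool netwatch add host=192.0.2.1 interval=30s comment="hypothetical WAN1 probe" \
    down-script="/interface disable ether1-wan; :delay 5s; /interface enable ether1-wan"
```

Whether a short disable/enable is acceptable depends on what else hangs off that interface (DHCP client, PPPoE, etc.).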

anserk · June 25, 2025, 5:55pm

I wonder if disabling/enabling connection tracking kills all connections.

/ip/firewall/connection/tracking
set enabled=no

lurker888 · June 25, 2025, 7:56pm

I get the (vague?) impression that you think I like writing poetry and only by insinuation providing solutions. You are probably right. I think understanding matters more than the exact sequence of commands. In things that count, e.g. in security, I strive to be exceptionally clear.

The exact command I suggest is

:foreach i in=[ find where connection-mark=renat ] do={ :onerror e in={ remove $i} do={} }

@anserk pointed out that there’s a simpler form:

:foreach i in=[ find where connection-mark=renat ] do={ remove $i } on-error={ }

I haven’t tested @anserk’s answer, but I have no reason to doubt it. It makes sense for all “do=” constructs to accept an “on-error”, but on a cursory look, I couldn’t find any documentation to this effect. This would of course not be the first undocumented thing, and again: it makes sense.

@anserk: I was going to chime in with the exact same thing: for nuking all entries, disabling/re-enabling conntrack is an option. I would only do this as a last resort - it results in some nasty things, but it does accomplish what it’s supposed to in the least amount of time.
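For completeness, a minimal sketch of that last-resort toggle. The 1s delay is an arbitrary assumption, and enabled=auto is used because it is the RouterOS default; use enabled=yes instead if that is what was configured.

```
# Last resort: switch connection tracking off and back on to drop all entries.
# NAT and stateful firewall rules do not work while tracking is off.
/ip firewall connection tracking set enabled=no
:delay 1s
/ip firewall connection tracking set enabled=auto
```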

anserk · June 26, 2025, 12:21am

There is none that I could find either, but to be fair, this turns out to be quite a recent thing.

What’s new in 7.19 (2025-May-22 10:53):
*) console - added on-error to “for” and “foreach” loops;

I saw that “do” has it documented and just tried it with the foreach loop. It worked, but now I see it probably doesn’t work on versions below 7.19.

The :onerror is relatively new too, but at least it is documented.

What’s new in 7.13 (2023-Dec-14 09:24):
*) console - added “:onerror” command;

yearly7100 · February 1, 2026, 2:56am

anserk:

The foreach loop is the key since each iteration is an individual command. The suggestion from @lurker888 can be simplified a bit. Both commands produce the same result:
:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :onerror e in={ :put $i; /file/get $i } do={} }

:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :put $i; /file/get $i } on-error={}

The simplified version (second command) actually doesn’t work on either RouterOS 6 or RouterOS 7.

And the original approach (first command) by @lurker888 only works on RouterOS 7.

On RouterOS 6, the following version can be used (also works on RouterOS 7):
:foreach i in={"dummy1"; "dummy2"; "dummy3"} do={ :do {:put "$i"; /file get $i } on-error={}}

Amm0 · February 1, 2026, 3:53am

And it should be noted that NAT masquerade will also clear its connections in 7.20.7+:

*) firewall - clear relevant masqueraded connection tracking entries on IP address change;