How to flush connections on a failover route change?

The main 0.0.0.0/0 route points to a virtual gateway that is checked recursively (by ping) against two internet hosts.
The secondary 0.0.0.0/0 route (distance 2) becomes active when the first one fails, but existing connections stay stuck on the primary route, which prevents browsing.
A manual flush of the connection table does the trick, of course.
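For anyone who wants to reproduce the setup, a recursive failover of this kind is commonly built roughly as below. This is a minimal sketch for RouterOS v7, not the poster's actual configuration: the ISP gateways 192.0.2.1 and 198.51.100.1 are hypothetical placeholders, and only one probe host (1.1.1.1) is used instead of a virtual gateway checking two hosts.

```routeros
# Minimal sketch (RouterOS v7); 192.0.2.1 and 198.51.100.1 are placeholder ISP gateways
/ip route
# Host route pinning the probe target 1.1.1.1 to the primary ISP gateway
add dst-address=1.1.1.1/32 gateway=192.0.2.1 scope=10
# Primary default route, resolved recursively through 1.1.1.1 and checked by ping
add dst-address=0.0.0.0/0 gateway=1.1.1.1 target-scope=11 check-gateway=ping distance=1
# Backup default route, used only while the primary one is inactive
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=2
```

Because the host route keeps 1.1.1.1 reachable only via the primary ISP, a failed ping means the primary path is broken, and the distance-2 route takes over.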

What’s the best way to do it automatically?

Of course, the command to flush connections would be “/ip firewall connection remove [find]”.
But the tricky part is how to call it when failover is configured like this. I don’t think you can call a script on route failure.
(Of course, you could install an additional netwatch to do it.)

Do not remove or alter the timeout condition value

/ip fire conn
:foreach idc in=[find where timeout>60] do={ remove [find where .id=$idc] }

Can you describe in words what the script does at each step, i.e. decode it into PLAIN Italian… (then I will reverse-engineer it into English :wink:)

And note that I have no idea what /ip fire conn refers to…

No Italian needed, just English…

/ip firewall connection
:for each idofconnection in the results of the search [ find where the timeout value is greater than 60 ] do remove all the connections whose id equals the idofconnection obtained

and NO,
this is NOT equivalent to
/ip firewall connection remove [find]
or
/ip firewall connection remove [find where timeout>60]
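Spread over several lines with comments, the loop reads roughly as follows. The remark about already-expired entries is my own reading of why the per-ID lookup is used (and why it is not equivalent to a single remove over the whole list); it is not stated in the thread.

```routeros
/ip firewall connection
# Collect the IDs of all tracked connections with more than 60 s of timeout left
:foreach idc in=[find where timeout>60] do={
    # Look the entry up again by its ID: if it has already disappeared,
    # find returns an empty list and remove has nothing to act on
    remove [find where .id=$idc]
}
```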

There is no point in flushing TCP connections, as they’ll eventually die out anyway for lack of response; and if there were no NAT, flushing would not be necessary at all. UDP connections and/or ping “connections” are another matter, as those refreshed from the LAN side will survive forever unless you remove them.

So rather than selecting the ones to be removed by timeout, I’d choose them by reply-dst-address. You don’t need the foreach loop suggested by @rextended unless you want to avoid a CPU load peak: /ip firewall connection remove [find where protocol=udp reply-dst-address~"ip.of.inaccessible.wan"] will remove them all, of course also using a loop, but an internal and therefore faster one.
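One detail worth noting: the ~ operator is a regular-expression match, so unescaped dots match any character and a bare address also matches longer ones (203.0.113.1 would match 203.0.113.10 and 203.0.113.100). A slightly stricter variant is sketched below, with the hypothetical address 203.0.113.10 standing in for ip.of.inaccessible.wan; it anchors the pattern and stops at the port separator that UDP entries carry.

```routeros
{
    # 203.0.113.10 is a placeholder for the public address of the WAN that went down
    :local wanPattern "^203\\.0\\.113\\.10:"
    # UDP entries NATed out of that WAN have it as reply-dst-address (address:port)
    /ip firewall connection remove [find where protocol=udp reply-dst-address~$wanPattern]
}
```

The outer braces keep the :local variable alive for the second command when the snippet is pasted into a terminal.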

But as others have already stated, there is no event tied to a route state change to which you could hook the execution of the script. So /tool netwatch is one possibility (with some non-obvious limitations), and /system scheduler is another. With the scheduler, you have to check the route state at every run, so the full script would look something like :if ([:len [/ip route find where gateway=x.x.x.x dst-address=0.0.0.0/0 active]]=0) do={/ip firewall connection remove [find where protocol=udp reply-dst-address~"ip.of.inaccessible.wan"]}, and you can even place it directly into the on-event parameter of the /system scheduler entry.
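Placed into a scheduler entry, the one-liner above could look like this. The entry name flush-dead-wan and the 1-second interval are illustrative choices; x.x.x.x and ip.of.inaccessible.wan remain placeholders, and the inner quotes are escaped because the whole script is passed as a quoted string.

```routeros
# Hypothetical entry name; x.x.x.x and ip.of.inaccessible.wan are placeholders from the text
/system scheduler add name=flush-dead-wan interval=1s on-event=":if ([:len [/ip route find where gateway=x.x.x.x dst-address=0.0.0.0/0 active]]=0) do={/ip firewall connection remove [find where protocol=udp reply-dst-address~\"ip.of.inaccessible.wan\"]}"
```

Running it every second while the primary route is down is harmless: entries created through the backup WAN carry the other address and are left alone.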

Already considered and done here:
http://forum.mikrotik.com/t/dual-wan-failover-script-ping-command/150516/8

:global newIP [:tostr $"local-address"]

/ip fire conn
:foreach idc in=[find where timeout>60 and (!(reply-dst-address~$newIP))] do={
 remove [find where .id=$idc]
}

But, as usual, with so few details I cannot adapt it to this user’s request…

In that case, if the obtained IP is the same, do nothing; if it is different, drop all “invalid” tracked connections (those with a timeout greater than 60 seconds whose reply-dst-address does not match the new IP).
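The script quoted above shows only the removal step. Assuming it runs as a PPP profile on-up script (the context in which RouterOS provides $"local-address"), the comparison described here could be sketched as below. lastWanIP is a hypothetical global variable name, and this is not the script from the linked thread.

```routeros
# Sketch of a PPP profile on-up script (assumption: $"local-address" is supplied there).
# lastWanIP is a hypothetical global holding the address from the previous connect.
:global lastWanIP
:local newIP [:tostr $"local-address"]
# :tostr turns a still-unset global into "", so the comparison is always string vs string
:if ($newIP != [:tostr $lastWanIP]) do={
    # Address changed: drop long-lived entries that are not NATed to the new address
    :foreach idc in=[/ip firewall connection find where timeout>60 and (!(reply-dst-address~$newIP))] do={
        /ip firewall connection remove [find where .id=$idc]
    }
}
# Remember the current address for the next time the link comes up
:set lastWanIP $newIP
```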

In the RouterOS default configuration, the timeout is 1 day for established TCP and 5 minutes for unacked TCP; the other TCP states are usually 10 seconds…
For UDP streams, the default is 3 minutes.

Correct; my point is that the TCP client will drop the session on timeout and use another port to create a new one, so the fact that the tracked connection of the old session still exists doesn’t prevent the new one from being established. So removing TCP connections only helps speed a little (faster lookups) but is not needed for new ones to succeed, unlike the UDP and ICMP echo case.

Yes, I’m not objecting :stuck_out_tongue:
I just want to say that on the devices I configure, the established timeout is reduced to 10 min, UDP to 2 min and unacked to 1 min.
At current internet speeds it is absurd to wait 5 minutes for a reply and 1 day for a single packet to pass…
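These timeouts live under /ip firewall connection tracking. A sketch using the values from this post follows; mapping “UDP” to udp-stream-timeout rather than udp-timeout is my assumption, so check the print output against your own version first.

```routeros
# Show the current connection-tracking timeouts
/ip firewall connection tracking print
# Shorten them to the values mentioned above (UDP assumed to mean udp-stream-timeout)
/ip firewall connection tracking set tcp-established-timeout=10m tcp-unacked-timeout=1m udp-stream-timeout=2m
```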

Try modifying the connection tracking timeouts to make broken connections expire sooner.

At this point, better to wait for @ik3umt’s reply, assuming he answers…

Thanks for the replies,

Not sure what reply you’re expecting from me… :astonished:

As I said, the failover technique works fine: connectivity over the failover route is immediately available, but old connections (TCP, UDP, etc.) are stuck waiting, so current LAN users have no internet (except for new connections) even though the backup route is up.

I’m just looking for an IMMEDIATE way to automatically restart all current connections over the new route (and IMMEDIATELY move them back to the main one when it becomes available again).

Unfortunately I can’t netwatch the virtual recursive host, as it doesn’t respond to ping.
I could netwatch the real internet hosts under test (1.1.1.1, 8.8.4.4, etc.), but a single lost packet doesn’t necessarily mean the main route has failed.

I think the problem is not what the script will do (what to flush) but what calls the script… basically, how to detect the route switch.

Not sure what @rextended was expecting, but to me, narrowing the question down to the only part you are really missing made sense.

However, it does not change anything about what I’ve already written. Currently, the only asynchronous events you can hook a script to are DHCP lease state change, PPP interface state change, and VRRP interface state change. None of those can be related to a route state change in any useful way (leaving aside the crazy idea of establishing a PPP tunnel solely to monitor link state, which would introduce an additional delay anyway).

A netwatch state change cannot be considered an asynchronous event, in the sense that it detects the state change of the route with some delay due to the nature of its operation. In fact, failover based on recursive next-hop search doesn’t detect the outage immediately either: the check-gateway pings are hardcoded to a 10-second periodicity. And you cannot synchronise netwatch or the scheduler with those intervals so that they check the outcome immediately after the check-gateway test has ended. So a periodically scheduled script is currently the only option: you check the route state every second and take action if it is down.
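Combining this with the earlier point about UDP entries, a scheduler-driven script that reacts only when the primary route changes state, in both directions, might look like the sketch below. All addresses and the names route-watch and primaryWasUp are hypothetical placeholders; the route check is the same one proposed earlier in the thread.

```routeros
# Body of a hypothetical script "route-watch", run by the scheduler every second.
# Placeholders: 1.1.1.1 = recursive gateway of the primary default route,
# 203.0.113.10 = primary WAN public address, 198.51.100.20 = backup WAN public address.
:global primaryWasUp
:local primaryUp ([:len [/ip route find where dst-address=0.0.0.0/0 gateway=1.1.1.1 active]] > 0)
# Act only on a state change, and not on the very first run (state still unknown)
:if ([:typeof $primaryWasUp] = "bool" && $primaryUp != $primaryWasUp) do={
    :if ($primaryUp) do={
        # Back on the primary: drop UDP entries still NATed to the backup WAN
        /ip firewall connection remove [find where protocol=udp reply-dst-address~"^198\\.51\\.100\\.20:"]
    } else={
        # Failed over: drop UDP entries still NATed to the primary WAN
        /ip firewall connection remove [find where protocol=udp reply-dst-address~"^203\\.0\\.113\\.10:"]
    }
}
:set primaryWasUp $primaryUp
```

Saved under /system script as route-watch, it could be scheduled with /system scheduler add name=route-watch interval=1s on-event=route-watch. If a remove fails because an entry expired in the meantime, the script stops before updating primaryWasUp, so the next run simply retries the flush.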

Netwatch is getting a bit better in v7 (from v7.4beta5), but it still has many of the issues described above.
You could file a feature request for a state-change script hook on checked routes; that would be useful for others too.

Well… I’m actually starting from scratch with the lab RouterBOARD and two WANs.
When the first route fails, I can use the failover one within a few seconds (and a few page refreshes in various browsers), which is perfectly acceptable :open_mouth:
I started this thread because I have seen systems where it was impossible to use a browser for minutes (or at least until the connections were flushed manually).
Weird… I hope I haven’t wasted your time on some unknown issue rather than a normal connection timeout/refresh…

Hi guys,

While searching for a way to flush the connection tracking table as I used to under Linux, I stumbled upon this old thread.

In the past, it was mainly SIP that caused real issues when changing routes, because of its OPTIONS/keep-alive packets for NAT traversal.
My current problem is similar: very high-frequency UDP packet flows for an IPsec tunnel from a device in the LAN to a remote IPsec server.

So what I need is a real flush, not a sequential delete that is intolerant of errors…
I tried the intuitive approach that @pe1chl also mentioned above, just removing ‘[ find ]’, which of course removed no more than a handful of entries before running into an error for a missing item number.
Then I searched here and found this topic and @rextended’s approach. Using >10s as a condition “works”, but since it doesn’t clear everything, and most importantly not my IPsec connection, this approach doesn’t work either.

The situation is:

  • “router” is a 2216; behind it is a firewall (192.168.60.5) that builds an IPsec tunnel to 11.22.33.44.
  • Currently “router” doesn’t have an internet connection of its own: it forwards the packets, without NAT, to another router, which NATs the traffic to the interwebs. That router uses a different public IP address.

What I want to do is:

  • “router” now has a physical connection to an ISP (same ISP, different public IP)
  • this connection is on sfp28-8
  • there is no default route yet, and every time I add the new default route, some connections, most importantly this IPsec tunnel, fail

Of course there is a NAT entry:

 /ip/firewall/nat/export
# 2025-06-05 19:16:07 by RouterOS 7.16.1
/ip firewall nat
add action=src-nat chain=srcnat comment="NAT local networks" out-interface=sfp28-8 src-address=192.168.0.0/16 to-addresses=44.55.66.77

So what I’m doing is adding a new default route via sfp28-8 to my gateway.
What I see: packets take the new route and are put on the wire un-NATed, with their original (LAN) source address.

[user@router] <SAFE> /ip firewall/connection/print count-only
39533
[user@router] <SAFE> /ip firewall connection remove [find where timeout>10 ]
[user@router] <SAFE> /ip firewall/connection/print count-only
12159
[user@router] <SAFE> /tool/sniffer/quick interface=sfp28-8 ip-address=10.0.0.0/8
Columns: INTERFACE, TIME, NUM, DIR, SRC-MAC, DST-MAC, SRC-ADDRESS, DST-ADDRESS, PROTOCOL, SIZE
INTERFA  TIME     NUM  DI  SRC-MAC            DST-MAC            SRC-ADDRESS         DST-ADDRESS                  PROTOC  SIZ
sfp28-8  8.095  11051  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  210
sfp28-8  8.096  11052  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  162
sfp28-8  8.097  11053  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  146
sfp28-8  8.102  11054  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  162
sfp28-8  8.106  11057  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  226
sfp28-8  8.107  11058  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  130
sfp28-8  8.107  11059  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  226
sfp28-8  8.107  11060  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  162
sfp28-8  8.107  11061  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  162
sfp28-8  8.11   11062  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  130
sfp28-8  8.113  11063  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  130
sfp28-8  8.113  11064  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  162
sfp28-8  8.119  11065  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  802
sfp28-8  8.122  11066  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  482
sfp28-8  8.13   11069  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  578
sfp28-8  8.133  11070  ->  48:B5:C4:32:25:F6  5C:76:AB:D3:14:02  192.168.60.5:4500    11.22.33.44:4500         ip:udp  146

Using shorter timeouts doesn’t work, of course:

[user@router] <SAFE> /ip firewall connection remove [find where timeout>2 ]
no such item (4)

Funny thing is… in the “old” situation, the timeout is much larger than 10s:

[user@router] > /ip/firewall/connection/print where dst-address~"192.168.60.5.*" and src-address~"11.22.33.44.*"
Flags: S - SEEN-REPLY; A - ASSURED; C - CONFIRMED
Columns: PROTOCOL, SRC-ADDRESS, DST-ADDRESS, TIMEOUT, ORIG-RATE, REPL-RATE, ORIG-PACKETS, REPL-PACKETS, ORIG-BYTES, REPL-BYTES
 #     PROTOCOL  SRC-ADDRESS           DST-ADDRESS       TIMEOUT  ORIG-RATE  REPL-RATE  ORIG-PACKETS  REPL-PACKETS      ORIG-BYTES       REPL-BYTES
92 SAC udp       11.22.33.44:4500  192.168.60.5:4500  3m       203.6kbps  163.0kbps    89 897 210   144 725 843  22 548 518 707  129 096 821 662

So it should have been removed by the query above.
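To test the removal on just this flow rather than on the whole table, the entry can be targeted by its addresses and the table checked again right afterwards. This is a sketch using the addresses from the post; the print is only meant to show whether a fresh entry reappears, and in which direction.

```routeros
# Remove only the tracked IPsec NAT-T flow from the example (addresses from the post)
/ip firewall connection remove [find where protocol=udp src-address~"^11\\.22\\.33\\.44:4500" dst-address~"^192\\.168\\.60\\.5:4500"]
# List any entry for 192.168.60.5:4500 in either direction that exists right afterwards
/ip firewall connection print where protocol=udp and (src-address~"^192\\.168\\.60\\.5:4500" or dst-address~"^192\\.168\\.60\\.5:4500")
```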



I picked the UDP/IPsec connection here as an example, but of course there are others across the whole protocol spectrum as well.
As internal clients lose connectivity in that scenario, I can’t test for long and fall back by disabling the new default route.

Connection tracking is the only thing that came to my mind, but maybe I’m hunting for the wrong white rabbit?
If I’m looking in the right direction: is there some other way to really flush and not just clean up the table?

If you don’t have a better idea, I will have to:

  • set the new default route
  • reboot the router (maybe using it for a routeros update as well)
  • hope that it works after reboot with a clean conntrack table…


Thanks for your ideas and help, very much appreciated!

Irrwitzer

This is the magic incantation for error tolerance:

:foreach i in=[ find where protocol=udp ] do={ :onerror e in={ remove $i} do={} }
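For the “real flush” asked for above, the same pattern can be applied to the whole table by dropping the protocol filter. A sketch, assuming a RouterOS 7 release recent enough to have :onerror (v6 does not have it); note that it removes every entry, including the one for your own management session.

```routeros
# Flush every tracked connection, not just UDP (requires a v7 release with :onerror).
# An entry that expires between find and its own remove raises an error that
# :onerror swallows, so the loop carries on with the next ID.
:foreach i in=[/ip firewall connection find] do={
    :onerror e in={ /ip firewall connection remove $i } do={}
}
```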

The connection mark is also available as a filter criterion, and in similar situations I’ve found that it’s easiest to mark sensitive connections via mangle on a continuous basis and use this mark to delete them quickly and selectively. (This sort of marking is compatible with fasttrack, given that the connections are marked while still in the “new” stage.)
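A sketch of the mark-then-delete approach described here, using UDP port 4500 (IPsec NAT-T, as in the case above) as the example of a sensitive flow; the mark name wan-sensitive is hypothetical.

```routeros
# Mark new UDP 4500 connections as they are created (hypothetical mark name)
/ip firewall mangle add chain=prerouting connection-state=new protocol=udp dst-port=4500 action=mark-connection new-connection-mark=wan-sensitive passthrough=yes
# Later, e.g. after a route change, drop only the marked connections
/ip firewall connection remove [find where connection-mark=wan-sensitive]
```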

Anyway, good luck!

Thanks a lot, @lurker888,

That looks promising :wink: I’m not used to scripting MikroTik; to be honest, I have never tried it.
I will test this in lab and then live and report back.

Thanks again and have a great weekend!

I use

:do { remove [find protocol="udp"]; } on-error={ };
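One caveat with this form: since the whole remove is wrapped, the first missing entry still ends the command (the error is merely silenced), so the remaining UDP entries stay in the table. Wrapping each remove individually, as sketched below, keeps going; it uses the older :do/on-error construct, which should also work on releases without :onerror.

```routeros
# Wrap each remove separately so one vanished entry does not stop the rest
:foreach i in=[/ip firewall connection find where protocol="udp"] do={
    :do { /ip firewall connection remove $i } on-error={}
}
```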