Mikrotik Holding Connections on Failover

Good day, everyone.

I have the following configuration:
Interfaces
Eth1: Primary Internet Connection => Static IP from ISP
Eth2: Local Network, DHCP => 192.168.1.0/24
Eth3: 192.168.2.1 Connected to LTE router (Router IP: 192.168.2.1)

IP Routes
0.0.0.0/0 => Gateway => Distance 1
0.0.0.0/0 => 192.168.2.1 => Distance 2
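In RouterOS CLI terms, these two routes might look roughly like the sketch below. This rests on some assumptions. 203.0.113.1 is a documentation placeholder for the ISP's gateway. check-gateway=ping is assumed to be the setting that makes the failover trigger when the gateway becomes unreachable. The comment=primary-wan label is a hypothetical name, added only so that scripts can find the route.

```routeros
/ip route
# Primary default route; 203.0.113.1 is a placeholder for the ISP's gateway.
# check-gateway=ping makes the router ping the gateway and deactivate this route when it stops answering.
add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping comment=primary-wan
# Backup default route via the LTE router; used only while the primary route is inactive.
add dst-address=0.0.0.0/0 gateway=192.168.2.1 distance=2
```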

The failover works instantly when the primary gateway fails. However, the devices on the local network then cannot access the internet. I have to log into the MikroTik manually and clear all entries from the IP > Firewall > Connections window before the devices can access the internet again.

Is there a way to clear these entries automatically and immediately, or am I doing something wrong?

Thanks in advance.

It is normal that there are problems with existing connections when a failover like this occurs.
Please be more specific about “cannot access the internet”. Do you mean that their existing activities are disrupted (that is normal!), and that, e.g., a SIP (VoIP) telephone or application no longer works (that is unfortunate, but also normal)?
Or is it really impossible to access the internet after failover, e.g. you close the browser and re-open it to some site you were visiting and it still won’t connect?

If you use an action=masquerade rule for srcnat, then whenever the address that connections use as their reply-dst-address is lost, all of those connections are removed automatically.
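Here is a minimal sketch of such srcnat rules. It assumes the WAN-facing ports keep their default names, ether1 for the ISP and ether3 for the LTE router. The port names in this thread (Eth1, Eth3) may map differently on the actual router.

```routeros
/ip firewall nat
# Masquerade traffic leaving via the primary ISP port (assumed to be ether1).
add chain=srcnat out-interface=ether1 action=masquerade
# Masquerade traffic leaving via the port towards the LTE router (assumed to be ether3).
add chain=srcnat out-interface=ether3 action=masquerade
```

Only connections handled by a masquerade rule get this automatic cleanup. It also only happens when the address actually disappears from the router.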

However, if the actual problem is further in the network and the gateway interface doesn't go down,

  1. you have to use a periodically scheduled script to remove the connections, and
  2. you need to use recursive next-hop search or netwatch to check that the path works all the way to the internet, so that the failure is detected at all.
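Point 2 can be done without scripting by using a recursive route. Below is a sketch based on some assumptions. 1.1.1.1 serves as the always-reachable address to probe. 203.0.113.1 is a placeholder for the ISP's gateway. The scope/target-scope values follow the usual pattern for recursive next-hop lookup and should be checked against the RouterOS version in use.

```routeros
/ip route
# Pin the probe address to the primary ISP gateway (placeholder 203.0.113.1).
add dst-address=1.1.1.1/32 gateway=203.0.113.1 scope=10
# Recursive default route: its gateway 1.1.1.1 is resolved through the /32 route above.
# check-gateway=ping now pings 1.1.1.1, so a failure anywhere on the ISP path deactivates this route.
add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=1 check-gateway=ping target-scope=11
# Backup default route via the LTE router.
add dst-address=0.0.0.0/0 gateway=192.168.2.1 distance=2
```

A side effect is that traffic to 1.1.1.1 itself always goes via the primary ISP, even while it is down. This only moves new traffic. Existing connections still have to be removed as in point 1.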

But that only works when the interface that holds the address actually goes down, e.g. when PPPoE is in use and the PPPoE session actually fails.
For a plain Ethernet connection with a static IP, or with a DHCP lease of reasonable length, the address will not be lost when the link to the provider fails, e.g. between the modem and the ISP.
(It could work when the modem power-cycles, but only when the Ethernet port is used directly, not when it is part of a bridge.)

And as you correctly write, the actual failure may be one or more hops further down at the ISP and the whole failover mechanism doesn’t trigger unless you use a mechanism like recursive routing or monitoring of some destination (netwatch or a ping in a regularly scheduled script).
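Netwatch can do the monitoring, switch the route, and flush the connections in one place. This is a sketch under stated assumptions. The primary default route is assumed to carry the hypothetical comment=primary-wan. 8.8.8.8 serves as the monitored address. 203.0.113.1 is a placeholder for the ISP's gateway. The /32 route keeps the probes on the primary path even after the default route has been disabled.

```routeros
# Keep probing 8.8.8.8 through the primary ISP even while its default route is disabled.
/ip route add dst-address=8.8.8.8/32 gateway=203.0.113.1
# On each state change, switch the primary default route and clear tracked connections.
/tool netwatch add host=8.8.8.8 interval=10s \
    down-script="/ip route disable [find comment=primary-wan]; /ip firewall connection remove [find]" \
    up-script="/ip route enable [find comment=primary-wan]; /ip firewall connection remove [find]"
```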

Thank you for the responses.
The failover is set to trigger as soon as the Gateway is unreachable.

@pe1chl
With some browsers I can close and reopen the browser and then it works, while others take a while longer. However, as you mentioned, the SIP phones cannot register again until those connections are removed.

@Sindy
I use an action=masquerade rule for srcnat. Would a periodically scheduled script work in this case? Maybe one scheduled to run as soon as the 0.0.0.0/0 route changes?

Checking accessibility of the gateway is good, but checking accessibility of some immortal internet address via that gateway is even better - ISPs’ gear also breaks sometimes.


Sure. It's just that tracked connections whose src-nat behaviour comes from a masquerade rule are cleared automatically when the address disappears. Of course, they can also be cleared using the /ip firewall connection remove command. On a negative note, I've read here (but never encountered it myself) that some SIP connections resist the remove command, or at least did in the past. So using a masquerade rule and then /ip dhcp-client xxx release instead of /ip firewall connection remove might have a double advantage. It would also kill these resistant connections, and more of the job would be done by the conntrack module itself. If you say /ip firewall connection remove [find reply-dst-address=x.x.x.x], the script engine first creates a list of all connections matching the condition and then kills those on the list one by one; I guess conntrack itself does the same, but much faster. But both are educated guesses; I'm no insider.
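The targeted variant of the remove command might look like the sketch below. Assumption: 203.0.113.10 is a placeholder for the router's public address on the primary link. In the connection table, the address fields of TCP/UDP entries normally carry a :port suffix. For that reason, this sketch uses a regular-expression match (~) instead of an exact match. It is worth checking with /ip firewall connection print how the fields look on your version.

```routeros
# Remove only connections that were src-natted to the (placeholder) public address 203.0.113.10,
# whether or not the address field carries a ":port" suffix.
/ip firewall connection remove [find reply-dst-address~"^203\\.0\\.113\\.10(:|\$)"]
```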


That's the problem: current RouterOS cannot link a script to many of the events where it would be useful. You can schedule scripts for a given start time and/or a given interval, but only some objects can have a script linked to events related to them. These are the DHCP client and server, PPP-based interfaces (via /ppp profile) and, to some extent, netwatch (its pinging is periodic, but a script is only triggered by a state change, when the monitored IP stops responding or starts responding again). You cannot trigger a script by a match of a firewall rule, by expiration of an item on an address-list, etc. Any such event has to be checked by a script that runs periodically, stores the previous state of the monitored object, and runs its executive part only if it detects a change.
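The "periodic script with stored state" pattern could look roughly like this. It is a sketch under assumptions. The primary default route is assumed to carry the hypothetical comment=primary-wan. The script name flush-on-failover and the 10-second interval are also hypothetical choices. The script body would be saved under /system script as flush-on-failover:

```routeros
# Hypothetical script "flush-on-failover"; assumes the primary default route has comment=primary-wan.
# The global variable keeps the previous state between runs (it does not survive a reboot).
:global primaryWasActive
:local primaryActive [/ip route get [find comment="primary-wan"] active]
# First run after boot: only record the current state.
:if ([:typeof $primaryWasActive] = "nothing") do={ :set primaryWasActive $primaryActive }
# The route changed state since the last run (failover or failback): flush connection tracking.
:if ($primaryActive != $primaryWasActive) do={
    :log warning "primary-wan route changed state, removing tracked connections"
    /ip firewall connection remove [find]
    :set primaryWasActive $primaryActive
}
```

It could then be run periodically with /system scheduler add name=flush-on-failover interval=10s on-event=flush-on-failover. The detection delay is at most one scheduler interval plus the time the route monitoring needs to notice the failure.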