How does this failover mechanism work?

There appear to be many, many ways to implement fail over between two ISP WAN connections on a hEX. Most of them are a combination of netwatch, scripts and scheduled scripts.

I have an urgent job where the client wants to fail over to a mobile phone hot spot if the main VDSL2 line goes down. I solved the “How do you connect a mobile phone hot spot into the network” problem using a hAP ac lite configured as a Wi-Fi repeater. But that’s almost an aside.

This article talks about fail over without scripting:

https://wiki.mikrotik.com/wiki/Advanced_Routing_Failover_without_Scripting

There is an immediate gotcha with the first bit of code:

/ip route
add dst-address=Host1 gateway=GW1 scope=10
add dst-address=Host2 gateway=GW2 scope=10

This kind of assumes that the gateways are static. Common maybe in a bigger business but not so in this scenario where the WAN links are via DHCP clients - esp. mobile hot spots. So I had to put together a small scheduled script that queries the DHCP client for the current gateway and updates the route table. This is the code for one of the WAN ISP connections:

:if ([:len [/ip dhcp-client find comment=ISP1]] > 0) do={
	:put "Updating ISP1 gateway"
	:local ISP1Gateway
	:set $ISP1Gateway [/ip dhcp-client get [find comment="ISP1"] gateway]
	/ip route set gateway=$ISP1Gateway [find comment="Host1"]
}

This script linked in with the article above works a treat but my reason for this post is that I haven’t a clue how it works!! Unplug ISP1/WAN1 and it magically starts routing traffic via ISP2 (mobile hot spot). Reconnect ISP1 and it switches back.

Anyone fancy having a bash of explaining the process that’s going on here? I’m guessing that it’s sort of dynamically changing the distance values depending upon which routes to the two hosts are working. If it’s pinging Host1, then GW1 gets distance 1 & GW2 = distance 2 but if it’s getting through to Host2, then these are swapped over.

But I always like to understand what exactly is going on. What’s the route marking all about?

The more I get to know RouterOS, the more I appreciate its awesome power. And in a box costing just £54! Amazing

A lease script on the DHCP client is better than a scheduled script, because it runs only when DHCP gets a new address/gateway, and immediately when that happens, without any delay. A scheduled script wastes resources, because it runs all the time, even when nothing changes. And when something does change, it can be a while before the next run.
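
A minimal sketch of such a lease script, assuming the DHCP client carries the comment ISP1 and the host route carries the comment Host1, as in the script above. RouterOS passes $bound and $"gateway-address" to the DHCP client script; the comments and the local variable name are assumptions taken from this thread, so adapt them to your own config:

```
# Body for the "script" property of the DHCP client commented ISP1.
# $bound is 1 when a lease has just been obtained, 0 when it is lost.
:if ($bound = 1) do={
    # $"gateway-address" is the gateway handed out with the new lease
    :local gw $"gateway-address"
    :if ([:len $gw] > 0) do={
        /ip route set [find comment="Host1"] gateway=$gw
    }
}
```

A second copy with ISP2 and Host2 would go on the other DHCP client. Nothing runs between lease changes, so no scheduler entry is needed.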

How recursive routes work is described in the manual (https://wiki.mikrotik.com/wiki/Manual:IP/Route#Nexthop_lookup). The general idea is quite simple: when you set the scopes right, a gateway can be not only a directly reachable address but also some remote address (traffic doesn’t actually go to that address, but to the gateway through which it’s reachable). As for what exactly scope and target-scope mean, I don’t find that part intuitive at all.
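
To make the scope part concrete, here is a minimal sketch of the recursive setup in RouterOS v6 syntax. The gateway addresses 192.0.2.1 and 198.51.100.1 and the check hosts 8.8.8.8 and 1.1.1.1 are placeholders, not values from this thread. As far as I understand the nexthop lookup, a gateway is resolved only through routes whose scope is no greater than the target-scope of the route using it, which is why the host routes get scope=10 (target-scope defaults to 10 and is written out here only for clarity):

```
/ip route
# Host routes: each check host is pinned to one ISP's real gateway.
add dst-address=8.8.8.8/32 gateway=192.0.2.1 scope=10
add dst-address=1.1.1.1/32 gateway=198.51.100.1 scope=10
# Default routes: the "gateway" is the remote check host, resolved
# recursively through the host routes above and monitored with ping.
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=10 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=2 target-scope=10 check-gateway=ping
```

With this, the distances never change. When pings to 8.8.8.8 through ISP1 stop getting answers, the distance=1 route goes inactive and the distance=2 route takes over; when the pings succeed again, the distance=1 route becomes active and traffic switches back. Newer RouterOS versions may treat scope values differently, so check the current documentation before copying this.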

FWIW, this is what I wrote up in my documentation the last time I set redundant providers up a year or two ago. I have a new client where I’m going to go through this again in a couple of days, so if I find any discrepancies, I’ll post an update.


(Reference: https://wiki.mikrotik.com/wiki/Advanced_Routing_Failover_without_Scripting)

This can be expanded when there are multiple routes to a destination but no routing protocol available to detect failures. The failover takes about 30 seconds. It is not adjustable. There are other ways to accomplish this with scripting if faster detection and response are required, and other more traditional methods aren’t available.

This set of routes and route policy (PBR) sets up a recursive route which uses ping to detect a failure to one or more remote “gateways” (in this case, 8.8.8.8 and 4.2.2.2).

If each monitored distant “gateway” is unreachable, the default route defined in route 1 (add distance=1 dst-address=0.0.0.0/0 gateway=10.10.10.1 routing-mark=Charter) will no longer be available.

Note that the routing-mark (AKA alternate routing table) is not required. If it is not included, the routes will affect the main routing table.


These routes define how the router should reach an arbitrary address, 10.10.10.1, via the distant “gateway.” The “gateway” IP address(es) will be monitored with ping. Only one of these routes will be active at a time, unless all “gateways” are unreachable.

  add check-gateway=ping distance=1 dst-address=10.10.10.1/32 gateway=8.8.8.8 scope=10
  add check-gateway=ping distance=1 dst-address=10.10.10.1/32 gateway=4.2.2.2 scope=10

These routes define how the router should reach a distant “gateway” IP via the actual next hop.

  add distance=1 dst-address=8.8.8.8/32 gateway=35.129.32.1 scope=10
  add distance=1 dst-address=4.2.2.2/32 gateway=35.129.32.1 scope=10

This route adds a default route via fake gateway IP 10.10.10.1 in the routing table “Charter”. This route is dependent on at least one available recursive route. In this case, if both of the above recursive routes are considered down (because they aren’t pingable), then this route is removed from the routing table. Note that the dst-address= is assumed when configuring via CLI, but is included here for clarity.

 /ip route
  add distance=1 dst-address=0.0.0.0/0 gateway=10.10.10.1 routing-mark=Charter

It all starts working when PBR is set to route all traffic using the “Charter” routing table rather than the main (default) routing table.

 /ip route rule
  add dst-address=0.0.0.0/0 table=Charter

Can you explain again how to deal with the DHCP interfaces?
I am trying to set up 2 internet connections on one router.
I would prefer to use statics, however:
It seems the connection only fully works if a DHCP client is enabled for one of the interfaces and the active default route comes from DHCP.
Without that I have 2 problems:
Outbound ping: Destination net unreachable
Inbound L2TP VPN connections fail
So what's the easiest way to set up the config?
Can it be done without scripting?

In my config below, if I enable any of the routes marked X then things don't work.
All the other routes work.
What I really want is for all outbound traffic to go via Vodafone on 5G but inbound traffic such as L2TP VPN to work via the Sky ADSL
(as 5G can't give me a public IP address...).
It seems the ADS route is essential.

 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  ;;; Vodafone gateway for checking
        192.168.213.21/32                  192.168.8.1               1
 1 A S  ;;; Sky gateway for checking
        90.207.238.97/32                   ether1-gateway            1
 2 ADS  0.0.0.0/0                          192.168.0.1               7
 3 X S  ;;; default route via Sky
        0.0.0.0/0                          90.207.238.97             2
 4 X S  0.0.0.0/0                          ether1-gateway            1
 5 X S  ;;; Default route via Vodafone
        0.0.0.0/0                          192.168.213.21            1
 6 X S  0.0.0.0/0                          192.168.8.1               2
 7 A S  ;;; google
        172.217.0.0/16                     192.168.8.1               1
 8 ADC  192.168.0.0/24     192.168.0.9     ether1-gateway            0
 9 ADC  192.168.8.0/24     192.168.8.2     ether5-5g-gateway         0
10 ADC  192.168.20.0/24    192.168.20.254  bridge-local              0
11 ADC  192.168.30.0/24    192.168.30.254  DMZ                       0
12 A S  ;;; microsoft updates
        205.0.0.0/8                        192.168.8.1               1


 0   ;;; internal network
     192.168.20.254/24  192.168.20.0  bridge-local
 1   ;;; DMZ
     192.168.30.254/24  192.168.30.0  DMZ
 2   ;;; Sky ADSL
     192.168.0.9/24     192.168.0.0   ether1-gateway
 3   ;;; 5G connection
     192.168.8.2/24     192.168.8.0   ether5-5g-gateway
 4 D 192.168.0.100/24   192.168.0.0   ether1-gateway

Thanks for the posts - will need to put aside a bit of time to digest them.

Can you explain again how to deal with the DHCP interfaces?

Was that directed at me? If so, then this is the full script I have at the moment:

# Failover: part of the ISP failover system.
# Updates the route with the current gateway of the two ISP DHCP-clients
# The actual failover is done by the special routing

# Version history
# v1.00 28/01/2020 New: original version
# v1.01 30/01/2020 Fix: Checks that gateway returned is not-empty

:if ([:len [/ip dhcp-client find comment=ISP1]] > 0) do={
	:local ISP1Gateway
	:set $ISP1Gateway [/ip dhcp-client get [find comment="ISP1"] gateway]
	:if ([:len $ISP1Gateway] > 0) do={
		:put "Updating ISP1 gateway"
		/ip route set gateway=$ISP1Gateway [find comment="Host1"]
	}
}
:if ([:len [/ip dhcp-client find comment=ISP2]] > 0) do={
	:local ISP2Gateway
	:set $ISP2Gateway [/ip dhcp-client get [find comment="ISP2"] gateway]
	:if ([:len $ISP2Gateway] > 0) do={
		:put "Updating ISP2 gateway"
		/ip route set gateway=$ISP2Gateway [find comment="Host2"]
	}
}

So one basically adds the two routes as documented in the original article, using the current DHCP dynamic gateway settings:

/ip route
add dst-address=Host1 gateway=GW1 scope=10
add dst-address=Host2 gateway=GW2 scope=10

And this script runs every minute to update them to the current value. It uses comments to identify the DHCP client record and the route table entries.
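
For reference, a minimal sketch of the scheduler entry that runs it, assuming the script above is saved under /system script with the hypothetical name failover-gw-update (the scheduler name is likewise made up):

```
/system scheduler
# Run the stored script once a minute.
add name=failover-gw-schedule interval=1m on-event=failover-gw-update
```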

I’ve just come onsite to test the fail over in a more accurate scenario, i.e. the WAN DHCP client is still up but there is no route to the internet. Unfortunately it doesn’t work as covered in the original article. So I must be missing something. In my earlier tests, I was disconnecting the WAN connection which is a different failure to the link still being up but no route.

I notice that the original article I linked to has been removed… so back to the drawing board.

Better than scheduled script is lease script for DHCP client, because it will run only when DHCP gets new address/gateway

Thanks for the heads up on that - yes, a script in the DHCP client is a more sensible location rather than scheduled script.

I managed to fix the issue where the DHCP client default gateway is required to reach the Internet.
Enable the Detect Internet feature under Interfaces and you can then use your static routes instead.
I set Detect Interface List and Internet Interface List to All.
Not really sure how this works - can anyone explain?
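
In CLI terms, I believe the setting described above corresponds to something like the following, where all is the built-in interface list. Treat it as a sketch of the same setting rather than a recommendation:

```
/interface detect-internet
set detect-interface-list=all internet-interface-list=all
```

As far as I know, Detect Internet can add DHCP clients and default routes of its own on the interfaces it watches, which might be why things start working, so it is worth checking /ip route print afterwards to see what it created.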

Thanks for that heads-up. Sounds like there are many ways to skin a cat here. Will check out this feature later.