Simpler Failover for two Gateways I found working

Hey,

like many others I was wondering how to accomplish a simple failover with two Gateways (here: DSL and LTE) with MikroTik involved.
Searching the Internet and this Board, all I was able to find was “Recursive Routes” with checking e.g. 8.8.8.8 as a “Gateway”.
That did not work for me at first, and I wasn’t happy with recursion in the routes, so I got the task done another way that I could not find anywhere while searching. I’m sharing it here:

Done this on RB5009 yesterday - in Winbox:

1. Prerequisites:

  • Network with DHCP done by MikroTik (in this case: 192.168.1.0/24)
  • Standard Gateway in DHCP will be the MikroTik (here: 192.168.1.2)
  • Internet available at (for Example) 192.168.1.1 (in this case DSL)
  • Internet available at (for Example) 192.168.1.250 (LTE-Modem)

2. Routing:

  • Standard Route 0.0.0.0/0 set to 192.168.1.250 with Distance 1 comment=LTE-Failover → (keep it DEACTIVATED)
  • Standard Route 0.0.0.0/0 set to 192.168.1.1 with Distance 2

3. Go to ROUTING → TABLES

  • Create a Routing Table named (for Example) “DSL” - check FIB

4. Go To IP → ROUTES → Click +

  • Dst. Address: 0.0.0.0/0
  • Gateway: 192.168.1.1 (your Primary Gateway)
  • Routing Table: Select above created ROUTING TABLE (here: “DSL”)

5. Go to IP → FIREWALL → Tab MANGLE
Create a MANGLE-Rule:

  • Tab → GENERAL
    – Chain: output
    – Dst.Address: 8.8.8.8
    – Protocol: 1 (icmp)
  • Tab → ACTION
    – Action: mark routing
    – New Routing Mark: Select above created ROUTING TABLE (here: “DSL”)

6. Go to TOOLS → NETWATCH

  • Tab → HOST
    – Create a Netwatch Host:
    – Host: 8.8.8.8
    – Type: icmp
    – Interval: 00:00:30
    – Timeout: 5.00
  • Tab → DOWN
    /ip route enable [find comment=LTE-Failover]
  • Tab → UP
    /ip route disable [find comment=LTE-Failover]
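
For anyone who prefers the terminal, steps 2 to 6 could be entered roughly like this. It is only a sketch, assuming RouterOS v7 syntax and the example addresses from this post; it was not exported from the actual router, so check it against your own version before relying on it.

```routeros
# Step 3: extra routing table used only for the Netwatch probe
/routing table
add name=DSL fib

# Steps 2 and 4: two default routes in "main", one default route in "DSL"
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.1.250 distance=1 comment=LTE-Failover disabled=yes
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
add dst-address=0.0.0.0/0 gateway=192.168.1.1 routing-table=DSL

# Step 5: force the router's own ICMP to 8.8.8.8 through the DSL table
/ip firewall mangle
add chain=output dst-address=8.8.8.8 protocol=icmp action=mark-routing new-routing-mark=DSL

# Step 6: Netwatch toggles the LTE route
/tool netwatch
add host=8.8.8.8 type=icmp interval=30s timeout=5s \
    down-script="/ip route enable [find comment=LTE-Failover]" \
    up-script="/ip route disable [find comment=LTE-Failover]"
```

The route with comment=LTE-Failover is created disabled on purpose; only the Netwatch scripts are supposed to change its state.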

What’s this doing?

We created TWO STANDARD ROUTES for traffic leaving the local network to the internet.
The secondary route (in this case LTE) has the higher priority (that is, the lower distance) but is kept disabled.
The second routing table and the firewall mangle rule force the ICMP request to 8.8.8.8 through the primary gateway (in this case: DSL).
Netwatch can run a script when the host becomes unavailable through the primary route.
The DOWN script enables the secondary route, which becomes active immediately because of its higher priority (lower distance).
All traffic to the internet now goes through the secondary route.
Netwatch keeps pinging 8.8.8.8 every 30 seconds, and our mangle rule still forces those pings through the primary gateway.
Once 8.8.8.8 is reachable again through the primary gateway, the UP script disables the secondary route.
All traffic goes through the primary route again.
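
To see which state the setup is in, a few read-only commands help. This is a small sketch, assuming the names used in this post (comment LTE-Failover, routing mark DSL); none of these commands change anything.

```routeros
# Normal state: the LTE route is listed as disabled. During failover it is enabled and active.
/ip route print where comment=LTE-Failover

# Current Netwatch status of the probed host (up / down / unknown)
/tool netwatch print

# Packet counters of the mangle rules; the counter of the 8.8.8.8 rule
# should grow with every Netwatch probe
/ip firewall mangle print stats
```

If the mangle counter stays at zero while Netwatch is running, the probes are not matching the rule and are probably leaving through whatever default route is active.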

Please note that you will not be able to use the host used ( in this case 8.8.8.8 ) as an upstream DNS-Server, since it won’t work when LTE kicks in.

I’m not a MikroTik expert by far, still learning, but I found this way a bit more straightforward and understandable than the “recursive routes” many tutorials come up with. You can also extend the scripts to send emails (configure TOOLS → EMAIL first) by adding for example:

:delay 10
/tool e-mail send to="youremail@host.com" subject="DSL is DOWN!!" body="DSL inactive - LTE active"

at the end of the script.
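
Put together, a complete DOWN script could look like the sketch below. The log line is my own addition for illustration, and the address is the placeholder from above; the e-mail only goes out if TOOLS → EMAIL is configured.

```routeros
# Netwatch DOWN script: switch to LTE first, then notify
/ip route enable [find comment=LTE-Failover]
:log warning "DSL probe failed, LTE-Failover route enabled"
# give the new default route a moment before trying to send mail over it
:delay 10
/tool e-mail send to="youremail@host.com" subject="DSL is DOWN!!" body="DSL inactive - LTE active"
```

The route is enabled before the delay so that the notification itself can leave through LTE.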

Still, I was wondering whether this is already documented somewhere; that’s why I posted it here. Please disregard or close if this is “too obvious” or “already well documented” :slight_smile:

Have a great day, everyone, many greetings,
Martin!

EDIT: I chose this variant for failover over the “recursive routes” because I’d like to keep more control over the failover.
The script can be extended, and getting an email WHEN failover happens is quite nice. We could also add even MORE Netwatch hosts. For example: the FIRST Netwatch checks 8.8.8.8, and if this fails, a script ENABLES the SECOND Netwatch host to verify; only after BOTH fail does the secondary route kick in. I think this offers more possibilities overall :slight_smile:
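
A rough, untested sketch of that two-stage idea, assuming RouterOS v7 and the DSL routing mark from above. The comments check-1 and check-2 and the second host 8.8.4.4 are hypothetical names chosen only for this example.

```routeros
# The second probe host must also be forced through DSL
/ip firewall mangle
add chain=output dst-address=8.8.4.4 protocol=icmp action=mark-routing new-routing-mark=DSL

/tool netwatch
# Stage 1: always running. On failure it only switches on stage 2.
add host=8.8.8.8 type=icmp interval=30s comment=check-1 \
    down-script="/tool netwatch enable [find comment=check-2]" \
    up-script="/tool netwatch disable [find comment=check-2]; /ip route disable [find comment=LTE-Failover]"
# Stage 2: disabled by default. Only its failure enables the LTE route.
add host=8.8.4.4 type=icmp interval=30s comment=check-2 disabled=yes \
    down-script="/ip route enable [find comment=LTE-Failover]"
```

The price is a slower reaction: failover now takes up to two probe intervals instead of one, so it is a trade between speed and avoiding false alarms.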

Thanks FILO, nice explanation.

You should not use the numbers from “/ip/route/print” to disable a route (step 6 in OP). The numbers are transitory, so you need to either use the route’s .id or use [find something=that] to select what to disable/enable. That is why most other examples use [find comment="WAN1"] or something like that to find the route to enable/disable.

Also, without firewall marking, incoming connections are not possible with this approach, so VPNs will be tricky.

THIS is important - thanks for reminding me, will edit the first post accordingly today. When the board gets rebooted the IDs will / might change.

VPN is okay in this case - I‘m using a dynamic DNS able to update quite quickly through the Routerboard itself. Also two different Dynamic-DNS-Hosts are in place for each connection as a backup, so it‘s possible to VPN into any of those.

Thanks for the reply and correction!!

Edit: Script altered with “find” command and comment on LTE-Failover-Route

Will this also work with only a single subnet?

Yes.

The advantage of Netwatch, primarily, is that you can tune several parameters to judge connectivity with more fidelity!
For example, the gateway ping check (check-gateway=ping) probes every 10 seconds, and only after two consecutive missed replies is the connection deemed down.
For many that is too long, so Netwatch set to 10 seconds gives roughly half that response time. Why the OP went with 30 seconds is not clear to me.

sample (some of many) other parameters one can use for fidelity → ICMP PROBE OPTIONS: thr-avg, thr-jitter, thr-max, thr-stdev
https://help.mikrotik.com/docs/display/ROS/Netwatch
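
As an illustration of those probe options, a Netwatch entry could look like the sketch below. The threshold values are only examples, not recommendations, and the exact set of options depends on the RouterOS v7 release, so compare with the page linked above.

```routeros
/tool netwatch
# Each test sends a burst of 10 pings; the host is reported "down" if
# any of the thresholds is exceeded, not only when every ping is lost
add host=8.8.8.8 type=icmp interval=10s timeout=1s \
    packet-count=10 packet-interval=50ms \
    thr-avg=100ms thr-max=1s thr-jitter=1s thr-stdev=250ms
```

With thresholds like these a link that is technically up but badly degraded can also trigger the DOWN script.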

Finally, one has to be careful about ICMP probes from netwatch as they will leak and try to go out any available route…
This should be done in IP routes. Assume you have two wans, and doing netwatch on both… 1.1.1.1 is netwatch host for WAN1 and 1.0.0.1 is host for WAN2

/ip route
add comment=WAN1 distance=1 dst-address=0.0.0.0/0 gateway=XX.XX.XX.1 routing-table=main
add comment=WAN1-dns distance=1 dst-address=1.1.1.1/32 gateway=XX.XX.XX.1 routing-table=main
add comment="Stop Leak" distance=2 dst-address=1.1.1.1/32 blackhole routing-table=main
# ++++++++++++++++++++
add comment=WAN2 distance=2 dst-address=0.0.0.0/0 gateway=XX.XX.XX.2 routing-table=main
add comment=WAN2-dns distance=1 dst-address=1.0.0.1/32 gateway=XX.XX.XX.2 routing-table=main
add comment="Stop Leak" distance=2 dst-address=1.0.0.1/32 blackhole routing-table=main

I want to do the same (5G + DSL-Failover).

  • Do you have your box in bridge or router mode?
  • What cable do you connect on what port on your box?

Hi,

I think the mangle marking is no longer necessary with recent versions of ROS.
Netwatch can now be given a src-address, so you can simply use that to ping from the DSL interface.
Moreover, I think that using mangle marks can cause issues with fasttrack firewall rules.
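
A sketch of what that could look like, assuming RouterOS v7 and a static address on the DSL side. The address 192.0.2.10, the comment dsl-check and the routing rule are assumptions for illustration; the rule is there because a source address alone does not necessarily decide which route the probe takes.

```routeros
# Probe with a fixed source address instead of a mangle rule
/tool netwatch
add host=8.8.8.8 type=icmp interval=30s src-address=192.0.2.10 comment=dsl-check \
    down-script="/ip route enable [find comment=LTE-Failover]" \
    up-script="/ip route disable [find comment=LTE-Failover]"

# Send everything originating from that address through the DSL table
/routing rule
add src-address=192.0.2.10/32 action=lookup table=DSL
```

Whether the probe really leaves through DSL while LTE is active should be tested, for example by unplugging the DSL line and watching the Netwatch status.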

Depends upon requirements. Netwatch potentially leaks in terms of using any route it can find to check ping… Thus if WAN1 checks 1.1.1.1 DNS, netwatch will try to use WAN2 to check 1.1.1.1 and when it does, will not report WAN1 as not available or at least that was what my understanding was. Probably wrong though,

I think @cyayon is right that using the newer “src-address” in Netwatch SHOULD work. But you’d need to know the src-address to set, which means having a static IP… so that kinda limits the approach, while mangle is just setting the routing table, which could have an interface route without an IP.

Perhaps there is some effect with fasttrack+mangle…, but it’s already going to go via the CPU path since the traffic is routed to the internet.

Anyway, I think @Filo’s approach using mangle seems “safer” since it’s pretty explicit about what’s happening. But I do think “src-address” might allow skipping the mangle step; you’d just want to test it pretty well, since the src-address in Netwatch is relatively new.

It would be great to be able to ping from an interface (like ping -I … on Linux).
We can also use a DHCP client script which updates the Netwatch src-address (if the WAN address is not fixed).
Another enhancement would be the ability to ping multiple IPs before declaring the WAN interface down, like nested recursive routing…
It should be possible to nest Netwatch entries too… but it will be complicated.
Personally I am using a script which runs each minute and pings multiple IPs from the interface; if all pings fail, it disables the primary default route (but keeps another primary with a longer distance).
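
The DHCP client idea could be sketched as below: this would go into the Script field of the DHCP client on the WAN interface. The comment wan1-check is a hypothetical label for the Netwatch entry, and I have not tested this myself.

```routeros
# Runs on every lease change of the DHCP client.
# $bound is 1 when a lease was obtained, $"lease-address" is the address received.
:if ($bound = 1) do={
    /tool netwatch set [find comment="wan1-check"] src-address=$"lease-address"
}
```

This keeps the Netwatch probe pinned to whatever address the ISP currently hands out.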

Here is the script, which is scheduled to run every minute.

It’s far from perfect but it worked. I do not use my CCR2116 for dual-WAN/failover; I moved my wan2 to another router (pure Linux).
Do not hesitate to propose enhancements and corrections.

# check wan
#
# use this with netwatch or scheduler
# prefer netwatch with src-address
#
# TODO : 
# use netwatch src-address
# disable dhclient update recursive route 

# version 20230803

:global checkWanStatus
:global checkWanRun

#
# define vars
#
# wan1
:local iface "wan1"
:local tableRoute "route-wan1"
:local gateway "xx.xx.xx.xx"
:local srcAddress "xx.xx.xx.xx"
:local distanceDefault 1
:local distancePersist 101
:local dstAddress "0.0.0.0/0"

# wan2
#:local iface "wan2"
#:local tableRoute "route-wan2"
#:local gateway "192.168.6.1"
#:local srcAddress "192.168.6.254"
#:local distanceDefault 9
#:local distancePersist 109
#:local dstAddress "0.0.0.0/0"

:local resetConn 1
:local addrTest1 "1.0.0.1"
:local addrTest2 "9.9.9.10"
:local addrTest3 "8.8.4.4"
:local countTest 3
:local LogHeader "check-wan"
:local email "xxx@xxx.com"
:local Date [/system clock get date];
:local Time [/system clock get time];
:local routeStatus
:local pingStatus
:local prouteStatus
:local trouteStatus

# init 
:if ( [:tostr $checkWanStatus]  = "" ) do={
      :set checkWanStatus ($Time . " " . $Date);
}
:set checkWanRun ($Time . " " . $Date);


:local msg
:local addr


#
# define dynamic vars
#
#:set dstAddress ""
#:set gateway ""
#:set srcAddress ""
if ( [:tostr $distanceDefault] = "" ) do={
           :set distanceDefault "1"
           :set msg "$LogHeader : defined distance=$distanceDefault"
           :put "$msg"
}

if ( [:tostr $dstAddress] = "" ) do={
    :set dstAddress "0.0.0.0/0"
    :set msg "$LogHeader : defined dst-address=$dstAddress"
    :put "$msg"
}
    
if ( [:tostr $gateway] = "" ) do={
    :set gateway ([/ip route print detail as-value where distance=$distanceDefault routing-table=main dst-address=$dstAddress ]->0->"gateway")
    :set msg "$LogHeader : defined gateway=$gateway"
    :put "$msg"
     if ( [:tostr $gateway] = "" ) do={
          :set msg "$LogHeader : null gateway=$gateway"
          #/tool e-mail send to=$email subject="$msg"
          :set checkWanStatus "ERROR"
          :error "$msg"
     }
}

if ( [:tostr $srcAddress] = "" ) do={
    :set srcAddress ([/ip/address print detail as-value where interface=$iface ]->0->"address") 
    :local delim [:find $srcAddress "/" 0]; :set srcAddress [ :pick $srcAddress 0 $delim ]
    :set msg "$LogHeader : defined src-address=$srcAddress"
    :put "$msg"
     if ( [:tostr $srcAddress] = "" ) do={
          :set msg "$LogHeader : null src-address=$srcAddress"
          #/tool e-mail send to=$email subject="$msg"
          :set checkWanStatus "ERROR"
          :error "$msg"
     }
}


#
# test table route
#
:set trouteStatus ([/ip route print detail as-value where gateway="$gateway" routing-table=$tableRoute dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $trouteStatus] = "" ) do={
     :set trouteStatus "FAILED"
     :set msg "$LogHeader : current table route dst-address=$dstAddress gateway=$gateway routing-table=$tableRoute FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current table route dst-address=$dstAddress gateway=$gateway routing-table=$tableRoute distance=$trouteStatus alive"
     :put "$msg"; 
     :set trouteStatus "OK"
}


#
# test current persistent route
#
:set prouteStatus ([/ip route print detail as-value where gateway="$gateway" distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $prouteStatus]  != "$distancePersist" ) do={
     :set prouteStatus "FAILED"
     :set msg "$LogHeader : current persistent route dst-address=$dstAddress gateway=$gateway distance=$distancePersist routing-table=main FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current persistent route dst-address=$dstAddress gateway=$gateway distance=$distancePersist routing-table=main alive"
     :put "$msg"; 
     :set prouteStatus "OK"
}


#
# test current route
#
:set routeStatus ([/ip route print detail as-value where gateway="$gateway" distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=no ]->0->"distance")
if ( [:tostr $routeStatus]  != "$distanceDefault" ) do={
     :set routeStatus "FAILED"
     :set msg "$LogHeader : current route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main FAILED !"
     :put "$msg"; 
     :log warning "$msg"
} else={
     :set msg "$LogHeader : current route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main alive"
     :put "$msg"; 
     :set routeStatus "OK"
}


#
# test ping
#
:set pingStatus "OK"
:set addr "$addrTest1"
if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
      :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
      :put "$msg"; :log warning "$msg"
      :set pingStatus "WARNING"

      :set addr "$addrTest2" ;
      if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
             :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
             :put "$msg"; :log warning "$msg"
             :set pingStatus "WARNING"

             :set addr "$addrTest3" ;
             if ([/ping $addr src-address=$srcAddress count=$countTest]=0) do={
                    :set msg "$LogHeader : ping src-address=$srcAddress $addr FAILED !"
                    :put "$msg"; :log error "$msg"
                    :set pingStatus "FAILED"
             } else={
                    :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
                    :put "$msg"; 
                    #:log info "$msg"
             }
      } else={
             :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
             :put "$msg"; 
             #:log info "$msg"
      }
} else={
      :set msg "$LogHeader : ping src-address=$srcAddress $addr alive"
      :put "$msg"; 
      #:log info "$msg"
}


# status
:set msg "$LogHeader : routeStatus:$routeStatus trouteStatus:$trouteStatus prouteStatus:$prouteStatus pingStatus:$pingStatus"
:put "$msg"; 
#:log info "$msg"


#
# final decision
#
if ( $pingStatus = "FAILED") do={
         :set checkWanStatus "$iface FAILED"
         :set msg "$LogHeader : interface $iface FAILED !"
         :put "$msg"; :log warning "$msg"
         if ($routeStatus = "OK") do={
             /ip route set [find gateway=$gateway distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface persistdef - $LogHeader $Time enabled"; 
             /ip route set [find gateway=$gateway distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=no ] disabled=yes comment="$iface def - $LogHeader $Time disabled";
             if ( $resetConn = "1" ) do={
                     /ip/firewall/connection remove [find]
             }
             :set checkWanStatus "$iface DISABLED"
             :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main reset-conn:$resetConn DISABLED !"
             :put "$msg"; :log warning "$msg"
             :delay 3;
             /tool e-mail send to=$email subject="$msg"
         } else={
             :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main already disabled"
             :put "$msg"; :log info "$msg"
         }
} else={
         if ($routeStatus = "FAILED") do={
              :set checkWanStatus "$iface RESTORED"
              /ip route set [find gateway=$gateway distance=$distancePersist routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface persistdef - $LogHeader $Time restored"; 
              /ip route set [find gateway=$gateway distance=$distanceDefault routing-table=main dst-address=$dstAddress disabled=yes ] disabled=no comment="$iface def - $LogHeader $Time restored";
              if ( $resetConn = "1" ) do={
                     /ip/firewall/connection remove [find]
              }
              :set msg "$LogHeader : route dst-address=$dstAddress gateway=$gateway distance=$distanceDefault routing-table=main reset-conn:$resetConn RESTORED !"
              :put "$msg"; :log warning "$msg"
              :delay 3;
              /tool e-mail send to=$email subject="$msg"
         } else={
              :set msg "$LogHeader : interface $iface alive"
              :put "$msg"; 
              :log info "$msg"
              :set checkWanStatus "$iface OK"
         }
}
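
To run it every minute, a scheduler entry along these lines should do. It is a sketch that assumes the script above was saved under /system script with the name check-wan; both names here are only examples.

```routeros
/system scheduler
add name=check-wan-schedule interval=1m on-event="/system script run check-wan"

# The two global variables the script maintains can be inspected with:
/system script environment print
```

checkWanStatus then shows the last decision and checkWanRun the time of the last run.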

It is always wrong to use the firewall to decide the routes (except in exceptional cases).

For routes, use… routes.

With the firewall, RouterOS v6 syntax (the way to avoid):
/ip firewall mangle
add action=mark-routing chain=output dst-address=3.3.3.3 routing-table=!mytable new-routing-mark=mytable
/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-mark=mytable

With routes only, RouterOS v6 syntax:
/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-mark=mytable
/ip route rule
add dst-address=3.3.3.3/32 table=mytable

With the firewall, RouterOS v7 syntax (the way to avoid):
/routing table
add fib name=mytable
/ip firewall mangle
add action=mark-routing chain=output dst-address=3.3.3.3 routing-mark=!mytable new-routing-mark=mytable
/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-table=mytable

With routes only, RouterOS v7 syntax:
/routing table
add fib name=mytable
/ip route
add dst-address=3.3.3.3/32 gateway=3.3.3.3 routing-table=mytable
/routing rule
add dst-address=3.3.3.3/32 table=mytable

Hi - probably.
The “mangle” rule in my initial approach at the top of this thread was designed for Netwatch, and yes, it does its job the way a routing entry would. At the time I created the rule I had zero experience with routing, and this was my first attempt in my home environment.

Surprisingly, all other approaches to building a backup / failover were much more complex back then (you can see it in the threads from that time here on the board); that’s why I came up with the idea of combining Netwatch with this mangle rule. I had not seen this approach documented before, so I posted it here.

I will change the routing part of the config, since you are of course right with your statement, although the first idea has worked very well ever since I posted it.

Greetings!

Edit: And SORRY if I missed this thread, I saw the other answers but had been absent for a while…

Sorry for the late reply, @derolf - here’s the answer:

My RB5009 is in BRIDGE mode. All ports are bridged together, and no other IP segment is used.
Since ALL ports are on the same bridge, you are free to use any port of your MikroTik device if you want to rebuild this.

In my case I have a mixed setup with AVM hardware (“FritzBox”). One “FritzBox” provides DSL / landline; the LTE modem (192.168.1.250 as a normal LAN address) is attached to another “FritzBox” in my upstairs location (mesh wireless).

As you can see, everything in the network can see everything else (since it is separated from the guest network and IoT network that the “FritzBox” provides), so there is no need for additional internal firewall rules, and BRIDGE is fine here.

If you set up everything like this (one subnet, one bridge), you’ll be fine with the rest of the settings I mentioned in the first post, and your failover is “good to go” :slight_smile:

Hope this helps!

(Reply to #15)

In fact, I didn’t comment on the rest, because there was nothing to add.
I’m usually very critical (not by chance, but always explaining the reasons), and if I haven’t added anything else, it means you did a good job (and I thank you for putting it on the forum).

I just explained how things should be done, that’s the purpose of the forum.

No offense taken, we’re all here to learn. And usually this board is a good example of respecting every stage of knowledge and diving into each other’s problems.
If this thread has a wholesome solution at the end, this work is perfectly done :sunglasses:

Cheers!

Only to keep things as together as possible I just “sold” this Filo’s approach to a new user, with a few changes.
I got rid of the separate routing table and of the mangle by adding a “narrow” /32 route to the “canary” ip address in “main” table.
And I didn’t use the “comment” as selector in the Netwatch script (this is a pet peeve of mine, comments may be changed accidentally six months or a year later, the setup would stop working and finding out what happened would be more difficult).
For some reason (I suspect the address on ether1 coming from a DHCP server instead of being static), when ether1 is physically disconnected from the ISP router (think of the ethernet cable going bad, or the ISP router or its power supply failing), the “main” route becomes inactive and the whole setup starts flapping each time the Netwatch script runs.
So I added a blackhole route to the same /32 address with distance 2.

The thread is here:
http://forum.mikrotik.com/t/secondary-wan-and-failover-setup-hap-ax2-7-16-for-a-beginner/179134/1

As it is a bit difficult to follow due to all the tests made, here is the overall setup, using the SAME IP addresses and structure as Filo’s original post. Steps 3 to 5 of that post (routing table, route in that table, mangle rule) are no longer needed and are left out, Netwatch checks 8.8.4.4 instead of 8.8.8.8, and the scripts select the route by destination and gateway instead of by comment:

1. Prerequisites:

  • Network with DHCP done by MikroTik (in this case: 192.188.1.0/24)
  • Standard Gateway in DHCP will be the MikroTik (here: 192.188.1.1)
  • Internet available at (for Example) 192.168.1.1 (in this case DSL)
  • Internet available at (for Example) 192.168.1.250 (LTE-Modem)
  • Both interfaces connected to the two devices above characterized as WAN in interface list and masqueraded in /ip firewall nat

2. Routing:

  • Standard Route 0.0.0.0/0 set to 192.168.1.250 with Distance 1 comment=LTE-Failover → (keep it DEACTIVATED)
  • Standard Route 0.0.0.0/0 set to 192.168.1.1 with Distance 2
  • Narrow Route 8.8.4.4/32 set to 192.168.1.1 with Distance 1
  • Narrow Blackhole route 8.8.4.4/32 with Distance 2

3. Go to TOOLS → NETWATCH

  • Tab → HOST
    – Create a Netwatch Host:
    – Host: 8.8.4.4
    – Type: icmp
    – Interval: 00:00:30
    – Timeout: 5.00
  • Tab → Down
    /ip route enable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]
  • Tab → Up
    /ip route disable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]

It seems like it works nicely and it is simpler to implement.
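
For reference, the same setup could be typed in the terminal roughly as follows. This is a sketch assuming RouterOS v7 syntax and the example addresses above, not an export from the router in the linked thread; the find conditions are written space-separated here, which selects the same route.

```routeros
/ip route
# failover default route, kept disabled until Netwatch enables it
add dst-address=0.0.0.0/0 gateway=192.168.1.250 distance=1 disabled=yes comment=LTE-Failover
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
# "canary" address: reachable through DSL only
add dst-address=8.8.4.4/32 gateway=192.168.1.1 distance=1
# if the DSL route goes inactive, drop the probe instead of letting it use LTE
add dst-address=8.8.4.4/32 blackhole distance=2

/tool netwatch
add host=8.8.4.4 type=icmp interval=30s timeout=5s \
    down-script="/ip route enable [find dst-address=0.0.0.0/0 gateway=192.168.1.250]" \
    up-script="/ip route disable [find dst-address=0.0.0.0/0 gateway=192.168.1.250]"
```

The blackhole route is what stops the flapping: without it, a probe sent while the /32 route is inactive would succeed over LTE and switch everything back to the dead DSL line.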

EDIT: added the detail that interfaces should be WAN and masqueraded

[quote=jaclaz post_id=1102129 time=1728390315 user_id=224177]

-Tab → Down
/ip route enable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]

-Tab → Up
/ip route disable [find dst-address=0.0.0.0/0 and gateway=192.168.1.250]
[/quote]
Just to be sure: on both the Up tab and the Down tab, the router ends up pointing to the same gateway???

My bad, I see you differentiate by enable and disable.

The problem I am having is: how do you associate Netwatch with the correct ROUTE???
Is just identifying the gateway good enough? Surely that works for static gateways or even a PPPoE name, but what about dynamic gateways??