WAN Failover with Preferred WAN and ROS V7

Dear Forum,

I spent a whole day experimenting to get this to work, but maybe I am misunderstanding something.
I am testing with an RB4011iGS and the current ROS 7.10.

It is a fresh, basic configuration for test purposes.
There are 2 WAN connections; both are other routers that hand out DHCP leases to the MikroTik on ether1 and ether2.
The WAN connection on ether1 should be the preferred connection, as it is an unmetered landline, while the second is a metered LTE backup.
Once the basics run as expected, I will change and adapt the setup to PPPoE and public IP / DHCP (as mentioned below).

The other ports are in a bridge (bridge1); the MikroTik has an IP address on it and runs a DHCP server there. The two routers use the following IP addresses:
ether1: 10.1.116.11/24 (DHCP to the Mikrotik, ping 7ms to Internet)
ether2: 192.168.8.1/24 (DHCP to the Mikrotik, ping 35ms to Internet)

Here is the BASIC config of the MikroTik:

# 2023-07-05 09:47:58 by RouterOS 7.10.1
# software id = XXXXXXXX
#
# model = RB4011iGS+5HacQ2HnD
# serial number = XXXXXX
/interface bridge
add name=bridge1
/interface list
add name=WAN
/ip pool
add name=pool1 ranges=192.168.20.1-192.168.20.99
/ip dhcp-server
add address-pool=pool1 interface=bridge1 name=server1
/port
set 0 name=serial0
set 1 name=serial1
/interface bridge port
add bridge=bridge1 interface=ether10
add bridge=bridge1 interface=ether9
add bridge=bridge1 interface=ether8
/interface list member
add interface=ether1 list=WAN
add interface=ether2 list=WAN
/ip address
add address=192.168.20.254/24 interface=bridge1 network=192.168.20.0
/ip dhcp-client
add interface=ether1
add interface=ether2
/ip dhcp-server network
add address=192.168.20.0/24 dns-server=192.168.20.254 gateway=192.168.20.254
/ip dns
set allow-remote-requests=yes
/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN
/system clock
set time-zone-name=Europe/Berlin
/system note
set show-at-login=no

(I removed some unnecessary LED and Wi-Fi settings, as they are not used.)

So far it is running: I can use the Internet over whichever line happens to be chosen, and by changing the distance in the DHCP client I can prefer either one WAN interface or the other.
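
For reference, the preference described above can be expressed with the DHCP client's default-route-distance property. This is only a minimal sketch, assuming the two DHCP clients from the config above; the values 1 and 20 are illustrative:

```
/ip dhcp-client
# lower distance wins, so the landline on ether1 is preferred
set [find interface=ether1] default-route-distance=1
# the LTE default route is only used when the ether1 route disappears
set [find interface=ether2] default-route-distance=20
```

Note that this alone only reacts when the DHCP route on ether1 goes away, not when the line behind the upstream router is dead, which is why an additional reachability check is needed.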

My goal is to check whether two or more public IP addresses are reachable via WAN1. If they are not, the main traffic should fail over to WAN2 while the check itself stays on WAN1; once the pings succeed again, all traffic should fail back to WAN1.
I explicitly do not want to check whether the two routers are offline, as that is no indicator of whether the DSL or LTE link is working properly!

By nature I prefer a script, since I can build in some error handling or notifications such as sending an email.
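
As a rough illustration of the kind of notification meant here, a script could call /tool e-mail send when it switches lines. This is a sketch only: the recipient address, subject and message text are hypothetical, and it assumes an SMTP server has already been configured under /tool e-mail.

```
# Hypothetical recipient and texts; requires a configured /tool e-mail server
:local msg "WAN1 check failed, traffic moved to WAN2"
:do {
    /tool e-mail send to="admin@example.com" subject="WAN failover" body=$msg
} on-error={
    # only catches a failure to start the send, not a later delivery problem
    :log warning "failover mail could not be sent"
}
```

The mail should be sent after the routes have been switched, so that it can leave over whichever WAN is still working.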

I found a script and changed it a bit.

:local rchbl
     :if ([/ping 8.8.4.4 count=2 interface=ether1] >0 || [/ping 141.1.1.1 count=2 interface=ether1] >0) do={:set rchbl 1} else={:set rchbl 0}

#:log info "reachable is $rchbl"

     :local rtenable [/ip route print count-only where vrf-interface=ether1 && distance=1]
     :local msg ""
     :put "rchbl is $rchbl "

 # check the distance of the primary default gateway static route
       :if ($rtenable = 1) do={
           :if ($rchbl = 0) do={
               :set msg "Pings not reachable, switching to secondary gateway with setting distance to 1"
               /ip route set [find vrf-interface=ether1] distance=20
               /ip route set [find vrf-interface=ether2] distance=1
           }
       } else={
           :if ($rchbl = 1) do={
               :set msg "Pings reachable, switching back to primary gateway with setting distance to 1"
                /ip route set [find vrf-interface=ether1] distance=1
                /ip route set [find vrf-interface=ether2] distance=20
           }
       }

       # output/feedback
       :if ($msg != "") do={
           :log info "$msg"
           :put ".:. $msg"
       }

The first confusing thing here is that the >0 comparison appears to be inverted when I run the script from the terminal instead of from the WinBox GUI! From the terminal the script is NOT handled correctly with >0; it has to be changed to <0 instead (maybe a very confusing bug). From WinBox and the scheduler it is handled correctly as posted above!
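
To narrow this down, it may help to print what /ping actually returns in each environment. The snippet below is a small diagnostic sketch, not a fix. One possible contributor (an assumption, not verified on this setup) is that each line typed or pasted into the terminal is its own scope, so a :local declared on one line is no longer defined on the next unless everything is wrapped in braces:

```
{
    # /ping returns the number of replies received
    :local n [/ping 8.8.4.4 count=2 interface=ether1]
    :put ("replies=" . $n . " type=" . [:typeof $n])
}
```

If the value and type are the same in the terminal and in a scheduled script, the difference is more likely to come from variable scope than from the comparison itself.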

But even then the pings are not handled quite correctly: there are timeouts once traffic has switched to the backup WAN.


The method I like less is the one officially published by MikroTik:
https://help.mikrotik.com/docs/pages/viewpage.action?pageId=26476608

I also changed this one to fit my needs:



/ip/firewall/nat
add chain=srcnat action=masquerade out-interface=ether1
add chain=srcnat action=masquerade out-interface=ether2

/routing/table
add fib name=to_ISP1
add fib name=to_ISP2
 
/ip/firewall/mangle
add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=ISP1_conn out-interface=ether1
add chain=output connection-mark=ISP1_conn action=mark-routing new-routing-mark=to_ISP1 out-interface=ether1
add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=ISP2_conn out-interface=ether2
add chain=output connection-mark=ISP2_conn action=mark-routing new-routing-mark=to_ISP2 out-interface=ether2

/ip/route
add dst-address=8.8.8.8 gateway=10.1.116.11 scope=10
add dst-address=208.67.222.222 gateway=10.1.116.11 scope=10
add dst-address=8.8.4.4 gateway=192.168.8.1 scope=10
add dst-address=208.67.220.220 gateway=192.168.8.1 scope=10

/ip/route/
add distance=1 gateway=8.8.8.8 routing-table=to_ISP1 target-scope=11 check-gateway=ping
add distance=1 gateway=208.67.222.222 routing-table=to_ISP1 target-scope=11 check-gateway=ping
add distance=2 gateway=8.8.4.4 routing-table=to_ISP1 target-scope=11 check-gateway=ping
add distance=2 gateway=208.67.220.220 routing-table=to_ISP1 target-scope=11 check-gateway=ping

/ip/route/
add distance=1 gateway=8.8.4.4 routing-table=to_ISP2 target-scope=11 check-gateway=ping
add distance=1 gateway=208.67.220.220 routing-table=to_ISP2 target-scope=11 check-gateway=ping
add distance=2 gateway=8.8.8.8 routing-table=to_ISP2 target-scope=11 check-gateway=ping
add distance=2 gateway=208.67.222.222 routing-table=to_ISP2 target-scope=11 check-gateway=ping

/ip/route
add dst-address=10.10.10.1 gateway=8.8.8.8 scope=10 target-scope=11 check-gateway=ping
add dst-address=10.10.10.1 gateway=208.67.222.222 scope=10 target-scope=11 check-gateway=ping
add dst-address=10.20.20.2 gateway=8.8.4.4 scope=10 target-scope=11 check-gateway=ping
add dst-address=10.20.20.2 gateway=208.67.220.220 scope=10 target-scope=11 check-gateway=ping

/ip/route
add distance=1 gateway=10.10.10.1 routing-table=to_ISP1 target-scope=12
add distance=2 gateway=10.20.20.2 routing-table=to_ISP1 target-scope=12
add distance=1 gateway=10.20.20.2 routing-table=to_ISP2 target-scope=12
add distance=2 gateway=10.10.10.1 routing-table=to_ISP2 target-scope=12

At first glance it works, BUT with two disadvantages:

  • In this version I cannot prefer a WAN interface; the router sometimes routes over the LTE line.
  • I have problems with dynamically assigned IPs. Later I want to use a PPPoE interface instead, where I get a random IP address and route. I also want to make the LTE router transparent and change my APN to get a public IP address (my carrier offers another APN that puts a public IP directly on the MikroTik's DHCP interface), so the upstream gateway on WAN2 may change as well.
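
Regarding the first point, a reduced variant of the recursive-route approach can express a preference by using different distances in the main routing table. The following is a minimal sketch under the assumption that the gateways are the static addresses listed earlier and that the DHCP clients are set to add-default-route=no so their own default routes do not compete; the comments are illustrative:

```
/ip route
# host routes that pin each check target to one uplink
add dst-address=8.8.8.8/32 gateway=10.1.116.11 scope=10
add dst-address=8.8.4.4/32 gateway=192.168.8.1 scope=10
# default routes in the main table, resolved recursively via the hosts above
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 check-gateway=ping comment="WAN1 preferred"
add dst-address=0.0.0.0/0 gateway=8.8.4.4 distance=2 target-scope=11 check-gateway=ping comment="WAN2 backup"
```

With this layout the distance=1 route is used while 8.8.8.8 answers through WAN1, and the distance=2 route takes over only when that check fails. It does not solve the second point, because the host routes still contain fixed gateway addresses.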

Does anyone have an idea how to handle this (preferably with a script)?

Thx

Dirk



EDIT: I think I got it working:

/routing/table
add fib name=WAN1
add fib name=WAN2

/ip/firewall/mangle
add chain=output action=mark-connection new-connection-mark=WAN1-Conn passthrough=yes connection-state=new dst-address=8.8.8.8 connection-mark=no-mark log=no log-prefix="" 
add chain=output action=mark-connection new-connection-mark=WAN1-Conn passthrough=yes connection-state=new dst-address=8.8.4.4 connection-mark=no-mark log=no log-prefix="" 
add chain=output action=mark-routing new-routing-mark=WAN1 passthrough=yes connection-mark=WAN1-Conn log=no log-prefix=""


/system script
add dont-require-permissions=no name=ISP_Failover owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=":local rchbl\r\
    \n     :if ([/ping 8.8.8.8 count=2] >0 || [/ping 8.8.4.4 count=2] >0) do={:set rchbl 1} else={:set rchbl 0}\r\
    \n\r\
    \n:log info \"reachable is \$rchbl\"\r\
    \n\r\
    \n     :local rtenable [/ip route print count-only where vrf-interface=ether1 && distance=1]\r\
    \n     :local msg \"\"\r\
    \n     :put \"rchbl is \$rchbl \"\r\
    \n\r\
    \n # check the distance of the primary default gateway static route\r\
    \n       :if (\$rtenable = 1) do={\r\
    \n           :if (\$rchbl = 0) do={\r\
    \n               :set msg \"Pings not reachable, switching to secondary gateway with setting distance to 1\"\r\
    \n               /ip route set [find vrf-interface=ether1] distance=20\r\
    \n               /ip route set [find vrf-interface=ether2] distance=1\r\
    \n           }\r\
    \n       } else={\r\
    \n           :if (\$rchbl = 1) do={\r\
    \n               :set msg \"Pings reachable, switching back to primary gateway with setting distance to 1\"\r\
    \n                /ip route set [find vrf-interface=ether1] distance=1\r\
    \n                /ip route set [find vrf-interface=ether2] distance=20\r\
    \n           }\r\
    \n       }\r\
    \n\r\
    \n       # output/feedback\r\
    \n       :if (\$msg != \"\") do={\r\
    \n           :log info \"\$msg\"\r\
    \n           :put \".:. \$msg\"\r\
    \n       }"


/system scheduler
add interval=10s name=ISP-Check on-event=ISP_Failover policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=2023-07-05 start-time=14:17:55

/ip dhcp-client
remove 0
remove 1

add interface=ether1 script="{\r\
    \n    :local rmark \"WAN1\"\r\
    \n    :local count [/ip route print count-only where comment=\"WAN1\"]\r\
    \n    :if (\$bound=1) do={\r\
    \n        :if (\$count = 0) do={\r\
    \n            /ip route add gateway=\$\"gateway-address\" comment=\"WAN1\" routing-table=\$rmark\r\
    \n        } else={\r\
    \n            :if (\$count = 1) do={\r\
    \n                :local test [/ip route find where comment=\"WAN1\"]\r\
    \n                :if ([/ip route get \$test gateway] != \$\"gateway-address\") do={\r\
    \n                    /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n                }\r\
    \n            } else={\r\
    \n                :error \"Multiple routes found\"\r\
    \n            }\r\
    \n        }\r\
    \n    } else={\r\
    \n        /ip route remove [find comment=\"WAN1\"]\r\
    \n    }\r\
    \n}"
add default-route-distance=20 interface=ether2 script="{\r\
    \n    :local rmark \"WAN2\"\r\
    \n    :local count [/ip route print count-only where comment=\"WAN2\"]\r\
    \n    :if (\$bound=1) do={\r\
    \n        :if (\$count = 0) do={\r\
    \n            /ip route add gateway=\$\"gateway-address\" comment=\"WAN2\" routing-table=\$rmark\r\
    \n        } else={\r\
    \n            :if (\$count = 1) do={\r\
    \n                :local test [/ip route find where comment=\"WAN2\"]\r\
    \n                :if ([/ip route get \$test gateway] != \$\"gateway-address\") do={\r\
    \n                    /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n                }\r\
    \n            } else={\r\
    \n                :error \"Multiple routes found\"\r\
    \n            }\r\
    \n        }\r\
    \n    } else={\r\
    \n        /ip route remove [find comment=\"WAN2\"]\r\
    \n    }\r\
    \n}\r\
    \n"

Below is the whole router config (as it is “LAB-Use” only, some things may be missing, such as hardening and firewall rules). The important point is that the script seems to work stably, switching the distances of both routes, and you can modify it and add customized e-mail notifications:

# 2023-07-05 15:32:03 by RouterOS 7.10.1
# software id = XXXXXXXXX
#
# model = RB4011iGS+5HacQ2HnD
# serial number = XXXXXXX
/interface bridge
add name=bridge1
/interface wireless
set [ find default-name=wlan1 ] ssid=MikroTik
set [ find default-name=wlan2 ] ssid=MikroTik
/interface list
add name=WAN
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=pool1 ranges=192.168.20.1-192.168.20.99
/ip dhcp-server
add address-pool=pool1 interface=bridge1 name=server1
/port
set 0 name=serial0
set 1 name=serial1
/routing table
add fib name=WAN1
add fib name=WAN2
/interface bridge port
add bridge=bridge1 interface=ether10
add bridge=bridge1 interface=ether9
add bridge=bridge1 interface=ether8
/interface list member
add interface=ether1 list=WAN
add interface=ether2 list=WAN
/ip address
add address=192.168.20.254/24 interface=bridge1 network=192.168.20.0
/ip dhcp-client
add interface=ether1 script="{\r\
    \n    :local rmark \"WAN1\"\r\
    \n    :local count [/ip route print count-only where comment=\"WAN1\"]\r\
    \n    :if (\$bound=1) do={\r\
    \n        :if (\$count = 0) do={\r\
    \n            /ip route add gateway=\$\"gateway-address\" comment=\"WAN1\"\
    \_routing-table=\$rmark\r\
    \n        } else={\r\
    \n            :if (\$count = 1) do={\r\
    \n                :local test [/ip route find where comment=\"WAN1\"]\r\
    \n                :if ([/ip route get \$test gateway] != \$\"gateway-addre\
    ss\") do={\r\
    \n                    /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n                }\r\
    \n            } else={\r\
    \n                :error \"Multiple routes found\"\r\
    \n            }\r\
    \n        }\r\
    \n    } else={\r\
    \n        /ip route remove [find comment=\"WAN1\"]\r\
    \n    }\r\
    \n}"
add default-route-distance=20 interface=ether2 script="{\r\
    \n    :local rmark \"WAN2\"\r\
    \n    :local count [/ip route print count-only where comment=\"WAN2\"]\r\
    \n    :if (\$bound=1) do={\r\
    \n        :if (\$count = 0) do={\r\
    \n            /ip route add gateway=\$\"gateway-address\" comment=\"WAN2\"\
    \_routing-table=\$rmark\r\
    \n        } else={\r\
    \n            :if (\$count = 1) do={\r\
    \n                :local test [/ip route find where comment=\"WAN2\"]\r\
    \n                :if ([/ip route get \$test gateway] != \$\"gateway-addre\
    ss\") do={\r\
    \n                    /ip route set \$test gateway=\$\"gateway-address\"\r\
    \n                }\r\
    \n            } else={\r\
    \n                :error \"Multiple routes found\"\r\
    \n            }\r\
    \n        }\r\
    \n    } else={\r\
    \n        /ip route remove [find comment=\"WAN2\"]\r\
    \n    }\r\
    \n}\r\
    \n"
/ip dhcp-server network
add address=192.168.20.0/24 dns-server=192.168.20.254 gateway=192.168.20.254
/ip dns
set allow-remote-requests=yes
/ip firewall filter
add action=fasttrack-connection chain=forward connection-state=established \
    hw-offload=yes
add action=accept chain=forward connection-state=established
add action=accept chain=input dst-port=53 protocol=udp
add action=accept chain=input connection-state=established
add action=drop chain=input connection-state=invalid
add action=drop chain=input in-interface-list=WAN
add action=drop chain=forward in-interface-list=WAN log=yes
/ip firewall mangle
add action=mark-connection chain=output connection-mark=no-mark \
    connection-state=new dst-address=8.8.8.8 new-connection-mark=\
    WAN1-Conn passthrough=yes
add action=mark-connection chain=output connection-mark=no-mark \
    connection-state=new dst-address=8.8.4.4 new-connection-mark=WAN1-Conn \
    passthrough=yes
add action=mark-routing chain=output connection-mark=WAN1-Conn \
    new-routing-mark=WAN1 passthrough=yes
/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN
/ip route
add comment=WAN1 gateway=10.1.116.11 routing-table=WAN1
add comment=WAN2 gateway=37.0.0.1 routing-table=WAN2
/system clock
set time-zone-name=Europe/Berlin
/system leds
add interface=wlan2 leds="wlan2_signal1-led,wlan2_signal2-led,wlan2_signal3-le\
    d,wlan2_signal4-led,wlan2_signal5-led" type=wireless-signal-strength
add interface=wlan2 leds=wlan2_tx-led type=interface-transmit
add interface=wlan2 leds=wlan2_rx-led type=interface-receive
/system note
set show-at-login=no
/system scheduler
add interval=10s name=ISP-Check on-event=ISP_Failover policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon \
    start-date=2023-07-05 start-time=14:17:55
/system script
add dont-require-permissions=no name=ISP_Failover owner=admin policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=":\
    local rchbl\r\
    \n     :if ([/ping 8.8.8.8 count=2] >0 || [/ping 8.8.4.4 count=2] >0\
    ) do={:set rchbl 1} else={:set rchbl 0}\r\
    \n\r\
    \n:log info \"reachable is \$rchbl\"\r\
    \n\r\
    \n     :local rtenable [/ip route print count-only where vrf-interface=eth\
    er1 && distance=1]\r\
    \n     :local msg \"\"\r\
    \n     :put \"rchbl is \$rchbl \"\r\
    \n\r\
    \n # check the distance of the primary default gateway static route\r\
    \n       :if (\$rtenable = 1) do={\r\
    \n           :if (\$rchbl = 0) do={\r\
    \n               :set msg \"Pings not reachable, switching to secondary ga\
    teway with setting distance to 1\"\r\
    \n               /ip route set [find vrf-interface=ether1] distance=20\r\
    \n               /ip route set [find vrf-interface=ether2] distance=1\r\
    \n           }\r\
    \n       } else={\r\
    \n           :if (\$rchbl = 1) do={\r\
    \n               :set msg \"Pings reachable, switching back to primary gat\
    eway with setting distance to 1\"\r\
    \n                /ip route set [find vrf-interface=ether1] distance=1\r\
    \n                /ip route set [find vrf-interface=ether2] distance=20\r\
    \n           }\r\
    \n       }\r\
    \n\r\
    \n       # output/feedback\r\
    \n       :if (\$msg != \"\") do={\r\
    \n           :log info \"\$msg\"\r\
    \n           :put \".:. \$msg\"\r\
    \n       }"