I have followed the guide and done some testing.
The failover certainly seems to work if I simply pull the cable - In my lab example, I have 2 WAN interfaces:
ether1
lte1
Both the simple single-host (Google DNS) example and the multi-host (Google + OpenDNS) example work when I just yank the ether1 cable.
The problem is that in the real world, this is not always how WAN fails - there are many issues that could stop internet service for a particular WAN interface.
In this case, I also have control of the upstream router for ether1 WAN, and when I just drop all traffic from that network, the failover does not work.
I also tested at this point, on simple single host config, that:
I can’t ping 8.8.8.8
I can ping 8.8.4.4
but my internet test ping to 1.1.1.1 is still failing and there is no “failover” for my WAN route
This is the basic single-host script I used for testing:
** I used WAN interface names because the LTE IP changes and this seemed like a better way to do it. I have also tested with static gateway IPs, and the failover still does not seem to work when traffic is dropped.
Hoping someone can help me get this right.
Or maybe share a better way to do WAN failover in RouterOS 7.
I did consider writing a script using the detect-internet function, although it is a lot slower to fail over, and it seems to be limited to a single host (it cannot be customised to add more host checks).
Yes, and that’s the very motivation for this whole setup, where the WAN “transparency” is being checked using pings to the Reference addresses.
An unusual thing I can spot in your setup is that the gateway parameters of the routes towards the reference addresses are set to interface names rather than to the IP addresses of the gateway devices. This is a possible setting, but it only works if the gateway device acts as an ARP proxy, responding with its own MAC address to ARP requests for any IP address.
Other than that, the configuration seems incomplete to me. You have two routing tables, named to-Primary_WAN and to-Failover_LTE, but you didn’t configure (show?) any default routes via Reference addresses to be added to routing table main, nor have you shown any rules that would assign routing-mark values except in mangle chain output. And the latter ones are confusing to me.
So unless something is missing in what you’ve shown, add the default routes via 8.8.8.8 and via 8.8.4.4 to routing table main, disable the rules in chain output of mangle, and try pinging 1.1.1.1 again. If you use a src-nat or masquerade rule(s) on the WAN interfaces, you have to bear in mind that if you break the Primary WAN transparency to 8.8.8.8 during pinging, the connection tracking will not change the NAT handling for that ping sequence so the pings will fail even though the route via LTE will be active. To check that the Backup LTE kicked in, you have to stop pinging for 10 seconds and then try again.
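A minimal sketch of the main-table routes described above, assuming RouterOS 7 syntax. The gateway 192.0.2.1 is only a placeholder for the real address of the router upstream of ether1, lte1 is given by name on the assumption that it is an internal or USB LTE interface, and the comments on the routes are hypothetical labels; none of these values come from the original config.

```routeros
# Host routes that pin each reference address to one WAN.
# 192.0.2.1 is a placeholder: substitute the real ether1 gateway IP.
/ip route add dst-address=8.8.8.8/32 scope=10 gateway=192.0.2.1 comment="ref-wan1"
/ip route add dst-address=8.8.4.4/32 scope=10 gateway=lte1 comment="ref-wan2"

# Recursive default routes in routing table main (the default table).
# check-gateway=ping marks a route inactive when its reference stops answering.
/ip route add dst-address=0.0.0.0/0 distance=1 gateway=8.8.8.8 target-scope=11 check-gateway=ping comment="default-wan1"
/ip route add dst-address=0.0.0.0/0 distance=2 gateway=8.8.4.4 target-scope=11 check-gateway=ping comment="default-wan2"
```

With these in place and the output-chain mangle rules disabled, a fresh ping to 1.1.1.1 started after the switch should leave via the backup WAN.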
Can you elaborate a bit further, please?
Do you mean that, for this to work, “chosen references (here 8.8.8.8 and 8.8.4.4) SHOULD act as ARP proxy but they don’t”?
Not the references themselves (they are far away, so our ARP requests cannot reach them), but the routers to which WAN1 and WAN2 are physically connected. Giving an interface name as the gateway of a route only makes sense if the LTE WAN is an internal LTE card or a USB one, as the LTE interface is then either an L3 one, or an L2 one acting as proxy-arp.
But it is not only important for WAN failover. Even if there was none, you would still need either the external routers to support proxy-arp, or the routes’ gateways to be set to the IP addresses, not interface names, of these external routers.
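One way to keep an IP-address gateway when the WAN address comes from DHCP is a lease script on the DHCP client. The sketch below is only an illustration: it assumes the host route to the reference address carries the comment ref-wan1 (a hypothetical label, not from the original config), and it relies on the $bound and $"gateway-address" variables that RouterOS passes to DHCP client scripts.

```routeros
# Body for the "script" field of the DHCP client on ether1.
# "ref-wan1" is a hypothetical comment marking the host route to 8.8.8.8.
:if ($bound = 1) do={
    # Lease obtained or renewed: point the reference route at the current gateway IP.
    /ip route set [find comment="ref-wan1"] gateway=$"gateway-address"
}
```

This keeps the route's gateway as an IP address without hard-coding it, so it does not depend on the upstream router answering proxy-arp.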
Thanks for the comments - they were helpful - it appears to be working after a bit of testing, at least for the lab I have setup - probably requires a bit more real world testing before I’m confident
For anyone interested, here is the same script block with adjustments I used to test / implement
The best way to test a “true” failover situation seems to be the firewall test rule that drops ICMP to the check host on the primary WAN. When it is enabled while you are pinging some address, you will see after a few moments that the primary recursive route becomes unreachable, and traffic should switch to the secondary WAN. You can watch the traffic switch interfaces using the torch tool; that is what I did.
{
# testing comment ease of removal of added config
:local TestingComment "testing-comment-remove-me"
# wan or isp reference names
:local Wan1Reference "Primary_WAN"
:local Wan2Reference "Failover_LTE"
# actual interface names for the 2 wans
:local Wan1InterfaceName "ether1"
:local Wan2InterfaceName "lte1"
# interface ip or gateway
## lte1 gets from dhclient and not gateway shown
## getting address and removing cidr notation
# appears as though lte1 interface name can be used and using ip does not work
:local Lte1IP ([[/ip/route/get number=[/ip/route/find gateway="lte1" distance=0]] as-value]->"dst-address")
:set Lte1IP [:pick $Lte1IP 0 [:find $Lte1IP "/"]]
## ether1 is dhclient
## has gateway address
:local PrimaryWanGatewayIP [/ip/dhcp-client/get ether1 gateway]
# hosts for checking internet status
## google dns
:local Wan1CheckHostA "8.8.8.8"
:local Wan2CheckHostA "8.8.4.4"
# add routing tables
/routing/table add fib name=("to-".$Wan1Reference) comment=$TestingComment
/routing/table add fib name=("to-".$Wan2Reference) comment=$TestingComment
# ip firewall mangle rules for marking connections and routing
/ip/firewall/mangle add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=($Wan1Reference."-conn") out-interface=$Wan1InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-mark=($Wan1Reference."-conn") action=mark-routing new-routing-mark=("to-".$Wan1Reference) out-interface=$Wan1InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=($Wan2Reference."-conn") out-interface=$Wan2InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-mark=($Wan2Reference."-conn") action=mark-routing new-routing-mark=("to-".$Wan2Reference) out-interface=$Wan2InterfaceName comment=$TestingComment
# route config
/ip/route add dst-address=$Wan1CheckHostA scope=10 gateway=$PrimaryWanGatewayIP comment=$TestingComment
/ip/route add dst-address=$Wan2CheckHostA scope=10 gateway=$Wan2InterfaceName comment=$TestingComment
/ip/route add distance=1 gateway=$Wan1CheckHostA target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan2CheckHostA target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=1 gateway=$Wan2CheckHostA target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan1CheckHostA target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
# some test rules that can be enabled / disabled to test failover
/ip/firewall/filter/add chain=output action=drop out-interface=ether1 place-before=1 disabled=yes comment=$TestingComment
/ip/firewall/filter/add chain=output action=drop out-interface=lte1 place-before=1 disabled=yes comment=$TestingComment
/ip/firewall/filter/add chain=output action=drop out-interface=ether1 place-before=1 protocol=icmp dst-address=8.8.8.8 disabled=yes comment=$TestingComment
/ip/firewall/filter/add chain=output action=drop out-interface=lte1 place-before=1 protocol=icmp dst-address=8.8.4.4 disabled=yes comment=$TestingComment
}
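The test procedure described above can be sketched as the following terminal commands, run one at a time (torch keeps running until it is stopped). This assumes the script above has been applied unchanged, so the test rules carry the comment testing-comment-remove-me; the find filters are my assumption of how to select the single ICMP rule for ether1.

```routeros
# Simulate an upstream failure: drop only the health-check pings leaving ether1.
/ip firewall filter enable [find comment="testing-comment-remove-me" out-interface=ether1 protocol=icmp dst-address=8.8.8.8]

# After a short wait, list the script's routes and check which ones are still active.
/ip route print where comment="testing-comment-remove-me"

# Watch the traffic move to the secondary WAN.
/tool torch interface=lte1

# Restore the primary WAN when done.
/ip firewall filter disable [find comment="testing-comment-remove-me" out-interface=ether1 protocol=icmp dst-address=8.8.8.8]
```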
I also tried the second example (from references in original post), as I wanted to use two hosts for the failover config.
I was not able to get the virtual IP stuff to work. I'm not sure exactly what is required; I think it may be something to do with having default routes on the WAN interfaces, or target-scope precedence, or something else.
If the virtual IP example is a better way to do it, I would love to hear someone's explanation of why, and some more notes on how to implement it, as the reference is not very good IMO.
After the virtual IP stuff didn’t work, I just tried doing it the normal way (first example). This seemed to work, and in my testing it did seem to fail over ONLY when BOTH hosts were unreachable.
Here is that example for anyone interested:
{
# testing comment ease of removal of added config
:local TestingComment "testing-comment-remove-me"
# wan or isp reference names
:local Wan1Reference "Primary_WAN"
:local Wan2Reference "Failover_WAN"
# actual interface names for the 2 wans
:local Wan1InterfaceName "ether1"
:local Wan2InterfaceName "lte1"
# gateways
## need different things here depending on gateway type - your mileage may vary and some testing likely required
:local Wan1Gateway [/ip/dhcp-client/get ether1 gateway]
:local Wan2Gateway "lte1"
# hosts for checking internet status
## google dns
:local Wan1CheckHostA "8.8.8.8"
:local Wan2CheckHostA "8.8.4.4"
## open dns
:local Wan1CheckHostB "208.67.222.222"
:local Wan2CheckHostB "208.67.220.220"
# add routing tables
/routing/table add fib name=("to-".$Wan1Reference) comment=$TestingComment
/routing/table add fib name=("to-".$Wan2Reference) comment=$TestingComment
# ip firewall mangle rules for marking connections and routing
/ip/firewall/mangle add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=($Wan1Reference."-conn") out-interface=$Wan1InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-mark=($Wan1Reference."-conn") action=mark-routing new-routing-mark=("to-".$Wan1Reference) out-interface=$Wan1InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-state=new connection-mark=no-mark action=mark-connection new-connection-mark=($Wan2Reference."-conn") out-interface=$Wan2InterfaceName comment=$TestingComment
/ip/firewall/mangle add chain=output connection-mark=($Wan2Reference."-conn") action=mark-routing new-routing-mark=("to-".$Wan2Reference) out-interface=$Wan2InterfaceName comment=$TestingComment
# route config
/ip/route add dst-address=$Wan1CheckHostA scope=10 gateway=$Wan1Gateway comment=$TestingComment
/ip/route add dst-address=$Wan2CheckHostA scope=10 gateway=$Wan2Gateway comment=$TestingComment
/ip/route add dst-address=$Wan1CheckHostB scope=10 gateway=$Wan1Gateway comment=$TestingComment
/ip/route add dst-address=$Wan2CheckHostB scope=10 gateway=$Wan2Gateway comment=$TestingComment
/ip/route add distance=1 gateway=$Wan1CheckHostA target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan2CheckHostA target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=1 gateway=$Wan1CheckHostB target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan2CheckHostB target-scope=11 routing-table=("to-".$Wan1Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=1 gateway=$Wan2CheckHostA target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan1CheckHostA target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=1 gateway=$Wan2CheckHostB target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
/ip/route add distance=2 gateway=$Wan1CheckHostB target-scope=11 routing-table=("to-".$Wan2Reference) check-gateway=ping comment=$TestingComment
# some test rules that can be enabled / disabled to test failover
/ip/firewall/filter/add chain=output action=drop out-interface=ether1 place-before=1 protocol=icmp dst-address=208.67.222.222 disabled=yes comment=$TestingComment
/ip/firewall/filter/add chain=output action=drop out-interface=ether1 place-before=1 protocol=icmp dst-address=8.8.8.8 disabled=yes comment=$TestingComment
}
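Since every item in the scripts above is tagged with the same comment, the test config can be removed again in one go. A minimal sketch, assuming the comment string was left at its default and that nothing else on the router uses it:

```routeros
{
# same comment string as used when the config was added
:local TestingComment "testing-comment-remove-me"
# remove firewall rules first, then routes, then the routing tables they used
/ip firewall filter remove [find comment=$TestingComment]
/ip firewall mangle remove [find comment=$TestingComment]
/ip route remove [find comment=$TestingComment]
/routing table remove [find comment=$TestingComment]
}
```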
I have exactly the same problem. I am also not able to get the virtual IP stuff working.
My setup:
CRS326 with RouterOS 7.1.3
WAN1 on ether1 with IP DHCP client with IP 10.0.10.13 and Gateway 10.0.10.1
WAN2 on ether2 with IP DHCP client with IP 10.10.11.2 and Gateway 10.10.11.1
Bridge with ether3 to ether24 with DHCP Server
As I understand it, the default routes on the WAN interfaces must be deactivated. But then no connection is possible.
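For the addresses listed above, one possible arrangement is to stop the DHCP clients from installing their own default routes and to add recursive defaults to the main table instead. This is an untested sketch, not a confirmed fix: the choice of 8.8.8.8 and 8.8.4.4 as reference hosts is carried over from the earlier scripts, and the gateways are hard-coded, so they would need updating if the DHCP leases change.

```routeros
# Stop the DHCP clients from adding their own 0.0.0.0/0 routes.
/ip dhcp-client set [find interface=ether1] add-default-route=no
/ip dhcp-client set [find interface=ether2] add-default-route=no

# Pin each reference host to one WAN gateway (gateway IPs taken from the setup above).
/ip route add dst-address=8.8.8.8/32 scope=10 gateway=10.0.10.1
/ip route add dst-address=8.8.4.4/32 scope=10 gateway=10.10.11.1

# Recursive defaults in table main: WAN1 preferred, WAN2 as backup.
/ip route add dst-address=0.0.0.0/0 distance=1 gateway=8.8.8.8 target-scope=11 check-gateway=ping
/ip route add dst-address=0.0.0.0/0 distance=2 gateway=8.8.4.4 target-scope=11 check-gateway=ping
```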
Hi everyone, I am also having problems getting this config to work. In my opinion it lacks default routes in the main table (0.0.0.0/0), or am I missing something else?
I’ve tested both of B2ONX's configs/scripts, with no effect (nothing goes out from the router), but after adding:
(in the main routing table) I’m able to send some ICMP traffic from the router to a remote host (other than the Google/OpenDNS servers in the config script). However, when I simulate a WAN1 failure (ICMP firewall block on the WAN1 device), the route switches to WAN2 and seems to work, at least for a while, and then out of nowhere I get timeouts and packet loss (~20% on average) with “network unreachable” from the WAN1 device (while the WAN1 ICMP block is active and the ICMP packets should be travelling through WAN2).
Does someone have a working config for a simple “WAN Backup” on MikroTik RouterOS v7?
I appreciate ANY help with this, because I want to get rid of my old Linux box, where everything works like a charm: WAN backup with load balancing (not needed now) and a router-on-a-stick scenario (there’s no problem with VLANs, as they work on MikroTik too). Thanks in advance!