Hi MikroTik community,
I am building a Dual-WAN setup with automatic failover using netwatch and I need a sanity check of my config.
Things I want to achieve:
- latest RouterOS 7.13 on a L009
- no recursive routing solution
- my head always spins when using recursive routing with multiple monitored hosts per ISP uplink
- I want to have more control in deciding when a monitored host is over an acceptable threshold (maybe one uplink is rural LTE and needs more relaxed ping/jitter/timeouts)
- the very much improved netwatch in RouterOS 7 gives me the ability to set different metrics per monitored host
- each ISP uplink has two netwatchers which monitor two hosts only over that ISP uplink (so I am monitoring four distinct hosts)
- a scheduled script runs every 5 seconds and collects all failed hosts per ISP from netwatch and disables the default route per ISP
- an uplink is considered failed when all netwatched hosts for that uplink fail
- an uplink is considered available when at least one netwatched host for that uplink is available
- I would prefer to solve this with routing tables and no mangling involved
- I have no need for load balancing, but I want to steer certain internal traffic out via a specific ISP uplink; if that uplink fails, the traffic should fall back to the other uplink (think traffic from a VoIP VLAN that goes out via a low-bandwidth but low-latency ISP uplink, while the rest uses an LTE backend)
- my ISP uplinks use DHCP, but I’ve got that covered with a dhcp-client script, which sets the default route (and the correct src-nat for that interface)
- I am using src-nat for each ISP uplink instead of masquerade (because masquerade causes different problems) – the correct src-nat settings get set by my dhcp-client script – and NAT connections table gets flushed in the scheduled script
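To illustrate the per-host thresholds mentioned above, here is a minimal sketch of relaxing the ICMP probe for the netwatch entries of one uplink (say, a rural LTE link). All values are placeholders, not measured recommendations, and the exact parameter names and value formats should be checked with tab-completion on the installed RouterOS version:

```
# Sketch only: relax the ICMP probes for every netwatch entry tagged ISP2.
# interval, timeout, packet-count, packet-interval and thr-loss-percent are
# illustrative values; tune them to what the LTE link actually delivers.
/tool netwatch
set [ find where comment="ISP2" ] interval=10s timeout=3s packet-count=10 packet-interval=200ms thr-loss-percent=50
```

The entries for the other uplink keep their defaults, so each ISP can have its own definition of "down".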
While this solution does involve some custom scripting, I am more interested in getting the overall routing right. So far, in my (limited) testing, most parts work. Netwatch works reliably, and the scheduled script disables/enables the uplink depending on which monitored host I block farther upstream. When the default routes change, the NAT connections are flushed reliably.
Questions:
- Is this layout sane, or are there hidden pitfalls to watch out for?
- I am struggling with adding a routing rule for my Voice-VLAN/address-range, which gets pinned to e.g. ISP2 unless that one fails, then it probably should use main table?
Any insights are greatly appreciated.
RouterOS default setup:
/interface list member
add interface=bridge list=LAN
add interface=ether1 list=WAN
add interface=ether2 list=WAN
/ip address
add address=192.168.88.1/24 comment=defconf interface=bridge network=192.168.88.0
NAT and DHCP-Client:
/ip firewall nat
# 10.33.x.y is part of my lab setup, correct to-addresses are set by dhcp-client, we prefer src-nat instead of masquerade
add action=src-nat chain=srcnat out-interface=ether1 to-addresses=10.33.20.63
add action=src-nat chain=srcnat out-interface=ether2 to-addresses=10.33.30.55
/ip dhcp-client
add add-default-route=no comment="UPLINK Telekom" interface=ether1 script=":if (\$bound=1) do={\r\
\n\t/ip firewall nat set [ find where out-interface=\$\"interface\" ] to-addresses=\$\"lease-address\"\r\
\n\t/ip route set [ find where comment=\"ISP1\" and dst-address=\"0.0.0.0/0\" and routing-table=\"main\" ] gateway=\$\"gateway-address\"\r\
\n}" use-peer-ntp=no
add add-default-route=no comment="UPLINK Vodafone" interface=ether2 script=":if (\$bound=1) do={\r\
\n\t/ip firewall nat set [ find where out-interface=\$\"interface\" ] to-addresses=\$\"lease-address\"\r\
\n\t/ip route set [ find where comment=\"ISP2\" and dst-address=\"0.0.0.0/0\" and routing-table=\"main\" ] gateway=\$\"gateway-address\"\r\
\n}" use-peer-ntp=no
Routing: ← struggling
/routing table
add disabled=no fib name=to_ISP1
add disabled=no fib name=to_ISP2
/routing rule
add action=lookup-only-in-table disabled=no dst-address=9.9.9.9/32 table=to_ISP1
add action=lookup-only-in-table disabled=no dst-address=208.67.222.222/32 table=to_ISP1
add action=lookup-only-in-table disabled=no dst-address=149.112.112.112/32 table=to_ISP2
add action=lookup-only-in-table disabled=no dst-address=208.67.220.220/32 table=to_ISP2
/ip route
add comment=ISP1 disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.33.20.1 pref-src="" routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP2 disabled=no distance=2 dst-address=0.0.0.0/0 gateway=10.33.30.1 pref-src="" routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP1 disabled=no distance=1 dst-address=9.9.9.9/32 gateway=10.33.20.1 pref-src="" routing-table=to_ISP1 scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP2 disabled=no distance=1 dst-address=149.112.112.112/32 gateway=10.33.30.1 pref-src="" routing-table=to_ISP2 scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP1 disabled=no distance=1 dst-address=208.67.222.222/32 gateway=10.33.20.1 pref-src="" routing-table=to_ISP1 scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP2 disabled=no distance=1 dst-address=208.67.220.220/32 gateway=10.33.30.1 pref-src="" routing-table=to_ISP2 scope=30 suppress-hw-offload=no target-scope=10
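For the Voice-VLAN question, one possible shape is sketched below. It is untested and rests on assumptions: 192.168.50.0/24 is a placeholder for the voice range, and 192.168.0.0/16 is a placeholder for "all local networks". The idea is to give to_ISP2 its own default route and select it with action=lookup, which falls back to the main table when to_ISP2 has no active matching route:

```
# Sketch only: pin a (hypothetical) Voice-VLAN range to ISP2 with fallback to main.
/ip route
# default route inside to_ISP2; the comment matches the watcher script's filter
add comment=ISP2 dst-address=0.0.0.0/0 gateway=10.33.30.1 routing-table=to_ISP2
/routing rule
# keep traffic towards local networks in main, so it does not follow the to_ISP2 default route
add src-address=192.168.50.0/24 dst-address=192.168.0.0/16 action=lookup table=main
# everything else from the voice range: try to_ISP2 first, fall through to main otherwise
add src-address=192.168.50.0/24 action=lookup table=to_ISP2
```

Two things to watch with this sketch: the watcher script's route filter has no routing-table condition, so it would disable/enable this extra default route together with the one in main (which is what makes the fallback work), while the dhcp-client script filters on routing-table="main" and would therefore not update the gateway of the route in to_ISP2 unless it is extended.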
Netwatch + Scheduler:
/system scheduler
add interval=5s name=schedule1 on-event=netwatch-down policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=2023-12-20 start-time=22:43:33
/tool netwatch
add comment=ISP1 disabled=no host=208.67.222.222 http-codes="" interval=3s start-delay=15s startup-delay=15s test-script="" type=icmp
add comment=ISP1 disabled=no host=9.9.9.9 http-codes="" interval=3s start-delay=15s startup-delay=15s test-script="" type=icmp
add comment=ISP2 disabled=no host=208.67.220.220 http-codes="" interval=3s start-delay=15s startup-delay=15s test-script="" type=icmp
add comment=ISP2 disabled=no host=149.112.112.112 http-codes="" interval=3s start-delay=15s startup-delay=15s test-script="" type=icmp
Watcher-Script:
# established connections get only flushed when the routing table actually changed
:local watcherFlushConnections false
:local watcherUplinks {"ISP1";"ISP2"}
:foreach isp in=$watcherUplinks do={
    :local watcherFailed [ :len [ /tool netwatch find where comment=$"isp" and status="down" ] ]
    :local watcherAvailable [ :len [ /tool netwatch find where comment=$"isp" ] ]
    :log debug ( $"isp" . ": " . $watcherFailed . "/" . $watcherAvailable )
    # disable uplinks routes only if all netwatchers for that uplink are unreachable
    :if ($watcherFailed >= $watcherAvailable) do={
        :foreach idx in=[ /ip route find where comment=$"isp" and dst-address="0.0.0.0/0" ] do={
            :local watcherStatus ([ /ip route get $idx disabled ])
            :if ($watcherStatus=false) do={
                :log warning ("disabling route for " . $isp)
                /ip route disable $idx
                :set watcherFlushConnections true
            }
        }
    }
    # enable uplinks routes if any netwatchers for that uplink are reachable
    :if ($watcherFailed < $watcherAvailable) do={
        :foreach idx in=[ /ip route find where comment=$"isp" and dst-address="0.0.0.0/0" ] do={
            :local watcherStatus ([ /ip route get $idx disabled ])
            :if ($watcherStatus=true) do={
                :log warning ("enabling route for " . $isp)
                /ip route enable $idx
                :set watcherFlushConnections true
            }
        }
    }
}
# if default routes have been changed, we need to kill current NAT connections
:if ($watcherFlushConnections=true) do={
    :log warning "flushing NAT connections"
    /ip firewall connection remove [ find ]
}
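A possible refinement of the flush step, as an untested sketch: instead of removing every tracked connection (which also drops LAN-to-router sessions), remove only the entries that were NATed to the address of the uplink that changed. The address below is the lab src-nat address of ISP1 from the NAT rules; in a real script it would have to be looked up per uplink, which is not shown here:

```
# Sketch only: flush just the connections whose replies are addressed to the
# src-nat address of one uplink (10.33.20.63 = lab address of ISP1).
# "~" is a regex match, so each "." matches any character; good enough for a prefix test.
/ip firewall connection remove [ find where reply-dst-address~"^10.33.20.63" ]
```

Whether this is worth the extra complexity depends on how disruptive the full flush is in practice.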