My goal is that if the main wan link fails, both routers automatically switch to the lte backup line on the 2nd router.
With a standard netwatch config I accomplished one part of the solution on the left router: it routes internet traffic to 172.16.1.253.
I get stuck on the right side of the configuration where I did this:
The two routes, check-gateway=ping comment="Netwatch ISP1" distance=1 dst-address=1.0.0.1/32 gateway=1.1.1.2
check-gateway=ping distance=1 dst-address=1.1.1.2/32 gateway=172.16.1.254
suggest that you want to use recursive routing (indicating that the remote gateway 1.1.1.2 is reachable via 172.16.1.254), but you haven’t adjusted the scope and target-scope values on either of the two routes accordingly. For the first route (to 1.0.0.1) to be able to use the second route (to 1.1.1.2), the target-scope of the first one must be equal to or higher than (in RouterOS 6) or strictly higher than (in ROS 7) the scope of the second one.
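As a rough sketch of what that adjustment could look like: the values 10 and 11 below are my own choice, not taken from your config, picked so that the condition holds on both ROS 6 and ROS 7.

```
/ip route
# Route to the remote gateway; the lowered scope makes it eligible for recursive next-hop lookup
add check-gateway=ping distance=1 dst-address=1.1.1.2/32 gateway=172.16.1.254 scope=10
# Canary route; target-scope=11 is higher than the scope=10 of the route above
add check-gateway=ping comment="Netwatch ISP1" distance=1 dst-address=1.0.0.1/32 gateway=1.1.1.2 target-scope=11
```

If the lookup works, the first route should show up as recursive via 172.16.1.254 in the route list.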
Just bear in mind that recursive routing is only used locally, so when the test ping to 1.0.0.1 arrives at the left router, it doesn’t carry any information that 1.1.1.2 must be used to forward it. So if 1.1.1.2 is down, the left router will send that test packet back to the right router, and the packet will keep circulating between the two routers until its TTL expires.
So you can actually make it simpler and just use a single route, dst-address=1.0.0.1/32 gateway=172.16.1.254, to replace both of them.
But most importantly, with or without this change: do you use the same gateway for the same canary IP on both routers? I.e. both must ping 1.0.0.1 via 1.1.1.2 and both must ping 8.8.4.4 via 2.2.2.2.
You are right, I don’t need the recursive routing.
I also use the same DNS servers on both routers to identify the connection, e.g. 8.8.4.4 always goes through the LTE connected to the right router.
I will test it at night. Is there a better way to accomplish such a failover?
edit: Interesting use of netwatch to check ping versus the standard methods. I don't see any advantages personally.
Especially if you're checking both ISPs: only the primary needs to be checked, and used again when it is back up.
check-gateway=ping on its own is only useful if the modem itself becomes unreachable (power outage or bricked), because the gateway IP is the modem.
More often it is the gpon/dsl/lte connection that breaks, and that’s why I use netwatch quite often. It’s easier and cleaner if the ISPs are on the same router though.
If you use check-gateway=ping with recursive routing, where you ping the virtual gateway (1.0.0.1 or 8.8.4.4 in your case) rather than the physical one, you check the actual transparency of the network path through your modem and ISP. The advantage of using recursive routing for this is that you don’t need to modify the configuration upon each state change, but I admit that it may be just a personal preference.
With netwatch, you have finer control over the reaction time (check-gateway pings every 10 seconds and that cannot be changed). But if you want to use more than a single canary IP per uplink, more complicated scripting is necessary (each netwatch has to publish its own state to a global variable and check the other one’s state from another global variable to avoid false positives), so a scheduled script may be a better option anyway, and in that case, check-gateway with recursive routing makes things simpler.
If you ask why two canaries, I have seen even Google DNS down for a few minutes last year in quite a large area (multiple countries in Europe).
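To illustrate the scheduled-script variant with two canaries, here is a minimal sketch. It assumes a script named check-isp1 (hypothetical name), a second canary 9.9.9.9 (my choice, not from this thread), /32 routes forcing both canaries through the ISP1 gateway, and a default route carrying comment=ISP1. The uplink is treated as down only when neither canary answers.

```
/system script add name=check-isp1 source={
    # Each /ping call returns the number of replies received
    :local replies1 [/ping 1.0.0.1 count=3]
    :local replies2 [/ping 9.9.9.9 count=3]
    # ISP1 counts as up if at least one canary answered
    :if (($replies1 + $replies2) > 0) do={
        /ip route set [find comment="ISP1"] disabled=no
    } else={
        /ip route set [find comment="ISP1"] disabled=yes
    }
}
# Run the check every 10 seconds; the interval is an arbitrary example value
/system scheduler add name=check-isp1 interval=10s on-event=check-isp1
```

Because both canaries are evaluated in one script, no global variables are needed to share state between separate netwatch entries.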
In this case, set routes without using recursive routing.
You additionally need to deny access to the test IP through any interface other than the desired one in the firewall.
For example, route add check-gateway=ping comment="Netwatch LTE" distance=1 dst-address=8.8.4.4/32 gateway=2.2.2.2 will not work as a check if the LTE interface 2.2.2.1 is down. In that case the route becomes inactive and netwatch's pings to 8.8.4.4 go out via the default route instead.
On the MainISP router, disable access to 1.0.0.1 on any other interface besides the WAN 1.1.1.1
/ip firewall filter
add action=drop chain=forward comment="Check ISP1 only WAN" dst-address=1.0.0.1 out-interface=!WAN
On the LTE backup router, add a route
/ip route
add check-gateway=ping comment="Netwatch ISP1" distance=1 dst-address=1.0.0.1/32 gateway=172.16.1.254
and a similar rule
/ip firewall filter
add action=drop chain=forward comment="Check ISP1 only LAN2" dst-address=1.0.0.1 out-interface=!LAN2
There is no point in checking if the Internet is available via LTE.
PS Do not use 1.1.1.1, 2.2.2.2 etc. in examples.
According to RFC 5735, the ranges reserved for use as example IP addresses in documentation are 192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24.
For that, routes to /32 destinations are usually enough, maybe with blackhole ones with higher distance to the same /32 destination where necessary. But you can do that using firewall filter instead of course.
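A minimal sketch of that blackhole variant, reusing the addresses from this thread; the distance values and the second comment are my own choice:

```
/ip route
# Normal canary route via the ISP1 gateway
add dst-address=1.0.0.1/32 gateway=1.1.1.2 distance=1 comment="Netwatch ISP1"
# Fallback with higher distance: swallows the test pings when the route above goes inactive
# RouterOS 6 syntax; on RouterOS 7 write "blackhole" instead of "type=blackhole"
add dst-address=1.0.0.1/32 type=blackhole distance=2 comment="Netwatch ISP1 blackhole"
```

When the gateway becomes unreachable, the blackhole route takes over, so the pings are dropped instead of leaking out through the default route.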
You are right of course, I just got used to netwatching both (or every) ISP. Sometimes I need the ability to simply swap the distances of the two routes, and netwatch still works that way without editing it.
No need to configure firewall rules with /32 routes, but you can if you want. I don’t use 1111 or 2222, they’re just placeholders for my real IPs, which I don’t post here.
Here is an example: the channel is configured through wlan1, and at the moment the wlan1 interface is not connected to the access point. If access to 1.0.0.1 through other interfaces is not blocked then, as you can see, the route to 1.0.0.1/32 through 192.168.88.1 becomes inactive, yet 1.0.0.1 still answers pings. So the netwatch check does not work: its status stays up.
There are several ways to block access to the IP through other interfaces.
In my netwatch scripts, in addition to switching the route state, I also reset the existing connections and write a message to the log.
up-script
/ip route set [find comment=ISP1] disabled=no
/ip firewall connection {remove [find]}
:log warning ISP1-UP
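The matching down-script is not shown above; a sketch of what it could look like, simply mirroring the up-script (the log text ISP1-DOWN is an assumption):

```
# down-script: disable the ISP1 route, flush connection tracking, log the event
/ip route set [find comment=ISP1] disabled=yes
/ip firewall connection {remove [find]}
:log warning ISP1-DOWN
```

Note that removing all tracked connections like this also drops sessions that were not affected by the failover.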
PS.
Do not use 1.1.1.1 etc. as a placeholder for your real IP address; use an address from the dedicated ranges 192.0.2.0/24, 198.51.100.0/24 or 203.0.113.0/24.
If the main line from ISP1 gets disconnected then both routers deactivate the main route and switch to the lte backup which is great.
If the ISP1 line comes back online the main router switches back to it, good.
The LTE backup router on the other hand stays with main route deactivated. Netwatch status says 1.0.0.1 is down but in terminal it’s pingable and the traceroute tells me that icmp packets to that host are going through the main line which is right. I don’t understand why netwatch stays down …
What do you mean? We have a standard business LTE with a fixed IP. We get a /30 net where one IP is for our router and the other is the gateway, which is on site. So it’s possible that you can ping the gateway's static IP, but this doesn’t mean that you have an internet connection.
It’s the same with business dsl routers in routing mode, where we get a fixed /30 network.