Trying to understand this config: "MikroTik Automatic ISP Fail-over without scripts or route marking"

I’m trying to set up dual-WAN failover on my MikroTik, which has dynamic IPs like most of us do. I came across a post on Reddit recommending doing it this way, without scripts or route marking. I’ve set it up and it works exactly as I expect, but I’m trying to understand how this setup works and what it does.

I see that I’m adding a route that sends all my traffic (0.0.0.0/0) to a Quad9 DNS server, 9.9.9.9 for example, which is a bit worrisome. But check gateway is set to ping, so is it just pinging 9.9.9.9? Or am I misunderstanding how this configuration works? And is this a good configuration for dual-WAN failover?

I can’t link to the author’s blog post because the website no longer exists, but the Wayback Machine had a copy of the post, so I’ve attached a PDF version here.
MikroTik Automatic ISP Fail-over without scripts o.pdf (151 KB)

What you are doing is “recursive failover”.
There is a nice guide here:
http://forum.mikrotik.com/t/advanced-routing-failover-without-scripting/136599/1

See if you get the idea from that, then, if you still have doubts, ask.

Guides are useless; it’s understanding that is required.

The typical failover has the structure
add dst-address=0.0.0.0/0 distance=2 gateway=ISP1 routing-table=main
add dst-address=0.0.0.0/0 distance=4 gateway=ISP2 routing-table=main

Due to the fact that ISP1 has a lower distance for routing in the main table, it will be picked for traffic over ISP2 every time.

The questions to ask are what happens when:
a. physical port goes down, let’s say WAN1 is on ether1
b. ISP1 gateway is not available
c. ISP1 gateway is available but ISP connection to the WWW is not working.

a. Now if there is a problem with the physical port, that ISP1 uses, then the router will detect that and make the route NOT available.
Result1: Router switches traffic to WAN2, assuming its available.
Result2: If the port comes back live, then the router will switch to the active route with lower distance.

b. Now if the port is fine but there is a problem between your router and the ISP gateway, the router will not detect this and will keep the route active.
Result: No traffic as the router will continue to send traffic to WAN1 and there will be no internet traffic available to users.

c. Now if you can reach the ISP gateway on WAN1 no problem, but the ISP has an issue at their end where their gateway is not able to connect to the WWW, the router will not detect this and will keep the route active and attempt to send traffic over WAN1, with no resultant success.

How do we deal with situations b. and c. ??

b. STANDARD FAILOVER ADDITION: check-gateway=ping
add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=ISP1 routing-table=main
add dst-address=0.0.0.0/0 distance=4 gateway=ISP2 routing-table=main

The router now checks that the path from your router to the ISP gateway is pingable, aka working, and if so all is good. If it is unable to ping successfully (it tries every 10 seconds, twice):
Result1: Router switches traffic to WAN2 if available
Result2: The ping check keeps occurring and when WAN1 is back online, the router switches traffic BACK to WAN1.

Note: In some cases you may have most users on WAN1 and some on WAN2, so sometimes we use check-gateway=ping for both WANs.
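
A minimal sketch of that note, with check-gateway=ping on both default routes. ISP1 and ISP2 are the same placeholders used above for each provider's gateway address, and the /ip route line is only the menu these add lines are assumed to be entered in.

```
# Both default routes are ping-checked, so either one is marked
# unavailable when its own gateway stops answering.
/ip route
add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=ISP1 routing-table=main
add dst-address=0.0.0.0/0 check-gateway=ping distance=4 gateway=ISP2 routing-table=main
```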

++++++++++++++++++++++++++++++++++++++++++++++++

The above works great until it doesn’t. Specifically, there are situations where YOUR connection to the ISP gateway is working great, but the ISP is failing to connect to the WWW, AKA a problem at their end.

c. In other words, check-gateway=ping against the ISP gateway is inadequate in some situations.
Recursive routing means we use the ISP gateway to reach a known, USUALLY available DNS site, like Google or Cloudflare, and ping that instead.

So we create two hops where previously there was one hop to the ISP gateway for all IP addresses.

First we create a furthest hop to the DNS site for all IP addresses (recursive)
add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=8.8.8.8 routing-table=main

Second we create the closer hop from the gateway to the DNS site, ensuring distance is the same. (This is the resolving, directly connected route.)
add dst-address=8.8.8.8/32 distance=2 gateway=ISP1 routing-table=main

There are specific TARGET SCOPE (TS) and SCOPE rules to follow for recursive routes starting in version 7; these also work in version 6.
1-Basically the farther hop should have a higher target scope than the closer hop. The farther one gets from the router, the TS increases by one.
2-The resolving directly connected route (closer hop) must have a SCOPE that is EQUAL TO or LESS than the TS of the next farther hop.
(Generally we select a scope for all routes that is less than the lowest TS on any of the routes.)

add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=8.8.8.8 routing-table=main scope=10 target-scope=12
add dst-address=8.8.8.8/32 distance=2 gateway=ISP1 routing-table=main scope=10 target-scope=11
add dst-address=0.0.0.0/0 distance=4 gateway=ISP2 routing-table=main
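
To see whether the recursive pair is doing what is described, something like the following could be run from the router terminal. This is only an inspection sketch; the exact fields and flags printed differ between RouterOS versions, so no output is shown here.

```
# List the default routes in full detail. The one currently in use
# should carry the active (A) flag, and the detail view shows how its
# gateway was resolved.
/ip route print detail where dst-address=0.0.0.0/0

# Confirm the check target itself answers pings. With the 8.8.8.8/32
# route in place, these pings travel via ISP1.
/ping 8.8.8.8 count=3
```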

There is no need to do recursive on WAN2: if WAN1 is already unavailable, it matters little whether WAN2 is detected as unavailable too, since there is nothing left to fail over to.
However, many folks do both, and it would look like this:

add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=8.8.8.8 routing-table=main scope=10 target-scope=12
add dst-address=8.8.8.8/32 distance=2 gateway=ISP1 routing-table=main scope=10 target-scope=11
add dst-address=0.0.0.0/0 check-gateway=ping distance=4 gateway=9.9.9.9 routing-table=main scope=10 target-scope=12
add dst-address=9.9.9.9/32 distance=4 gateway=ISP2 routing-table=main scope=10 target-scope=11

Note1: Use a different DNS provider for each WAN for independent checking.
Note2: Don’t use the same remote DNS servers for the router’s own DNS as you use for your recursive checkers.
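
As a sketch of Note2, assuming 8.8.8.8 and 9.9.9.9 are the check targets as in the routes above: the /32 routes pin each of those addresses to a single WAN, so if the router also used them for DNS, lookups would presumably fail whenever that WAN is down. The two resolver addresses below are only illustrative picks; any resolvers that are not also check targets would do.

```
# Hypothetical choice of resolvers; the only point is that they differ
# from the recursive check targets (8.8.8.8 and 9.9.9.9 above).
/ip dns set servers=1.0.0.1,8.8.4.4

# Show the resulting DNS settings.
/ip dns print
```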

Yes, you can use multiple DNS sites to check the same gateway.
add dst-address=0.0.0.0/0 check-gateway=ping distance=2 gateway=8.8.8.8 routing-table=main scope=10 target-scope=12
add dst-address=8.8.8.8/32 distance=2 gateway=ISP1 routing-table=main scope=10 target-scope=11
add dst-address=0.0.0.0/0 check-gateway=ping distance=4 gateway=1.1.1.1 routing-table=main scope=10 target-scope=12
add dst-address=1.1.1.1/32 distance=4 gateway=ISP1 routing-table=main scope=10 target-scope=11
add dst-address=0.0.0.0/0 distance=6 gateway=ISP2 routing-table=main

In this scenario, your router first checks WAN1 via Google and, if that is not available, checks Cloudflare on WAN1.
This rules out the extremely rare case where Google itself being unavailable is the cause of the failed check.

Thank you @anav for taking the time to put this explanation together. I understand it now.

@anav
Bravo.