Sanity checking a plan for failover with two DHCP-client WANs

Hi all, I am bad at layer 3 stuff, so apologies if I’m missing some obvious things.

I’ve read a whole bunch of prior art on these forums, reddit, and elsewhere on doing failover on MikroTik routers. The default docs with recursive routing work reasonably well if I have a static gateway from my ISP, and that setup does work in short-term testing on my side. Unfortunately, both of my ISPs (fiber and cable) will periodically renew my lease onto a wholly different gateway, which means I can’t follow that doc verbatim. So I have an almost-as-simple solution wherein I set my DHCP client to not add default routes and instead add them via a script. Here’s the first proof of concept, which is dirt simple and just replicates the DHCP client’s default routes, with no recursive routing or anything:

{
:log info "Executing DHCP client script for Cable"
:local count [/ip route print count-only where comment="cable"]

:if ($bound=1) do={
  :log info "DHCP bound via cable WAN"
  :if ($count = 0) do={
    :log info "No existing ip route for cable WAN, adding routes"
    /ip route add dst-address="0.0.0.0/0" gateway=$"gateway-address" comment="cable" distance=5
  } else={
    :local existing [/ip route find where comment="cable"]
    :log info "Existing cable WAN route is $existing but needs new gateway-address"
    /ip route set $existing gateway=$"gateway-address"
  }
} else={
  :log info "Not bound, removing cable WAN routes"
  /ip route remove [find comment="cable"]
  
}
}

Assume that there’s a similar block for the fiber WAN at a different distance. That mimics the DHCP client default behavior (approximately, I’m sure I’m missing some details). If I want to have it do recursive routing following the docs, it’s almost the same, just with a couple extra bits? This is where I get a bit stuck:

{
:log info "Executing DHCP client script for cable"
:local count [/ip route print count-only where comment="cable"]

:if ($bound=1) do={
  :log info "DHCP bound via cable WAN"
  :if ($count = 0) do={
    :log info "No existing ip route for cable WAN, adding routes"
    /ip route add dst-address="0.0.0.0/0" gateway=$"gateway-address" comment="cable" distance=5
    /ip route add dst-address="8.8.8.8" gateway=$"gateway-address" comment="cable-google-dns" scope=10
    /ip route add distance=5 gateway="8.8.8.8" comment="cable-gateway-ping" target-scope=11 check-gateway=ping
  } else={
    :local existing [/ip route find where comment="cable"]
    :local existingDNS [/ip route find where comment="cable-google-dns"]
    :log info ("existing route is $existing but needs new gateway " . $"gateway-address")
    # TODO(BadAtLayerThree): Clean up with foreach?  Especially if we do more than one gateway ping.
    /ip route set $existing gateway=$"gateway-address"
    /ip route set $existingDNS gateway=$"gateway-address"
  }
} else={
  :log info "Not bound, removing cable WAN routes"
  /ip route remove [find comment="cable"]
  /ip route remove [find comment="cable-google-dns"]
  /ip route remove [find comment="cable-gateway-ping"]
}

}
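
On the TODO above: a rough, untested sketch of how the update and remove branches could be collapsed with foreach. It assumes it runs inside the DHCP client script (so $bound and $"gateway-address" are set) and that the routes carry the same comments as in the script above. The local name gw is just something I picked for illustration.

```routeros
{
:local gw $"gateway-address"

:if ($bound=1) do={
  # Only the routes that point at the ISP gateway need updating;
  # the check-gateway route points at 8.8.8.8 and stays as-is.
  :foreach c in={"cable"; "cable-google-dns"} do={
    /ip route set [find where comment=$c] gateway=$gw
  }
} else={
  # On lease loss, drop every route this script owns.
  :foreach c in={"cable"; "cable-google-dns"; "cable-gateway-ping"} do={
    /ip route remove [find where comment=$c]
  }
}
}
```

Adding a second ping target would then only mean adding another comment to each list.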

Again with a similar block for the fiber ISP. The questions I have are then:

  • Is that a reasonable approach to recursive routing with no static gateway on my WAN’s DHCP?
  • Does that recursive routing setup do conntrack clearing (/ip firewall connection remove [find] or equivalent), or is it not even necessary? Or would I need to reach for a netwatch script to accomplish more seamless failover when the primary ISP fails?

  • If I do need to reach for a netwatch script, what’s the best approach to swap ISP? I’ve seen people suggest switching distance high/low based on what’s alive, but if I’ve got DHCP scripts operating separately, then I’ve got race conditions and would need some reconciliation script, and it seems a little brittle.
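
To make that last question concrete, the kind of netwatch setup I have seen suggested looks roughly like the following. This is a sketch I have not tested: the host, interval, the distance values, and the comment "cable-watch" are placeholders I picked for illustration, and it assumes the probe host is pinned to the cable WAN with a host route (like the cable-google-dns route above) so the check actually exercises that link.

```routeros
# Demote the cable default route when the probe host stops answering,
# restore it when the host comes back.
/tool netwatch add host=8.8.8.8 interval=10s comment="cable-watch" \
    down-script="/ip route set [find comment=\"cable\"] distance=20; /ip firewall connection remove [find]" \
    up-script="/ip route set [find comment=\"cable\"] distance=5"
```

The DHCP script's add branch always uses distance=5, so if the lease is lost and re-bound while netwatch considers the link down, the two disagree. That is the reconciliation problem I am worried about.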

Thanks for any guidance, and even if not, I figured it might be good visibility to post this proposed solution since most of the guides assume that the gateway of the ISP is static. Wish I could get a static gateway (or better yet, static IP), but alas, my ISPs are not willing.

For completeness, the only other modifications I needed to make from the default router’s config were:

  • pick some specific port for the second WAN (in my case sfp1)
  • drop that from the bridge, add it as a WAN
  • add those dhcp scripts (saving in flash/some-file-name.rsc)
  • change the default DHCP client for WAN1 to not add default routes, and run a script that’s basically just /import flash/fiber-dhcp-client.rsc
  • add another DHCP client for WAN2, same as above, with its own dhcp client script
  • Do the masquerade step in this doc
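
In command form, the steps above are roughly the following. Treat it as a sketch: it assumes WAN1 (fiber) is the default config's DHCP client on ether1, that the default config's WAN interface list exists, and that the cable script is saved as flash/cable-dhcp-client.rsc (a file name I made up for this example). The masquerade step is left out since it comes straight from the doc.

```routeros
# WAN2 on sfp1: take it out of the LAN bridge and treat it as a WAN port
/interface bridge port remove [find interface=sfp1]
/interface list member add list=WAN interface=sfp1

# WAN1 (assumed to be ether1): stop adding the default route, hand off to the script
/ip dhcp-client set [find interface=ether1] add-default-route=no \
    script="/import flash/fiber-dhcp-client.rsc"

# WAN2: new DHCP client with its own script
/ip dhcp-client add interface=sfp1 add-default-route=no disabled=no \
    script="/import flash/cable-dhcp-client.rsc"
```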