WAN failover

Hi everyone,

I’m trying to follow this piece of documentation about WAN failover.

High-level overview:

  • Masquerade is set up on the WAN interfaces
  • Mangle rules are set up on the output chain; they mark new connections with a connmark depending on the output interface, and apply a routing mark depending on the connmark
  • Routes are set up to be resolved recursively, to allow for pinging an arbitrary host (this I intend to skip - I will use a custom script for link quality measurement); there’s no default route in the main routing table

Two things I find fishy:

  • Except for IPsec packets, packets from the LAN interface won’t go through the output chain
  • How are connections assigned to the two different WANs? The mangle rules filter on the output interface - so how is that decided in the first place since there is no default route in the main routing table?

On the second point, the output interface is of course selected by the routing decision that happens before the output chain is processed (after which the packet is re-routed depending on the routing mark). Thus it seems to me that, even ignoring the packet flow issue (i.e. packets don’t go through the output chain at all), this setup wouldn’t work at all as it is - it needs some way to assign packets of new connections to the right output interface during the first routing phase.

My impression is that if I were to replicate this setup, packets would simply be dropped as no matching route would be found in the main routing table (default routes are only present in the to_ISPx tables).
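To make that concern concrete, here is a minimal sketch of a route lookup in Python. It is only a model, not RouterOS behaviour: the `lookup` helper, the table contents and all addresses (taken from documentation ranges) are assumptions for illustration.

```python
import ipaddress

def lookup(table, dst):
    """Longest-prefix match; returns the matching next hop, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, via) for net, via in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

net = ipaddress.ip_network

# Hypothetical main table: connected routes only, no default route.
main = [
    (net("192.168.88.0/24"), "LAN"),
    (net("192.0.2.0/24"), "WAN1"),
    (net("198.51.100.0/24"), "WAN2"),
]
# Hypothetical to_ISP1 table: holds the default route via the first ISP.
to_isp1 = [(net("0.0.0.0/0"), "192.0.2.1")]

main_result = lookup(main, "203.0.113.10")
isp1_result = lookup(to_isp1, "203.0.113.10")
print(main_result)   # None
print(isp1_result)   # 192.0.2.1
```

In this model, a packet to an internet host that carries no routing mark is looked up in the main table only and finds nothing.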

Am I missing something?

You are. Check this chapter of the documentation; it explains why rules in the output chain don’t handle LAN->WAN packets.
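As a rough illustration, the packet flow can be reduced to a small lookup table. This is a simplified model in Python, not RouterOS code: the `CHAINS` mapping and the three traffic kinds are names made up for this sketch, and it leaves out bridging, IPsec and other special cases.

```python
# Simplified model: which chains a packet passes through, in order.
CHAINS = {
    "forwarded":   ["prerouting", "forward", "postrouting"],  # LAN->WAN, WAN->LAN
    "to_router":   ["prerouting", "input"],                   # destined to the router
    "from_router": ["output", "postrouting"],                 # generated by the router
}

for kind, chains in CHAINS.items():
    print(f"{kind:12} {' -> '.join(chains)}")

# Forwarded LAN->WAN traffic never touches the output chain.
print("output" in CHAINS["forwarded"])  # False
```

So marks for LAN->WAN traffic have to be applied in prerouting, while rules in the output chain only see traffic that the router itself originates.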

Rarely have I seen anyone use the output chain.
Standard practice is to use the input chain (traffic destined to the router itself) and the forward chain (traffic through the router: WAN to LAN, LAN to WAN, LAN to LAN). The output chain only covers traffic that the router itself generates.

Failover is a very basic premise: you add two default routes, giving the primary ISP’s route a lower distance than the secondary’s.
On the primary route you set check-gateway=ping, which makes the router ping the gateway every 10 seconds or so. If two attempts in a row fail, the router switches to the secondary and keeps checking; as soon as the primary is back up, the router reverts to it.
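The behaviour described above can be sketched as a tiny state machine. This is a Python model rather than router configuration: the `Route` class, the threshold constant and the rule that a single successful ping restores the gateway are assumptions made for illustration.

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 2  # consecutive failed pings before the gateway counts as down

@dataclass
class Route:
    name: str
    distance: int
    failures: int = 0

    def record_ping(self, ok: bool) -> None:
        # Assumption: one successful ping resets the failure counter.
        self.failures = 0 if ok else self.failures + 1

    @property
    def active(self) -> bool:
        return self.failures < FAIL_THRESHOLD

def best_route(routes):
    """Pick the active route with the lowest distance, or None if none is active."""
    candidates = [r for r in routes if r.active]
    return min(candidates, key=lambda r: r.distance, default=None)

primary = Route("ISP1", distance=1)
secondary = Route("ISP2", distance=2)
routes = [primary, secondary]

history = []
for ok in [True, False, False, True]:  # ping results for the primary gateway
    primary.record_ping(ok)
    history.append(best_route(routes).name)

print(history)  # ['ISP1', 'ISP1', 'ISP2', 'ISP1']
```

A single lost ping does not trigger a switch; the second consecutive failure does, and the primary takes over again once it answers.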

Most people prefer recursive failover routing, which checks PAST the ISP gateway by pinging public IP addresses (such as public DNS servers), in other words THROUGH the ISP connection.
Why… because apparently it’s possible that your connection to the ISP is up and working while the ISP’s connection to the internet is down. The recursive method is more foolproof.
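The difference between the two checks can be shown with a toy model in Python. Both functions and their parameters are hypothetical and only capture the reasoning above, not how the router implements the checks.

```python
def gateway_check(link_to_isp_up: bool, isp_upstream_up: bool) -> bool:
    # Pinging the ISP gateway only exercises the local link to the ISP,
    # so the upstream state is deliberately ignored here.
    return link_to_isp_up

def recursive_check(link_to_isp_up: bool, isp_upstream_up: bool) -> bool:
    # Pinging a public host through the ISP needs both hops to work.
    return link_to_isp_up and isp_upstream_up

# Case from the post: the link to the ISP is fine, but the ISP has lost its upstream.
print(gateway_check(True, False))    # True  -> plain check keeps using the dead ISP
print(recursive_check(True, False))  # False -> recursive check fails over
```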