Advanced Routing Failover without Scripting

Correct. In forwarding, you have ‘prerouting’ step before routing decision, and in Output, the process starts from routing decision, then mangle - so there’s additional “Routing adjustment” step: https://help.mikrotik.com/docs/display/ROS/Packet+Flow+in+RouterOS#PacketFlowinRouterOS-Output

So, to get to routing adjustment, you need a successful initial routing decision (for not yet marked packets)
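
A minimal sketch of what this means for router-originated traffic. The table name ISP2, the gateways 192.0.2.1 and 198.51.100.1, and the test destination 8.8.8.8 are placeholders assumed for illustration, not values from this thread. The main table must hold some route matching the destination; otherwise the packet fails the initial routing decision and never reaches the output mangle rule that would re-route it.

```
/routing table
add name=ISP2 fib

/ip route
# Main table: lets the initial routing decision succeed for not-yet-marked packets.
add dst-address=0.0.0.0/0 gateway=192.0.2.1
# ISP2 table: consulted only after the mangle rule below has set the routing mark.
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=ISP2

/ip firewall mangle
# The output chain runs after the first routing decision; the mark triggers the routing adjustment.
add chain=output dst-address=8.8.8.8 action=mark-routing new-routing-mark=ISP2 passthrough=no
```

If the first default route is removed, a ping to 8.8.8.8 from the router would be expected to fail with "no route to host" even though the ISP2 table has a usable route.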

Unfortunately I observe a problem with Hairpin NAT that is related to the recently added config for WAN Failover.

For a long time I have had two NAT rules very similar to those described in https://help.mikrotik.com/docs/display/ROS/NAT; the second one is required to reach my reverse-proxy web server from the LAN as well.

/ip firewall nat add chain=dstnat action=dst-nat dst-address=172.16.16.1 dst-port=443 to-addresses=10.0.0.3 to-ports=443 protocol=tcp
/ip firewall nat
add action=masquerade chain=srcnat dst-address=10.0.0.3 out-interface=LAN protocol=tcp src-address=10.0.0.0/24

Now the web server no longer works from the LAN, and the issue is related to the recent changes. After further investigation I noticed that the dstnat rule is applied but the subsequent srcnat rule is not.

I also noticed that, after dstnat rule application, in filter forward chain, packets appear to have the private ip address of web server as destination but the WAN as out-interface and not the LAN anymore!

The issue can be related to a “too wide” application of routing mark in mangle prerouting.

 1 X  ;;; Routing Mark for ISP1 (load balancing default)
      chain=prerouting action=mark-routing new-routing-mark=ISP1 passthrough=yes src-address-list=rfc6890_not_global_ipv4 
      dst-address-list=!rfc6890_not_global_ipv4 log=no log-prefix="prerouting-mark-ISP1" 

 2 X  ;;; Routing Mark for ISP2 (load balancing exceptions, by interface list)
      chain=prerouting action=mark-routing new-routing-mark=ISP2 passthrough=yes src-address-list=rfc6890_not_global_ipv4 
      dst-address-list=!rfc6890_not_global_ipv4 in-interface-list=ROUTED ISP2 log=no log-prefix="prerouting-mark-ISP2"

Is it because, in prerouting, the packets still carry the public IP of the web server as destination, so they get marked for ISP1?

Is there a good way to prevent this?

Mangle-prerouting is before DST-NAT: https://help.mikrotik.com/docs/display/ROS/Packet+Flow+in+RouterOS#PacketFlowinRouterOS-Forward

So yes, prerouting sees original Dst. IP Address.

The best way I can think of in 10 seconds is “dst-address-type=!local”, so packets to the router’s addresses are not marked

Trying this instead, seems ok

 1    ;;; Routing Mark for ISP1 Preferred (load balancing default)
      chain=prerouting action=mark-routing new-routing-mark=ISP1 Preferred passthrough=yes dst-address-type=!local 
      src-address-list=rfc6890_not_global_ipv4 dst-address-list=!rfc6890_not_global_ipv4 log=no log-prefix="prerouting-mark-ISP1" 

 2    ;;; Routing Mark for ISP2 Preferred (load balancing exceptions, by interface list)
      chain=prerouting action=mark-routing new-routing-mark=ISP2 Preferred passthrough=yes dst-address-type=!local 
      src-address-list=rfc6890_not_global_ipv4 dst-address-list=!rfc6890_not_global_ipv4 in-interface-list=ROUTED ISP2 log=no 
      log-prefix="prerouting-mark-ISP2"

I came to the same conclusion in the meanwhile :smiley:

Hello Chupaka,

Thanks a lot for your documentation.
I’m a networking newbie, and I’m trying to set up failover on a hAP ax lite LTE6, with WAN1 on ether1 and WAN2 on LTE1.
Could you please explain why you use 10.X.X.X (private addresses) as virtual hops, and as the GW for clients?

Thank you for your help.

Just because they are not used anywhere else.

Router IP address is a GW for clients.

Private Address “hopping” is required when one wants to do nested recursive routing. It is not essential but can be more efficient in certain cases.
There is nothing wrong with sticking to standard “Flat” recursive routing
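
A rough sketch of the nested variant for one ISP, to show where the private "virtual hop" sits. All addresses are placeholders assumed for illustration (192.0.2.1 as the ISP1 gateway, 9.9.9.9 and 8.8.8.8 as probe hosts, 10.1.1.1 as the virtual hop). The scope values assume that a gateway is resolved only through routes whose scope does not exceed the target-scope of the route being resolved; please verify this on your RouterOS version.

```
/ip route
# Level 1: pin each probe host to the ISP1 gateway.
add dst-address=9.9.9.9/32 gateway=192.0.2.1 scope=10
add dst-address=8.8.8.8/32 gateway=192.0.2.1 scope=10
# Level 2: the virtual hop stays reachable while at least one probe host answers ping.
add dst-address=10.1.1.1/32 gateway=9.9.9.9 distance=1 scope=11 target-scope=11 check-gateway=ping
add dst-address=10.1.1.1/32 gateway=8.8.8.8 distance=2 scope=11 target-scope=11 check-gateway=ping
# Level 3: the default route follows the virtual hop, so one dead probe host does not fail the ISP.
add dst-address=0.0.0.0/0 gateway=10.1.1.1 distance=1 target-scope=12
```

A second ISP would mirror this with its own gateway, its own probe hosts, and a different virtual hop (for example 10.2.2.2) at a higher distance. The virtual hop address is never contacted itself, which is why any otherwise unused private address works.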

@unlikely, two points regarding external traffic reaching the Router.

  1. The first case is external traffic to the router itself, such as a VPN handshake. There are two options to handle that.

a. If there is no mangling on the device and you want to avoid it, simply use routing rules (predicated on a static, fixed WAN IP).

/routing table
add fib name=preferWAN1
add fib name=preferWAN2
/routing rule
# ensure traffic sourced from WANIP1 replies via ISP1
add src-address=WANIP1 action=lookup-only-in-table table=preferWAN1
# ensure traffic sourced from WANIP2 replies via ISP2
add src-address=WANIP2 action=lookup-only-in-table table=preferWAN2
/ip route
add dst-address=0.0.0.0/0 gateway=ISP1-gw-IP routing-table=preferWAN1
add dst-address=0.0.0.0/0 gateway=ISP2-gw-IP routing-table=preferWAN2

b. For a dynamic WAN IP, or if mangling is already involved: mangling overrides routing rules, so where the two could overlap it is sometimes safer to go all-mangle. We use the input/output chains to ensure that, regardless of the initial route selection made from the IP routes, return traffic from the router itself is re-routed on the way out according to the routing marks.

/routing table
add fib name=viaISP1
add fib name=viaISP2

/ip firewall mangle
add action=mark-connection chain=input connection-mark=no-mark in-interface=ether1 new-connection-mark=WAN1 passthrough=yes
add action=mark-connection chain=input connection-mark=no-mark in-interface=ether2 new-connection-mark=WAN2 passthrough=yes

add action=mark-routing chain=output connection-mark=WAN1 new-routing-mark=viaISP1 passthrough=no
add action=mark-routing chain=output connection-mark=WAN2 new-routing-mark=viaISP2 passthrough=no

/ip route
# standard routes in the main table
add distance=1 dst-address=0.0.0.0/0 gateway=WAN1-gw-IP routing-table=main
add distance=2 dst-address=0.0.0.0/0 gateway=WAN2-gw-IP routing-table=main
# per-ISP tables selected by the routing marks
add dst-address=0.0.0.0/0 gateway=WAN1-gw-IP routing-table=viaISP1
add dst-address=0.0.0.0/0 gateway=WAN2-gw-IP routing-table=viaISP2

  2. To ensure that external traffic, after reaching internal servers, goes back out the same WAN, we use mangling as above, but in this case with the forward/prerouting chains.

/routing table
add fib name=viaISP1
add fib name=viaISP2

/ip firewall mangle
add action=mark-connection chain=forward connection-mark=no-mark in-interface=ether1 dst-address-list=server-List-A new-connection-mark=WAN1 passthrough=yes
add action=mark-connection chain=forward connection-mark=no-mark in-interface=ether2 dst-address-list=server-List-B new-connection-mark=WAN2 passthrough=yes

# in-interface-list=LAN is an assumption added here: it limits routing marks to replies coming
# from the servers, so inbound WAN packets of the same connection are still routed to the server
add action=mark-routing chain=prerouting connection-mark=WAN1 in-interface-list=LAN new-routing-mark=viaISP1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=WAN2 in-interface-list=LAN new-routing-mark=viaISP2 passthrough=no

/ip route
# standard routes in the main table
add distance=1 dst-address=0.0.0.0/0 gateway=WAN1-gw-IP routing-table=main
add distance=2 dst-address=0.0.0.0/0 gateway=WAN2-gw-IP routing-table=main
# per-ISP tables selected by the routing marks
add dst-address=0.0.0.0/0 gateway=WAN1-gw-IP routing-table=viaISP1
add dst-address=0.0.0.0/0 gateway=WAN2-gw-IP routing-table=viaISP2

Note: Ensuring proper sourcenat rules for the above traffic is important.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Regarding hairpin NAT (needed when users in the same subnet as the server must reach it by its domain name/URL):

  1. Forward firewall rule
    add chain=forward action=accept connection-nat-state=dstnat

  2. Hairpin source-NAT rule (this belongs in the srcnat chain of /ip firewall nat)
    add chain=srcnat action=masquerade src-address=ServerSubnet dst-address=ServerSubnet

  3. Properly formatted dstnat rules
    (static WAN IP)
    add chain=dstnat action=dst-nat dst-port=XXX protocol=yyy dst-address=staticWANIP to-addresses=ServerLANIP (to-ports is only required when doing port translation)
    (dynamic WAN IP)
    add chain=dstnat action=dst-nat dst-port=XXX protocol=yyy dst-address-list=MyWAN to-addresses=ServerLANIP

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Personally I would get my routing, NAT and Servers working FIRST, then worry about queues…

You still can’t use interface routes (“gateway=1.2.3.4%ether1”) for the recursive routes failover in RouterOS 7.x ?

I agree that it should work on v7, but why bother?
Let's think through the logic!

What are the chances that one IP address is working and the other is NOT when both go through a single ISP (and gateway)?
Checking one WAN IP through the same gateway is probably all that is needed.
It is remotely possible that the ISP uses a different piece of hardware for the different WAN IPs, but I have my doubts.
So unless one has a third (tertiary) route, ANOTHER OPTION to go to, if the network is down due to an ISP hardware fault, it's down for both.

Several Starlink routers used as ISPs. You can’t change the router’s GW address (bypass mode is a workaround, but it’s not always acceptable), and it’s a fairly common situation where one of the terminals loses Internet connection, but the router is still pingable.

I see no problem in adding multiple Starlink gateways, or even multiple cable modems that operate in the same provider subnet and offer the same LAN subnet, if you accept SRC-NAT.
Those ISP gateways are each identified by a separate local IP address in my network, so they are easily addressable as specific devices.
The MT facing the ISP gateway does the SRC-NAT to the subnet required by the modems and gateway. From that MT onwards they can all be the same.

The cable modem provider here uses very large subnets (2048, 4096, 8192 addresses) for its clients' modem WAN IP addresses, so neighbors are in the same WAN subnet. Reaching them via another (e.g. DSL) provider while load balancing could be a problem.
I did use load balancing over multiple cable modems from the same ISP to get a multiple of a single modem's bandwidth. (Yes, it goes over the same ISP backbone, but that has a much larger capacity than the cable modem.) The same applies to the 150 Mbps Starlink connections.

The Starlink app connects only to the original, fixed IP address of the Starlink gateway or Dishy. It takes whichever LAN route is active at that moment to reach that GW or Dishy on the original IP address. That is only a potential problem for the local Starlink traffic, not for the forwarded traffic.

I would like to have logging of state changes of routes with “check-gateway”.
I.e. when the state of these routes changes (up to down or down to up) a message is logged with at least the dst-address, gateway, and routing-table.
Right now, it is largely invisible how failover solutions like this perform. We had an internet outage and the failover did not quite work correctly, but in the logs (that are saved to a logserver) it isn’t even possible to see when exactly the route with check-gateway went down. :frowning:

Do you use netwatch?
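
For reference, a minimal Netwatch sketch that writes a log line on each state change. The host 9.9.9.9, the timings, and the message texts are assumptions for illustration. Netwatch sends its own probes, so this only mirrors the check-gateway state if the probe host is pinned to one ISP by a host route, as in the recursive setup.

```
/tool netwatch
add host=9.9.9.9 interval=10s timeout=1s \
    down-script=":log warning \"failover: probe host 9.9.9.9 is DOWN\"" \
    up-script=":log info \"failover: probe host 9.9.9.9 is UP\""
```

This records when the probe host stopped answering, not when the route itself changed state, so the two timestamps can differ slightly.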

One way is to enable some additional logging. The problem is that it generates abundant logs that you may not be interested in or need.

/system logging
add topics=route

Example of logs generated when recursive route was down and up

11:37:51 route,debug,calc 2.2 Merge forwarding path updates
11:37:51 route,debug,calc Prepare queued IP/9.9.9.9/30-12/2
11:37:51 route,debug,calc Disqualified fwp IP/9.9.9.9/30-12/2
11:37:51 route,debug,calc Resolving IP/9.9.9.9/30-12/2
11:37:51 route,debug,calc Resolve as unreachable, gateway is not active
11:37:51 route,debug,calc 3 Main publish
11:37:51 route,debug,calc 6.1 Cleanup merge
11:37:51 route,debug,calc 3 Main publish
11:38:40 route,debug,calc 2.2 Merge forwarding path updates
11:38:40 route,debug,calc Prepare queued IP/9.9.9.9/30-12/2
11:38:40 route,debug,calc Set initial reachability for IP/9.9.9.9/30-12/2
11:38:40 route,debug,calc Apply reachability to IP/9.9.9.9/30-12/2
11:38:40 route,debug,calc Resolving IP/9.9.9.9/30-12/2
11:38:40 route,debug,calc Resolved link IP/9.9.9.9/30-12/2 via 9.9.9.9->IP/192.168.108.1/11-10/0 FLD{1} rr tr has metric BEST/32
11:38:40 route,debug,calc 3 Main publish
11:38:40 route,debug,calc 6.1 Cleanup merge
11:38:40 route,debug,calc 3 Main publish

But does that include up/down messages for routes with check-gateway?
I am asking this in the context of this topic, i.e. using recursive routes with ping check to implement failover.
Failover seems to work but one never knows if it is happening…
Indeed I fear that this topic will include all kinds of other routing debug.
With !bgp it removes BGP debugging but still every second there are several messages with calc topic which probably also will include the up/down messages.
Well, I will test…
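
A sketch of the logging rule being discussed, sent to a remote log server. The action name routelog and the server address 192.0.2.10 are hypothetical. It only narrows the topic filter; it does not add any dedicated up/down message for check-gateway routes.

```
/system logging action
add name=routelog target=remote remote=192.0.2.10 remote-port=514

/system logging
# route topic without BGP debugging; calc messages are still included
add topics=route,!bgp action=routelog
```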

I tested on a debug router but it isn’t really a viable solution.
There are messages logged but they all have topic route,debug,calc and there is no specific message about the recursive route that goes up/down, only information about the reachability of a specific address occurring somewhere in a check-gateway, like:

Resolved link IP/8.8.8.8/30-25/2 via 8.8.8.8->IP/192.168.200.1/20-10/2 FLD{1} rr tr has metric BEST/32

I’m trying to adapt the original setup to ROS 7.16.2, but without success. The original setup suggested to me that choosing Host1 and Host2 as probe targets would not result in losing backup for these addresses. It would make sense to me to test whether an important address, such as the headquarters, is reachable.
I started with the setup posted at the beginning and only added mangle rules to select ISP1 as the preferred ISP and to mark connections, so that I can see in the firewall connections window what is happening:

:local Host1 1.1.1.2
:local Host2 1.1.1.3

/ip address
	add address=$IP1 interface=ether1
	add address=$IP2 interface=ether2
	
/ip route
	add dst-address=$Host1 gateway=$GW1 scope=11
	add dst-address=$Host2 gateway=$GW2 scope=11
	
	
/routing table
	add name=ISP1 fib
	add name=ISP2 fib
	
/ip route
	add distance=1 gateway=$Host1 target-scope=11 routing-table=ISP1 check-gateway=ping
	add distance=2 gateway=$Host2 target-scope=11 routing-table=ISP1 check-gateway=ping
	
	add distance=2 gateway=$Host1 target-scope=11 routing-table=ISP2 check-gateway=ping
	add distance=1 gateway=$Host2 target-scope=11 routing-table=ISP2 check-gateway=ping
	
/ip firewall nat
	add chain=srcnat out-interface=ether1 action=masquerade
	add chain=srcnat out-interface=ether2 action=masquerade
	
/ip firewall mangle
	add chain=output connection-mark=no-mark action=mark-routing new-routing-mark=ISP1
	# the first packet from outside marks the connection - visible in firewall-connections
	add chain=prerouting in-interface=ether1 action=mark-connection new-connection-mark=ether1 passthrough=no
	add chain=prerouting in-interface=ether2 action=mark-connection new-connection-mark=ether2 passthrough=no

/ping 8.8.8.8 gave a “no route to host” error, and the mangle rule that sets the routing mark to ISP1 did not count any packets. So I copied the routes from table ISP1 to main:

/ip route
add distance=1 gateway=1.1.1.2 target-scope=11 check-gateway=ping
add distance=2 gateway=1.1.1.3 target-scope=11 check-gateway=ping

After this, /ping 8.8.8.8 worked, and changing the mangle rule to prefer ISP2 also worked.
To test failover, I blocked packet forwarding on the next hop behind ether1 (lab environment) while ISP1 was the preferred one.
After this, /ping 8.8.8.8 gave “no route to host” 10 times, “timeout” about 30 times, and a reply 10 times, and then the cycle “no route to host”, “timeout”, “reply” started all over again. The non-working route showed status USHI only briefly; most of the time its status was AS. I then also put the recursive (host) routes into the routing tables:

/ip route
add dst-address=1.1.1.2 gateway=$GW1 scope=11 routing-table=ISP1
add dst-address=1.1.1.3 gateway=$GW2 scope=11 routing-table=ISP1
	
add dst-address=1.1.1.2 gateway=$GW1 scope=11 routing-table=ISP2
add dst-address=1.1.1.3 gateway=$GW2 scope=11 routing-table=ISP2

This stabilized the situation: /ping 8.8.8.8 worked continuously (I waited 50 seconds).
As you may have noticed, I ended up with 3 almost identical routing tables and lost the backup path for Host1 and Host2.

Is it really necessary to put both kinds of routes (the one with scope=11 and the one with target-scope=11) in all the routing tables? And when a mangle rule selects the routing table, is it mandatory that the main table also contains a route for that IP address, a route that is not used after all?

Yes, you need to do that.
Because the probe packets to check the routes are sent using the main table and your actual traffic is sent via the ISPx table.
Unfortunately there is no way to auto-copy some routes between different tables, so you need to do that manually.
(MikroTik will tell you to use VRF but I would not know how to implement this thing using VRF instead of manually configured route tables)
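
The manual copying can at least be shortened with a small one-off loop. This is only a sketch: the gateway addresses are placeholders, the tables ISP1 and ISP2 are assumed to exist already, and running it twice would create duplicate routes. The outer braces keep the :local variables in one scope when the block is pasted into a terminal.

```
{
    # Placeholder values (assumptions): replace with your own gateways and probe hosts.
    :local GW1 192.0.2.1
    :local GW2 198.51.100.1
    :local Host1 1.1.1.2
    :local Host2 1.1.1.3

    # Every table that holds recursive default routes also needs the host routes.
    :foreach tbl in={"main";"ISP1";"ISP2"} do={
        /ip route add dst-address=$Host1 gateway=$GW1 scope=11 routing-table=$tbl
        /ip route add dst-address=$Host2 gateway=$GW2 scope=11 routing-table=$tbl
    }
}
```

The recursive default routes are left out of the loop on purpose, because their distances differ per table (ISP1 preferred in one, ISP2 in the other).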