Double ISP on v7 issue

Hello!

Need help with migrating double ISP redundancy config from v6 to v7 (due to new hardware not supporting v6 anymore).

This is a test lab setup, hence both ISP1 and ISP2 routes have the same internet GW address of 10.3.1.1.

ISP failover logic is implemented using a custom script (ping through both ISPs, change the primary internet route distance, then disable and re-enable the ethernet interface that corresponds to the failed ISP to clear connections). The script functions correctly and is irrelevant at this point.

Unfortunately, I’ve encountered issues with ghost GRE tunnels, indicating that something is off with packet routing in my configuration, and any advice is welcome.

/ip address
add address=10.3.150.10/16 comment=ISP1 interface=WAN1 network=10.3.0.0
add address=10.3.150.11/16 comment=ISP2 interface=WAN2 network=10.3.0.0
add address=192.168.0.1/24 comment=LAN interface=LAN network=192.168.0.0

/routing table
add disabled=no fib name=ISP1
add disabled=no fib name=ISP2

/ip firewall mangle
add action=mark-connection chain=input comment="ISP 1" dst-address=10.3.150.10 \
    in-interface=WAN1 new-connection-mark=ISP1-in passthrough=no
add action=mark-connection chain=input comment="ISP 2" dst-address=10.3.150.11 \
    in-interface=WAN2 new-connection-mark=ISP2-in passthrough=no
add action=mark-routing chain=output comment=ISP1 connection-mark=ISP1-in \
    new-routing-mark=ISP1 passthrough=no
add action=mark-routing chain=output comment=ISP2 connection-mark=ISP2-in \
    new-routing-mark=ISP2 passthrough=no

/ip route
add comment=ISP2Route disabled=no distance=2 dst-address=0.0.0.0/0 gateway=\
    10.3.1.1 routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP1Route disabled=no distance=1 dst-address=0.0.0.0/0 gateway=\
    10.3.1.1 routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP1 disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.3.1.1 \
    routing-table=ISP1 scope=30 suppress-hw-offload=no target-scope=10
add comment=ISP2 disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.3.1.1 \
    routing-table=ISP2 scope=30 suppress-hw-offload=no target-scope=10

Adding the interface to the gateway might help. Something like:

add comment=ISP2 disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.3.1.1%WAN2 \
    routing-table=ISP2 scope=30 suppress-hw-offload=no target-scope=10

What is the purpose of the mangling? Are you attempting load balancing of some sort?
Or do you simply want to ensure that traffic coming in on WAN1 goes back out WAN1 and traffic coming in on WAN2 goes back out WAN2, with both WANs sharing the load as in ECMP load balancing?

Here is the problem with your test setup:

When you configure the IP addresses on the interface like this:

You create two ECMP connected routes for the destination 10.3.0.0/16. They are ECMP because the destinations are the same and the distances are the same (purple arrows in the screenshot above). As a result, the two routes have the + sign (orange arrows), which denotes ECMP (equal-cost multi-path).

After that, it doesn't matter what distance values and tables you choose for your default routes (destination 0.0.0.0/0) if the gateway is specified as 10.3.1.1. The next-hop lookup will see that the best route to 10.3.1.1 is the ECMP group with destination 10.3.0.0/16. As a result, all 4 routes with the green marks in the screenshot become ECMP routes too (see the + signs with blue arrows), and all have the same two possible next hops / immediate gateway candidates.

For all those routes, there is a 50%-50% chance that a connection exits through WAN1 (ether2 in the screenshot) or WAN2 (ether3 in the screenshot). And the distribution (which depends only on the source and destination addresses of the packets) is the same regardless of the routing table chosen. Your mark-connection and mark-routing mangle rules therefore have no effect.

In the following screenshot you can see the exit gateway depends on the destination address (different src-address/dst-address pairs produce different ECMP hashes):

If you specify the interface (as %WAN1 or %WAN2) in the gateway as @rplant suggested above, then you avoid this ECMP situation:

The routes will have the expected exit gateways and no longer have the ECMP (+ sign) flag. The same exit gateway is now used for all destinations (the active route in the main table).
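
To check this on your own router, something like the following may help. It is only a sketch: it assumes the default routes are the ones posted above, and that the v7 detail output lists the resolved next hop as immediate-gw (treat the exact field name as an assumption for your RouterOS version).

# List all default routes with full details, across all routing tables
/ip route print detail where dst-address=0.0.0.0/0

After the change, each route should show a single resolved next hop on the expected interface (for example 10.3.1.1%WAN2 for the ISP2 table) rather than two candidates.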

Forcing the route to the ISP2 table also results in the correct exit gateway to be used:

So CGGX, if the person wants ECMP-style load balancing, then just remove the mangling: simple, clean, done.
If the OP wants WAN1 as primary and WAN2 as secondary (no load balancing), then use the % symbol in the routes?

In the OP's case the problem is that the router has services, such as GRE as he wrote, that should not be load balanced. A packet for such a service arriving on WAN2 of the router should cause the response to go out of WAN2. That's why he has the mangle mark-connection rules on the input chain, not prerouting or forward (they are for connections destined for the router itself), and the mark-routing rules are on the output chain, acting only on the response packets that the router sends back to the other side.

So, connections coming in through WAN2 have their response packets correctly marked to use the ISP2 routing table. But the problem is that, because the config uses the same gateway address 10.3.1.1 without interface notation, and because the destination 10.3.1.1 is resolved through an ECMP route, the result is that:

  • It doesn't matter the routing mark or the routing table chosen, the next-hop selection is the same, and...
  • That next-hop selection, because of ECMP, has a 50% chance of putting the packet through the wrong gateway.

This means incoming connections on WAN2 are marked, but the result would be no different if they were unmarked or wrongly marked. The response can go out of WAN2 or WAN1, depending on the address pair (a hash of src-address and dst-address decides which ECMP branch is chosen).

To make it work, the OP has to make sure that the ISP1 and ISP2 tables have different immediate gateways (and matching exit interfaces) for the default route, which is the case in my 3rd screenshot above.
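
Applied to the config in the first post, that could look roughly like this. It is a sketch, not a tested config: it assumes the four default routes still exist with the comments shown in the export (ISP1Route, ISP2Route, ISP1, ISP2) and simply adds the %interface notation suggested earlier in the thread to each gateway.

/ip route
# Main table: ISP1 primary (distance 1), ISP2 backup (distance 2)
set [find comment=ISP1Route] gateway=10.3.1.1%WAN1
set [find comment=ISP2Route] gateway=10.3.1.1%WAN2
# Per-ISP tables used by the mark-routing rules in the output chain
set [find comment=ISP1] gateway=10.3.1.1%WAN1
set [find comment=ISP2] gateway=10.3.1.1%WAN2

The distances and the mangle rules stay as they are, so the failover script that adjusts the route distance should not need changes, assuming it does not rewrite the gateway value itself.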
