I am having trouble getting a working routing configuration with a pair of MikroTik routers.
My configuration consists of two RB750 routers each connected to separate network switches via eth2 (port 3). The switches are connected together via a single port. The routers are connected to each other via eth1 (port 2). Both of the routers have a connection to an upstream provider via eth0 (port 1).
The eth0 interfaces each have a /30 for routing to the upstream provider. The default route on both routers has a gateway on this interface and the gateway is monitored via ping. This is a "manual fail-over" configuration. The eth0 interface on both routers has the same address so someone can simply move the cable if one router should cease to function. Yes, I know. Don't ask. This part is functioning exactly as intended.
The eth1 interfaces have addresses in the 10.0.20.0/30 network. This is so that no matter what is happening on the eth2 interface, the routers can always talk to each other. This is also configured as a default route with a metric of 2. This ensures that if a router does not have the cable connected to eth0 it can still route packets out to the world at large. This part is not functioning properly because of what is happening on eth2.
The eth2 interfaces have a VRRP interface sitting on top. Both the eth2 interfaces and the VRRP interface have addresses in 10.0.10.0/24. The VRRP functions perfectly and the other systems connected to the switches have no problem with the gateway moving around.
The problem comes in when the switch connected to the router with the uplink cable attached (call it router1) fails, or when the link from that router to the switch fails. It does not matter which. In this scenario, VRRP fails the gateway over to the router without the uplink cable attached (router2). Packets destined for a non-local network are then sent to router2 as expected. Since router2 has no uplink, the default route with a metric of 1 is not active, and router2 uses the default route with a metric of 2 to forward packets to router1 via the 10.0.20.0/30 network on eth1. Router1 then forwards the packets out over the uplink. So far so good.
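To make the outbound fail-over concrete, here is a rough Python model of how router2 picks its default route. It is only a sketch, not RouterOS configuration: the `DefaultRoute` class, the `pick_default` helper, and both gateway addresses (192.0.2.1 for the upstream, 10.0.20.1 for router1) are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DefaultRoute:
    gateway: str     # next-hop address (hypothetical values, not from the real setup)
    metric: int      # lower metric is preferred
    reachable: bool  # outcome of the ping check on the gateway


def pick_default(routes: List[DefaultRoute]) -> Optional[DefaultRoute]:
    """Return the lowest-metric default route whose gateway answers pings."""
    active = [r for r in routes if r.reachable]
    return min(active, key=lambda r: r.metric) if active else None


# router2 with no uplink cable: the metric 1 route is inactive,
# so the metric 2 route towards router1 over eth1 takes over.
router2 = [
    DefaultRoute("192.0.2.1", metric=1, reachable=False),  # eth0, assumed upstream gateway
    DefaultRoute("10.0.20.1", metric=2, reachable=True),   # eth1, assumed address of router1
]

chosen = pick_default(router2)
print(chosen.gateway)
```

If the cable is moved to router2 and the upstream gateway starts answering pings, the same function returns the metric 1 route again.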
Inbound packets are a different story. The packets arrive across the uplink at router1. An address in the destination network exists on one of the router's interfaces, so the connected route is selected even though the link is down. As a result, the packets are never forwarded to router2 for delivery to the network. In testing, if I disable the 10.0.10.x address on router1 and thus remove the connected route, everything begins working perfectly.
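The inbound behaviour can be modelled the same way. The sketch below is a simplified Python lookup (longest prefix first, then lowest distance), not how RouterOS is actually implemented. Note one assumption in particular: the description above does not say how router1 learns an alternate path to 10.0.10.0/24, so the table includes a hypothetical lower-priority route via router2 on eth1. The `Route` type, the `lookup` helper, the distances, and the host 10.0.10.50 are all illustrative.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    next_hop: str
    distance: int  # lower wins among routes with the same prefix length


def lookup(table: List[Route], dst: str) -> Optional[Route]:
    """Pick the most specific matching route; break ties by lowest distance."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in table if addr in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None
    return min(
        matches,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.distance),
    )


connected_lan = Route("10.0.10.0/24", "connected eth2", 0)
router1 = [
    Route("0.0.0.0/0", "upstream via eth0", 1),
    Route("10.0.20.0/30", "connected eth1", 0),
    connected_lan,
    # Assumed fallback route; not stated in the original description.
    Route("10.0.10.0/24", "router2 via eth1 (assumed)", 2),
]

# With the 10.0.10.x address enabled, the connected route always wins.
before = lookup(router1, "10.0.10.50").next_hop

# Disabling the address removes the connected route, exposing the fallback.
without_connected = [r for r in router1 if r is not connected_lan]
after = lookup(without_connected, "10.0.10.50").next_hop

print(before)
print(after)
```

In this model the connected route wins purely on distance and never looks at link state, which matches the symptom: as long as the address exists on eth2, nothing else for 10.0.10.0/24 is ever consulted.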
So here is the rub. I need some way to keep the router from routing packets out an interface that is down. I use "down" fairly loosely here, as I would also like to verify that other hosts are reachable even when the link is up. I think I can use netwatch and some scripting to simply disable the address, as I do in testing, but I want to make sure there is not some better way I am overlooking.
Things I cannot change:
- One uplink only.
- VRRP... I have to support clients which have the ability to only talk to a single gateway. Unless there is another way to move the gateway IP address, VRRP is it.
Any input would be much appreciated. Cheers.