I'm using a script to provide automatic failover. To make it work without having to bake all the routing information into the script, I use the following recursive routing setup, divided into three levels:
- Level 1: 0.0.0.0/0 routes for each gateway, in each routing table that needs them (including the default table). Their gateway is set to 127.100.1.ID, where ID is a number unique to each uplink. These routes have scope=30 target-scope=10, and their distance is updated dynamically by the failover script. So, basically, traffic uses whichever 127.100.1.ID the failover script has selected.
- Level 2: the targets of the level 1 routes. These are either dynamic 127.100.1.ID PPP routes for PPP links (the route address is set by the PPP profile's remote-address option) or, for DHCP links, static routes created by the DHCP script, of the form 127.100.1.ID with the gateway set to the one provided by the DHCP server. These have scope=10 target-scope=10. Their distance doesn't matter, as they are all disjoint.
- Level 3 (DHCP links only): the dynamic entries for the DHCP addresses, added automatically when the address is assigned. Each covers both the router address and the DHCP-provided gateway address, which has been set as the gateway of the level 2 route. Their gateway is, obviously, the DHCP interface (e.g. etherX).
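As a rough illustration of the level 1 idea (the failover script only adjusts distances, and the lowest distance wins), here is a minimal Python model. It is not RouterOS code and not the actual failover script: the function names, the uplink IDs 5 and 6, and the `down_penalty` value are all hypothetical.

```python
def assign_distances(priority, healthy, base=1, down_penalty=100):
    """Return {uplink_id: distance} for the level 1 default routes.

    priority: uplink IDs in preferred order (hypothetical, e.g. [5, 6]
              for 127.100.1.5 and 127.100.1.6).
    healthy:  set of uplink IDs that currently pass the health check.
    Failed uplinks keep their route but are pushed back by down_penalty.
    """
    distances = {}
    for rank, uplink_id in enumerate(priority):
        distance = base + rank
        if uplink_id not in healthy:
            distance += down_penalty
        distances[uplink_id] = distance
    return distances


def active_uplink(distances):
    """The route with the lowest distance is the one traffic will use."""
    return min(distances, key=distances.get)


if __name__ == "__main__":
    distances = assign_distances(priority=[5, 6], healthy={6})
    print(distances)                 # {5: 101, 6: 2}
    print(active_uplink(distances))  # 6
```

The point of the design is that the script never needs to know real gateway addresses: it only reorders routes that point at 127.100.1.ID.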
Now, the setup works perfectly for PPP interfaces. However, it breaks in two ways for DHCP/eth interfaces:
- when the DHCP script updates the level 2 route with a new gateway address, the recursion on level 1 is not recomputed. However, this is very easy to solve: the DHCP script just disables and re-enables the level 1 routes that point at that particular level 2 gateway.
- RouterOS is not "smart enough" to work out directly-connected networks in this situation, for interface-specific locally-originated traffic. For example, if I try to ping 8.8.8.8 from a DHCP interface set up like this, RouterOS sends an ARP request which obviously goes unanswered. If I disable ARP for the interface, RouterOS just sends out all packets by setting the destination MAC address to its own, so obviously the gateway doesn't pick those up. However, if I add a static ARP entry for 8.8.8.8 with the gateway MAC address it works! (meaning that the gateway works just fine).
Let me give you a more practical example of what the routes might look like for a DHCP interface (omitting the additional routing tables, which should be irrelevant). Let's say the router is assigned 192.0.2.1/30 and the gateway is 192.0.2.2/30, on interface ether8. The routing table would be set up as follows:
Route          Gw            Gw (as computed recursively by RouterOS)      Note
0.0.0.0/0      127.100.1.5   127.100.1.5 recursive via 192.0.2.2 ether8    Level 1, static route
127.100.1.5    192.0.2.2     192.0.2.2 reachable ether8                    Level 2, created by DHCP script
192.0.2.0/30   ether8        ether8                                        Level 3, automatic route for address
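To make the intended recursion explicit, here is a small Python model of the lookup over the three routes in the example. It is only a sketch: it uses plain longest-prefix matching and ignores scope/target-scope entirely, so it approximates the idea rather than RouterOS's actual algorithm. The function names are made up for illustration.

```python
import ipaddress

# (destination, gateway) pairs; a gateway is either an IP address or an
# interface name, as in the example table.
ROUTES = [
    ("0.0.0.0/0", "127.100.1.5"),     # Level 1, static route
    ("127.100.1.5/32", "192.0.2.2"),  # Level 2, created by DHCP script
    ("192.0.2.0/30", "ether8"),       # Level 3, automatic route for address
]


def is_ip(value):
    """True if value parses as an IP address, False for interface names."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def longest_match(address, routes):
    """Return the gateway of the most specific route covering address."""
    addr = ipaddress.ip_address(address)
    candidates = [
        (ipaddress.ip_network(dst), gw)
        for dst, gw in routes
        if addr in ipaddress.ip_network(dst)
    ]
    if not candidates:
        raise LookupError(f"no route to {address}")
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def resolve(destination, routes, max_depth=8):
    """Follow gateways until an interface is reached.

    Returns (next_hop, interface): the last IP address seen before the
    interface route is the on-link next hop.
    """
    next_hop = destination
    for _ in range(max_depth):
        gateway = longest_match(next_hop, routes)
        if not is_ip(gateway):
            return next_hop, gateway
        next_hop = gateway
    raise RuntimeError("recursion too deep")


if __name__ == "__main__":
    print(resolve("8.8.8.8", ROUTES))  # ('192.0.2.2', 'ether8')
```

In this model the next hop for 8.8.8.8 comes out as 192.0.2.2 on ether8, which matches the "recursive via 192.0.2.2 ether8" entry RouterOS itself displays for the level 1 route.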
Workarounds I have considered or tried:
- Static ARP entries. However, I can't have 2^32 of them.
- Disabling ARP, wrapping the interface inside a bridge, and using bridge NAT to DNAT all outgoing packets, rewriting the destination MAC address to the address of the gateway. This should work in theory, but DNAT doesn't support specifying the output bridge, so it's probably too resource-intensive for having a simple backup connection on standby 99% of the time. Furthermore, it doesn't work in practice: it appears that locally-originated packets are not matched by the bridge firewall rules, even when disabling hardware offloading.
- Leaving ARP enabled, wrapping the interface inside a bridge, and using the bridge firewall to craft ARP replies for any IP address that the router requests. Again, this should work, but locally-originated packets seem not to be matched.
Now, why do PPP links work nonetheless? That's easy: packets exiting a PPP interface don't go through L3->L2 resolution, as there is only one peer by definition (i.e. there's no L2 addressing inside the tunnel). They are just thrown out of the interface.
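The difference between the two link types can be sketched in a few lines of Python. This is only a model of the behaviour described in this post, not of RouterOS internals; `neighbour_lookup` and the link type strings are hypothetical names.

```python
def neighbour_lookup(destination, next_hop, link_type):
    """Return the IP address that needs ARP resolution, or None.

    next_hop is the resolved gateway address, or None if the routing
    decision treats the destination as directly connected (on-link).
    """
    if link_type == "ppp":
        # Point-to-point: a single peer, so no L3->L2 resolution at all.
        return None
    if link_type == "ethernet":
        # Multi-access: ARP for the gateway if there is one,
        # otherwise for the destination itself.
        return next_hop if next_hop is not None else destination
    raise ValueError(f"unknown link type: {link_type}")


if __name__ == "__main__":
    # What I would expect on ether8: ARP for the recursive next hop.
    print(neighbour_lookup("8.8.8.8", "192.0.2.2", "ethernet"))  # 192.0.2.2
    # What the ARP request for 8.8.8.8 suggests: the next hop is lost
    # and the destination is treated as if it were on-link.
    print(neighbour_lookup("8.8.8.8", None, "ethernet"))         # 8.8.8.8
    # PPP: nothing to resolve, which is why the setup works there.
    print(neighbour_lookup("8.8.8.8", "192.0.2.2", "ppp"))       # None
```

The second call reproduces the symptom: without the resolved next hop, the only address left to ARP for is the final destination.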
Now, how to fix this? I honestly have no clue. I am inclined to think that there might be some firewall-fu that could be done. But even if there is, I'm not going to do it. This is quite simply a bug! Choosing the outbound interface should not result in information from the routing table being ignored.
Any ideas? (No, not the loopback trick)