Modifying route distance with dual WAN

cyayon · December 29, 2022, 8:32am

Hi,

This issue is driving me crazy.
I have two WANs:
WAN1 (primary): DHCP client on a bridge (to apply bridge filters and modify CoS, as required by the ISP)
WAN2 (secondary): standard interface to a third-party router

WAN1 default route distance is 1
WAN2 default route distance is 2

/ping address=8.8.8.8 count=3 interface=wan1 → ok
/ping address=8.8.8.8 count=3 interface=wan2 → ok
I can confirm that each ping really went out through the intended link, because the latencies differ between the two links and match.

/ip route set [find where dst-address=0.0.0.0/0 and gateway=<wan1_gw>] distance=10
/ping address=8.8.8.8 count=3 interface=wan1 → KO
/ping address=8.8.8.8 count=3 interface=wan2 → ok

/ip route set [find where dst-address=0.0.0.0/0 and gateway=<wan1_gw>] distance=1
/ping address=8.8.8.8 count=3 interface=wan1 → ok
/ping address=8.8.8.8 count=3 interface=wan2 → ok

Why?

I tried routing tables and rules, mangle in output/prerouting, etc… Nothing worked.
Perhaps the same issue is why recursive routing to detect a WAN failure does not work… Or I just didn't manage to make it work…

I am using a CCR2004 on ROS 7.6; I also tried ROS 7.7rc3, with no change.

thanks

pcunite · December 29, 2022, 1:51pm

This can be tricky to get correct. I may do an entire article series on this someday. At the moment, forum member anav has an extensive resource here. See section I. IP ROUTE - Multi-WAN. If you prefer a video format, this example is useful.

Take it slow understanding this concept. I know you want it all figured out now, but it's going to take a while to get it and adapt to MikroTik’s way of doing things.

sindy · December 29, 2022, 2:00pm

Just a suggestion for investigation, as I could only test part of it on a remote device I cannot afford to tamper with too much.

If you specify an interface as a parameter of the ping command, RouterOS doesn’t look for the best route out of those whose gateway interface is the specified one, but it sends an ARP request from that interface, and if it gets a response, it sends the echo request packet to the MAC address returned. But it seems that before doing that, it queries the normal routing, and if the gateway interface indicated by normal routing is the one specified for the ping, it sends the echo request to the MAC address of the corresponding gateway without first sending an ARP request for the actual destination of the ping.

So the difference between your wan1 and wan2 may be the support of the proxy-arp functionality on the adjacent routers (gateways) - if my assumption is correct, the one connected to wan2 supports it and the one connected to wan1 doesn’t. So ping with interface=wan1 only works if wan1 is the gateway interface of the active default route.
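To make the hypothesis above concrete, here is a toy Python model of it. It is not RouterOS code and not documented RouterOS behaviour: it only encodes the guess that a ping pinned to an interface goes straight to the gateway's MAC address when that interface carries the active default route, and otherwise depends on the neighbour answering ARP for 8.8.8.8 (proxy-arp). The proxy_arp flags (wan1 without, wan2 with) are assumptions chosen to match the results reported in the first post.

```python
# Toy model of the hypothesis; NOT RouterOS code and NOT documented RouterOS
# behaviour. The proxy_arp flags are assumptions matching the observed results.
from dataclasses import dataclass


@dataclass
class DefaultRoute:
    interface: str  # gateway interface of the 0.0.0.0/0 route
    distance: int   # the lowest distance becomes the active route


def ping_with_interface(routes, interface, proxy_arp):
    """Return True if `/ping ... interface=<interface>` would get a reply."""
    active = min(routes, key=lambda r: r.distance)
    if active.interface == interface:
        # Normal routing already uses this interface: per the hypothesis the
        # echo request goes to the gateway's MAC, no ARP for 8.8.8.8 needed.
        return True
    # Otherwise the router ARPs for 8.8.8.8 itself on that interface, which
    # only succeeds if the adjacent router answers with proxy-arp.
    return proxy_arp[interface]


proxy_arp = {"wan1": False, "wan2": True}  # assumed, per the hypothesis
routes = [DefaultRoute("wan1", 1), DefaultRoute("wan2", 2)]
print(ping_with_interface(routes, "wan1", proxy_arp))  # True
routes[0].distance = 10
print(ping_with_interface(routes, "wan1", proxy_arp))  # False
print(ping_with_interface(routes, "wan2", proxy_arp))  # True
```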

Sniffing on your system in both states (wan1 distance=1 and wan1 distance=10) will show you whether my assumption is correct.

cyayon · December 29, 2022, 3:50pm

Hi,

From another router (a custom Arch Linux-based router), it works as expected and I have no issue at all.
In both cases, the adjacent router is the same.

Why does it not work on MikroTik?
Do I have to do something special to ping via a given interface even after changing the distance/metric?

thanks

sindy · December 29, 2022, 4:22pm

One possibility I can think of is that the ping with specification of interface works differently on the two operating systems. Another possibility is that whereas Mikrotik definitely ignores the ICMP router advertisements possibly sent by the gateway router, the archlinux may use them to associate a gateway with an interface, so it knows what gateway to use for packets sent from “wan1” even though the gateway doesn’t support proxy-arp.

I would just use routing-table instead of interface as a parameter of the ping.

Also, I wouldn’t jump to any extensive conclusions based on the behaviour you observe - if sniffing confirms that when the route via wan1 is not active (because its distance is too high), RouterOS sends ARP requests for 8.8.8.8 via wan1 and gets no response, it says nothing more than that it indeed behaves that way. The issue with recursive routing is highly unlikely to be caused by this.

To let the forum analyse your actual issue with failover, you have to provide the export of the configuration and explain what exactly behaves different than you expect.

cyayon · December 29, 2022, 4:40pm

Hi,

thanks for your answer.

Here is an export.

How do I specify a routing table with the ping command in ROS?
router1_20221229.rsc (30.2 KB)

sindy · December 29, 2022, 5:31pm

Oops… not possible in ROS 7 (hopefully it’s a temporary state). So you’d have to use a routing rule matching on a src-address (the one attached to the wan in question) and specify a src-address as a parameter of the ping.
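A toy Python model of this workaround (not RouterOS code): routing rules are checked in order, the first one whose src-address matches selects a routing table, and unmatched traffic uses main. The table names via-wan1/via-wan2 and the local addresses 80.11.60.10 and 192.168.6.2 are hypothetical; the gateways are the ones mentioned in this thread. The main table is assumed to prefer wan2, as after setting the wan1 distance to 10.

```python
# Toy model of policy routing by source address; NOT RouterOS code.
# Table names and the local addresses 80.11.60.10 / 192.168.6.2 are hypothetical.
import ipaddress

# First matching rule wins, like an ordered list of routing rules.
RULES = [
    ("80.11.60.10/32", "via-wan1"),   # router's own address on wan1 (hypothetical)
    ("192.168.6.2/32", "via-wan2"),   # router's own address on wan2 (hypothetical)
]

# Default gateway held by each routing table.
TABLES = {
    "main": "192.168.6.1",      # main prefers wan2, as after wan1 distance=10
    "via-wan1": "80.11.60.1",
    "via-wan2": "192.168.6.1",
}


def table_for(src):
    """Return the routing table chosen for packets with source address `src`."""
    addr = ipaddress.ip_address(src)
    for prefix, table in RULES:
        if addr in ipaddress.ip_network(prefix):
            return table
    return "main"  # no rule matched


def gateway_for(src):
    return TABLES[table_for(src)]


# A ping with src-address set to the wan1 address leaves via wan1 even though
# the main table currently prefers wan2.
print(gateway_for("80.11.60.10"))  # 80.11.60.1
print(gateway_for("10.0.0.5"))     # 192.168.6.1
```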

In the export I cannot see any signs of an advanced failover setup for the wans - the only failover assumed is from wan1 (Orange via bridge, distance of the default route added by DHCP is 1) to wan2 (distance of the manually configured default route is 3) if wan1 physically fails. But the thing is that the way it is configured (due to the caprice by Orange), the route via wan1 will not go down even if you physically disconnect the Ethernet/SFP interface, because in RouterOS, the bridge interface remains up even if all member interfaces of that bridge are down. So the route via wan1 will only disappear from the routing table once the DHCP lease expires, which may take hours after the physical interface goes down.

So you have to monitor the actual transparency of the uplink via wan1 all the way to the internet, using either the scriptless failover based on recursive next-hop search or some scripted solution.

cyayon · December 29, 2022, 5:59pm

Thanks,

Before configuring something more advanced, I am trying to check something simple.
I followed the recursive routing documentation before (not in this export), but it didn't work at all.

The routes kept flapping between OK and KO every 2 or 3 minutes.
After that, I was going to implement failover with a script as a workaround: when the interface (or the ISP) fails, the script would raise the distance of the failed interface's route so that the other one is promoted. That is the story.

Could you please confirm the configuration needed in my context to make recursive routing work?

Thanks.

sindy · December 29, 2022, 6:34pm

Sorry, I don’t understand what you ask (and my French is really bad).

Your current configuration doesn’t contain anything related to recursive routing, so I cannot confirm anything.

To make it work, you have to add the recursive hierarchy of two routes - one to a /32 destination address, some “canary” one (such as 8.8.8.8) out there in the internet that uses the actual wan1 gateway as gateway, and another one to 0.0.0.0/0 that uses the canary address as a gateway. The scope of the former one must be lower than the default one of 30 (I usually set 10), the target-scope of the latter must be one higher than the scope of the former, and the latter must have check-gateway set to ping.

This ensures that RouterOS keeps pinging the canary address, and the latter route is only active if it receives responses to that ping. The relationship between the target-scope of the latter and the scope of other routes than the “former” one ensures that if the former one becomes inactive, the test pings will not use any other route, such as the one via wan2. And the recursion ensures that packets matching the latter route will actually be sent via the gateway of the former route.
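The hierarchy described above can be sketched as a toy Python model of recursive next-hop resolution (not RouterOS code). It assumes 80.11.60.0/24 and 192.168.6.0/24 as the connected wan1/wan2 subnets (hypothetical prefixes), treats connected routes as scope 10, lets a lookup use only routes whose scope is at most the target-scope of the route being resolved, and reduces check-gateway=ping to "the canary answers or it does not". The route values follow the configuration discussed in this thread.

```python
# Toy model of recursive next-hop resolution; NOT RouterOS code. Assumed:
# connected subnets 80.11.60.0/24 (wan1) and 192.168.6.0/24 (wan2) with scope 10,
# and check-gateway reduced to "the gateway answers pings or not".
import ipaddress
from dataclasses import dataclass

CONNECTED = [ipaddress.ip_network(n) for n in ("80.11.60.0/24", "192.168.6.0/24")]
CONNECTED_SCOPE = 10


@dataclass
class Route:
    dst: str
    gateway: str
    distance: int = 1
    scope: int = 30          # default scope of a static route
    target_scope: int = 10   # default target-scope
    check_ping: bool = False
    enabled: bool = True


def resolve(routes, gateway, target_scope, alive, depth=0):
    """Return the directly connected next hop for `gateway`, or None.

    Only routes whose scope <= target_scope may be used for the lookup.
    """
    if depth > 5:
        return None  # crude loop guard
    addr = ipaddress.ip_address(gateway)
    if target_scope >= CONNECTED_SCOPE and any(addr in n for n in CONNECTED):
        return gateway
    candidates = sorted(
        (r for r in routes
         if r.enabled and r.scope <= target_scope
         and addr in ipaddress.ip_network(r.dst)),
        # Longest prefix first, then lowest distance.
        key=lambda r: (-ipaddress.ip_network(r.dst).prefixlen, r.distance),
    )
    for r in candidates:
        if is_active(routes, r, alive, depth + 1):
            return resolve(routes, r.gateway, r.target_scope, alive, depth + 1)
    return None


def is_active(routes, route, alive, depth=0):
    """Active if the gateway resolves and, with check-gateway=ping, answers."""
    if resolve(routes, route.gateway, route.target_scope, alive, depth) is None:
        return False
    return not route.check_ping or route.gateway in alive


def build(failover_scope=30):
    return [
        Route("0.0.0.0/0", "1.1.1.1", distance=3, target_scope=12, check_ping=True),
        Route("1.1.1.1/32", "80.11.60.1", distance=3, scope=10, target_scope=11),
        Route("0.0.0.0/0", "192.168.6.1", distance=10,
              scope=failover_scope, target_scope=30),
    ]


routes = build()
print(is_active(routes, routes[0], alive={"1.1.1.1"}))  # True
print(is_active(routes, routes[0], alive=set()))        # False
routes[1].enabled = False  # the canary /32 route goes away
print(resolve(routes, "1.1.1.1", 12, alive={"1.1.1.1"}))  # None
```

With build(failover_scope=10) and the /32 route disabled, resolve() instead returns 192.168.6.1 and the route via 1.1.1.1 stays active, because in the model the test pings now leak out through the wan2 default route; keeping that route's scope above the target-scope (the default 30 here) is what prevents this.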

But in your case, as the gateway of wan1 is obtained via DHCP, you have to either be sure that Orange will always assign the same gateway address, or add a script to the DHCP client that will update the “former” route each time the gateway address changes.

pcunite · December 29, 2022, 6:43pm

Watch the video I linked to. Watch slowly.

cyayon · December 30, 2022, 7:52am

Hi,

I have removed all mangle firewall rules and all default routes, and disabled add-default-route on the DHCP client (ISP1).

Then I executed the following (80.11.60.1 is the ISP1 default gateway obtained via DHCP, and 192.168.6.1 is the LTE failover router):

/ip route
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=1.1.1.1 scope=10 target-scope=12
add distance=3 dst-address=1.1.1.1/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=8.8.8.8 scope=10 target-scope=12
add distance=4 dst-address=8.8.8.8/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=5 dst-address=9.9.9.9/32 gateway=80.11.60.1 scope=10 target-scope=11
add comment=Failover distance=10 dst-address=0.0.0.0/0 gateway=192.168.6.1 scope=10 target-scope=30

Is this correct?

For now, the link and the route table seem stable; I will keep checking for longer…

As I understand it, the failover is automatic in case of an ISP1 link failure.
But is the recovery (falling back to ISP1) also automatic, or do I have to script something?

If I would rather not rely on 80.11.60.1 never changing and prefer a DHCP script, do you have an example of getting the gateway from the dhcp-client, please?

thanks.

sindy · December 30, 2022, 10:56am

The general idea is OK, but the implementation details are not. The purpose of scope and target-scope is to define the hierarchy of the routes for the recursive next-hop search or, in other words, to prevent looping. So set the scope of all the routes with dst-address=0.0.0.0/0 to the default 30, otherwise they could use each other in the recursion.

Just to avoid a misconception - the distance parameter only differentiates between routes with the same dst-address and routing-table. So there is no need to set it differently for each of the /32 routes, unless you do it to make Winbox sort them next to the /0 routes that use them as carrier ones.
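A toy Python illustration (not RouterOS code) of the point about distance: routes are grouped by routing table and dst-address, and only routes in the same group compete. The routes are the ones from the configuration posted above; the model ignores that a route whose check-gateway fails drops out of the comparison.

```python
# Toy model of how distance is compared; NOT RouterOS code. Each route is a
# (routing table, dst-address, gateway, distance) tuple taken from the thread.
def winners_by_distance(routes):
    """Keep the lowest-distance route per (routing table, dst-address).

    Routes with different dst-addresses never compete, whatever their distance.
    """
    best = {}
    for table, dst, gateway, distance in routes:
        key = (table, dst)
        if key not in best or distance < best[key][1]:
            best[key] = (gateway, distance)
    return {key: gateway for key, (gateway, _) in best.items()}


routes = [
    ("main", "1.1.1.1/32", "80.11.60.1", 3),
    ("main", "8.8.8.8/32", "80.11.60.1", 4),
    ("main", "9.9.9.9/32", "80.11.60.1", 5),
    ("main", "0.0.0.0/0", "1.1.1.1", 3),
    ("main", "0.0.0.0/0", "8.8.8.8", 4),
    ("main", "0.0.0.0/0", "9.9.9.9", 5),
    ("main", "0.0.0.0/0", "192.168.6.1", 10),
]
for (table, dst), gateway in winners_by_distance(routes).items():
    print(table, dst, "via", gateway)
```

All three /32 routes survive despite their different distances; only the four default routes compete, and the one via 1.1.1.1 wins.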

Yes and no, depending on your traffic. Active TCP connections will fail at each change; depending on the applications that use them, some will get re-established automatically and some will have to be re-established manually. UDP connections that are periodically updated, such as IPsec or SIP connections, need to be removed using a script after a failover, because the router itself or the devices in its LAN keep sending packets even if no responses come back, and the router sets the source address of these packets to the one of the WAN through which the respective connection has been established initially, unless that address has been lost (due to the interface going down or the DHCP lease expiring) and they have been src-nated using a masquerade rule.
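A toy Python model (not RouterOS code) of which connection-tracking entries such a post-failover script would target: UDP flows still using the address of the failed WAN. In RouterOS this would be a script removing entries from the connection tracking table (/ip firewall connection); the addresses 80.11.60.10 (wan1) and 192.168.6.2 (wan2) and the service labels are hypothetical.

```python
# Toy model; NOT RouterOS code. 80.11.60.10 (wan1) and 192.168.6.2 (wan2) are
# hypothetical addresses of the router on each WAN.
from dataclasses import dataclass


@dataclass
class Conn:
    protocol: str      # "udp" or "tcp"
    service: str       # label for the example only
    nat_address: str   # source address the connection uses (after src-nat, if any)


def stale_udp(conns, failed_wan_address):
    """Return UDP connections still pinned to the failed WAN's address.

    A periodically refreshed UDP flow never times out, so its tracking entry
    keeps the old source address; these are the entries to remove after a
    failover so that new ones get created via the working WAN.
    """
    return [c for c in conns
            if c.protocol == "udp" and c.nat_address == failed_wan_address]


conns = [
    Conn("udp", "IPsec", "80.11.60.10"),
    Conn("udp", "SIP", "80.11.60.10"),
    Conn("tcp", "HTTPS", "80.11.60.10"),   # TCP fails on its own, left alone here
    Conn("udp", "DNS", "192.168.6.2"),     # already via the backup WAN
]
print([c.service for c in stale_udp(conns, "80.11.60.10")])  # ['IPsec', 'SIP']
```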

Something like:
/ip dhcp-client set [find where interface=bridge-wan1] script=":if (\$bound=1) do={\r\
\n  /ip route set [find where dst-address~\"/32\" scope=10] gateway=\$\"gateway-address\"\r\
\n}"

To check that it really works (I have adjusted it to your config and I may have made a mistake), set the gateway of one of the routes to some bogus IP address like 10.22.33.44, and do
/ip dhcp-client release [find interface=bridge-wan1]

If the gateway you have changed changes back to the correct one, it works.
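The effect of the script and of the bogus-gateway test can be sketched as a toy Python model (not RouterOS code). on_dhcp_event is a hypothetical stand-in for the dhcp-client script: when the lease is bound, every /32 route with scope 10 is re-pointed at the gateway received from the DHCP server, which is what the [find where dst-address~"/32" scope=10] filter selects; the scope-30 default routes are left alone.

```python
# Toy model of the dhcp-client script; NOT RouterOS code. on_dhcp_event is a
# hypothetical stand-in mirroring [find where dst-address~"/32" scope=10].
from dataclasses import dataclass


@dataclass
class Route:
    dst: str
    gateway: str
    scope: int


def on_dhcp_event(routes, bound, gateway_address):
    """When the lease is bound, point every /32 route with scope 10 at the
    gateway received from the DHCP server; otherwise do nothing."""
    if not bound:
        return
    for r in routes:
        if r.dst.endswith("/32") and r.scope == 10:
            r.gateway = gateway_address


routes = [
    Route("1.1.1.1/32", "10.22.33.44", 10),  # gateway set to a bogus address
    Route("8.8.8.8/32", "80.11.60.1", 10),
    Route("0.0.0.0/0", "1.1.1.1", 30),       # recursive default: untouched
]
on_dhcp_event(routes, bound=True, gateway_address="80.11.60.1")
print([r.gateway for r in routes])  # ['80.11.60.1', '80.11.60.1', '1.1.1.1']
```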

cyayon · December 30, 2022, 11:24am

Hi,
Many thanks for your answer.

Do I have to monitor the failover WAN with check-gateway=ping too?

Just to make sure, could you please check my implementation?

Thanks !

sindy · December 30, 2022, 11:50am

You don’t exactly have to, but any backup solution is almost useless if it is not monitored - as it stays unused for months or even years, it may silently fail and when the primary one fails, the backup is not available. I prefer to use the Telegram app to send notifications about the state of SOHO networks to their users, but of course you can use e-mail as well.

For the operation itself, monitoring of the state of the WAN of last resort makes no sense, as there is no further backup the router could use if it fails. But there are setups with multiple WANs where each of them is the primary one for some class of traffic and a backup one for another class, and there you obviously have to monitor all of them to facilitate a failover.

Sure I can have a look at the export of your current configuration, but it will be much more useful if you check it practically first (using /interface/disable ether1 and/or by adding firewall rules dropping the test pings to chain output of firewall filter). E.g. if you use /ip/firewall/filter add chain=output dst-address=1.1.1.1 protocol=icmp action=drop, you will imitate that 1.1.1.1 doesn’t respond, so the /0 route via 1.1.1.1 will become inactive and the route via 8.8.8.8 will take over.
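The suggested test can be sketched as a toy Python model (not RouterOS code), using the default routes from the configuration posted earlier in the thread. It ignores the delay before check-gateway notices that a canary has stopped answering; "dropped" stands for canaries whose pings are blocked by a chain=output drop rule.

```python
# Toy model of the failover test; NOT RouterOS code. It ignores the delay
# before check-gateway notices that a canary stopped answering.
DEFAULT_ROUTES = [
    # (distance, gateway, monitored with check-gateway=ping)
    (3, "1.1.1.1", True),
    (4, "8.8.8.8", True),
    (5, "9.9.9.9", True),
    (10, "192.168.6.1", False),  # failover route via the LTE router
]


def active_gateway(dropped):
    """Return the gateway of the active default route.

    `dropped` holds canary addresses whose test pings are blocked.
    """
    usable = [(distance, gw) for distance, gw, monitored in DEFAULT_ROUTES
              if not (monitored and gw in dropped)]
    return min(usable)[1]


print(active_gateway(set()))                              # 1.1.1.1
print(active_gateway({"1.1.1.1"}))                        # 8.8.8.8
print(active_gateway({"1.1.1.1", "8.8.8.8", "9.9.9.9"}))  # 192.168.6.1
```

In the model, removing the drop rule (an empty dropped set) brings the route via 1.1.1.1 back without any script.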

cyayon · December 30, 2022, 12:09pm

I will post my latest export here as soon as I have access later today…

Could you please confirm that it is better to do this:
dst-address=0.0.0.0/0 → scope=30
dst-address=x.x.x.x/32 → distance=3

/ip route
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=1.1.1.1 scope=30 target-scope=12
add distance=3 dst-address=1.1.1.1/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=8.8.8.8 scope=30 target-scope=12
add distance=3 dst-address=8.8.8.8/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=30 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=80.11.60.1 scope=10 target-scope=11
add comment=Failover distance=10 dst-address=0.0.0.0/0 gateway=192.168.6.1 scope=30 target-scope=30

But I thought that I must set target-scope > scope?
Or am I making a mistake?

Thanks

sindy · December 30, 2022, 12:52pm

Yes.

I don’t say that having the distance values of the /32 routes as you had them before is wrong; what I say is that it doesn’t matter what those distance values are because those /32 routes differ in dst-address.

The relation between target-scope and scope of the same route is irrelevant; what matters is that the target-scope of a “client” route is at least one higher than the scope of the “server” route. See the hypothetical example below, where each route's target-scope is one higher than the scope of the route it recurses through; target-scope=10 on the first route and scope=30 on the last one are default values that are normally not shown in an export:
/ip route
add dst-address=8.8.8.8 gateway=192.168.1.1 scope=10 target-scope=10
add dst-address=8.8.8.255 gateway=8.8.8.8 check-gateway=ping scope=11 target-scope=11
add dst-address=0.0.0.0/0 gateway=8.8.8.255 scope=30 target-scope=12

In this example, the route to 8.8.8.8/32 via an actual gateway is the “bottommost server” one, and the route to 0.0.0.0/0 via 8.8.8.255 is the “topmost client” one. In this static case, the address 8.8.8.255 is only a linking element - nothing actually uses it (but it must be an address to which you never need to send any actual traffic). In a dynamic routing environment, which is the actual reason why the recursive next-hop search has been implemented, things may be much more complex. In short, a router somewhere at the border of your network advertises that it has a gateway to some destination subnet, but you have no common subnet with that border router, so to get the traffic for that destination subnet to that border router for delivery, you have to send it to some adjacent router that does have a way to forward it to the border one.
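A toy Python checker (not RouterOS code) for the relation described above: for each route whose gateway is not directly connected, it finds the most specific other route covering that gateway and checks that the client's target-scope is at least the server's scope plus a margin. It assumes 192.168.1.0/24 is the connected subnet and that dst-addresses are unique.

```python
# Toy checker for the scope/target-scope relation; NOT RouterOS code.
# 192.168.1.0/24 is assumed to be the directly connected subnet.
import ipaddress

CONNECTED = [ipaddress.ip_network("192.168.1.0/24")]

# The example above: (dst-address, gateway, scope, target-scope)
EXAMPLE = [
    ("8.8.8.8/32", "192.168.1.1", 10, 10),
    ("8.8.8.255/32", "8.8.8.8", 11, 11),
    ("0.0.0.0/0", "8.8.8.255", 30, 12),
]


def check_hierarchy(routes, margin=1):
    """Yield (client dst, server dst, ok) for every recursive route.

    The server is the most specific other route covering the client's
    gateway; ok means client target-scope >= server scope + margin.
    """
    for dst, gateway, _scope, target_scope in routes:
        gw = ipaddress.ip_address(gateway)
        if any(gw in net for net in CONNECTED):
            continue  # gateway is directly reachable, no recursion
        servers = [r for r in routes
                   if r[0] != dst and gw in ipaddress.ip_network(r[0])]
        if not servers:
            yield dst, None, False
            continue
        server = max(servers, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
        yield dst, server[0], target_scope >= server[2] + margin


for client, server, ok in check_hierarchy(EXAMPLE):
    print(f"{client} resolves via {server}: {'OK' if ok else 'check scopes'}")
```

With margin=0 the check corresponds to the "equal to or less than" wording used later in the thread; the recommendation of one higher is margin=1.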

cyayon · December 30, 2022, 1:43pm

Ok, i think i understand.

If I understand correctly, everything is fine in this implementation:

/ip route
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=1.1.1.1 scope=30 target-scope=12
add distance=3 dst-address=1.1.1.1/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=8.8.8.8 scope=30 target-scope=12
add distance=3 dst-address=8.8.8.8/32 gateway=80.11.60.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=30 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=80.11.60.1 scope=10 target-scope=11
add comment=Failover distance=10 dst-address=0.0.0.0/0 gateway=192.168.6.1 scope=30 target-scope=30

Explanation for 1.1.1.1:
server route : add distance=3 dst-address=1.1.1.1/32 gateway=80.11.60.1 scope=10 target-scope=11
client route : add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=1.1.1.1 scope=30 target-scope=12
client target scope : 12 > server scope : 10 - OK

Right?

But I do not understand why you chose to add a route to 8.8.8.255 in your example. I know that I will never have to send traffic to it, but why not use only the dst-address=8.8.8.8 and dst-address=0.0.0.0/0 routes?

thanks !

sindy · December 30, 2022, 3:08pm

Right.

Mostly to illustrate that there may be multiple levels of recursion and how they are related. But a typical approach when using multiple “canary addresses” looks as follows:

/ip route
add dst-address=1.1.1.1/32 gateway=192.168.1.1 scope=10
add dst-address=8.8.8.8/32 gateway=192.168.1.1 scope=10

add dst-address=10.22.33.44/32 gateway=1.1.1.1 target-scope=11 scope=11 check-gateway=ping
add dst-address=10.22.33.44/32 gateway=8.8.8.8 target-scope=11 scope=11 check-gateway=ping

add dst-address=0.0.0.0/0 gateway=10.22.33.44 target-scope=12

The goal of using the intermediate route to a fictitious /32 destination is to have only a single default route per WAN. This simplifies configurations where multiple routing tables are used (e.g. “prefer wan1” and “prefer wan2”). For your simple case, your approach with multiple default routes is more efficient, as you have 6 routes in total for 3 canary addresses, whereas the typical approach needs 7. For two routing tables, it’s 9 routes with your approach and 8 with the typical one.
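The route counts above can be checked with a few lines of Python. This is only arithmetic, assuming the canary /32 routes (and, in the typical approach, the routes to the fictitious /32) are shared by all routing tables and that the failover route is not counted.

```python
# Reproduces the route counts; NOT RouterOS code. Assumes the /32 routes are
# shared by all routing tables and the failover route is not counted.
def flat_routes(canaries, tables):
    """One /32 per canary, plus one default route per canary in each table."""
    return canaries + canaries * tables


def nested_routes(canaries, tables):
    """One /32 per canary, one route to the fictitious /32 per canary,
    plus a single default route per table."""
    return 2 * canaries + tables


for tables in (1, 2):
    print(tables, flat_routes(3, tables), nested_routes(3, tables))
# 1 6 7
# 2 9 8
```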

cyayon · December 30, 2022, 6:06pm

Many thanks for your answer !

I will test and keep you informed

A few last questions:
When wan1 goes down, the failover link wan2 will take over.
But will I still be able to reach 8.8.8.8, 1.1.1.1 and 9.9.9.9 via wan2?
If not, I will have to choose other IP addresses…

What do check-gateway=arp and check-gateway=bfd do?

Thanks.

anav · December 30, 2022, 6:50pm

Canary??
I call it flat vs nested…
Sindy's routes look bang on…

This is what I was told are the basic tenets of recursive routing in version 7.
TWO RULES OF THUMB (scope & target scope):
First Rule. The resolving route (DIRECT - connected route) with dst-address TO the “real WWW IP (dns site)” and with the local ISP gateway IP has Target-Scope=X, and the recursive route (INDIRECT - external route) with gateway IP VIA the “real WWW IP (dns site)” has Target-Scope=X+1. In other words, the farther one gets from the router, the more the TS increases, by one per step.

Second Rule. Between the same two routes being compared, the direct, connected route with the local ISP gateway IP (resolving route) has to have a SCOPE that is equal to or less than the TARGET SCOPE of the recursive route. In other words, the scope of a route must be equal to or less than the target scope of the next farthest route.

EX.
FARTHEST ROUTE: SCOPE= (doesn't matter) / TARGET SCOPE=Y+2 (recursive route)
CLOSER ROUTE: SCOPE= Y+2 or less / TARGET SCOPE=Y+1 (recursive route)
CLOSEST ROUTE: SCOPE=Y+1 or less / TARGET SCOPE=Y (gateway=ISP, resolving route)
INTERNAL ROUTE: ( within router, scope is not used, no recursive action at all )

Thus,
A FLAT setup with two recursive routes would look like this, so we only have a Y+1 scenario:
/ip route
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=3 dst-address=1.0.0.1/32 gateway=PrimaryISP-gatewayIP scope=10 target-scope=11
+++++++++++++++++++
add check-gateway=ping distance=4 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=4 dst-address=9.9.9.9/32 gateway=PrimaryISP-gatewayIP scope=10 target-scope=11
+++++++++++++++++++
add comment=SecondaryISP distance=10 dst-address=0.0.0.0/0 gateway=SecondaryISP-gatewayIP scope=10 target-scope=30

Note using scope=10 for all primary associated routes is ‘safe’.

=====================================================

Let's do it nested! In this case we use a fictitious address (ahhh, this is what Sindy means by canary!!) to force the router to resolve it via two recursive routes, where 10.10.10.10 is a private address/gateway that does not exist on the router…

/ip route
add dst-address=0.0.0.0/0 gateway=10.10.10.10 scope=10 target-scope=14
++++++++++++++++
add check-gateway=ping dst-address=10.10.10.10/32 gateway=9.9.9.9 scope=10 target-scope=13
add dst-address=9.9.9.9/32 gateway=PrimaryISP-gatewayIP scope=10 target-scope=12
+++++++++++++++
add check-gateway=ping dst-address=10.10.10.10/32 gateway=1.0.0.1 scope=10 target-scope=13
add dst-address=1.0.0.1/32 gateway=PrimaryISP-gatewayIP scope=10 target-scope=12
+++++++++++++++
add comment=SecondaryISP distance=10 dst-address=0.0.0.0/0 gateway=SecondaryISP-gatewayIP scope=10 target-scope=30

Note: All Primary routes have same distance, only the last Secondary route has a higher distance.
Note: Again using scope=10 is ‘safe’ all the way round.

Caveat: since these rules were communicated, there may have been some nuances or discoveries that I am not aware of, in which case I would have to update my page.

+++++++++++++
Last word: I tried to help someone with three WANs, and the level of complexity jumps considerably, to the point that one needs an Excel table to keep track of all the required routes. More than you think!