WireGuard Multi-WAN Policy Routing

Larsa · March 6, 2024, 2:09pm

One wouldn’t need specialized DHCP scripts if Mikrotik fixed its connection tracker to use the incoming interface address as the outgoing source address.

I’ll try to create a simple diagram and some packet traces that illustrate the whole thing, but considering your previous response you seem to have understood the basic idea.

ECMP routes are a good example of ROS lacking sufficient flexibility, or perhaps simpler tools, for managing seemingly simple routing problems. Something as simple as using logical interfaces instead of addresses would make things much easier. With routing rules, one would like to be able to add interfaces as sources, like this: /routing/rule add action=lookup src-interface=ether2 table=ether2
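A toy Python model shows why an interface matcher would help. This is only an illustration under stated assumptions, not how RouterOS evaluates rules internally. Rules are checked in order and the first match picks a table. Unmatched traffic falls through to "main" (the default gateway via ether1). `src_interface` stands in for the proposed, hypothetical src-interface matcher. The addresses come from the 203.0.113.0/24 documentation range.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    table: str
    src_address: Optional[str] = None    # match on the reply's source IP
    src_interface: Optional[str] = None  # hypothetical "src-interface" matcher


def lookup(rules: List[Rule], src_ip: str, owner: Dict[str, str]) -> str:
    """Pick the routing table for a locally generated reply (toy model).

    `owner` maps each current WAN address to the interface holding it.
    """
    for rule in rules:
        if rule.src_address is not None and rule.src_address == src_ip:
            return rule.table
        if rule.src_interface is not None and owner.get(src_ip) == rule.src_interface:
            return rule.table
    return "main"  # falls through to the default gateway


# Rule written while ether2 held 203.0.113.10; DHCP later hands out .77.
by_address = [Rule(table="ether2", src_address="203.0.113.10")]
by_interface = [Rule(table="ether2", src_interface="ether2")]
after_renew = {"203.0.113.77": "ether2"}

print(lookup(by_address, "203.0.113.77", after_renew))    # main (stale rule)
print(lookup(by_interface, "203.0.113.77", after_renew))  # ether2
```

After a lease change, the address-based rule silently stops matching. A DHCP lease script has to patch exactly that gap.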

Amm0 · March 6, 2024, 2:47pm

Well… if you’re looking for affirmation that there is a potential bug: @anav should be right on the logic, but I believe your results.

The whole idea is that RouterOS abstracts away Linux kernel details into a unified config scheme and packet flow diagram. Here, I suspect the kernel handles the keepalive internally and uses the kernel’s routing table to do it. The kernel does know about routing rules, which is why those work. And since Mikrotik builds the kernel, the behavior is under their control.

It’s not me you’d have to convince, but likely Mikrotik in a bug report. And they won’t believe anything without a supout.rif and, ideally, some traces.

Amm0 · March 6, 2024, 3:05pm

Similar things happen with ZeroTier, where its tunnels (e.g. the “zt1” instance/process) do appear in the firewall… but without an interface, i.e. (unknown). That is because this is the outer VL1 traffic, not the inner zerotier1 traffic (where zerotier1 does appear as an interface in the firewall). Peers are dynamic, so constructing firewall marking rules from /zerotier/peer via script would be error-prone at best. This makes QoS for ZT’s tunnels going out a specific WAN something without a real solution. (You can QoS zerotier1, but that is too late if the idea is to favor ZeroTier’s tunnels out specific WANs.)

Other things, like PMTUD, also happen entirely in the kernel without touching the firewall, and that makes sense. So it makes similar sense for the WG keepalive to live in the kernel, but this has the side effect you found here…

But these “escapes” from firewall connection tracking should, at least, be documented.

Larsa · March 6, 2024, 3:06pm

I’m pretty sure the standard response would be it’s a feature, not a bug!

But it is the kernel that actually stores, manages, and executes the routing rules; only the configuration hassle happens in userland, i.e. in ROS. The connection tracker (netfilter) is tightly coupled to the kernel’s routing engine.

Suppose connection tracking didn’t work for any of ROS’s built-in services like WinBox, OSPF, BGP, etc. In a multi-WAN environment, you would then need to set up policy routes for every WAN interface and every individual service whose traffic doesn’t arrive through the default gateway.

A purely philosophical question then arises: is this a bug or just a very flexible router? I mean, after all, it can still be fixed, even if it becomes very cumbersome every time you need to add or modify a WAN interface.

Amm0 · March 6, 2024, 3:09pm

Both?

Larsa · March 6, 2024, 3:11pm

Haha, but of course! My personal take on this is that all built-in services should behave the same when it comes to routing and connection tracking. I see no obvious reason why they shouldn’t.

Amm0 · March 6, 2024, 3:12pm

These built-ins start and end in userland. WG lives mainly in the kernel.

That’s also why I do see ZeroTier tunnels and probes in the firewall connections: ZT also runs in userland.

But I 100% agree WG should be treated the same as the rest.

anav · March 6, 2024, 9:09pm

@AMMO, I did not know you were a fiction writer.
I think the issue is that the other side also knows about the 3 WANs; it’s not a smartphone/desktop wanting VPN access. It’s that the far end wants to steer some traffic down particular WAN(s) that may not be the “primary”*. I don’t think DDNS etc. solves this issue: somehow the dynamic public IP address needs to make it into a /routing/rule. And /routing/rule is based on IP/subnet, without any address-list or interface-list support… so options are limited.

The client peer doesn’t care about the 3 WANs. It only knows the IP address (or potentially a DynDNS URL) and port sitting in its endpoint settings.
Besides, that’s not how WireGuard is designed: as I stated above, the peers keep track of each other’s endpoints after the initial tunnel is established!
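For reference, WireGuard’s documented built-in roaming works roughly like this. A peer does not send explicit endpoint announcements. It simply adopts the outer source IP and port of the most recent packet that authenticates correctly. The Python sketch below models that rule. Its class and method names are hypothetical and not taken from any WireGuard implementation.

```python
from typing import Tuple

Endpoint = Tuple[str, int]


class Peer:
    """Toy model of WireGuard endpoint roaming (illustration only)."""

    def __init__(self, endpoint: Endpoint) -> None:
        self.endpoint = endpoint  # where outgoing packets for this peer are sent

    def receive(self, outer_src: Endpoint, authenticated: bool) -> None:
        # Only packets that pass authentication may move the endpoint;
        # anything else is dropped and leaves it untouched.
        if authenticated:
            self.endpoint = outer_src


peer = Peer(("203.0.113.10", 51820))
peer.receive(("198.51.100.20", 51820), authenticated=False)  # forged: ignored
peer.receive(("198.51.100.20", 51820), authenticated=True)   # peer moved WANs
print(peer.endpoint)  # ('198.51.100.20', 51820)
```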

@LARSA
Why are you trying to tell ROS how to behave instead of accepting how it behaves and working from that realistic construct??
In short, the ROS connection tracker mishandles WireGuard handshakes. It forces response packets through the default gateway, breaking the protocol if the initial handshake came from a different interface.
WRONG! RoS is actually following its operating system code correctly on how to route traffic. It’s your unreasonable expectation that MikroTik would create additional code, not part of the WireGuard design, to take every possible scenario into account. One simply uses the available tools to move traffic as necessary. As Ammo alluded to, others have designed ADDITIONAL helper programs to streamline such additional (although, agreed, common-sense) functionality. Personally, I think MT’s work on BTH is far more important, groundbreaking, and useful!!!

A basic script for IP routes (gateway IP) is not much of a bother, and it is necessary when one cannot know the future WAN information. It doesn’t affect fixed static IPs or pppoe-out1 interfaces, and even my straight cable connection works great. It’s just some IP DHCP client setups (like my fiber connection) that need help.

My conclusion is that you will be happier sticking to open-source Linux-based routers!

Amm0 · March 6, 2024, 10:46pm

I just believe in Santa Claus.

To @anav’s point, WG is trying to create its own peer-to-peer tree, so you are “fighting” WG when trying to get Mikrotik involved in its routing. It was designed to use route rules.

Or just use IPsec with IKEv2 in tunnel mode; I believe that would let you do this “script-less”. WG was invented because configuring IPsec is hard… but IPsec would likely be “script-less”, and perhaps even faster depending on the router.

I get @Larsa’s frustration. Multi-WAN is unnecessarily hard. Pepwave sells routers for 3X the $$$ at the same performance as Mikrotik, and all they do is put a friendly HTML UI with dropdowns in front of all the “MultiWAN” stuff. I’m not suggesting Mikrotik change WG to fix @Larsa’s issue, but all the approaches involve something fragile somewhere. And the idea of having multiple WANs is to INCREASE reliability…

My conclusion is that it’s always “pick-your-poison” with Multi-WAN. The RFCs never really conceived of “Multi-WAN” outside of BGP in the E2E world, so there’s no standard to guide how this should work.

Larsa · March 6, 2024, 11:09pm

I’m sorry, but there is no such thing! The Linux network engine is configured and controlled entirely, and dynamically, by ROS. That’s how Linux-based routers operate. It does whatever you tell it to do. If you instruct it to behave badly, it will. And the WireGuard handshake is behaving badly.

Btw, I’m testing another workaround besides policy routing as we speak but I’m not done testing just yet. How about that ‘easy fix’ you promised me?

anav · March 7, 2024, 12:42am

Well, it’s not a fix; it’s simply using the available tools properly (already posted in detail).

By the way, consider a three-WAN scenario where WAN1 fails over to WAN2, which fails over to WAN3.
If WireGuard is set to look for WAN1 to establish the initial handshake, and succeeds, then WG will gracefully handle any combination of WAN failures…
It’s only when the initial handshake is not made to the primary WAN that we need to do some funky stuff.
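The failover chain above can be modeled with distance-based default routes. This Python sketch deliberately simplifies things and encodes the behavior reported in this thread rather than verified kernel logic. It assumes that with no policy rules, a reply leaves via the active default route (the lowest-distance reachable one). It also assumes the handshake completes only if that route is the WAN the initiation arrived on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DefaultRoute:
    wan: str
    distance: int
    reachable: bool = True  # e.g. the result of a check-gateway probe


def active_wan(routes: List[DefaultRoute]) -> Optional[str]:
    """Lowest-distance reachable default route wins (simple failover)."""
    up = [r for r in routes if r.reachable]
    return min(up, key=lambda r: r.distance).wan if up else None


def handshake_completes(arrived_on: str, routes: List[DefaultRoute]) -> bool:
    # Assumption from the thread: without policy rules the reply follows the
    # active default route, so it only works if that matches the ingress WAN.
    return active_wan(routes) == arrived_on


routes = [DefaultRoute("wan1", 1), DefaultRoute("wan2", 2), DefaultRoute("wan3", 3)]
print(handshake_completes("wan1", routes))  # True: initial handshake to primary
print(handshake_completes("wan2", routes))  # False: the case needing "funky stuff"
routes[0].reachable = False                 # wan1 fails over to wan2
print(active_wan(routes))                   # wan2
```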

Larsa · March 7, 2024, 4:03pm

Yup, it’s the starting point itself that creates the initial hurdle in a multi-WAN environment.

I’m trying to identify how different configurations behave, for example by using different subnets on the WAN interfaces.

One test I’ve performed is with ether1 as the default gateway and five WAN interfaces on a separate subnet. Oddly, one of the WAN interfaces sends the response packet back the proper way, which completes the handshake, while the rest are sent through the default gateway (and the handshake fails). Very strange! There’s probably a logical explanation for this, but finding the root cause might be pretty cumbersome. I plan to do a clean install to see if I can reproduce this behavior.

Larsa · March 7, 2024, 4:27pm

WireGuard, like IPsec, doesn’t appear as a service like FTP; they have separate configuration menus. Btw, what are you trying to say using the VyOS commands?

Larsa · March 7, 2024, 4:47pm

You’ll probably have a greater chance of getting assistance in connecting VyOS with ROS if you open a separate thread for it.

Amm0 · March 7, 2024, 5:08pm

Quoting wfburton:
If using public IPs, wouldn’t this work?

And that’s the rub here… They’re public IPs, but they may change since they come via DHCP. So some static Linux route rules need to be updated when that public IP changes (whether in VyOS or RSC). The solution on RouterOS is to use a script in the DHCP client to update the /routing/rule when the public IP changes. AFAIK, @Larsa is trying to avoid using a DHCP script.
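The DHCP-script approach boils down to an idempotent “find by comment, remove, re-add” update on every lease event. The Python sketch below models that logic. It uses a hypothetical list-of-dicts stand-in for /routing/rule entries. For the example, the interface name doubles as the routing-table name, and a `wg-<interface>` comment tag marks the rule. Neither choice is a RouterOS convention.

```python
from typing import Dict, List

RuleEntry = Dict[str, str]  # hypothetical stand-in for one /routing/rule entry


def on_lease_event(rules: List[RuleEntry], interface: str,
                   bound: bool, lease_address: str = "") -> List[RuleEntry]:
    """Return the updated rule list after a DHCP bind/unbind on `interface`."""
    tag = f"wg-{interface}"
    # Drop whatever rule this script previously created for the interface.
    updated = [r for r in rules if r.get("comment") != tag]
    if bound:
        # Re-add it for the freshly leased address, steering replies sourced
        # from that address into the interface's own routing table.
        updated.append({
            "src-address": f"{lease_address}/32",
            "action": "lookup",
            "table": interface,
            "comment": tag,
        })
    return updated


rules: List[RuleEntry] = []
rules = on_lease_event(rules, "ether2", True, "203.0.113.10")
rules = on_lease_event(rules, "ether2", True, "203.0.113.77")  # renewal, new IP
print(len(rules), rules[0]["src-address"])  # 1 203.0.113.77/32
```

The rule is located by its comment. If someone edits or deletes that comment by hand, the update breaks, which is one of the fragile points raised in this thread.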

Larsa · March 7, 2024, 6:35pm

Yep, that sounds about right! The whole exercise has currently resulted in two different issues:

Q1. Why are WireGuard handshake responses sent through default gateway rather than the originating interface?
My initial research indicates this is a known issue, with some proposed fixes already sent upstream to the WireGuard devs. Additionally, there are some workarounds that help the connection tracker with proper routing. I’ll report back when something interesting pops up.

Q2. Is there a convenient workaround for the above issue that doesn’t rely on a DHCP script when multiple WAN interfaces with dynamic IP addresses are used as WireGuard responders? Well, currently this seems pretty difficult, since ROS route control is based purely on static IP addresses rather than on the logical interface names available in the kernel (e.g. “ip rule add iif DEVICE table table-name”). However, I’ve got an idea that I haven’t had the chance to test yet.

Amm0 · March 7, 2024, 7:17pm

Well, I think you’ve distilled the core issue here:

Route rules are easier than firewall rules in general, IMO. But what you lose is address-list and interface-list matching… Even just a plain interface (“DEVICE”) matcher in /routing/rule would go a long way, assuming the kernel does know about the interface (it’s curious Mikrotik didn’t include it already).

The other issue in the “dynamic WAN address, without using scripting” category is “check-gateway=ping”. What’s missing is one tiny setting in /ip/dhcp-client that would add check-gateway=<ping|bfd> (e.g. check-gateway=ping) to the dynamic default route added by DHCP. Without it, a “script-less” multi-WAN setup with DHCP WANs is even harder.

Larsa · March 7, 2024, 7:21pm

@wfburton/Amm0, I have a similar idea that doesn’t involve separate routing tables.

Amm0 · March 7, 2024, 8:22pm

Quoting wfburton:
I don’t know what version you’re running, but keep an eye out for the check-gateway options. It has one for ping.

Of course. BUT again, you need to use a static route to set it, i.e. check-gateway=ping cannot be added to the dynamic default route from the DHCP client without using a script. And since this is a common task… cutting and pasting some script and carefully commenting routes is a PITA. The script+comment approach makes multi-WAN a few steps more difficult, and dealing with multiple WANs is already hard.
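The “static route + comment + script” workaround for check-gateway can be sketched the same way. The `StaticRoute` fields below only mirror the idea of a route that carries a gateway, a distance, and check-gateway=ping. They form a hypothetical model, not the RouterOS API. The script rewrites only the gateway, so the hand-set check-gateway survives each lease change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StaticRoute:
    comment: str
    gateway: str
    distance: int
    check_gateway: str = "ping"  # set once by hand; the script never touches it


def update_gateway(routes: List[StaticRoute], comment: str, new_gateway: str) -> bool:
    """Point the comment-tagged route at the new DHCP gateway.

    Returns False when no route carries the tag, i.e. the failure mode
    if someone edits or deletes the comment.
    """
    for route in routes:
        if route.comment == comment:
            route.gateway = new_gateway
            return True
    return False


routes = [StaticRoute("wan2-default", "203.0.113.1", distance=2)]
print(update_gateway(routes, "wan2-default", "203.0.113.129"))  # True
print(routes[0].gateway, routes[0].check_gateway)               # 203.0.113.129 ping
print(update_gateway(routes, "wan3-default", "198.51.100.1"))   # False
```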

Just some dropdown for “Check Reachability” with ping, etc.

@Larsa, I’ve had a feature request open on “check-gateway” in dhcp-client for over a year (SUP-118400). The most recent response was:
“We will consider implementing such a feature.

Unfortunately, giving any ETA for when the feature will be implemented is impossible.”
We’re talking about one option in a dropdown box… So I don’t think WG heartbeats in the kernel are changing anytime soon.

Amm0 · March 7, 2024, 8:45pm

I get that folks think @Larsa is overly pedantic about avoiding scripts for adjusting routing rules… but the release thread highlights the non-theoretical side effects of using scripts for stuff like WAN routing:
http://forum.mikrotik.com/t/v7-15beta-testing-is-released/174120/138

All my routers use dhcp-client scripts, and most of them are remote… so I’m sympathetic to the problem. I even know scripting really well! But I cannot control Mikrotik changing how scripting works…