BTH BUG Bleeding Into Regular Wireguard.

I have a regular scenario where WireGuard should come in on WAN2, despite WAN1 being the primary route.
Case in point: an AX3 as peer (client for handshake) to a CCR2004 peer (server for handshake).
This is normally handled easily by basic mangling, a routing table and a route.
/ip firewall mangle
add chain=input action=mark-connection connection-mark=no-mark in-interface=WAN2 new-connection-mark=incomingWAN2 passthrough=yes
add chain=output action=mark-routing connection-mark=incomingWAN2 new-routing-mark=useWAN2 passthrough=no
/routing table add fib name=useWAN2
/ip route
add distance=1 dst-address=0.0.0.0/0 gateway=gw1-IP check-gateway=ping
add distance=2 dst-address=0.0.0.0/0 gateway=gw2-IP check-gateway=ping
add dst-address=0.0.0.0/0 gateway=gw2-IP routing-table=useWAN2

The problem is that although the tunnel seems to come up, it is not acting properly; the proof is that one cannot access the router via WinBox.
The rather poor workaround that seemed to work was adding this dst-nat rule, which by the way is totally non-intuitive.
/ip firewall nat
add chain=dstnat dst-address-type=local in-interface=WAN2 protocol=udp dst-port=wg-port action=dst-nat to-addresses=ip.of.wan.1

In connection tracking, prior to using this rule, one would see the initial attempt from my public IP to WAN2, but the response would come from WAN1… very strange.
Both of these would be designated C connections… (meaning no responses?)
After using this rule, there were no more returns from WAN1 to my wireguard connection, so progress.
The connection tracking entries now alternated (never at the same time) between Cd and SACd, and they would appear and then disappear…

connections.jpg
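
For anyone trying to reproduce this, here is a minimal sketch of how the same entries could be watched from the CLI. It assumes the WireGuard listen port is 13231 (substitute your own wg-port) and that matching the port with a regex on dst-address works on your RouterOS version. As far as I know, in the flags column C is confirmed, S is seen-reply, A is assured and d is dstnat, so an entry showing only C has not seen a reply.

```
# Assumption: 13231 is the WireGuard listen port; replace with your wg-port
/ip firewall connection print detail where protocol=udp dst-address~":13231"
```

Running it before and after enabling the dst-nat rule should show whether the entries change from C to Cd/SACd as described above.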

Why do I think there is a BTH bug involved?
Because no keep alive is set on this Peer ( server for handshake ) and thus WHY is the wireguard module contacting or using WAN1 despite our mangle?
Why is it ACTIVELY trying to reach the wireguard peer ( client for handshake )?

The only answers that make sense are
a. persistent keep alive setting ( not the case )
b. the wireguard is attempting to send some payload back to the peer ( maybe the case? ).
c. BTH bug where the Server Peer continually attempts to re-establish or connect to the client peer

In both cases b and c, the wireguard is trying to connect to the last known peer address/port, which would be the public IP (my public IP) of the Peer Client, but it uses the default route of the router, in this case WAN1, to do so. In the case of the BTH traffic, this is not a response to the peer client; it's the wireguard attempting to request or update peer IP information…
Regardless, the wireguard module is ignoring the destination address used for the handshake and simply going out the primary routing WAN, as this is seemingly not a response but originated traffic.


Why the dst-nat rule kinda works is that it's doing some magic!
We make any responses from the local interface appear as if they were responses to the remote incoming traffic, and thus they get assigned the connection-mark as well and then get routed out WAN2.

++++++++++++++++++++++++++++++++++
the bug can be described as follows:

Even if there is no keepalive at either end, so that there is complete silence until the initiating peer sends its first ever transport packet (provoked by a payload one), the responding peer sends its response from an address chosen by routing rather than from the address at which the initial packet arrived.

+++++++++++++++++++++++++++++++++

Request others to attempt to duplicate this WEIRD, if not buggy functionality.
Want to be sure I am not totally bonkers, before going to MT…

Why would you expect heartbeats/keep-alives to take a different path than other WG traffic? Routing is happening in kernel before mangle in both cases. Or at least that’s the way I rationalize what you’re seeing. e.g. same logic as in http://forum.mikrotik.com/t/wireguard-multi-wan-policy-routing/174145/70

It happens with persistent keep alive OFF on both ends… (what's left is BTH shenanigans)

Both cases will use main. What I’m saying is that mangle has no effect in either case, as WG in the kernel has already processed it, i.e. it's not just keepalives that use only main.

Did this setup work in some older versions (e.g. before BTH)?

Not 100% sure. I have never stumbled across it before, or at least recognized it.
I do know that BTH is causing many similar issues.

From RouterOS POV, at least logically, your correct mangle should work in the case. I just think it acts like generic Linux, thus needs routing rules.

It's certainly possible that BTH introduced some change in this logic. That’s kinda the big question. If mangle worked at some point before in this case, that would for sure be a bug.

But I think you’re right if one believes (and follows) the Packet Flow diagram in docs.

The router is not doing anything wrong, but it seems that wireguard is doing something unexpected!
Interesting comment about routing rules… not sure one could help in this scenario, but you do have me thinking. In the end, though, there is no port to make use of in routing rules, so a dead end.

At least a part of the problem is that if/when packets/connections are marked coming into the wireguard port,
responses, etc are not marked when leaving from the wireguard port.

Routing rules do work. (And you then don’t need to mark anything for wg routing purposes.)
However, if you have dynamic IP addresses and gateways, you need to have some appropriate scripting.

/routing rule
add action=lookup comment="min-prefix=0, all except 0.0.0.0/0" disabled=no min-prefix=0 table=main
add action=lookup comment="return path for wg/openvpn into wan2" disabled=no src-address=WAN2_IP table=useWAN2
add action=lookup comment="return path for wg/openvpn into wan1, (not needed...)" disabled=yes src-address=WAN1_IP table=main

It does mean ANYTHING leaving for the internet that has the WAN2_IP as source IP will leave via wan2 unless forced by packet marking (which seems to make sense, as mostly these will be things that originally came in via Wan2)
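
As a rough illustration of the kind of scripting meant for dynamic addresses, here is a hypothetical sketch (not a tested script) that rewrites the src-address of the WAN2 rule from whatever address the interface currently holds. It assumes the interface is named WAN2, that it carries exactly one IPv4 address, and that the rule keeps the comment used above. It could be run from the DHCP client's script or a scheduler. Updating the gateway of the useWAN2 route is not covered here.

```
# Hypothetical sketch: keep the WAN2 routing rule in sync with the current WAN2 address.
# Assumptions: interface name WAN2, exactly one IPv4 address on it,
# and the rule comment matches the one used in the rule above.
:local wanIf "WAN2"
:local ruleComment "return path for wg/openvpn into wan2"
# The address is returned as "a.b.c.d/nn"; strip the prefix length
:local cidr [/ip address get [find interface=$wanIf] address]
:local wanIp [:pick $cidr 0 [:find $cidr "/"]]
/routing rule set [find comment=$ruleComment] src-address=$wanIp
```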

No idea what you are talking about rplant. There is no such thing as wireguard port.

What I would like explained is the first routing rule you have, what is its purpose and what does it do??

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

By the way, since we believe it's the wireguard module originating a request, not responding to a request, your routing rule will have no effect on the scenario being described.
However will test. Even if it does work, at this point really want to know what is going on…

FWIW, @rplant’s policy routing rules are roughly the equivalent of the following mangles:

/ip firewall mangle
add action=mark-connection chain=input connection-state=new in-interface=WAN1 new-connection-mark=ISP1
add action=mark-connection chain=input connection-state=new in-interface=WAN2 new-connection-mark=ISP2
add action=mark-routing chain=output connection-mark=ISP1 new-routing-mark=ISP1_table
add action=mark-routing chain=output connection-mark=ISP2 new-routing-mark=ISP2_table

These normally work to allow access to /ip/services via non-active/“backup” WAN, so the packets going back out from something like webfig/REST/etc goes back out same WAN it came in on. But AFAIK this approach does not work with WG, or at least that’s the supposition here.

That is correct, the mangles don't work, as the traffic coming from wireguard is not connection marked and thus must be originated by the wireguard module.

Wireguard port (Really only applies to server), often for Mikrotik 13231

/routing rule
add action=lookup comment="min-prefix=0, all except 0.0.0.0/0" disabled=no min-prefix=0 table=main

Just an easy way of making all routes that are not 0.0.0.0/0 use the main routing table
This is often a correct assumption.

Notes:

  • If you use the routing rule, packets hitting the mangle output chain from wireguard have the correct IP address on them.
  • If you attempt packet or route marking only, packets hitting the mangle output chain already have the wrong IP address and gateway assigned to them.
  • This assumes the packets need to use the non-default gateway.
    • The reason is that the routing is done before the mangle output chain (on unmarked packets), with a routing adjustment later.
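
One way to check the notes above on your own router is a temporary log rule in the output chain. This is only a diagnostic sketch, assuming the WireGuard listen port is 13231 (replace with your wg-port). The log-prefix and comment are arbitrary names chosen here. The rule should be moved above any mark-routing rule that has passthrough=no, otherwise marked packets may never reach it.

```
# Assumption: 13231 is the WireGuard listen port; replace with your wg-port
/ip firewall mangle
add chain=output protocol=udp src-port=13231 action=log log-prefix="wg-out" comment="diagnostic: log WireGuard replies"
# Inspect which source address and out-interface the replies carry
/log print where message~"wg-out"
```

If the logged packets show the WAN1 address as source while the handshake arrived on WAN2, that matches the behaviour described in this thread. Remove the rule when done.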

Rplant. I am not using packet marking. I am using mark connections.

Sure, but it’s the same issue. As I understand it, WG has already picked the route before the packet enters any firewall services. WG will check routing rules in its decision on what to pick, and if there are none it will use main.

I just wish they’d have WG on the packet flow diagrams. On those today, @anav is right mangle should work. But WG seems to follow the pure Linux scheme that isn’t rationalized in docs.

So the two solutions appear to be dst-nat rule noted above…
or using routing rules as per Rplant.

Until such time MT sorts out this mess. :frowning:

@anav, did you report a bug on this?


They may never… Part of why WG is fast is that it happens in the kernel, so dropping down to mangle would likely be some performance hit. But I don’t know.

Kinda the reverse complaint of @DarkNate’s [Discussion] MikroTik configuration abstraction complexity, where he’s advocating for more kernel-based datapaths instead of mangle rules.

“Flexible” isn’t the problem, relying on Linux Netfilter framework for data plane is.

(translation: /ip/firewall/mangle == Linux Netfilter)

I have the same problem with the exact same scenario with two WANs and WG on the non-primary WAN.

Well, you’re better off using routing rules, not mangle. Mangle should work here to be consistent with RouterOS, but WG seems to overly follow what the Linux kernel does, not Mikrotik’s packet flow.

Hmm there are two options,
You can use a dst-nat rule or use Routing Rules.
Routing rules is more intuitive to be sure.

My Initial post though was talking about the MT that was acting as server for handshake!
Not sure about the MT acting as client at handshake, but if trying to control which WAN, imagine similar.

To refresh dst nat rule.
/ip firewall nat
add chain=dstnat dst-address-type=local in-interface=WAN2 protocol=udp dst-port=wg-port action=dst-nat to-addresses=ip.of.wan.1

When you want Wireguard traffic on WAN2 but WAN1 is primary, we fool the router into un-destination-natting any traffic for wireguard coming from WAN1 back to WAN2 on the way out.

Routing rules are not applicable when there are dynamic IPs involved.

I am currently using the DNAT rule that Anav came up with and it works, but this is 100% a bug.