RouterOS blatantly ignores pref-src. Can this really be a bug?

@divB, you’re absolutely right to point out that this is a flawed implementation of WireGuard, and it drove me nuts too before the root cause was identified. The good news is that it’s actually pretty easy for MikroTik to fix if they decide to. WireGuard works perfectly on Linux with the standard tools. For now, there are two pretty good workarounds: @Sindy’s NAT solution, or the routing rules I mentioned in the link from @Amm0.

@lurker888, unfortunately mangle doesn’t work with WireGuard’s initial handshake process.

Hi,

I will assume that you probably meant to write:
iptables -t nat -a output -s 177 -o wan1 -j snat -to 210

In essence these correspond to my mangle/src-nat rules below:

/ip/firewall/mangle
add chain=output action=mark-packet new-packet-mark=wg passthrough=yes protocol=udp src-port=13231

/ip/firewall/nat
add chain=srcnat action=src-nat to-addresses=192.168.84.1 packet-mark=wg

So here we are in agreement. The only difference is that you id the packets based on their (according to op’s wishes incorrect) ip, and I do it by being output (generated by router) and proto/port. I think mine is better :slight_smile:, but the approach is the same.

And this is all fine when the .210 (or .177) device initiates the connection. The appropriate conntrack entry is created with the appropriate translation.

The problem comes along when the other side sends the first packet. In this case the src-nat chain is not run. In fact the only chance src-nat would get to run is for the response packet, but src-nat is only run for connection state new packets - so actually no address translation is done.

For this case a dst-nat is needed, to enable the same translation for when the other side initiates the connection (from a conntrack perspective):

/ip/firewall/nat
add chain=dstnat action=dst-nat to-addresses=192.168.80.1 protocol=udp dst-address=192.168.84.1 dst-port=13231

My last rule simply makes sure that the lookup is done in the correct table: (This is there because src-nat is actually performed after routing adjustment, and as such it would not be able to influence table selection - for example exactly when src address is used as a selector in PBR.)

/ip/firewall/mangle
add chain=output action=mark-routing new-routing-mark=wg passthrough=yes packet-mark=wg

I agree with you that not all rules are necessary in all specific cases, I merely wanted to put the issue fully to rest. Some cases:

  • no alternative routing table → no mark routing is necessary
  • connection is always initiated by the remote side → src-nat is not necessary (and also mark-packet may be omitted)
  • connection is always initiated by us → no dst-nat is necessary

Through testing I have found that if the connection somehow breaks externally and there is constant traffic flowing through the tunnel, the second case (again, from a conntrack point of view) cannot be avoided.

That the remote side initiates the connection (third point) cannot really be avoided.

And if you want to make sure that the correct table is used, I would add the routing mark.

So in summary I think:

  • to be correct whoever sends the first packet, both dst- and src-nat rules should be present
  • if multiple routing tables are used, we should direct the router to use the correct one
  • I prefer to identify the packets in mangle/output, but this is just a preference on my part (src address selection is kind of messy)
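
Collected in one place, the rules quoted earlier would look roughly like this. This is only a sketch: the addresses (192.168.80.1, 192.168.84.1), the port (13231) and the wg mark name are the ones from the examples above, it assumes a routing table named wg already exists, and the comments are added here purely for orientation.

```routeros
# Identify WireGuard underlay packets generated by the router itself,
# then send them through the wg routing table.
/ip/firewall/mangle
add chain=output action=mark-packet new-packet-mark=wg passthrough=yes protocol=udp src-port=13231 comment="wg: mark own underlay packets"
add chain=output action=mark-routing new-routing-mark=wg passthrough=yes packet-mark=wg comment="wg: look up in the wg table"

# src-nat covers the case where we send the first packet,
# dst-nat the case where the remote side sends the first packet.
/ip/firewall/nat
add chain=srcnat action=src-nat to-addresses=192.168.84.1 packet-mark=wg comment="wg: we initiate"
add chain=dstnat action=dst-nat to-addresses=192.168.80.1 protocol=udp dst-address=192.168.84.1 dst-port=13231 comment="wg: remote initiates"
```

Whichever rules the cases listed above make unnecessary can simply be left out.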

I just tested the rules as I have quoted on an rb5009 running 7.17rc1, and the mangle absolutely works for the initial handshake.

In @divB’s case it’s impossible for pref-src to work, because his rule selects the table based on the source address. If there is no src address yet, there is nothing to match. If there already is a source address, there’s nowhere to put the pref-src.

Alright, good to know it works with 7.17rc1. Not sure when this changed, but it didn’t work before. If mangle works, that’s a third option along with NAT and routing rules.

Could you please point me to the mangle that doesn’t work? I’d like to read up on it a bit… (I bet it’s probably some load balancing/fail-over scenario)

Same thread that Amm0 posted previously: http://forum.mikrotik.com/t/wireguard-multi-wan-policy-routing/174145/1

Thanks! I saw the thread, I just didn’t want to get into it. This whole thread is about why it will not work in the way that you proposed.

And I have just set up a simulated multi-wan scenario (with a gre tunnel between my test devices serving as the second wan,) and with my proposed solution (adding dst-nat) it works correctly.

Why yours doesn’t is actually not because the packet goes out on the wrong interface, but because it has a bad source address. And this is exactly because of the thing discussed here in detail.

The misunderstanding comes from the fact that it is quite hard to determine which interface a packet egresses in these situations, because the mangle/output rules apply the routing mark after the initial routing decision and before the routing adjustment step. (Therefore only during the routing adjustment phase can routing marks be used to select the routing table.) However firewall logging takes place before the routing adjustment, so in the logs you will always see the packet egressing the interface selected by the initial routing decision. (I have actually captured the packets external to the device (in the switch :slight_smile: ) to verify this. Probably sniffing the packets will also yield a good answer because it uses raw sockets - though I have not verified this.)
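
If you want to try the sniffer route, something like the following should show which interface the handshake response actually leaves on. Treat it as an untested sketch; the port is the one from the examples above.

```routeros
# Watch WireGuard underlay packets on all interfaces (stop with Ctrl-C)
/tool/sniffer/quick ip-protocol=udp port=13231
```

The interface and direction columns are the interesting part here, since the firewall log can still show the interface picked by the initial routing decision.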

Perhaps you didn’t read the whole thread and might have missed the most crucial parts:

  1. During WG’s initial handshake, there’s no “connection state,” so mangle rules can’t apply
  2. The initial handshake response always egresses through the default gateway unless you trick ROS into using another interface.

Maybe this changed in 7.17rc1, I dunno, but how it’s handled is implementation dependent.

Handshake overview: https://www.wireguard.com/protocol/
Complete protocol stack: https://www.wireguard.com/papers/wireguard.pdf

I admit that I have not read everything in full detail, however I see conntrack functioning normally during initial handshake.

Initial incoming handshake as captured by dst-nat:
(rule: chain=dstnat action=dst-nat to-addresses=192.168.80.1 protocol=udp dst-address=192.168.200.1 dst-port=13231 log=yes log-prefix=“DN”)
DN dstnat: in:gre-tunnel1 out:(unknown 0), connection-state:new proto UDP, 192.168.200.8:13231->192.168.200.1:13231, len 176

Handshake response captured by mangle:
(rule: chain=output action=mark-packet new-packet-mark=wg passthrough=yes protocol=udp src-port=13231 log=yes log-prefix=“MA”)
MA output: in:(unknown 0) out:gre-tunnel1, connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120

When routing is set (same packet as before, next mangle rule):
(rule: chain=output action=mark-routing new-routing-mark=wg passthrough=yes packet-mark=wg log=yes log-prefix=“RT”)
RT output: in:(unknown 0) out:gre-tunnel1, packet-mark:wg connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120

And in the next rule I do a check matching on the routing mark, and it is correctly set:
(rule: chain=output action=passthrough routing-mark=wg log=yes log-prefix=“RC”)
RC output: in:(unknown 0) out:gre-tunnel1, packet-mark:wg connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120

And the appropriate routing table is used, and the appropriate interface is egressed.

There is clearly a valid connection state associated with these packets and they are caught in both dst-nat (incoming) and mangle (outgoing).

Maybe I’m a bit dense, tired or don’t exactly understand the point, could you elaborate what you mean by “no connection state”, “can’t apply mangle”, “can’t set routing table”. They seem to work as intended.

When I started the thread earlier, the initial handshake had to finish before the connection state became “established”[*] which prevented mangling from working. I see you’re using NAT, which might be affecting things similarly to routing rules. Have you tried running a packet trace (assuming on 7.17rc1) without NAT to see the difference?

[*] mangling only works on marked (established) connections.

Yep. dst-nat annotates the conntrack entry, and without that annotation, the conntrack machinery cannot identify the handshake response as belonging to the same connection. (That’s why it’s there :slight_smile: )

Without the dst-nat rule, the handshake response is not identified as part of the same connection, the connection is in state new, and the connection-marks are lost. Additionally, for it to work correctly, the dst-nat to-addresses has to point to the pref-src of the route in the main table where it would egress. (This is only important for conntrack, the src address is later rewritten because of the dst-nat rule.) This (while absolutely doable) is cumbersome.
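
To see whether the response was matched to the existing entry, the connection table can be inspected while a handshake is in flight. A sketch, assuming the WireGuard listen port 13231 from the examples above:

```routeros
# List UDP conntrack entries whose original destination is the WireGuard port
/ip/firewall/connection/print detail where protocol=udp dst-address~":13231"
```

With the dst-nat rule in place there should be a single entry whose reply-src-address is the dst-nat target. A second, separate entry created by the response suggests it was not matched to the incoming handshake.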

What I would ask from Santa - were he otherwise not occupied because of the season - would be:

  • wg should by default, for handshake responses, use the incoming packet’s dst addr as src addr in the sendto() call (without modifying the internal state associated with the wg peers)
  • wg interfaces should have an optional local-address attribute (where all underlay packets, including handshake responses, would carry this src addr; this should generally be done by a bind() call)
  • wg interfaces should have a VRF

What you write about mangling only being possible on established connections, well, that’s not exactly true.

Let’s mark the connection based on which interface it ingressed:

/ip/firewall/mangle
add chain=prerouting action=mark-connection new-connection-mark=thru-gre passthrough=yes connection-mark=no-mark in-interface=gre-tunnel1 log=yes log-prefix="CM"

In this case an initial incoming handshake looks like this:

CM prerouting: in:gre-tunnel1 out:(unknown 0), connection-state:new proto UDP, 192.168.200.8:13231->192.168.200.1:13231, len 176
DN dstnat: in:gre-tunnel1 out:(unknown 0), connection-mark:thru-gre connection-state:new proto UDP, 192.168.200.8:13231->192.168.200.1:13231, len 176
MA output: in:(unknown 0) out:gre-tunnel1, connection-mark:thru-gre connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120
RT output: in:(unknown 0) out:gre-tunnel1, packet-mark:wg connection-mark:thru-gre connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120
RC output: in:(unknown 0) out:gre-tunnel1, packet-mark:wg connection-mark:thru-gre connection-state:established,dnat proto UDP, 192.168.80.1:13231->192.168.200.8:13231, NAT (192.168.80.1:13231->192.168.200.1:13231)->192.168.200.8:13231, len 120

As you can see, prerouting rules were correctly applied to a connection-state=new (not established) connection, and they are preserved throughout the handshake.
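
One way to put such a mark to use, sketched under the assumption that a routing table named wg (as in the earlier mark-routing rule) holds the route out of gre-tunnel1, is to select the table for router-generated replies by connection mark:

```routeros
# Assumption: routing table "wg" exists and routes via gre-tunnel1
/ip/firewall/mangle
add chain=output action=mark-routing new-routing-mark=wg passthrough=no connection-mark=thru-gre comment="illustrative: reply via the ingress WAN"
```

This relies on the response being recognized as part of the marked connection, which in the tests above required the dst-nat rule.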

Even though I still think that wireguard is totally broken and can’t work in my setup, I’m not yet giving up hope.
Just to confirm, 192.168.80.1 would be 192.0.2.177 and 192.168.84.1 would be 192.0.2.210 in my case?
(I tried reversed as well)

Sadly still not :frowning:
I added “–log” and “–log-prefix” to keep track of when the rules trigger.

Actually, only the DNAT ever triggers:

11-29 17:34:27 firewall,info [***WG-DNAT] dstnat: in:wg-bg1-ftth out:(unknown 0), connection-state:new proto UDP, 200.95.5.88:9081->192.0.2.210:51820, len 176

And I also don’t get any “Receiving handshake initiation from peer” messages from wireguard in the logs. So the packet never reaches wireguard.

This is super weird because as soon as I change the DNAT rule to use 172.20.215.1 (the loopback interface), at least I see wireguard receiving it and sending a reply. I don’t understand yet why the heck this is happening though.

Regardless, if I understand your solution correctly, you do NOT DNAT to a loopback address but the “victim” address .177 directly. That way, the source address is already set properly?

Here I want to re-emphasize the reason that I use .210 in the first place and not .177: .177 is just the IP of the P2P uplink. I just use a public one because I have one but it’s common practice to use RFC 1918 addresses for that. Now there are multiple such uplinks, I actually have .177/31, .186/29, .249/31, .253/31. The fact that .177 is used is purely “random” because that’s just how iBGP selects the default route out. If some connection drops, it will be a different one. That’s the reason I can’t make the client connect to .177 and that also means we can’t rely on these addresses to be even available.

Would you mind summarizing again explicitly what the first 2 options are (“1. NAT” and “2. routing rules”)?

Re 1: I believe I described why the NAT one can’t work here … but I’m still hoping to be wrong.

Re 2: I assume you mean this thread. But I can’t find routing rules posted by you. Do you mean rplant’s answer? If so, wouldn’t this just be the (D)NAT solution?

Your initial version of IPs is correct. And up to this point only the dnat rule should be triggered.

Without the exact rules, it’s quite hard to say… Maybe you could at least send us the rules that should interact with the incoming handshake packet (preferably all of them) and confirm (via logging) that they are indeed triggered.

Playing the guessing game: are you sure you are not filtering the incoming packet? Can you give the rule in filter input chain that accepts it?
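
For reference, an accept rule of the kind being asked about might look like this. The port is the one from the log line above (51820); the log prefix and comment are made up for illustration, and the rule has to sit above any drop rule in the input chain.

```routeros
# dst-address is left out on purpose: the input chain sees the
# destination address after dst-nat has been applied.
/ip/firewall/filter
add chain=input action=accept protocol=udp dst-port=51820 log=yes log-prefix="WG-IN" comment="illustrative: accept WireGuard handshakes"
```

If this rule logs the packet but WireGuard still reports nothing, filtering is probably not the cause.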

@lurker,

i will assume that you probably meant to write:

iptables -t nat -a output -s 177 -o wan1 -j snat -to 210

no no.. it is literally -s 210 -o wan1 -j snat -to 210

ok. let us try to break down @divb first scenario:

  • vlan bridge/loopback/wireguard listen ip 210
  • wan1 ip 177 no nat. full routing.

the problem:
the internet can reach 210 directly, but the reply always rewritten to 177 (hence got rejected by initiator).

try by analyzing these 2 povs :

  1. using dnat for incoming internet via wan1 to vlan wg Bridge 210

iptables -t nat -a input -i wan1 -d 210 -j dnat -to 210.

  2. using snat for outgoing vlan bridge to internet via wan1 (this is the actual problem: 210 rewritten to 177)

iptables -t nat -a output -o wan1 -s 210 -j snat -to 210.

just to make that ip 210 persistent across interface traffic.

please carefully note those incoming and outgoing interfaces corresponding to their iptables chain. no marking no rules - just simple nat.

hopefully they’ll work - except if there is any other thing on the routing which alters their functionality.

as for the other ip’s - just mirror those config.

The commands you listed don’t really work because dstnat can only be in output but not input (and snat in input but not output).

I have used these commands (pls let me know if I misunderstood):

/ip firewall nat
add action=dst-nat chain=dstnat dst-address=192.0.2.210 dst-port=51820 log=yes log-prefix="[***WG-DNAT]" protocol=udp to-addresses=192.0.2.210
add action=src-nat chain=srcnat dst-port=51820 log=yes log-prefix="[***WG-SNAT]" protocol=udp src-address=192.0.2.210 to-addresses=192.0.2.210

I do not see how both of them work at the same time. For the second rule, if address would already be 192.0.2.210 we wouldn’t have an issue in the first place. Indeed, it’s not even matched.

And the first rule (DNAT) does not work for the reason I mentioned already: Since wireguard answers with a random source address, it is never matched with the connection tracking entry of the DNAT connection. This can be seen by the fact that the output packet is “connection-state:new”.

If I missed anything, please let me know

This is log output:

15:35:34 wireguard,debug wg-mobile: [miPhone] ***: Receiving handshake initiation from peer (200.95.7.232:33018) extra:0 (einval) 
15:35:34 wireguard,debug wg-mobile: [miPhone] ***: Sending handshake response to peer (200.95.7.232:33018) 
15:35:34 firewall,info [***WG-DNAT] dstnat: in:wg-bg1-ftth out:(unknown 0), connection-state:new proto UDP, 200.95.7.232:33018->192.0.2.210:51820, len 176 
15:35:34 firewall,info [***WG-IN] input: in:wg-bg1-ftth out:(unknown 0), connection-state:new proto UDP, 200.95.7.232:33018->192.0.2.210:51820, len 176 
15:35:34 firewall,info [***WG-OUT] output: in:(unknown 0) out:vlan2, packet-mark:wg connection-state:new proto UDP, 134.180.130.235:51820->200.95.7.232:33018, len 120

This is packet sniffer output:

58 time=187.238 num=59 direction=rx interface=wg-bg1-ftth src-address=200.95.7.232:33016 dst-address=192.0.2.210:51820 protocol=ip ip-protocol=udp size=176 cpu=1 ip-packet-size=176 ip-header-size=20 dscp=0 
   identification=52484 fragment-offset=0 ttl=49 

59 time=187.238 num=60 direction=tx interface=wg-bg2-ftth src-address=134.180.130.235:51820 dst-address=200.95.7.232:33016 protocol=ip ip-protocol=udp size=120 cpu=2 ip-packet-size=120 ip-header-size=20 dscp=34 
   identification=20323 fragment-offset=0 ttl=64

It seems finally I could get it working with the combined ideas of @lurker888, @wiseroute, @Larsa and myself.

Note that neither DNAT nor SNAT by itself works, for the reasons I mentioned a couple of times. It is important to indeed use a separate dummy device for DNAT. It is also necessary to create a dummy routing table and PBR rules based on source address.

Anyway, let me just post my solution:

/interface bridge
add arp=disabled fast-forward=no name=dum1 protocol-mode=none

/ip address
add address=172.20.215.1 interface=dum1 network=172.20.215.1

/routing table
add comment="Dummy table containing a default route for everything from 172.20.215.1" fib name=bugfix_wg

/routing rule
add action=lookup-only-in-table src-address=172.20.215.1 table=bugfix_wg

/ip firewall mangle
add action=mark-routing chain=output log=yes log-prefix="[***WG-MANGLE-RT]" new-routing-mark=default_myas passthrough=yes protocol=udp src-address=172.20.215.1 src-port=51820

/ip firewall nat
add action=dst-nat chain=dstnat dst-address=192.0.2.210 dst-port=51820 log=yes log-prefix="[***WG-DNAT]" protocol=udp to-addresses=172.20.215.1 to-ports=51820

This is how I think it works:

  • The DNAT rule as prev proposed translates 192.0.2.210 → 172.20.215.1
  • For the translation of the return packet to work, it is instrumental that wireguard will use 172.20.215.1 as source address
  • This is accomplished by using dummy table bugfix_wg and looking it up when the source address comes from the dum1 interface (src-address=172.20.215.1). (Note: If what @lurker suggested were correct, namely that for the initial “routing decision” the src-address rule can’t be selected because the source address is empty, this wouldn’t work. But my table is clearly selected)
  • Now the packet is in interface dum1 with source address 172.20.215.1. This is important because only now the source address can be reverted by the connection tracking of the DNAT rule!
  • However, now the packet would never get out because it’s stuck in dum1 (via dummy routing table). Hence the mangle rule ensures that the routing decision is overruled and the “default_myas” table is selected
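
The route inside bugfix_wg is not part of the config posted above. Going by the table comment, it is presumably a default route pointing at dum1, along these lines (an assumption, not the original rule):

```routeros
# Assumed: default route that keeps everything in the dummy table on dum1
/ip/route
add dst-address=0.0.0.0/0 gateway=dum1 routing-table=bugfix_wg comment="assumed default route for the dummy table"
```

No pref-src is set in this sketch; with dum1 as the gateway, the only address on that interface is 172.20.215.1.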

I did not expect this could ever work and this is the worst hack I have ever done. In 3 months from now I will have no clue how and why this configuration works. All this only because of a (in my opinion) terrible wireguard design decision and implementation. If just one of these were implemented, one wouldn’t have to resort to such criminal hacks:

  • Do not just bind to any interface/address of the system; at least support “bind-address”
  • Do not just leave the source address unset (especially when there was no connection context before and a client just connected to the right endpoint address)
  • Do implement VRF for wireguard

One day later and this stopped working. The crux really seems to be the initial source address that is assigned on the first routing decision.

I’m sure I have the same config but different source address is chosen. Instead of 172.20.215.1, it’s again the ISP IP or something else.

@lurker888: Do you know where this routing decision is documented? In particular, how, based on /routing/rule and /ip/route, is the address decided on the first-step routing decision?
From my tests it still seems to me that pref-src or even the selected route is totally arbitrary for a local packet with unset source address.

For example, I tried to SNAT the external connection to the local address 172.20.215.1. That way, the wireguard response will always go to 172.20.215.1 and I can add a route to 172.20.215.1:

/ip/route/add dst-address=172.20.215.1/32 gateway=dum1 pref-src=172.20.215.1

Shouldn’t this route be selected all the time now and pref-src be honored?

I think the issue with pref-src on 172.20.215.1/32 is that there is another directly connected route with higher priority.

Alright, another crazy option: SNAT to an invalid address and then use a static route to force pref-src on that prefix.
Disadvantage: All packets appear to come from the router’s own address, so visibility of the road warrior’s IP address is lost

/ip route
add comment=bugfix-wg dst-address=172.20.215.254 gateway=vlan44 pref-src=192.0.2.210 routing-table=main
/ip firewall filter
add action=accept chain=input dst-address=192.0.2.210 dst-port=51820 protocol=udp
/ip firewall nat
add action=src-nat chain=input dst-address=192.0.2.210 dst-port=51820 protocol=udp to-addresses=172.20.215.254

Note that 172.20.215.254 is not assigned anywhere locally on the router.