RouterOS blatantly ignores pref-src. Can this really be a bug?

Has anyone managed to solve this problem?

My environment has one WireGuard instance bound to port 13231. There are 2 different ISPs, each providing 3 external IPs. These are used for load balancing internet access and for access from the internet to local dst-natted services. The only thing I cannot get working is the WireGuard source IP and port.

I’ve tried the examples here, adjusted to my routing marks and IPs. Nothing works: either no connection is established, or the established connection does not pass encrypted traffic. The only working setup is for the client to connect to the first external IP (first from the MikroTik’s point of view).

What I observed is that if a client starts a connection to any IP other than the first one (from the MikroTik’s point of view), then, in line with WireGuard’s philosophy, the response is created as a new connection whose src IP is set to that first IP. This is visible in the connection tracking table. So nothing is tracked as a reply: WireGuard does not respond to the incoming packet, but rather creates a new one destined to the client’s IP and port.

If I mark-routing the outbound packets according to the examples here, the packet is indeed sent via the specified provider, but its src address is not correct for that provider, so the provider’s firewall drops it for having an invalid src address.
If I src-nat the output so that the src address is the one needed for that provider (the one the client addressed in its first packet), the MikroTik changes the src port to some random one; on the client I see the packet coming from the desired IP, but the port is not the server’s WireGuard port.

For reference, I am on 7.16.2 firmware.

Have you tried policy routing using routing rules, with separate routing tables for each WireGuard instance?

I’ve tried routing rules, but that does not help at all. Creating multiple WireGuard instances defeats the purpose of multi-WAN access, and WireGuard instances cannot be bound to a specific interface or IP.

Routing rules can match on routing marks (the src and dst addresses being more or less dynamic), but only after the output packet has been created by WireGuard, the routing decision made and the src IP assigned. So setting the routing mark in mangle/output does not change the src IP. And when a routing rule applies, the packet is routed to the desired outgoing interface, but pref-src is not used to rewrite the packet’s src IP.

It works. With routing rules, you force outbound traffic to use a specific egress interface, which automatically sets the source address to that WAN’s IP. But you can also use the Sindy NAT trick: “/ip firewall nat chain=dstnat dst-address-type=local in-interface=WAN2 protocol=udp dst-port=wg-port action=dst-nat to-addresses=ip.of.wan

Can you please supply an example of configuration with routing rules for this situation:
ISP1:
IP1 1.2.3.2/24
IP2 1.2.3.3/24
ISP2:
IP1 2.1.3.4/24
IP2 2.1.3.5/24

wireguard port: 13231

Clients should connect to any of the IP and receive packets back from same IP and same port (13231).

I don’t see how the NAT trick can handle multiple WAN IP addresses on the same WAN2 interface.

Okay, but first try to explain what you’re trying to accomplish without using too many technical terms.

This is what WireGuard is used for:

Clients (some remote routers) should connect to any IP from any ISP on the specified port 13231. The MikroTik acts only as a responder: it should never initiate the initial connection to a client, and will only respond once a client makes the first connection.

The clients do not have any incoming port open (they are behind some provider), and each client connects from a random IP and random port. The providers will (I hope) have connection tracking, so that the response from WireGuard reaches the client. For this to happen, WireGuard must respond from the same IP and port that the client first contacted.

Practically this is a multi-wan multiple public IP setup.

I think I understand what @Mimiko wants: simply to have many WAN addresses, and for wg to always answer from the address it was contacted on.

WG intentionally has a bit of a strange behavior (different from the usual stuff like ping, DNS over UDP, OpenVPN over UDP, etc.).

One way of doing what is asked for here is the following. It’s not pretty, but it actually works and does so without abusing any functionality.

Create a bridge just for internal use. We will use this to mess with the wg traffic in order to get conntrack to be able to identify connections properly:

/bridge
add name=wg-br
/ip/address
add address=192.168.222.1/30 interface=wg-br

And then we NAT:

/ip/firewall/nat
add chain=dstnat dst-address=1.2.3.2 protocol=udp dst-port=13231 action=dst-nat to-addresses=192.168.222.1
add chain=dstnat dst-address=1.2.3.3 protocol=udp dst-port=13231 action=dst-nat to-addresses=192.168.222.1
(…)

/ip/firewall/nat
add chain=input dst-address=192.168.222.1 action=src-nat to-addresses=192.168.222.2

There you have it. Again, not pretty but it works.

EDIT: Because this appeases the gods of conntrack, this also makes connection marks sticky, so it can be used with multiple routing tables, VRFs, etc.
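
A rough, untested sketch of how sticky connection marks could be combined with a per-ISP routing table for the WireGuard underlay traffic. The interface name WAN2, the mark wg-isp2, the table to-isp2 and the gateway 2.1.3.1 are all assumptions made for illustration; none of them come from the posts above.

```
# Hypothetical names: WAN2, wg-isp2, to-isp2, gateway 2.1.3.1
/routing/table
add name=to-isp2 fib
/ip/route
add dst-address=0.0.0.0/0 gateway=2.1.3.1 routing-table=to-isp2
/ip/firewall/mangle
# Mark new WireGuard connections by the WAN they arrived on
add chain=prerouting in-interface=WAN2 protocol=udp dst-port=13231 connection-state=new action=mark-connection new-connection-mark=wg-isp2 passthrough=yes
# Router-generated replies that conntrack ties to that connection use the ISP2 table
add chain=output connection-mark=wg-isp2 action=mark-routing new-routing-mark=to-isp2 passthrough=no
```

The same pair of mangle rules would be repeated per WAN, each with its own mark and table.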

EDIT2. I finished the previous edit with “In fact the whole wg-br stuff can be hidden in a VRF if you want.” This has been implemented in normal Linux for quite some time; it’s done using the fwmark property of the wg interface (which marks underlay packets when being emitted) and then using routing rules based on this fwmark. Unsupported in the MT world, unfortunately.

@lurker888: Great summary! Just wondering, why go with a bridge instead of assigning the addresses straight to the wg interface in this case?

@Mimiko: You’ve probably already thought about it, but just a heads-up that each client still needs a unique public key in a separate peer entry, otherwise WireGuard’s internal crypto routing won’t work properly.

@Larsa: Of course this also works with the two addresses on any interface (one assigned to the router and one not assigned, but routed to the interface in the “main” table - these are the criteria for conntrack to work). For me the wg interface is associated with the overlay traffic and in my applications is often included in a vrf, while we are tracking/natting underlay traffic in this instance. In other words for me it just “feels better.”

It would also be valid to ask why input src-nat is omitted from the official packet flow diagrams by Mikrotik, while being fully and correctly supported.

Yeah, that question has come up a few times before. Personally, I think those packet flow diagrams have a lot of room for improvement! :wink:

Thank you. This does work. I already have marked and sticky connections implemented for each incoming WAN IP.
Also @Larsa, of course each client is configured with its own public key and all allowed addresses. The only issue was the different WAN IPs.

@lurker888 why define a rule in /ip/firewall/nat for each dst-address? I’ve omitted the dst-address and left only dst-port.
Using src-nat hides the IP address of the client. Is it possible to fix this? The real IP address of the client is good troubleshooting info.
If I remove the src-nat rule, the problem comes back.

I’ve tried the solution from the last posts in this thread: http://forum.mikrotik.com/t/wireguard-multi-wan-policy-routing/174145/90

While that method does send the packet to the correct gateway, the initial src IP of the packet generated by WireGuard does not change, even if I set pref-src on the route entry in the FIB.

The src-nat is necessary for conntrack to be able to recognize the connections correctly (and to be able to apply the reverse translation for the dst-nat, and add the connection marks). This does mean that the source IP of the connections is masked. I know of no way around this. Of course you can log the packets in firewall or print them from the conntrack entries etc., however these are just workarounds.
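
A minimal sketch of the logging workaround, assuming a hypothetical log prefix wg-new. Mangle prerouting is evaluated before dst-nat and before the input src-nat, so the logged packet still carries the client’s real source address and the WAN address it contacted:

```
/ip/firewall/mangle
# Log only the first packet of each new WireGuard underlay connection
add chain=prerouting dst-address-type=local protocol=udp dst-port=13231 connection-state=new action=log log-prefix="wg-new"
```

The rule has to sit above any prerouting rule that stops processing (passthrough=no) for the same traffic.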

Of course only specifying the dst-port for the dst-nat rules will also nat the intended connections. This however also has the effect of natting every packet that is udp/13231, and so no one who uses this port will be able to correctly communicate with anyone if their traffic is routed by your device. This is bad. (Maybe creating an address list would satisfy you?)
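
A sketch of the address-list variant, using the example addresses given earlier in the thread. The list name wg-wan is an assumption. A single dst-nat rule then covers every WAN address without catching udp/13231 traffic that is merely routed through the device:

```
/ip/firewall/address-list
# wg-wan is a hypothetical list name
add list=wg-wan address=1.2.3.2
add list=wg-wan address=1.2.3.3
add list=wg-wan address=2.1.3.4
add list=wg-wan address=2.1.3.5
/ip/firewall/nat
add chain=dstnat dst-address-list=wg-wan protocol=udp dst-port=13231 action=dst-nat to-addresses=192.168.222.1
```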

As to why the usual way of using output mangle results in the correct src address not being set, even though the correct routing table is selected: I wrote a really long post on this some time ago that takes you through the packet flow and the decisions the router makes step-by-step. You are only a search away. That thread also explains why wireguard is special in this regard.

Also, if you actually plan on using this dst-nat and input src-nat solution, take care to use a firewall udp-timeout larger than your wg keepalive, and of course a longer udp-stream-timeout. I use 35s, 10s and 6m respectively.
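
As a sketch, those three values map onto the following settings. The interface name wg1 is an assumption, and the keepalive may just as well be configured on the client side instead:

```
/ip/firewall/connection/tracking
# Both conntrack timeouts are longer than the 10s keepalive
set udp-timeout=35s udp-stream-timeout=6m
/interface/wireguard/peers
# wg1 is a hypothetical interface name
set [find interface=wg1] persistent-keepalive=10s
```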

I think here using dst-address-type=local (instead of listing the dst-address entries) might work?

I tried the search, but that post is hard to find.

@CGGXANNX I use interface lists for this.

Hi Larsa and Lurker, I have been attempting to follow these entangled threads but am not making much headway, other than that Lurker seems to have come up with a way, regardless of scenario, to ensure that WireGuard connections work properly in a multi-WAN setup on RouterOS. Notice I avoid the “S” word, because in reality it is some bad-ass workaround.

One of the threads pointed out that the usual mangle method didn’t guarantee the proper WAN for output, which was ‘hacked’ by sindy so that the return from a secondary WAN would indeed go out that WAN instead of the primary. This required the usual mangling of the input and output chains for traffic into and out of a specific WAN, and then adding in sindy’s magic:
/ip firewall nat
add chain=dstnat dst-address-type=local in-interface=WAN2 protocol=udp dst-port=wg-port action=dst-nat to-addresses=ip.of.wan.1

Which, in effect, ensures that any WireGuard traffic arriving on WAN2 is dst-natted to the WAN1 address, and thus on the way out of the router the translation is undone, so the traffic leaves the router with the WAN2 IP.

++++++++++++++++++++++++++++++++++++++

The above seems to work with traffic arriving at the router, as opposed to traffic originating from the router.
Can you state conceptually what you are doing that is different from the above… and what you are adding to the mix… (in terms that a non-IT person can grasp)?

+++++++++++++++++++++++++++++++++++

As an aside, MT recently introduced a responder checkbox. It is tied to a peer, which I think incorrectly suggests that the issue lies with the remote peer, whereas I think it tries to fix a problem at the local router and thus should be part of the WireGuard interface menu, not the peer menu.
Does anyone know what this selection actually does, and does it have any relationship to the above …

@Mimiko
My post that explains why WireGuard has this idiosyncrasy in its design (right here on this thread):
http://forum.mikrotik.com/t/routeros-blatantly-ignores-pref-src-can-this-really-be-a-bug/180360/15

And the one that explains the source address assignment in detail (you should probably scroll through the entire thread):
http://forum.mikrotik.com/t/mangle-policy-based-routing/181586/6

@anav
I don’t know if the question between the +++ lines was addressed to me, but: the workarounds presented by sindy and by me both rely on the same way of using dst-nat to get connection tracking to work. Mine is of course better (obviously :slight_smile: ) in that sindy’s fails if:

  • the wan1 interface goes down (the address we dst-nat to is lost)
  • the default route’s pref src changes (for example because of wan1 going down)

My not-exactly-“s” works around these by: (1) dst-natting to a local address that can’t go down and (2) using the input src-nat ensuring that wg underlay output packets are addressed to the (in some ways virtual) 192.168.222.2 address, the route to which (and its pref-src) cannot change.

Of course sindy’s method preserves the source address, which is nicer. On the other hand in VRF-heavy scenarios input src-nat does wonders for overloaded source IPs (in fact conntrack cannot be correctly maintained without it).

Thanks much lurker, that is most helpful for me and will take the time to digest traffic flows as you have manipulated them!!

Any thoughts on what the responder checkbox is trying to do??

Sorry, I meant to answer that one as well, just forgot by the time I got there.

Responder is a very useful feature. Wireguard in its default form (that generally should not be reconfigured by the user) behaves in the following way:

  • the purpose of handshakes is to create an ephemeral symmetric key for data encryption (and MAC generation)
  • an ephemeral key is discarded (invalidated, overwritten, shredded) 180s after its creation
  • if a packet is to be sent to the other side (including keepalives) and the peer entry has a valid ephemeral key, then the data packet is sent using this key
  • if when sending a packet there is an ephemeral key, but it is more than 150s old (180s - 30s), a new handshake is initiated - until the new handshake completes, as long as the existing key has not been discarded, the existing key is used for communication
  • if when sending a packet, there is no ephemeral key (that is valid) a handshake is initiated - traffic will resume when (after a successful handshake) a new ephemeral key is established

WG treats the two sides of the tunnel the same (both are peers), but it can’t be overlooked that in many instances it is used as part of a client-server architecture. In these instances it is useful to preclude the server from initiating repeated handshakes. Marking the client peer on the server side as responder=yes does this - that is, it simply instructs the side where it is set to never initiate a handshake. (Note that wg maintains a “current endpoint” ip/port separate from the configured “endpoint” address/port, that is: without this setting, if the client is no longer available, the server will send repeated and endless handshake requests to the last address of the client as long as it sees packets that should be routed to that client.)
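
A minimal sketch of marking a client peer as responder-only on the server. Two assumptions: the peer is identified by a hypothetical comment client1, and the CLI property is spelled is-responder (WinBox shows it as the Responder checkbox). If your version names it differently, tab completion under /interface/wireguard/peers will show the exact spelling.

```
/interface/wireguard/peers
# client1 is a hypothetical comment; is-responder is the assumed property name
set [find comment="client1"] is-responder=yes
```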

There is a bit of a misconception that keepalive should only be enabled on the client side and disabled on the server. This is partially true in that this is a correct way of configuring things. However it is not mandatory to do so, and it is also correct to set keepalives on the responder=yes side. This simply has the effect of the server sending keepalives as well, which will be silently dropped if no handshake from the client side is initiated for some time (because with responder=yes, the server side will never initiate a handshake.)