ok, I’m stumped on this.
I’ve had WG working on a specific 5009 for months. The site added a second ISP and wanted to be able to use WG on both connections.
I enabled ECMP on both default routes, and added the following mangles to keep ISP1 and ISP2 connections to their own lane.
If I try to connect to this WG instance via ISP1 from my wired ISP, the handshake fails. If I try to connect to this WG instance via ISP1 from my LTE data, the handshake works every time.
If I try to connect to this WG instance via ISP2 from my wired ISP, the handshake works. If I try to connect to this WG instance via ISP2 from my LTE data, the handshake works every time.
To make sure I did not have a WG config error or any other kind of user overlap, I created a brand new WG instance on a separate subnet, allowed it out, allowed the WG ports in… and got the exact same symptoms.
I’m struggling to find why this would block me on WG on my hard wired ISP vs say my mobile data.
Any suggestions on what to check?
Since the mangle rules alone do not do all the work, the rest of the config must be analyzed too: a missing interface on the WAN list, an error in a routing table, etc.
If you need to have your car checked out, do you just bring only the tank cap to the mechanic?
Unlike a car, I can’t bring you the physical device. There’s a lot of superfluous config on this device that is outside the purpose of WG, and I’d gladly post the parts that are requested. To sanitize the entire config would take hours.
Both ISP interfaces are on the WAN list. I’m not flawless and I’m more than willing to admit it’s possible I made a config mistake. If you have specific sections you’d like to see, I’ll sanitize those and post them.
The gist of it is that the response to a query is already bleeding out of WAN1 instead of WAN2 for the initial handshake.
Therefore we tell the router that all WG traffic coming in on ether2 (WAN2) is dst-natted to WAN1.
Thus when the router incorrectly replies from WAN1 to traffic that arrived on that port, it un-dst-nats the reply and sends it back via WAN2 (ether2).
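A minimal sketch of the dst-nat idea described above. The listen port 13231 and the address 203.0.113.10 (standing in for the WAN1 address) are placeholders, not values from this thread, and ether2 is assumed to be WAN2:

```
/ip firewall nat
# Placeholder values: port 13231 and 203.0.113.10 are assumptions for illustration.
# WG packets arriving on WAN2 (ether2) are dst-natted to the WAN1 address.
# Connection tracking reverses the translation on the reply, so the response
# carries the WAN2 address as its source again.
add chain=dstnat action=dst-nat in-interface=ether2 protocol=udp dst-port=13231 to-addresses=203.0.113.10 comment="WG on WAN2: dst-nat to WAN1 address"
```

This only fixes the source address of the reply. Getting the reply out of the right interface presumably still depends on the connection and routing marks.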
Now, here’s a thing we’ve noticed in our testing: reboots cause the ‘active’ WG interface to change. So for my issue, yesterday WAN1 would hardly work at all for WG. However, after a reboot last night, WG works fine on WAN1 but doesn’t work at all on WAN2 today.
No, this only affects the WAN that is second in natural priority (the failover WAN, so to speak).
You still need some mangling going on, so I would have to see the config to comment further.
What parts of the config do you need and I’ll gather those?
My thoughts are…
interface wireguard (I’ll need to hide the private keys)
ip route
ip addresses (I’ll change the public IP)
ip firewall filter
ip firewall nat
ip firewall mangle
Why did you completely disregard the advice provided??
To be fair, I didn’t look at your whole config but will look tonight.
(1) Why the four or five wireguard interfaces? I like simple and clean. Unless there is a reason to have four or five, you only need one interface!
You can actually assign multiple IP subnets to a single wireguard interface.
The only reason you would need multiple INTERFACES is if there was any router traffic that needed to go out to the internet via the remote end.
Then you would have to use 0.0.0.0/0 in the peer’s allowed-address setting, and it is then effectively not possible to have more than one peer on that interface.
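As a rough illustration of the single-interface approach, here is a sketch with two subnets on one WireGuard interface. The subnets, peer addresses, and key placeholders are all hypothetical and not taken from the posted config:

```
/ip address
# Two example subnets bound to the same WireGuard interface (addresses are assumptions)
add address=10.10.10.1/24 interface=wireguard1 comment="road warriors (example)"
add address=10.10.20.1/24 interface=wireguard1 comment="site-to-site (example)"
/interface wireguard peers
# Each peer is limited to its own addresses via allowed-address
add interface=wireguard1 public-key="<road-warrior public key>" allowed-address=10.10.10.2/32
add interface=wireguard1 public-key="<remote-site public key>" allowed-address=10.10.20.2/32,192.168.50.0/24
```

Firewall rules can then distinguish the two groups by source subnet rather than by interface.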
(2) What a mess: you have a bridge but don’t use bridge VLAN filtering… as I stated, simplify.
There is nothing to be gained by the overly complex structure.
(3) For example, why do you have two management subnets? It makes no sense.
/ip pool
add name=dhcp_pool_mavico ranges=10.1.10.50-10.1.10.254
add name=dhcp_pool_administradores ranges=10.1.60.2-10.1.60.254
add name=dhcp_pool_management ranges=10.1.50.10-10.1.50.253
add name=dhcp_pool_alquiler1 ranges=10.1.1.2-10.1.1.254
add name=dhcp_pool_VPN ranges=192.168.89.10-192.168.89.254
add name=dhcp_pool_servidores ranges=10.0.0.2-10.0.0.254
add name=dhcp_pool_alquiler2 ranges=10.1.2.2-10.1.2.254
add name=dhcp_pool_alquiler3 ranges=10.1.3.2-10.1.3.254
add name=dhcp_pool_invitados ranges=10.1.100.2-10.1.100.254
add name=dhcp_pool_wifi_mavico ranges=10.1.80.2-10.1.80.254
add name=dhcp_pool_ILO ranges=10.0.1.2-10.0.1.6
add name=dhcp_pool_can&t ranges=10.1.15.2-10.1.15.18
add name=dhcp_pool_management_network ranges=10.0.2.2-10.0.2.254
add name=dhcp_pool_MINER ranges=10.0.250.2-10.0.250.4
add name=dhcp_pool_productora ranges=10.1.150.25-10.1.150.254
(4) RP filter set to strict is a no-no, especially in a multi-WAN scenario. It should be set to LOOSE.
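For reference, a short sketch of how the reverse-path filter can be checked and changed on RouterOS:

```
/ip settings
# Show the current value of rp-filter (no, loose, or strict)
print
# Loose mode only requires that some route to the source exists,
# rather than a route via the interface the packet arrived on
set rp-filter=loose
```

With two WANs, strict mode can drop packets that arrive on one uplink while the best route back to the source points at the other.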
This is a client router and I’m coming in after it’s been deployed and their last network guy left.
I agree the VLANs need to be cleaned up; it’s a mess, along with a bunch of other things.
I’d like this thread to focus on the WG issues with multi-WAN.
After our last discussion, I did find and change the RP filter to loose.
The client has had multiple WG interfaces since before I took them on.
wireguard1 - has both road-warrior and site-to-site tunnels. Meant to come in on WAN1.
wirenetlife - has both road-warrior and site-to-site tunnels. Meant to come in on WAN2.
wiregrelive - has only road-warrior tunnels. Meant to come in on WAN1.
wireguard2-test - was set up as a brand new test for me to see if the problem stemmed from the prior setup. This will be deleted.
While I’m confident I can merge wiregrelive and wireguard1 together, that doesn’t solve the core issues I’ve reported in this thread.
There are bandwidth and latency reasons why certain connections need to come in on WAN1 and not WAN2, and vice versa. Once I have these issues sorted, WG ports will only be allowed in via their respective WAN interfaces.
You asked for the config, I’ve provided it.
I listed the mangles above, and in the config. I’ll list them again in case you missed them.
/ip firewall mangle
add action=mark-connection chain=input comment="Keep eth1 inbound connections to eth1" connection-mark=no-mark in-interface=ether1 new-connection-mark=ISP1-IN passthrough=yes
add action=mark-routing chain=output comment="Keep eth1 connections to eth1 - mark route" connection-mark=ISP1-IN new-routing-mark=ISP1 passthrough=no
add action=mark-connection chain=input comment="Keep eth2 inbound connections to eth2" connection-mark=no-mark in-interface=ether2 new-connection-mark=ISP2-IN passthrough=yes
add action=mark-routing chain=output comment="Keep eth2 connections to eth2 - mark route" connection-mark=ISP2-IN new-routing-mark=ISP2 passthrough=no
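For context, routing marks like the ones above only take effect if matching routing tables and routes exist. A sketch of what that might look like on RouterOS v7, where the gateway addresses 198.51.100.1 (ISP1) and 192.0.2.1 (ISP2) are placeholders and not values from this config:

```
/routing table
# One FIB table per routing mark used in the mangle rules
add name=ISP1 fib
add name=ISP2 fib
/ip route
# Placeholder gateways; substitute the real next hop of each ISP
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=ISP1
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-table=ISP2
```

If either table or its default route is missing, marked replies have no per-ISP route to follow.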
I’ve added these DSTnat rules per your recommendation.
During my tests from my home ISP, I can handshake with all WG instances via WAN1 and WAN2; however, I cannot route any actual traffic (like checking ipinfo.io).
However, if I test using my cell phone with the exact same WG peer profiles, the WAN2 instances work, but only wiregrelive works via ISP1.
It’s completely inconsistent.
My only guess is that there’s some kind of hashing algorithm related to ECMP that’s applied to each WG instance in RouterOS, and once that hash has been made, it stays resident in memory until the unit is rebooted. If we reboot the device, WG could work fine via WAN1, or only work via WAN2.