Good afternoon
I configured WireGuard between two MikroTik routers:
Router-A:
CCR2004-1G-2XS-PCIe v7.15.2
It reaches the Internet over a PPPoE connection with a static IP address.
With these settings the tunnel works and pings pass.
The problems start after rebooting Router-A. The workaround is to change the port from 52203 to any other, for example 52204, but after the next reboot the tunnel stops working again until the port is changed once more.
Has anyone had this problem?
Why might this happen?
However, I would turn off persistent-keepalive on Router-B.
Perhaps its attempts to connect back to the IP/port it was last connected to are doing something.
Also, you can check the counters on the firewall rule on Router-B to see whether packets are actually getting in, and enable logging on that rule
to see where they are coming from.
Perhaps for more detailed logging:
You could enable debug logging for WireGuard under system/logging.
You could create a mangle passthrough firewall rule that counts (and perhaps briefly logs) every packet coming into port 52203.
Another slight possibility:
Router-B is not happy because Router-A is still on the same port, but the WG timestamps from Router-A went backwards.
Perhaps WireGuard on Router-A needs a restart after Router-A has got the correct time.
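To illustrate the timestamp idea: as far as I understand the WireGuard protocol, each handshake initiation carries a timestamp, and the responder ignores an initiation whose timestamp is not newer than the greatest one it has already seen from that peer. Below is a minimal Python sketch of that rule only. The names `PeerState` and `accept_initiation` and the clock values are made up for illustration, and plain seconds stand in for the real timestamp format; this is not RouterOS code.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Simplified per-peer state kept by the responder (hypothetical model)."""
    greatest_timestamp: float = float("-inf")


def accept_initiation(peer: PeerState, timestamp: float) -> bool:
    """Accept an initiation only if its timestamp is newer than any seen before."""
    if timestamp <= peer.greatest_timestamp:
        return False  # looks like a replay or a clock that went backwards
    peer.greatest_timestamp = timestamp
    return True


router_a = PeerState()
# Router-A with a correct clock (illustrative value).
before_reboot = accept_initiation(router_a, 1_720_000_000.0)
# Router-A just rebooted, clock not yet synchronised (illustrative value).
after_reboot = accept_initiation(router_a, 30.0)
# Router-A after its clock has been corrected.
after_time_sync = accept_initiation(router_a, 1_720_000_300.0)

print(before_reboot, after_reboot, after_time_sync)
```

In this model the initiation sent with the stale clock is dropped, and handshakes succeed again once the clock is ahead of the last accepted timestamp.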
I disabled persistent-keepalive on Router-B and changed the port on Router-A to 13231, and pings pass. After I reboot Router-A, pings no longer pass, although I can see from the Firewall and Mangle counters that packets are arriving:
Router-A
wireguard info:
wireguard203: [Router-B] public_key: Handshake for peer did not complete after 5 seconds, retrying (try 2)
wireguard debug:
wireguard203: [Router-B] public_key: Sending handshake initiation to peer (Router-B_ip:52203)
Router-B
wireguard debug:
wireguard203: [Router-A] public_key: Sending handshake response to peer (Router-A_ip:13231)
firewall info:
prerouting: in:ether1 out:(unknown 0), connection-state:established src-mac 08:96:ad:35:ca:d6, proto UDP, Router-A_ip:13231->Router-B_ip:52203, len 176
firewall info:
input: in:ether1 out:(unknown 0), connection-state:established src-mac 08:96:ad:35:ca:d6, proto UDP, Router-A_ip:13231->Router-B_ip:52203, len 176
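If many such lines need to be compared, a small script can pull out the chain, addresses and ports. This is only a sketch in Python, assuming the log lines keep exactly the layout quoted above; the pattern and the helper name `parse_log_line` are my own and not part of RouterOS.

```python
import re

# Pattern for firewall log lines in the layout quoted above
# (assumes "addr:port" pairs without extra colons, i.e. IPv4 style).
LOG_RE = re.compile(
    r"^(?P<chain>\w+): in:(?P<in_if>\S+) .*proto (?P<proto>\w+), "
    r"(?P<src>[^\s:]+):(?P<sport>\d+)->(?P<dst>[^\s:]+):(?P<dport>\d+), "
    r"len (?P<length>\d+)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("sport", "dport", "length"):
        fields[key] = int(fields[key])
    return fields


sample = (
    "prerouting: in:ether1 out:(unknown 0), connection-state:established "
    "src-mac 08:96:ad:35:ca:d6, proto UDP, "
    "Router-A_ip:13231->Router-B_ip:52203, len 176"
)
parsed = parse_log_line(sample)
print(parsed["chain"], parsed["src"], parsed["sport"], "->", parsed["dst"], parsed["dport"])
```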
The problem only occurs between Router-A and Router-B.
I have another router that connects to the Internet via a USB modem and establishes a similar WireGuard tunnel to Router-B; there are no problems with it. This connection works even after a reboot.
There is also a 4th router, which is located behind NAT and also establishes a WireGuard tunnel to Router-B. No problem.
I think I have seen something similar in the past: if you turned off the WireGuard interface and then turned it back on fairly soon after, it didn’t seem to reset to its defaults properly; it seemed to remember at least some of the running state it had before it was turned off.
If you left it off for a while (don’t know how long), it then did seem to reset properly.
Maybe it is supposed to work this way??
I haven’t looked at this for a long time now.
Maybe it would be nice to have a cold-restart button or setting for the whole WG interface and/or for individual peers.
a) Changing the port helps
b) Idling for some time helps
At this point it looks like a connection tracker with an old connection stuck in it.
c) Non-routable IP address
There’s surely a connection tracker - your ISP does NAT, and in most cases that requires a connection tracker.
d) If the non-routable IP address is new each time you connect PPPoE?
If the address is new every time, and older connections are not purged from the NAT connection tracker when you reconnect PPPoE (they should be purged), the connection tracker keeps using an old NAT mapping. That mapping redirects the packets destined for the WG port on your routable static IP address to your old non-routable IP address, so the packets can’t reach your device (and “Counter = 0” confirms it). Changing the port bypasses the old connection, and idling lets the connection time out.
If that’s the case, it’s unfortunately not under your control, and getting the ISP to fix their NAT could be next to impossible.
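The suspected behaviour can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the class `ToyNat`, the 600 s idle timeout, the private addresses, and above all the rule that an existing entry wins over a new outbound packet. It is not a description of the ISP's actual equipment.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT = 600  # seconds; hypothetical value


@dataclass
class ToyNat:
    """Toy NAT connection tracker: (port, remote) -> (private_ip, last_seen)."""
    table: dict = field(default_factory=dict)

    def _expire(self, now):
        # Drop entries that have been idle longer than the timeout.
        self.table = {k: v for k, v in self.table.items()
                      if now - v[1] <= IDLE_TIMEOUT}

    def outbound(self, private_ip, port, remote, now):
        self._expire(now)
        key = (port, remote)
        # Assumed flaw: an existing (stale) entry is refreshed, not replaced.
        kept_ip = self.table[key][0] if key in self.table else private_ip
        self.table[key] = (kept_ip, now)

    def inbound(self, port, remote, now):
        """Return the private address an inbound packet is forwarded to."""
        self._expire(now)
        key = (port, remote)
        if key not in self.table:
            return None
        private_ip = self.table[key][0]
        self.table[key] = (private_ip, now)
        return private_ip


nat = ToyNat()
remote = "Router-B_ip:52203"
nat.outbound("10.0.0.5", 52203, remote, now=0)       # tunnel comes up
# Router-A reboots; PPPoE hands out a new private address 10.0.0.9.
nat.outbound("10.0.0.9", 52203, remote, now=60)
stale = nat.inbound(52203, remote, now=61)           # still the old address
nat.outbound("10.0.0.9", 52204, remote, now=70)      # a) change the port
new_port = nat.inbound(52204, remote, now=71)
nat.outbound("10.0.0.9", 52203, remote, now=700)     # b) idle past the timeout
after_idle = nat.inbound(52203, remote, now=701)

print(stale, new_port, after_idle)
```

In this model the replies on the original port keep going to the old private address, while a new port or a long enough idle period produces a fresh mapping to the new address.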
Well, rather than a solution it is a (please allow me) ugly workaround if it takes 600 s to execute. Better than no connection, still …
In an earlier post you mentioned that changing the port to another and then restoring the original port worked to re-establish the connection, maybe doing that in the script would take less than 10 minutes?
In the near future the equipment will move to another location and PPPoE will no longer be used.
Hopefully the problem really is with the ISP and that will solve it.