Wireguard bug: connections via WG tunnels suddenly failing

I have been experimenting for 3 months or so with the WireGuard implementation running on an RB450G. If it works, it works like a charm, but I regularly see clients suddenly failing to route via the tunnel, without my having touched the configuration on either side. The incoming connection is shown on the server, the tx counter on the client increases, but rx stays at 92 bytes, after a few seconds 120 bytes, etc. Nothing I tried fixed this; only deploying a new client key pair helped.

Meanwhile I found a simpler way: if I change e.g. one character of the client’s public key on the server, save the key, and then change it back to the original, correct value and save it again, everything works fine again. It seems like the internal representation of the client’s public key on the server somehow becomes corrupted after a non-deterministic time. I cannot reproduce this phenomenon on demand, and it does not happen at regular intervals either.

Maybe this observation helps whoever is in charge of the wireguard implementation.
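For anyone who wants to try the key workaround without hand-editing base64, here is a minimal Python sketch that produces a temporary key differing from the original in exactly one character while still decoding to 32 bytes. The function name `nudge_key` is made up for illustration, and whether RouterOS accepts any particular altered key is an assumption based only on the observation above.

```python
import base64


def nudge_key(public_key: str) -> str:
    """Return a copy of a WireGuard public key that differs in one character.

    Hypothetical helper: it flips the lowest bit of the first key byte, which
    changes only the second base64 character. Calling it twice restores the
    original key.
    """
    raw = bytearray(base64.b64decode(public_key, validate=True))
    if len(raw) != 32:
        raise ValueError("a WireGuard public key decodes to exactly 32 bytes")
    raw[0] ^= 0x01  # flip one bit; the result is still 32 bytes long
    return base64.b64encode(bytes(raw)).decode("ascii")
```

Usage would be: save `nudge_key(original)` as the peer’s public key on the server, then save `original` again.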

I’m currently debugging something similar. A couple of questions for you:

  1. Does disabling the WG interface and re-enabling it fix the problem?
  2. Can the RB ping the client in this broken state?

I just started using a WG tunnel between an RB450Gx4 acting as a server behind a CCR1009 router; the other end is an RB4011 behind a consumer router.
I don’t have enough experience yet to know if this happens to me. What specific log entry could be set up to pinpoint when this happens?
Otherwise there is way too much noise in the logs.

Sorry, I am not here too often … as to 1, I did not try this, will do so when it happens next time; however, since a reboot did not solve the problem, I guess the answer here is “no”. The answer to 2 is “no”.

There seems to be no specific logging topic for WireGuard - in fact I haven’t seen any useful log entry in this case. Debugging WireGuard connections is really tough…

I also just added my iPhone as a WireGuard client to my server and the MT app works great over that.

I usually perform the following ritual when WG is acting as a “client”:

  1. Disable/enable WG interface
  2. Ping the WG endpoint/server
  3. Ping the internal IP which should go over the tunnel

…and the tunnel magically comes back.

This is a problem with WG overall, not just on MT. Since it essentially just shoots packets over UDP and hopes for the best, even WG itself has little to no knowledge about the state of the tunnel… there’s no “connection” or “session”. It’s a blessing and a curse.
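One signal WireGuard does keep per peer is the time of the latest handshake. On a Linux peer with wireguard-tools, `wg show <interface> latest-handshakes` prints one line per peer with the public key and a Unix timestamp (0 if there has never been a handshake). Below is a minimal Python sketch that flags peers whose last handshake is older than 180 seconds, the protocol’s Reject-After-Time. The function name `stale_peers` and the idea of using this as a health check are my assumptions; RouterOS has no `wg` command, so this only applies to the non-MikroTik end, and an idle tunnel without persistent keepalive will also look stale.

```python
import time
from typing import List, Optional

# WireGuard stops using session keys older than this many seconds.
REJECT_AFTER_TIME = 180


def stale_peers(handshake_dump: str,
                now: Optional[float] = None,
                max_age: int = REJECT_AFTER_TIME) -> List[str]:
    """Return public keys whose latest handshake is missing or too old.

    `handshake_dump` is the text printed by
    `wg show <interface> latest-handshakes` for a single interface.
    """
    now = time.time() if now is None else now
    stale = []
    for line in handshake_dump.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        peer, timestamp = parts[0], int(parts[1])
        if timestamp == 0 or now - timestamp > max_age:
            stale.append(peer)
    return stale
```

Run periodically while traffic is flowing, this at least gives a timestamp for when a tunnel went bad.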

Well, the only reason it doesn’t work for me is when I have an incorrect configuration. My limited knowledge of networking and VPNs doesn’t help LOL.
The best tools are sniffing traffic on the relevant ports along the various interfaces, as well as one’s logs (assuming the key firewall rules were set to be logged).
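When sniffing, it helps to know what the UDP payloads are. WireGuard messages start with a one-byte type followed by three zero bytes, and the handshake messages have fixed sizes: 148 bytes for an initiation, 92 for a response, 64 for a cookie reply. The 92 bytes reported in the first post match the size of a handshake response, assuming the client’s rx counter counts WireGuard message bytes. Here is a minimal Python sketch to label captured payloads; the function name `classify` and the label strings are mine.

```python
# Fixed-size WireGuard message types: type byte -> (name, total size in bytes)
MESSAGE_TYPES = {
    1: ("handshake initiation", 148),
    2: ("handshake response", 92),
    3: ("cookie reply", 64),
}


def classify(payload: bytes) -> str:
    """Label a UDP payload by WireGuard message type (hypothetical helper)."""
    # Every message starts with a type byte and three reserved zero bytes.
    if len(payload) < 4 or payload[1:4] != b"\x00\x00\x00":
        return "not WireGuard"
    msg_type = payload[0]
    if msg_type in MESSAGE_TYPES:
        name, size = MESSAGE_TYPES[msg_type]
        return name if len(payload) == size else "malformed " + name
    # Type 4 is transport data: 16-byte header plus 16-byte auth tag minimum.
    if msg_type == 4 and len(payload) >= 32:
        return "keepalive" if len(payload) == 32 else "transport data"
    return "not WireGuard"
```

If a capture shows only initiations going one way and responses coming back, but never any transport data, the handshake is completing on the wire and the failure lies in what happens afterwards.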

The problem I described was with the MikroTik router being the WG server; the clients are diverse (Android, iPhone, Win10), so the problem is not bound to a specific client.

Just wanted to say thanks for documenting this, I was pulling out my hair trying to understand why my wireguard config wasn’t working after moving it from one router to another. The symptoms were exactly as you describe, and I found that applying the same fix (changing each client’s public key and then changing back to the original) fixed the issue for me also. I’ll see if this issue comes back after a while.