Frustrating, out-of-the-blue malfunction

This happened completely out of the blue. That is, I have made no changes to any configuration.

The only thing that could possibly be related is that we have had rolling internet outages for the past several days.

I have 8 sites – call them A, B, C, D, E, F, G, H

All connected via Wireguard, and all have been working perfectly for many months.

All sites can ping the Mikrotik devices at all other sites.

But devices behind the Mikrotik device at site A cannot ping the Mikrotik device at site B, or any devices behind it.

All other devices at all sites can ping all other devices at all sites (VLAN and filtering rules notwithstanding).

I understand my description is a little abstract, so here it is in more concrete and explicit terms:

Site A has a Ubiquiti UDMPro and a LAN IP range of 192.168.0.1/24. Behind the UDM is a hEX at 192.168.0.11 that handles all Wireguard connections.

Site B has an RB-5009 and a LAN IP range of 192.168.2.2/24.

From the 5009 itself, and from all devices connected to it on the 192.168.2.0/24 LAN at site B, I can ping all devices in the 192.168.0.0/24 LAN (site A).

From the hEX at site A I can ping everything in the 192.168.2.0/24 LAN. But, from devices on the LAN at 192.168.0.0/24 (site A), I cannot ping any device at 192.168.2.0/24.
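To make the addressing explicit, here is a minimal Python sketch using the standard `ipaddress` module. It only restates the subnets quoted above and checks whether a destination is on the local LAN or needs a router hop. The helper name `is_on_link` is made up for illustration, and nothing here touches the routers.

```python
import ipaddress

# Addresses exactly as quoted in the post (host bits set, /24 prefix).
SITE_A = ipaddress.ip_interface("192.168.0.1/24").network
SITE_B = ipaddress.ip_interface("192.168.2.2/24").network


def is_on_link(lan, dst):
    """Return True if dst is inside lan, i.e. no router hop is needed.

    Hypothetical helper, for illustration only.
    """
    return ipaddress.ip_address(dst) in lan


print(SITE_A)  # 192.168.0.0/24
print(SITE_B)  # 192.168.2.0/24

# The hEX is on the site A LAN, so site A hosts reach it directly.
print(is_on_link(SITE_A, "192.168.0.11"))  # True

# Anything in site B is off-link for site A hosts and must be routed.
print(is_on_link(SITE_A, "192.168.2.222"))  # False
```

So any ping from a site A device to 192.168.2.x has to be handed to a router on the 192.168.0.0/24 LAN before it can reach the Wireguard tunnel, whereas a ping that starts on the hEX itself does not.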

I tried using packet sniffer to see where the packets go and the results are even more confusing:

From a Windows PC at 192.168.0.155, if I ping 192.168.2.222 (which is not assigned to any device at site B), I can packet sniff on the hEX and see the ping go out the WG interface. But I also see tons of frames coming from 192.168.0.1 (the UDMPro) to 192.168.2.222 (the non-existent device).

I tried rebooting the UDM, the hEX, the RB-5009 multiple times.

I tried different devices in the 192.168.0.0/24 network and the same behavior occurs.

My hunch is that the problem is something in the UDM. I haven’t changed anything, but maybe an automatic upgrade did something.

Other details:

I’ve SSH’d into the UDM and can run traceroute.

Traceroute to 192.168.30.2 (another site) shows the frame going to 192.168.0.11 (the local hEX handling the Wireguard connections) and then to 192.168.30.2.

When I traceroute to 192.168.2.2, the frame gets to 192.168.0.11 and doesn’t get any farther (asterisks only).

Anyone have any ideas?

I know I have not posted the configs; I'm hoping for a more conceptual approach (at least to start).

I ran the dyndns update script on both routers, rebooted, and now it works.

Wireguard uses the dyndns hostname as the remote endpoint.
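One way to test that theory from any PC is to compare what the dyndns hostname currently resolves to with the endpoint address the peer is actually using. Below is a rough Python sketch using only the standard library. The function names are invented for illustration, and the endpoint address in use has to be read off the router by hand; nothing here queries RouterOS.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that hostname resolves to right now."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def endpoint_is_stale(resolved, endpoint_in_use):
    """Return True if the address the peer is using is no longer in DNS.

    Hypothetical helper: endpoint_in_use is whatever address the router
    shows for the peer, copied in manually.
    """
    return endpoint_in_use not in resolved


# Illustration with documentation-range addresses (not my real ones):
print(endpoint_is_stale(["203.0.113.5"], "203.0.113.9"))  # True
print(endpoint_is_stale(["203.0.113.5"], "203.0.113.5"))  # False
```

In practice, I would call `resolve_ipv4` with the real dyndns hostname of the remote site and pass the result to `endpoint_is_stale` along with the endpoint address shown on the router. If it returns True right after an outage, that would support the stale-endpoint theory. If it returns False while the problem is present, the cause is probably somewhere else.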

I suspect that’s what the problem was.

I spoke too soon.

After several days, the exact same problem occurred.

This time, running the dyndns script on both routers, and then restarting the routers, has not fixed the problem.

Same exact symptoms.