Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Lost Connectivity

Sat Mar 11, 2023 9:07 pm

It's been a strange day in MT-Land.

We had a substantial snow storm overnight.

At 7:40am a remote hEX dropped off the WireGuard tunnel. I was still able to ping the router at its public IP address and the users there still had Internet access.

I tried pinging both the WG and the LAN private addresses and did not get a response. I was unable to use Winbox. The local hEX stopped WG handshaking.

I had someone there (100 miles away) reboot and it worked for 40 minutes, then the same symptoms.

Had them reboot again and it stayed working.

Now, 6 hours later, same exact symptoms at a different site (2 miles from the first remote site).

I can ping the public address and the devices at the site still have Internet access, but I have no access to the hEX.

When I regained access to the first hEX, I opened an SSH session to it and saw startup messages saying that the DHCP client had lost its IP address on ether1. Ether1 is the WAN port on the hEX and is connected to the cable internet provider's modem.

Any suggestions on how to continue troubleshooting this?
 
anav
Forum Guru
Posts: 19323
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Lost Connectivity

Sat Mar 11, 2023 9:17 pm

So when the HOST router has connection issues (host = the server for the initial handshake), the client router will attempt to reconnect to the host router.
This also happens when a dynamic WAN IP changes.
So the MikroTik client attempts to find the endpoint again.

The problem arises if the endpoint address is not available right away: the WireGuard client stops trying to connect. One has to literally turn off the WireGuard interface at the client and turn it back on again.
This also happens if the WireGuard router is too slow to resolve the new DYNDNS name (i.e. the WireGuard connection attempt happens first).
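
As a rough sketch of that manual off/on toggle, something like the following could be run on the client. The interface name wireguard1 is only a placeholder (an assumption, not taken from anyone's config here), and the 2 second pause is an arbitrary choice:

```routeros
# Assumption: the client's WireGuard interface is named "wireguard1"; substitute your own name.
/interface/wireguard/disable [find name="wireguard1"]
# Short, arbitrary pause before bringing the interface back up
:delay 2s
/interface/wireguard/enable [find name="wireguard1"]
```

This is the blunt approach: it interrupts every peer on that interface, not just the one that stopped handshaking.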

I have asked MT to address this internally on any MT device acting as a client. In other words, if the MT device has keepalive set on a peer, it should attempt to reconnect to the peer not just once but on an escalating schedule (right away, after 1 min, after 5 min, after 30 min, after 1 hour, after 6 hours, etc., probably stopping at 24 hours).

In any case you can find scripts people have written to overcome this known issue (that sadly MT refuses to deal with and thus eventually hits everyone like yourself like a slap in the face!!)
See Para 6 - viewtopic.php?t=182340
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Lost Connectivity

Sat Mar 11, 2023 9:33 pm

Wow -- great!

So, Easy, Basic, Advanced, Elegant, or Brutal?

I think I'll go with Elegant.
:foreach i in=[/interface/wireguard/peers/find where disabled=no endpoint-address~"[a-z]\$"] do={
  :local LastHandshake [/interface/wireguard/peers/get $i last-handshake]
  :if (([:tostr $LastHandshake] = "") or ($LastHandshake > [:totime "5m"])) do={
    /interface/wireguard/peers/set $i endpoint-address=[/interface/wireguard/peers/get $i endpoint-address]
  }
}
I added a line to make a log entry, but $i is just a number, so I'm trying to figure out how to reference it back to something recognizable like the contents of comment.
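
One way this might be done, assuming each peer has a comment set, is to read the comment with the same get call the script already uses for last-handshake. This is only a sketch; the variable names peerComment and peerEndpoint are my own:

```routeros
:foreach i in=[/interface/wireguard/peers/find where disabled=no endpoint-address~"[a-z]\$"] do={
  # $i is the peer's internal ID; look up human-readable properties from it
  :local peerComment [/interface/wireguard/peers/get $i comment]
  :local peerEndpoint [/interface/wireguard/peers/get $i endpoint-address]
  # Assumption: the comment is non-empty; otherwise only the endpoint identifies the peer
  :log info ("WG peer '" . $peerComment . "' has endpoint " . $peerEndpoint)
}
```

If a peer has no comment, the endpoint address alone still identifies it in the log.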
 
Josephny
Member
Topic Author
Posts: 495
Joined: Tue Sep 20, 2022 12:11 am

Re: Lost Connectivity

Mon Mar 13, 2023 12:50 pm

I added two lines to create a log entry whenever the endpoint address is re-entered because the last handshake was more than 5 minutes ago:

:foreach i in=[/interface/wireguard/peers/find where disabled=no endpoint-address~"[a-z]\$"] do={
  :local LastHandshake [/interface/wireguard/peers/get $i last-handshake]

  # Added this: keep the endpoint address so it can be reused and logged
  :local endpoint [/interface/wireguard/peers/get $i endpoint-address]

  :if (([:tostr $LastHandshake] = "") or ($LastHandshake > [:totime "5m"])) do={
    # Re-set the endpoint address to the same value so its DNS name is reloaded
    /interface/wireguard/peers/set $i endpoint-address=$endpoint

    # Added this:
    :log info "WG-iface-restart script found WG peer with last handshake greater than 5 minutes; reset the endpoint-address to reload DNS of endpoint: $endpoint"
  }
}
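
The script only helps if it runs periodically. A minimal sketch of a scheduler entry, assuming the script has been saved under /system/script with the name wg-endpoint-refresh (a hypothetical name) and that a 1 minute interval is acceptable:

```routeros
# Assumption: the script above is stored as /system/script "wg-endpoint-refresh".
# The name and the 1m interval are illustrative choices, not from the linked thread.
/system/scheduler/add name="wg-endpoint-refresh" interval=1m \
    on-event="/system/script/run wg-endpoint-refresh" \
    comment="Re-resolve WG peer endpoints when handshakes go stale"
```

The interval is a judgment call: because the script only acts when the last handshake is older than 5 minutes, running it more often than that mainly shortens the delay before a stale peer is noticed.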
