Community discussions

MikroTik App
 
kelarlee
newbie
Topic Author
Posts: 29
Joined: Thu Dec 27, 2018 5:48 pm

Ipsec stuck connections problem

Sun Nov 27, 2022 12:57 pm

Hi I have a strange issue with my IPSEC tunnels on my Mikrotiks and trying to solve it without any luck for long time :( I have different MT hardware ( 2 CCR-1009(6.48.6), 2 RB3011(6.48.6), CHR(7.5)) and same issue on all of them. Also on all MT i have dual WAN setup with failover and all connected with L2TP+Ipsec site-to-site and several of them with cloud infrastructure via IPSEC tunnel to VMware Edge router. The main problem that sometimes when my routers losing WAN connection via main ISP and changing to reserve ISP my VPN connections not reconnecting. For some reason not all of them not reconnecting (within the same router), sometimes one IPSEC tunnel reconnecting but another not. Log showing this:
09:01:52 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:924e9e8599605026:54a2647d6ee30774 rekey:1 
09:01:53 l2tp,ppp,info CHR: initializing... 
09:01:53 l2tp,ppp,info CHR: connecting... 
09:01:53 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:01:53 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:e2b4b521ad9c6f05:8fc0149b947757b3 
09:02:17 l2tp,ppp,info CHR: terminating... - session closed 
09:02:17 l2tp,ppp,info CHR: disconnected 
09:02:18 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:e2b4b521ad9c6f05:8fc0149b947757b3 rekey:1 
09:02:20 l2tp,ppp,info CHR: initializing... 
09:02:20 l2tp,ppp,info CHR: connecting... 
09:02:20 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:02:20 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:df9dae4095d5c01f:ed458fd6635ad8d5 
09:02:44 l2tp,ppp,info CHR: terminating... - session closed 
09:02:44 l2tp,ppp,info CHR: disconnected 
09:02:45 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:df9dae4095d5c01f:ed458fd6635ad8d5 rekey:1 
09:02:50 l2tp,ppp,info CHR: initializing... 
09:02:50 l2tp,ppp,info CHR: connecting... 
09:02:50 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:02:51 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:01537ce2a42cf634:7e9a4790f01b2218 
09:03:14 l2tp,ppp,info CHR: terminating... - session closed 
09:03:14 l2tp,ppp,info CHR: disconnected 
09:03:15 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:01537ce2a42cf634:7e9a4790f01b2218 rekey:1 
09:03:24 l2tp,ppp,info CHR: initializing... 
09:03:24 l2tp,ppp,info CHR: connecting... 
09:03:24 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:03:25 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:e93478b40170688f:c1e9b60e99d4c277 
09:03:48 l2tp,ppp,info CHR: terminating... - session closed 
09:03:48 l2tp,ppp,info CHR: disconnected 
09:03:49 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:e93478b40170688f:c1e9b60e99d4c277 rekey:1 
09:03:58 l2tp,ppp,info CHR: initializing... 
09:03:58 l2tp,ppp,info CHR: connecting... 
09:03:58 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:03:59 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:529291b79a2248c1:e73000f0084d83d8 
09:04:22 l2tp,ppp,info CHR: terminating... - session closed 
09:04:22 l2tp,ppp,info CHR: disconnected 
09:04:23 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:529291b79a2248c1:e73000f0084d83d8 rekey:1 
09:04:32 l2tp,ppp,info CHR: initializing... 
09:04:32 l2tp,ppp,info CHR: connecting... 
09:04:32 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:04:33 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:3bc36bc854ed620e:ea87335190839cf2 
09:04:56 l2tp,ppp,info CHR: terminating... - session closed 
09:04:56 l2tp,ppp,info CHR: disconnected 
09:04:57 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:3bc36bc854ed620e:ea87335190839cf2 rekey:1 
09:05:06 l2tp,ppp,info CHR: initializing... 
09:05:06 l2tp,ppp,info CHR: connecting... 
09:05:06 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:05:07 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:a4a61b9c2894a080:80869e3dac93d5be 
09:05:30 l2tp,ppp,info CHR: terminating... - session closed 
09:05:30 l2tp,ppp,info CHR: disconnected 
09:05:31 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:a4a61b9c2894a080:80869e3dac93d5be rekey:1 
09:05:33 system,info,account user kelar logged out from C8:1F:66:DF:E0:8C via winbox 
09:05:40 l2tp,ppp,info CHR: initializing... 
09:05:40 l2tp,ppp,info CHR: connecting... 
09:05:40 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:05:41 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:6f52361bcf6e696c:1b0575c926f1e79e 
09:06:04 l2tp,ppp,info CHR: terminating... - session closed 
09:06:04 l2tp,ppp,info CHR: disconnected 
09:06:05 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:6f52361bcf6e696c:1b0575c926f1e79e rekey:1 
09:06:14 l2tp,ppp,info CHR: initializing... 
09:06:14 l2tp,ppp,info CHR: connecting... 
09:06:14 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:06:15 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:d0e68a144b6529d4:5703ab202b06fa7f 
09:06:38 l2tp,ppp,info CHR: terminating... - session closed 
09:06:38 l2tp,ppp,info CHR: disconnected 
09:06:39 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:d0e68a144b6529d4:5703ab202b06fa7f rekey:1 
09:06:48 l2tp,ppp,info CHR: initializing... 
09:06:48 l2tp,ppp,info CHR: connecting... 
09:06:48 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:06:49 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:9e751284f1ea35af:84ba9b2e3b7aa6d9 
09:07:12 l2tp,ppp,info CHR: terminating... - session closed 
09:07:12 l2tp,ppp,info CHR: disconnected 
09:07:13 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:9e751284f1ea35af:84ba9b2e3b7aa6d9 rekey:1 
09:07:23 l2tp,ppp,info CHR: initializing... 
09:07:23 l2tp,ppp,info CHR: connecting... 
09:07:23 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:07:23 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:1b2c6aa4e1b23a25:d5b2148737be221a 
09:07:47 l2tp,ppp,info CHR: terminating... - session closed 
09:07:47 l2tp,ppp,info CHR: disconnected 
09:07:48 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:1b2c6aa4e1b23a25:d5b2148737be221a rekey:1 
09:07:57 l2tp,ppp,info CHR: initializing... 
09:07:57 l2tp,ppp,info CHR: connecting... 
09:07:57 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:07:57 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:838b10deef7e662f:e7d97d4280647c54 
09:08:21 l2tp,ppp,info CHR: terminating... - session closed 
09:08:21 l2tp,ppp,info CHR: disconnected 
09:08:22 ipsec,info ISAKMP-SA deleted =local ip[500]-=endpoint ip[500] spi:838b10deef7e662f:e7d97d4280647c54 rekey:1 
09:08:31 l2tp,ppp,info CHR: initializing... 
09:08:31 l2tp,ppp,info CHR: connecting... 
09:08:31 ipsec,info initiate new phase 1 (Identity Protection): =local ip[500]<=>=endpoint ip[500] 
09:08:31 ipsec,info ISAKMP-SA established =local ip[500]-=endpoint ip[500] spi:3d689c8be4da7058:669f106fa95f6a3b 
09:08:33 l2tp,ppp,info CHR: authenticated 
09:08:33 l2tp,ppp,info CHR: connected 
All i do is killing this 3 connection in connection tracker and VPN tunnel is establishing in 5 sec( https://imgur.com/s34lC5X ). DPD interval is 120 sec and 5 times to fail, but the problem is connection still not closing after DPD interval ends until i manually kill connection. Cant figure out why this happens and i cant find any dependence because every time it is different router and different VPN tunnel.
 
sindy
Forum Guru
Forum Guru
Posts: 10205
Joined: Mon Dec 04, 2017 9:19 pm

Re: Ipsec stuck connections problem

Sun Nov 27, 2022 1:31 pm

You've made most of the analysis yourself :)

The thing is that the lifetime of a UDP pinhole (tracked connection) is 3 minutes by default, and after detecting peer unavailability, the IPsec stack does terminate the existing connection and initiate a new one, but it uses the same local and remote address:port tuples so it hits the same pinhole and keeps refreshing its keepalive timer this way. And after the retransmission timer expires, the IPsec stack starts a new connection attempt much sooner than in 3 minutes.

The connection tracking remembers the reply-dst-address (which is a code name for the source address:port assigned to the initial packet of the connection after it has been eventually src-nated), so if the route changes due to WAN path unavailability, the packets from the local peer to the remote one keep leaving from the same reply-dst-address as before, which either makes the ISP drop them as they come from an "illegal" address, or in best case makes the responses get lost on their way from the remote peer to the local one because the WAN with that address is unavailable.

So if we leave aside cases where you have your own AS and BGP, tunnel-in-tunnel arrangements etc., there are just two options:
  • to periodically run a "garbage collector" script that removes any connections whose reply-dst-address is related to a currently unavailable route
  • to have tunnels via both WANs established permanently and only switch the traffic between them. This approach is quite easy with IPsec-encrypted IPIP, GRE, or L2TP, but not so much with bare IPsec (policy-based). In the latter case, it is possible to have a single policy linked to two peers simultaneously, but the remote peer(s) must be ready to cooperate - the configured policy is only negotiated with one of the peers at a time, so it must be dynamically created at one of the ends.
 
kelarlee
newbie
Topic Author
Posts: 29
Joined: Thu Dec 27, 2018 5:48 pm

Re: Ipsec stuck connections problem

Sun Nov 27, 2022 3:28 pm

You've made most of the analysis yourself :)

The thing is that the lifetime of a UDP pinhole (tracked connection) is 3 minutes by default, and after detecting peer unavailability, the IPsec stack does terminate the existing connection and initiate a new one, but it uses the same local and remote address:port tuples so it hits the same pinhole and keeps refreshing its keepalive timer this way. And after the retransmission timer expires, the IPsec stack starts a new connection attempt much sooner than in 3 minutes.

The connection tracking remembers the reply-dst-address (which is a code name for the source address:port assigned to the initial packet of the connection after it has been eventually src-nated), so if the route changes due to WAN path unavailability, the packets from the local peer to the remote one keep leaving from the same reply-dst-address as before, which either makes the ISP drop them as they come from an "illegal" address, or in best case makes the responses get lost on their way from the remote peer to the local one because the WAN with that address is unavailable.

So if we leave aside cases where you have your own AS and BGP, tunnel-in-tunnel arrangements etc., there are just two options:
  • to periodically run a "garbage collector" script that removes any connections whose reply-dst-address is related to a currently unavailable route
  • to have tunnels via both WANs established permanently and only switch the traffic between them. This approach is quite easy with IPsec-encrypted IPIP, GRE, or L2TP, but not so much with bare IPsec (policy-based). In the latter case, it is possible to have a single policy linked to two peers simultaneously, but the remote peer(s) must be ready to cooperate - the configured policy is only negotiated with one of the peers at a time, so it must be dynamically created at one of the ends.
Thank you very much for detailed explanation. Have both ipsec tunnels from different WAN permanently will not work, because VMware Edge router do not accept same local subnets in two different ipsec tunnels and as i know there is no solution how to create or disable it dynamically :( About "grabage collector" script is there any example so i can adapt to my config ? Thank you.
 
sindy
Forum Guru
Forum Guru
Posts: 10205
Joined: Mon Dec 04, 2017 9:19 pm

Re: Ipsec stuck connections problem  [SOLVED]

Sun Nov 27, 2022 4:35 pm

:local actRouteGwStatus ([/ip route get [find where !routing-mark dst-address=0.0.0.0/0 active] gateway-status]->0)
:local ipPort [:pick $actRouteGwStatus ([:find $actRouteGwStatus "via " 0]+4) [:len $actRouteGwStatus]]
:local gwIf [:pick $ipPort ([:find $ipPort " " 0]+1) [:len $ipPort]]
:local wanCIDR [/ip address get [find where dynamic interface=$gwIf] address]
:local wanIP [:pick $wanCIDR 0 [:find $wanCIDR "/"]]
:foreach conn in=[/ip firewall connection find where protocol=udp srcnat src-address~"^192.168.45" !(dst-address~"^192.168.") !(reply-dst-address~"$wanIP")] do={
  :do command={/ip firewall connection remove $conn} on-error={:nothing}
}
I was lazy when writing that, so if you are unlucky, you may have to take more care about regexp matching of literal . (in regexp syntax, a mere . means "any character" so 2.1.1.3 matches not only on exact IP address 2.1.1.3 but also e. g. on any address that contains a 2x1.1y3 pattern).

As you can see I don't filter the connections by destination port (as I need to let SIP connections migrate to other WANs as well), but I do filter them by src-address because only connections from a particular local subnet use the WAN failover whereas connections from other subnets are only allowed to use a WAN that is used as a backup one in the failover scheme.
The script needs to be run periodically by a scheduler - there is no triggering event you could hook it to.
 
kelarlee
newbie
Topic Author
Posts: 29
Joined: Thu Dec 27, 2018 5:48 pm

Re: Ipsec stuck connections problem

Sun Nov 27, 2022 4:40 pm

:local actRouteGwStatus ([/ip route get [find where !routing-mark dst-address=0.0.0.0/0 active] gateway-status]->0)
:local ipPort [:pick $actRouteGwStatus ([:find $actRouteGwStatus "via " 0]+4) [:len $actRouteGwStatus]]
:local gwIf [:pick $ipPort ([:find $ipPort " " 0]+1) [:len $ipPort]]
:local wanCIDR [/ip address get [find where dynamic interface=$gwIf] address]
:local wanIP [:pick $wanCIDR 0 [:find $wanCIDR "/"]]
:foreach conn in=[/ip firewall connection find where protocol=udp srcnat src-address~"^192.168.45" !(dst-address~"^192.168.") !(reply-dst-address~"$wanIP")] do={
  :do command={/ip firewall connection remove $conn} on-error={:nothing}
}
I was lazy when writing that, so if you are unlucky, you may have to take more care about regexp matching of literal . (in regexp syntax, a mere . means "any character" so 2.1.1.3 matches not only on exact IP address 2.1.1.3 but also e. g. on any address that contains a 2x1.1y3 pattern).

As you can see I don't filter the connections by destination port (as I need to let SIP connections migrate to other WANs as well), but I do filter them by src-address because only connections from a particular local subnet use the WAN failover whereas connections from other subnets are only allowed to use a WAN that is used as a backup one in the failover scheme.
The script needs to be run periodically by a scheduler - there is no triggering event you could hook it to.
Thank you very much!

Who is online

Users browsing this forum: Husky, rplant and 61 guests