I need to connect SiteB to SiteA via an IPsec tunnel over the Internet. SiteB has a dual-WAN connection. All the interfaces are configured and the IPsec tunnel is up through Modem1 (say, ISP1). When I turn off Modem1 (emulating a link failure), the tunnel is established via the second link, Modem2 (ISP2).
The MikroTik has two default routes:
0.0.0.0/0 192.168.2.1
0.0.0.0/0 192.168.3.1 (disabled)
Netwatch monitors the connection to 8.8.8.8 (ICMP every 5 s).
If the connection is down (the ping to 8.8.8.8 through ISP2 is blocked by the firewall):
/ip route set [find comment="ISP1"] disabled=yes
/ip route set [find comment="ISP2"] disabled=no
/ip ipsec policy set [find comment="PolicyMain"] disabled=yes
/ip ipsec policy set [find comment="PolicyBackup"] disabled=no
If the connection is up:
/ip route set [find comment="ISP1"] disabled=no
/ip route set [find comment="ISP2"] disabled=yes
/ip ipsec policy set [find comment="PolicyMain"] disabled=no
/ip ipsec policy set [find comment="PolicyBackup"] disabled=yes
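For reference, a Netwatch entry along these lines might be declared as follows. This is a sketch for RouterOS v6 that assumes the two command sets above were saved as system scripts under the hypothetical names isp1-up and isp1-down:

```routeros
# Probe 8.8.8.8 every 5 seconds and run the matching failover script
# on each state change ("isp1-up" / "isp1-down" are hypothetical names).
/tool netwatch add host=8.8.8.8 interval=5s \
    up-script="/system script run isp1-up" \
    down-script="/system script run isp1-down"
```

Keeping the logic in named scripts avoids having to escape the quotes inside the [find comment=...] expressions.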
When I turn off Modem1 (emulating a link failure), the tunnel is established via the second link, Modem2 (ISP2). All is good. But when I turn Modem1 on again, the tunnel goes down (though the Internet connection is up). Where could the problem be?
Additional info:
The peering router is a Cisco 2921 with keepalive set; its peer configuration was shown in an attached screenshot.
But do you see your up script being executed? Because if not, it's a Netwatch problem.
I suspect that Netwatch may not be working as expected, since the default route changes. How do you make sure that your Netwatch ping always uses the 192.168.2.1 route? You probably want to create a separate routing rule for that.
like this:
/ip route add dst-address=0.0.0.0/0 gateway=192.168.2.1
/ip route add dst-address=0.0.0.0/0 gateway=192.168.3.1 routing-mark=backup
then
/ip route rule add action=lookup table=backup src-address=!10.10.2.0/24 dst-address=!8.8.8.8 disabled=yes
In your scripts you then enable/disable this rule instead of enabling/disabling the routes.
This lets you use only the default route (ISP1) for the Netwatch ping, at the cost that 8.8.8.8 will not be accessible from the router itself during the ISP1 downtime.
You could make it more sophisticated by creating a mangle output rule that matches only ICMP packets to 8.8.8.8 and marks them with the “main” routing mark.
OR you could create a mangle prerouting rule that sets the “backup” routing mark only for forwarded packets, at the cost of the router itself having no Internet access during the downtime (i.e. for local processes on the router the default route would always be ISP1, even during the downtime). In this case you don't need the ip route rule; you enable/disable this mangle rule instead.
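Under this scheme the Netwatch scripts only need to toggle the rule. A minimal sketch for RouterOS v6, assuming the rule is located by its table name (a comment on the rule would work equally well):

```routeros
# Down script (ISP1 lost): enable the rule so traffic uses the "backup" table
/ip route rule set [find table=backup] disabled=no

# Up script (ISP1 back): disable the rule so traffic falls back to the main table (ISP1)
/ip route rule set [find table=backup] disabled=yes
```

Because the ISP1 default route is never disabled here, pings to 8.8.8.8 (excluded from the rule) keep trying ISP1, so Netwatch can notice when that link comes back.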
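A sketch of such a mangle rule, assuming RouterOS v6 and a LAN interface named bridge-lan (a hypothetical name). Matching dst-address-type=!local keeps packets addressed to the router itself out of the mark:

```routeros
# Mark forwarded LAN traffic for the "backup" routing table.
# Created disabled; the Netwatch down script enables it.
# "bridge-lan" is a hypothetical LAN interface name.
/ip firewall mangle add chain=prerouting in-interface=bridge-lan \
    dst-address-type=!local action=mark-routing new-routing-mark=backup \
    passthrough=yes disabled=yes comment="UseBackup"

# Down script: /ip firewall mangle set [find comment="UseBackup"] disabled=no
# Up script:   /ip firewall mangle set [find comment="UseBackup"] disabled=yes
```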
Look at this: another case of reinventing the wheel instead of fixing the IPsec policy redundancy problem! Here is a non-exhaustive list of cases I have encountered recently, when I stumbled upon the same issue.
Yes, the script is executed: the Internet connection is always up (either through Modem1, or through Modem2 when the first one is down). The IPsec tunnel is also established, and the policies change depending on the state of Link1.
As for the ping to 8.8.8.8: Netwatch can only perform it while Modem1 works. Otherwise the traffic is blocked by the firewall (quite simple, but functionally similar to your suggestion).
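A firewall rule of the kind described might look like this. It is a sketch for RouterOS v6, assuming ether2-isp2 as a hypothetical name for the ISP2-facing interface:

```routeros
# Drop router-originated pings to 8.8.8.8 that would leave via ISP2,
# so Netwatch only sees 8.8.8.8 as reachable while ISP1 carries the probe.
# "ether2-isp2" is a hypothetical interface name.
/ip firewall filter add chain=output protocol=icmp dst-address=8.8.8.8 \
    out-interface=ether2-isp2 action=drop comment="Netwatch via ISP1 only"
```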
I’ve looked through the pages you provided. As far as I understand, you point to an issue dating from 2011. I can’t believe that problem hasn’t been fixed since then. This config seems so simple…
It seems that rebuilding the tunnel from scratch when Netwatch triggers is the solution.
Here is the story:
I looked at the events that occurred when Netwatch ran. I noticed that when Netwatch changed the outgoing interface, the old remote peer was left behind in the IPsec configuration. This did not happen every time, but it did in most cases. It was the reason the tunnel could not switch to the backup link and back again after the primary link finally recovered.
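To check for the leftover peer described above, the active IKE peers and installed SAs can be listed right after a switchover. A minimal sketch, assuming RouterOS v6 (where remote-peers is the menu name):

```routeros
# Active IKE peers; a stale entry for the previous WAN address shows up here
/ip ipsec remote-peers print
# IPsec SAs currently installed for the tunnel
/ip ipsec installed-sa print
```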
I added a single line to the end of both the Up and Down scripts in Netwatch (see above):
/ip ipsec remote-peers kill-connections
This means that every time we switch to the backup tunnel or back to the primary one, the connection is re-established. Maybe that’s only a workaround; I’ll be glad to hear of another way to solve this problem.
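Put together, the two Netwatch script bodies might look like this. It is a sketch assembled from the commands in this thread, assuming the route and policy comments used earlier (ISP1, ISP2, PolicyMain, PolicyBackup); each half goes into its own script:

```routeros
# --- Script that switches to the backup link (ISP2) ---
/ip route set [find comment="ISP1"] disabled=yes
/ip route set [find comment="ISP2"] disabled=no
/ip ipsec policy set [find comment="PolicyMain"] disabled=yes
/ip ipsec policy set [find comment="PolicyBackup"] disabled=no
# Kill existing IKE connections so the tunnel is renegotiated over the new link
/ip ipsec remote-peers kill-connections

# --- Script that switches back to the primary link (ISP1) ---
/ip route set [find comment="ISP1"] disabled=no
/ip route set [find comment="ISP2"] disabled=yes
/ip ipsec policy set [find comment="PolicyMain"] disabled=no
/ip ipsec policy set [find comment="PolicyBackup"] disabled=yes
/ip ipsec remote-peers kill-connections
```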
Note:
I first tested that command on cellular networks (both ISPs), and Netwatch kept flapping because the timeout was too small (1000 ms). I set it to 3000 ms and the flapping stopped.
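The timeout change can be applied to an existing entry like this; a sketch assuming RouterOS v6 and that the entry is found by its host address:

```routeros
# Raise the probe timeout from the 1000 ms that caused flapping to 3000 ms
/tool netwatch set [find host=8.8.8.8] timeout=3s
```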