I created a GRE tunnel with IPsec between two sites (over the Internet). One day the “R” flag at the left of the interface disappeared at both sites and the VPN stopped working altogether. I didn’t know what to do, so I disabled the VPN interface on both routers (MikroTiks). When I re-enabled the interfaces, nothing happened; only after leaving both interfaces disabled for longer and re-enabling them did the tunnel start running again.
Because of that, I disabled the VPN interface on both sides, created an L2TP server with IPsec at site 1 and an L2TP client at the other site (I also created one more L2TP client at another site, but that doesn’t matter here). It worked fine until the Internet connection failed for about 1 second on the client side (I think that was the trigger).
So I decided to look for a way to solve the issue, because I expect it to happen again. I found that the tunnel comes up again after flushing the SAs (/ip ipsec installed-sa flush), and I found a script that flushes the SAs.
My knowledge of L2TP and IPsec is poor, sorry.
Is there a way to prevent the VPN from failing in such cases?
If not, is flushing the SAs the only way? Is it possible to flush only one SA?
If you can confirm that you use exchange-mode=ike2, you have most likely encountered a known bug which has been fixed in 6.44beta6 (I haven’t been able to verify that yet, as it requires a test lasting several days given how randomly the issue occurs). The workaround should be to set pfs-group to none, but that lowers security.
To flush a single SA (or, better to say, a pair of SAs), it is enough to disable and re-enable the policy responsible for establishing these SAs.
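As a minimal sketch of that disable/re-enable step, assuming a statically configured policy that carries the hypothetical comment "site2-l2tp" (this does not apply to dynamically generated policies):

```routeros
# Hypothetical example: the comment "site2-l2tp" is an assumption,
# not something taken from a posted configuration.
# Disable the policy responsible for the SA pair
/ip ipsec policy disable [find comment="site2-l2tp"]
# Short pause before re-enabling
:delay 2s
# Re-enable it so that the SAs are negotiated again
/ip ipsec policy enable [find comment="site2-l2tp"]
```

Selecting the policy by comment keeps the other policies, and the tunnels that depend on them, untouched.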
I’m not sure about the exchange mode; in /ip ipsec peer, the only peer I have appears grayed out as “main L2TP”.
Could it be solved by upgrading RouterOS? The current version on the server is 6.42.6.
About the policies: I have just 1 policy for both users, so if I disable and re-enable it, I think both connections will restart. Is there another way? Is it better to use a separate policy for each user?
That sounds like you haven’t configured IPsec manually but have ticked the “use ipsec” checkbox in L2TP (or GRE before) configuration. Is that the case? The auto-generated IPsec configuration never uses exchange-mode=ike2.
There cannot be a single policy for several users. Either you have a single policy template and an individual policy generated from it for each user, or you really have a single generated policy which most likely means that you connect both users from behind the same public IP address.
To avoid confusion, please export and post your configurations from all three devices following the guidelines in my automatic signature, and also post the output of
/ip ipsec remote-peer print
/ip ipsec policy print
/ip ipsec installed-sa print
while both users are logged in. Obfuscate also the output of these commands systematically (i.e. all occurrences of the same IP address have to be replaced with the same meaningful string to maintain the integrity of information).
A “connection drop-out” may cause firewalls between the IPsec peers to forget the tracked connections, and if the peer “protected” by such a firewall only listens and doesn’t actively send anything, the firewall won’t let in the packets coming from the active side. By flushing the installed-sa or disabling/enabling the policy you trigger a renegotiation, but again, you have to do that at the “protected” peer so that it creates a pinhole in the firewall.
For each of the three devices it is important to know whether it runs a public IP address on itself or whether it is behind some NAT device.
Yes, that is the case. I used the default IPsec settings, thinking they would work. Is it better to always use exchange-mode=ike2?
Sorry, I looked again and found 1 policy for each connected device.
I will copy the export below.
There are 3 sites, which I will call Site1 (server side), Site2 and Site3. Each site has a single MikroTik router, and those routers run the server and client tunnels.
The Site1 (server) and Site2 (client) routers have a public IP directly on their WAN interfaces. Site3 has a problem with the ISP device: it can’t be set to bridge mode. I am still talking to the ISP to get the public IP onto the router; anyway, so far the VPN is still working at both client sites.
Thanks in advance.
Regards.
Site1
/ppp profile
add change-tcp-mss=yes dns-server=192.168.50.1,8.8.8.8 local-address=192.168.50.1 name=L2TP remote-address="Pool L2TP"
/ppp secret
add name=Site2 password=QSite101 profile=L2TP
add name=Site3 password=PSite101 profile=L2TP
/interface l2tp-server
add name="L2TP Site3" user=Site3
add name="L2TP Site2" user=Site2
/interface l2tp-server server
set authentication=mschap1,mschap2 default-profile=L2TP enabled=yes ipsec-secret=Password01 max-mru=1460 max-mtu=1460 use-ipsec=required
/ip ipsec remote-peer print
Flags: R - responder, N - natt-peer
# ID STATE REMOTE-ADDRESS DYNAMIC-ADDRESS UPTIME
0 RN established Site3PublicIP 7h38m15s
1 R established Site2PublicIP
/ip ipsec policy print
Flags: T - template, X - disabled, D - dynamic, I - invalid, A - active, * - default
0 T * group=default src-address=::/0 dst-address=::/0 protocol=all proposal=default template=yes
1 DA src-address=Site1PublicIP/32 src-port=1701 dst-address=Site2PublicIP/32 dst-port=1701 protocol=udp action=encrypt level=unique ipsec-protocols=esp tunnel=no proposal=default ph2-count=2
2 DA src-address=Site1PublicIP/32 src-port=1701 dst-address=Site3PublicIP/32 dst-port=1701 protocol=udp action=encrypt level=unique ipsec-protocols=esp tunnel=no proposal=default ph2-count=2
/ip ipsec installed-sa print
Flags: H - hw-aead, A - AH, E - ESP
0 E spi=0x22154A6 src-address=Site2PublicIP dst-address=Site1PublicIP state=dying auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="423ef300d81817eb83b7545a2b011e8343352112"
enc-key="91ef0c56e3d1d103af52d124a64f75a22bc51214c4715651f0f0e549fe092a58" addtime=sep/18/2018 09:21:41 expires-in=3m50s add-lifetime=24m/30m current-bytes=2400 current-packets=96 replay=128
1 E spi=0x25499B src-address=Site1PublicIP dst-address=Site2PublicIP state=dying auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="adb7bdcd1575185a1fd3a8c2a91aebca485f825f"
enc-key="77c892fd9ede6190bccfaa496e4561bd2435602649e61b08d7994520876bd25f" addtime=sep/18/2018 09:21:41 expires-in=3m50s add-lifetime=24m/30m current-bytes=2400 current-packets=96 replay=128
2 E spi=0xFCCDEF2 src-address=Site3PublicIP:4500 dst-address=Site1PublicIP:4500 state=mature auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="d30507c28c5055fe33a8b922bbe1554ee1934ea4"
enc-key="9d1cc7b798cf90b06e9840f83317bea0ac6c5050de41fd8d1b4d54c9aee4ad69" addtime=sep/18/2018 09:41:00 expires-in=23m9s add-lifetime=24m/30m current-bytes=626 current-packets=25 replay=128
3 E spi=0xC15264B src-address=Site1PublicIP:4500 dst-address=Site3PublicIP:4500 state=mature auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="6ab1eb99f8f09974027293dfadc39592d4a470fb"
enc-key="8d5f1f332ef195e20b5ff8d09da7e78d0343091a2d72b8bd3c5fd649647874bf" addtime=sep/18/2018 09:41:00 expires-in=23m9s add-lifetime=24m/30m current-bytes=626 current-packets=25 replay=128
4 E spi=0xD1CD003 src-address=Site2PublicIP dst-address=Site1PublicIP state=mature auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="2592dda505b3795a871cda3859cbece46244348f"
enc-key="fa00a3fd526aa77b251e29834831aba389bd6ab480d6a123af6f69cdcd2edff7" addtime=sep/18/2018 09:45:41 expires-in=27m50s add-lifetime=24m/30m current-bytes=274 current-packets=11 replay=128
5 E spi=0xF47625E src-address=Site1PublicIP dst-address=Site2PublicIP state=mature auth-algorithm=sha1 enc-algorithm=aes-cbc enc-key-size=256 auth-key="6b5a468bd9c402acdba7f5be795ecbecab1620eb"
enc-key="26f2aec2c24e63bfdcdbcfbf3d5572a0cb592a2b11ce0398f928046fee4d0bf7" addtime=sep/18/2018 09:45:41 expires-in=27m50s add-lifetime=24m/30m current-bytes=274 current-packets=11 replay=128
Well, for site-to-site configurations most likely yes, as IKEv2 should be a better choice than IKE(v1); however, at least older client implementations on Windows do not attempt to establish IKEv2 sessions for L2TP transport, and if you want to use IKEv2, you have to configure the IPsec layer manually, which requires more understanding than letting RouterOS do the job for you.
So I’ve mentioned IKEv2 only as one of possible explanations why you’ve encountered a service breakdown without a configuration change, as that very bug was relevant to IKEv2 alone.
Look yet another time - at the central site, there are two dynamically created policies, one per each client (user), exactly as it should be. As the two differ from each other in sa-dst-address, there is no conflict and your issue doesn’t come from such conflict.
The fact that an L2TP/IPsec client is behind a NAT only causes trouble if two clients of the same server would be behind the same NAT IP. So it doesn’t break anything in your case.
All in all, I cannot see anything in the configuration that would explain why the connections should not re-establish after recovery from a disruption on the Internet path between the client and the server. It may take some time before the peers notice that they cannot see each other after the disruption: the dynamically created IPsec peer has dpd-interval=2m and dpd-maximum-failures=5, so it takes up to 12 minutes before the connection is considered dead at the IPsec level. I don’t know how many unanswered L2TP keepalive messages are tolerated, but these are sent every minute at two different levels.
For the GRE you used before, by default the keepalive messages are sent every 10 seconds and 10 must fail for the tunnel to be considered down.
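For reference, a sketch of how that keepalive could be inspected and tightened, assuming a GRE interface with the hypothetical name gre-site2; the values 5s,3 are only an illustration, not a recommendation:

```routeros
# Show the current GRE settings, including keepalive (interval,retries)
/interface gre print detail
# The default described above corresponds to keepalive=10s,10.
# Illustrative alternative: keepalives 5 seconds apart, tunnel considered
# down after 3 of them fail ("gre-site2" is a placeholder name)
/interface gre set [find name="gre-site2"] keepalive=5s,3
```

With the default values, the tunnel is considered down after roughly 10 × 10 s = 100 s without a keepalive response.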
Unlike Windows and Android, RouterOS doesn’t need to be kicked to attempt re-connection if an IPsec or L2TP or GRE connection breaks down - during and after a network failure, it keeps on trying to re-establish the tunnel until you disable/unconfigure the client.
Are you able to replicate the issue somehow? I.e. is there a known sequence of steps you have to take to make it happen again? If so, do the following on both machines:
Then, replicate the issue and let the fate do its job for 35 minutes. If the tunnel does not re-establish even after that time, do what you have to do so that you could access both devices (preferably, nothing) and then issue /system script job remove $printjob. Then, do whatever is necessary to re-establish the tunnel and download the two log files from the devices, and see what the logs tell you about what has happened. If they tell you nothing, obfuscate the public IPs and possibly the l2tp usernames and publish the logs.
Look yet another time - at the central site, there are two dynamically created policies, one per each client (user), exactly as it should be. As the two differ from each other in sa-dst-address, there is no conflict and your issue doesn’t come from such conflict.
I know; I was trying to say that I finally saw both policies.
Are you able to replicate the issue somehow? I.e. is there a known sequence of steps you have to take to make it happen again?
I don’t know. Maybe unplugging the Internet cable would make it happen again; I could give it a try later, maybe tomorrow. I could call someone at the site and ask him to do that.
Otherwise I will need to wait until the issue happens again, but by the time I can look at the log it may be too late and the events may have disappeared.
I forgot to mention the following: some time ago, at another company, I configured GRE tunnels between the headquarters and many different sites. I had the same problem with all of them and finally left the tunnels without IPsec, and they have been working fine.
I don’t know whether it is related to:
The Argentinian ISPs
The settings I make in the tunnels. When I create a GRE tunnel, the only things I change on the interface are the local and remote IPs and the “IPsec Secret”; I also configure IPs and routes, but these are not involved
IPsec between MikroTiks with the default settings
More than one of the previous items
Does the “main l2tp” exchange mode usually work fine?
Which IPsec settings do you recommend configuring instead of the defaults?
main-l2tp should only be used for l2tp. Anywhere else I prefer ike2, it should be safer (until someone discovers some vulnerability of the protocol or the Mikrotik implementation of it). I assume the IPsec peer created dynamically if you set ipsec-secret for GRE or IPIP or EoIP tunnel has exchange-mode set to main, doesn’t it?
As you’ve found out yourself, there are several possible explanations why the IPsec tunnels carrying your GRE tunnels fail, but in your specific case I think none of them is relevant, as the tunnels only seem to fail to recover from a disruption; they never failed to set up initially or to keep running when no external cause broke them. So on top of my previous suggestion regarding logging, I can only suggest that you switch to exchange-mode=ike2. That requires manual configuration, so the easiest way is to copy the dynamically created peer and policy into static ones with disabled=yes, then remove the ipsec-secret value from the /interface gre configuration, then change exchange-mode to ike2 on the peers at both devices, and finally enable the peers and the policies. But only do that after upgrading to 6.43.1, as in all previous versions IKEv2 had its own issues, which are now known.
The idea behind this is that if there is some issue in the way IKE(v1) handles the recovery, IKEv2, which is mostly an independent algorithm, is quite likely not to contain the same bug.
But if with these settings you’ll encounter the same effects (disruption on the path between the devices to kill the tunnel until a manual intervention), you’ll have to revert to the logging anyway. So if you can use an external syslog server on both sites, it may be best to engage it; if you cannot, keep the howto from the post above handy at both sites in Spanish so that a local assistant can follow the steps when the connection goes down and doesn’t recover after a disruption.
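A minimal sketch of engaging an external syslog server, assuming it listens on UDP port 514 at the placeholder address 192.0.2.10, is reachable without going through the tunnel, and that the built-in logging action named remote has not been removed:

```routeros
# Point the built-in "remote" logging action at the syslog server
# (192.0.2.10 is a placeholder address, not taken from the thread)
/system logging action set [find name="remote"] remote=192.0.2.10 remote-port=514
# Send the IPsec and L2TP log topics to that action
/system logging add topics=ipsec action=remote
/system logging add topics=l2tp action=remote
```

Because the messages leave the router as they are generated, they are still available even if the local log has rolled over by the time someone can look at it.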
Sorry, I want to ask one more question related to this.
In another enterprise there are many GRE tunnels with IPsec (the only IPsec setting modified is the “ipsec secret” in the GRE interface settings).
How can I flush a specific SA there? I have dynamic policies that I cannot disable.
Unfortunately you cannot. You can only flush all installed-sa, no way to flush only a particular one (or rather a matching pair thereof, which may be the reason). And no way to disable a dynamically created policy as you’ve found yourself.
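Given that limitation, one possible workaround is to run the full flush automatically when the far end of a tunnel stops answering. This is only a sketch under assumptions: 10.255.0.2 stands for a hypothetical tunnel-side address of the remote router, and the flush drops the SAs of every tunnel on the device, not just the broken one:

```routeros
# Probe the remote end through the tunnel every 30 seconds
# (10.255.0.2 is a placeholder address); when the host changes
# from up to down, flush all installed SAs
/tool netwatch add host=10.255.0.2 interval=30s timeout=2s down-script="/ip ipsec installed-sa flush" comment="flush SAs when tunnel peer is unreachable"
```

Netwatch runs the down-script on the up-to-down transition only, so a tunnel that stays down is not flushed over and over.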