Redundant IPsec tunnel - second tunnel cannot connect - a bug?

Hello,

I hope you are well.

I have a rather simple redundant IPsec tunnel setup to one of the major cloud providers (which implies it likely works on their side). You set up one tunnel on their side and they provide two endpoints for redundancy with different secrets. That is to say, there is nothing that you can configure differently between the two endpoints except the secret.

So unless there are differences on your side between the endpoints, it should just work. But it does not.

Both endpoints can never connect at the same time. One connects flawlessly, and then the log is filled every 10 seconds with messages about the other one:

ipsec,error failed to get proposal from first template
ipsec,info killing ike2 SA: peer2 [my ip][4500]-[address2][4500] [hash1]:[hash2]

What is interesting is that it does not matter which endpoint is the other one. If you disable the first endpoint, the second connects flawlessly; then you re-enable the first one, and now it produces these messages. You then disable the second endpoint, the first endpoint immediately connects flawlessly, you re-enable the second one, and now it produces these messages. In other words, both of them work, just not at the same time.


My config relevant to IPsec is as follows (RouterOS 7.18.1)

/ip ipsec profile add dh-group=ecp384 enc-algorithm=aes-256 hash-algorithm=sha384 lifetime=8h name=profile-phase1 prf-algorithm=auto
/ip ipsec peer add address=[address1] exchange-mode=ike2 name=peer1 profile=profile-phase1
/ip ipsec peer add address=[address2] exchange-mode=ike2 name=peer2 profile=profile-phase1
/ip ipsec proposal set [ find default=yes ] disabled=yes
/ip ipsec proposal add auth-algorithms="" enc-algorithms=aes-256-gcm lifetime=1h name=proposal-phase2 pfs-group=modp1536
/ip ipsec identity add auth-method=pre-shared-key peer=peer1 secret=[secret1]
/ip ipsec identity add auth-method=pre-shared-key peer=peer2 secret=[secret2]
/ip ipsec policy set 0 disabled=yes
/ip ipsec policy add action=encrypt dst-address=10.0.0.0/24 ipsec-protocols=esp level=require peer=peer1 proposal=proposal-phase2 src-address=192.168.0.0/24 tunnel=yes
/ip ipsec policy add action=encrypt dst-address=10.0.0.0/24 ipsec-protocols=esp level=require peer=peer2 proposal=proposal-phase2 src-address=192.168.0.0/24 tunnel=yes
/ip firewall address-list add address=[address1] comment=peer1 list=tunnel
/ip firewall address-list add address=[address2] comment=peer2 list=tunnel
/ip firewall filter add action=accept chain=input comment=established,related connection-state=established,related
/ip firewall filter add action=drop   chain=input comment=invalid             connection-state=invalid
/ip firewall filter add action=accept chain=input comment="wan new IPsec AH"  connection-state=new                 in-interface=wan1 protocol=ipsec-ah  src-address-list=tunnel
/ip firewall filter add action=accept chain=input comment="wan new IPsec ESP" connection-state=new                 in-interface=wan1 protocol=ipsec-esp src-address-list=tunnel
/ip firewall nat add action=masquerade chain=srcnat ipsec-policy=out,none out-interface=wan1

I might be missing something; however, I’ve examined the config several times over and for the life of me cannot see anything wrong with it. What nudges me to believe this is a bug is that I found a similar problem reported by someone else a year ago on Reddit (the OP reports a different problem with a similar config on a virtual machine, but in the comments they implemented the config on Mikrotik hardware and got the same errors I get).

Also, the log messages imply that ROS is trying to access the disabled default proposal template instead of the explicitly stated proposal-phase2 entry… which does not sound good either.


Could you kindly help me understand if I’m missing anything, please?

P.S. While debugging, I tried to create a separate profile profile-phase1-peer2 and proposal proposal-phase2-peer2 just for the second peer, and reassign the second peer/policy to them, hoping it would lead to them not trying to reach the disabled proposal template. Nothing changed.

The second tunnel is established from the same IP as the first. With the default setup, there is no way for the tik to work out which tunnel is which.

You either need to set remote id fields in your setup
OR
use a different port or IP for the second tunnel if you want to stay with default setup.

By design, bare IPsec does not permit two distinct policies with identical traffic selectors to be bound to two distinct peers. The solution here should be to use just a single policy and bind it to both peers: peer=peer1,peer2

With this setup, the router establishes Phase 1 to both remote peers, but only establishes Phase 2 to the first one on the list. If Phase 1 to that peer fails later on, Phase 2 gets established to the second one on the list. If Phase 1 to the first peer recovers, Phase 2 stays established with the second one and only establishes with the first one again if Phase 1 to the second one eventually fails.

Of course, DPD must be enabled, otherwise the Phase 1 failure would not be noticed until the next rekey of Phase 2.
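As a minimal sketch of what that could look like on the profile posted above: dpd-interval and dpd-maximum-failures are existing settings of /ip ipsec profile, but the values below are only illustrative assumptions, not something the provider or this thread prescribes.

```
# Assumed values: probe an idle peer every 10 s and declare it dead after 3 missed replies
/ip ipsec profile set [find name=profile-phase1] dpd-interval=10s dpd-maximum-failures=3

# Check which settings the profile ended up with
/ip ipsec profile print detail where name=profile-phase1
```

Both peers share profile-phase1 in the posted config, so one change covers both. Shorter intervals detect a dead peer faster at the cost of more keepalive traffic.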

Whether this works with your major cloud provider has to be tested.

Thanks for the advice, I did not know a policy accepts multiple peers. Notwithstanding the DPD discussion, I’ve removed the second policy and changed the first to use both peers, and the logs are filled with the same errors every 10 seconds:

/ip ipsec policy add action=encrypt dst-address=10.0.0.0/24 ipsec-protocols=esp level=require peer=peer1,peer2 proposal=proposal-phase2 src-address=192.168.0.0/24 tunnel=yes

ipsec,info new ike2 SA (I): peer2 [myip][4500]-[address2][4500] [hash1]:[hash2]
ipsec,error failed to get proposal from first template 
ipsec,info killing ike2 SA: peer2 [myip][4500]-[address2][4500] [hash1]:[hash2]

Is this expected behaviour?

Tunnel works (as the first peer is connected), I haven’t checked whether failover works. That being said, should it be trying to establish phase2 (is it establishing phase2?) for peer2 every 10 seconds?

JFYI, major cloud provider is Oracle https://www.oracle.com/cloud/networking/site-to-site-vpn/
However, providing two tunnels for redundancy is quite common. AWS https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html does the same.


Thanks for the advice. Unfortunately, neither is supported by the remote, as far as I can tell.

The Mikrotik approach is based on the assumption that the remote peers passively wait for a Phase 2 establishment attempt from the initiator side. The reason is that a statically configured IPsec policy overrides regular routing, so if it is configured on both peer1 and peer2, they cannot forward the traffic for your site to each other. Since the “unsuccessful” peer keeps trying to establish Phase 2, you won’t be able to get rid of those log messages.


That would be my next step. I would add a chain=output dst-address=peer1 protocol=udp dst-port=500,4500 action=drop rule on the initiator while peer1 is up to see what happens.
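A sketch of that test, assuming [address1] is the same placeholder as in the config above and that the comment string is just a hypothetical label used to find the rule again. The posted filter has no other chain=output rules, so rule order should not matter here.

```
# Temporarily drop IKE and NAT-T traffic from the router to peer1 to simulate its failure
/ip firewall filter add action=drop chain=output comment="failover-test-peer1" dst-address=[address1] dst-port=500,4500 protocol=udp

# Watch whether peer1 times out and Phase 2 moves over to peer2
/ip ipsec active-peers print
/ip ipsec installed-sa print

# Remove the test rule afterwards
/ip firewall filter remove [find comment="failover-test-peer1"]
```

Since the log lines show both sides on port 4500, ESP is UDP-encapsulated, so this rule also blocks the outgoing tunnel payload to peer1, not only the IKE exchange.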
