v6.49.13
I’m trying to set up a site-to-site tunnel. Both ends are currently static, but the remote end will soon be dynamic (Starlink), so that’s how I’m setting it up. I’m using Mode Config on the responding side and a static policy on the initiating side. Phase 2 is successfully established, but I am unable to pass any traffic through. If I ping from one MT to the other, in either direction, the pings are never returned. The Tx packet count in Active Peers increments on the sending side, but the Rx on the receiving side stays at 0. I’m sure I’ve just done something wrong, but I can’t figure it out.
First, is my assumption correct that currently both devices have a public IP address on their WAN, so IPsec does not have to encapsulate ESP packets into UDP?
Second, there is nothing in chain input of your firewall filter rules that would permit the Main Office router to accept incoming IKEv2 connections except the action=accept … in-interface-list=WAN src-address-list=admin rule, so I figure the WAN IP address of the Branch Office router is on that list (here, your obfuscation was so thorough that the information you’ve posted is no longer consistent).
So from the symptoms you describe, it appears to me that the peers have negotiated bare ESP for Phase 2, but some ISP on the path between the two routers doesn’t let it through. Alternatively, the public address of the Main Office may not be on the admin address list of the Branch Office and you only try pinging from the Main Office via the tunnel, but that seems unlikely to me.
So once the IPsec connection is established, what does /ip ipsec installed-sa print show: do the src-address and dst-address columns show only IP addresses, or port numbers as well? If the former, open a command line window on both routers, make it as wide as your screen allows, run /tool sniffer quick ip-protocol=ipsec-esp in both windows, and try pinging across the tunnel. If my assumption regarding the ISP is correct, you should see ESP packets leaving the pinging side towards the remote one but never arriving there.
Once this is clarified and fixed, we can get to the filter rules and also to the issue of a half-configured mode-config.
Yes, both sides are in each other’s “Admin” list. Which brings up a good point. The branch side will soon be dynamic CGNAT and that won’t work. I added accept rules for UDP 4500 and 500 on the office side (maybe 500 isn’t necessary since I have NAT traversal set on the office side?), and ESP rules on both sides.
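For reference, accept rules like the ones described here might look roughly as follows on the office side. This is only a sketch: the interface list name WAN is taken from the rule quoted earlier in the thread, the comments are made up, and since add appends to the end of the list, the rules would still have to be moved above whatever drop rule closes the input chain.

```
/ip firewall filter
# IKE/IKEv2 and NAT-T (ESP encapsulated in UDP) both arrive as UDP
add chain=input action=accept protocol=udp dst-port=500,4500 in-interface-list=WAN comment="IKE and NAT-T"
# bare ESP, only seen when no NAT is detected between the peers
add chain=input action=accept protocol=ipsec-esp in-interface-list=WAN comment="bare ESP"
```

Because these rules do not restrict the source address, they would keep working once the branch side sits behind CGNAT with a changing address.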
It appears that packets leaving the office router are not reliably using the AT&T gateway. I disabled the Starlink interface and everything started working. When I enabled it again it was still working, even after recycling the IKE connection. After a few minutes it was failing again going out Starlink. Is there something else I can do to make sure it goes out the right interface?
BTW, there are no port numbers on the installed-sa list even when it’s working. Will that change automatically when the branch router is behind a NAT?
For now I’ve added mangle rules to mark all IKE and ESP connections to go through AT&T. This of course means that any outgoing IKE connections I make in the future will go through the slower backup connection. If you can help me with a better way to do that it would be great. I guess I could script a route from the remote in the active peers list, but that seems a little messy.
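One possible shape of such mangle rules, in RouterOS v6 syntax (the thread is about v6.49.13), is sketched below. The routing mark name via-att and the gateway 192.0.2.1 are placeholders for illustration, not values from the actual configuration.

```
/ip firewall mangle
# traffic originated by the router itself passes through chain=output
add chain=output action=mark-routing new-routing-mark=via-att passthrough=no protocol=ipsec-esp comment="bare ESP via AT&T"
add chain=output action=mark-routing new-routing-mark=via-att passthrough=no protocol=udp dst-port=500,4500 comment="IKE via AT&T"
/ip route
# default route used only by packets carrying the via-att mark;
# 192.0.2.1 stands in for the real AT&T gateway
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=via-att
```

Dropping the second mangle rule would leave IKE free to use the faster uplink and pin only bare ESP to the AT&T side.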
Support of NAT traversal is an optional extension in case of IKE (v1); in IKEv2, it is part of the standard so there is no need to explicitly enable it.
When acting as an initiator, Mikrotik sends the very first IKEv2 packet to port 4500, but other initiators may start sending to port 500 even if using IKEv2 (e.g. the Microsoft Windows native VPN client). strongSwan for mobile phones is even less intuitive: if you specify a port, it uses it right from the start; if you don’t, it starts talking to port 500 and switches over to port 4500 if/when NAT is detected.
Indeed. No port numbers mean that bare ESP has been negotiated, hence no NAT must have been detected. If there is NAT anywhere between the peers, ESP gets encapsulated into UDP and multiplexed into the UDP stream created for IKE or IKEv2 communication.
IP (v4) on Starlink means NAT (unless you pay a lot), NAT means no bare ESP, so there is no need to bother about this - with IP, you will only ever see a bare ESP packet if you use a public address at both the Central Office and the remote peer for a given connection.
But as you are going to have Starlink at both sites, I would suggest setting bypass mode on both terminals, as that way you get a static /56 block of global addresses on each (the prefix seems to be linked to the serial number of the terminal, not to its geographical position), so you can set up the IPsec connection using IPv6, whilst the payload can still be IP (v4).
In fact I successfully set my own router to receive IKE connections through my (bypassed) Starlink IPv6 last night. But my understanding is that Starlink does randomly change the prefix. Sometimes every day, other times it may stay for weeks. However, the consensus is that the individual address the router receives, which is outside of that prefix, is more stable. Unless there’s a way I don’t know about to force MT to connect over IPv6, I guess I will need to use a third-party DDNS to set only an AAAA record. I would still like a better way to make sure ESP traffic goes out the same interface it came in on.
My own experience was that the /56 did not change for months (until the Ethernet adaptor finally broke so it wasn’t possible to use the bypass mode any more without an additional investment).
I don’t remember seeing any other global address except the /56. Does it assign one if you set the DHCP client to request an address as well, not just a prefix? I have nowhere to test right now.
I would say setting the local-address of the peer to an IPv6 one (but not to ::/0) is quite a reliable way to make it choose the AAAA response even if both are available, but it would be kind of a chicken-or-egg issue if its own global address were changing as well, which apparently would be your case with a Starlink at both ends.
The thing is that the first ESP packet being sent may be carrying some initial request, i.e. not be an indirect response to a previously received ESP packet. But again, if it is a bare ESP, you can be sure it has to be routed via an interface that is not behind a NAT, and if you happen to have multiple interfaces with a public IP, it doesn’t matter much which of them you use as the source one for the ESP, so you can choose by bandwidth.
The other address comes by SLAAC and is the one MT’s DDNS reports.
But in this case (the original question) it is an IPv4 connection. When the ESP attempts to go out the Starlink IPv4 the communication fails. The SAs correctly go out the AT&T side, so there is no encapsulation. But the subsequent packets attempt to go naked out the Starlink side, so they fail. (I’m really not sure why this happens. I would think they would go out where the SA is established.) So it does matter. Unless I can force ESP encapsulation… which I have now done by setting the local address on the remote peer to an internal one, so it is NATed. That is an acceptable solution. Though not as elegant as pure ESP, it’s more elegant than scripting the mangle rules. When the remote switches to Starlink I will move it to IPv6 between Starlinks.
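The workaround described here could be sketched like this on the remote router. The peer name main-office and the LAN address 192.168.20.1 are hypothetical; the trick presumably relies on the router’s own src-nat/masquerade rule translating the internal source address on the way out, so that NAT detection triggers and ESP gets wrapped into UDP.

```
/ip ipsec peer
# source IKE from an internal (LAN) address instead of the public WAN one;
# 192.168.20.1 and the peer name are placeholders
set [find name=main-office] local-address=192.168.20.1
# after the connection re-establishes, the src-address and dst-address
# columns should show port numbers if UDP encapsulation is in use
/ip ipsec installed-sa print
```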
Thank you for your help. You definitely pointed me in the right direction.
Ah… as you have to force Mikrotik to use SLAAC when configured as a router, I never went that way.
You mean IKE/IKEv2 here. Both Phase 1 (IKE, IKEv2) and Phase 2 use SAs.
We are actually both talking about the same thing, except that I do not consider mangle rules something that should be avoided. What I am saying is “if it is bare ESP, prevent it from using a NATed uplink, no matter what the destination address is”, and for me, mangle rules are a perfectly legitimate tool to do so.
ESP is treated independently. So unless you specify a local-address for the peer, the source address of the ESP packet is chosen after routing, just like with any other traffic originated by the router that is not a response to a previously received packet. And even if you do specify the local-address, the routing still does not automatically choose the gateway that is reachable via the interface to which that address is attached, you have to use mangle rules or routing rules for that.
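As a rough illustration of the routing-rule variant in RouterOS v6 syntax: pin the peer to a public local address, then send everything sourced from that address to a dedicated routing table. All names and addresses below (peer name branch, table via-att, 192.0.2.10 as the router’s own AT&T address, 192.0.2.1 as the AT&T gateway) are placeholders, and this is a sketch of the idea rather than a tested configuration.

```
/ip ipsec peer
# fix the source address of IKE/ESP for this peer (hypothetical values)
set [find name=branch] local-address=192.0.2.10
/ip route
# default route that exists only in the via-att table
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=via-att
/ip route rule
# packets sourced from the pinned address look up the via-att table
add src-address=192.0.2.10/32 action=lookup table=via-att
```

Compared with mangle rules, this matches on the source address rather than on the protocol, so it affects anything else the router sends from that address as well.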
Yup. I had to do this in scenarios where the ISP was blocking ESP or in early days of IKEv2 on Mikrotik when NAT detection would fail so one end would think there was NAT and the other end would think there isn’t. Maybe it is not “more elegant” than mangle rules but definitely it’s less typing/clicking.