Simple issue, hard to find the real cause
So I have this plain IPsec tunnel running between two MikroTik routers, A and B. A has one subnet behind NAT and B has two subnets behind NAT.
There is no traffic overnight. In the morning, when a host behind router A contacts a host behind router B, the connection does not seem to work.
The other direction, B → A, works immediately and kick-starts the tunnel again, so everything works perfectly afterwards. Does this sound like a bug, or am I missing something essential here?
IPsec adapts to the network path between the peers. If there is no NAT between them, it uses ESP as the transport protocol (or AH, but that's not your case, since you are talking about a subnet-to-subnet tunnel). If there is NAT on at least one side, it encapsulates the ESP into UDP (port 4500) in order to traverse the NAT(s).
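How the peers notice the NAT can be sketched in a few lines. This is a minimal sketch assuming IKEv2-style NAT detection (RFC 7296: each side sends a SHA-1 hash over the IKE SPIs, an IP address and a port, and the receiver recomputes it from the packet as it arrived). The function names and the SPI bytes are hypothetical placeholders, and a real implementation also moves IKE itself to UDP 4500 once NAT is detected.

```python
import hashlib
import ipaddress

# Assumption: IKEv2-style NAT detection (SHA-1 over SPIs | IP | port), simplified.


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """SHA-1 over the IKE SPIs, an IP address and a UDP port."""
    data = spi_i + spi_r + ipaddress.ip_address(ip).packed + port.to_bytes(2, "big")
    return hashlib.sha1(data).digest()


def esp_transport(spi_i: bytes, spi_r: bytes,
                  sender_src: tuple, sender_dst: tuple,
                  seen_src: tuple, seen_dst: tuple) -> str:
    """Pick plain ESP or UDP-encapsulated ESP (hypothetical helper).

    sender_src/sender_dst: (ip, port) the sender used when building its hashes.
    seen_src/seen_dst: (ip, port) the receiver actually observed on the packet.
    A mismatch on either side means a NAT rewrote the addresses in transit.
    """
    # Hashes the sender puts into its NAT detection payloads
    sent_src = nat_detection_hash(spi_i, spi_r, *sender_src)
    sent_dst = nat_detection_hash(spi_i, spi_r, *sender_dst)
    # Hashes the receiver recomputes from what it really received
    nat_in_path = (sent_src != nat_detection_hash(spi_i, spi_r, *seen_src)
                   or sent_dst != nat_detection_hash(spi_i, spi_r, *seen_dst))
    if nat_in_path:
        return "UDP/4500 (ESP in UDP, NAT-T)"
    return "ESP (IP protocol 50)"


if __name__ == "__main__":
    spi_i, spi_r = bytes(8), bytes(range(8))  # placeholder 8-byte SPIs
    # Both routers have public IPs, nothing rewrites the addresses
    print(esp_transport(spi_i, spi_r,
                        ("198.51.100.1", 500), ("203.0.113.2", 500),
                        ("198.51.100.1", 500), ("203.0.113.2", 500)))
    # Initiator sits behind a NAT that rewrites its address and port
    print(esp_transport(spi_i, spi_r,
                        ("192.168.88.2", 500), ("203.0.113.2", 500),
                        ("198.51.100.1", 40123), ("203.0.113.2", 500)))
```

Only the outer transport changes; the payload is ESP-protected in both cases.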
Unlike other protocol suites whose control and data sessions use distinct flows (such as SIP, FTP or PPTP), IPsec using ESP gets no such help: connection tracking in the firewall doesn't treat the ESP flow as connection-state=related to the IKE control connection on UDP port 500 (or 4500 once NAT traversal is in use). So if there is no communication, the ESP pinhole times out (by default 10 minutes after the last packet seen in either direction). If the input chain of the firewall filter then doesn't accept a new incoming ESP flow from the remote peer, the tunnel seems to be down: the request arrives encapsulated in ESP, the firewall doesn't let it in, so there is no response, so no ESP goes back out, so the pinhole cannot open.
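The pinhole behaviour can be pictured with a toy model. It is not how RouterOS or Linux connection tracking is implemented: Router, deliver() and the per-peer timestamp table are hypothetical, ESP state is keyed only by the peer name, and the 600 s timeout is the default quoted above. The demo replays the situation from the original question, with ESP accepted on A but not on B.

```python
from dataclasses import dataclass, field

# Assumption: 600 s is the ESP pinhole idle timeout quoted in the post (10 minutes).
ESP_TIMEOUT = 600


@dataclass
class Router:
    """Toy model of one router's firewall as seen by ESP packets (not real conntrack)."""
    name: str
    accept_new_esp: bool  # does the input chain accept a *new* ESP flow from the peer?
    # peer name -> time of the last ESP packet seen in either direction
    last_seen: dict = field(default_factory=dict)

    def is_established(self, peer: str, now: float) -> bool:
        seen = self.last_seen.get(peer)
        return seen is not None and now - seen <= ESP_TIMEOUT

    def send_esp(self, peer: str, now: float) -> None:
        # Locally originated ESP creates or refreshes the tracked connection.
        self.last_seen[peer] = now

    def receive_esp(self, peer: str, now: float) -> bool:
        # ESP is not "related" to IKE: it passes only as established or via an explicit rule.
        if self.is_established(peer, now) or self.accept_new_esp:
            self.last_seen[peer] = now
            return True
        return False  # dropped; no state is created, so nothing gets answered


def deliver(src: Router, dst: Router, now: float) -> bool:
    """Send one ESP packet from src to dst; True if dst's input chain lets it in."""
    src.send_esp(dst.name, now)
    return dst.receive_esp(src.name, now)


if __name__ == "__main__":
    a = Router("A", accept_new_esp=True)   # ESP allowed on A
    b = Router("B", accept_new_esp=False)  # no ESP accept rule on B
    print(deliver(b, a, 0))                # True: B -> A, A accepts new ESP
    print(deliver(a, b, 1))                # True: B's pinhole is still open
    morning = 8 * 3600
    print(deliver(a, b, morning))          # False: B's pinhole expired overnight
    print(deliver(b, a, morning + 1))      # True: B -> A reopens B's pinhole
    print(deliver(a, b, morning + 2))      # True again
```

Setting accept_new_esp=True on B (the equivalent of an input rule accepting ESP from the peer) makes the morning A → B packet pass regardless of the idle time.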
But unless both MikroTiks in question have a public IP on their WAN, the explanation above is not relevant. If that is the case, provide more details: ideally both configurations, plus information on what kind of devices sit between the WANs of the MikroTiks and the internet and how those devices are configured. See the anonymisation hints in my automatic signature right below.
Hi Sindy
I see you answering a lot of IPsec questions and, once again, you are right on the money. I wrongly assumed I did not have to allow ESP (protocol 50) in the input chain, because it seemed to work without it.
By sheer chance, it was allowed on router A for an unrelated tunnel. But it was not allowed on router B.
At some point I changed one of the peers, router B, to passive (so it should not initiate the connection), and, as I just said, B was not allowing ESP in the input chain. Your excellent explanation describes exactly what happens in that case: ESP packets going B → A are no issue, as router A allows ESP inbound. Moreover, router A initiates the tunnel, so it would work (for a finite amount of time at least) even without allowing ESP inbound. However, in the absence of an active B → A flow, ESP flows from A → B get dropped: this is what happened when hosts behind A woke up and wanted to contact hosts behind B. Allowing ESP inbound on router B fixed the issue.
Full disclosure: the IPsec tunnel endpoints have public IPs; the hosts are behind NAT. So I am thinking IPsec transport mode would be better suited, but I have yet to figure out how to configure it.
Nothing left but to thank you for your expert observation.
It wouldn't. Even in transport mode, you'd have suffered from the same issue. Transport mode only makes more efficient use of the transport packet's capacity, but it still uses ESP or AH as the transport protocol. It can only be used when the source and destination IP addresses of the transport packet and the payload packet are identical, so there is no need to deliver the IP header of the payload packet as part of the transported data. In all other cases, tunnel mode is required.
So you can use transport mode to carry the packets of some other tunnel (GRE, IPIP) running between the peers, which lets you use regular routing instead of IPsec policy matching, but it won't get rid of ESP.
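A small sketch of that rule, using hypothetical helper names and documentation addresses: transport mode fits only when the payload packet already travels between the peers' own addresses (such as GRE between the two public IPs), while traffic between hosts in the NATed subnets needs tunnel mode. The 20-byte saving assumes IPv4 headers without options; in both modes the result is still ESP, so the input-chain rule is needed either way.

```python
from dataclasses import dataclass

# Assumption: IPv4 header without options.
IPV4_HEADER_BYTES = 20


@dataclass(frozen=True)
class Addrs:
    """Source and destination address of a packet (hypothetical helper type)."""
    src: str
    dst: str


def ipsec_mode(payload: Addrs, peers: Addrs) -> str:
    """Transport mode fits only if the payload already travels between the two peers."""
    return "transport" if payload == peers else "tunnel"


def extra_header_bytes(mode: str) -> int:
    """Tunnel mode keeps the payload's IP header inside ESP and adds a new outer one."""
    return IPV4_HEADER_BYTES if mode == "tunnel" else 0


if __name__ == "__main__":
    peers = Addrs("198.51.100.1", "203.0.113.2")  # public WAN IPs of the two routers
    hosts = Addrs("10.0.1.10", "10.0.2.20")        # hosts in the NATed subnets
    gre = Addrs("198.51.100.1", "203.0.113.2")     # GRE/IPIP between the public IPs
    for name, pkt in (("hosts", hosts), ("gre", gre)):
        mode = ipsec_mode(pkt, peers)
        # Either way the packet leaves as ESP, so the firewall must still accept ESP.
        print(name, mode, "+%d bytes" % extra_header_bytes(mode))
```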