Sun Nov 03, 2019 8:22 pm
OK, thanks for highlighting this. It never occurred to me to look this deep (as I normally deal with application layer keepalives, where such tricks are technically impossible).
So actually the GRE keepalive request packet as such uses a normal combination of source and destination address, i.e. the same address tuple as a regular GRE packet carrying a payload. However, it carries a pre-cooked response to itself as its payload, so that at the responding side, the regular processing of received GRE packets is sufficient for the keepalive loop to work.
And this is what causes the trouble with the firewall. The keepalive request is originated by the sending router itself, so there it is handled by the output chain of the firewall. At the responding router, however, the response is decapsulated from the received request just like any other payload. So the responding router does not originate the response but forwards it, the in-interface being the local end of the tunnel, even though the source address of the response is one of that router's own addresses. So it is not the firewall of the router where the keepalive functionality is activated that matters; it is the firewall of the router at the responding end of the tunnel which, unless it is set to permit forwarding of such "weird" packets, prevents the keepalive responses from ever returning to the sender of the requests.
Regarding GRE keepalives being useless if GRE is encrypted using IPsec, I have to disagree. It's fine that we have the IPsec keepalives, but if the IPsec SA goes down, the GRE layer learns nothing about it, as there is no alarm propagation from the IPsec layer to the GRE layer. So without GRE keepalives, a GRE interface won't go down when the GRE packets cannot reach the remote router due to an outage of the IPsec SA carrying them.
So where we do need to take action when a GRE tunnel stops working, the use of GRE keepalives is necessary; to allow them to work, forwarding of GRE packets which arrive via GRE interfaces has to be permitted at the responding end. In RouterOS, this used to be handled by the "chain=forward action=accept connection-state=established,related,untracked" firewall rule, which doesn't care about interfaces. But as of now (6.45.7 at the time of writing), GRE handling by connection tracking is still totally broken after the fix addressing the GRE-related CVE: all new GRE packets seem to be marked as connection-state=invalid, and often, but not always, GRE exchanges never become tracked. So a dedicated rule needs to be used to permit the keepalive responses to be delivered.
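A minimal sketch of such a dedicated rule, to be added on the router at the responding end. The interface name gre-tunnel1 is only a placeholder for whatever the local GRE interface is called, and the comment text is my own; adjust both to your setup. The rule has to sit above any rule in chain=forward that drops connection-state=invalid packets or drops everything not explicitly accepted, otherwise it never gets a chance to match.

```
/ip firewall filter
# gre-tunnel1 is a placeholder name for the local GRE interface
add chain=forward action=accept protocol=gre in-interface=gre-tunnel1 \
    comment="permit GRE keepalive responses decapsulated from the tunnel"
```

Matching on both protocol=gre and the GRE in-interface keeps the exception narrow: it only covers GRE packets that come out of that particular tunnel, which is how the pre-cooked keepalive responses show up at the responding router.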
If the remote end is not running RouterOS, the same situation must be addressed using whatever equivalent its firewall provides.