NAT-T flag missing in 6.45.3

I found a strange problem with the latest releases. The initiator side of an IPsec association does not show the NAT-T flag, while the responder does. There is also a blackhole on the initiator side for packet sizes from 1407 to 1422 bytes. This happens at least in 6.45.1 through 6.45.3; both sides currently run 6.45.3. How to see the problem:

# Initiator, NAT-T flag not shown
[admin@MTClient] > :ip ipsec active-peers print                
Flags: R - responder, N - natt-peer 
 #    ID                   STATE              UPTIME          PH2-TOTAL REMOTE-ADDRESS                            DYNAMIC-ADDRESS  
 0                         established        1h14m39s                6 83.44.NN.NNN                              192.168.90.253   
[admin@MTClient] > :ping 192.168.90.1 size=1406 do-not-fragment count=1
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0 192.168.90.1                             1406  64 28ms 
    sent=1 received=1 packet-loss=0% min-rtt=28ms avg-rtt=28ms max-rtt=28ms 
[admin@MTClient] > :ping 192.168.90.1 size=1407 do-not-fragment count=1 
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0 192.168.90.1                                            timeout                                                              
    sent=1 received=0 packet-loss=100% 
[admin@MTClient] > :ping 192.168.90.1 size=1422 do-not-fragment count=1  
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0 192.168.90.1                                            timeout                                                              
    sent=1 received=0 packet-loss=100% 
[admin@MTClient] > :ping 192.168.90.1 size=1423 do-not-fragment count=1 
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0                                                         packet too large and cannot be fragmented                            
    sent=1 received=0 packet-loss=100% 
# Responder, NAT-T flag shown
[admin@MTServer] > :ip ipsec active-peers print where id=client
Flags: R - responder, N - natt-peer 
 #    ID                   STATE              UPTIME          PH2-TOTAL REMOTE-ADDRESS                            DYNAMIC-ADDRESS  
 0 RN client               established        1h13m56s                6 89.35.NNN.NNN                             192.168.90.253   
[admin@MTServer] > :ping 192.168.90.253 size=1406 do-not-fragment count=1  
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0 192.168.90.253                           1406  64 27ms 
    sent=1 received=1 packet-loss=0% min-rtt=27ms avg-rtt=27ms max-rtt=27ms 
[admin@MTServer] > :ping 192.168.90.253 size=1407 do-not-fragment count=1 
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                               
    0                                                         packet too large and cannot be fragmented                            
    sent=1 received=0 packet-loss=100%

My configuration seems to be straightforward enough:

[admin@MTClient] > /ip ipsec export hide-sensitive 
# aug/03/2019 09:43:42 by RouterOS 6.45.3
#
# model = RBD52G-5HacD2HnD
/ip ipsec peer
add address=mt.server name=server send-initial-contact=no
/ip ipsec identity
add auth-method=pre-shared-key-xauth generate-policy=port-strict mode-config=request-only my-id=fqdn:shad.mine peer=server \
    remote-id=fqdn:mt.server username=client

[admin@MTServer] > /ip ipsec export hide-sensitive 
# aug/03/2019 10:43:03 by RouterOS 6.45.3
#
# model = RouterBOARD 962UiGS-5HacT2HnT
/ip ipsec mode-config
add address-pool=vpn2 name=RW-cfg split-include=192.168.88.0/24,192.168.90.0/24
/ip ipsec policy group
add name=RoadWarrior
/ip ipsec profile
add name=rw
/ip ipsec peer
add name=RoadWarrior passive=yes profile=rw
/ip ipsec identity
add auth-method=pre-shared-key-xauth generate-policy=port-strict mode-config=RW-cfg peer=RoadWarrior policy-template-group=\
    RoadWarrior username=client
/ip ipsec policy
add dst-address=192.168.90.0/24 group=RoadWarrior src-address=192.168.88.0/24 template=yes
add dst-address=192.168.90.0/24 group=RoadWarrior src-address=192.168.90.0/24 template=yes

The blackhole makes TCP connections impossible unless I lower the MTU on the initiator side. I’d say this did not happen before 6.45, but I can’t remember whether I ever tried TCP connections over IPsec this way with earlier releases.

I have an IPsec link between two devices on 6.44.5; ping works at 1422 and fails at 1423.

This is normal if the failure is the error “packet too large and cannot be fragmented”, but not if it fails with a timeout. In my case, from the server (responder) to the client (initiator) it fails like in your case; from the initiator it works up to 1406, times out from 1407 to 1422, and fails with the normal error from 1423 onwards. That timeout range is the blackhole.
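The diagnostic rule above (a reply is fine, “packet too large” is the normal local refusal, and a timeout in between is the blackhole) can be written down as a small Python sketch. find_blackhole is a hypothetical helper, and its input is a hand-typed summary of the STATUS column of do-not-fragment pings like those in the transcripts, not something RouterOS produces itself.

```python
def find_blackhole(results):
    """Return (first, last) of the probed sizes that timed out between the
    largest size that got a reply and the smallest size the local stack
    refused with "packet too large", or None if nothing timed out there.

    results (hypothetical, hand-typed) maps a ping size in bytes to
    "ok", "timeout" or "too-large".
    """
    ok = [s for s, r in results.items() if r == "ok"]
    refused = [s for s, r in results.items() if r == "too-large"]
    low = max(ok, default=0)
    high = min(refused, default=float("inf"))
    gap = sorted(s for s, r in results.items()
                 if r == "timeout" and low < s < high)
    return (gap[0], gap[-1]) if gap else None


# Outcomes read off the initiator transcript (do-not-fragment pings).
initiator = {1406: "ok", 1407: "timeout", 1422: "timeout", 1423: "too-large"}
# Outcomes read off the responder transcript.
responder = {1406: "ok", 1407: "too-large"}

print(find_blackhole(initiator))  # (1407, 1422)
print(find_blackhole(responder))  # None
```

With only a few probes the result gives the observed edges of the range; probing every size in between would pin it down exactly.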

I was just confirming that I don’t get the blackhole in either direction with 6.44.5; 1423 does generate “packet too large”.

Are you seeing the “N” (NAT Traversal) flag on both sides when you list the active peers? I see it only on the responder, and it should be on both sides.

I’m not using NAT Traversal. Active-peers doesn’t exist in the same form in 6.44, due to all the changes between 6.44 and 6.45.

NAT Traversal is not something you “use”. It is a technique the peers switch to when they detect a NAT between them, because plain ESP cannot get through it: the ESP packets are then encapsulated in UDP packets and sent via UDP port 4500. This encapsulation makes NAT traversal easier, as UDP packets are typically handled well by NAT gateways. At the same time, it reduces the MTU available to the IPsec payload, as the UDP header takes 8 bytes. So if you see ipsec-esp connections (using, for instance, “/ip firewall connection”), you are using “plain” IPsec; if you see UDP port 4500 connections, you are doing NAT-T without knowing it.
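To make the encapsulation concrete, here is a minimal Python sketch of ESP-in-UDP as described in RFC 3948: an 8-byte UDP header (source port, destination port, length, checksum) is placed in front of the unchanged ESP packet, and those are the 8 bytes lost from the MTU. udp_encapsulate and the dummy ESP bytes are illustrative assumptions; the outer IP header is omitted, and in practice the NAT may rewrite the source port.

```python
import struct

NAT_T_PORT = 4500  # UDP port used for ESP-in-UDP (RFC 3948)
UDP_HEADER = struct.Struct("!HHHH")  # src port, dst port, length, checksum


def udp_encapsulate(esp_packet, src_port=NAT_T_PORT, dst_port=NAT_T_PORT):
    """Prepend a UDP header to a raw ESP packet, as NAT-T does (sketch).

    The checksum is left at zero, which RFC 3948 recommends for ESP-in-UDP.
    """
    length = UDP_HEADER.size + len(esp_packet)  # UDP length = header + payload
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + esp_packet


# Hypothetical ESP packet: SPI and sequence number, then 32 bytes of "ciphertext".
esp = struct.pack("!II", 0x12345678, 1) + bytes(32)
wrapped = udp_encapsulate(esp)

print(len(wrapped) - len(esp))         # 8
print(UDP_HEADER.unpack(wrapped[:8]))  # (4500, 4500, 48, 0)
```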

What I don’t understand in my case is why the blackhole is 16 bytes wide (1407 to 1422) instead of 8.

I can confirm that a 6.45.3 initiator doesn’t show the N flag in print (I have no 6.45.3 responder up at the moment); it uses exchange-mode=ike2 on the peer and auth-method=eap on the identity. An initiator on 6.45.2 (exchange-mode=ike2, auth-method=digital-signature) does show it.

ESP cannot be handled even by properly configured NATs, because it has no notion of ports and because the SPI is not the same in both directions. So an ESP packet arriving at the NAT’s public IP carries no information that would let the NAT decide which private IP to forward it to.
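The argument can be illustrated with a toy model of a NAT’s inbound lookup. ToyNat is hypothetical and deliberately simplified: it only shows which keys are available to the NAT, not how any real NAT is implemented. Two hosts behind the same NAT talking to the same remote peer stay distinguishable over UDP, because each flow gets its own public port, but not over ESP, where the only usable key is the remote address.

```python
import itertools


class ToyNat:
    """Toy model of inbound demultiplexing in a NAT (hypothetical).

    UDP: each outbound flow gets a distinct public port, so a reply carries
    enough information to find the private host.
    ESP: no ports, and the SPI on inbound packets was negotiated inside
    encrypted IKE messages, so the NAT never saw it; only the remote
    address is left as a key.
    """

    def __init__(self):
        self._ports = itertools.count(40000)  # hypothetical public port pool
        self.inbound = {}

    def udp_out(self, private_ip, remote_ip, remote_port):
        public_port = next(self._ports)
        self.inbound[("udp", remote_ip, remote_port, public_port)] = private_ip
        return public_port

    def esp_out(self, private_ip, remote_ip):
        key = ("esp", remote_ip)
        owner = self.inbound.setdefault(key, private_ip)
        return owner == private_ip  # False: inbound ESP would be ambiguous


nat = ToyNat()
# Two hosts behind the NAT talk to the same remote peer.
p1 = nat.udp_out("192.168.1.10", "203.0.113.5", 4500)
p2 = nat.udp_out("192.168.1.11", "203.0.113.5", 4500)
print(nat.inbound[("udp", "203.0.113.5", 4500, p1)])  # 192.168.1.10
print(nat.inbound[("udp", "203.0.113.5", 4500, p2)])  # 192.168.1.11
print(nat.esp_out("192.168.1.10", "203.0.113.5"))      # True
print(nat.esp_out("192.168.1.11", "203.0.113.5"))      # False
```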

You don’t need to look at the /ip firewall connection list to find out whether the peers have detected NAT, as /ip ipsec installed-sa print shows it too. If the src-address and dst-address items show ports after the IP addresses, the peers run in NAT-T mode, otherwise in plain ESP mode. So if the initiator really hadn’t detected the NAT, the tunnel simply wouldn’t work at all. I did have cases in the past where the initiator was behind NAT and the responder was running on a public IP, and one of the peers detected the NAT while the other one didn’t. I’ve never discovered the conditions under which this happens, but forcing NAT even on responders running on public IPs always helped.
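The installed-sa check can be scripted once the two address values have been copied out. This Python sketch assumes the “a.b.c.d:port” form for an address that carries a port and handles IPv4 only; both that print format and the helpers parse_endpoint and sa_mode are assumptions for illustration, and the sample values are documentation addresses rather than the ones from this thread.

```python
import ipaddress


def parse_endpoint(text):
    """Split "a.b.c.d" or "a.b.c.d:port" into (IPv4Address, port or None).

    Assumption: installed-sa prints a NAT-T endpoint as address:port.
    """
    host, sep, port = text.partition(":")
    return ipaddress.IPv4Address(host), (int(port) if sep else None)


def sa_mode(src_address, dst_address):
    """Classify an installed SA from its src-address/dst-address values."""
    ports = [parse_endpoint(a)[1] for a in (src_address, dst_address)]
    if all(p is not None for p in ports):
        return "NAT-T"
    if all(p is None for p in ports):
        return "ESP"
    raise ValueError("only one side shows a port; unexpected output")


print(sa_mode("198.51.100.7:4500", "203.0.113.5:4500"))  # NAT-T
print(sa_mode("198.51.100.7", "203.0.113.5"))            # ESP
```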

The blackhole is most likely related to an incorrect calculation of the transport packet size. Two elements are variable: the size of the authentication tail, which depends on the algorithm used, and the presence or absence of the UDP header before the ESP header, which depends on whether NAT-T mode is used. So I agree with your reasoning that the wrong indication and the “MTU gap” are both caused by a bug in how the NAT-T state flag is generated, but there may also be a separate bug in calculating the auth tail size. I’ve seen that happen in the past with some auth algorithms.
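The size calculation described above can be sketched in Python to show how the two variable elements interact with ESP padding. The assumptions are not from the thread: a 1500-byte outer MTU, IPv4 without options, AES-CBC (16-byte block and IV), and the standard truncated ICV sizes of HMAC-SHA1-96 (12 bytes), HMAC-SHA-256-128 (16) and HMAC-SHA-512-256 (32). Because the encrypted part is padded to whole 16-byte blocks, the 8-byte UDP header moves the usable size down by a full block in these cases, which is one way a 16-byte gap can come from an 8-byte header. The helper names are hypothetical, and blackhole models a sender that computes its limit with the wrong NAT-T state.

```python
import math


def esp_outer_size(inner_len, nat_t, icv_len, iv_len=16, block=16):
    """Outer IPv4 packet size for tunnel-mode ESP carrying inner_len bytes.

    Assumptions: CBC cipher with 16-byte blocks and 16-byte IV (AES-CBC),
    20-byte outer IPv4 header without options, NAT-T adds an 8-byte UDP header.
    """
    udp = 8 if nat_t else 0
    esp_header = 8  # SPI + sequence number
    # Inner packet + 2-byte trailer (pad length, next header), padded to a
    # whole number of cipher blocks: limits therefore move in 16-byte steps.
    encrypted = math.ceil((inner_len + 2) / block) * block
    return 20 + udp + esp_header + iv_len + encrypted + icv_len


def max_inner(mtu, nat_t, icv_len):
    """Largest inner packet whose encapsulation still fits into mtu."""
    size = mtu
    while size > 0 and esp_outer_size(size, nat_t, icv_len) > mtu:
        size -= 1
    return size


def blackhole(mtu, icv_len, assumed_nat_t, actual_nat_t):
    """Sizes the sender accepts (no "packet too large") that cannot fit."""
    believed = max_inner(mtu, assumed_nat_t, icv_len)
    real = max_inner(mtu, actual_nat_t, icv_len)
    return (real + 1, believed) if believed > real else None


# Assumed ICV sizes: HMAC-SHA1-96, HMAC-SHA-256-128, HMAC-SHA-512-256.
for icv in (12, 16, 32):
    print(icv, max_inner(1500, False, icv), max_inner(1500, True, icv))
# 12 1438 1422
# 16 1438 1422
# 32 1422 1406

print(blackhole(1500, 32, assumed_nat_t=False, actual_nat_t=True))  # (1407, 1422)
```

With a 32-byte ICV this arithmetic happens to reproduce the 1407 to 1422 range seen on the initiator, but the thread does not show which proposal or path MTU is in use, so treat that as an illustration of the mechanism rather than a diagnosis; other combinations produce a gap of the same width.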