GRE over IPSec tunnel - unusable on RB4011 above 7.15.3

Hello,

Any RouterOS version above 7.15.3 (7.16.x and 7.17) shows critical performance issues. All TCP streams are affected; the drop is most noticeable during SMB file operations. Wireshark shows multiple duplicate ACKs and retransmissions. Config below (working on 7.15.3); GRE addressing is not redacted.

/interface/gre/export
/interface gre
add allow-fast-path=no local-address=172.16.64.21 name=gre-mdggdn remote-address=172.16.64.20
add allow-fast-path=no local-address=172.16.64.23 name=gre-mdgwaw remote-address=172.16.64.22

/ip/ipsec/export
/ip ipsec peer
add address=xxxx/32 exchange-mode=ike2 name=node-a send-initial-contact=no
add address=xxxx/32 exchange-mode=ike2 name=node-b
/ip ipsec policy group
add name=common
/ip ipsec profile
set [ find default=yes ] dh-group=ecp521 dpd-interval=30s dpd-maximum-failures=3 enc-algorithm=aes-256 hash-algorithm=sha512
/ip ipsec proposal
set [ find default=yes ] auth-algorithms=sha512 enc-algorithms=aes-256-cbc pfs-group=ecp521
/ip ipsec identity
add auth-method=digital-signature certificate=*redacted* generate-policy=port-strict match-by=\
    certificate peer=node-b policy-template-group=common remote-certificate=*redacted*
add auth-method=digital-signature certificate=*redacted* match-by=certificate peer=node-a \
    policy-template-group=common remote-certificate=*redacted*
/ip ipsec policy
add dst-address=172.16.64.20/32 level=unique peer=node-a protocol=gre src-address=172.16.64.21/32 tunnel=yes
add dst-address=172.16.64.22/32 level=unique peer=node-b protocol=gre src-address=172.16.64.23/32 tunnel=yes

/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN protocol=!gre

When using any tunnel protocol, consider adjusting the MTU to account for tunnel overhead and prevent packet fragmentation.

Quick calculations here:
1500B - (8B PPPoE) - (20B IPSec header) - (8B Nat-T) - (8B ESP header) - (16B ESP IV) - (24B GRE) - (34B ESP trailer) = 1382 bytes of actual MTU
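Independently of the exact MTU value, clamping TCP MSS on traffic leaving the tunnel usually sidesteps fragmentation even when the far side ignores PMTUD. A sketch, using the interface names from the export above (the mangle rule placement is an assumption about your firewall layout):

/interface gre set [ find name=gre-mdggdn ] mtu=1382
/interface gre set [ find name=gre-mdgwaw ] mtu=1382
/ip firewall mangle add chain=forward out-interface=gre-mdggdn protocol=tcp \
    tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes \
    comment="clamp MSS to tunnel PMTU"

With clamp-to-pmtu the router rewrites the MSS in TCP SYNs to fit whatever the tunnel path MTU turns out to be, so a slightly wrong manual MTU no longer causes fragmented segments.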

Both sides have this value; I even forced it manually to cross-check, which didn't help. Lowering it to 1300 didn't help either.

TL;DR: how does the CPU load (/tool/profile cpu=all) look under traffic load?

There have been changelog notes in the last few versions about hardware AES acceleration on the Alpine SoC family being broken, then fixed, then fixed again, etc.
The RB4011 has an Alpine chip, so this might be related somehow, even though the changelogs specifically highlighted AES-GCM, which is not what you have in your config.
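One way to see whether the crypto path is implicated at all (the commands are standard RouterOS; interpreting the flags as a hardware-offload indicator is my assumption):

/tool profile cpu=all
/ip ipsec installed-sa print
# optionally, to test whether the cipher itself matters:
/ip ipsec proposal set [ find default=yes ] enc-algorithms=aes-128-cbc

If encryption has fallen off the hardware path, you would expect the "encrypting" class in the profile to climb under load instead of sitting at 0%, and swapping the proposal cipher should visibly change the profile.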

I ended up with 1376 as a common working value for GRE/IPIP over IPsec across different providers in Poland.

@BartoszP - I don't think this is really an MTU issue. The autonegotiated MTU matches the one I calculated, and lowering it does not help either. I may test a direct connection to one of the peers to rule out NAT-T or other potentially offending voodoo.

@wrkq - on 7.15.3 I see spikes up to 90% single-core utilization during transfer; a more typical snapshot:

# 2025-01-26 22:02:06 by RouterOS 7.15.3
# software id = RNLB-4V15
#
Columns: NAME, CPU, USAGE
NAME          CPU  USAGE
console         0  0%   
firewall        0  0%   
networking      0  3.5% 
winbox          0  1%   
logging         0  0.5% 
management      0  2%   
encrypting      0  0%   
routing         0  0%   
profiling       0  1%   
bridging        0  0%   
unclassified    0  1.5% 
cpu0               9.5% 
console         1  0%   
firewall        1  0.5% 
networking      1  2%   
winbox          1  0%   
management      1  1.5% 
encrypting      1  0%   
routing         1  0%   
profiling       1  0.5% 
bridging        1  0%   
unclassified    1  0%   
cpu1               4.5% 
ethernet        2  0%   
console         2  0.5% 
firewall        2  0%   
networking      2  0%   
winbox          2  0%   
management      2  1%   
routing         2  0%   
profiling       2  0%   
bridging        2  0%   
unclassified    2  0%   
cpu2               1.5% 
ethernet        3  0.5% 
console         3  0%   
firewall        3  12%  
networking      3  40.5%
management      3  0%   
encrypting      3  0%   
routing         3  8.5% 
profiling       3  1%   
bridging        3  1%   
unclassified    3  2.5% 
cpu3               66%

With 7.17 I get this:

# 2025-01-26 22:08:13 by RouterOS 7.17
# software id = RNLB-4V15
#
Columns: NAME, CPU, USAGE
NAME        CPU  USAGE
ppp           0  0%   
networking    0  0%   
management    0  0%   
telnet        0  0%   
console       0  0%   
routing       0  0.5% 
wireless      0  0%   
cpu0             0.5% 
networking    1  0%   
management    1  0.5% 
console       1  0%   
wireless      1  0.5% 
firewall      1  0%   
kernel        1  0%   
led           1  0%   
cpu1             1%   
ppp           2  0%   
networking    2  0%   
management    2  1.5% 
winbox        2  0.5% 
console       2  1%   
crypto        2  0%   
routing       2  0%   
wireless      2  0.5% 
profiling     2  0%   
kernel        2  0.5% 
bridge2       2  1%   
cpu2             5%   
networking    3  0%   
management    3  2.5% 
winbox        3  0%   
console       3  0.5% 
wireless      3  2%   
firewall      3  0%   
bridging      3  0%   
cpu3             5%

but there is effectively no transfer happening.

@mwisniewski, how do iperf3 tests with UDP or TCP and different packet sizes affect throughput and CPU usage? Have you tried different IPsec algorithms, to rule out hardware acceleration? Is there an RB4011 on both ends?