High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

What the hell are you talking about?

You imply that Fastpath and/or Fasttrack is a hard requirement in order to do any kind of traffic forwarding at scale. When the hardware is sufficiently spec’d out, I don’t find that to be the case. I’m running plenty of x86 installations that are doing just fine without it. It’s way more important for the RouterBOARD models on the lower end with slower processors than for beefy x86 boxes.

All the guides I’ve read say that multiqueue should be set equal to the number of CPUs you’ve given the VM. I assume you’ve tested that on your setup?

My personal setup has eight cores on a 13900H with multiqueue = 8. All I know is that performance increased when I set it higher. I haven’t experimented to find an optimal value, though.
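Why the multiqueue count matters can be shown with a toy model of receive-side steering: each flow's 5-tuple is hashed to one queue, and each queue is serviced by one CPU. The SHA-256 hash is a stand-in (hardware NICs typically use a Toeplitz hash, and virtual NICs have their own steering), and the flows are made up. The only point is that with a single queue every flow lands on the same core.

```python
import hashlib
from collections import Counter


def queue_for_flow(flow, num_queues):
    """Pick a queue for a flow (5-tuple), the way RSS-style steering does.

    SHA-256 stands in for the NIC's real hash; unlike Python's built-in
    hash() it is stable across runs, so results are reproducible.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_queues


def queue_load(flows, num_queues):
    """Count how many flows land on each queue (one queue ~ one CPU)."""
    return Counter(queue_for_flow(flow, num_queues) for flow in flows)


if __name__ == "__main__":
    # Hypothetical traffic: 1000 TCP flows from LAN clients to one web server.
    flows = [(f"10.0.0.{i % 250 + 1}", 40000 + i, "203.0.113.10", 443, "tcp")
             for i in range(1000)]
    for n in (1, 4, 8):
        load = queue_load(flows, n)
        print(f"multiqueue={n}: {len(load)} queue(s) busy, "
              f"busiest has {max(load.values())} of {len(flows)} flows")
```

With many small flows the load evens out as queues are added; a single large flow still maps to one queue no matter how many exist.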

I still can’t manage to hit my 5 Gbps.
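To put the 5 Gbps target in perspective, here is a back-of-the-envelope sketch of the per-packet CPU budget. It assumes standard Ethernet wire overhead (preamble, start-of-frame delimiter and inter-frame gap, 20 bytes per frame) and an illustrative 3.0 GHz clock; the function names are made up for this example.

```python
# Ethernet puts 20 extra bytes on the wire per frame:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(bits_per_second, frame_bytes):
    """Frames per second needed to fill a link with frames of one size.

    frame_bytes counts the Ethernet header and FCS, e.g. 1518 for a
    1500-byte IP packet or 64 for a minimum-size frame.
    """
    return bits_per_second / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cycles_per_packet(cpu_hz, cores, pps):
    """Average CPU cycles available per packet when `cores` share the load."""
    return cpu_hz * cores / pps


if __name__ == "__main__":
    link_bps = 5e9  # the 5 Gbps target from the post
    cpu_hz = 3.0e9  # assumed clock speed; substitute the real one
    for size in (64, 512, 1518):
        pps = packets_per_second(link_bps, size)
        print(f"{size:>4} B frames: {pps:>10,.0f} pps | "
              f"{cycles_per_packet(cpu_hz, 1, pps):>6,.0f} cycles/pkt on 1 core | "
              f"{cycles_per_packet(cpu_hz, 8, pps):>6,.0f} on 8 cores")
```

With 1518-byte frames, 5 Gbps is roughly 406,000 pps; with 64-byte frames it is about 7.4 million pps, which leaves a single 3 GHz core only about 400 cycles per packet. That is one way a single saturated core can cap throughput long before the machine as a whole runs out of CPU.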

You’re right. I feel stupid for relying on a company whose main focus is selling hardware devices, which leaves the software side as something of a Cinderella.
Unfortunately, the company hides its hardware compatibility list for Intel hardware, so I found it on an old version of the page and decided to give the company the benefit of the doubt (my bad, won’t do that again).

Meanwhile, in the v7.19 changelog:

*) conntrack - improved stability on busy systems;
*) system - improved system stability when sending TCP data from the router;
*) x86 - i40e updated driver to version 2.27.8;

Microsoft sells software called Windows, and making it run flawlessly on any hardware is a challenge for them as well. So you really want to hold MikroTik responsible for supporting every hardware combination ever custom-built anywhere in the world? But you do make one point: MikroTik should publish hardware requirements for the x86 platform.

And… nothing has changed.

That looks a bit weird with that steady increase between 12:30 and 14:30, almost like a memory leak. Do a full export so we can take a look, and maybe “someone” might even have time to do a quick test in the lab. What kind of traffic is going on?
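One way to tell a steady, leak-like climb from ordinary traffic-driven swings is to fit a straight line to the samples in the suspicious window and check both the slope and how well the line fits (Pearson r). The sample values and thresholds below are invented for illustration; they are not read from the graph in the thread.

```python
import math


def trend(samples):
    """Least-squares slope and Pearson r for (minute, value) samples."""
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy) if syy else 0.0
    return slope, r


def looks_like_leak(samples, min_slope, min_r=0.9):
    """True if the metric rises steadily: positive slope and a tight linear fit."""
    slope, r = trend(samples)
    return slope >= min_slope and r >= min_r


if __name__ == "__main__":
    # Hypothetical samples every 5 minutes from 12:30 (t=0) to 14:30 (t=120).
    steady = [(t, 20 + 0.25 * t) for t in range(0, 125, 5)]
    noisy = [(t, 50 + (5 if t % 10 else -5)) for t in range(0, 125, 5)]
    print(looks_like_leak(steady, min_slope=0.1))  # True
    print(looks_like_leak(noisy, min_slope=0.1))   # False
```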

Regarding hardware support, since there’s no official list of supported hardware, you pretty much have to email support and ask. As a rule of thumb, you can assume most mainstream x86 drivers are included from Linux 5.6.3, plus a few legacy drivers that have been ported over from ROS v6.

# 2025-06-11 10:27:46 by RouterOS 7.18.2
#
/interface ethernet set [ find default-name=ether1 ] disable-running-check=no name=lan
/interface ethernet set [ find default-name=ether4 ] disable-running-check=no name=wan
/interface vrrp add group-authority=self interface=wan name=vrrp-wan priority=200 vrid=10
/interface vrrp add group-authority=vrrp-wan interface=lan name=vrrp-lan vrid=20
/interface list add name=LAN
/interface list add name=WAN
/ip ipsec policy group add name=AWS
/ip ipsec profile add dh-group=modp1024 dpd-interval=10s dpd-maximum-failures=3 enc-algorithm=aes-128 lifetime=8h name=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec peer add address=aaa.aaa.aaa.aaa/32 disabled=yes exchange-mode=ike2 local-address=xxx.xxx.xxx.70 name=aaa.aaa.aaa.aaa profile=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec peer add address=bbb.bbb.bbb.bbb/32 exchange-mode=ike2 local-address=xxx.xxx.xxx.70 name=bbb.bbb.bbb.bbb profile=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec proposal add enc-algorithms=aes-128-cbc lifetime=1h name=ipsec-vpn-xxxxxxxxxxxxxxxxxxx
/port set 0 name=serial0
/port set 1 name=serial1
/queue interface set lan queue=multi-queue-ethernet-default
/queue interface set wan queue=multi-queue-ethernet-default
/ip firewall connection tracking set enabled=yes
/ip neighbor discovery-settings set discover-interface-list=LAN
/ip settings set allow-fast-path=no
/interface list member add interface=vrrp-lan list=LAN
/interface list member add interface=wan list=WAN
/interface list member add interface=lan list=LAN
/interface list member add interface=vrrp-wan list=WAN
/ip firewall filter add action=accept chain=input comment="accept established,related,untracked" connection-state=established,related,untracked
/ip firewall filter add action=drop chain=input comment="drop invalid" connection-state=invalid log-prefix=invalid
/ip firewall filter add action=accept chain=input comment="ipsec policy matcher" ipsec-policy=in,ipsec
/ip firewall filter add action=accept chain=input comment="allow local networks" src-address-list=lan-list
/ip firewall filter add action=accept chain=input comment="allow other local wan addresses" src-address-list=wan-list
/ip firewall filter add action=accept chain=input comment="allow IPSec IKE" dst-port=500,4500 in-interface-list=WAN protocol=udp
/ip firewall filter add action=accept chain=input comment="allow IPSec AH" in-interface-list=WAN protocol=ipsec-ah
/ip firewall filter add action=accept chain=input comment="allow IPSec ESP" in-interface-list=WAN protocol=ipsec-esp
/ip firewall filter add action=accept chain=input comment="allow Winbox" dst-port=8291 in-interface-list=WAN protocol=tcp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="allow SSH" dst-port=22 in-interface-list=WAN protocol=tcp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="Allow SNMP" dst-port=161 in-interface-list=WAN protocol=udp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="Allow ICMP" in-interface-list=WAN protocol=icmp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="allow VRRP" protocol=vrrp
/ip firewall filter add action=drop chain=input comment="block everything else"
/ip firewall filter add action=accept chain=forward comment="ipsec in policy matcher" ipsec-policy=in,ipsec
/ip firewall filter add action=accept chain=forward comment="ipsec out policy matcher" ipsec-policy=out,ipsec
/ip firewall filter add action=accept chain=forward comment="Established, Related" connection-state=established,related
/ip firewall filter add action=accept chain=forward comment="accept internal traffic" src-address-list=lan-list
/ip firewall filter add action=accept chain=forward comment="accept vpc traffic" src-address-list=vpc-list
/ip firewall filter add action=drop chain=forward comment="Drop invalid" connection-state=invalid log=yes log-prefix=invalid
/ip firewall filter add action=drop chain=forward comment="block everything else"
/ip firewall nat add action=accept chain=srcnat comment="ipsec no nat" ipsec-policy=out,ipsec
/ip firewall nat add action=src-nat chain=srcnat comment="src-nat non-ipsec" ipsec-policy=out,none out-interface-list=WAN to-addresses=xxx.xxx.xxx.70
/ip firewall raw add action=notrack chain=prerouting dst-address=10.0.0.0/8 src-address=10.0.0.0/8
/ip ipsec identity add peer=aaa.aaa.aaa.aaa 
/ip ipsec identity add peer=bbb.bbb.bbb.bbb
/ip ipsec policy add action=none comment="bypass encryption to local networks" dst-address=yyy.yyy.yyy.0/18 src-address=0.0.0.0/0
/ip ipsec policy add dst-address=169.254.214.201/32 peer=bbb.bbb.bbb.bbb proposal=ipsec-vpn-xxxxxxxxxxxxxxxxxxx src-address=0.0.0.0/0 tunnel=yes
/ip ipsec policy add dst-address=10.0.0.0/8 peer=bbb.bbb.bbb.bbb proposal=ipsec-vpn-xxxxxxxxxxxxxxxxxxx src-address=yyy.yyy.yyy.0/18 tunnel=yes
/ip route add gateway=xxx.xxx.xxx.65
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=yyy.yyy.yyy.5/32 gateway=lo routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=zzz.zzz.zzz.0/24 gateway=lan routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=yyy.yyy.yyy.3/32 gateway=lo routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip service set telnet disabled=yes
/ip service set ftp disabled=yes
/ip service set www disabled=yes
/ip service set api disabled=yes
/ip ssh set strong-crypto=yes
/routing bgp connection add as=65570 disabled=no input.filter=bgp-in local.address=169.254.214.202 .role=ebgp name=BGP-169.254.214.201 output.filter-chain=bgp-out .network=bgp-networks remote.address=169.254.214.201/32 .as=64515 routing-table=main
/routing bgp connection add as=65570 disabled=no input.filter=bgp-in local.address=169.254.158.118 .role=ebgp name=BGP-169.254.158.117 output.filter-chain=bgp-out .network=bgp-networks remote.address=169.254.158.117/32 .as=64515 routing-table=main
/routing filter rule add chain=bgp-in-pri comment="Main link input filter" disabled=no rule="set bgp-local-pref 200;  set distance 19; jump bgp-in;"
/routing filter rule add chain=bgp-in-sec comment="Secondary link input filter" disabled=no rule="set bgp-local-pref 100; set distance 21; jump bgp-in;"
/routing filter rule add chain=bgp-in comment="Exclude local networks from AWS advertisements" disabled=no rule="if (dst in lan-list) { reject;}"
/routing filter rule add chain=bgp-in comment="Set a preferred IP for outgoing connections and accept all routes" disabled=no rule="set pref-src yyy.yyy.yyy.5; accept;"
/routing filter rule add chain=bgp-out-pri comment="Main link output filter" disabled=no rule="set bgp-out-med 50; jump bgp-out;"
/routing filter rule add chain=bgp-out-sec comment="Secondary link output filter" disabled=no rule="set bgp-out-med 100; jump bgp-out;"
/routing filter rule add chain=bgp-out comment="Announce only the approved networks" disabled=no rule="if (dst in bgp-networks) { accept; } else { reject; }"
/system logging add disabled=yes topics=ipsec
/system logging add topics=vrrp
/system logging add disabled=yes topics=ipsec,!debug
/system logging add disabled=yes topics=dns
/system logging add disabled=yes topics=bgp
/system logging add disabled=yes topics=route
/system logging add disabled=yes topics=ipsec
/system note set show-at-login=no
/system ntp client set enabled=yes
/system ntp server set enabled=yes
/system ntp client servers add address=0.us.pool.ntp.org
/tool bandwidth-server set enabled=no
/tool mac-server set allowed-interface-list=LAN
/tool mac-server mac-winbox set allowed-interface-list=LAN
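Since the thread suspects SNAT, a quick way to summarize which lines of an export like the one above affect per-packet CPU cost is a plain text scan. This is a hypothetical helper, not a MikroTik tool; it only pattern-matches export syntax and says nothing about rule order or actual hit counts.

```python
import re


def cpu_relevant_findings(export_text):
    """Flag settings in a RouterOS export that affect per-packet CPU cost.

    Plain regex scan over export lines; it does not evaluate rule order,
    matchers, or counters.
    """
    findings = []
    if re.search(r"^/ip settings set .*\ballow-fast-path=no\b", export_text, re.M):
        findings.append("fast path disabled (/ip settings allow-fast-path=no)")
    if re.search(r"^/ip firewall connection tracking set .*\benabled=yes\b", export_text, re.M):
        findings.append("connection tracking forced on (enabled=yes)")
    if not re.search(r"\baction=fasttrack-connection\b", export_text):
        findings.append("no FastTrack rule: forwarded packets take the slow path")
    srcnat = re.findall(r"^/ip firewall nat add .*\baction=src-nat\b", export_text, re.M)
    if srcnat:
        findings.append(f"{len(srcnat)} src-nat rule(s): translated flows depend on conntrack")
    notrack = re.findall(r"^/ip firewall raw add .*\baction=notrack\b", export_text, re.M)
    if notrack:
        findings.append(f"{len(notrack)} raw notrack rule(s) bypass conntrack")
    return findings


# A few lines copied from the export above.
SAMPLE = """\
/ip firewall connection tracking set enabled=yes
/ip settings set allow-fast-path=no
/ip firewall nat add action=src-nat chain=srcnat ipsec-policy=out,none out-interface-list=WAN to-addresses=xxx.xxx.xxx.70
/ip firewall raw add action=notrack chain=prerouting dst-address=10.0.0.0/8 src-address=10.0.0.0/8
"""

if __name__ == "__main__":
    for finding in cpu_relevant_findings(SAMPLE):
        print("-", finding)
```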

Have you considered/discarded the possibility that this is being caused by an elephant flow?
https://en.wikipedia.org/wiki/Elephant_flow
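An elephant flow matters here because flow hashing keeps all packets of one connection on one queue, and therefore on one core, however many cores the box has. A minimal sketch of spotting such flows from per-flow byte counts; the 10% share threshold and the flow names are assumptions for illustration.

```python
def elephant_flows(flow_bytes, share_threshold=0.10):
    """Return flows whose share of total bytes meets the threshold, largest first."""
    total = sum(flow_bytes.values())
    if total == 0:
        return []
    heavy = [flow for flow, size in flow_bytes.items() if size / total >= share_threshold]
    return sorted(heavy, key=flow_bytes.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical byte counts over one polling interval.
    counts = {
        "backup 10.0.0.5 -> 198.51.100.7:443": 9_000_000_000,
        "browsing 10.0.0.20": 40_000_000,
        "browsing 10.0.0.21": 35_000_000,
    }
    print(elephant_flows(counts))
```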

TIL 🙂

No, I have a bunch of clients browsing the internet, and they don’t have any long-running elephant flows.

Have you tried 7.15.3 or 7.16.x? My CRS300s had random reboots on 7.15.3 due to a memory problem, which 7.16.2 fixed, but 7.16.x on my CCR2116s caused random BGP “stuck route” problems, so those ran great on 7.15.3 (and now on 7.19.1).

@pushkink, any idea what kind of traffic pattern we’re seeing here? Could be handy to know, just to match it up with the timeline.

I have also encountered your problem. When a router has more than 32 cores, various strange issues arise.
Try disabling Hyper-Threading or shutting down some cores; that will most likely resolve the problem.

Re: “… I have also encountered your problem. When the number of routers exceeds 32 cores”

I’ve seen this many times when increasing the CPU cores on a CHR VM (for example, going from 24 to 40 cores).
I never took the time to find the magic number where a CHR starts to fall apart as CPU cores are added.
Question - Are you finding that it happens above 32 cores, or somewhere around 32 cores?
Question - Does it happen on all CHR ROS versions (6.x and 7.x)?

That’s an intriguing idea, thank you! I’ll definitely give it a try on Monday.

Yes, I have a large number of physical x86 machines running RouterOS. The basic problem, a single core stuck at 100%, shows up on machines with more than 32 cores. I have also seen x86 machines with 72 cores that reboot immediately as soon as there is even a small amount of traffic; disabling the watchdog did not help, and only turning off Hyper-Threading did. Some other machines, with dual 20-core/40-thread CPUs, kept having a single core maxed out no matter what, and only replacing the CPUs made them work normally. These problems occur very easily on machines with more than 32 cores.

Recently I tested a cracked version of RouterOS that has a shell into the underlying system. I wrote a Python script to optimize IRQ handling: binding IRQs to CPUs, enabling XPS, and tuning the kernel values net.core.netdev_budget and net.core.rps_sock_flow_entries. My 72-core x86 router now works normally, with no single core pinned at 100% and no automatic reboots.
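The IRQ/XPS pinning described above comes down to writing CPU bitmasks into files such as /proc/irq/<n>/smp_affinity and /sys/class/net/<dev>/queues/tx-<n>/xps_cpus on a generic Linux system (stock RouterOS offers no shell for this). A minimal sketch of building those masks, assuming a simple round-robin queue-to-CPU layout; it is not the poster's script. Once a CPU number reaches 32, the mask spills into a second comma-separated 32-bit group.

```python
def cpu_mask(cpus):
    """Format CPU numbers as a Linux cpumask string.

    The kernel uses hex digits in 32-bit groups separated by commas, most
    significant group first: CPU 32 alone is "1,00000000".
    """
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        bits |= 1 << cpu
    groups = []
    while True:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if bits == 0:
            break
    groups.reverse()
    return ",".join([f"{groups[0]:x}"] + [f"{g:08x}" for g in groups[1:]])


def round_robin_affinity(num_queues, cpus):
    """Map each queue number to a one-CPU mask, cycling through `cpus`."""
    cpus = list(cpus)
    return {q: cpu_mask([cpus[q % len(cpus)]]) for q in range(num_queues)}


if __name__ == "__main__":
    # Hypothetical layout: 32 queues on every second CPU of a 72-CPU box,
    # e.g. to skip Hyper-Threading siblings IF they are numbered that way
    # (check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list).
    for queue, mask in round_robin_affinity(32, range(0, 72, 2)).items():
        print(f"queue {queue:2d} -> {mask}")
    print("all 72 CPUs:", cpu_mask(range(72)))  # ff,ffffffff,ffffffff
```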

I need IPv6 on the router, so I gave up on v6 and have only tried v7. In v7 CHR I have also run into cases where, with 32 cores, the network cards only use some of the CPUs. My environment has 32 VLAN-based WAN interfaces on two ixgbe 10G interfaces, with CPUs 0..32 bound to the network card queues. Only CPUs 0-15 work normally.

Disabling Hyper-Threading reduced it to 28 CPU cores.
It’s not ideal, but it’s much better.
I’ll also try to limit it to 14 CPU cores tomorrow.

I suggest testing with 8, 16, and 32 cores this way.

How many physical CPU sockets does the server have?

For anyone who might be interested, here’s an update: I tried RouterOS 6.49.18, but it seems to lack IPSec support for some reason:

/ip ipsec proposal add
bad command name ipsec (line 1 column 5)

Did you add the security package? The dhcp package may need to be added as well.
On ROS6, much more functionality was split into separate packages.
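Following up on the package point: a rough way to see which optional RouterOS v6 packages a set of commands would need is to map menu paths to packages. The mapping below is partial and reflects the v6 package split as recalled (security for IPsec and SSH, dhcp, ipv6, routing). Treat it as an assumption and confirm with /system package print on the target router.

```python
# Partial, assumed mapping of RouterOS v6 menu paths to the optional
# package that provides them; verify on the router before relying on it.
V6_PACKAGES = {
    "/ip ipsec": "security",
    "/ip ssh": "security",
    "/ip dhcp-client": "dhcp",
    "/ip dhcp-server": "dhcp",
    "/ipv6": "ipv6",
    "/routing bgp": "routing",
    "/routing ospf": "routing",
}


def required_v6_packages(lines):
    """Return the set of v6 packages that the given command lines appear to need."""
    needed = set()
    for line in lines:
        line = line.strip()
        for prefix, package in V6_PACKAGES.items():
            # Match whole menu paths only, so "/ip ipsecx" does not count.
            if line == prefix or line.startswith(prefix + " "):
                needed.add(package)
    return needed


if __name__ == "__main__":
    sample = [
        "/ip ipsec proposal add",
        "/ip ssh set strong-crypto=yes",
        "/ip dhcp-client add interface=ether1",
    ]
    print(sorted(required_v6_packages(sample)))  # ['dhcp', 'security']
```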