High CPU usage

Hello everyone!
At the end of September 2024, we built an x86 server with the following configuration:
Motherboard - SuperMicro X10SRi-F
CPU - Xeon E5-2697A v4
RAM - Kingston 8GB 2133MHz
NIC - Intel X520-DA2

Ever since then, 3-4 days after each reboot (which we now have to do once a week because of this), the load on one core gradually increases until it peaks close to 100%. The problem has persisted from the initial installation on version 7.15 through the current 7.17.
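The single-core pattern can also be checked from the CLI with the per-core view. A minimal check, assuming a RouterOS 7 console (these commands were not part of the original post):

```routeros
# Load per CPU core; one core near 100% while the others stay low matches the symptom above
/system resource cpu print
# Overall CPU load, uptime and RouterOS version
/system resource print
```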

Below are the CPU load graphs, first over a few months and then over a few days:
[graph (2).png: CPU load, a few months]
[graph (3).png: CPU load, a few days]

What does Profiler show?
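For reference, the profiler can also be run from the CLI. A sketch, assuming RouterOS 7; the 30-second duration is an arbitrary choice:

```routeros
# CPU usage per process (firewall, networking, etc.) on all cores, sampled for 30 seconds
/tool profile cpu=all duration=30s
```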

Can you also share your config?

/export file=anynameyoulike

Remove the serial number and any other private info.

The load comes from the firewall:
[Screenshots: 1_firewall.png, 2_firewall.png, 3_firewall.png]
I found a possible cause, but I don't understand why it develops gradually, as an accumulation effect. It seems to be related to the connection tracker. Several times I have run a script that removes old connections with no activity for more than 60 seconds, and each time the load dropped sharply. More interesting still, the load is initially distributed across all cores, but then it seems to land on a single core and stops being parallelized. Perhaps this is a failure of the internal load-balancing logic. The strangest part is that this only starts after 3-4 days and then keeps increasing.
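The cleanup script itself was not posted. As an alternative to removing entries by hand (a different technique, not the method used above), the established-TCP timeout can be shortened and the table size logged over time. The trade-off is that legitimately idle long-lived TCP sessions get dropped sooner. Both values below are arbitrary examples, and the scheduler name conntrack-count is a placeholder:

```routeros
# Default tcp-established-timeout is 1d; 1h is an example value, not a recommendation
/ip firewall connection tracking set tcp-established-timeout=1h
# Log the number of tracked connections every hour to line it up with the CPU graph
/system scheduler add name=conntrack-count interval=1h on-event=":log info (\"conntrack entries: \" . [/ip firewall connection print count-only])"
```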

Do you want to see something specific, such as the firewall section? The file is too large to scrub of private data and post in full.

I would like to see both /interface and /ip/firewall (and perhaps complete /ip).
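Exporting only those menus keeps the files small and easier to scrub. A sketch; the file names are placeholders:

```routeros
# Each export writes a .rsc file to the router's Files
/interface export file=interface-config
/ip firewall export file=firewall-config
# The complete /ip menu (includes the firewall)
/ip export file=ip-config
```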

Look at the rx/tx packets per second in the interface menu.

/ip fi fi pr co
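The same checks with full command names, assuming the port is ether1 (a placeholder; substitute the real interface):

```routeros
# One-shot reading of rx/tx packets and bits per second on one port
/interface monitor-traffic interface=ether1 once
# Number of firewall filter rules (assumed long form of "/ip fi fi pr co")
/ip firewall filter print count-only
```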

[Screenshots: iface.png, tx-rx.png]

21

Check the actual connections with

/ip/firewall/connection/tracking/print
/ip fi co tr pr
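To see whether the table itself keeps growing, the entries can be counted directly. A sketch; the per-protocol split is just one useful breakdown:

```routeros
# Total number of tracked connections
:put [/ip firewall connection print count-only]
# Per-protocol counts, to see which kind of entry accumulates
:put [/ip firewall connection print count-only where protocol=tcp]
:put [/ip firewall connection print count-only where protocol=udp]
```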

IMO your post would be much better if you used full commands and properties instead of these obfuscated snippets. If nothing else, such snippets might stop working if a future ROS version adds a new configuration branch or command whose name begins with the same two characters as an existing one.

I’ll take the data at peak load later in the evening.
[Screenshot: mikrotik.png]

Peak time:
[Screenshot: mikrotik.png]

I want to share with you what happened during this time and whether I was able to resolve the issue.

I changed the hardware, upgraded to the latest version, 7.18.2, and made some minor firewall adjustments, though those are not essential.

On the hardware:

  1. Replaced the Intel X520-DA2 with a Mellanox ConnectX-4 Lx
  2. Replaced the single 8GB memory module with 4x 8GB (all four memory channels populated)

Configured queues:
queue type: multi-queue-ethernet-default
kind: mq-pfifo
queue size: 4000 packets (was 1000)
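For anyone reproducing this, a sketch of a comparable setup. It creates a separate queue type instead of editing the default one; the type name mq-pfifo-4000 and the port ether1 are placeholders:

```routeros
# Multi-queue pfifo with a 4000-packet limit
/queue type add name=mq-pfifo-4000 kind=mq-pfifo mq-pfifo-limit=4000
# Use it as the interface queue on the port
/queue interface set [find interface=ether1] queue=mq-pfifo-4000
```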

With these settings, it has now run normally for a week. If anything changes, I will post here.

I was planning to use CAKE, but for now I’ll leave it that way and watch for a while.

If this is running on KVM, you could use the simple Chelsio on-chip firewall option. Chelsio T5 and T6 adapters support filtering on the card itself, which can be set up as a hardware firewall before traffic is passed to the MikroTik KVM guest.
Hint: cxgbe, https://doc.dpdk.org/guides-2.2/nics/cxgbe.html

It's a bare-metal setup.