pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Mon May 26, 2025 1:28 pm

Hi everyone,

I'm running RouterOS 7.18.2 stable on a beefy Supermicro Super Server x86_64 with 56 cores.
My setup handles decent traffic (max inbound 1.22 Gb/s, outbound 665 Mb/s), but I'm facing a weird CPU issue.

While my overall CPU usage stays around 8-11%, one core keeps hitting 100%.
After running the profiler, it points to firewall usage. Interestingly, even with filter rules disabled, the problem persists, so I suspect it's SNAT related.

A reboot temporarily fixes it (CPU drops to normal), but over time it creeps back up to the same high values on that single core.
Screenshot 2025-05-26 at 11.31.13.png

My current SNAT config is pretty simple (WAN list contains 2 interfaces):
/ip firewall nat
add action=accept chain=srcnat comment="ipsec no nat" ipsec-policy=out,ipsec
add action=src-nat chain=srcnat comment="src-nat non-ipsec" ipsec-policy=out,none out-interface-list=WAN to-addresses=xxx.xxx.xxx.xxx
Anyone running a similar setup who can suggest how to better distribute the load or tune the NAT configuration?

Thanks in advance!
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed May 28, 2025 12:42 am

pushkink wrote:
> I'm running RouterOS 7.18.2 stable on a beefy Supermicro Super Server x86_64 with 56 cores.
> My setup handles decent traffic (max inbound 1.22 Gb/s, outbound 665Mb/s), but I'm facing a weird CPU issue.

Holy cow, that hardware sounds way overprovisioned for your use-case! I am using much weaker CPUs for 6x that amount of NAT traffic without this kind of problem, but I'm also using RouterOS 6, not 7. So maybe something changed in 7 that is causing this issue. I have noticed in some of the testing I've done that 7 seems to handle connection tracking table purges differently than 6 did, so maybe it has something to do with that...perhaps connection tracking is itself a single-threaded process.

I'm guessing "no" if this is a production box, but as a test, is it possible for you to disable your NAT rules and let it run that way for a while, with connection tracking forced to enabled=yes (instead of =auto)? It would be interesting to see if the same single-core CPU creep happens even without any NAT happening, as long as connection tracking is doing its thing. (Maybe if you can't run this box that way for long enough, you could take some other x86 box, load another copy of RouterOS on it, put it in-line with this router, and just have it blindly forward traffic to/from the production router with connection tracking set to "yes" and no firewall or NAT rules defined. Then see if its CPU creeps up similarly.)

Instead of a full reboot, it might also be interesting to disable your NAT rules, then manually purge the entire connection tracking table, and finally turn your NAT rules back on. See if that calms things down. If it does, that would still point to connection tracking being the culprit in my book...
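
The purge test described above could look roughly like the sketch below. It is only an illustration, not something anyone in the thread posted: it assumes the src-nat rule still carries the comment "src-nat non-ipsec" from the original post, and it will drop every tracked connection, so existing sessions will be interrupted.

```
# How many connections are tracked before the purge?
/ip firewall connection print count-only

# Temporarily disable the src-nat rule (matched by its comment from the first post).
/ip firewall nat disable [find comment="src-nat non-ipsec"]

# Flush the whole connection tracking table.
/ip firewall connection remove [find]

# Turn NAT back on and check the count again.
/ip firewall nat enable [find comment="src-nat non-ipsec"]
/ip firewall connection print count-only
```

If the busy core calms down right after the flush and then creeps back up as the count grows, that would be consistent with the connection tracking theory, though it would not prove it.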
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed May 28, 2025 1:14 am

@pushkink; Most likely a NIC issue, like an interrupt storm hammering the CPU. Your setup should easily handle hundreds of gigabits, so something’s clearly off.
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 2:18 pm

NathanA wrote:
> Holy cow, that hardware sounds way overprovisioned for your use-case! I am using much weaker CPUs for 6x that amount of NAT traffic without this kind of problem, but I'm also using RouterOS 6, not 7. So maybe something changed in 7 that is causing this issue.

It is what it is. It looks like the only option may be to downgrade to 6.x, unless Support acknowledges the bug in 7.x and provides a fix, but they aren't very quick in responding to my ticket. I can't disable NAT, as I have services that heavily rely on internet access from private subnets. Additionally, disabling connection tracking would essentially disable SNAT. It's unfortunate to consider a downgrade since I'm also utilizing the nice Group VRRP feature that is only available in 7.x.
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 2:21 pm

Larsa wrote:
> @pushkink; Most likely a NIC issue, like an interrupt storm hammering the CPU. Your setup should easily handle hundreds of gigabits, so something’s clearly off.

Can you suggest commands or guides to troubleshoot this?
I know this can be done in plain Linux, but here we have a restrictive shell.

I don't see many packet drops or errors relative to the overall traffic, and they are not increasing every minute.
Screenshot 2025-06-05 at 12.23.25.png
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 2:58 pm

pushkink wrote:
> It looks like the only option may be to downgrade to 6.x, unless Support acknowledges the bug in 7.x and provides a fix, but they aren't very quick in responding to my ticket.

MikroTik support do be like that sometimes. To be fair, though, we REALLY have no idea at this point if you are running into a new(ish) 7.x-only bug or not. It's merely speculation at this point. There could be something severely wrong with your set-up that would also cause 6.x to fall on its knees in a similar way. The thing that gives me pause is that gradual climb of the one CPU core over the course of an extended period of time, which makes me think the problem is connection count creeping up slowly over time & pruning of connections not happening correctly (pure conjecture).

pushkink wrote:
> I can't disable NAT, as I have services that heavily rely on internet access from private subnets. Additionally, disabling connection tracking would essentially disable SNAT.

Yes, I figured that, but thought I'd ask just in case. Any chance you can stick another box in between this one and the rest of the network for testing purposes? We need to find a way to reproduce this on a machine that doesn't hamper your production network so we can trial-and-error things without affecting your customers. Since we know that your production traffic load is at least one sure-fire way to cause the problem, having two routers back-to-back would open up some testing possibilities.

pushkink wrote:
> Can you suggest commands or guides to troubleshoot this?

To investigate the theory Larsa presented, you'd want to watch the numbers under /system/resource/irq/print, specifically any with names that are clearly related to your Ethernet interfaces. (The names presented here will vary based on what the particular driver for your NIC chip(set) decides to call them.)
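
As a rough sketch of what that watching could involve (the 10-second profiling window is an arbitrary choice, not a recommendation from the thread): the IRQ counts are cumulative, so the useful signal is the difference between two runs taken some time apart, not the absolute numbers.

```
# Per-IRQ interrupt counts; run twice a minute or so apart and compare
# the COUNT column for the NIC queues (names depend on the NIC driver).
/system resource irq print

# Per-core load, to confirm which core is pinned.
/system resource cpu print

# Profile all cores at once for a fixed window.
/tool profile cpu=all duration=10s
```

A single queue whose count grows far faster than its siblings, and whose ACTIVE-CPU matches the busy core, would point toward the interrupt theory; evenly growing counts would point away from it.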
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 4:32 pm

For now, CPU=45 is 99% busy, and /tool/profile does not display anything from the shell:
> /tool/profile cpu=45

Unlike other CPUs, such as CPU=44, where I can see what it's being used for:
> /tool/profile cpu=44
Columns: NAME, CPU, USAGE
NAME        CPU  USAGE
networking   44  1%   
management   44  0%   
crypto       44  0%   
routing      44  0%   
firewall     44  5%   
ixgbe        44  2%   
xt_misc      44  0%   
esp4         44  1%   
cpu44            9%   
What can we read from this?
> /system/resource/irq/print              
Columns: IRQ, USERS, CPU, ACTIVE-CPU, COUNT
  # IRQ  USERS            CPU   ACTIVE-CPU          COUNT
  0   4  ttyS0            auto           2              5
  1   9  acpi             auto           3              3
  2  18  usb1             auto           6             60
         usb2                                            
  3  24  dmar0            auto          13              0
  4  25  dmar1            auto           4              0
  5  27  PCIe BW notif    auto           5              0
  6  29  PCIe BW notif    auto          29              0
  7  31  PCIe BW notif    auto           7              0
  8  32  PCIe BW notif    auto           8              0
  9  33  PCIe BW notif    auto           9              0
 10  35  PCIe BW notif    auto          10              0
 11  36  megasas0-msix0   auto          11         95 448
 12  37  megasas0-msix1   auto          12          2 085
 13  38  megasas0-msix2   auto          16          1 959
 14  39  megasas0-msix3   auto          14          1 851
 15  40  megasas0-msix4   auto          15          1 607
 16  41  megasas0-msix5   auto          16            662
 17  42  megasas0-msix6   auto          17            483
 18  43  megasas0-msix7   auto          18            599
 19  44  megasas0-msix8   auto          19            507
 20  45  megasas0-msix9   auto          20            448
 21  46  megasas0-msix10  auto          21            319
 22  47  megasas0-msix11  auto          22            387
 23  48  megasas0-msix12  auto          23          1 097
 24  49  megasas0-msix13  auto          24          1 661
 25  50  megasas0-msix14  auto          25          1 004
 26  51  megasas0-msix15  auto          26            164
 27  52  megasas0-msix16  auto          27            103
 28  53  megasas0-msix17  auto          28            147
 29  54  megasas0-msix18  auto          52              4
 30  55  megasas0-msix19  auto          30            329
 31  56  megasas0-msix20  auto          31            486
 32  57  megasas0-msix21  auto          32            252
 33  58  megasas0-msix22  auto          33            260
 34  59  megasas0-msix23  auto          34            101
 35  60  megasas0-msix24  auto          35            290
 36  61  megasas0-msix25  auto          36            407
 37  62  megasas0-msix26  auto          37            195
 38  63  megasas0-msix27  auto          38            224
 39  64  megasas0-msix28  auto          39            650
 40  65  megasas0-msix29  auto          40            381
 41  66  megasas0-msix30  auto          41            251
 42  67  megasas0-msix31  auto          42            241
 43  68  megasas0-msix32  auto          43            510
 44  69  megasas0-msix33  auto          44            281
 45  70  megasas0-msix34  auto          45            249
 46  71  megasas0-msix35  auto          46            284
 47  72  megasas0-msix36  auto          47            180
 48  73  megasas0-msix37  auto          48            215
 49  74  megasas0-msix38  auto          49            168
 50  75  megasas0-msix39  auto          50            170
 51  76  megasas0-msix40  auto          51            147
 52  77  megasas0-msix41  auto          22            461
 53  78  megasas0-msix42  auto          53            812
 54  79  megasas0-msix43  auto          54            289
 55  80  megasas0-msix44  auto          55             84
 56  81  megasas0-msix45  auto           2            122
 57  82  megasas0-msix46  auto           3             29
 58  83  megasas0-msix47  auto           4          1 091
 59  84  megasas0-msix48  auto           5          2 239
 60  85  megasas0-msix49  auto           6            961
 61  86  megasas0-msix50  auto           7            612
 62  87  megasas0-msix51  auto           8            347
 63  88  megasas0-msix52  auto           9            204
 64  89  megasas0-msix53  auto          10          4 893
 65  90  megasas0-msix54  auto          11             68
 66  91  megasas0-msix55  auto          12             65
 67  92  megasas0-msix56  auto          13             42
 68  93  00:11.4]         auto          14              0
 69  94  00:1f.2]         auto          15              0
 70  95  xhci_hcd         auto          19            137
 71  96  eth0-TxRx-0      auto          17  1 460 536 389
 72  97  eth0-TxRx-1      auto          18  2 011 003 568
 73  98  eth0-TxRx-2      auto          19  1 058 238 458
 74  99  eth0-TxRx-3      auto          20  1 974 072 575
 75 100  eth0-TxRx-4      auto          21  1 847 522 050
 76 101  eth0-TxRx-5      auto          22  1 534 860 209
 77 102  eth0-TxRx-6      auto          23  1 994 255 518
 78 103  eth0-TxRx-7      auto          24  1 418 149 038
 79 104  eth0-TxRx-8      auto          25  1 043 150 712
 80 105  eth0-TxRx-9      auto          26  1 007 917 686
 81 106  eth0-TxRx-10     auto          27  1 038 770 860
 82 107  eth0-TxRx-11     auto          28  2 210 154 823
 83 108  eth0-TxRx-12     auto          29  2 139 789 519
 84 109  eth0-TxRx-13     auto          30  2 255 404 407
 85 110  eth0-TxRx-14     auto          31  1 502 529 395
 86 111  eth0-TxRx-15     auto          32  1 193 376 031
 87 112  eth0-TxRx-16     auto          33  3 559 004 657
 88 113  eth0-TxRx-17     auto          34  3 582 496 909
 89 114  eth0-TxRx-18     auto          35  3 560 685 714
 90 115  eth0-TxRx-19     auto          36  3 557 551 208
 91 116  eth0-TxRx-20     auto          37  3 558 460 219
 92 117  eth0-TxRx-21     auto          38  3 558 489 687
 93 118  eth0-TxRx-22     auto          39  3 569 257 906
 94 119  eth0-TxRx-23     auto          40  3 567 661 453
 95 120  eth0-TxRx-24     auto          41  3 558 344 822
 96 121  eth0-TxRx-25     auto          42  3 562 462 799
 97 122  eth0-TxRx-26     auto          43  3 559 491 221
 98 123  eth0-TxRx-27     auto          44  3 558 705 507
 99 124  eth0-TxRx-28     auto          45  3 384 240 298
100 125  eth0-TxRx-29     auto          46  3 560 879 900
101 126  eth0-TxRx-30     auto          47  3 554 781 524
102 127  eth0-TxRx-31     auto          48  3 557 698 825
103 128  eth0-TxRx-32     auto          49  3 558 606 534
104 129  eth0-TxRx-33     auto          50  3 560 291 509
105 130  eth0-TxRx-34     auto          51  3 558 333 700
106 131  eth0-TxRx-35     auto          52  3 555 801 863
107 132  eth0-TxRx-36     auto          53  3 553 367 271
108 133  eth0-TxRx-37     auto          54  3 555 936 290
109 134  eth0-TxRx-38     auto          55  3 555 577 595
110 135  eth0-TxRx-39     auto           2  3 558 959 972
111 136  eth0-TxRx-40     auto           3  3 557 147 352
112 137  eth0-TxRx-41     auto           4  3 557 027 473
113 138  eth0-TxRx-42     auto           5  3 555 033 752
114 139  eth0-TxRx-43     auto           6  3 557 136 467
115 140  eth0-TxRx-44     auto           7  3 552 906 258
116 141  eth0-TxRx-45     auto           8  2 447 637 918
117 142  eth0-TxRx-46     auto           9  3 556 338 593
118 143  eth0-TxRx-47     auto          10  3 556 098 483
119 144  eth0-TxRx-48     auto          11  3 555 585 157
120 145  eth0-TxRx-49     auto          12  3 560 757 480
121 146  eth0-TxRx-50     auto          13  3 561 449 536
122 147  eth0-TxRx-51     auto          14  3 560 236 318
123 148  eth0-TxRx-52     auto          15  3 559 076 418
124 149  eth0-TxRx-53     auto          16  3 561 048 937
125 150  eth0-TxRx-54     auto          17  3 556 320 373
126 151  eth0-TxRx-55     auto          18  3 559 437 362
127 152  wan              auto           1        642 699
128 154  eth1-TxRx-0      auto          20  2 043 409 648
129 155  eth1-TxRx-1      auto          21  2 059 077 705
130 156  eth1-TxRx-2      auto          22  2 050 299 220
131 157  eth1-TxRx-3      auto          23  2 062 611 232
132 158  eth1-TxRx-4      auto          24  2 064 217 875
133 159  eth1-TxRx-5      auto          25  2 063 896 375
134 160  eth1-TxRx-6      auto          26  2 062 912 179
135 161  eth1-TxRx-7      auto          27  2 040 755 124
136 162  eth1-TxRx-8      auto          28  2 058 112 172
137 163  eth1-TxRx-9      auto          29  2 094 410 699
138 164  eth1-TxRx-10     auto          30  2 057 745 812
139 165  eth1-TxRx-11     auto          31  2 092 827 542
140 166  eth1-TxRx-12     auto          32  2 062 684 231
141 167  eth1-TxRx-13     auto          33  2 093 058 355
142 168  eth1-TxRx-14     auto          34  2 057 597 129
143 169  eth1-TxRx-15     auto          35  2 027 012 472
144 170  eth1-TxRx-16     auto          36  2 212 066 328
145 171  eth1-TxRx-17     auto          37  2 017 410 179
146 172  eth1-TxRx-18     auto          38  2 265 972 201
147 173  eth1-TxRx-19     auto          39  2 241 548 440
148 174  eth1-TxRx-20     auto          40  2 218 679 584
149 175  eth1-TxRx-21     auto          41  2 192 745 668
150 176  eth1-TxRx-22     auto          42  2 194 650 410
151 177  eth1-TxRx-23     auto          43  2 186 447 262
152 178  eth1-TxRx-24     auto          44  2 216 868 956
153 179  eth1-TxRx-25     auto          45  2 109 893 285
154 180  eth1-TxRx-26     auto          46  2 233 775 108
155 181  eth1-TxRx-27     auto          47  2 223 206 742
156 182  eth1-TxRx-28     auto          48  2 231 636 526
157 183  eth1-TxRx-29     auto          49  2 230 343 315
158 184  eth1-TxRx-30     auto          50  2 221 848 326
159 185  eth1-TxRx-31     auto          51  2 232 989 033
160 186  eth1-TxRx-32     auto          52  2 224 315 139
161 187  eth1-TxRx-33     auto          53  2 221 934 100
162 188  eth1-TxRx-34     auto          54  2 217 622 314
163 189  eth1-TxRx-35     auto          55  2 206 930 594
164 190  eth1-TxRx-36     auto           2  2 210 813 719
165 191  eth1-TxRx-37     auto           3  2 218 723 962
166 192  eth1-TxRx-38     auto           4  2 221 329 084
167 193  eth1-TxRx-39     auto           5  2 251 227 715
168 194  eth1-TxRx-40     auto           6  2 193 264 827
169 195  eth1-TxRx-41     auto           7  2 248 562 129
170 196  eth1-TxRx-42     auto           8  2 215 783 567
171 197  eth1-TxRx-43     auto           9  2 232 674 192
172 198  eth1-TxRx-44     auto          10  2 211 269 824
173 199  eth1-TxRx-45     auto          11  1 928 383 584
174 200  eth1-TxRx-46     auto          12  2 199 906 786
175 201  eth1-TxRx-47     auto          13  2 222 640 360
176 202  eth1-TxRx-48     auto          14  2 209 900 371
177 203  eth1-TxRx-49     auto          15  2 215 964 223
178 204  eth1-TxRx-50     auto          16  2 200 805 808
179 205  eth1-TxRx-51     auto          17  2 190 402 415
180 206  eth1-TxRx-52     auto          18  2 214 648 886
181 207  eth1-TxRx-53     auto          19  2 227 922 640
182 208  eth1-TxRx-54     auto          20  2 226 647 222
183 209  eth1-TxRx-55     auto          21  2 209 216 774
184 210  lan              auto           0        947 597
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 4:39 pm

Regarding connection tracking, I agree this could be an issue. When I tried to move traffic to another router, the CPU returned to standard load after 5-10 minutes. Therefore, I attempted to decrease the timeouts, and my current settings are as follows, but the issue still exists:
/ip firewall connection tracking
set enabled=yes generic-timeout=5m loose-tcp-tracking=no tcp-established-timeout=1h tcp-max-retrans-timeout=1m tcp-unacked-timeout=1m udp-stream-timeout=1m
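
To see whether the table is actually growing over time, something like the following could be run at intervals and the numbers compared. This is only a sketch; the time-wait state in the last line is just one example of a state worth counting.

```
# Shows the tracking settings together with total-entries and max-entries.
/ip firewall connection tracking print

# Total tracked connections, then a breakdown by protocol.
/ip firewall connection print count-only
/ip firewall connection print count-only where protocol=tcp
/ip firewall connection print count-only where protocol=udp

# TCP entries parked in a particular state (time-wait used as an example).
/ip firewall connection print count-only where tcp-state=time-wait
```

If the counts stay flat while the one core keeps climbing, the timeouts are probably not the relevant knob.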
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 5:44 pm

pushkink wrote:
> Can you suggest commands or guides to troubleshoot this? I know this can be done in plain Linux, but here we have a restrictive shell.

Since you are running bare metal, you’re pretty much on your own because the usual troubleshooting tools aren’t available, which means troubleshooting ROS is basically pointless. I’d recommend moving to virtualization in an environment you actually control. That way, you get all the standard system tools and troubleshooting issues like this becomes much easier.

Also, double-check that your NIC drivers actually support features like IRQ moderation, L2/L3 offload, and most importantly SR-IOV, PCI passthrough, or DirectPath I/O, depending on your virtualization platform. You also need the corresponding drivers in ROS, of course. If you don’t have proper support for these, you will run into serious trouble with network-related performance issues.
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 5:50 pm

pushkink wrote:
> Regarding connection tracking, I agree this could be an issue.

It’s most likely not. We run plenty of high-end, albeit virtualized, platforms and have never had any issues with connection tracking or fast track on any of them.
 
TomjNorthIdaho
Forum Guru
Posts: 1614
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho
Contact:

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 05, 2025 5:57 pm

Re: .. I'm running RouterOS 7.18.2 stable on a beefy Supermicro Super Server x86_64 with 56 cores. ...

Are you running ROS x86 on bare metal (ISO install)?
Or are you running ROS x86 or CHR on a hypervisor such as VMware or Proxmox?

Many times I have seen that when running ROS CHR (v6 and v7) on VMware and/or Proxmox with lots of CPUs assigned to the CHR VM, the CHR actually runs much slower than when I use only 20 to 24 CPUs. With 32 to 50+ CPUs, the profile shows one CPU maxed out while the other CPUs are almost idle. However, when I lower the virtual CPU count to about 20-something, all CPUs appear to share the load equally and no single CPU gets maxed out.

I've wondered whether there is some kind of CPU counter/driver in ROS x86 and/or CHR that has issues when a high number of CPUs is presented to ROS.

FYI - my CHRs are sustaining about 3 Gig and bursting to about 6+ Gig during peak usage hours.
FYI - with and without hyperthreading, same CPU results in the ROS Profile.
Note - in my case (CHR on Proxmox, 20 CPUs, hyperthreading disabled, two interfaces), I find that MultiQueue=2 or 4 on both VM interfaces delivers the fastest throughput.

North Idaho Tom Jones
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Fri Jun 06, 2025 12:59 pm

I’m currently using RouterOS on bare metal to simplify the setup and maximize performance. However, the lack of Mikrotik support and built-in troubleshooting tools has rendered this approach less effective for me. If these issues persist, I believe I could persuade my bosses to consider switching to a more established vendor.
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Fri Jun 06, 2025 1:49 pm

"If these issues persist, I believe I could persuade my bosses to consider switching to a more established vendor virtualised environment, like what most professionals usually do."

Hmm, yet another bare metal idiocracy again. If you can’t admit your own mistakes by not checking the hardware compatibility beforehand, you might as well blame the vendor as usual…
 
infabo
Forum Guru
Posts: 1746
Joined: Thu Nov 12, 2020 12:07 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Fri Jun 06, 2025 10:19 pm

Sounds like same symptom: viewtopic.php?t=214297
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Sat Jun 07, 2025 4:27 am

infabo wrote:
> Sounds like same symptom: viewtopic.php?t=214297

Why, so it does...good find.

From reading through that thread, it also sounds like connection tracking was indeed at least somewhat implicated...in an attempt to troubleshoot the problem, the OP from that thread actually did what I suggested that the OP of THIS thread do; namely, "manually purge the entire connection tracking table", and it seemed he got results.

But OP of other thread also then later went to the trouble of changing a few things, one of which included swapping out the NIC he was using for an entirely different brand. He seemed to indicate that a week following these changes, the problem hadn't returned, and then he never came back to the thread after that to provide us with a longer-term report.

What's interesting to me is that his original NIC was an Intel X520, which is a NIC we use tons of with no problems...though, again, on ROS 6. Though we don't know what NIC @pushkink is using, I will note that the names assigned to the per-thread IRQs ("ethX-TxRx-#") are identical to Intel's typical convention. So that is interesting.

Given all of that, it does seem like it would be at least reasonable as a test to break out some spare hardware, load a modern hypervisor on it, and run ROS 7 as a guest on it with the exact same config he's using on the bare-metal install on the Supermicro. Given he's barely cracking a gigabit, I personally would recommend that *at least for this test*, don't use SR-IOV as even those drivers could conceivably have bugs (e.g., if you have X520, ixgbevf could be plagued by similar problems as ixgbe could). Instead, just use virtio, or whatever paravirtualized adapter option your hypervisor of choice offers. Whether the problem does or doesn't happen, it will at least then be pretty darn conclusive whether the NIC is implicated whatsoever in the problem or not.
 
Paternot
Forum Guru
Posts: 1110
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Sat Jun 07, 2025 4:49 am

NathanA wrote:
> What's interesting to me is that his original NIC was an Intel X520, which is a NIC we use tons of with no problems...though, again, on ROS 6. Though we don't know what NIC @pushkink is using, I will note that the names assigned to the per-thread IRQs ("ethX-TxRx-#") are identical to Intel's typical convention. So that is interesting.

I may be wrong, but I think I've read around about people having some problems with the X520. Not RoS related - just Linux. If memory serves me right, it was something about IRQs...

I know it was an Intel NIC. I think it was an X520 - but not 100% sure. Maybe it was just one revision? Don't remember details, it was some time ago.
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Sat Jun 07, 2025 5:00 am

Paternot wrote:
> I may be wrong, but I think I've read around about people having some problems with the X520. Not RoS related - just Linux. If memory serves me right, it was something about IRQs...

I don't doubt that the ixgbe driver has had many, many bugs over the course of its life, just like many drivers have. However, that doesn't mean the particular bugs that you might have seen talked about in years past on Linux have any relevance, or still exist, within whatever version of the driver ROS 7.x bundles with itself.

If there is a CPU-load-balancing issue with ixgbe that somehow also only gets triggered when paired with connection tracking, it doesn't seem to be getting triggered on ROS 6, at least as far as my experience with it goes. And that would be a much older version of the driver. It could be a regression in a newer version.

But at this point we don't even know for sure that OP of this thread is even using X520 (though the indications are that he's at least using *some* Intel NIC), or that the problem is actually with the NIC and/or its driver. The responsibility now sits on OP's shoulders to do more A/B testing to prove or disprove that theory.

(FWIW, I *am* aware of an ROS 6 bug with this driver that causes it to simply not balance frame transmissions across more than one thread, IF those frames are tagged with an 802.1ad / "S-tag" (0x88a8) header. That was annoying to discover...)
 
mkx
Forum Guru
Posts: 13812
Joined: Thu Mar 03, 2016 10:23 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Sat Jun 07, 2025 10:23 am

NathanA wrote:
> If there is a CPU-load-balancing issue with ixgbe that somehow also only gets triggered when paired with connection tracking ...

Just a random idea: could it be triggered (or made much worse) due to fasttracking? Having it active does affect how NIC drivers work ...
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Sat Jun 07, 2025 11:57 am

mkx wrote:
> Just a random idea: could it be triggered (or made much worse) due to fasttracking? Having it active does affect how NIC drivers work ...

I have yet to find a single person who has reported that Fastpath/Fasttrack works for them on their non-RouterBOARD hardware. I don't believe it works on x86, period, and given the diversity of third-party hardware out there, I don't believe MikroTik has any interest in making it work or supporting it on anything but MikroTik-branded products, and I believe that they have signalled as much in the past. You can add the rule, and the little checkbox under IP > Settings will show it is enabled, but the counters will not tick up, nor will there be a measurable performance difference.

Neither thread's OP posted a config export, but I would be surprised if either of them blindly added a Fasttrack action to their firewall rules, especially since the one in the other thread who hinted that the problem stopped happening for him never mentioned anything about Fasttrack. I suppose anything is possible, though: perhaps both of these people did so, and perhaps enabling Fasttrack on x86 even does *something*, but since its use is unsupported, the "something" it is doing is bad/buggy/"undefined behavior".

EDIT: I was just reading through the CHR help page for an unrelated reason, and came upon this interesting note:

Fast Path is supported in RouterOS v7 for "vmxnet3" and "virtio-net" adapters.

RouterOS v6 does not support Fast Path.

This surprised me, since I could swear I'd tested this in the past.

So I decided to re-test it (with vmxnet3), and sure enough, Fastpath/Fasttrack do work with the PV adapters (ROS 7.19.1).

Got me thinking that the last time I tested this, I must've had the VM configured with virtual E1000 adapters instead of vmxnet3 ones. So I changed my test VM interfaces from vmxnet3 to E1000, and FP/FT did indeed stop working at that point.

Anyway, the new(ish) Fastpath support with at least those two PV adapters is a pleasant surprise. But as far as I know, those are the only two adapters on x86 that Fastpath works with...which means you have to be running ROS virtualized instead of bare-metal *and* use PV drivers instead of either PCIe passthrough or SR-IOV to achieve it (short of somebody going through and testing a bunch of other supported adapters with SR-IOV support to see if any of them have undocumented Fastpath support).
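
For anyone wanting to repeat that kind of check, a minimal sketch follows. The rule pair is the commonly used FastTrack arrangement, shown here purely as an assumption, since neither thread's OP confirmed having such rules. Note also that FastTrack bypasses parts of the firewall, so a setup with IPsec like the OP's would likely need to exclude that traffic first.

```
# Shows whether Fast Path / FastTrack are active and whether packets use them:
# look at ipv4-fast-path-active, ipv4-fasttrack-active,
# ipv4-fast-path-packets and ipv4-fasttrack-packets in the output.
/ip settings print

# A typical FastTrack rule pair (hypothetical for this thread).
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related
add chain=forward action=accept connection-state=established,related

# The rule's own counters should also increase if FastTrack is really in use.
/ip firewall filter print stats where action=fasttrack-connection
```

If the "active" flags read yes but the packet counters never move, that matches the non-working behaviour described above for E1000 adapters.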
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Mon Jun 09, 2025 3:29 pm

I have yet to find a single person who has reported that Fastpath/Fasttrack works for them on their non-RouterBOARD hardware. I don't believe it works on x86, period, and given the diversity of third-party hardware out there, I don't believe MikroTik has any interest in making it work or supporting it on anything but MikroTik-branded products, and I believe that they have signalled as much in the past. You can add the rule, and the little checkbox under IP > Settings will show it is enabled, but the counters will not tick up, nor will there be a measurable performance difference.

Oh, you’re obviously right and all the rest of us, with thousands of setups, are wrong for running virtualized x86 environments that aren’t supposed to work, yet somehow manage to push hundreds of Gbps without throttling the CPU, even on mid-range boxes.

But I’m really pleased you cleared that up. We’ll launch an immediate investigation to locate the issue and report back to MikroTik support. But I’m guessing they still won’t have time to register the case, since so many people keep reporting the same issue all the time. 😉
 
NathanA
Forum Guru
Posts: 1039
Joined: Tue Aug 03, 2004 9:01 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Tue Jun 10, 2025 2:58 am

Oh, you’re obviously right and all the rest of us, with thousands of setups, are wrong for running virtualized x86 environments that aren’t supposed to work, yet somehow manage to push hundreds of Gbps without throttling the CPU, even on mid-range boxes.

But I’m really pleased you cleared that up. We’ll launch an immediate investigation to locate the issue and report back to MikroTik support. But I’m guessing they still won’t have time to register the case, since so many people keep reporting the same issue all the time. 😉

What the hell are you talking about?

You imply that Fastpath and/or Fasttrack is a hard requirement in order to do any kind of traffic forwarding at scale. When the hardware is sufficiently spec'd out, I don't find that to be the case. I'm running plenty of x86 installations that are doing just fine without it. It's way more important for the RouterBOARD models on the lower end with slower processors than for beefy x86 boxes.
 
hapoo
Frequent Visitor
Posts: 68
Joined: Wed Apr 24, 2019 1:35 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Tue Jun 10, 2025 3:41 am

Note - In my case , CHR on Proxmox , 20-CPUs, HyperThreading disabled , two interfaces , I find that MultiQueue=2 or 4 on both vm interfaces delivers the fastest throughput.

North Idaho Tom Jones
All the guides I’ve read say that your multiqueue should be set equal to the number of CPUs you’ve given the VM. I assume you’ve tested that on your setup?

My personal setup has eight cores on a 13900H with multiqueue = 8. All I know is that performance increased when I set it higher. I haven’t played around with it to find an optimal value, though.

I still can’t manage to hit my 5 Gbps.
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 12:10 pm

You're all right. I feel stupid for relying on a company whose main focus is selling hardware devices, which leaves the software side as something of a Cinderella.
Unfortunately, the company hides its hardware compatibility list for Intel hardware, so I found it on an old version of the page and decided to think the best of the company (my bad, won't do that again).

Meanwhile, in the v7.19 changelog:
*) conntrack - improved stability on busy systems;
*) system - improved system stability when sending TCP data from the router;
*) x86 - i40e updated driver to version 2.27.8;
 
infabo
Forum Guru
Posts: 1746
Joined: Thu Nov 12, 2020 12:07 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 12:20 pm

You're all right. I feel stupid for relying on a company whose main focus is selling hardware devices, which leaves the software side as something of a Cinderella.
Microsoft sells a piece of software called Windows, and making it run flawlessly on any hardware is a challenge for them as well. So do you really want to hold MikroTik responsible for not supporting every custom-built hardware combination in the world? But you do have one point: MikroTik should declare hardware requirements for the x86 platform.
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 4:34 pm

Meanwhile, in the v7.19 changelog:
*) conntrack - improved stability on busy systems;
*) system - improved system stability when sending TCP data from the router;
*) x86 - i40e updated driver to version 2.27.8;
And... nothing has changed.
image (3).png
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 5:56 pm

That looks a bit weird with that steady increase between 12:30 and 14:30, almost like a memory leak. Do a full export so we can take a look, and maybe “someone” might even have time to do a quick test in the lab. What kind of traffic is going on?

Regarding hardware support, since there’s no official list of supported hardware, you pretty much have to email support and ask. As a rule of thumb, you can assume most mainstream x86 drivers are included from Linux 5.6.3, plus a few legacy drivers that have been ported over from ROS v6.
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 6:36 pm

# 2025-06-11 10:27:46 by RouterOS 7.18.2
#
/interface ethernet set [ find default-name=ether1 ] disable-running-check=no name=lan
/interface ethernet set [ find default-name=ether4 ] disable-running-check=no name=wan
/interface vrrp add group-authority=self interface=wan name=vrrp-wan priority=200 vrid=10
/interface vrrp add group-authority=vrrp-wan interface=lan name=vrrp-lan vrid=20
/interface list add name=LAN
/interface list add name=WAN
/ip ipsec policy group add name=AWS
/ip ipsec profile add dh-group=modp1024 dpd-interval=10s dpd-maximum-failures=3 enc-algorithm=aes-128 lifetime=8h name=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec peer add address=aaa.aaa.aaa.aaa/32 disabled=yes exchange-mode=ike2 local-address=xxx.xxx.xxx.70 name=aaa.aaa.aaa.aaa profile=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec peer add address=bbb.bbb.bbb.bbb/32 exchange-mode=ike2 local-address=xxx.xxx.xxx.70 name=bbb.bbb.bbb.bbb profile=profile-vpn-xxxxxxxxxxxxxxxxxxx
/ip ipsec proposal add enc-algorithms=aes-128-cbc lifetime=1h name=ipsec-vpn-xxxxxxxxxxxxxxxxxxx
/port set 0 name=serial0
/port set 1 name=serial1
/queue interface set lan queue=multi-queue-ethernet-default
/queue interface set wan queue=multi-queue-ethernet-default
/ip firewall connection tracking set enabled=yes
/ip neighbor discovery-settings set discover-interface-list=LAN
/ip settings set allow-fast-path=no
/interface list member add interface=vrrp-lan list=LAN
/interface list member add interface=wan list=WAN
/interface list member add interface=lan list=LAN
/interface list member add interface=vrrp-wan list=WAN
/ip firewall filter add action=accept chain=input comment="accept established,related,untracked" connection-state=established,related,untracked
/ip firewall filter add action=drop chain=input comment="drop invalid" connection-state=invalid log-prefix=invalid
/ip firewall filter add action=accept chain=input comment="ipsec policy matcher" ipsec-policy=in,ipsec
/ip firewall filter add action=accept chain=input comment="allow local networks" src-address-list=lan-list
/ip firewall filter add action=accept chain=input comment="allow other local wan addresses" src-address-list=wan-list
/ip firewall filter add action=accept chain=input comment="allow IPSec IKE" dst-port=500,4500 in-interface-list=WAN protocol=udp
/ip firewall filter add action=accept chain=input comment="allow IPSec AH" in-interface-list=WAN protocol=ipsec-ah
/ip firewall filter add action=accept chain=input comment="allow IPSec ESP" in-interface-list=WAN protocol=ipsec-esp
/ip firewall filter add action=accept chain=input comment="allow Winbox" dst-port=8291 in-interface-list=WAN protocol=tcp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="allow SSH" dst-port=22 in-interface-list=WAN protocol=tcp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="Allow SNMP" dst-port=161 in-interface-list=WAN protocol=udp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="Allow ICMP" in-interface-list=WAN protocol=icmp src-address-list=trusted-list
/ip firewall filter add action=accept chain=input comment="allow VRRP" protocol=vrrp
/ip firewall filter add action=drop chain=input comment="block everything else"
/ip firewall filter add action=accept chain=forward comment="ipsec in policy matcher" ipsec-policy=in,ipsec
/ip firewall filter add action=accept chain=forward comment="ipsec out policy matcher" ipsec-policy=out,ipsec
/ip firewall filter add action=accept chain=forward comment="Established, Related" connection-state=established,related
/ip firewall filter add action=accept chain=forward comment="accept internal traffic" src-address-list=lan-list
/ip firewall filter add action=accept chain=forward comment="accept vpc traffic" src-address-list=vpc-list
/ip firewall filter add action=drop chain=forward comment="Drop invalid" connection-state=invalid log=yes log-prefix=invalid
/ip firewall filter add action=drop chain=forward comment="block everything else"
/ip firewall nat add action=accept chain=srcnat comment="ipsec no nat" ipsec-policy=out,ipsec
/ip firewall nat add action=src-nat chain=srcnat comment="src-nat non-ipsec" ipsec-policy=out,none out-interface-list=WAN to-addresses=xxx.xxx.xxx.70
/ip firewall raw add action=notrack chain=prerouting dst-address=10.0.0.0/8 src-address=10.0.0.0/8
/ip ipsec identity add peer=aaa.aaa.aaa.aaa 
/ip ipsec identity add peer=bbb.bbb.bbb.bbb
/ip ipsec policy add action=none comment="bypass encryption to local networks" dst-address=yyy.yyy.yyy.0/18 src-address=0.0.0.0/0
/ip ipsec policy add dst-address=169.254.214.201/32 peer=bbb.bbb.bbb.bbb proposal=ipsec-vpn-xxxxxxxxxxxxxxxxxxx src-address=0.0.0.0/0 tunnel=yes
/ip ipsec policy add dst-address=10.0.0.0/8 peer=bbb.bbb.bbb.bbb proposal=ipsec-vpn-xxxxxxxxxxxxxxxxxxx src-address=yyy.yyy.yyy.0/18 tunnel=yes
/ip route add gateway=xxx.xxx.xxx.65
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=yyy.yyy.yyy.5/32 gateway=lo routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=zzz.zzz.zzz.0/24 gateway=lan routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip route add comment="Advertise local IPs via BGP" disabled=no distance=1 dst-address=yyy.yyy.yyy.3/32 gateway=lo routing-table=main scope=30 suppress-hw-offload=no target-scope=10
/ip service set telnet disabled=yes
/ip service set ftp disabled=yes
/ip service set www disabled=yes
/ip service set api disabled=yes
/ip ssh set strong-crypto=yes
/routing bgp connection add as=65570 disabled=no input.filter=bgp-in local.address=169.254.214.202 .role=ebgp name=BGP-169.254.214.201 output.filter-chain=bgp-out .network=bgp-networks remote.address=169.254.214.201/32 .as=64515 routing-table=main
/routing bgp connection add as=65570 disabled=no input.filter=bgp-in local.address=169.254.158.118 .role=ebgp name=BGP-169.254.158.117 output.filter-chain=bgp-out .network=bgp-networks remote.address=169.254.158.117/32 .as=64515 routing-table=main
/routing filter rule add chain=bgp-in-pri comment="Main link input filter" disabled=no rule="set bgp-local-pref 200;  set distance 19; jump bgp-in;"
/routing filter rule add chain=bgp-in-sec comment="Secondary link input filter" disabled=no rule="set bgp-local-pref 100; set distance 21; jump bgp-in;"
/routing filter rule add chain=bgp-in comment="Exclude local networks from AWS advertisements" disabled=no rule="if (dst in lan-list) { reject;}"
/routing filter rule add chain=bgp-in comment="Set a preferred IP for outgoing connections and accept all routes" disabled=no rule="set pref-src yyy.yyy.yyy.5; accept;"
/routing filter rule add chain=bgp-out-pri comment="Main link output filter" disabled=no rule="set bgp-out-med 50; jump bgp-out;"
/routing filter rule add chain=bgp-out-sec comment="Secondary link output filter" disabled=no rule="set bgp-out-med 100; jump bgp-out;"
/routing filter rule add chain=bgp-out comment="Announce only the approved networks" disabled=no rule="if (dst in bgp-networks) { accept; } else { reject; }"
/system logging add disabled=yes topics=ipsec
/system logging add topics=vrrp
/system logging add disabled=yes topics=ipsec,!debug
/system logging add disabled=yes topics=dns
/system logging add disabled=yes topics=bgp
/system logging add disabled=yes topics=route
/system logging add disabled=yes topics=ipsec
/system note set show-at-login=no
/system ntp client set enabled=yes
/system ntp server set enabled=yes
/system ntp client servers add address=0.us.pool.ntp.org
/tool bandwidth-server set enabled=no
/tool mac-server set allowed-interface-list=LAN
/tool mac-server mac-winbox set allowed-interface-list=LAN
 
chechito
Forum Guru
Posts: 3316
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 7:13 pm

Have you considered/discarded the possibility that this is being caused by an elephant flow?
https://en.wikipedia.org/wiki/Elephant_flow
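
For anyone wondering why an elephant flow would show up as a single saturated core: NICs with receive-side scaling typically hash each packet's flow tuple to choose a receive queue, and each queue is usually serviced by one core, so every packet of a given flow lands on the same core. The sketch below only illustrates that idea. It uses CRC32 as a stand-in for the hash a real NIC would use, and the function name, queue count, and addresses are made up for the example.

```python
import zlib
from collections import Counter


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_queues):
    """Map a flow's 5-tuple to a receive queue index (illustrative stand-in for RSS)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % num_queues


NUM_QUEUES = 8  # assumed queue count, not taken from the thread

# One large flow: every packet carries the same 5-tuple, so it always hits the same queue.
elephant = ("10.0.0.5", 40000, "203.0.113.10", 443, "tcp")
elephant_queues = {queue_for_flow(*elephant, NUM_QUEUES) for _ in range(1000)}

# Many small flows: varying the source port changes the hash, so the load spreads out.
mice = Counter(
    queue_for_flow("10.0.0.5", port, "203.0.113.10", 443, "tcp", NUM_QUEUES)
    for port in range(40000, 41000)
)

print("queues used by the single flow:", len(elephant_queues))
print("queues used by 1000 small flows:", len(mice))
```

The single flow can never use more than one queue no matter how many cores the box has, which is why one heavy flow can pin one core while the overall CPU average stays low.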
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 7:17 pm

Have you considered/discarded the possibility that this is being caused by an elephant flow?
https://en.wikipedia.org/wiki/Elephant_flow
TIL :)

No, I have a bunch of clients browsing the internet, and they don't have any long-running elephant flows.
 
sirbryan
Long time Member
Posts: 524
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 7:43 pm

Have you tried 7.15.3 or 7.16.x? My CRS300s had a memory problem on 7.15.3 that caused random reboots, which 7.16.2 fixed, but 7.16.x on my CCR2116s had random BGP "stuck route" problems, whereas they were great on 7.15.3 (and now on 7.19.1).
 
Larsa
Forum Guru
Posts: 1986
Joined: Sat Aug 29, 2015 7:40 pm
Location: The North Pole, Santa's Workshop

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Wed Jun 11, 2025 9:11 pm

@pushkink, any idea what kind of traffic pattern we’re seeing here? Could be handy to know, just to match it up with the timeline.
 
peerlnk
just joined
Posts: 4
Joined: Wed Sep 27, 2023 4:39 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 12, 2025 5:41 pm

I have also encountered your problem. When a router has more than 32 cores, all sorts of strange and unusual issues arise.
Try disabling hyper-threading or shutting down some cores, and the problem will most likely be resolved.
 
TomjNorthIdaho
Forum Guru
Posts: 1614
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 12, 2025 5:57 pm

I have also encountered your problem. When a router has more than 32 cores, all sorts of strange and unusual issues arise.
Try disabling hyper-threading or shutting down some cores, and the problem will most likely be resolved.
Re: ... I have also encountered your problem. When a router has more than 32 cores ...

I've seen this many times when I increase the CPU cores on a CHR VM (such as increasing from 24 cores to 40).
I never took the time to find the magic number where a CHR starts to fall apart when increasing CPU cores.
Question - Are you finding that it happens at greater than 32 cores or somewhere around 32 cores?
Question - Does it affect all CHR ROS versions (6.x and 7.x)?
 
pushkink
just joined
Topic Author
Posts: 11
Joined: Thu May 01, 2025 5:04 pm

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 12, 2025 7:10 pm

I have also encountered your problem. When a router has more than 32 cores, all sorts of strange and unusual issues arise.
Try disabling hyper-threading or shutting down some cores, and the problem will most likely be resolved.
That's an intriguing idea, thank you! I'll definitely give it a try on Monday.
 
peerlnk
just joined
Posts: 4
Joined: Wed Sep 27, 2023 4:39 am

Re: High CPU usage on single core (Supermicro Server, ROS 7.18.2) - likely SNAT issue

Thu Jun 12, 2025 7:21 pm

I have also encountered your problem. When a router has more than 32 cores, all sorts of strange and unusual issues arise.
Try disabling hyper-threading or shutting down some cores, and the problem will most likely be resolved.
Re: ... I have also encountered your problem. When a router has more than 32 cores ...

I've seen this many times when I increase the CPU cores on a CHR VM (such as increasing from 24 cores to 40).
I never took the time to find the magic number where a CHR starts to fall apart when increasing CPU cores.
Question - Are you finding that it happens at greater than 32 cores or somewhere around 32 cores?
Question - Does it affect all CHR ROS versions (6.x and 7.x)?
Yes. I have a large number of physical x86 machines with RouterOS installed, and the problem of a single core sitting at 100% basically occurs on machines with more than 32 cores. I have also had 72-core x86 machines that reboot immediately as soon as there is even a small amount of traffic. Disabling the watchdog did not help; only turning off hyper-threading did. There are also some dual-CPU machines with 20 cores and 40 threads where the single-core saturation could not be resolved at all, and they only worked normally after the CPU was changed. These problems occur very easily on machines with more than 32 cores.

Recently I tested a cracked version of RouterOS that has a shell giving access to the underlying system. I then wrote a Python script to tune IRQ handling, bind queues to CPUs, enable XPS, and modify the kernel values net.core.netdev_budget and net.core.rps_sock_flow_entries. My 72-core x86 router now works normally, without a single core being fully occupied and without automatic reboots.

Because I need IPv6 on the router, I abandoned v6 and have only tried v7. On v7 CHR I have also run into cases with 32 cores where the network cards only use some of the CPUs. My environment has 32 VLAN-based WAN interfaces on 2 ixgbe 10G interfaces, and CPU0..32 are bound to the network card queues, but only CPUs 0-15 work normally.
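
The tuning script itself is not shown in the thread, and RouterOS does not officially expose a shell, so the following is only a rough sketch of what the IRQ-pinning step could look like on a generic Linux system. It spreads NIC queue IRQs round-robin over a list of CPUs and formats the hex mask that `/proc/irq/<n>/smp_affinity` accepts. The IRQ numbers, CPU range, and function names are hypothetical, and the sketch only prints what it would write rather than touching the system.

```python
def cpu_mask(cpu):
    """Return the hex bitmask for a single CPU, in comma-separated 32-bit groups."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    value = 1 << cpu
    groups = []
    while True:
        groups.append(f"{value & 0xFFFFFFFF:08x}")  # lowest 32 bits first
        value >>= 32
        if value == 0:
            break
    return ",".join(reversed(groups))  # most significant group goes first


def plan_affinity(irqs, cpus):
    """Assign each queue IRQ to one CPU, round-robin over the given CPU list."""
    if not cpus:
        raise ValueError("need at least one CPU")
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}


# Hypothetical IRQ numbers for an 8-queue NIC; on a real system they come from /proc/interrupts.
queue_irqs = [120, 121, 122, 123, 124, 125, 126, 127]
plan = plan_affinity(queue_irqs, cpus=list(range(32, 40)))

for irq, cpu in plan.items():
    # Printed only; applying it would mean writing the mask to this path as root.
    print(f"/proc/irq/{irq}/smp_affinity <- {cpu_mask(cpu)}  (CPU {cpu})")
```

CPUs numbered 32 and above need a second 32-bit group in the mask, which is easy to get wrong by hand and is one plausible reason to script this on machines with many cores.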