High CPU Load on MikroTik CCR2216 with PPPoE and NAT

I have a MikroTik CCR2216 setup with the following configuration:

Two CCR2216 devices are used for PPPoE, each handling approximately 3000 customers.
NAT is managed by one CCR2216.

Issue:

I am experiencing high CPU load (>70%) on the CCR2216 that is handling NAT, which is impacting performance.

Questions:

1. What might be causing the high CPU load in this configuration?
2. Are there any recommended configurations or optimizations for handling such a large number of NAT connections?
3. Would it be advisable to distribute the NAT load between additional devices?

Additional Information:

Current RouterOS version.
Any relevant logs or CPU usage statistics.

If you’re not a bot, look for a certified consultant available in your area.

Dealing with 6000 customers shouldn’t be done by amateurs and forum trolls.

Just move NAT to an x86/AMD machine.

@Znevna is right: professionals should know that PPPoE and NAT are never done on the same machine. It only takes one user losing connection, for whatever reason, and the whole machine slows down to clear the connection tables. That's why separate machines are used, so that a trivial flicker on some Ethernet link doesn't block the connection of 6000 people for minutes.

Reading that post again, he does seem to have 2x CCR2216 for PPPoE and one extra CCR2216 for NAT. Either way, those customers deserve professional help.

Written like that, going by how we are used to speaking in Italy, it seems like he has two machines, and "only one" (implicit: of the two) also does the NAT.

In Italy we would have said that we have two machines for PPPoE and another one that does the NAT.

But then I notice this:

which makes me think of a bot that forgot to paste the data in the right place...

I am looking for a solution to reduce CPU utilization. When CPU load increases, I observe packet loss and ping timeouts, which indicate network performance degradation.

Yep, but at the moment you are failing to clearly describe your setup. Your post is ambiguous; from what you wrote you may have:

  1. TWO CCR2216 devices in total, one managing 3000 PPPoE users and the other one managing 3000 PPPoE users AND managing NAT
  2. THREE CCR2216 devices in total, one managing 3000 PPPoE users, the second one managing 3000 PPPoE users and the third one managing NAT
  3. something else

Back to your questions:
Q1. What might be causing the high CPU load in this configuration?
A1. Something in the configuration, assuming that this load is not normal with so many customers.

Q2. Are there any recommended configurations or optimizations for handling such a large number of NAT connections?
A2. Maybe yes, maybe no.

Q3. Would it be advisable to distribute the NAT load between additional devices?
A3. Yes, meaning no :astonished_face:, a machine doing NAT should ONLY do NAT (and needs to be powerful enough for the amount of all customers).

Show the output of these commands, taken under load on the NAT router:

/tool/profile cpu=all

/tool profile cpu=total

/system resource print

/interface print stats

How many NAT and filter rules do you have? Show them.
Do you have FastTrack enabled?

When you test from an internal network or a user machine, where do the packet loss and ping timeouts occur: BRAS, core NAT, or uplink?

Any fping / mtr output may help…

P.S. See L3 Hardware Offloading - RouterOS - MikroTik Documentation
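To answer the "where is the loss" question, it helps to ping each hop separately and compare loss percentages. Below is a minimal Python sketch of that idea, not a finished tool. The hop addresses are hypothetical placeholders from the documentation range, the sample summary line is made up to show the format and is not a measurement from this network, and `probe` assumes a Linux or macOS `ping` that accepts `-c`.

```python
import re
import subprocess

# Hypothetical hop addresses (documentation range); replace with the real
# BRAS, NAT router, and uplink gateway addresses.
HOPS = {"BRAS": "192.0.2.1", "core NAT": "192.0.2.2", "uplink": "192.0.2.3"}

# Matches the summary line printed by Linux (iputils) and macOS ping.
LOSS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in a ping summary line."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(3))


def probe(address: str, count: int = 20) -> float:
    """Ping one hop with the system ping and return its loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), address],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want the summary
    )
    return parse_loss(result.stdout)


if __name__ == "__main__":
    # Made-up sample line, only to show what parse_loss expects.
    sample = "20 packets transmitted, 18 received, 10% packet loss, time 19027ms"
    print(f"sample loss: {parse_loss(sample):.1f}%")
    # On a real test machine: for name, addr in HOPS.items(): print(name, probe(addr))
```

Run it from a customer-side machine during peak hours. The first hop that shows loss is the place to look, keeping in mind that a busy router may deprioritize replies to pings aimed at itself.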

  • I have not yet done detailed customer-side ping checks.

  • When I do test during peak hours, I see some ping timeouts (packet loss).

  • My uplink capacity is 25 Gbps, so the issue is not due to uplink overload.

  • FastTrack is disabled. I tried enabling it once, but it caused problems — some customers experienced reduced download and upload speeds.

  • I do not apply any firewall filter rules

You must play with FastTrack: see Connection tracking - RouterOS - MikroTik Documentation

MikroTik claims the CCR2216 has partial HW offload for NAT; try this: L3 Hardware Offloading - RouterOS - MikroTik Documentation

Print your NAT rule(s).

If you are familiar with Linux, migrate NAT to VyOS (nftables/VPP).

Official test results indicate that, without L3 HW offload and without very carefully configured firewall rules, this device can handle anything between 11Gbps and 30Gbps. Could it be that you're hitting the current performance plateau?

When thinking about L3HW, beware of limitations: 4.5k fasttrack connections and 4k NAT entries ... if only IPv4 is used. These numbers don't seem enough in your use case (2x3k subscribers).
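To put those limits next to the subscriber count, here is a rough back-of-the-envelope sketch in Python. The two limits and the 2x3k subscriber figure come from the posts above. The number of concurrent connections per subscriber is purely a hypothetical parameter, and treating one connection as one hardware table entry is a simplifying assumption.

```python
# Limits as quoted in the post above (IPv4 only); these are the poster's figures.
L3HW_FASTTRACK_LIMIT = 4_500
L3HW_NAT_LIMIT = 4_000

SUBSCRIBERS = 2 * 3_000  # two PPPoE routers, about 3000 customers each


def offload_coverage(limit: int, subscribers: int, conns_per_subscriber: int) -> float:
    """Fraction of concurrent connections that fit in a hardware table of size `limit`.

    Assumes one table entry per connection (a simplification).
    """
    total = subscribers * conns_per_subscriber
    if total <= 0:
        raise ValueError("need at least one connection")
    return min(1.0, limit / total)


if __name__ == "__main__":
    # Hypothetical per-subscriber connection counts, for illustration only.
    for conns in (1, 10, 100):
        ft = offload_coverage(L3HW_FASTTRACK_LIMIT, SUBSCRIBERS, conns)
        nat = offload_coverage(L3HW_NAT_LIMIT, SUBSCRIBERS, conns)
        print(f"{conns:>3} conn/sub: fasttrack {ft:.1%}, NAT {nat:.1%} could be offloaded")
```

Even with a single connection per subscriber the tables are already too small for everyone, so under these assumptions most NAT traffic would still be handled by the CPU.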

Why don't you post a full config, so we can stop guessing?

still you can guess!

Some of this might be useful.

I don't think it is polite at all to waste the time of people who are here to help for free. If you think that people will stay here GUESSING because you don't post your config… you are wrong. See ya.

I don't think the CCR2216 can handle ~10K concurrent connections. They do market 3-X0Gbps throughput, but in the real world it's totally different.