 
AlexX9
just joined
Topic Author
Posts: 5
Joined: Sun Apr 14, 2024 2:20 am

Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 2:58 am

I just got an RB5009 and I'm trying to figure out how to get decent performance when connecting to many different hosts/ports. I have a server behind NAT that needs to check for open ports on a large subnet, but I'm only reaching about 150K pps (~80 Mbps).

/tool/profile shows firewall at about 75 % CPU and networking at about 15 % with 300K pps coming in from the server.

The interface the server is connected to, which is part of the bridge, reports: RX 300K pps, FP RX 150K pps
The bridge: RX 150k pps, FP RX 150K pps
The WAN interface: TX 150K pps

Why is only 50 % of the packets showing up as FP and on the bridge? Is there any way to make it faster?
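
A quick arithmetic check of the figures above, as a minimal sketch. The function names are made up for illustration, and it assumes the pps and Mbps readings describe the same traffic and that 1 Mbps means 10^6 bit/s. It suggests the average packet is roughly 67 bytes, i.e. close to minimum-size frames, and that half of what the server-facing interface receives is counted as fast path.

```python
def avg_packet_bytes(pps: float, bits_per_second: float) -> float:
    """Average packet size implied by a packet rate and a bit rate."""
    return bits_per_second / 8 / pps


def fast_path_share(fp_rx_pps: float, rx_pps: float) -> float:
    """Fraction of received packets that the counters attribute to fast path."""
    return fp_rx_pps / rx_pps


# Figures quoted in the post: ~150K pps at ~80 Mbps, RX 300K pps vs FP RX 150K pps.
print(round(avg_packet_bytes(150_000, 80e6), 1))  # 66.7
print(fast_path_share(150_000, 300_000))          # 0.5
```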
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 5:07 am

I have a server behind NAT

Why isn't it bridged to the LAN it needs to examine instead?

Why is only 50 % of the packets showing up as FP

Because that decision can't be made until after the first SYN is seen, where the default firewall applies the fasttrack-connection flag.

This is one of many costs of running traffic through a firewall on the CPU instead of bridging it directly.
 
AlexX9
just joined
Topic Author
Posts: 5
Joined: Sun Apr 14, 2024 2:20 am

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 7:41 am

Why isn't it bridged to the LAN it needs to examine instead?
The interface the server is connected to is part of the default LAN bridge, if that's what you mean. I'm using defconf and have only enabled IPv6, added Hairpin NAT and some port forwards. If what you're asking is why I don't scan from within the same network then that's because I have firewalls upstream, so that would not give me the correct results.

Because that decision can't be made until after the first SYN is seen, where the default firewall applies the fasttrack-connection flag.
Yeah, that makes sense, so only the SYN-ACK packet uses fast path. Thank you for the explanation. :)

This is one of many costs of running traffic through a firewall on the CPU instead of bridging it directly.
What do you mean by bridging it directly? I'd love it if I could somehow speed up forwarding of these packets.
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 9:35 am

I have firewalls upstream, so that would not give me the correct results.

So you're measuring the speed of the firewalls, not the speed of the network.

Take a look at the RB5009 test results. Your application is the lower rightmost number in the first table, tiny packet sizes, so that almost nothing gets fast-tracked, and you have as close to 100% packet overhead as possible.

Now yes, that same table claims the RB5009 can do better than this, but that's an aggregate multi-port test. Atop that, do you expect that this effect doesn't apply to these other firewalls?

In debugging, you simplify the problem to the point that you have one variable at a time, even if that makes the result "inaccurate" by some standard. If the RB5009 bridged to the test LAN is fast, the problem isn't the RB5009. If you back it off one level and now it's slow, you have the culprit.

Bisect and conquer!
 
mkx
Forum Guru
Posts: 11673
Joined: Thu Mar 03, 2016 10:23 pm

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 10:47 am

Take a look at the RB5009 test results. Your application is the lower rightmost number in the first table, ...

Not even that. The tests use normal long-lived connections, so even tests that use tiny packets can benefit from fast-tracking.

OP is doing port scanning, which means that every third packet (or so) starts a new connection. Not only does this skip fast-tracking, but the connection-tracking machinery also has much more to do: determining that a packet belongs to a new connection takes more work than finding an existing one, and a structure must then be allocated for the new connection. It may also have to drop older entries left over from ports scanned a few tens of seconds ago, because the connection tracking table gets huge and consumes too much of the router's limited resources.
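
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the ~150K pps from the first post, takes "every third packet" literally, and uses a purely hypothetical 30-second entry lifetime to stand in for "a few tens of seconds". None of these are measured RouterOS values.

```python
def new_connections_per_second(pps: float, packets_per_connection: float) -> float:
    """Rate at which the connection tracker must create new entries."""
    return pps / packets_per_connection


def steady_state_entries(pps: float, packets_per_connection: float,
                         entry_lifetime_s: float) -> float:
    """Little's law: entries in the table = arrival rate x time each entry lives."""
    return new_connections_per_second(pps, packets_per_connection) * entry_lifetime_s


# Assumed inputs: 150K pps, a new connection every 3rd packet, 30 s lifetime.
print(new_connections_per_second(150_000, 3))  # new entries per second
print(steady_state_entries(150_000, 3, 30))    # entries alive at any moment
```

Under those assumptions the tracker creates tens of thousands of entries per second and holds over a million at a time, which is the kind of pressure described above.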

So yes, it's no wonder things are not going wirespeed.
 
AlexX9
just joined
Topic Author
Posts: 5
Joined: Sun Apr 14, 2024 2:20 am

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 12:36 pm

In debugging, you simplify the problem to the point that you have one variable at a time, even if that makes the result "inaccurate" by some standard. If the RB5009 bridged to the test LAN is fast, the problem isn't the RB5009. If you back it off one level and now it's slow, you have the culprit.
Considering that the CPU is pinned and the RB5009 becomes effectively unresponsive, the problem is clearly the routing performance of the RB5009, and that is why I created this post so somebody who has knowledge about Mikrotik hopefully can explain why the performance is so bad.

Not even that. The tests use normal long-lived connections, so even tests that use tiny packets can benefit from fast-tracking.
From what I can see, the tests are not using long-lived connections at all. The test seems to be using UDP. I guess that makes things quite a bit easier to handle. However, I don't need connection tracking for this traffic. The test results state 761k pps at 64B packets with 25 ip filter rules, without fast path, and that's around twice what I'm seeing, even when using fast path for half the packets.

Is it possible to disable connection tracking for the scanner, while still swapping the LAN IP with WAN IP? If I use Raw in the firewall to set "no track" on this traffic then the NAT rules don't seem to be touched, but I need netmap to swap the IPs back and forth.

Is there some other way to swap the addresses that works without connection tracking?
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 2:08 pm

I created this post so somebody who has knowledge about Mikrotik hopefully can explain why the performance is so bad.

You already got that, to a lesser extent from me, and then mkx, who's about as knowledgeable as it gets around here.

The test seems to be using UDP. I guess that makes things quite a bit easier to handle.

No, worse. mkx's point about long-lived connections is that the longer a TCP connection lasts, the closer to zero the overhead of setting up the Fast-Tracking amortizes. With UDP and no predictable flows following, the router becomes 100% CPU bound.
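
The amortization argument can be put in numbers. This is a minimal sketch, assuming (as a simplification) that a fixed number of packets at the start of each connection take the slow path before fast-tracking applies. The default `setup_packets=1` is an illustrative value, not a RouterOS constant.

```python
def slow_path_fraction(packets_per_connection: int, setup_packets: int = 1) -> float:
    """Share of a connection's packets that must go through the full firewall path."""
    if packets_per_connection <= 0:
        raise ValueError("a connection needs at least one packet")
    return min(setup_packets, packets_per_connection) / packets_per_connection


# A one-packet "flow" is all slow path; a long-lived connection is almost none.
for n in (1, 2, 3, 1000):
    print(n, slow_path_fraction(n))
```

With only two or three packets per connection, as in a port scan, a third to a half of all traffic stays on the CPU-bound path no matter how the router is configured.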

The test results state 761k pps

Yes, and as I pointed out, that's a multi-port aggregate test, not a single-stream single-port test. mkx's point builds atop that.
 
anav
Forum Guru
Posts: 19475
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 3:19 pm

First mistake, not using IPV4 :-) ( Dark Nate is going to crucify me )
 
mkx
Forum Guru
Posts: 11673
Joined: Thu Mar 03, 2016 10:23 pm

Re: Low performance on RB5009 with machine behind NAT

Sun Apr 14, 2024 3:48 pm

Is it possible to disable connection tracking for the scanner, while still swapping the LAN IP with WAN IP?

Nope, NAT relies on connection tracking. So no connection tracking, no NAT. At least in ROS.
 
AlexX9
just joined
Topic Author
Posts: 5
Joined: Sun Apr 14, 2024 2:20 am

Re: Low performance on RB5009 with machine behind NAT

Mon Apr 15, 2024 1:45 am

The test results state 761k pps
Yes, and as I pointed out, that's a multi-port aggregate test, not a single-stream single-port test. mkx's point builds atop that.
What you're saying makes no sense. It's not like each interface is dedicated to its own single CPU core, so using more ports won't make the CPU process the packets any faster.

Is it possible to disable connection tracking for the scanner, while still swapping the LAN IP with WAN IP?
Nope, NAT relies on connection tracking. So no connection tracking, no NAT. At least in ROS.
Well, that sucks, guess this was a bad choice then. It's kinda weird that it's not possible to swap the address of packets without using connection tracking. I guess I can find some other use for it.
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Mon Apr 15, 2024 4:37 am

It's not like each interface is dedicated to its own single CPU core

You’re presuming an implementation. I thought you came here to ask how RouterOS works, not tell us.

We forum denizens are fellow end users for the most part, not RouterOS software engineering insiders, but one thing I can confidently predict from past experience as an outsider is that the more independent packet flows, the greater freedom you give the routing engine to divide the work among the cores. We see that in independent testing again and again.

Easy disconfirming test for you to go and prove me soundly wrong: try running this scanner of yours from a second host connected to a second port, in parallel with the first. Try it both against the same target and an independent target. Does the aggregate PPS rate go up, down, or stay the same?

I do not expect to be alone in my interest to read your results.

It's kinda weird that it's not possible to swap the address of packets without using connection tracking.

Even with UDP, changing a packet’s destination address requires changing its checksum field, which requires CPU resources unless you’re on a CRS3xx class device and can make use of IPv4 NAT offloading, neither of which is true in your case.
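
To illustrate the checksum cost, here is a sketch of the generic incremental-update technique from RFC 1624, applied to an IPv4 header checksum when one address is rewritten. It is not a description of RouterOS internals. The header words and the addresses (192.168.88.2, 203.0.113.5, 198.51.100.7) are made-up example values, and the helper names are hypothetical.

```python
def fold(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits (one's-complement add)."""
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(words: list[int]) -> int:
    """Checksum over all 16-bit header words, with the checksum field set to 0."""
    return ~fold(sum(words)) & 0xFFFF


def incremental_checksum(old_checksum: int, old_words: list[int],
                         new_words: list[int]) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), applied to each changed word."""
    total = ~old_checksum & 0xFFFF
    for old, new in zip(old_words, new_words):
        total += (~old & 0xFFFF) + new
    return ~fold(total) & 0xFFFF


# Example IPv4 header as 16-bit words; word 5 is the (zeroed) checksum field.
header = [0x4500, 0x003C, 0x1C46, 0x4000, 0x4006, 0x0000,
          0xC0A8, 0x5802,   # source 192.168.88.2
          0xCB00, 0x7105]   # destination 203.0.113.5
before = full_checksum(header)

translated = header.copy()
translated[6:8] = [0xC633, 0x6407]  # source rewritten to 198.51.100.7

after = incremental_checksum(before, header[6:8], translated[6:8])
print(after == full_checksum(translated))  # incremental matches a full recompute
```

The incremental form touches only the changed words, but it is still per-packet arithmetic that something has to perform, whether a CPU core or an offload engine.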

(Replacing your RB5009 with a CRS309 fixes only one out of the three obstacles you’ve set for yourself, the others being IPv6 and the inability to get flows into the fast-track path.)

What might work for you is to put an Ethernet switch rule ahead of the NAT rule, matching your scanner's packets and preempting the routing layer with a “new-dst-ports” directive.

Note that “ports” in this instance refers to one or more physical device ports to copy the packet to, not to rewriting the UDP destination port, contravening my earlier point.
 
mkx
Forum Guru
Posts: 11673
Joined: Thu Mar 03, 2016 10:23 pm

Re: Low performance on RB5009 with machine behind NAT

Mon Apr 15, 2024 8:54 am


Yes, and as I pointed out, that's a multi-port aggregate test, not a single-stream single-port test. mkx's point builds atop that.
What you're saying makes no sense. It's not like each interface is dedicated to its own single CPU core, so using more ports won't make the CPU process the packets any faster.

Packet processing (e.g. NAT) adds some latency to the end-to-end packet flow. So if you do the port scanning in sequence, the end-to-end delay will severely reduce the pace of scanning.
Because truly parallel processing of packets can cause problems (e.g. out-of-order delivery), in ROS the same CPU core processes all packets belonging to the same connection. This means that on a multi-core CPU you may see a single core take most of the load while the other cores sit almost idle. The core being "hammered" will likely change, but the pattern will remain. If you ran the port scan in parallel (e.g. sending probes to multiple ports on the remote side), the traffic would be handled by multiple CPU cores in parallel, somewhat increasing overall throughput. Interrupt handling would then become the bottleneck: I'm talking about one CPU interrupt per packet received by the CPU, and with small packets there are many interrupts at relatively low bps.
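
A generic illustration of per-connection core affinity, in the style of receive-side flow hashing. How ROS actually picks a core is not stated in this thread; the CRC32 hash, the key format, the four-core count and the addresses below are all assumptions made for the sketch.

```python
import zlib


def core_for_flow(src: str, sport: int, dst: str, dport: int, cores: int = 4) -> int:
    """Map a flow's identifying tuple to a core; the same tuple always lands on the same core."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key) % cores


# One connection, many packets: every packet hashes to the same core.
single = {core_for_flow("192.168.88.2", 40000, "203.0.113.5", 443) for _ in range(1000)}

# Many connections (different destination ports): the work can spread out.
scan = {core_for_flow("192.168.88.2", 40000, "203.0.113.5", port) for port in range(1, 1001)}

print(len(single), len(scan))  # distinct cores used in each case
```

This keeps packets of one connection in order at the cost of limiting that connection to a single core, which is the trade-off described above.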
 
Moba
Member Candidate
Posts: 211
Joined: Sun Sep 27, 2020 6:15 pm

Re: Low performance on RB5009 with machine behind NAT

Tue Apr 16, 2024 12:19 am

Even high-end firewall devices can be overwhelmed without mitigation under attack once cores are loaded/buffers are full. The 5009 is a small wonder router, but it's still a low power ARM device. Basically, if it can't be done by the ASIC, it has to be done on the CPU. If you know how to run software another way, I'd like to learn.

The knowledgeable users have already given you sound answers...
 
AlexX9
just joined
Topic Author
Posts: 5
Joined: Sun Apr 14, 2024 2:20 am

Re: Low performance on RB5009 with machine behind NAT

Tue Apr 16, 2024 3:06 pm

It's not like each interface is dedicated to its own single CPU core
You’re presuming an implementation. I thought you came here to ask how RouterOS works, not tell us.
What's up with this toxicity? I'm not presuming, I checked, and it's true. I am here to ask, but when you say the reason for the problem is something that is not true and doesn't make any sense then it must be possible to say so.
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Tue Apr 16, 2024 3:29 pm

What's up with this toxicity?

That's not the intent. I'm reacting to a combination of things. You currently have a post count of five, and yet you are insisting that you know how RouterOS works internally. I believe my years of experience count for something here, but at the same time, I've taken a properly scientifically skeptical position above. I'm willing to be swayed.

To that end, I asked you for a test result. What happened when you doubled the number of scanning hosts?

Argument from incredulity is not science.
 
DarkNate
Forum Guru
Posts: 1032
Joined: Fri Jun 26, 2020 4:37 pm

Re: Low performance on RB5009 with machine behind NAT

Mon Apr 29, 2024 8:47 pm

OP is yet another victim of the configuration abstraction complexity of MikroTik, again.

Root cause can't be determined without config dump, but this is screaming typical Linux bridge misconfiguration. But OP is clearly an expert in switchdev/Linux DSA paradigm, so I'll leave it here.
 
pe1chl
Forum Guru
Posts: 10263
Joined: Mon Jun 08, 2015 12:09 pm

Re: Low performance on RB5009 with machine behind NAT

Mon Apr 29, 2024 11:12 pm

One problem with the RB5009 you need to be aware of is that it has 4 cores and variable clock speed.
It will normally run at 350 MHz but it can kick up to 1400 MHz when the OS decides that this is required.

Unfortunately the mechanisms used to do this speed governing seem to be not optimal for routers, and certainly not for the test being done:
1. The governor seems to work with total system load, so when a task loads only one core, it sees a system load of 25% at most and is reluctant to increase the clock.
2. The switching up and down is quite rapid; with bursty load it does not seem to remain at the high frequency for the duration of the test.

One way to circumvent this is to just set the CPU speed to 1400 MHz instead of "auto". In theory it will run hotter, in practice there does not seem to be nearly as much difference as there is for the Intel processors.
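
The 25% figure in point 1 is plain averaging. A tiny sketch, assuming the governor looks at the mean load across four cores (the exact metric it uses is not documented in this thread):

```python
def system_load(per_core_load: list[float]) -> float:
    """Mean utilisation across cores, each expressed as 0.0 to 1.0."""
    return sum(per_core_load) / len(per_core_load)


# One core fully busy with a single flow, three cores idle.
print(system_load([1.0, 0.0, 0.0, 0.0]))  # 0.25
```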
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Tue Apr 30, 2024 12:13 am

configuration abstraction complexity of MikroTik

The way I summarize that thread's application to this one is that there is some RouterOS configuration change that would somehow cause the OP's application to proceed much faster, and the only reason it isn't being done is that there are too many possible ways to do it, and the OP has hit on the wrong one. Have I misapplied your other thread's thesis here, @DarkNate?

I'll grant that I may be overlooking something due to not having studied a config /export, but I don't see any possible configuration change you could in principle make to overcome the combination of software NAT (RB5009 can't HW offload NAT), tiny packet sizes, a single packet source, and no predictable connection flows, thus no fast-tracking.

The OP scarcely could've invented a worse torture test for a NAT router, with intent and malice aforethought.
 
DarkNate
Forum Guru
Posts: 1032
Joined: Fri Jun 26, 2020 4:37 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 12:10 pm

The way I summarize that thread's application to this one is that there is some RouterOS configuration change that would somehow cause the OP's application to proceed much faster, and the only reason it isn't being done is that there are too many possible ways to do it, and the OP has hit on the wrong one. Have I misapplied your other thread's thesis here, @DarkNate?

I'll grant that I may be overlooking something due to not having studied a config /export, but I don't see any possible configuration change you could in principle make to overcome the combination of software NAT (RB5009 can't HW offload NAT), tiny packet sizes, a single packet source, and no predictable connection flows, thus no fast-tracking.

The OP scarcely could've invented a worse torture test for a NAT router, with intent and malice aforethought.
That thread linked of mine isn't a Thesis, it's about UI/UX design flaws of RouterOS, and this includes the complex Linux bridging concept itself. It is complex at its roots: Linux bridge doesn't have good UI/UX design nor good docs on the Linux man pages to begin with, and this trickled down to impact MikroTik as well, which relies on the original Linux kernel "data plane" for layer 2 switching (bridge, VLANs, STP, VLAN filtering etc).

It's not just about L3HW offloading of this or that; it also involves the intricate and complex layer 2 offloading AND layer 2 fast-path (single bridge + VLAN filtering on most MikroTik hardware).

There's no one-size-fits-all for MikroTik, but that's also the problem with Linux switchdev/dsa/bridging itself.

In Juniper, Nokia and Cisco or Huawei, this issue doesn't exist because the network software programmers in those companies decided to build their own platform-specific layer 2 source code and implementation that from day one didn't have this complexity. That is why on Cisco/Juniper whatever, the concept of bridging (called IRB) is more or less unified and configured similarly on all of their modern hardware.

MikroTik cannot fix this design flaw without a complete overhaul of their source code, which obviously isn't going to happen. Maybe it'll happen in ROSv8, but it would also mean a complete redesign of their CLI into a modern declarative config (Juniper type) and a modernised API/REST API, which is also not cheap. I doubt MikroTik has the financial resources to hire such expert network programmers, who in the USA market are worth like $500k a year base comp to begin with (I have some friends in this segment of the industry and that's how I know how much they get paid). MikroTik is a small European company, and pretty much zero European companies in tech can afford to pay a network programmer lead $500k or more.
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 6:39 pm

That thread linked of mine isn't a Thesis

I’m using that word in the “proposition stated as the basis of an argument to be proven” sense, not the “doctoral dissertation” sense.

I do assume you are interested in reasoned argumentation over mere argumentativeness, yes?

Linux bridge doesn't have good UI/UX design nor good docs on the Linux man pages to begin with, and this trickled down to impact MikroTik as well

If we’re agreeing that nothing the OP can do with the stated configuration will get the packets off the CPU, then I don’t see how MT can fix this thread’s symptom with a better software bridge design. The hardware’s PPS rate limitations are fixed at design time, modulo details like the clock rate setting pe1chl brought up.

If instead you’re suggesting that a better software design would somehow offload a configuration like this to the RB5009’s preexisting switch chip and allow it to proceed at line rate, I think you’re overlooking the heterogeneous nature of MT hardware. Unlike the big guys you revere, MT doesn’t get to design custom ICs that support their idealized software designs. The only way to prevent the plumbing from poking up through the porcelain in places when you use this many different COTS chip designs is to nerf all designs to a least-common-denominator level. Under that type of design, the only HW features exposed are those present in all chips used.

MT took the opposite path: expose all chip features, requiring the user to know what those are and avoid designs that require RouterOS to activate one of the abstractions you rail against, in order to emulate a missing ASIC feature in software.

MikroTik cannot fix this design flaw without a complete overhaul of their source code

If you add “custom ASICs” to support that software, then yes, I agree that would result in a cleaner implementation…

…but you then wouldn’t have a $59 hEX on offer.
 
DarkNate
Forum Guru
Posts: 1032
Joined: Fri Jun 26, 2020 4:37 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 6:55 pm

If we’re agreeing that nothing the OP can do with the stated configuration will get the packets off the CPU, then I don’t see how MT can fix this thread’s symptom with a better software bridge design. The hardware’s PPS rate limitations are fixed at design time, modulo details like the clock rate setting pe1chl brought up.
OP needs to export his config dump for us to review. It's likely misconfig and/or broken L2 fastpath/L3 fastpath (different from offloading to the switch chip).
If instead you’re suggesting that a better software design would somehow offload a configuration like this to the RB5009’s preexisting switch chip and allow it to proceed at line rate, I think you’re overlooking the heterogeneous nature of MT hardware. Unlike the big guys you revere, MT doesn’t get to design custom ICs that support their idealized software designs. The only way to prevent the plumbing from poking up through the porcelain in places when you use this many different COTS chip designs is to nerf all designs to a least-common-denominator level. Under that type of design, the only HW features exposed are those present in all chips used.
Cut the simping for MikroTik. MikroTik relies on merchant silicon (Marvell is their choice) just like Cisco, Juniper and Nokia do for MOST of their products. Let's give a more extreme example, ALL whitebox hardware vendors like Ufi Space or EdgeCore, ALL use merchant silicon.

They all have their own NOS, for example OcNOS in the case of whitebox, and these NOSes decided NOT to use any Linux conceptualisation for the data plane and therefore no Linux-like complexity.

MikroTik code design, UI/UX is so terrible that v7 still doesn't have "long-term" and still they can't offload EVPN/VXLAN/MPLS even though the Marvell ASICs support it at hardware level.
MT took the opposite path: expose all chip features, requiring the user to know what those are and avoid designs that require RouterOS to activate one of the abstractions you rail against, in order to emulate a missing ASIC feature in software.
MikroTik didn't expose jack. Where's EVPN? The Marvell ASICs on CCR2k support it, where's the “exposé”?
If you add “custom ASICs” to support that software, then yes, I agree that would result in a cleaner implementation…
Again, NOPE. Cut the simping for MikroTik. ALL vendors rely on Merchant silicon for the MAJORITY of their products, most commonly that's Broadcom.

But you have Marvell, Centec, etc. as alternatives, all of which work fine on non-RouterOS NOSes in the market like SONiC or OcNOS.

What can MikroTik do? Cut the shite and allow official ONIE flashing, and let us install our own NOS.
 
mkx
Forum Guru
Posts: 11673
Joined: Thu Mar 03, 2016 10:23 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 7:05 pm

Cut the shite and allow official ONIE flashing, and let us install our own NOS.

If you don't want to use ROS ... and you're saying other vendors provide whitebox devices with similar hardware ... so why would you want to use anything by Mikrotik?

I'm guessing you're still intrigued by MT's price tag ... and I guess you'll just have to deal with current reality (which is ROS + nice prices VS. great NOS + not so nice prices). I don't think any of your (extensive) ranting will change that.
No, I'm not MT's advocate or apologist ... I'm simply accepting reality (and I'm making my choices without letting the whole world know about them).
 
DarkNate
Forum Guru
Posts: 1032
Joined: Fri Jun 26, 2020 4:37 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 7:21 pm

If you don't want to use ROS ... and you're saying other vendors provide whitebox devices with similar hardware ... so why would you want to use anything by Mikrotik?

I'm guessing you're still intrigued by MT's price tag ... and I guess you'll just have to deal with current reality (which is ROS + nice prices VS. great NOS + not so nice prices). I don't think any of your (extensive) ranting will change that.
No, I'm not MT's advocate or apologist ... I'm simply accepting reality (and I'm making my choices without letting the whole world know about them).
MikroTik hardware is great, MikroTik hardware+price is great. MikroTik hardware+price+ROS is okay.
MikroTik hardware + price + ONIE = fantastic.

MikroTik could be generating more revenue too if their boxes were ONIE-native, more people would buy them in bulk and nobody would complain about ROS.

Whether my rant makes an impact or not, I don't really care. I got time to kill now and then.

Whiteboxes have dropped to sub $3k pricing now, give it a few more years before we can get sub $1k whiteboxes.
 
tangent
Forum Guru
Posts: 1411
Joined: Thu Jul 01, 2021 3:15 pm

Re: Low performance on RB5009 with machine behind NAT

Wed May 01, 2024 8:23 pm

MikroTik didn't expose jack. Where's EVPN? The Marvell ASICs on CCR2k support it, where's the “exposé”?

We're arguing two separate points. You're welcome to demand every single feature of the chip in RouterOS, but MT has finite resources, and their priorities likely differ from yours atop that.

My point is merely that if you don't read and take the time to understand documents like this one, you're likely to run into one of these "abstraction layers" you speak of in the other thread.

In any case, I don't see how the lack of EVPN and such affects this thread's topic.

Here's an idea: point me to a Cisco/Juniper/Whitebox/whatever third-party product also using the same Marvell 88F7040 CPU and 88E6393X switch chip as the OP's RB5009, and let's redo the test. Does it get meaningfully more PPS in the same setup? If so, then you've got something you can take to MT and ask, "Why?"
