RB3011 cannot reach 500 Mbps throughput

Hi, I have a setup where I cannot get the RB3011’s throughput above 400 Mbps.

Something is keeping cpu1 always busy with IRQ calls, and the software workload is also not symmetrical between the cores, making the situation even worse.

I have my WAN coming from the SFP port, and flowing into CPU-0, according to the block diagram.

I have my LAN coming from ports 3 and 5, into a bonding, and into CPU1. (I tried a direct connection without the bonding, and it had no effect on CPU usage.)

I have allow established/related rules in firewall/filter, and about 20 filter rules. Disabling my entire filter stack has little effect on total CPU usage, about 5%.

There are a bunch of NAT rules. I tried a “generic single rule to NAT them all”, and I tried disabling all of my dstnat rules.
Doing that had almost no effect. Peak throughput stays unchanged, limited by cpu1 reaching 95% usage while cpu0 is at 20%.

Here is a commented screenshot of the device:
screenshot-3011.png
In this scenario, what else can I try to squeeze some extra performance out of this RB3011?

Even my old 2011 can easily reach 500 Mbps without any filter or complex NAT rules.

And if your SFP+ is a 10G port, you may hit the MikroTik 10G->1G port buffer bloat issue.

You mentioned you have disabled the entire filter stack, so this is just an FYI.

MikroTik’s official brute-force login prevention wiki shows this particular filter:

add chain=output action=add-dst-to-address-list protocol=tcp content="530 Login incorrect" \
address-list=ftp_blacklist address-list-timeout=3h

This string match will eat up all your CPU. Even on a monster x86 setup, this filter limits NAT speed to 500 Mbps.

Hi, I have a setup where I cannot get the RB3011’s throughput above 400 Mbps…

It doesn’t look like you can specify which switch ports will work with a particular CPU. Both CPUs have a connection to both switches.

And it looks like all the traffic is going through CPU1. The SFP is connected to CPU1, and the ports connected to switch1 also have a lane to CPU1. The shortest path for the traffic between these does not involve CPU0. In these situations, the CPU that is not involved with the raw traffic usually gets more involved with the software work (firewall, NAT, VPN, etc.). This is somewhat backed by your stats: CPU usage by firewall rules is not that high, and CPU0 is not heavily loaded.
You should be able to push about double what you are seeing. Do you see any events in the log regarding the SFP?

P.S. There is no speed gain in using bonding, since the lane from the switch to each CPU is 1Gbit. Only redundancy. Though, enabling bonding will turn off the hardware offload on QCA8337 switch:
https://wiki.mikrotik.com/wiki/Manual:Switch_Chip_Features#Bridge_Hardware_Offloading

- Bonding is used because this router feeds two 360 Mbps radio links. H/W offloading is not possible because this router must NAT all connections.

- I have no complex matches in the filters, no interface lists, nothing of the sort. (To be safe, all filters are disabled for testing anyway.)

- There are 16 srcnat rules that each map a 192.168.x.x/24 subnet to an external IP, nothing else.

What is the output of:

/queue interface print

All ethernets are “only hardware queue”;
the bonding interface shows as “no queue”.

Bonding is round-robin with a CCR on the other side (the CCR1009 is at 1% CPU).

I also added a fasttrack rule, and lowered the conntrack timeout from 24 to 4 hours. (The number of tracked connections went from 50000+ down to about 25000.)
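
A minimal sketch of what those two changes might look like in RouterOS. The exact rules used here were not posted, so this assumes a plain forward-chain fasttrack rule and that the timeout in question is tcp-established-timeout (whose default is one day); the comments on the rules are placeholders.

```
/ip firewall filter
# Fasttrack established/related forwarded connections.
# Assumption: both rules end up above the other forward-chain rules
# (add places new rules at the end, so they may need to be moved up).
add chain=forward action=fasttrack-connection connection-state=established,related comment="fasttrack (sketch)"
# Packets that fasttrack hands back to the slow path still need an accept rule.
add chain=forward action=accept connection-state=established,related comment="accept established/related (sketch)"

/ip firewall connection tracking
# Assumption: "conntrack timeout" means the established-TCP timeout.
set tcp-established-timeout=4h
```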

Still, CPU1 bears all the load, while CPU0 stays between 0 and 5% usage. /o\


cpu-3011.png

What about the SFP port?

SFP is also “only hardware queue”
interface-queues-3011.png

Ran some tests on my RB3011.

Bonding ether2 & ether3, run packet generator on other device, one (dstnat) rule:

explorer_2018-12-05_17-33-02.png
Max traffic ~970Mbps, cpu1 maxed out.

Same scenario but with ether2 & ether7 bonded:

explorer_2018-12-05_17-45-15.png
Most Tx/Rx rates are incorrect, but RB3011 reaches some 1950 Mbps up & down via bonding rr as shown by packet generator:

explorer_2018-12-05_17-39-05.png
rb3011.rsc (821 Bytes)

Thank you @nescafe2002, for taking the time and effort to reproduce this setup!

It appears then that this concentration of load on cpu1 is inherent to the RB3011 hardware and my specific combination of ports. (I was thinking I did something terribly wrong in the configuration…)

With a fasttrack rule, I managed to reach my 510 Mbps limit on the link (at 80% cpu1 and 0% cpu0), and that’s what I need for now.

Thanks for the help.

I also have the same problem.
When I use the RB3011 to download and upload at 100M/100M, the load on one of the CPUs is very high.
Maybe it’s a problem with the CPU itself; the CPU is ARM.

The problem could be related to your configuration. Post the output of /export hide-sensitive here to confirm.

There you go:
I changed the real IP parts to 1.1.1, 2.2.2, 3.3.3… etc.
rb3011-high-cpu.rsc (42.1 KB)

Can you show the profiler running while the device is processing traffic?
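
For anyone following along, a rough sketch of how that per-core view can be pulled from the terminal while traffic is flowing. These are standard RouterOS commands; which of them is most useful here is an assumption on my part.

```
# Live per-core breakdown of what is consuming CPU (firewall, networking, etc.)
/tool profile cpu=all

# Per-core load summary
/system resource cpu print

# IRQ list, to see which interrupts are handled on which core
/system resource irq print
```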

I am getting a lot of firewall usage, but that is because SFP is not used and I am testing non-tcp packets.
LINQPad_2018-12-06_14-27-01.png

If you have to use sfp1, then use ether6-10 as bonding interfaces. It should reduce the load on CPU1.
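
A minimal sketch of that change, assuming the two radio links are re-cabled to ether6 and ether7 and that round-robin bonding is kept as it is today. The name bond-lan is a placeholder; addresses and rules that reference the existing bonding interface would have to be moved to match.

```
/interface bonding
# Hypothetical replacement bond on the second switch group (ether6-10)
add name=bond-lan mode=balance-rr slaves=ether6,ether7
```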

You can also work on your firewall rules:

Use address lists on your FU**ERS rules. Address lists can contain whole networks just as well as single addresses.
Right now this would save you 4 lines.
The same goes for the srcnat rules. Just go with address lists and save almost 50 rules.
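
A rough sketch of the srcnat part, assuming several internal subnets share one external IP. The list name nat-group-a, the two subnets, and the external address 1.1.1.2 are made up for illustration and are not taken from the posted export.

```
/ip firewall address-list
# Hypothetical group of subnets that should leave via the same external IP
add list=nat-group-a address=192.168.10.0/24
add list=nat-group-a address=192.168.11.0/24

/ip firewall nat
# One srcnat rule per external IP instead of one rule per subnet
add chain=srcnat src-address-list=nat-group-a out-interface=sfp1 action=src-nat to-addresses=1.1.1.2
```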

Use jumps. Seriously, use them. Especially in your NAT rules.
Split the rules off into separate chains, and within the dstnat chain split them again by dst-address.
For the rule “add action=dst-nat chain=dstnat dst-address=1.1.1.21 dst-port=32006 protocol=tcp to-addresses=192.168.9.136 to-ports=32006” to apply, the packet has to be matched against more than 130 other dstnat rules, not counting the 58 srcnat rules before that. That takes time and eats up CPU cycles.
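
As a sketch of the idea, using the rule quoted above: one jump per external address in dstnat, with the per-port rules moved into a chain of their own. The chain name dnat-host21 is a placeholder.

```
/ip firewall nat
# A single cheap match in dstnat sends everything for 1.1.1.21 to its own chain
add chain=dstnat dst-address=1.1.1.21 action=jump jump-target=dnat-host21
# Per-port rules live in the sub-chain and are only evaluated for that address
add chain=dnat-host21 protocol=tcp dst-port=32006 action=dst-nat to-addresses=192.168.9.136 to-ports=32006
```

With this layout, a packet for any other address is checked against one jump rule per address instead of every port-forward rule.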

I’ll look into the lists/jumps optimizations! Thanks!

I’m quite new to MikroTik products and I have read about block diagrams a couple of times. Where can I find those? Are they paid or free?

Block diagrams can be found on the product page, for all MikroTik products.

For RB3011, the product page is:
https://mikrotik.com/product/RB3011UiAS-RM

You then open the “support and downloads” section, and there you have it: the block diagram.
https://mikrotik.com/product/RB3011UiAS-RM#fndtn-downloads