Hi, I have a setup where I cannot get the RB3011's throughput above 400 Mbps.
Something is keeping CPU1 constantly busy with IRQ calls, and the software workload is not balanced between the cores either, which makes the situation even worse.
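A sketch of how the per-core load and IRQ distribution can be checked from the RouterOS terminal. These are read-only commands, and the exact columns vary between RouterOS versions:

```routeros
# per-core load, including the share spent handling IRQs
/system resource cpu print
# list of IRQs and which core services each one
/system resource irq print
# live breakdown of CPU time by process/subsystem (runs until interrupted)
/tool profile
```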
My WAN comes in on the SFP port and, according to the block diagram, flows into CPU0.
My LAN comes in on ports 3 and 5, which are bonded and go into CPU1. (I tried a direct connection without the bonding, and it had no effect on CPU usage.)
I have an accept established/related rule in firewall/filter, plus about 20 other filter rules. Disabling the entire filter stack has little effect on total CPU usage: about 5%.
There are a bunch of NAT rules. I tried replacing them with a single generic rule to NAT everything, and I also tried disabling all of my dst-nat rules.
Doing that had almost no effect. Peak throughput stays the same, limited by CPU1 reaching 95% usage while CPU0 is at 20%.
Here is an annotated screenshot of the device:
In this scenario, what else can I try to squeeze some extra performance out of this RB3011?
It doesn't look like you can specify which switch ports work with a particular CPU. Both CPUs have a connection to both switches.
And it looks like all the traffic is going through CPU1. The SFP is connected to CPU1, and the ports on switch1 also have a lane to CPU1, so the shortest path for traffic between them does not involve CPU0. In situations like this, the CPU that is not handling the raw traffic usually takes on more of the software work (firewall, NAT, VPN, etc.). Your stats somewhat back this up: the CPU cost of your firewall rules is small, and CPU0 is not very loaded.
You should be able to push about double what you are seeing. Do you see any events in the log regarding the SFP?
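To check the SFP side, something along these lines should show recent log messages mentioning the SFP and its current link state. This assumes the port has the default name sfp1 (as on the RB3011); the match pattern is only an example:

```routeros
# log entries whose message mentions the SFP port
/log print where message~"sfp1"
# one-shot snapshot of link status, rate and module information
/interface ethernet monitor sfp1 once
```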
All ethernet interfaces use “only hardware queue”;
the bonding interface shows “no queue”.
The bonding is round-robin (balance-rr), with a CCR on the other side (the CCR1009 is at 1% CPU).
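The queue types and bonding mode listed above can be confirmed with print commands like these (a sketch; the output format differs between RouterOS versions):

```routeros
# active queue type per interface
/queue interface print
# bonding mode, member ports and related settings
/interface bonding print detail
```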
I also added a fasttrack rule and lowered the conntrack timeout from 24 to 4 hours (the number of tracked connections went from 50,000+ down to about 25,000).
Still, CPU1 bears all the load, while CPU0 stays between 0 and 5% usage. /o\
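A minimal sketch of a fasttrack setup plus the shorter timeout described above. It assumes the timeout that was changed is the TCP established timeout (RouterOS default 1d) and that these two rules are moved to the top of the forward chain afterwards, since "add" appends to the end. Fasttracked packets skip most of the firewall, so features such as simple queues will not see them:

```routeros
/ip firewall filter
# hand established/related connections to the fasttrack path
add chain=forward action=fasttrack-connection connection-state=established,related comment="fasttrack"
# still accept the packets that are not fasttracked
add chain=forward action=accept connection-state=established,related comment="accept established/related"
/ip firewall connection tracking
# assumption: this is the 24h -> 4h change mentioned in the post
set tcp-established-timeout=4h
```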
Thank you, @nescafe2002, for taking the time and effort to reproduce this setup!
It appears, then, that this concentration of load on CPU1 is inherent to the RB3011 hardware and my specific combination of ports. (I was thinking I had done something terribly wrong in the configuration…)
With a fasttrack rule, I managed to reach my 510 Mbps link limit (at 80% on CPU1 and 0% on CPU0), and that's what I need for now.
I also have the same problem.
When I use the RB3011 to download and upload at 100M/100M throughput, one of the CPUs is very high.
Maybe it's a problem with the CPU; it's an ARM CPU.
If you have to use sfp1, then use eth6-10 for the bonding interfaces. That should reduce the load on CPU1.
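A sketch of moving the LAN bond to the second group of ports, assuming RouterOS v6 syntax, a hypothetical bond name bond-lan, and that ether6 and ether7 are free (not members of a bridge or another bond). The existing bond on ether3/ether5 would have to be removed first:

```routeros
/interface bonding
# hypothetical name; same round-robin mode as the current bond
add name=bond-lan mode=balance-rr slaves=ether6,ether7
```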
You can also work on your firewall rules:
Use address lists in your FU**ERS rules. Address lists can contain whole networks just as well as single addresses.
Right now this would save you 4 lines.
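A sketch of the address-list approach, assuming the FU**ERS rules are drop rules matched on source address. The list name blocklist and the addresses (documentation ranges) are hypothetical, and the chain may need to be input rather than forward depending on what the rules protect:

```routeros
/ip firewall address-list
# a whole network and a single host can live in the same list
add list=blocklist address=203.0.113.0/24 comment="whole network"
add list=blocklist address=198.51.100.7 comment="single host"
/ip firewall filter
# one rule replaces one rule per address
add chain=forward action=drop src-address-list=blocklist
```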
Same for the src-nat rules: just go with address lists and save almost 50 rules.
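The same idea for src-nat, as a sketch with hypothetical names: every LAN range that should leave through one public address goes into one list, so you need one rule per public address instead of one rule per source. The 192.168.9.0/24 range and 1.1.1.21 are borrowed from the example dst-nat rule in this post purely for illustration, and sfp1 is assumed to be the WAN port:

```routeros
/ip firewall address-list
# add every LAN range that should use this public address
add list=nat-public-21 address=192.168.9.0/24
/ip firewall nat
add chain=srcnat action=src-nat src-address-list=nat-public-21 out-interface=sfp1 to-addresses=1.1.1.21
```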
Use JUMPS. Seriously, use them, especially in your NAT rules.
Split the rules off into chains, and split the dstnat chain again by dst-address.
For the rule "add action=dst-nat chain=dstnat dst-address=1.1.1.21 dst-port=32006 protocol=tcp to-addresses=192.168.9.136 to-ports=32006" to apply, the packet has to be matched against more than 130 other dstnat rules, not counting the 58 srcnat rules before that. That takes time and eats up CPU cycles.
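A sketch of the jump structure for that rule; the sub-chain name dstnat-21 is hypothetical. Traffic to other public addresses costs a single dst-address comparison instead of walking through every port forward, and a packet that matches nothing in the sub-chain returns to the dstnat chain automatically:

```routeros
/ip firewall nat
# one comparison per public address in the main dstnat chain
add chain=dstnat dst-address=1.1.1.21 action=jump jump-target=dstnat-21
# port forwards for 1.1.1.21 only live in the sub-chain
add chain=dstnat-21 protocol=tcp dst-port=32006 action=dst-nat to-addresses=192.168.9.136 to-ports=32006
```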