I’m looking to move internet providers, and the new one uses PPPoE for authentication. If I remember correctly, PPPoE was restricted to a single core, which would impact performance. Is this still the case with the latest version of RouterOS?
Most SoCs used by MikroTik do not have built-in hardware accelerators.
For example, Amazon Annapurna’s main market is NAS and small servers rather than routing, so hardware NAT is not required there.
Even when the SoC does have built-in hardware NAT acceleration, ROS doesn’t use it.
Both use the EN7562CT: the TP-LINK TL-R5010PE-EN can achieve 3.594 Mpps forwarding 64-byte packets with hardware NAT acceleration, but the MikroTik hEX refresh (50UG) only reaches 342.2 Kpps using FastPath.
Both use the IPQ8072A: the TP-LINK TL-ER2260T can achieve 9 Mpps forwarding 64-byte packets with NSS NAT acceleration, but the MikroTik Chateau PRO ax only reaches 891.2 Kpps using FastPath.
The huge performance gap (roughly 10× in both cases) illustrates the point that MikroTik does not use the SoC’s built-in NAT acceleration.
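To put the gap in numbers, here is a quick sketch that only uses the pps figures quoted above. The helper names (`speedup`, `line_rate_pps`) are made up for illustration, and the line-rate calculation assumes the usual 20 bytes of per-frame Ethernet overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap).

```python
# Figures quoted in the posts above (64-byte packet forwarding, packets per second).
COMPARISONS = [
    # (SoC, hardware-accelerated device, pps, RouterOS device, pps)
    ("EN7562CT", "TL-R5010PE-EN", 3_594_000, "hEX refresh (50UG)", 342_200),
    ("IPQ8072A", "TL-ER2260T", 9_000_000, "Chateau PRO ax", 891_200),
]


def speedup(hw_pps, sw_pps):
    """How many times faster the hardware-accelerated device forwards packets."""
    return hw_pps / sw_pps


def line_rate_pps(link_bps=1_000_000_000, frame_bytes=64):
    """Theoretical maximum packets per second on an Ethernet link.

    Assumption: each frame costs an extra 20 bytes on the wire
    (8 bytes preamble/SFD + 12 bytes inter-frame gap).
    """
    wire_bytes = frame_bytes + 20
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    for soc, hw_dev, hw_pps, sw_dev, sw_pps in COMPARISONS:
        ratio = speedup(hw_pps, sw_pps)
        print(f"{soc}: {hw_dev} is {ratio:.1f}x faster than {sw_dev}")
    print(f"1 Gbps line rate at 64 bytes: {line_rate_pps():,.0f} pps")
```

For reference, gigabit line rate with 64-byte frames works out to about 1.49 Mpps, so 342.2 Kpps is roughly 23% of that with minimum-size packets.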
How does NAT acceleration relate to a question about PPPoE? I understood these to be different levels of the stack. Wouldn’t the issue be RSS support for multiple cores, which depends on the NICs?
The hardware NAT acceleration of most routing-dedicated SoCs also supports PPPoE acceleration, which greatly reduces CPU usage.
As I mentioned above, MikroTik does not use hardware NAT acceleration, so PPPoE can only be handled in software.
Using Torch in ROS to view the WAN port details shows only a single stream with protocol 8864 (the PPPoE session EtherType). PPPoE wraps all traffic into that one stream, so RSS cannot load-balance it by five-tuple. Therefore, PPPoE downlink traffic can only be processed by one CPU core (upstream traffic does not have this problem).
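A rough sketch of why that happens, under two assumptions: the NIC's RSS parser does not look inside PPPoE session frames, and a toy hash stands in for the real one (actual NICs typically use a Toeplitz hash). All names, MAC and IP addresses and the queue count are made up for illustration and are not RouterOS internals.

```python
from collections import Counter

ETH_IPV4 = 0x0800
ETH_PPPOE_SESSION = 0x8864
NUM_QUEUES = 4  # assumption: one RX queue per CPU core


def toy_hash(fields):
    """Deterministic stand-in for the NIC's RSS hash (NOT a real Toeplitz hash)."""
    h = 0
    for value in fields:
        h = (h * 31 + value) & 0xFFFFFFFF
    return h


def rss_queue(frame):
    """Pick an RX queue the way a simple RSS implementation might."""
    if frame["ethertype"] == ETH_IPV4:
        # Plain IPv4: the NIC can see the five-tuple and hash on it.
        key = (frame["src_ip"], frame["dst_ip"], frame["proto"],
               frame["src_port"], frame["dst_port"])
    else:
        # PPPoE session frame: assumed opaque to the NIC, so only the
        # L2 fields are available, and they are identical for every packet.
        key = (frame["src_mac"], frame["dst_mac"], frame["ethertype"])
    return toy_hash(key) % NUM_QUEUES


def downlink_frames(n_flows, over_pppoe):
    """Build n_flows downlink TCP flows from one server to one client."""
    frames = []
    for i in range(n_flows):
        frames.append({
            "src_mac": 0x0000AA000001,  # ISP access concentrator (made up)
            "dst_mac": 0x0000BB000001,  # router WAN port (made up)
            "ethertype": ETH_PPPOE_SESSION if over_pppoe else ETH_IPV4,
            "src_ip": 0xC6336401,       # 198.51.100.1
            "dst_ip": 0xCB007105,       # 203.0.113.5
            "proto": 6,                 # TCP
            "src_port": 443,
            "dst_port": 50000 + i,      # one client port per flow
        })
    return frames


def queues_used(frames):
    """Count how many frames land on each RX queue."""
    return Counter(rss_queue(f) for f in frames)


if __name__ == "__main__":
    print("plain IPv4:", dict(queues_used(downlink_frames(8, over_pppoe=False))))
    print("over PPPoE:", dict(queues_used(downlink_frames(8, over_pppoe=True))))
```

With this toy model, the eight plain IPv4 flows spread across all four queues, while the same flows wrapped in PPPoE all land on one queue, which is the single 8864 stream that Torch shows.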
Agreed. Every time they talk about L3 offload, they mean the switch chip.
But on the SoC side, the CPU in most of the new RB devices has a hardware accelerator that can speed up NAT/PPPoE/QoS, including the hEX’s MT7621A, which was released several years ago.
PPPoE on RouterOS uses a single core (per PPPoE connection), but for your use case (PPPoE to the ISP as the WAN line) the performance overhead is negligible. The single-threaded processing only applies to the encapsulation/decapsulation of the PPPoE header. The rest (routing, NAT, firewall, etc.) is still distributed across multiple cores if you have multiple connections (connections inside the PPPoE tunnel).
Below are speed tests done over a single PPPoE WAN line with the old hEX RB750Gr3, with fasttrack in use for both IPv4 (1st screenshot) and IPv6 (2nd screenshot). You can see that 985 Mbps can be pulled over ether1 (the Ethernet interface underneath pppoe-out1) and that two cores of the CPU are fully loaded, so it’s not limited to a single core. The hEX has only two real CPU cores. Four CPUs are listed because it supports hyperthreading.
This means you’ll need something barely (5%) faster than the RB750Gr3 to saturate a Gbps PPPoE WAN line if you can use fasttrack (which is now available for IPv6 too).
Your ax³ should have 4× the performance of the hEX RB750Gr3 (according to MikroTik’s figures). When you did your test with iperf3, did you use the -P parameter, for example specifying -P 4? Because even without using PPPoE, packets of a single connection are only processed by a single core in RouterOS. I think with -P 4 you should see the load being distributed to all the cores, even if you only use one PPPoE interface. The PPPoE packets will be handled by one core, but that affects encapsulation/decapsulation as well as routing of those packets. Before/after that the 4 connections inside the tunnel (if you run iperf3 with -P 4) should normally be processed by multiple cores (processing includes firewall, routing, bridge/vlan too in case of ax³).
Now that you mention it:
I forgot to say that I was in fact using a single connection with iperf. (I agree that a single connection for traffic to the WAN is quite rare nowadays.)
It also makes sense that routing would be single-threaded (I think), but I didn’t take that into account.
Some SoCs MikroTik uses don’t have built-in hardware NAT acceleration, such as Tilera. ROS initially focused on software acceleration. Hardware NAT acceleration is rarely open source: you can only use the kernel version that matches the vendor’s official SDK. OpenWrt has some open-source implementations, but their performance is still far from the official ones. Hardware NAT acceleration also lacks flexibility; for example, it cannot be combined with FQ_CODEL flow control. I don’t think MikroTik has the resources to adapt to that many platforms.
The CCR2216 hardware NAT table holds only 4K entries, which is too few, while the IPQ8072 NSS hardware NAT table holds 500K entries. They are not in the same league at all.