jackrabbit
just joined
Topic Author
Posts: 11
Joined: Tue Jul 07, 2020 1:28 pm

MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 5:59 am

It appears that ROSv7 on ARM/ARM64 locks MPLS decapsulation to cpu0, resulting in a major bottleneck for MPLS/VPLS networks using ARM64 routers such as the CCR2004 and CCR2116.

For example, on our network, we have CCR2004s as our core and site routers. When a device connected to a site router uploads traffic, throughput caps around 500-700 Mbps, cpu0 of the core router hits 95%+, and significant packet loss occurs. Download traffic is mostly unaffected by this issue.

We have spoken to 3 other network operators who use VPLS and who are experiencing this. It is reproducible with both the CCR2116 and CCR2004.

We recently interacted with an ISP that expressed frustration with the asymmetrical VPLS performance issues with the CCR2116/CCR2004 routers. They ended up switching to Arista for their core router, leaving their tower sites as CCR2116s. Interestingly, they are now able to pull nearly line rate (9800/9800 Mbps) over VPLS from the tower site CCR2116s, whereas before they were limited to ~500 Mbps upload.

Another ISP acquaintance tested and found that the single-core bottleneck occurs primarily during decapsulation, not during encapsulation.

As a temporary workaround, we have overlaid VXLAN, which is correctly multi-threaded on ROSv7 for ARM/ARM64. However, fixing the MPLS/VPLS issue on ROSv7 for ARM64 would be ideal given all of MikroTik's flagship routers are ARM64.
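
For anyone wondering why a VXLAN overlay can spread across cores where MPLS does not: RFC 7348 recommends deriving the outer UDP source port from a hash of the inner frame's headers, so each inner flow looks like a separate UDP flow to whatever distributes packets across cores. The Python below is only a sketch of that idea. The function name `vxlan_source_port` and the use of CRC32 are assumptions for illustration, and nothing here is confirmed to match what RouterOS actually does.

```python
import zlib

# Port range recommended by RFC 7348 for the outer UDP source port.
PORT_MIN, PORT_MAX = 49152, 65535


def vxlan_source_port(inner_src_mac: bytes, inner_dst_mac: bytes) -> int:
    """Hypothetical helper: map an inner flow to an outer UDP source port."""
    flow_hash = zlib.crc32(inner_src_mac + inner_dst_mac)
    return PORT_MIN + flow_hash % (PORT_MAX - PORT_MIN + 1)


# 100 made-up customer MACs all talking to one made-up destination MAC.
dst_mac = bytes.fromhex("02aa00000001")
ports = {
    vxlan_source_port((0x020000000000 + i).to_bytes(6, "big"), dst_mac)
    for i in range(100)
}

print(f"{len(ports)} distinct outer source ports for 100 inner flows")
```

Even though every tunnel packet travels between the same two routers, the outer IP/UDP header differs per inner flow, which gives a receive-side hash something to work with.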

If you have experienced this issue, please chime in.

Issue reported to MikroTik support on 12/06/2023 as SUP-136817.
 
sirbryan
Member
Posts: 316
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 7:43 am

That was me. :D Well, I was one ISP who spent some time tonight testing from a 2116 to a 1036 through two 2004's (as well as with another ISP's two 2004's from a 1036 the other night) and determined it was single-core on egress. (All boxes under test are on 7.13.2.)

According to the WISP Talk Facebook discussion about this, someone else submitted a ticket about it a while back, but it does not appear to have been dealt with yet.

Meanwhile, I also tested with CRS309's as VXLAN and VPLS encap/decap endpoints, and maxed out the CPU's with only 1200Mbps. But on those ARM boxes, it at least spread the load.

I am certainly looking forward to hardware-offloaded MPLS and/or VXLAN. While I don't use either in my network (in any meaningful amount), I hope to be able to someday.
 
clambert
Member Candidate
Posts: 122
Joined: Wed Jun 12, 2019 5:04 am

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 1:18 pm

I have observed this behavior for all types of MPLS traffic (both VPLS and VPNv4) on both ARM (RB1100AHx4) and ARM64 (CCR2004) devices. It is interesting to note that if the interface to the MPLS core is a VLAN interface, the incoming VPNv4 traffic is distributed between two CPU cores in the CCR2004 (not so in the RB1100AHx4).

I opened a support case for this behavior on the RB1100AHx4 [SUP-108967]. According to the 7.13 changelog, this behavior was improved (I have not been able to test it in production yet):
*) ethernet - improved packet CPU core classifier for Alpine CPUs for non IPv4/IPv6 traffic;

I suspect this happens because the hash that distributes incoming traffic across the CPU cores only looks at the Ethernet header (source and destination MAC addresses) and the IP header (source and destination IP addresses). For VPLS traffic, however, the hash can only parse the Ethernet header (the IP header, if one exists, is part of the MPLS payload).
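
A rough Python model of the suspicion above, in case it helps picture it. Everything in it is an assumption for illustration: the `pick_core` function, the CRC32 hash, the four-core count, and the made-up MAC and IP addresses. It is not MikroTik's actual classifier, only a sketch of a hash that falls back to MAC addresses when the frame is not plain IPv4.

```python
import zlib
from collections import Counter

NUM_CORES = 4        # assumed core count for the sketch
ETH_IPV4 = 0x0800    # EtherType for IPv4
ETH_MPLS = 0x8847    # EtherType for MPLS unicast


def pick_core(src_mac, dst_mac, ethertype, ip_pair=None):
    """Hypothetical classifier: hash the MACs, plus the IPs only for plain IPv4."""
    key = src_mac + dst_mac
    if ethertype == ETH_IPV4 and ip_pair is not None:
        key += ip_pair[0] + ip_pair[1]
    # For MPLS the inner IP header is payload, so it never reaches the hash.
    return zlib.crc32(key) % NUM_CORES


def simulate(ethertype, flows=1000):
    """Send many customer flows between the same two routers and count cores hit."""
    src_mac = bytes.fromhex("020000000001")  # upstream router (made up)
    dst_mac = bytes.fromhex("020000000002")  # this router (made up)
    hits = Counter()
    for i in range(flows):
        ip_pair = ((0x0A000000 + i).to_bytes(4, "big"),
                   (0x0A640001).to_bytes(4, "big"))
        hits[pick_core(src_mac, dst_mac, ethertype, ip_pair)] += 1
    return hits


ipv4_cores = simulate(ETH_IPV4)
mpls_cores = simulate(ETH_MPLS)
print("plain IPv4 flows per core:", dict(ipv4_cores))
print("MPLS flows per core:      ", dict(mpls_cores))
```

In this model every MPLS frame between the same pair of routers produces the same hash key, so all of it lands on one core no matter how many customer flows are inside.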
 
nz_monkey
Forum Guru
Posts: 2104
Joined: Mon Jan 14, 2008 1:53 pm
Location: Over the Rainbow

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 1:27 pm

This is quite interesting as this problem did not occur in RouterOS v6
 
clambert
Member Candidate
Posts: 122
Joined: Wed Jun 12, 2019 5:04 am

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 1:42 pm

nz_monkey wrote:
This is quite interesting as this problem did not occur in RouterOS v6

This problem was also present in ROSv6.
Last edited by clambert on Thu Jan 18, 2024 6:13 pm, edited 1 time in total.
 
sirbryan
Member
Posts: 316
Joined: Fri May 29, 2020 6:40 pm
Location: Utah

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 5:20 pm

I will add that even though it's single-core bound, the throughput on the 2004 was still 8Gbps full duplex, or 9Gbps one-way (limited by testing on 10Gbps ports, of course). In that test the router bridged an entire interface with the VPLS and routed it over the second port to the second router, which passed the traffic back out on the first SFP+ port. No VLANs involved in my tests.

One of the guys on WISP Talk was running into 500Mbps limits in one direction with 2116's or 2216's in the core. Mine had no queues or filtering rules, so I'm wondering if there was some other stuff going on making it worse.

My tests were also on the 16GBE version of the 2004, whereas other testing I did with another ISP was on the 12SFP+ version, which has its own set of quirks. We were running into limits around 3Gbps, which more closely matches limits I ran into when using that particular 2004 as a border BGP router with a handful of rules, no MPLS/VPLS. I couldn't push more than 3Gbps during speed tests, and finally swapped it out for a 2116, which handily passes whatever I throw at it before enabling L3HW offload.
 
clambert
Member Candidate
Posts: 122
Joined: Wed Jun 12, 2019 5:04 am

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Thu Jan 18, 2024 6:15 pm

The behavior I described above is on the CCR2004 with 12 SFP+ ports.
 
DarkNate
Forum Guru
Posts: 1017
Joined: Fri Jun 26, 2020 4:37 pm

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Fri Jan 19, 2024 12:55 pm

Very strange issue.

I have a CCR2004 in production running VPLS as a PE router, and I don't see a single CPU core choking; all cores are engaged pretty much evenly. Maybe it's config related?
It runs ROS version 7.12.1 with firmware version 7.12.1 as well (if I use 7.13.x, it reboots every 15 minutes). I did a 7.12.1 netinstall before racking this device and putting it to work, so possibly that played a role.

Edit:
After benchmarking thoroughly, I see the same problem as everybody else.
 
khaleel
just joined
Posts: 1
Joined: Mon Jan 01, 2024 2:46 pm

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Mon Jan 29, 2024 12:27 pm

On a CCR2116 I see 90% to 100% on one core while the rest sit at 30%.
 
nz_monkey
Forum Guru
Forum Guru
Posts: 2104
Joined: Mon Jan 14, 2008 1:53 pm
Location: Over the Rainbow

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Tue Feb 06, 2024 7:22 am

It seems that MikroTik need to write a new software-based MPLS data plane (a FastPath module, in MikroTik terminology) that can distribute load across multiple cores.

And while the code and protocols are fresh in the developers' minds, having a VPLS FastPath module would be great too.

I know that elsewhere on the forum there are discussions about MPLS hardware forwarding. This is very important for platforms like the CCR2216, CCR2116 and the CRS series, but it will be quite restricted in the label depth and the number of VLAN tags that can be decapsulated. It will also not benefit any of the other MikroTik platforms that are in widespread use within ISPs.

In my opinion MikroTik need both hardware-accelerated forwarding and a software "FastPath" that can spread the load across multiple cores.
 
DarkNate
Forum Guru
Forum Guru
Posts: 1017
Joined: Fri Jun 26, 2020 4:37 pm

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Tue Feb 06, 2024 10:18 am

FastPath isn't going to cut it in 2024.

They need to use DPDK/VPP for line-rate software dataplane. Maybe XDP for ingress filtering.
 
nz_monkey
Forum Guru
Posts: 2104
Joined: Mon Jan 14, 2008 1:53 pm
Location: Over the Rainbow

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Tue Feb 06, 2024 1:22 pm

DarkNate wrote:
FastPath isn't going to cut it in 2024.

They need to use DPDK/VPP for line-rate software dataplane. Maybe XDP for ingress filtering.

They can use whatever they want, as long as it gets the job done.
 
jackrabbit
just joined
Topic Author
Posts: 11
Joined: Tue Jul 07, 2020 1:28 pm

Re: MPLS/VPLS decapsulation locked to single CPU core on ARM/ARM64 (CCR2004, CCR2116)

Mon Mar 25, 2024 4:16 am

MikroTik support asked me to try v7.14.1 to see if it resolved the VPLS issues. Unfortunately, it did not. 1200 Mbps is the most I can push between two CCR2116s. On the plus side, VXLAN is doing great, moving over 9Gbps between two CCR2116s. VXLAN seems to be the best band-aid for now, until MikroTik can implement multi-threaded MPLS/VPLS encap/decap. :/