Feature Request - CHR - VPP & ISO version CHR ROS

I’ve been running thousands of MikroTik devices and dozens of CHR virtual machines in my ISP networks for many years now. So far, all of my MikroTik products have stood the test of time, and they have easily handled my ISP’s leaps in network throughput growth.

However, I am bumping up against the CHR’s upper-end throughput (normally anywhere from 3-Gig to 6-Gig), and I am unable to achieve 10-Gig+ routing. I do not see the current CHR being able to route at 25-Gig to 100-Gig wire speed.

So, I have two requests I would like to suggest to MikroTik:
1) Create a CHR ISO, so the CHR can be installed directly on bare-metal hardware (and/or installed from ISO in a hypervisor).
2) Create a new CHR ISO image that uses VPP (Vector Packet Processing) by default.

Re #1, ISO: this has been suggested many times by myself and others. It paves the way to install a CHR ISO version on a bare-metal server and eliminates the throughput overhead of a hypervisor.

Re #2, VPP (Vector Packet Processing): this is a newer, much faster method of processing packets. The routing routines and drivers are rewritten to keep the software in CPU cache, which increases CPU cache hits four-fold or more, whereas the current older drivers mostly incur CPU cache misses. VPP is designed around CPU cache hits, which greatly speeds up throughput and can make it orders of magnitude faster.

Note: other non-MikroTik VPP software routers (Vyoxxx, PfSexxxx, Linuxxx) are now close to, aiming for, or hitting 100-Gig wire-speed routing on normal bare-metal x86 Xeon hardware. Recently I have been testing some of these ISO-installable router products, and they are fast (much faster than my VM CHR routers). I am now getting ready to start testing the various currently available VPP software router products. However, it makes me a little sad that, as far as I know, MikroTik has zero plans for ISO and VPP router products.

Years ago, there was 10-Meg, then years later 100-Meg, then years later 1-Gig.
Yesterday, I only needed to route at less than 10-Gig.
Today, I need to route at 10 to 50-Gig.
Tomorrow, I will need to route at 100-Gig and possibly faster.
On the horizon, I see 400-Gig becoming the ISP’s in-house routing standard.

What is MikroTik planning to support as we move toward tomorrow’s network throughput demands?

North Idaho Tom Jones

  1. CHR is designed to run in virtual environments and can easily handle Tbps without any issues. But if you really want to run bare-metal, go for ROS x86_64. But why? Properly set up virtual environments are just as fast as bare metal and are way easier to manage.

  2. VPP is a user-space solution and doesn’t run within an embedded Linux kernel like ROS because of resource constraints.

@ OP / Tom

We are in the same boat as you. We are growing, but at a much slower pace, so I think we can still wait at least a few more years for VyOS or Bison Router to mature. Any chance you can share the other cost-effective solutions you are looking into? I hate to ask this since this is the MT forum, but it seems like they are not interested in competing in the SP space.

VPP is mostly being developed for the Linux kernel, and there has been some headway in porting those VPP routines to BSD kernels.
As for embedded Linux kernels, VPP is almost the same as other software drivers/routines that can be compiled into a Linux kernel.

As for the current ROS … can you show me any ROS product that can route and/or NAT at 40-Gig to 100-Gig line-speed rates? There are none. However, in the Linux VPP world, these throughput rates are (or almost are) possible right now.

Re: Properly set up virtual environments are just as fast as bare metal and are way easier to manage.
Well … almost.
I think what you are saying is more like:
Properly set up non-VPP virtual machines are just as fast as bare metal and are way easier to manage.
IMO:

  • VPP routing is orders of magnitude faster than non-VPP routing.
  • VPP CGN NAT is orders of magnitude faster than non-VPP CGN NAT.
  • VPP is approaching ASIC line-speed rates and has the potential to outrun some older ASICs.
  • VPP is less jittery than a non-VPP router under heavy sustained load.
  • With VPP, there is normally one cache pre-load, after which the code runs from CPU cache and that same cached code is reused for as long as possible. Without VPP, no pre-loaded cache contents get reused, because the CPU cache keeps being reloaded with other parts of the driver. That results in CPU cache misses, which in turn results in much lower CPU throughput.
  • With VPP, even if a CPU interrupt occurs and the CPU overwrites the cache, on return there is one pre-load and the CPU cache is then reused again and again.
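
The cache argument in the last two bullets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three node functions, the `ROUTES` and `NEXT_HOP_MAC` tables, and the `code_switches` counter are hypothetical names made up for this example, not VPP APIs, and Python cannot show real cache behaviour. The counter simply treats "the active node changed" as a rough stand-in for "the instruction cache had to be reloaded".

```python
from typing import Any, Callable, Dict, List

Packet = Dict[str, Any]
Node = Callable[[Packet], Packet]

# Hypothetical forwarding tables, for illustration only.
ROUTES = {"192.0.2.1": "eth1", "198.51.100.7": "eth2"}
NEXT_HOP_MAC = {"eth1": "aa:aa:aa:aa:aa:01", "eth2": "aa:aa:aa:aa:aa:02"}


def decrement_ttl(pkt: Packet) -> Packet:
    return {**pkt, "ttl": pkt["ttl"] - 1}


def lookup_route(pkt: Packet) -> Packet:
    return {**pkt, "out_if": ROUTES.get(pkt["dst"], "drop")}


def rewrite_mac(pkt: Packet) -> Packet:
    return {**pkt, "dst_mac": NEXT_HOP_MAC.get(pkt["out_if"])}


# A tiny processing graph: every packet visits these nodes in order.
GRAPH: List[Node] = [decrement_ttl, lookup_route, rewrite_mac]


def run_per_packet(packets: List[Packet], graph: List[Node], trace: List[str]) -> List[Packet]:
    """Non-vector style: each packet walks the whole graph before the next one starts."""
    out = []
    for pkt in packets:
        for node in graph:
            trace.append(node.__name__)  # this node's code becomes the active code
            pkt = node(pkt)
        out.append(pkt)
    return out


def run_vector(packets: List[Packet], graph: List[Node], trace: List[str]) -> List[Packet]:
    """Vector style: each node processes the whole batch before the next node runs."""
    batch = list(packets)
    for node in graph:
        trace.append(node.__name__)  # node code is brought in once per batch
        batch = [node(pkt) for pkt in batch]
    return batch


def code_switches(trace: List[str]) -> int:
    """Count how often the active node changes (rough proxy for instruction-cache reloads)."""
    return sum(1 for i, name in enumerate(trace) if i == 0 or name != trace[i - 1])


packets = [{"dst": "192.0.2.1", "ttl": 64} for _ in range(4)]
scalar_trace: List[str] = []
vector_trace: List[str] = []
same = run_per_packet(packets, GRAPH, scalar_trace) == run_vector(packets, GRAPH, vector_trace)
print(same, code_switches(scalar_trace), code_switches(vector_trace))
```

Both functions produce the same packets; the per-packet loop switches code once per packet per node, while the vector loop switches once per node regardless of batch size, which is the effect the bullets describe.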

A couple of things for you.

  • Netgate (the company behind pfSense) currently has a VPP software router (TNSR, on a Linux kernel). I have heard that it is near 100-Gig wire speed on good, modern x86 bare-metal hardware with newer-generation PCIe 100-Gig network interfaces.
  • Linux already has VPP options you can install as packages.
  • VyOS already has VPP options you can install as packages.

When I have some free time, I want to build and test some Linux VPP routers on some newer bare-metal servers. If they are faster than my other software routers, I might put them into production use at my ISP.

This is interesting. I hope VyOS matures much faster, since it is pretty much streamlined for us, rather than rolling out my own Linux solution (TM). Thanks for the input.

VPP is user-space software from the FD.io project, typically run on top of DPDK. It’s not suitable for embedded network OS environments like ROS.

EDIT:
The above applies to OSes like pfSense, VyOS, BSD, and others.

If I read the point correctly …

Why do you think that an embedded NOS can’t run user-space stuff? Running something inside kernel space doesn’t make it any faster or smaller or anything … it’s just running with higher privileges (so it can crash the whole device even more easily) and that’s it. But it’s much easier to introduce (purpose-built) code which handles some data effectively if it runs in user-land (no need to persuade Linus or one of his “henchmen” to approve a code push into the kernel source). Kernel space only has to provide an API (ABI) which allows user-land code to pass the data left and right.

Well, it won’t work with the current product line. Study the basics and you’ll understand why.

Well, I’ve read some (general) articles on VPP … and I still don’t get it: why is it orthogonal to embedded NOS such as ROS?

re: … Well, it won’t work with the current product line. …
Are you saying it won’t work with the current CHR? (If so, CHR does not run on bare metal.) CHRs don’t have direct access to all of the physical hardware.
Are you referring to ROS on current hardware? (Well, the current hardware does not have any CPU cache that I know of.)

However, some x86 Xeon motherboards that are a few years old, running VPP-enabled Linux, are sustaining 100-Gig network routing throughput (measured and verified).

Are you referring to FRR + VPP, or something else? Care to elaborate, please?

I think the current path is towards ASIC forwarding, as other vendors have been doing for 20 years, leaving the general-purpose CPU mostly for the control plane,

aka L3 Hardware Offloading.

Under the “right” circumstances (including some limitations on features), it is feasible to do 200 Gbps of forwarding on a CCR2216 using only 1 rack unit of physical space and around 100 watts of power, at a very reasonable price.

With future ASICs, it is reasonable to expect these forwarding numbers to scale.

I understand that for some use cases forwarding on general-purpose CPUs can be a reasonable choice, but supporting the wide spectrum of hardware possibilities and combinations comes with its own challenges.

To fully utilize VPP (What is VPP?) as a fully-fledged router, you need to pair it with a user-space network stack that has all the necessary capabilities, using frameworks like FD.io or DPDK and DPDK-enabled drivers. This also requires an additional user management interface to orchestrate the control plane, similar to how ROS interacts with the Linux network stack (i.e. netfilter/nftables). Although some people are experimenting with a possible kernel module as a workaround, VPP fundamentally operates in user space to maximize performance by avoiding the overhead of kernel context switches. Anyhow, all these developer frameworks are pretty resource-intensive.

A similar solution in the Linux kernel space is eBPF/XDP which normally has lower resource consumption compared to user space networking.

Re: … i think the current path is towards ASIC

Well-crafted ASICs can and often do improve specific types of I/O throughput on specialized hardware.

But what about x86 and CHR ROS on hypervisors and bare-metal machines? There are no router I/O ASICs on common generic motherboards.

IMO, if MikroTik has no plans to adopt newer, faster router I/O software and driver routines, that might spell end-of-life for a product. All hardware and software manufacturers should always be researching and testing ways to improve their products, not only to keep up with today’s routing throughput demands but also to build faster products that can meet future I/O throughput demands. Most consumers are not willing to stay with products that do not keep up with their current and future needs, and the same applies to routers. If an ISP needs to support 1,000+ customers with 100-Meg to 1-Gig accounts, it will find a way, or that ISP will lose customers to another ISP.

Well, just my 2 cents on what I would like to see.

North Idaho Tom Jones

Isn’t it because CHR + VPP would run counter to MT’s push for hardware sales that they appear uninterested in making this happen?
I hope this is not the case. I think most of the SP guys here, including us, are willing to pay a reasonable price to make this happen. Anyone care to share their thoughts?

loloski,

I look at it this way:
If you manufacture and sell blue widgets, but customers are also asking for green widgets, where do those customers go to buy green widgets?

Having been in the computer communications industry for almost 50 years now, I’ve seen many companies slowly lose their customers because somebody else was offering a better/cheaper mousetrap.
If somebody wants a better/faster/cheaper mousetrap, then make it and sell it to them, and create more loyal customers in multiple markets.
It’s better to have more customers in multiple markets than to target only specific customers in narrow vertical markets.

@Tom,

I do agree with you, and I hope whoever decides on the other side of the aisle thinks the same way you do. Evidently that is not the case here. I firmly believe they have their own winning formula that they believe makes them thrive, and Service Provider oriented products are not part of it. Being in this space, I personally feel we are less important, for lack of a better word :) I hope I’m wrong, hahaha.

VPP is just a frontend to Intel DPDK, which means that you set aside cores of your onboard CPU so that they are not part of the kernel’s processing.

This means that a lot of the kernel and user-space overhead is removed for these cores, so they maximize performance for a single task such as routing packets.

The per-core performance is, give or take, in the range of:

  • Regular interrupt-based processing: >250 kpps per core.

  • Poll-based processing: >1 Mpps per core.

  • Intel DPDK (VPP and the other frontends): >10 Mpps per core.
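
To put those per-core figures next to the 100-Gig target discussed in this thread, here is a rough back-of-the-envelope calculation. It is a sketch under stated assumptions: the per-core rates are the round numbers from the list above treated as exact (they were given as lower bounds), scaling across cores is assumed to be perfectly linear, and the function names are made up for this example. The 20 bytes of per-frame overhead are the standard Ethernet preamble, start-of-frame delimiter and inter-frame gap.

```python
import math

# Per-frame bytes on the wire that are not part of the frame itself:
# 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20

# Round per-core figures from the list above (assumed exact for this sketch).
PER_CORE_PPS = {
    "interrupt": 250_000,
    "polling": 1_000_000,
    "dpdk": 10_000_000,
}


def line_rate_pps(link_bps: int, frame_bytes: int = 64) -> float:
    """Packets per second needed to fill a link with frames of the given size."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def cores_needed(link_bps: int, pps_per_core: int, frame_bytes: int = 64) -> int:
    """Cores required at a given per-core rate, assuming perfect linear scaling."""
    return math.ceil(line_rate_pps(link_bps, frame_bytes) / pps_per_core)


LINK_100G = 100_000_000_000

print(f"100-Gig at 64-byte frames: {line_rate_pps(LINK_100G) / 1e6:.1f} Mpps")
for method, rate in PER_CORE_PPS.items():
    print(f"{method:>9}: {cores_needed(LINK_100G, rate)} cores")
```

With minimum-size 64-byte frames, 100-Gig works out to about 148.8 Mpps, so under these assumptions roughly 15 cores at the DPDK figure versus 149 with polling and 596 with interrupts. Larger frames lower the packet rate considerably; passing `frame_bytes=1500` shows the difference.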

So yes, this will add to the size of the image, but at the same time the CHR is meant to be run on x86 without any additional switch chips (Marvell, Broadcom etc.) for offloading, so having the image grow by another 10-100 MB or so shouldn’t be an issue (compared to the ROS images used on MikroTik hardware, which have a flash limit of about 16 MB).

On the other hand, I have always seen MikroTik as the cheaper way to get switch-chip-offloaded hardware (with RouterOS or SwOS), but for a true software solution I would personally go for VyOS (routing) or OPNsense (firewalling), depending on the task.

I would prefer that MikroTik fix their current issues (VRF, MLAG etc. don’t work when L3 offloading is enabled, and the DNS and logging services are broken when used with VRF) before spending time and resources on releasing a VPP edition of CHR.

They are going to be slaves to fixing the issues you mentioned indefinitely if MT continues on its journey of, as it appears, not giving much attention to unit testing and somehow lacking a leader/visionary for much of the codebase. Instead they let each individual programmer be king of their own domain, not in concert with the others. That is why the codebase is so fragile: if someone changes something, there is a big chance that another part will stop working.

I’ve seen this situation in other projects like Asterisk, where they got tired of fixing chan_sip, which pushed them to replace it with a much better alternative. In MT’s case, they are buried in fixing basic things and are having a hard time innovating and stepping up their game with something like this VPP + CHR. I hope I’m wrong. Honestly, it’s very hard for me to criticize MT, and I don’t like doing it, but we are married to their ecosystem because of the perceived benefits. We can’t deny that MT is a really compelling alternative to the big guns like Juniper and Cisco, and that’s something I have to live with and hate.

I’m just speaking the hard truth. I don’t mean to antagonize MT; I’m just voicing my observations as an outsider to their company, hoping that they improve someday.