[Feature Request] Data Center Bridging (DCB) support

Hello!

With the advent of the new 100GbE line of products (CCR2216-1G-12XS-2XQ / CRS504-4XQ-IN / CRS518-16XS-2XQ) being advertised as Enterprise and Data Center products, companies with serious Enterprise and Data Center workloads (like ourselves) have started using these devices to drive their business.

However, even though these devices and RouterOS have exceptional raw performance (for the price), they lack Enterprise and Data Center features. The most prominent of these is Data Center Bridging (DCB), which is nowadays an essential requirement for any Hyperconverged Infrastructure (HCI) deployment. After doing some research, it seems that RouterOS only supports IEEE 802.1Q (VLANs), with all of the other common requirements left out.

Here is a list of the most important standards required for a router/switch to support DCB:

    1. IEEE 802.1Qbb - Priority Flow Control (PFC): PFC is a requirement wherever DCB is used so that both the RoCE and iWARP flavors of RDMA can be supported. At least 3 Class of Service (CoS) priorities are required, without downgrading switch capabilities or port speeds, and at least one of these classes must provide lossless communication (see the short frame sketch after this list).


    2. IEEE 802.1Qaz - Enhanced Transmission Selection (ETS): Like PFC, ETS is a DCB requirement for both RDMA flavors.


    3. IEEE 802.1AB - Link Layer Discovery Protocol (LLDP): LLDP is required in order to support TLVs dynamically without any additional configuration. For example, enabling subtype 3 should automatically advertise all VLANs available on a switch port.
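
To make the PFC item concrete, here is a rough Python sketch (not MikroTik code, just an illustration) of what an IEEE 802.1Qbb PFC frame looks like on the wire. Unlike classic 802.3x pause, it pauses individual priority classes instead of the whole link, which is what makes a single class lossless; the MAC addresses and pause time below are placeholders:

    # Builds the on-wire IEEE 802.1Qbb PFC frame a switch sends to pause one
    # priority class while the others keep flowing.
    import struct

    PFC_DST_MAC = bytes.fromhex("0180c2000001")   # MAC Control multicast address
    SRC_MAC     = bytes.fromhex("020000000001")   # placeholder source MAC
    ETHERTYPE   = 0x8808                          # MAC Control
    OPCODE_PFC  = 0x0101                          # Priority-based Flow Control

    def pfc_frame(pause_priorities, quanta=0xFFFF):
        """Pause the given priority classes (0-7) for `quanta` pause quanta each."""
        enable_vector = 0
        pause_times = [0] * 8
        for prio in pause_priorities:
            enable_vector |= 1 << prio
            pause_times[prio] = quanta
        payload = struct.pack("!HH8H", OPCODE_PFC, enable_vector, *pause_times)
        frame = PFC_DST_MAC + SRC_MAC + struct.pack("!H", ETHERTYPE) + payload
        return frame.ljust(60, b"\x00")           # pad to minimum frame size (FCS not shown)

    # Example: pause only priority 3 (a typical RoCE/storage class).
    print(pfc_frame([3]).hex())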

So, are there any plans from Mikrotik to implement/enable these features, at least on these very powerful high-end devices with switch chips? The switch chips in these devices seem to support SAFE (the Marvell Prestera RDMA implementation), so it appears to be a matter of implementing the support in RouterOS.

I’m sure we would see many more companies dropping Dell/Cisco/Huawei switches/routers in favor of Mikrotik if these Enterprise/Data Center features became available. With this new lineup of 100G devices, the only reason I would go for those traditional-brand switches/routers today is the lack of these standards on Mikrotik devices and RouterOS.

Thank you!

I agree with this request; this is something that Mikrotik needs to take seriously.

RoCE is becoming dominant in the Data Center and Enterprise, and it requires the requested features.

Modern RoCEv2 from NVIDIA does not require any of this, but the lack of any QoS shaping functions on the top-series switches and routers is confusing.
Why do the smaller CRS1xx/2xx have QoS functions while the CRS3xx, CRS5xx, CCR2116, and CCR2216 do not? Is it a chip limitation or a software one? Is there any plan to add QoS functions later?
I don’t understand how to use a switch without any ability to set QoS priority; to me, the entire modern series looks useless.

same ^^

"I agree with this request, this is something that Mikrotik need to take seriously.

RoCE is becoming dominant in the Data Center and Enterprise and requires the requested features."

I am going to sell my CRS504-4XQ-IN when I get the chance to switch to a proper switch :open_mouth:

I agree. I bought 2x CRS504-4XQ-IN in the hope that Mikrotik would add support for RoCE, but it seems that it will not happen in the near future after waiting for it since the release of the switch, and it’s really a disappointment, since Mikrotik looked promising in the beginning but is missing the most important thing I needed from a switch: RoCE/RoCEv2. I’m also selling off my Mikrotik switches and getting an enterprise network brand with support for RoCE/RoCEv2 from eBay.

Yeah… Frustrating. I sent an email to support which, unfortunately, was ghosted and never answered. I opened a ticket through the support system and guess what? It “disappeared”.

Just frustrating, to say the least…

I think it would be better if they just clearly stated that they will not support this. Mikrotik is starting to become like Ubiquiti in the way they do business, not replying to customers.

I believe RouterOS 7.15rc adds some HW QoS; see https://help.mikrotik.com/docs/pages/viewpage.action?pageId=189497483

@galvesribeiro - as you pointed out, “Enterprise and Data Center products” is a marketing term and can mean anything. If you are in the data storage business, it’s probably wise to assess your technical requirements before making a purchase.

RoCE traffic can be transported over any standard switch or router and might actually work perfectly well for dedicated traffic up to a certain level of throughput using regular QoS.

However, for truly demanding applications such as massive central data storage, which require high throughput and extremely low latency, flow control, packet prioritization, and buffering must be handled by specialized hardware (an ASIC) to achieve so-called “Lossless Ethernet”. This is normally not supported by the SoC in standard switches. The only task Lossless Ethernet has is to minimize NIC-to-NIC retransmissions to ensure optimal performance under high load; otherwise the RoCE endpoint itself needs to perform the actual retransmission and point-to-point congestion control (much like TCP).

For more detailed information, please read:

CXL will eventually replace RoCE/IB, especially in data centers.

Marketing terms usually mislead customers. It is well known that DCB means “Enterprise/Datacenter Gear” for anyone in the industry.

Although it is true that RoCE v2 can be carried over regular switches/routers on commodity hardware, that doesn’t mean it will work. Any application or OS that validates RDMA support will check for the very specific features/standards on both ends before enabling that feature. Examples of this are Windows Server Storage Spaces Direct, VMware vSAN, open-source Ceph, etc. That is a non-exhaustive list of services and OSes which will not enable RDMA unless those requirements are verified against the hardware end-to-end. If any of them fail to be detected, they will not enable it. I’ve already tried with Mikrotik switches and two machines that have Mellanox NICs with RDMA support. Neither Windows Server nor VMware ESXi enabled the RDMA features when plugged into the Mikrotik switches. If I change to a Dell PowerON switch and enable/configure RDMA on the connected ports, the support immediately shows up on both OSes.

It doesn’t need to be massive. Any modern file share with SMB can have RDMA enabled on Windows Server. Also, the SoC on Mikrotik devices is the CPU. What they call a “switch chip” in their block diagrams is in fact an ASIC, and according to the chip manufacturer, they do support RDMA.

That is not true. It is not only about saving on retransmissions. RDMA enables a wide variety of scenarios in the field, like remote GPU direct access, for example. Also, RDMA benefits not the middle-man equipment like switches/routers but the endpoints: it moves all of the networking processing from the CPU down to the NIC, with huge performance gains and cost savings. An application can allocate buffers directly in the network card’s memory, allowing it to bypass the whole kernel and any CPU-related stack, giving the overall application orders of magnitude better performance, since it no longer suffers from I/O scheduling problems once the CPU is removed from the equation.
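
To illustrate the kernel-bypass part, here is a rough sketch using the pyverbs bindings that ship with rdma-core. Treat it as an assumption-laden illustration: the device name "mlx5_0" is made up, and exact class/argument names can differ between rdma-core versions. The point is that the application registers its own buffer with the NIC and hands the resulting keys to the peer, whose NIC then DMAs straight into that memory without our kernel or CPU in the data path:

    # Sketch only: needs an RDMA-capable NIC and the pyverbs package from rdma-core.
    import pyverbs.enums as e
    from pyverbs.device import Context
    from pyverbs.pd import PD
    from pyverbs.mr import MR

    ctx = Context(name="mlx5_0")          # open the RDMA device (name is an assumption)
    pd = PD(ctx)                          # protection domain for this application
    mr = MR(pd, 1 << 20,                  # register 1 MiB of application memory with the NIC
            e.IBV_ACCESS_LOCAL_WRITE |
            e.IBV_ACCESS_REMOTE_WRITE |
            e.IBV_ACCESS_REMOTE_READ)

    # lkey/rkey are exchanged with the peer out-of-band; the peer's NIC can then
    # read or write this buffer directly, with no copies and no CPU work on our side.
    print(hex(mr.lkey), hex(mr.rkey))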

Not true either (perhaps for IB). Large cloud service providers like AWS and Azure have heavily invested in RDMA with RoCE, given its benefits. In Azure, for example, there are papers showing that 80%+ of their network traffic is driven by RDMA over RoCE, which represented 60%+ cost savings in CPU resources and power usage on the workloads that use it. I really doubt it will be replaced by anything in the next decade or so. Besides, the state of CXL right now is a big mess, with no notable NIC manufacturer wanting to support it, and support in major OSes is simply non-existent. Don’t hold your breath on this.

Nice sources :laughing: :smiley: :laughing: :smiley:

I could believe you if you shared some credible sources, but… Broadcom? :laughing:

Jokes aside, again, hardware-wise, Mikrotik has everything they need to implement it. Market-wise, there is a whole segment of small to mid-size businesses which could leverage the savings and power of RDMA networking but often have to rely on switchless deployments (which is bad overall) or purchase switches like the Dell Power line, which are sometimes more expensive than their own storage or compute gear, so they end up either going switchless anyway (again, bad) or abandoning RDMA and leaving that power untapped.

This is not a niche or “massive data” feature as you said. It is just a very optimized way of networking which customers of any size can leverage, especially because it doesn’t require any license or patent fees from anyone.

I’ve worked with a large-scale DC network that adopted a design similar to the Google and Meta hyperscaler network designs.

We strictly moved everything to layer 3 with eBGP, similar to this:
https://www.rfc-editor.org/rfc/rfc7938.html

“Ethernet” was simply limited to the direct interconnection of devices. It never spanned beyond that, and where it did, it rode over layer-4 UDP VXLAN encapsulation with an EVPN control plane.

No FibreChannel, no STP, no VLANs (VXLAN/VNI/EVPN only).

Storage networking was 9k MTU over IP using Ceph.

For flexibility in TCP performance, you can opt for SCTP-driven software applications instead. With the right dev team in-house, this shouldn’t be too difficult.
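
For what it’s worth, a bare-bones one-to-one SCTP socket is available straight from the Linux kernel stack through Python’s standard socket module, no extra libraries needed (the sctp kernel module has to be loaded, and the port below is a placeholder):

    import socket

    IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)   # 132 per IANA

    # One-to-one style SCTP association: message-oriented at the protocol level,
    # with multi-homing and multi-streaming available via further socket options.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_SCTP)
    srv.bind(("0.0.0.0", 5201))
    srv.listen(1)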

I’m surprised people are still moving backwards from layer 3 and still augmenting Ethernet.

Thanks for the reply @DarkNate.

The thing is, we can’t hand-roll everything and change things like Windows Server or ESXi. They support RDMA because it is a widely used standard and applications rely on it. Also, note that even though the feature is called “Data Center Bridging”, it does not necessarily mean large-scale-only scenarios. As I just mentioned, small and mid-size businesses can (and want to) leverage it due to the cost savings and the relatively cheap investment in NICs. The only expensive part right now is the switch, which is why I think Mikrotik switches would be a great enabler for those scenarios, since they are relatively cheap compared with Dell, Cisco, and other big-brand switches.

But again, it is not a matter of “networking” per se. I understand that VXLAN would be awesome from a networking/infrastructure perspective, but the benefits of RDMA go far beyond that. The application benefits from the NICs supporting it, as I mentioned. Also, yes, it is UDP at the end of the day, with no retransmission and the expectation of a lossless fabric.
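
Just to show how literally “UDP” that is: a RoCEv2 packet is a normal UDP datagram to well-known destination port 4791, carrying a 12-byte InfiniBand Base Transport Header (BTH) in front of the RDMA payload. A rough sketch (field values are placeholders, and the ICRC trailer that normally closes the packet is omitted):

    import struct

    ROCEV2_UDP_PORT = 4791

    def bth(opcode, dest_qp, psn, pkey=0xFFFF):
        """InfiniBand Base Transport Header (12 bytes)."""
        flags = 0                      # solicited-event/migration/pad/transport-version bits
        return struct.pack("!BBHII",
                           opcode,                 # e.g. 0x04 = RC SEND Only
                           flags,
                           pkey,                   # partition key
                           dest_qp & 0xFFFFFF,     # reserved byte + 24-bit destination QP
                           psn & 0xFFFFFF)         # ack-request bit + 24-bit sequence number

    def rocev2_datagram(src_port, payload):
        # The source port carries flow entropy for ECMP; the UDP checksum is
        # commonly left at zero for IPv4 since an ICRC protects the packet.
        udp_header = struct.pack("!HHHH", src_port, ROCEV2_UDP_PORT, 8 + len(payload), 0)
        return udp_header + payload

    pkt = rocev2_datagram(49152, bth(opcode=0x04, dest_qp=0x1234, psn=1) + b"app data")
    print(len(pkt), pkt.hex())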

Like you said, it’s UDP at the end of the day. With a proper network architecture, you don’t need to bend over backwards for these “special” Ethernet NICs or switches or software.

Move to layer 3 driven networking, move to eBGP, move to VXLAN/EVPN and move your Ceph storage to end-to-end 9k MTU over IP for initial phases.

You can in fact go beyond UDP if your in-house dev team or software vendor allows a true UDP-Lite protocol with partial checksums. The partial checksum lets the layer-7 application make the decisions, keeping the network stateless: the network will forward even a damaged payload, where the damage may be irrelevant as determined by the layer-7 application.
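
A minimal Linux-only sketch of that partial-checksum idea, for the curious (the option constants follow linux/udplite.h, and the address, port, and coverage length are just examples):

    import socket

    IPPROTO_UDPLITE = getattr(socket, "IPPROTO_UDPLITE", 136)
    UDPLITE_SEND_CSCOV = 10   # bytes covered by the checksum on send
    UDPLITE_RECV_CSCOV = 11   # minimum coverage accepted on receive

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, IPPROTO_UDPLITE)
    # Checksum only the 8-byte UDP-Lite header plus a 12-byte application header;
    # bit errors beyond that are delivered to layer 7 instead of being dropped.
    sock.setsockopt(IPPROTO_UDPLITE, UDPLITE_SEND_CSCOV, 20)
    sock.sendto(b"\x00" * 12 + b"payload that may tolerate damage", ("192.0.2.10", 9000))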

As someone else pointed out, the proper solution is a modern-day motherboard with modern-day interconnects for the CPU, PCIe, etc., and CXL seems to be exactly that.

Oh sure, I know of the alternatives. All I’m saying is that there are widely adopted OSes that use/require it.

The benefit of those special NICs goes beyond networking, as I said. There are (big) CPU benefits to using them.

All I’m saying is that we can’t change this for a decade or so, as those OSes will not move away from it, and even if they do, they will still lose the biggest benefit (at least for me), which is the CPU/power savings.

All OSes support layer-3 networking over IP with 9k MTU, so I don’t see the problem. Windows, macOS, Linux-based OSes, *BSD-based OSes: what more do you need? UDP-Lite is native on macOS and Linux; not sure about Windows or *BSD.

And as for “special NICs”, what you’d want is a SmartNIC with XDP or DPDK hardware-offload support. Zero CPU usage, then.

Large companies all make use of DPDK/VPP and SmartNICs to keep CPU usage at 0% for networking AND host-to-host data transfers. No “Data Centre Bridge” crap at all. Again, with a proper network architecture, you don’t need to bend over for Ethernet.

I’ve said what I have to say to help you move away from Ethernet. But if you insist on Ethernet magic, then that is your choice. Good luck getting these for cheap. Even in SP networking, we dumped Ethernet decades ago; everything is now MPLS with an IP underlay and L2VPN.

@galvesribeiro

  1. RoCE does work with any regular switch/router. However, as I pointed out previously, efficiency in terms of latency, flow control, and buffering will of course vary depending on the environment.
  2. RoCE simply transports regular Ethernet frames to another NIC over L2/L3. The receiving NIC’s device driver must be RDMA-aware, possibly using hardware for optimal performance.
  3. SMB Direct is protocol-agnostic and doesn’t require RoCE.
  4. Remote DMA (RDMA) is a general concept that involves extending DMA over an interconnect. It can be implemented using various technologies at L1/L2/L3 such as IB, Omni-Path, CXL, RoCE, iWARP, SMB Direct, etc.
  5. The InfiniBand consortium, Gen-Z, OpenCAPI, and many others have joined forces to create a unified standard for ultra-high-performance interconnects called CXL. High-end products using CXL are already available on the market. While RoCE/IB will likely remain in use for some time to protect customer investments, CXL will probably supersede them in the long run.
  6. Could you please specify the SoC (switch chip) you’re referring to that supposedly supports “RDMA”? Mikrotik might only implement flow control and congestion control mechanisms (PFC, ECN, ETS, DCTCP, etc.) for RoCE if the SoC supports these features and the manufacturer provides the necessary drivers. (A small host-side DCTCP sketch follows below.)
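
As a side note on the DCTCP part of item 6: the host side is just a per-socket opt-in on Linux, while the switch contributes the ECN marking. A tiny sketch (it assumes the dctcp congestion-control module is available on the box):

    import socket

    TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)   # Linux-only socket option

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"dctcp")   # fails if dctcp is not loaded
    print(s.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16))  # confirm the active algorithm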

@darknate

Ok. Thank you for the “help”.

What you are suggesting is not what I asked, to be honest. I understand the “alternative”, and I highly agree with you on using NICs that support XDP or DPDK. However, that is not what the original question is about.

No “Data Centre Bridge” crap at all.

Not sure where the insults started, but I’ll stop you right there. Respect is good and everyone likes it. The question was simple and directed to Mikrotik as a feature request. No need to be offensive.

Again, although I agree with you that there are alternatives for general-purpose high-performance networking, you can’t change the fact that extremely widely deployed software like Windows Server Storage Spaces Direct and VMware ESXi with vSAN (“just” the most widely deployed hypervisor outside the cloud), plus dedicated SAN appliances, do not use any of this but instead use the “crap” RDMA, which, by the way, is what major cloud service providers like AWS and Microsoft Azure use.

What you are asking is the same as killing every RDBMS available, or avoiding them at all costs, just because NoSQL exists. It makes no sense.

Again, Mikrotik hardware supports it on most of their modern switch chips. NICs from a $15 ConnectX-3 all the way up to a $10k BlueField DPU can do RDMA (whether RoCE/RoCEv2 or iWARP, depending on the manufacturer). No special drivers are needed, and being a “SmartNIC” is not a requirement. It is extremely widely available.

There is no reason for Mikrotik not to support it in their software, and it would only benefit them, again, for small and medium businesses which want high-speed networking without having a datacenter-level network engineer to set up eBGP, MPLS, VXLAN, you name it. In all of the software mentioned, it is nothing more than a toggle to leverage the benefits of RDMA, and all it needs is for the switch(es) to be compliant.

I’m clearly “insulting” the “Data Centre Bridge” standard itself (which is not even human) and not YOU the person, but YOU seem self-centred enough to think it was about you.

Have fun with layer-2 networking and refusing to adopt a proper network architecture because you don’t want in-house network engineers or an MSP managing the “data centre level network engineer” network.

@larsa I think you’re wasting your time. He’s clearly not even interested in a basic network redesign with VXLAN/EVPN (and we are talking about data centre networks). What makes you think he’ll even consider 9k MTU with SMB Direct?

@galvesribeiro

Well, it’s more like MikroTik hardware supports the most cost-effective chips. Which router/switch SoCs support flow and congestion control like PFC, ECN, ETS, DCTCP, etc.?


RDMA hardware acceleration in a NIC is simply a performance optimization. It can be implemented in the driver in pure software if the NIC lacks hardware support for it.

@larsa thanks for the reply again.

True, it can be. But then we lose the benefits of RDMA that I mentioned.

True. But nowadays when we talk about RDMA we are basically talking about RoCE(v2), or perhaps (the older) iWARP, for the majority of cases.

Look, let me rephrase what I said about CXL. I agree with you that it is the future. What I don’t agree with is that current deployments will change anytime soon, in the next decade or so, to make it mainstream. PCIe 4.0 was announced in 2011, released in 2017, and only became mainstream in 2021-2022. Now we are starting to see PCIe 5.0 servers, but there are shenanigans all over the place, with lots of production hardware being recalled due to signal integrity issues that cause a wide variety of problems in production. These consortiums take A LOT of time to produce a unified standard; this is not news. Once they come up with CXL and the industry adopts it, it will be 5-10 years from now before it becomes mainstream. I really wish it were sooner, but we have to stick to reality. There are billions of dollars of investment in InfiniBand/RDMA networking, and people will not just throw it away, not until at least one deprecation cycle after CXL goes mainstream (~5 years).

The Marvell DX4310 family/line of chips supports all the standards required for RDMA with RoCE (not sure about iWARP, though). The 3 required standards (all others are optional) are the ones listed in the original post.

Just to give an example, here are the documented requirements of two popular, widely used hypervisor/storage/HCI solutions:

  1. VMWare ESXI/vSphere and vSAN: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-E4ECDD76-75D6-4974-A225-04D5D117A9CF.html
  2. Windows Server Storage Space Direct/Azure Stack HCI: https://learn.microsoft.com/en-us/azure-stack/hci/concepts/physical-network-requirements?tabs=overview%2C23H2reqs

Both of them perform tests which require the switch to be properly configured. Believe me, I tried to use the CRS518-16XS-2XQ-RM, CRS510-8XS-2XQ-IN, and CRS504-4XQ-IN in the past with both the Microsoft and VMware solutions, and both fail at the first test, which is PFC, so nothing else is even tested and the feature is disabled. I truly wish it would “just work” with commodity hardware as you said, but in reality it doesn’t. I know the theory says it should work, but it just doesn’t in practice, unfortunately.

Look, I’m not here for a fight and I’m not trying to be offensive to anyone. I’m just asking for a feature that would benefit a lot of people AND let Mikrotik tackle a market they currently are not in, since people prefer to do switchless deployments when possible rather than buy really expensive Dell/Cisco switches. They are just leaving money on the table.