QoS Hardware Offloading (QoS-HW)

Greetings, fellow community members!

We are glad to announce the beginning of a new project - Quality of Service Hardware Offloading (QoS-HW), introduced in RouterOS v7.10. The goal of the project is to perform QoS packet marking (VLAN PCP, IP DSCP, and in the future - MPLS EXP), traffic shaping, congestion avoidance/resolution, lossless forwarding, etc. - on the hardware level, which, in turn, means near-to-wire-speed performance.

Documentation: https://help.mikrotik.com/docs/pages/viewpage.action?pageId=189497483

The target devices are those based on Marvell Prestera® DX switch chips: MikroTik CRS3xx, CRS5xx series, CCR2116, and CCR2216. In other words, the devices that support L3HW will eventually support QoS-HW.

Your feedback is more than welcome! Please share your vision of QoS enforcement in RouterOS, use-cases, and setups. While the project is in the Beta phase, it is very flexible to adjust to community demand. Also, our QA engineers would like to perform QoS testing based on real setups rather than artificial test cases.

RouterOS v7.15 UPDATE: many QoS features have been implemented, including QoS enforcement, scheduling, active queue management, traffic shaping, etc. Check the documentation for details.

RouterOS v7.17 UPDATE: PFC has been fixed, ECN has been enhanced, and DCB/LLDP support has been added. Compatible devices are now RoCE-ready.

While we are at it, could you please confirm whether 802.1ad tag-stacking offloading in the switch chip is in the pipeline? Great news, by the way.

Thank you

Very nice! The Marvell Prestera DX chips have QoS features that will make a great addition to RouterOS. Is there a documentation page for the setup? I didn’t see one.

Being able to classify traffic and leverage the switch chip queues would be beneficial. Prioritizing voice/video traffic is a good use case. I noticed the CRS309 has 8 hardware TX queues (tx-queue0-packet through tx-queue7-packet), but I didn’t see how to classify traffic into each queue.

 /interface/ethernet/switch/port> print stats        
                 name:  sfp-sfpplus1  sfp-sfpplus2 sfp-sfpplus3 sfp-sfpplus4 sfp-sfpplus5 sfp-sfpplus6 sfp-sfpplus7  sfp-sfpplus8  ether1 switch1-cpu
       driver-rx-byte:             0   452 976 691            0            0            0            0            0 1 086 224 642 130 375
     driver-rx-packet:             0       773 965            0            0            0            0            0       945 811     689
       driver-tx-byte:             0 1 188 556 474            0            0            0            0            0   453 023 932  39 675
     driver-tx-packet:             0     1 055 283            0            0            0            0            0       675 260     214
             rx-bytes:             0   527 365 820            0            0            0            0            0 1 369 898 664 137 181           0
         rx-too-short:             0             0            0            0            0            0            0             0       0           0
          rx-too-long:             0             0            0            0            0            0            0             0       0           0
           rx-unicast:             0       912 412            0            0            0            0            0     1 185 318       7           0
         rx-broadcast:             0         1 120            0            0            0            0            0         1 218     209           0
             rx-pause:             0             0            0            0            0            0            0             0       0           0
         rx-multicast:             0         7 748            0            0            0            0            0         1 908     518           0
      rx-error-events:             0             0            0            0            0            0            0             0       0           0
         rx-fcs-error:             0             0            0            0            0            0            0             0       0           0
          rx-fragment:             0             0            0            0            0            0            0             0       0           0
          rx-overflow:             0             0            0            0            0            0            0             0       0           0
            rx-jabber:             0             0            0            0            0            0            0             0       0           0
             tx-bytes:             0 1 473 609 464            0            0            0            0            0   523 059 415  40 531           0
           tx-unicast:             0     1 295 077            0            0            0            0            0       818 688      12           0
         tx-broadcast:             0           115            0            0            0            0            0             3      99           0
             tx-pause:             0             0            0            0            0            0            0             0       0           0
         tx-multicast:             0         1 967            0            0            0            0            0             2     103           0
          tx-underrun:             0             0            0            0            0            0            0             0       0           0
         tx-collision:             0             0            0            0            0            0            0             0       0           0
    tx-late-collision:             0             0            0            0            0            0            0             0       0           0
              tx-drop:             0             0            0            0            0            0            0             0       0           0
             tx-rx-64:             0        39 293            0            0            0            0            0       101 104      63           0
         tx-rx-65-127:             0       714 147            0            0            0            0            0       432 308     101           0
        tx-rx-128-255:             0        71 389            0            0            0            0            0       152 504     647           0
        tx-rx-256-511:             0        29 741            0            0            0            0            0        23 809     137           0
       tx-rx-512-1023:             0        26 816            0            0            0            0            0        19 827       0           0
       tx-rx-1024-max:             0     1 337 053            0            0            0            0            0     1 277 585       0           0
     tx-queue0-packet:             0     1 297 159            0            0            0            0            0       807 319     214           0
     tx-queue1-packet:             0             0            0            0            0            0            0        11 374       0           0
     tx-queue2-packet:             0             0            0            0            0            0            0             0       0           0
     tx-queue3-packet:             0             0            0            0            0            0            0             0       0           0
     tx-queue4-packet:             0             0            0            0            0            0            0             0       0           0
     tx-queue5-packet:             0             0            0            0            0            0            0             0       0           0
     tx-queue6-packet:             0             0            0            0            0            0            0             0       0           0
     tx-queue7-packet:             0             0            0            0            0            0            0             0       0           0

I was tracking this for a while; now we know it is real.

Keep in mind:

The current implementation is for QoS Phase 1 - QoS Marking (introduced in RouterOS v7.10).

so maybe we will have to wait a few versions for those queues to become available.

Will QoS Hardware Offloading (QoS-HW) work in conjunction with L3 HW Offload, or only in L2 setups?

From what I understand, QoS-HW will work with both L2 and L3.

/interface ethernet switch qos map
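# classification map; the "map vlan" and "map ip" entries below reference it by name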
add name=classify

/interface ethernet switch qos profile
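# QoS profiles; each profile defines the internal priority (and optionally a DSCP value) assigned to matching traffic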
add dscp=8 name=scavenger
add name=best-effort priority=1
add name=class1 priority=1
add name=class2 priority=2
add name=class3 priority=3
add name=class4 priority=4
add name=class5 priority=5
add name=class6 priority=6
add name=class7 priority=7

/interface ethernet switch qos map vlan
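# map incoming VLAN PCP (priority) values to QoS profiles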
add qos-map=classify qos-profile=scavenger
add priority=1 qos-map=classify qos-profile=class1
add priority=2 qos-map=classify qos-profile=class2
add priority=3 qos-map=classify qos-profile=class3
add priority=4 qos-map=classify qos-profile=class4
add priority=5 qos-map=classify qos-profile=class5
add priority=6 qos-map=classify qos-profile=class6
add priority=7 qos-map=classify qos-profile=class7

/interface ethernet switch qos map ip
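# map incoming IP DSCP values to QoS profiles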
add dscp=0 qos-map=classify qos-profile=best-effort
add dscp=2 qos-map=classify qos-profile=class1
add dscp=4 qos-map=classify qos-profile=class1
add dscp=6 qos-map=classify qos-profile=class1
add dscp=8 qos-map=classify qos-profile=scavenger
add dscp=10 qos-map=classify qos-profile=scavenger
add dscp=12 qos-map=classify qos-profile=class1
add dscp=14 qos-map=classify qos-profile=class1
add dscp=16 qos-map=classify qos-profile=class1
add dscp=18 qos-map=classify qos-profile=class2
add dscp=20 qos-map=classify qos-profile=class2
add dscp=22 qos-map=classify qos-profile=class2
add dscp=24 qos-map=classify qos-profile=class2
add dscp=26 qos-map=classify qos-profile=class3
add dscp=28 qos-map=classify qos-profile=class3
add dscp=30 qos-map=classify qos-profile=class3
add dscp=32 qos-map=classify qos-profile=class3
add dscp=34 qos-map=classify qos-profile=class3
add dscp=36 qos-map=classify qos-profile=class4
add dscp=38 qos-map=classify qos-profile=class4
add dscp=40 qos-map=classify qos-profile=class4
add dscp=42 qos-map=classify qos-profile=class4
add dscp=44 qos-map=classify qos-profile=class4
add dscp=46 qos-map=classify qos-profile=class5
add dscp=48 qos-map=classify qos-profile=class5
add dscp=50 qos-map=classify qos-profile=class5
add dscp=52 qos-map=classify qos-profile=class5
add dscp=54 qos-map=classify qos-profile=class5
add dscp=56 qos-map=classify qos-profile=class6
add dscp=58 qos-map=classify qos-profile=class7
add dscp=60 qos-map=classify qos-profile=class7
add dscp=62 qos-map=classify qos-profile=class7

/interface/ethernet/switch
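# attach the map to switch ports; qos-trust-l2/l3 control how existing PCP/DSCP markings are handled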
port/set sfp-sfpplus1 qos-map=classify qos-trust-l2=keep qos-trust-l3=keep
port/set sfp-sfpplus2 qos-map=classify qos-trust-l2=keep qos-trust-l3=keep
...

I tested on a CRS317 with RouterOS 7.6: with L3 HW Offload enabled, ingress ACL rate limiting does not work, but with L2 bridge VLAN filtering, ingress ACL rate limiting works OK.

That’s why I’m asking.

QoS-HW is compatible with L3HW. You can use both features together.

Every supported device has 8 TX queues per port, and users will be able to assign QoS profiles to TX queues: either grant a QoS profile exclusive access to a queue or share a queue (or group of queues) between multiple profiles. The feature is still in development and not available yet. Meanwhile, all QoS profiles share all TX queues, i.e., there is no QoS enforcement yet.

What about the ability to do hardware QoS at arbitrary rates below line rate? I’m thinking specifically about situations where one might buy a circuit from an upstream provider at a rate that isn’t a standard Ethernet speed, e.g. 200 Mbps.

And what about VXLAN encapsulation/decapsulation in hardware? Marvell chips support it.

Hello, this is interesting news. Are there plans to use the new feature for dynamic queues, e.g. with a PPPoE server on a CCR2216?

Thanks

When you are working on this anyway, please consider the following:

  1. Allow setting the software queue priority directly from DSCP. Currently, a detour is required: set “priority” from DSCP, then set a “packet mark” from the priority, then assign the queue priority based on the packet mark (see the sketch after this list). However, Linux can assign queue priority directly from the “priority” field, possibly gaining speed and also freeing the packet mark for other uses.
  2. Implement a table for DSCP-to-priority mapping. “Set priority from DSCP high 3 bits” is usually OK, but due to the peculiar design of DSCP it is not always correct; the AFxx DSCP values in particular often map incorrectly.
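For item 1, here is a minimal sketch of the current detour, expressed as RouterOS mangle and queue-tree rules; the packet-mark name, parent interface, and rate are placeholders chosen for the example:

/ip firewall mangle
# derive the packet priority from the DSCP high 3 bits
add chain=postrouting action=set-priority new-priority=from-dscp-high-3-bits passthrough=yes
# translate one priority value into a packet mark (the mark name is arbitrary)
add chain=postrouting priority=5 action=mark-packet new-packet-mark=voice passthrough=no

/queue tree
# finally, assign the queue priority based on the packet mark
add name=voice parent=ether1 packet-mark=voice priority=1 max-limit=100M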

I think these features are more focused on industry switching QoS functionality.

Most equipment is not able to inspect or remark encapsulated PPPoE traffic; yes, that is another disadvantage of PPPoE.

Additionally, I am not sure a PPPoE server can benefit from L3 Hardware Offload at all.

Hello,

I’m also curious about the answer to this question.

What are your plans in this regard? This is very important.

Maybe not this hardware, but some SoCs do implement PPPoE hardware offloading, as it is a common use case for consumer routers.
However, it does not look like RouterOS uses it.

Using hardware QoS for bandwidth limitation is a natural next step for the project, so the feature will likely be implemented in the future. However, the current goals must be met first: the main goal of QoS HW is to provide lossless audio/video switching and (together with L3HW) routing at near-to-wire-speed rates. That said, we want to think “out of the box” and make QoS HW as flexible as possible to cover a wider range of possible usage scenarios.

Keep in mind that Hardware QoS enforcement has nothing in common with software queues! That’s why we specifically put the entire QoS section under the Switch menu (currently - in CLI, later - in WinBox/WebFig) as a hint that the feature is hardware-only. QoS HW has limited functionality but provides near-to-wire-speed performance, while software queues have greater flexibility at the cost of CPU power. While you can use both features (by redirecting some traffic to the CPU for software processing), those are two entirely different things.

P.S. Please stay on the topic! If you have a question about other hardware features, create a separate thread.

If only MikroTik would do that themselves. I.e. finish the unfinished features in v7 before starting a new one.