I have a CRS504-4XQ-IN with 4x25G NICs (CX4121A-ACAT, on port 1 via a DAC breakout) and 3x100G NICs (CX455A-ECAT, on ports 2, 3, and 4).
Currently I am running it with RouterOS 7.13, which does not officially support RDMA QoS, but everything runs fine on all NICs.
I recently decided to upgrade to RouterOS 7.20/7.21 in order to use the newly announced QoS support (PFC, etc.) for RDMA/RoCE v2.
After the upgrade, I applied the basic settings from here.
The issue is that the connection between the 100G NICs does not work properly: when I perform disk writes from one of the two ESXi hosts to the NVMe storage, the traffic just dies.
I tried disabling QoS completely and enabling/disabling some of the settings in the PFC section, with no change.
Strangely enough, connections between the 25G ports (the port 1 breakouts) work without any issues.
Initially I thought the issue was related to the NICs or the TrueNAS box, but after many hours of testing different configurations I concluded that it is related to the RouterOS version upgrade.
I had to revert to 7.13 to get my network working again.
So, has anyone already tried QoS in 7.20/7.21 with success?
The relevant part of my configuration:
/interface ethernet switch qos port
set qsfp28-1-1 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-2 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-3 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-4 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-2-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-3-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-4-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
/interface ethernet switch
set switch1 qos-hw-offloading=yes
/ip neighbor discovery-settings
set lldp-dcbx=yes
/interface ethernet
set [find switch=switch1] l2mtu=9500
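
If it helps with diagnosing this, these are the kinds of commands I can run on both 7.13 and 7.20/7.21 to compare the state of a 100G port before and after the upgrade. This is only a sketch using general RouterOS commands; qsfp28-2-1 is picked as an example port from the config above, and the exact fields and counters shown may differ between versions.

```
# Link state and negotiated rate for one of the 100G ports
/interface ethernet monitor qsfp28-2-1 once
# Per-port hardware counters, to look for errors or drops while the writes stall
/interface ethernet print stats
# QoS port settings as the switch currently sees them
/interface ethernet switch qos port print
# Export of the whole switch menu, so the two versions can be diffed
/interface ethernet switch export
```

Capturing the same output while a disk write is stalling on the new version, and again on 7.13 under the same load, should make it easier to see whether the 100G ports behave differently from the 25G breakouts.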