CRS504-4XQ-IN - iSCSI over RDMA (iSER) not working properly after upgrade to 7.20/7.21

Hi guys,

I have a CRS504-4XQ-IN with 4x 25G NIC connections (CX4121A-ACAT) on port 1 via a DAC breakout cable, and 3x 100G NICs (CX455A-ECAT) on ports 2, 3 and 4.

Currently I am running it on RouterOS 7.13, which does not officially support RDMA QoS, but everything works fine on all NICs.

Recently I decided to upgrade to RouterOS 7.20/7.21 in order to use the newly announced QoS support (PFC, etc.) for RDMA/RoCE v2.

After the upgrade I implemented the basic settings from here.

The issue is that the connection between the 100G NICs does not work properly. When I write to the NVMe storage from either of the two ESXi hosts, the traffic simply dies.

I tried disabling QoS completely and toggling individual settings in the PFC section, with no change.

Strangely enough, connections on the 25G ports (port 1 breakouts) work without any issues, as long as the traffic stays between 25G ports.

Initially I thought the issue was related to the NICs or the TrueNAS box, but after many hours of testing different configurations I concluded that it is caused by the RouterOS upgrade.

I had to revert to 7.13 to get my network working again.

So, has anyone already used QoS on 7.20/7.21 successfully?

I forgot to add the error logs from both ESXi 8.0.3 and TrueNAS Community 25.04.2.4.

ESXi 8.0.3:

2025-10-10T07:49:50.758Z cpu26:2097478)iser: iser_ScsiTaskMgmt: path vmhba68:0:1:41 TaskMgmt 0x45394a31bed8 invoked abort on CmdSN 0x4c90
2025-10-10T07:49:50.758Z cpu26:2097478)iser: iser_AbortCommands: session 0x4320cd62e6a0 taskMgmt 0x45394a31bed8
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_ScsiTaskMgmt: path vmhba68:0:1:41 TaskMgmt 0x4308b927e0d0 invoked virt reset on CmdSN 0x0
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_AbortCommands: session 0x4320cd62e6a0 taskMgmt 0x4308b927e0d0
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_ScsiTaskMgmt: path vmhba68:0:1:41 TaskMgmt 0x4308b927e118 invoked virt reset on CmdSN 0x0
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_AbortCommands: session 0x4320cd62e6a0 taskMgmt 0x4308b927e118
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_AbortCommands: session 0x4320cd62e6a0 failing sc 0x45b9ed355bc0 itt 0x64 state 3 status 1
2025-10-10T07:49:50.759Z cpu3:2097993)iser: iser_SendMgmtTask: session: 0x4320cd62e6a0 tmf set timeout

TrueNAS 25.04.2.4:

[ 549.075421] [4432]: scst: TM fn 0 (mcmd 00000000bc9e8834) finished, status -1
[ 549.075428] [7314]: scst: Aborted cmd 000000003cf81aa5 finished (tag 21, ref 1)
[ 549.075435] [4432]: iscsi-scst: iSCSI TM fn 1 finished, status 0, dropped 0
[ 549.075512] [411]: iscsi-scst: iSCSI TM fn 1
[ 549.075548] [411]: scst: Aborted cmd 00000000832e648f finished (tag 2, ref 3)
[ 549.075560] [4432]: scst: TM fn 0 (mcmd 00000000d95594bd) finished, status 0
[ 549.075564] [4432]: iscsi-scst: iSCSI TM fn 1 finished, status 0, dropped 0
[ 549.075619] [411]: iscsi-scst: iSCSI TM fn 1

Have you set up your switch as described in Quality of Service - RouterOS - MikroTik Documentation?

Yes, my configuration is as follows:

/interface ethernet switch qos profile
add name=roce traffic-class=3
add name=cnp traffic-class=6

/interface ethernet switch qos map ip
add dscp=26 profile=roce
add dscp=48 profile=cnp

/interface ethernet switch qos tx-manager queue
set 1 schedule=high-priority-group weight=1
set 3 schedule=high-priority-group weight=1 shared-pool-index=1 ecn=yes
set 6 schedule=strict-priority

/interface ethernet switch qos priority-flow-control
add name=pfc-tc3 rx=yes traffic-class=3 tx=yes

/interface ethernet switch qos port
set qsfp28-1-1 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-2 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-3 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-1-4 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-2-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-3-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-4-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep

/interface ethernet switch
set switch1 qos-hw-offloading=yes

/ip neighbor discovery-settings
set lldp-dcbx=yes

/interface ethernet
set [find switch=switch1] l2mtu=9500
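The switch classifies RoCE and CNP traffic by DSCP (26 and 48 in the config above), while some host-side RDMA settings are expressed as a full ToS byte rather than a DSCP value. A minimal sketch of the conversion, assuming the standard layout in which DSCP occupies the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte and the lower two bits carry ECN (RFC 2474 / RFC 3168); the function names are hypothetical helpers, not part of RouterOS or any NIC tool:

```python
"""Convert between DSCP values and the IPv4 ToS / IPv6 Traffic Class byte."""


def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Return the ToS byte for a DSCP value, optionally with ECN bits set.

    Assumes the RFC 2474/3168 layout: DSCP in bits 7..2, ECN in bits 1..0.
    """
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be 0..63, got {dscp}")
    if not 0 <= ecn <= 3:
        raise ValueError(f"ECN must be 0..3, got {ecn}")
    return (dscp << 2) | ecn


def tos_to_dscp(tos: int) -> int:
    """Return the DSCP value encoded in a ToS byte, ignoring the ECN bits."""
    if not 0 <= tos <= 255:
        raise ValueError(f"ToS must be 0..255, got {tos}")
    return tos >> 2


if __name__ == "__main__":
    # DSCP values taken from the switch config: RoCE data (26) and CNP (48).
    for name, dscp in (("roce", 26), ("cnp", 48)):
        print(f"{name}: DSCP {dscp} -> ToS {dscp_to_tos(dscp)}")
```

With this layout DSCP 26 maps to ToS 104 and DSCP 48 to ToS 192; if the ECT(0) codepoint (ecn=2) is included, DSCP 26 becomes 106. Whichever form the hosts use, the DSCP they actually emit has to match the `qos map ip` entries, otherwise RoCE frames will not land in traffic class 3 where PFC is enabled.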

I’m running a similar setup here:

/interface ethernet switch qos priority-flow-control
add name=pfc-tc3 rx=yes traffic-class=3 tx=yes
/interface ethernet switch qos port
set qsfp28-1-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-2-1 egress-rate-queue3=100.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-3-1 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
set qsfp28-3-2 egress-rate-queue3=25.0Gbps pfc=pfc-tc3 trust-l3=keep
/interface ethernet switch qos profile
add name=roce traffic-class=3
add name=cnp traffic-class=6
/interface ethernet switch
set 0 qos-hw-offloading=yes
/interface ethernet switch qos map ip
add dscp=26 profile=roce
add dscp=48 profile=cnp
/interface ethernet switch qos tx-manager queue
set 1 schedule=high-priority-group weight=1
set 3 ecn=yes shared-pool-index=1 weight=1
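Comparing this configuration with the one posted earlier is easier if each `set` command is reduced to its key=value pairs. A rough sketch, assuming only the simple `set <target> key=value ...` form used in these posts (no quoting, no continuation lines); `[find ...]` selectors are skipped, and `parse_settings` / `diff_settings` are hypothetical helpers, not RouterOS tooling:

```python
"""Rough diff of RouterOS `set` commands between two pasted configs."""

from typing import Dict, Tuple

Settings = Dict[Tuple[str, str], Dict[str, str]]


def parse_settings(text: str) -> Settings:
    """Map (menu path, target) -> {key: value} for simple `set` lines."""
    result: Settings = {}
    menu = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            menu = line  # e.g. "/interface ethernet switch qos tx-manager queue"
            continue
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "set":
            continue  # only `set` lines are handled; `add` lines are ignored
        if tokens[1].startswith("["):
            continue  # [find ...] selectors are not resolved by this sketch
        if "=" in tokens[1]:
            target, args = "", tokens[1:]  # menu-level set, e.g. "set lldp-dcbx=yes"
        else:
            target, args = tokens[1], tokens[2:]
        pairs = dict(tok.split("=", 1) for tok in args if "=" in tok)
        result.setdefault((menu, target), {}).update(pairs)
    return result


def diff_settings(a: Settings, b: Settings) -> Dict[Tuple[str, str, str], Tuple[str, str]]:
    """Return {(menu, target, key): (value_in_a, value_in_b)} for differing keys.

    A key absent from one paste is reported as "<not set>"; that only means it
    was not set explicitly there, not necessarily that it is at its default.
    """
    out = {}
    for menu_target in sorted(set(a) | set(b)):
        sa, sb = a.get(menu_target, {}), b.get(menu_target, {})
        for key in sorted(set(sa) | set(sb)):
            va, vb = sa.get(key, "<not set>"), sb.get(key, "<not set>")
            if va != vb:
                out[(*menu_target, key)] = (va, vb)
    return out


if __name__ == "__main__":
    first = """
/interface ethernet switch qos tx-manager queue
set 1 schedule=high-priority-group weight=1
set 3 schedule=high-priority-group weight=1 shared-pool-index=1 ecn=yes
set 6 schedule=strict-priority
"""
    second = """
/interface ethernet switch qos tx-manager queue
set 1 schedule=high-priority-group weight=1
set 3 ecn=yes shared-pool-index=1 weight=1
"""
    diff = diff_settings(parse_settings(first), parse_settings(second))
    for (menu, target, key), (va, vb) in diff.items():
        print(f"{menu} [{target}] {key}: {va!r} vs {vb!r}")
```

On the tx-manager excerpts above this reports only the explicit `schedule` settings on queues 3 and 6 that appear in the first post but not in this one. Targets such as `switch1` versus `0` refer to the same switch but will show up as separate entries, so the output needs a manual read rather than being taken at face value.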

Currently on 7.20, so far without any problems. However, I don't have any ESXi hosts in my network.

Best