[Feature request] Backport upstream Linux DSA-roaming patch for Marvell Prestera CAM (CRS3xx 98DX3236)

Dear MikroTik team,

I would like to request that RouterOS backport the well-known Linux kernel patch series that addresses the Marvell DSA "MAC learning from CPU-injected frames" limitation. This issue is empirically reproducible on my CRS326-24G-2S+RM (Marvell Prestera 98DX3236, RouterOS 7.22.1) and severely degrades WiFi-roaming UX in any setup where clients move between bridge ports.

This is not a new issue

The same symptom has been reported on this forum since at least 2017 in the context of CARP failover behind a CRS326 (thread: "CRS326-24G - Dual BSD firewalls, CARP, mac address learning and host timeout issues"), where users observed:

"the switch keeps sending the data with the shared MAC address to the original port (even though the IP has changed) until the host entry for that MAC expires. Since this can take around 5 minutes, it's obviously a huge problem."

What I am adding to that long-standing report is (a) confirmation that the same chip behavior bites WiFi-roaming clients, not only CARP-failover scenarios, and (b) the upstream Linux fix that addresses precisely this hardware limitation.

Symptom (reproduced with /tool sniffer)

When a client (WiFi station) roams between two bridge ports (e.g., between two access points connected on different physical ports of the CRS326), the chip's hardware FDB retains a stale entry pointing to the previous port for ~5–8 minutes. During this window:

  • The client's broadcast ARP requests for the gateway reach the router CPU normally.
  • The CPU generates ARP replies (visible via /tool sniffer on the bridge interface).
  • However, only ~5 of ~300 unicast ARP replies actually exit the chip toward the new port. The rest follow the stale CAM entry and are silently lost.
  • Result: the client cannot reach the default gateway (and therefore the internet) until the chip's hardware aging eventually clears the entry.

The classic case: a phone roaming between two APs on the same SSID experiences ~5 minutes of "no internet" after each roam. Other LAN clients can still reach the phone (because their unicast egress goes through the chip's normal physical-port lookup, which the daemon-side arping can refresh), but CPU-originated unicast (e.g., the router's own ARP reply) keeps following the stale entry.
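The timeline described above can be sketched as a toy model. It only illustrates the reported symptom and is not a model of the actual Prestera pipeline: the class name, the port names, the station MAC, and the 300-second aging value are all assumptions (the aging value is picked to match the roughly 5 minutes reported above).

```python
AGING_SECONDS = 300  # assumption: ~5 min, in line with the reports above


class ToyCpuEgressFdb:
    """Toy model of the reported symptom, not of real Prestera internals."""

    def __init__(self):
        self.entries = {}  # mac -> (port, time the entry was installed)

    def _expired(self, mac, now):
        entry = self.entries.get(mac)
        return entry is None or now - entry[1] >= AGING_SECONDS

    def station_seen(self, mac, port, now):
        # Reported misbehavior: an unexpired entry is NOT moved to the new
        # port on a roam; it is only replaced once it has aged out.
        if self._expired(mac, now):
            self.entries[mac] = (port, now)

    def cpu_egress_port(self, mac, now):
        # Port that a CPU-originated unicast frame (e.g. an ARP reply) follows.
        if self._expired(mac, now):
            return None  # no usable entry; a real switch would flood
        return self.entries[mac][0]


fdb = ToyCpuEgressFdb()
phone = "02:00:00:00:00:01"                  # hypothetical station MAC
fdb.station_seen(phone, "ether1", now=0)     # associated via the AP on ether1
fdb.station_seen(phone, "ether2", now=10)    # roams to the AP on ether2
print(fdb.cpu_egress_port(phone, now=20))    # ether1 (stale entry still wins)
fdb.station_seen(phone, "ether2", now=310)   # old entry aged out, relearned
print(fdb.cpu_egress_port(phone, now=320))   # ether2
```

In this sketch the router's replies keep going to ether1 for the full aging window, which is the "no internet after each roam" period described above.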

Linux upstream fix

This exact behavior was documented and resolved upstream in February 2021.

The patch description states verbatim:

"Marvell Link Street switch series cannot perform MAC learning from CPU-injected (FROM_CPU) DSA frames, which results in [excessive flooding and] the risk of stale routes, which can lead to temporary packet loss."

The fix listens to switchdev FDB notifications for addresses learned on foreign interfaces bridged with the switch and synchronizes the relevant MAC addresses to the hardware FDB with the CPU port as destination, bypassing the hardware learning limitation through software-assisted programming. It also sets the default VID to 1 in port_fdb_{add,del} operations.
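To make the mechanism concrete, here is a minimal Python sketch of the idea as I understand it from the patch description. It is not the kernel code and not a RouterOS API: the class, the method name, the "cpu" port label, and the interface names are hypothetical, and the only details taken from the description are "foreign interface learned address goes to the CPU port" and "default VID 1".

```python
CPU_PORT = "cpu"   # hypothetical label for the switch's CPU-facing port
DEFAULT_VID = 1    # the patch description mentions defaulting the VID to 1


class HardwareFdb:
    """Toy stand-in for the switch chip's static FDB table."""

    def __init__(self, switch_ports):
        self.switch_ports = set(switch_ports)
        self.static = {}  # (mac, vid) -> destination port

    def on_bridge_fdb_event(self, mac, vid, learned_on, added):
        """Handle one FDB notification from the software bridge."""
        if learned_on in self.switch_ports:
            # Addresses behind switch ports are learned by the hardware
            # itself from ingress traffic; nothing to sync here.
            return
        key = (mac.lower(), vid or DEFAULT_VID)
        if added:
            # Learned on a foreign interface bridged with the switch:
            # program the entry with the CPU port as destination.
            self.static[key] = CPU_PORT
        else:
            self.static.pop(key, None)


hw = HardwareFdb(switch_ports={"ether1", "ether2"})
# "wlan1" is a hypothetical foreign (non-switch) interface in the same bridge.
hw.on_bridge_fdb_event("02:00:00:00:00:01", 0, learned_on="wlan1", added=True)
print(hw.static)  # {('02:00:00:00:00:01', 1): 'cpu'}
```

The point of the design is that the software bridge, which does see every address, becomes the source of truth for entries the hardware cannot learn by itself.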

While the upstream patch lives in the mv88e6xxx driver, the underlying chip behavior — and therefore the need for a software-side FDB sync — applies equally to the Marvell Prestera DX series (98DX3236 and friends) used in the CRS3xx product line. Several community references confirm this:

What I tried in RouterOS (and why it does not help)

I attempted several RouterOS-side workarounds; none of them solve the CPU-egress side because RouterOS' bridge software FDB is not synced into the chip's hardware FDB:

  • Adding static entries with /interface bridge host add on the new port (the entries land in the software FDB, but the chip CAM keeps the stale dynamic entry).
  • Setting hw=no on individual or several bridge ports (chip continues to make forwarding decisions for the still-hw=yes ports).
  • Setting hw=no on all bridge ports (breaks ICMP/TCP after a few minutes — separate issue, not a fix).
  • External arping from a third LAN host to the roamed client (this updates the chip's ingress learning correctly, restoring LAN→client unicast, but does not affect the CPU-egress lookup the router itself uses to send its ARP reply).
  • Broadcast /tool ping to subnet broadcast (wrong primitive — sends ICMP broadcast, does not constitute gratuitous ARP).
  • Custom raw-packet gratuitous-ARP injection from another LAN host claiming the router's identity (frame is on the wire but does not solve the chip's CAM staleness).
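For readers unfamiliar with the difference between an ICMP broadcast and a gratuitous ARP, the following sketch builds the bytes of a gratuitous ARP request (broadcast destination, sender IP equal to target IP). It only constructs the frame and does not send it; the function name, MAC, and IP address are placeholders, not values from my setup. As noted in the list above, injecting such a frame did not clear the stale entry in my tests.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our source, EtherType ARP.
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    # ARP header: htype=Ethernet(1), ptype=IPv4, hlen=6, plen=4, op=request(1).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr            # sender MAC, sender IP
    arp += b"\x00" * 6 + addr   # target MAC (unused), target IP == sender IP
    return eth + arp


# Placeholder addresses for illustration only.
frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.1")
print(len(frame))  # 42
```

Actually transmitting the frame needs a raw socket or a packet tool on a LAN host, which is outside what this sketch shows.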

Request

Please consider porting the upstream Linux DSA FDB-sync logic — in particular the "sync software-bridge-learned addresses to hardware FDB with CPU port as destination" mechanism — into RouterOS' Marvell Prestera bridge offload. This would close the only remaining significant blocker for reliable WiFi roaming on CRS3xx hardware.

I am happy to provide:

  • Full /tool sniffer PCAPs of the broken state (showing CPU-generated replies that never exit).
  • Comparison PCAPs from the AP-side eth0 (showing what does and does not arrive).
  • Configuration export of the affected setup.

Thank you for considering this.

Best regards,
Jan Rhebergen
CRS326-24G-2S+RM, RouterOS 7.22.1

The MikroTik staff only look at the forum from time to time and do not necessarily consider this kind of suggestion. You should make the request on the support portal, or contact them directly at support@mikrotik.com.

u sure?

Thanks pal! I have since posted this message there as well. This bug/feature of the Marvell chip bit me and took me forever to research. I finally noticed that a patch is available! :)

That post is obviously heavily AI assisted, and that patch has little meaning in RouterOS since I highly doubt that RouterOS even uses that specific driver.

Did you try to report the bug the usual way to the MikroTik team (in this "since forever" timeline) so that they could take a look at it internally?

Can you describe in simple terms what the problem is?

I understand you use your CRS326 as the router (“CPU”), and in this specific condition only around 5 of 300 ARP requests are sent out?! End devices have no internet for 5 minutes after each roam. I would think MT would have received thousands of reports of this issue.

I doubt this!
