Dear MikroTik team,
I would like to request that RouterOS backport the well-known Linux kernel patch series that addresses the Marvell DSA "MAC learning from CPU-injected frames" limitation. This issue is empirically reproducible on my CRS326-24G-2S+RM (Marvell Prestera 98DX3236, RouterOS 7.22.1) and severely degrades WiFi-roaming UX in any setup where clients move between bridge ports.
This is not a new issue
The same symptom has been reported on this forum since at least 2017 in the context of CARP failover behind a CRS326 (thread: "CRS326-24G - Dual BSD firewalls, CARP, mac address learning and host timeout issues"), where users observed:
"the switch keeps sending the data with the shared MAC address to the original port (even though the IP has changed) until the host entry for that MAC expires. Since this can take around 5 minutes, it's obviously a huge problem."
What I am adding to that long-standing report is (a) confirmation that the same chip behavior bites WiFi-roaming clients, not only CARP-failover scenarios, and (b) the upstream Linux fix that addresses precisely this hardware limitation.
Symptom (reproduced with /tool sniffer)
When a client (WiFi station) roams between two bridge ports (e.g., between two access points connected on different physical ports of the CRS326), the chip's hardware FDB retains a stale entry pointing to the previous port for ~5–8 minutes. During this window:
- The client's broadcast ARP requests for the gateway reach the router CPU normally.
- The CPU generates ARP replies (visible via /tool sniffer on the bridge interface).
- However, only ~5 of ~300 unicast ARP replies actually exit the chip toward the new port. The rest follow the stale CAM entry and are silently lost.
- Result: the client cannot reach the default gateway (and therefore the internet) until the chip's hardware aging eventually clears the entry.
The classic case: a phone roaming between two APs on the same SSID experiences ~5 minutes of "no internet" after each roam. Other LAN clients can still reach the phone (because their unicast egress goes through the chip's normal physical-port lookup, which the daemon-side arping can refresh), but CPU-originated unicast (e.g., the router's own ARP reply) keeps following the stale entry.
Linux upstream fix
This exact behavior was documented and resolved upstream in February 2021:
- Patch series: "DSA roaming fix for Marvell Link Street switch series"
- Mailing-list URL: https://lists.openwrt.org/pipermail/openwrt-devel/2021-February/033620.html
- Patchwork: https://patchwork.ozlabs.org/comment/2624496/
The patch description states verbatim:
"Marvell Link Street switch series cannot perform MAC learning from CPU-injected (FROM_CPU) DSA frames, which results in [excessive flooding and] the risk of stale routes, which can lead to temporary packet loss."
The fix listens to switchdev FDB notifications for addresses learned on foreign interfaces bridged with the switch and synchronizes the relevant MAC addresses to the hardware FDB with the CPU port as destination, bypassing the hardware learning limitation through software-assisted programming. It also sets the default VID to 1 in port_fdb_{add,del} operations.
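To make the mechanism concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not the kernel code: the class, method names, port labels, and MAC address are all hypothetical, and VLAN handling (the default VID) is left out. It models a station first seen on a switch port and then learned by the software bridge on a foreign interface, with and without the sync.

```python
from dataclasses import dataclass, field

CPU_PORT = "cpu"  # hypothetical label for the switch's CPU-facing port


@dataclass
class ToySwitch:
    """Toy hardware FDB that learns only from front-panel ingress."""
    sync_foreign: bool = False                 # True models the software-assisted sync
    fdb: dict = field(default_factory=dict)    # MAC -> egress port

    def hw_learn(self, port: str, src_mac: str) -> None:
        # Hardware learning from a frame that entered a front-panel port.
        self.fdb[src_mac] = port

    def bridge_learned_on_foreign(self, mac: str) -> None:
        # The software bridge learned `mac` on a foreign (non-switch) interface.
        # With the sync enabled, the address is programmed into the hardware
        # FDB with the CPU port as destination; without it, nothing changes.
        if self.sync_foreign:
            self.fdb[mac] = CPU_PORT

    def egress_port(self, dst_mac: str) -> str:
        # Unknown destinations would be flooded.
        return self.fdb.get(dst_mac, "flood")


def roam_to_foreign(sync: bool) -> str:
    sw = ToySwitch(sync_foreign=sync)
    mac = "02:00:00:00:00:aa"          # placeholder station MAC
    sw.hw_learn("lan1", mac)           # station first seen on switch port lan1
    sw.bridge_learned_on_foreign(mac)  # station roams to a foreign interface
    return sw.egress_port(mac)


print(roam_to_foreign(False))  # lan1 (stale entry, traffic goes to the old port)
print(roam_to_foreign(True))   # cpu  (entry re-pointed at the CPU port)
```

In this model the stale entry survives only because nothing on the hardware side ever observes the move; the sync replaces hardware learning with an explicit write driven by the software bridge.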
While the upstream patch lives in the mv88e6xxx driver, the underlying chip behavior — and therefore the need for a software-side FDB sync — applies equally to the Marvell Prestera DX series (98DX3236 and friends) used in the CRS3xx product line. Several community references confirm this:
- https://lists.dent.dev/g/discuss-dev/topic/marvell_prestera_98dx3236/76402976
- https://forum.openwrt.org/t/support-for-mikrotik-switching-hardware-1g-10-crs-series-with-marvell-arm-32bit-98dx3236-soc-prestera/48977
What I tried in RouterOS (and why it does not help)
I attempted several RouterOS-side workarounds; none of them solve the CPU-egress side because RouterOS' bridge software FDB is not synced into the chip's hardware FDB:
- Adding static /interface bridge host add entries on the new port (entries land in software FDB but the chip CAM keeps the stale dynamic entry).
- Setting hw=no on individual or several bridge ports (chip continues to make forwarding decisions for the still-hw=yes ports).
- Setting hw=no on all bridge ports (breaks ICMP/TCP after a few minutes; separate issue, not a fix).
- External arping from a third LAN host to the roamed client (this updates the chip's ingress learning correctly, restoring LAN→client unicast, but does not affect the CPU-egress lookup the router itself uses to send its ARP reply).
- Broadcast /tool ping to subnet broadcast (wrong primitive: sends ICMP broadcast, does not constitute gratuitous ARP).
- Custom raw-packet gratuitous-ARP injection from another LAN host claiming the router's identity (frame is on the wire but does not solve the chip's CAM staleness).
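For readers unsure how a gratuitous ARP differs from a broadcast ping, the following Python sketch builds one such frame. It is an illustration only and not the exact script used in the test above: the function name is hypothetical, and the MAC and IP are placeholder values (a locally administered MAC and a documentation-range address). The frame is constructed but not sent.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    sha = bytes.fromhex(mac.replace(":", ""))  # sender hardware address
    spa = socket.inet_aton(ip)                 # sender protocol address

    # Ethernet header: broadcast destination, sender as source, EtherType ARP.
    eth = b"\xff" * 6 + sha + struct.pack("!H", 0x0806)

    # ARP header: hardware type Ethernet (1), protocol IPv4 (0x0800),
    # address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Gratuitous: the target IP equals the sender IP; target MAC is zeroed.
    arp += sha + spa + b"\x00" * 6 + spa
    return eth + arp


frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.1")
print(len(frame))  # 42
```

The key property is that sender and target IP are identical, so every listener is told "this IP is at this MAC" without being asked. An ICMP echo to the subnet broadcast address carries no such binding in an ARP payload.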
Request
Please consider porting the upstream Linux DSA FDB-sync logic — in particular the "sync software-bridge-learned addresses to hardware FDB with CPU port as destination" mechanism — into RouterOS' Marvell Prestera bridge offload. This would close the only remaining significant blocker for reliable WiFi roaming on CRS3xx hardware.
I am happy to provide:
- Full /tool sniffer PCAPs of the broken state (showing CPU-generated replies that never exit).
- Comparison PCAPs from the AP-side eth0 (showing what does and does not arrive).
- Configuration export of the affected setup.
Thank you for considering this.
Best regards,
Jan Rhebergen
CRS326-24G-2S+RM, RouterOS 7.22.1