MLAG - Traffic is no longer rerouted after a cable outage

Hello,
Feel free to ask questions about the issue if I have forgotten to provide any important information.
Thank you in advance for your help and suggestions.

Infrastructure:

I have two switches with two MLAG bonds.

  • Switch A (SA): CRS312-4C+8XG - version: 7.19.1 (stable)
  • Switch B (SB): CRS326-24S+2Q+ - version: 7.20.4 (stable)

I have two Ubuntu servers that act as client A and client B (CA and CB).

My peer link: SA: ether1 <-> SB: sfp-sfpplus1

CA is connected to SA and SB: SA: ether3 ; SB: sfp-sfpplus2 (client-bond1)
CB is connected to SA and SB: SA: ether4; SB: sfp-sfpplus10 (client-bond2)

I disabled hardware offload on the LACP ports on the switches in order to be able to see the ARP requests. (ether3, ether4, sfp-sfpplus2,sfp-sfpplus10)

What I expect:

When I disconnect the cable between CA and SA, which is the cable used by my ping initiated from CB to CA, the ping should continue seamlessly.

What I get:

When I disconnect the cable between CA and SA, my ping is interrupted and I get a timeout. The ARP Requests sent by CB for CA do reach CA, but the ARP Reply cannot find a path back.

With reference to the diagram in the infrastructure section, CB sends an ARP Request on the CB→SA link. SA then forwards this request over the peer link SA→SB. SB broadcasts the request on its bridge and it enters the SB→CA bond. The CA server replies with an ARP Reply over the CA→SB link. In its Forwarding Database, SB sees the MAC address of CB’s LACP interface on its local port. Therefore, it forwards the ARP Reply from the CA–SB bond to the local CB–SB bond, but SB drops the packet before sending it out on the physical interface of the CB–SB bond.
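For anyone retracing this path, here is a minimal sketch of the checks that can be run on SB (peer2) while the CA–SA cable is unplugged. It assumes D4:AE:52:92:ED:B7 is CB's MAC address (it is the source MAC of the ARP Request in the sniffer output) and only uses read-only commands.

```
# Where does SB's forwarding database place CB's MAC address?
/interface bridge host print where mac-address=D4:AE:52:92:ED:B7

# Is the LACP bond towards CB actually up on this switch?
/interface bonding monitor client-bond2 once

# Is the MLAG peer link still reported as connected?
/interface bridge mlag monitor
```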

On the servers CA and CB, I use tcpdump to watch the traffic.
On switches SA and SB, I use the sniffer tool: /tool/sniffer/quick mac-protocol=arp interface=ether1,client-bond,client-bond2,ether3,ether4,bridge1

I have already tried setting the Spanning Tree protocol to None, but nothing changed.

Here is the output I get from the sniffer. (We can see that the ARP Reply is dropped after the client-bond2 logical interface and never reaches the physical interface sfp-sfpplus10.)

INTERFACE     TIME   NUM  DIR  SRC-MAC            DST-MAC            VLAN  SRC-ADDRESS                          PROTOCOL  SIZE  CPU
client-bond   0.361    3  ->   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF        192.168.42.2: who has 192.168.42.1?  arp         60    0
sfp-sfpplus2  0.361    5  ->   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF        192.168.42.2: who has 192.168.42.1?  arp         60    0
bridge1       0.361    1  <-   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF    42  192.168.42.2: who has 192.168.42.1?  arp         64    0
sfp-sfpplus2  0.361    6  <-   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0
client-bond   0.361    4  <-   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0
client-bond2  0.361    2  ->   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0
sfp-sfpplus1  1.385   13  <-   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF    42  192.168.42.2: who has 192.168.42.1?  arp         64    0
client-bond   1.385    9  ->   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF        192.168.42.2: who has 192.168.42.1?  arp         60    0
sfp-sfpplus2  1.385   11  ->   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF        192.168.42.2: who has 192.168.42.1?  arp         60    0
bridge1       1.385    8  <-   D4:AE:52:92:ED:B7  FF:FF:FF:FF:FF:FF    42  192.168.42.2: who has 192.168.42.1?  arp         64    0
sfp-sfpplus2  1.385   12  <-   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0
client-bond   1.385   10  <-   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0
client-bond2  1.385   14  ->   D4:AE:52:92:ED:BA  D4:AE:52:92:ED:B7        192.168.42.1: at D4:AE:52:92:ED:BA   arp         60    0

The forwarding database on each bridge after I disconnected the physical interface CA→SA (ether3):

SA:

[admin@peer1] > /interface/bridge/host/print
Flags: D - DYNAMIC; L - LOCAL; E - EXTERNAL
Columns: MAC-ADDRESS, VID, ON-INTERFACE, BRIDGE
#     MAC-ADDRESS        VID  ON-INTERFACE  BRIDGE 
0 DL  2C:C8:1B:9B:15:A5       client-bond2  bridge1
1 D E F4:1E:57:88:74:91       ether1        bridge1
2 D E F4:1E:57:88:74:99       client-bond2  bridge1
3 D E F4:1E:57:88:74:91    1  ether1        bridge1
4 DL  2C:C8:1B:9B:15:A2   42  ether1        bridge1
5 DL  2C:C8:1B:9B:15:A5   42  client-bond2  bridge1
6 D E D4:AE:52:92:ED:B7   42  client-bond2  bridge1
7 D E D4:AE:52:92:ED:BA   42  ether1        bridge1
8 D E F4:1E:57:88:74:91   42  ether1        bridge1
9 D E F4:1E:57:88:74:99   42  client-bond2  bridge1

SB:

[admin@peer2] > /interface/bridge/host/print
Flags: D - DYNAMIC; L - LOCAL; E - EXTERNAL
Columns: MAC-ADDRESS, VID, ON-INTERFACE, BRIDGE
#     MAC-ADDRESS        VID  ON-INTERFACE  BRIDGE 
0 DL  F4:1E:57:88:74:91       client-bond   bridge1
1 DL  F4:1E:57:88:74:99       client-bond2  bridge1
2 D E D4:AE:52:92:ED:B7   42  client-bond2  bridge1
3 D E D4:AE:52:92:ED:BA   42  client-bond   bridge1
4 DL  F4:1E:57:88:74:90   42  sfp-sfpplus1  bridge1
5 DL  F4:1E:57:88:74:91   42  client-bond   bridge1
6 DL  F4:1E:57:88:74:99   42  client-bond2  bridge1

My configuration:

SA:

[admin@peer1] > /interface/bridge/print 
Flags: D - dynamic; X - disabled, R - running 
 0  R name="bridge1" mtu=auto actual-mtu=1500 l2mtu=1584 arp=enabled arp-timeout=auto mac-address=2C:C8:1B:9B:15:A2 protocol-mode=rstp fast-forward=yes igmp-snooping=no auto-mac=yes 
      ageing-time=5m priority=0x9000 max-message-age=20s forward-delay=15s transmit-hold-count=6 vlan-filtering=yes ether-type=0x8100 pvid=1 frame-types=admit-all ingress-filtering=yes 
      dhcp-snooping=no port-cost-mode=long mvrp=no max-learned-entries=auto 

[admin@peer1] > /interface/bridge/port/print detail
Flags: X - disabled, I - inactive; D - dynamic; H - hw-offload 
 0   H interface=ether1 bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=yes auto-isolate=no restricted-role=no restricted-tcn=no pvid=1 
       frame-types=admit-all ingress-filtering=yes unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no 

 1     interface=client-bond bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=no auto-isolate=no restricted-role=no restricted-tcn=no pvid=42 
       frame-types=admit-all ingress-filtering=yes unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no 

 2     interface=client-bond2 bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=no auto-isolate=no restricted-role=no restricted-tcn=no pvid=42 
       frame-types=admit-all ingress-filtering=yes unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no  

[admin@peer1] > /interface/bridge/vlan/print detail
Flags: X - disabled, D - dynamic 
 0   bridge=bridge1 vlan-ids=42 tagged=ether1 untagged=client-bond,client-bond2 mvrp-forbidden="" current-tagged=ether1 current-untagged=client-bond2,client-bond 

 1 D ;;; added by pvid
     bridge=bridge1 vlan-ids=1 tagged="" untagged=bridge1,ether1 mvrp-forbidden="" current-tagged="" current-untagged=bridge1,ether1 

[admin@peer1] > /interface/bridge/mlag monitor 
       status: connected        
    system-id: 2C:C8:1B:9B:15:A2
  active-role: primary  

[admin@peer1] > /interface/bonding/print detail
Flags: X - disabled; R - running 
 0    name="client-bond" mtu=1500 mac-address=2C:C8:1B:9B:15:A4 arp=enabled arp-timeout=auto slaves=ether3 mode=802.3ad primary=none link-monitoring=mii arp-interval=100ms arp-ip-targets=">
      mii-interval=100ms down-delay=0ms up-delay=0ms lacp-rate=1sec transmit-hash-policy=layer-2 min-links=0 mlag-id=10 lacp-mode=active 

 1  R name="client-bond2" mtu=1500 mac-address=2C:C8:1B:9B:15:A5 arp=enabled arp-timeout=auto slaves=ether4 mode=802.3ad primary=none link-monitoring=mii arp-interval=100ms 
      arp-ip-targets="" mii-interval=100ms down-delay=0ms up-delay=0ms lacp-rate=1sec transmit-hash-policy=layer-2 min-links=0 mlag-id=20 lacp-mode=active 

SB:

[admin@peer2] > /interface/bridge/print
Flags: D - dynamic; X - disabled, R - running 
 0  R name="bridge1" mtu=auto actual-mtu=1500 l2mtu=1584 arp=enabled arp-timeout=auto mac-address=F4:1E:57:88:74:90 protocol-mode=rstp fast-forward=yes igmp-snooping=no auto-mac=yes 
      ageing-time=5m priority=0x9000 max-message-age=20s forward-delay=15s transmit-hold-count=6 vlan-filtering=yes ether-type=0x8100 pvid=1 frame-types=admit-all ingress-filtering=yes 
      dhcp-snooping=no port-cost-mode=long mvrp=no max-learned-entries=auto 

[admin@peer2] > /interface/bridge/port/print detail
Flags: X - disabled, I - inactive; D - dynamic; H - hw-offload 
 0   H interface=sfp-sfpplus1 bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=yes auto-isolate=no restricted-role=no restricted-tcn=no pvid=1 
       frame-types=admit-all ingress-filtering=yes unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no 

 1     interface=client-bond bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=no auto-isolate=no restricted-role=no restricted-tcn=no pvid=42 
       frame-types=admit-all ingress-filtering=no unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no 

 2     interface=client-bond2 bridge=bridge1 priority=0x80 edge=auto point-to-point=auto learn=auto horizon=none hw=no auto-isolate=no restricted-role=no restricted-tcn=no pvid=42 
       frame-types=admit-all ingress-filtering=no unknown-unicast-flood=yes unknown-multicast-flood=yes broadcast-flood=yes tag-stacking=no bpdu-guard=no trusted=no 
       mvrp-registrar-state=normal mvrp-applicant-state=normal-participant multicast-router=temporary-query fast-leave=no 

[admin@peer2] > /interface/bridge/vlan/print detail
Flags: X - disabled, D - dynamic 
 0   bridge=bridge1 vlan-ids=42 tagged=sfp-sfpplus1 untagged=client-bond,client-bond2 mvrp-forbidden="" current-tagged=sfp-sfpplus1 current-untagged=client-bond,client-bond2 

 1 D ;;; added by pvid
     bridge=bridge1 vlan-ids=1 tagged="" untagged=bridge1,sfp-sfpplus1 mvrp-forbidden="" current-tagged="" current-untagged=bridge1,sfp-sfpplus1 

 2 D ;;; added by switch-cpu
     bridge=bridge1 vlan-ids=42 tagged=bridge1 untagged="" mvrp-forbidden="" current-tagged=bridge1 current-untagged="" 

[admin@peer2] > /interface/bridge/mlag monitor
       status: connected        
    system-id: 2C:C8:1B:9B:15:A2
  active-role: secondary       
 
[admin@peer2] > /interface/bonding/print detail
Flags: X - disabled; R - running 
 0  R name="client-bond" mtu=1500 mac-address=F4:1E:57:88:74:91 arp=enabled arp-timeout=auto slaves=sfp-sfpplus2 mode=802.3ad primary=none link-monitoring=mii arp-interval=100ms 
      arp-ip-targets="" mii-interval=100ms down-delay=0ms up-delay=0ms lacp-rate=1sec transmit-hash-policy=layer-2 min-links=0 mlag-id=10 lacp-mode=active 

 1  R name="client-bond2" mtu=1500 mac-address=F4:1E:57:88:74:99 arp=enabled arp-timeout=auto slaves=sfp-sfpplus10 mode=802.3ad primary=none link-monitoring=mii arp-interval=100ms 
      arp-ip-targets="" mii-interval=100ms down-delay=0ms up-delay=0ms lacp-rate=1sec transmit-hash-policy=layer-2 min-links=0 mlag-id=20 lacp-mode=active 

Steps to reproduce the problem:

Configure an MLAG peer link between two switches.

On each switch, create two LACP bonds.

Initiate a ping from CB to CA. Make sure that the ping is passing over CB → SA and SA → CA.

Disconnect or disable the physical interface for SA → CA.

You will see an ARP Reply dropped on SB between the logical bond interface and the physical interface.
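The last two steps can also be done from the CLI instead of pulling the cable. This is only a sketch using the interface names from my setup (ether3 on SA towards CA, client-bond2 and sfp-sfpplus10 on SB towards CB); adjust the names to your own lab.

```
# On SA (peer1): simulate the cable outage towards CA
/interface ethernet disable ether3

# On SB (peer2): watch whether the ARP Reply leaves the bond on the physical port
/tool sniffer quick mac-protocol=arp interface=client-bond2,sfp-sfpplus10

# On SA (peer1): restore the link after the test
/interface ethernet enable ether3
```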

Thank you for taking the time to read this,

Erwan DUFOUR, Withings

Export the configuration with /export rather than posting print output or screenshots; those omit other settings and are also annoying to read.
You must provide the full COMPACT export with serial numbers, passwords and public IPs redacted.

Instructions for creating and posting export here:

Personally I would have both devices on the SAME RouterOS version, and since 7.20.4 has been reported as having quite a few issues, I would have both at either 7.19.4 or 7.19.6.
Not necessarily connected to the issue you are having, mind you, but this would exclude possible version-related issues (very difficult to pinpoint).
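A quick way to compare the two peers, as a sketch: run the following on both switches and compare the reported versions. Which release the update check offers depends on the channel configured on each device, so treat the second command as optional.

```
# Shows the running RouterOS version (the "version" field)
/system resource print

# Shows the RouterBOARD firmware (current-firmware vs upgrade-firmware)
/system routerboard print

# Optional: list what the configured update channel offers (needs Internet access)
/system package update check-for-updates
```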

Hello,
Thank you for your comment,
I cannot upload files, but here is the content of the exports:
/export compact file=peerX_config

Peer1: (SA)

# 2025-11-25 13:02:28 by RouterOS 7.19.1
# software id = H1YX-Y5B8
#
# model = CRS312-4C+8XG
# serial number = redacted
/interface bridge
add name=bridge1 protocol-mode=none vlan-filtering=yes
/interface bonding
add lacp-rate=1sec mlag-id=10 mode=802.3ad name=client-bond slaves=ether3
add lacp-rate=1sec mlag-id=20 mode=802.3ad name=client-bond2 slaves=ether4
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/port
set 0 name=serial0
/interface bridge mlag
set bridge=bridge1 peer-port=ether1 priority=50
/interface bridge port
add bridge=bridge1 interface=ether1
add bridge=bridge1 hw=no interface=client-bond pvid=42
add bridge=bridge1 hw=no interface=client-bond2 pvid=42
/interface bridge vlan
add bridge=bridge1 tagged=ether1 untagged=client-bond,client-bond2 vlan-ids=\
    42
/ip dhcp-client
add interface=ether9
/system identity
set name=peer1
/system logging
add topics=bridge,debug
add topics=interface,debug
/system ntp client
set enabled=yes
/system ntp client servers
add address=172.16.6.14
add address=172.16.6.24

Peer2: (SB)

[edufour@ErwanDUFOUR ~]$ cat /home/edufour/Téléchargements/peer2_config.rsc 
# 2025-11-25 12:57:06 by RouterOS 7.20.4
# software id = YXG3-XJVU
#
# model = CRS326-24S+2Q+
# serial number = redacted
/interface bridge
add name=bridge1 protocol-mode=none vlan-filtering=yes
/interface bonding
add lacp-rate=1sec mlag-id=10 mode=802.3ad name=client-bond slaves=\
    sfp-sfpplus2
add lacp-rate=1sec mlag-id=20 mode=802.3ad name=client-bond2 slaves=\
    sfp-sfpplus10
/interface ethernet switch port
set 0 l3-hw-offloading=no
set 13 l3-hw-offloading=no
/port
set 0 name=serial0
/interface bridge mlag
set bridge=bridge1 peer-port=sfp-sfpplus1
/interface bridge port
add bridge=bridge1 interface=sfp-sfpplus1
add bridge=bridge1 hw=no ingress-filtering=no interface=client-bond pvid=42
add bridge=bridge1 hw=no ingress-filtering=no interface=client-bond2 pvid=42
/interface bridge vlan
add bridge=bridge1 tagged=sfp-sfpplus1 untagged=client-bond,client-bond2 \
    vlan-ids=42
/ip dhcp-client
add interface=ether1
/system identity
set name=peer2
/system logging
add topics=bridge,debug
add topics=interface,debug
/system ntp client
set enabled=yes
/system ntp client servers
add address=172.16.6.14
add address=172.16.6.24
/system routerboard settings
set enter-setup-on=delete-key

Thanks everyone for your comments. I think the problem comes either from the fact that it involves two different chassis or from different RouterOS versions.

I repeated my tests with the same config on two CRS326-24S+2Q+ chassis, both running RouterOS 7.15.1 (stable), and I had no issues.
MLAG with MikroTik seems to be very sensitive to firmware versions and the chassis used, which is important information to consider in a production environment.

I hope this can help anyone who runs into the same problem I experienced.

I’m coming back with some additional information.
My MLAG between my two CRS326-24S+2Q+ chassis works perfectly in both 7.15.1 and 7.20.4. It was also functional when one switch was running 7.15.1 and the other 7.20.4.

In the MikroTik documentation, there is no mention that MLAG should not work between two different MikroTik chassis. If, based on the information in this thread, you know why MLAG was not working between the CRS326-24S+2Q+ and the CRS312-4C+8XG chassis, I would welcome your hypotheses or ideas.

Hmmm...

On one device you use ETH ports and on the other there are SFP+ in use. How did you connect them?

I was successful in connecting my CRS312 and CRS354 in an MLAG setup.

In your config, I don’t see where you have changed the native VLAN (PVID) for your MLAG peer link (ether1 on the 312 and sfp-sfpplus1 on the 326). The peer link should be untagged into a VLAN other than 1 (the default), and then if you want VLAN 1 passed through the stack, you tag it on the MLAG peer port on each switch.

I have always done that and have multiple MLAG stacks in production.

Hello,

Thank you very much for your comment,
Is it possible to share your lab’s config?

That should be untagged into a VLAN other than 1 (the default), and then if you want VLAN 1 passed through the stack, you tag it to the MLAG peer port on each switch.

All my traffic was going through VLAN 42, the ports connected to the servers have PVID 42, and my peer link has VLAN 42 tagged. I don’t use VLAN 1 at all, especially since you can tell your bridge to only pass filtered VLANs for the peer-link, as they explain in the tutorial.

add bridge=bridge1 interface=sfp-sfpplus1 frame-types=admit-only-vlan-tagged

Hello,

I used an SFP-to-RJ45 transceiver like this one, which you can find on Amazon:
https://www.amazon.fr/Transceiver-10GBase-T-30-mètres-Ubiquiti-Compatible/dp/B01M8O3MAL?th=1

That’s the problem. The traffic between the MLAG peers needs to be untagged, and on a VLAN different than any native or tagged VLANs the bridge is using elsewhere. By only allowing tagged traffic on the peer link, you’re breaking the MLAG.

  • MLAG peers need to be able to talk to each other over a unique PVID (i.e. untagged on the peer link)
  • Tag all other VLANs, including 1, that you want to pass between the peers, on the peering port

I have posted several examples in other MLAG threads on the forum. A quick search should be able to pull them up. But if you do those two things I mentioned, you should be up and running.
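As a rough sketch of those two points applied to the exports posted in this thread: VLAN 99 is a hypothetical, otherwise unused VLAN ID chosen for illustration, and the peer ports are taken from the exports (ether1 on peer1, sfp-sfpplus1 on peer2). VLAN 42 is already tagged on both peer ports there, so only the PVID changes.

```
# peer1 (SA): give the peer port its own untagged VLAN (99 is an assumed, unused ID)
/interface bridge port set [find interface=ether1] pvid=99

# peer2 (SB): same PVID on the other end of the peer link
/interface bridge port set [find interface=sfp-sfpplus1] pvid=99

# Only if VLAN 1 must also cross the peer link, tag it on the peer port:
# peer1: /interface bridge vlan add bridge=bridge1 tagged=ether1 vlan-ids=1
# peer2: /interface bridge vlan add bridge=bridge1 tagged=sfp-sfpplus1 vlan-ids=1
```

Apply the change on both peers, then check /interface bridge mlag monitor to confirm the status is still connected.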

Hello,
Thanks for your answer,

I don’t think that’s where the problem is, especially if you look at the configuration in my exports. I didn’t set frame-types=admit-only-vlan-tagged for the peer links.

In addition, my peer links use the default PVID, which is 1. So both of them can communicate without tagged VLANs on PVID 1.

I don’t use VLAN 1 at all, especially since you can tell your bridge to only pass filtered VLANs for the peer-link, as they explain in the tutorial.

I just wanted to say that in the tutorial they mention that we can add this option to indicate that we don’t need the PVID.

OK, I just saw this:

I disabled hardware offload on the LACP ports on the switches in order to be able to see the ARP requests. (ether3, ether4, sfp-sfpplus2,sfp-sfpplus10).

MLAG breaks (undefined/unexpected behavior) if hardware offloading is disabled on participating ports.

Did you turn that back on?
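If not, something like the following should restore it. The bond names come from the exports above and are the same on both switches, so the same commands apply to peer1 and peer2.

```
# Re-enable hardware offloading on both MLAG bond ports
/interface bridge port set [find interface=client-bond] hw=yes
/interface bridge port set [find interface=client-bond2] hw=yes

# The H (hw-offload) flag should be shown again next to both bond ports
/interface bridge port print
```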

Also, you have STP disabled.

STP, RSTP or MSTP is required:

This system-id is used for STP BPDU bridge identifier and LACP system ID. The MLAG supports STP, RSTP or MSTP protocols. Use the same STP priority and the same STP configuration on dual-connected bridge ports on both nodes. When MLAG bridges are elected as STP root, then both devices will show as root bridges under the bridge monitor.
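A minimal sketch of turning it back on, to be run on both peers. The priority value 0x8000 is just the RouterOS default used as an example; what matters per the documentation quoted above is that both nodes use the same value.

```
# Run on peer1 and peer2: same protocol mode and same priority on both nodes
/interface bridge set bridge1 protocol-mode=rstp priority=0x8000

# Check the STP state; if the MLAG pair is elected root, both devices report root-bridge
/interface bridge monitor bridge1
```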