MLAG not working in RouterOS 7.1 Stable

Hello, the MLAG protocol is not working correctly in RouterOS v7.1 Stable. We are running tests on the following devices:

2 x CCR1036-8G-2S+
2 x CRS326-24S+2Q+

It appears that bonding only works on the active port of the switch marked as primary.
In fact, when we simulate port failures, we sometimes lose the link and other times everything keeps working.


I attach the configurations:

CCR1036-8G-2S+ (1):

/interface bridge
add name=Lo0
/interface bonding
add mode=802.3ad name=Po1 slaves=sfp-sfpplus3,sfp-sfpplus4 transmit-hash-policy=layer-2-and-3
/interface vlan
add interface=Po1 name=vlan3000 vlan-id=3000
/routing table
add fib name=""
/ip address
add address=100.126.0.1/29 interface=vlan3000 network=100.126.0.0
add address=100.127.0.1 interface=Lo0 network=100.127.0.1
/ipv6 address
add address=200:100:126::1 interface=vlan3000
add address=200:100:127::1/128 advertise=no interface=Lo0
/system identity
set name=RTR-01

CCR1036-8G-2S+ (2):

/interface bridge
add name=Lo0
/interface bonding
add mode=802.3ad name=Po1 slaves=sfp-sfpplus3,sfp-sfpplus4 transmit-hash-policy=layer-2-and-3
/interface vlan
add interface=Po1 name=vlan3000 vlan-id=3000
/routing table
add fib name=""
/ip address
add address=100.126.0.2/29 interface=vlan3000 network=100.126.0.0
add address=100.127.0.2 interface=Lo0 network=100.127.0.2
/ipv6 address
add address=200:100:126::2 interface=vlan3000
add address=200:100:127::2/128 advertise=no interface=Lo0
/system identity
set name=RTR-02

CRS326-24S+2Q+ (1):

/interface bridge
add name=Bridge-MLAG vlan-filtering=yes
/interface bonding
add mlag-id=100 mode=802.3ad name=Po1 slaves=sfp-sfpplus1
add mlag-id=101 mode=802.3ad name=Po2 slaves=sfp-sfpplus2
/interface bridge mlag
set bridge=Bridge-MLAG peer-port=qsfpplus1-1
/interface bridge port
add bridge=Bridge-MLAG interface=Po1
add bridge=Bridge-MLAG interface=qsfpplus1-1 pvid=777
add bridge=Bridge-MLAG interface=Po2
/interface bridge vlan
add bridge=Bridge-MLAG tagged=Po1,Po2 vlan-ids=3000
/system identity
set name=CSW-01

CRS326-24S+2Q+ (2):

/interface bridge
add name=Bridge-MLAG vlan-filtering=yes
/interface bonding
add mlag-id=100 mode=802.3ad name=Po1 slaves=sfp-sfpplus1
add mlag-id=101 mode=802.3ad name=Po2 slaves=sfp-sfpplus2
/interface bridge mlag
set bridge=Bridge-MLAG peer-port=qsfpplus1-1
/interface bridge port
add bridge=Bridge-MLAG interface=Po1
add bridge=Bridge-MLAG interface=qsfpplus1-1 pvid=777
add bridge=Bridge-MLAG interface=Po2
/interface bridge vlan
add bridge=Bridge-MLAG tagged=Po1,Po2 vlan-ids=3000
/system identity
set name=CSW-02

Situation where all ports are active and both switches are up:

The MLAG peer status on switch 1 reports:

status: connected
system-id: 2C:C8:1B:65:97:BA
active-role: secondary

The MLAG peer status on switch 2 reports:

status: connected
system-id: 2C:C8:1B:65:97:BA
active-role: primary
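
For anyone following along, this status can be read on each switch roughly as sketched below. The sketch assumes the monitor commands described in the RouterOS v7 documentation; the `once` argument is an assumption and can be dropped if the command rejects it.

```
# On CSW-01 and CSW-02: show MLAG peer status, system-id and active-role
/interface bridge mlag monitor

# Show the LACP state of the bonds (active and inactive member ports)
/interface bonding monitor [find] once
```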

Router 1 pings Router 2 and vice versa while sfp-sfpplus1 and sfp-sfpplus2 are both up.

Situation where the sfp-sfpplus1 port is disconnected: in the bonding status, sfp-sfpplus2 becomes active, but the routers can no longer communicate with each other.

If, in this situation, I restart the switch that holds the primary role and reconnect the sfp-sfpplus1 port, the ping starts working again.
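
The same test can be approximated from the CLI instead of pulling cables. This is only a sketch: it assumes that Po1 (member port sfp-sfpplus1) on CSW-01 is the link toward RTR-01, and an administrative disable is not identical to a physical link loss.

```
# On CSW-01: take down the Po1 member port (assumed to face RTR-01)
/interface ethernet set sfp-sfpplus1 disabled=yes

# On RTR-01: check whether RTR-02 is still reachable over vlan3000
/ping 100.126.0.2 count=5

# On CSW-01: restore the port after the test
/interface ethernet set sfp-sfpplus1 disabled=no
```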

This is the network diagram used for the configuration

I have the exact same hardware setup, and experienced the same issue when testing 7.1beta6. I ended up giving up on MLAG and putting it into production on ROS6, with each CCR single-homed to only one CRS. Was hoping it was fixed in the recent stable release – bummer.

No, it has not been fixed. From what I can see, bonding works only if the active ports on both routers connect to the same switch. If I restart the switch, the port state changes on both routers and everything works correctly. But if I force a shutdown of, for example, the sfp-sfpplus1 port of router 1, which goes to switch 1 (router 2 has the same configuration), the two routers can no longer see each other: on router 2 the active port is sfp-sfpplus1, which goes to switch 1, while on router 1 the active port is sfp-sfpplus2, which goes to switch 2. This does not seem like correct behavior. Shouldn't the traffic pass through sfp-sfpplus2 on both devices, since switch 2 is active, or am I wrong?

I have MLAG in stable production for several months now.

You need the ISL (MLAG control) vlan to be UNTAGGED on the ports (or bond) between the switches

Hi, the ports between the two switches are both untagged in a dedicated VLAN, see photo (applies to both switches).
Have you tried taking down the active port on one of your devices? In my tests, failover only works when an entire switch goes down, not when a single port does.

Yes I did. Prior to rc6 it caused issues, but post rc6 for me it works.

What version are you running now? On v7.1 stable the problem still exists.

Isn’t MLAG limited to a few MikroTik devices (CRS3xx, if I’m not mistaken)?

Yes, and we are in fact running MLAG on 2 x CRS326-24S+2Q+, but it is completely broken.

I also sent the full analysis of the problem to support@mikrotik.com.

Let’s see if they respond. Meanwhile, I’m trying v7.1rc7.

I confirm that the problem is also present in rc7. MLAG behaves without any apparent logic.

After a reply from MikroTik support, I can confirm that MLAG on v7.1 is working.

The problem was that the peer port (in my case qsfpplus1-1) was not tagged on the VLAN that carries the traffic.

On VLAN 3000 the peer port must also be tagged. That tag was missing from my config, so whenever traffic had to cross from one switch to the other, the packets could not pass over the peer link.
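
Based on that description, the corrected bridge VLAN entry would look roughly like the sketch below. It is reconstructed from the configuration posted at the top of the thread, not taken from an updated export, and it replaces the existing VLAN 3000 entry on both switches.

```
# On CSW-01 and CSW-02: the peer port is now tagged on VLAN 3000
# alongside the two MLAG bonds
/interface bridge vlan
add bridge=Bridge-MLAG tagged=Po1,Po2,qsfpplus1-1 vlan-ids=3000
```

The peer port stays untagged in its dedicated VLAN through the existing `pvid=777` setting on the bridge port, so only the tagged list for VLAN 3000 changes.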

Can you post your updated configuration (possibly with comments) and mark it as a reply? I’m building exactly the same solution for a production environment and would appreciate that.

Thanks!