MLAG configuration for v7.22.1

I was a bit confused by this example while reading the official documentation:

/interface bonding
add mode=802.3ad name=bond1 slaves=sfp-sfpplus1,sfp-sfpplus2 lacp-rate=1sec
/interface bonding
add mlag-id=10 mode=802.3ad name=client-bond slaves=sfp-sfpplus2 lacp-rate=1sec
failure: sfp-sfpplus2 already in bond1

So, is this a bug in the firmware or a misprint in the KB?

I have a test lab where the aim is to connect two Cisco Catalyst switches, via LACP port-channels, to a pair of CRS328-24P-4S switches.

Has anyone had a similar setup in place?

Also, correct me if I’m wrong, but there is no chance of having MLAG and VRRP at the same time, right?

The first command shows how to create a regular LACP bond on a client device (e.g. if the client runs RouterOS); it does not apply to the MLAG pair. I can see how this is confusing, so I have moved it below the MLAG configuration steps.
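To illustrate the distinction, here is a minimal sketch that reuses the interface names from the documentation example. The two commands belong on different devices: a physical port can be a member of only one bond, which is why running both on the same box fails with "already in bond1". The port assignment on the MLAG node is taken from the example and is only illustrative; use whichever port actually faces the client.

```routeros
# On the CLIENT device only: a plain LACP bond, no MLAG settings.
/interface bonding
add mode=802.3ad name=bond1 slaves=sfp-sfpplus1,sfp-sfpplus2 lacp-rate=1sec

# On EACH MLAG node (run separately on both nodes): the bond facing the
# client; the same mlag-id on both nodes ties the two halves together.
/interface bonding
add mlag-id=10 mode=802.3ad name=client-bond slaves=sfp-sfpplus2 lacp-rate=1sec
```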

As for running MLAG and VRRP at the same time: I would recommend configuring VRRP on different devices, not on the MLAG pair. Support for VRRP+MLAG is on the to-do list.

Thank you for the quick response! That makes sense.

The issue I have at the moment is the following:
Two Cisco switches are connected to the pair of MikroTiks, as mentioned, and everything seems up and running.

The first issue:
I can't make the MikroTik pair the STP root bridge, and I'm not sure that's possible. Should both MLAG nodes have the same STP priority, or different ones? I have played with different scenarios, but that didn't help much :(.
The second issue:
Since I can't make the MKTs the STP root, one of the Cisco switches becomes the root instead. If I restart the secondary MKT, everything works as expected, but if I restart the primary, LACP fails and each Cisco switch becomes its own root. I'm not sure whether this is a configuration issue or a bug:

Before MKT primary reboot:

VLAN0888
  Spanning tree enabled protocol rstp
  Root ID    Priority    33656
             Address     04bd.97db.f880
             Cost        10000
             Port        4142 (port-channel47)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    33656  (priority 32768 sys-id-ext 888)
             Address     4006.d5b0.03ff
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

Interface           Role Sts Cost      Prio.Nbr Type
Po47                Root FWD 10000     128.4142 P2p

After MKT primary reboot:

SW1# sh spanning-tree

VLAN0888
  Spanning tree enabled protocol rstp
  Root ID    Priority    33656
             Address     4006.d5b0.03ff
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    33656  (priority 32768 sys-id-ext 888)
             Address     4006.d5b0.03ff
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

Interface           Role Sts Cost      Prio.Nbr Type

MKT1 config:
/interface bonding
add lacp-rate=1sec mlag-id=20 mode=802.3ad name=MLAG20 slaves=ether20 transmit-hash-policy=layer-2-and-3
add lacp-rate=1sec mlag-id=24 mode=802.3ad name=MLAG24 slaves=ether24 transmit-hash-policy=layer-2-and-3
/interface bridge
add mlag-peer-port=ether15 mlag-priority=50 name=bridge-root vlan-filtering=yes

/interface bridge port
add bridge=bridge-root interface=ether15 pvid=1001
add bridge=bridge-root frame-types=admit-only-vlan-tagged interface=MLAG20
add bridge=bridge-root frame-types=admit-only-vlan-tagged interface=MLAG24
/interface bridge vlan
add bridge=bridge-root tagged=bridge-root,ether15 vlan-ids=1
add bridge=bridge-root tagged=bridge-root,ether15,MLAG20,MLAG24 vlan-ids=888
add bridge=bridge-root comment=PEER-PORT tagged=bridge-root,ether15 vlan-ids=1001

MKT2 config:
/interface bonding
add lacp-rate=1sec mlag-id=20 mode=802.3ad name=MLAG20 slaves=ether20 transmit-hash-policy=layer-2-and-3
add lacp-rate=1sec mlag-id=24 mode=802.3ad name=MLAG24 slaves=ether24 transmit-hash-policy=layer-2-and-3
/interface bridge
add mlag-peer-port=ether15 name=bridge-root vlan-filtering=yes

/interface bridge port
add bridge=bridge-root interface=ether15 pvid=1001
add bridge=bridge-root frame-types=admit-only-vlan-tagged interface=MLAG20
add bridge=bridge-root frame-types=admit-only-vlan-tagged interface=MLAG24

/interface bridge vlan
add bridge=bridge-root tagged=bridge-root,ether15 vlan-ids=1
add bridge=bridge-root tagged=bridge-root,ether15,MLAG20,MLAG24 vlan-ids=888
add bridge=bridge-root comment=PEER-PORT tagged=bridge-root,ether15 vlan-ids=1001

I’d really appreciate it if you could help me understand what I’m doing wrong.

Yes, both MLAG nodes should use the same STP priority. Try setting priority=0x0000 on both and check the output of /interface/bridge/monitor [find].
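A minimal sketch of that suggestion, assuming the bridge name bridge-root from the configs posted above. The same commands are run on both MLAG nodes so that their bridge priorities match.

```routeros
# Run on MKT1 AND on MKT2: identical, lowest possible bridge priority.
/interface bridge
set [find name=bridge-root] priority=0x0000
# Inspect the resulting STP state (root bridge, root port, etc.).
monitor [find name=bridge-root]
```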

I was able to make the MikroTiks the root bridge, thanks!
However, I still have an issue, this time with something else.

In my setup I do not have a client as such, but an L2 Cisco Catalyst connected with a standard LACP port-channel configuration. Just to be sure, I redid the setup from scratch, and I noticed that the secondary MikroTik neither receives nor sends any BPDUs.
If I understand the idea correctly, it's safer to set edge=no and point-to-point=yes on both links (the one between the MikroTiks and the uplink bond).
As you can see, the learning and forwarding states of the uplink towards the switch differ between the two MikroTiks. There are also no BPDU packets on the secondary MikroTik.
As a result, if I reboot the primary MikroTik, the Catalyst switch makes itself the STP root (because the secondary MikroTik doesn't send any BPDUs).
Any ideas what to adjust in my setup?
Thanks!

A regular LACP bond also sends and receives BPDUs over a single physical interface, since BPDUs are subject to the transmit hash.

With MLAG and dual-connected bonds, BPDUs are transmitted only by the primary node (when both nodes are up). If the MLAG pair is the root, you should see the BPDU Tx counter increase only on the primary node. I believe this is acceptable behavior, as it mimics a regular LACP bond.

But here is a gap in the RouterOS MLAG implementation: BPDU Rx and port state changes happen independently on each node. In other words, MLAG does not synchronize STP port states on dual-connected bonds. If you configure both sides with edge=no, you risk one side being permanently stuck at forwarding: no, because the BPDUs are received on the other node. We are planning to fix this in future RouterOS versions.

I was not able to reproduce this behavior in our lab. Once the primary goes down, the secondary should start sending its own BPDUs.

My recommendation would be to use the default edge=auto to avoid the risk of a port stuck at forwarding: no, or perhaps to make the MLAG pair the non-root bridge.
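A sketch of that recommendation, assuming MLAG20 and MLAG24 are the dual-connected bonds towards the Catalyst (names taken from the configs earlier in the thread). Because MLAG does not synchronize STP port states, the monitor command is run on each node separately to compare what each side sees.

```routeros
/interface bridge port
# Return the dual-connected bonds to the default edge detection.
set [find interface=MLAG20] edge=auto
set [find interface=MLAG24] edge=auto
# Run on MKT1 and on MKT2 and compare role, learning and forwarding.
monitor [find interface=MLAG20]
```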

Thank you, Edgars, for your quick response.

Since our current setup is running only in the test lab, we have the flexibility to upgrade the MikroTik devices to any test firmware at any time. I will continue monitoring the MikroTik development branch and update our lab environment as new features or fixes are released.

Additionally, it would be very helpful to have a wiki page with an example of a tested MLAG configuration involving at least one well-known switch vendor, such as Cisco, HP, or others. While there are some vendor-specific differences (e.g., port-channel hashing, VLAN filtering on the trunk port, PVST/RPVST behavior, etc.), such a reference would still be highly valuable for many users.
Thank you once again.

Hello,
We’re not sure if this helps improve LACP/MLAG overall, but we found the real issue. When the primary MLAG router reboots, the LACP ID on the MLAG interface changes. Because of that, Cisco Catalyst switches drop the port-channel (confirmed with Wireshark).

So the problem wasn’t only STP, as we first thought. The port-channel was also going down due to the LACP ID change after a reboot. Setting a static LACP ID (same on both MKTs) fixed it.
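A sketch of that workaround, assuming lacp-system-id is set per bonding interface and reusing the bond names from the configs above. The MAC address 02:00:00:00:00:20 is a hypothetical, locally administered placeholder; choose your own value, but use the same one on both MKTs so the system ID seen by the Catalyst does not change when the primary reboots.

```routeros
# Run on BOTH MKT1 and MKT2 with the same (placeholder) value.
/interface bonding
set [find name=MLAG20] lacp-system-id=02:00:00:00:00:20
set [find name=MLAG24] lacp-system-id=02:00:00:00:00:20
```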

This might be specific to Cisco Catalyst. Cisco Nexus doesn’t seem to have the same issue. We didn’t test other vendors.


Thank you for the update!

Yes, in RouterOS the backup MLAG node falls back to its own system ID and expects to renegotiate through LACP. I was not aware that this fails, or is not allowed, on some systems.

Thanks for the hint about manually setting lacp-system-id on the backup node; I will try to add this to the docs.

Also, there is room for improvement in RouterOS, such as an MLAG setting to allow or restrict system ID fallback, or some kind of "hold-time".