MLAG setup questions

Hello dear MikroTik users,

I am new to the MikroTik world and I am also not very experienced with networking in general, so please bear with me.

I am trying to set up MLAG with two CRS326-24S+2Q+RM switches. My first step was to upgrade them to RouterOS and RouterBOARD firmware version 7.10.

I am using one 40GbE port as the uplink on each switch and the other 40GbE port as the ICCP/peer port. I am not sure about the name and model of the upstream device, but I hope I can neglect that here. I am connected to the switches with serial cables, so I stay reachable through most of my potential misconfigurations. Furthermore, I have a Linux server (AlmaLinux 9), a TrueNAS box (CORE 13) and an Infortrend iSCSI system connected to the switches as clients.

I went through this https://help.mikrotik.com/docs/display/ROS/Multi-chassis+Link+Aggregation+Group and struggled for some time to get the config right. At least I think it is right now. Then I started simulating outages by rebooting the MikroTik switches one at a time.

Now, I am not sure what I can expect from my setup or whether my tests are valid. I was running two permanent pings with the source outside the MLAG setup and the destinations inside. So the packets were coming in on the uplink ports and going out on the respective downlinks. This is only a logical assumption; I did not use a sniffer to prove it. When I rebooted one switch, I saw several pings go unanswered (packet loss). Right after issuing “/system reboot” a few pings were lost, and later on about 20 to 30 more were missing. Sometimes they were lost sequentially, and sometimes replies came through in between. The number of lost packets seems rather high to me, and so does the pattern, but as I said, I am not very experienced. Is that expected behaviour?

Also, on the AlmaLinux server I saw the corresponding bond interface going down, up, down again and finally coming up again. I would have expected it to go down and come up only once. Is that not how it should behave? I see similar behaviour on the TrueNAS box. The log file there mentions a possibly flapping interface, and the whole lagg interface is brought down for a short time.

I searched the forums and found several posts where people reported problems with MLAG. They say it has been broken from version 7.7 onwards. Someone said that it was fixed in 7.10, but I think that wasn’t an official statement.

Yesterday I upgraded to 7.11 and also set lacp-rate to 1sec on the bonding interfaces on both switches, but I still see that behaviour. Again, is that expected, or is MLAG still broken in 7.11? How can I find out more? Is there an official information channel besides the forums? So far I have only found the release notes, but I cannot tell from them.

Side questions:

  1. When I want to monitor the LACP status, I only see the interface of the local switch. Is there a way to see both in one command?

  2. The howto in the documentation states that hw offloading should be disabled. I have not seen that in other people's config snippets here in the forum.

  3. The howto also states that RSTP or STP should be used and not MSTP. I wonder if I need to configure something here, too. Again, I haven’t seen others dealing with that in their config snippets.
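
For reference, the three side questions map roughly onto the RouterOS commands below. This is only a sketch under assumptions, not a confirmed answer: it assumes that the "hw offloading" mentioned in the howto means L3 hardware offloading on the switch chip, that the bridge is named br1 as in the config further down, and that the monitor commands report only the switch they are run on, so they would have to be run once per switch.

```
# Question 1: LACP and MLAG state as seen from the local switch.
# Run the same commands on the peer switch to compare both sides.
/interface bonding monitor [find] once
/interface bridge mlag monitor

# Question 2 (assumption: the howto refers to L3 hardware offloading).
# Show the current switch chip settings, then disable L3 offloading.
/interface ethernet switch print
/interface ethernet switch set [find] l3-hw-offloading=no

# Question 3: check which spanning tree mode the bridge uses.
# A bridge added without protocol-mode normally defaults to rstp,
# so the explicit set below may be redundant.
/interface bridge print detail where name=br1
/interface bridge set br1 protocol-mode=rstp
```

Comparing the LACP system ID reported by the bonding monitor on both switches before and during a reboot test may help show whether the clients see a change that explains the repeated down/up transitions.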

Here is the config which I applied to both switches:

/system reset-configuration no-defaults=yes skip-backup=yes

/interface bonding
add mlag-id=1 mode=802.3ad name=uplink slaves=qsfpplus2-1
add mlag-id=10 mode=802.3ad name=bond_eon04-A slaves=sfp-sfpplus1
add mlag-id=11 mode=802.3ad name=bond_eon04-B slaves=sfp-sfpplus2
add mlag-id=12 mode=802.3ad name=bond_sm0104 slaves=sfp-sfpplus3
add mlag-id=13 mode=802.3ad name=bond_truenas01 slaves=sfp-sfpplus4

/interface bridge
add name=br1 vlan-filtering=yes
/interface bridge port
add bridge=br1 interface=uplink
add bridge=br1 interface=qsfpplus1-1 pvid=4000
add bridge=br1 pvid=20 interface=bond_eon04-A
add bridge=br1 pvid=20 interface=bond_eon04-B
add bridge=br1 pvid=20 interface=bond_sm0104
add bridge=br1 interface=bond_truenas01

/interface bridge mlag
set bridge=br1 peer-port=qsfpplus1-1

/interface bridge vlan
add bridge=br1 tagged=br1,qsfpplus1-1,uplink,bond_truenas01 untagged=bond_eon04-A,bond_eon04-B,bond_sm0104 vlan-ids=20
add bridge=br1 tagged=br1,qsfpplus1-1,uplink,bond_truenas01 untagged="" vlan-ids=201

/interface/bonding
set lacp-rate=1sec bond_eon04-A
set lacp-rate=1sec bond_eon04-B
set lacp-rate=1sec bond_sm0104
set lacp-rate=1sec bond_truenas01
set lacp-rate=1sec uplink

Thanks in advance and greetings
Timo

MLAG is non-functional. Don’t waste your time. They have no intention of fixing it.

Can you please expound upon that statement? I have set up MLAG on a couple of CRS326-24S+2Q+RM switches, starting with RouterOS 7.3, and have upgraded a couple of times since then. Thank you for your input!

http://forum.mikrotik.com/t/mlag-issue-mlag-functionality-flaps-lacp-system-id-of-secondary-when-primary-reboots/157426/1

Broken worthless crap.