I have 2 brand new, unconfigured CRS326-24S+2Q+RM switches running v7.9.1 stable and I have been trying to get MLAG to work. I followed the guide here:
but the resulting configuration was very unstable. Running a constant ping between 2 hosts, I tried disconnecting and reconnecting random ports and it would work sometimes and sometimes not.
Eventually, I was able to figure out a configuration that appears to be working 100%. The key was to create a UNIQUE MLAG ID for each client bond, instead of using the same one for all bonds. Using this configuration, I am able to plug and unplug any hosts to/from any ports and have full redundancy -- but I have 2 questions:
Is this configuration correct?
In the current configuration, I am using port qsfpplus1-1 as the uplink/peer port (40Gbps). What I would really like to do is create a LACP bond using qsfpplus1-1 and qsfpplus2-1 and use that bond as the peer uplink. However, if I do this, the MLAG does not work any more.
My working config is below. Can someone confirm that this is correct, and can anyone explain why it won't work if I LACP my peer ports?
For reference, the hosts are Proxmox (Linux) with 2x ports in LACP.
I got it to work (at least it seems to be working!) You have to change the MLAG priority on one of the peers so they are not the same. That will make one a primary.
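A minimal sketch of that priority change, assuming your RouterOS version exposes a priority setting under the MLAG menu and that the lower value is preferred when the primary is chosen (check the MLAG documentation for your release). The value 10 is just an illustration:

```routeros
# Run on ONE switch only (the one you want to become primary).
# Assumption: "priority" is available in your RouterOS version and a lower
# number means higher priority; the other switch keeps its default value.
/interface bridge mlag
set priority=10
```

Leave the second switch untouched so that the two priorities differ.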
1. Reset both switches to a blank config. I connect to the first copper Ethernet port on each using Winbox to maintain a connection while setting the switches up.
2. If you want more than 1/10/40Gbps between the switches, create a bond interface using whichever two, three, four, etc. physical interfaces you wish to be bonded. Make the settings identical on both sides: 802.3ad, LACP rate 1s (30s would be fine, but I use 1s), and Layer 3 & 4 for hashing (better load balancing that way).
3. Create a bridge, then enable MLAG on both switches, using your chosen peer port: typically the fastest physical port on the switch, or an LACP bond (created in Step 2).
4. Add the peer port to the bridge, ensuring that the peer port’s PVID is NOT VLAN 1 (anything but 1 is fine). Make sure the peer port is set to allow all frame types (tagged and untagged).
5. Tag VLAN 1 on the peer port on both sides to ensure untagged traffic flows properly. Do NOT tag the VLAN you selected as the peer port PVID on any other port, ever.
6. Make sure the STP settings on both bridges are identical.
7. Add your other ports to the bridge as you need them, being sure to tag any VLANs on the necessary physical ports and ALWAYS on the peer port.
8. If you are adding LAGs to the MLAG stack (which is kind of the point), as the OP found out, the MLAG ID must be UNIQUE per LAG client. You can have as many links in a LAG as you want, as long as they all have the same ID, but different clients MUST have different IDs.
I personally found issues with more than two links (one per switch) on versions between 7.15.3 and 7.19.x, so use 7.15.3 or 7.19.3 for maximum stability. I have some devices with four links (two per switch) in some of my MLAG stacks and it works fine on those versions (7.15 or 7.19).
You don’t have to have two links per downstream device. One will work, but you won’t have any redundancy to that device if the particular switch it’s connected to goes down. Yet, for 99% of what most of us do in home labs, that’s fine.
These are switches. Don’t try to do any fancy routing or additional CPU-bound work (unless you’re doing MLAG with a pair of 2116/2216 devices, but even then, it can make things complicated; and L3HW offload is not supported with MLAG anyway). It is likely to cause confusion and violates the concept of separation of functions. Use routers for routing and switches for switching at this point, even in a home lab.
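The peer bond, bridge, and MLAG steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the bond name bond-mlag-peer is hypothetical, the member ports qsfpplus1-1 and qsfpplus2-1 are the ones the OP mentioned, and rstp is just one STP mode that happens to be set identically on both switches. The peer port's bridge port and VLAN entries follow the same pattern as in the example further down, with the bond name in place of the single port.

```routeros
# Run the same commands on both switches.

# Peer link as an LACP bond (hypothetical name; NO mlag-id on the peer bond,
# mlag-id is only for client-facing bonds)
/interface bonding
add name=bond-mlag-peer mode=802.3ad lacp-rate=1sec \
    transmit-hash-policy=layer-3-and-4 slaves=qsfpplus1-1,qsfpplus2-1

# Bridge with the same STP mode on both switches
/interface bridge
add name=bridge protocol-mode=rstp

# Enable MLAG and point it at the peer link
/interface bridge mlag
set bridge=bridge peer-port=bond-mlag-peer

# Assumption: VLAN filtering is switched on last, once the bridge ports and
# VLAN table are in place, so you do not cut off your own management access
/interface bridge
set bridge vlan-filtering=yes
```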
This is an example from an MLAG pair of switches with both QSFP+ and SFP+ ports.
For the most part, the config is identical on both switches in the stack, except for devices that have only one link into the stack.
# The bond config could/should be mostly identical
# MLAG-ID's are the way the two switches determine which connections belong to which devices
# I use 1sec LACP rate and L3-4 for the hash; defaults are 30sec and Layer 2
/interface bonding
add lacp-rate=1sec mlag-id=101 mode=802.3ad name=bond-1-router slaves=sfp-sfpplus1-router transmit-hash-policy=layer-3-and-4
add lacp-rate=1sec mlag-id=102 mode=802.3ad name=bond-2-server slaves=sfp-sfpplus2-server transmit-hash-policy=layer-3-and-4
Step 7 from above:
Add switch ports to the bridge/MLAG
/interface bridge port
# PVID has to be the MLAG peer VLAN
# Can be any valid VLAN ID; I don't use 2-10 anywhere on my networks
add bridge=bridge interface=qsfpplus2-mlag-peer pvid=2
# Default PVID is 1 if you don't specify it
add bridge=bridge interface=bond-1-router
add bridge=bridge interface=bond-2-server
# These two are only on Switch 1:
add bridge=bridge interface=sfp-sfpplus3-lone-device
# For this machine we don't want to allow the native VLAN
add bridge=bridge interface=sfp-sfpplus8-macpro-esxi frame-types=admit-only-vlan-tagged
Tag the VLANs to their respective ports
/interface bridge vlan
# VLAN 1: untagged on whatever ports you want to be part of the "native" VLAN,
# and tagged across the MLAG peer link
add bridge=bridge vlan-ids=1 tagged=qsfpplus2-mlag-peer untagged=bridge,bond-1-router,bond-2-server,sfp-sfpplus3-lone-device
# VLAN 2: MLAG peer VLAN, untagged on the peer link and not tagged to anything else
add bridge=bridge vlan-ids=2 untagged=qsfpplus2-mlag-peer
# Example VLANs: 981-985 need to go between a Mac Pro running ESXi and a Linux server with a LAG into the stack
# On Switch 1: Mac Pro ESXi has only one port, which is connected to this switch;
# tag its port, the peer port, and the other server's port
add bridge=bridge vlan-ids=981-985 tagged=qsfpplus2-mlag-peer,bond-2-server,sfp-sfpplus8-macpro-esxi
# On Switch 2: Only the server has a link to both switches, so just tag the peer and the server
add bridge=bridge vlan-ids=981-985 tagged=qsfpplus2-mlag-peer,bond-2-server
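Once both switches are configured, a quick status check along these lines may help confirm that the pair has formed. This is a sketch, not part of the original config; the exact fields shown can differ between RouterOS versions, and bond-2-server is simply the bond name from the example above.

```routeros
# MLAG state as seen by this switch (peer connection status and role)
/interface bridge mlag monitor

# LACP state of one client bond; "once" prints a single snapshot
/interface bonding monitor bond-2-server once

# VLAN table, to confirm the peer port is tagged in every VLAN you expect
/interface bridge vlan print
```

Running a constant ping between two hosts while unplugging links, as the OP did, remains the practical test of redundancy.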
I think I am following your example. Here is the config from one of the switches. Does this look correct? Am I missing anything? It seems to work. I have 5 servers connected to 2 CRS520-4XS-16Q-RM 100 Gb switches. I don’t have any VLANs other than the PVID of 99 on the ICCP link.
I think the CRS devices with Marvell Prestera switch chips all have an L2MTU limit of 10218 (you can check by running /interface print and looking at the MAX-L2MTU column), and according to MikroTik’s video on L2MTU there’s probably no difference in hardware resource usage between l2mtu=9000 and l2mtu=10218 (the same amount of buffer memory will be used).
However, if you limit the L2MTU to 9000 on the qsfp28 ports like in your current config, and then have VLANs on them, then those VLANs won’t be able to reach an MTU of 9000. It might be better if you set l2mtu=10218 for the ports under /interface ethernet.
The point is to set them as high as the hardware will allow so that 9000-byte packets will traverse the MLAG stack. If you put in a value higher than the hardware will allow, it will auto-adjust it to the maximum for you. I usually put in 12000, which pretty much covers anything MikroTik makes.
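As a rough sketch of that change, assuming the ports in question all have names starting with qsfp28 (adjust the match to your own port names) and using the 10218 limit mentioned above:

```routeros
# Raise L2MTU on every port whose name starts with "qsfp28".
# Assumption: 10218 is the hardware maximum on this model; compare it with
# the MAX-L2MTU column before and after the change.
/interface ethernet
set [find where name~"^qsfp28"] l2mtu=10218

# Check the L2MTU and MAX-L2MTU columns
/interface print
```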