I have a question about a planned MLAG setup with CRS518.
I would like to connect some PVE clusters that are located in two different locations.
Each node is connected to both switches on its side.
There should be two edge-disjoint dark fibre 100G connections from site to site.
Is my drawing correct? Can this be implemented with MikroTik switches using MLAG? The examples in the documentation are not entirely applicable to this case.
If you have just two servers, it's not worth the complexity of an MLAG config on the switches.
Simpler Solution:
You can make it work without the “ICCP” links between the switches. On the server side, configure a static LAG bond0 interface (algo: balance-rr, a Linux-specific mode). You don’t need any LAG/LACP configuration on the switch side.
The short answer to your question is that it should work and your diagram looks good.
I did a similar configuration in the core of my network, with two CRS317s in one MLAG setup feeding six routers, and another MLAG pairing with a 312 and 354 feeding my home/office router (a 2116). Then I connected the two MLAG pairs to each other using LACP and it works great.
One thing is missing. You haven’t marked the blue links as a LAG with an MLAG ID (it can be the same for both sides if you like).
The blue links themselves are a LAG between the MLAG pairs. To the switches, they will need to be configured just like the links going to the PVE stacks.
In an MLAG setup, the interconnect between the cooperating switches is not an active/backup kind of link. It's the ICCP link (see the MLAG manual), which is always active and passes traffic between the switches that wouldn't otherwise reach its destination. For example: a server has an LACP connection to two MLAG-configured switches (the server is not aware of MLAG; it uses the LACP bond as if all member links were connected to a single switch), while a client is simply connected to one of the switches. The server might then send frames for the client via the "wrong" LACP link, and the ICCP link between the switches is used to pass those frames towards their destination. This means that (depending on the physical layout of all connected devices) a considerable portion of switched traffic may pass over the ICCP link, so the ICCP link has to be a fast one.
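To put a rough number on that, here is a toy model in Python. It is only a sketch: `server_tx_link` is a made-up stand-in for the server's Tx hash (not what Linux or any switch chip actually computes), and it assumes one dual-homed server, one client single-homed on switch A, and 1000 independent flows.

```python
import hashlib

def server_tx_link(flow_id: int, n_links: int = 2) -> int:
    """Hypothetical server-side Tx hash: map a flow to one bond member link."""
    digest = hashlib.sha256(str(flow_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links

# Link 0 goes to switch A, link 1 goes to switch B.
# The client is single-homed on switch A.
CLIENT_SWITCH = 0

flows = range(1000)
# A flow hashed onto the link to switch B has to cross the peer link
# before it can reach the client on switch A.
via_peer_link = sum(1 for f in flows if server_tx_link(f) != CLIENT_SWITCH)
share = via_peer_link / len(flows)
print(f"{share:.0%} of flows cross the ICCP/peer link")
```

In this model roughly half of the flows towards the single-homed client end up on the peer link, which is why its bandwidth matters.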
Hey, I did a similar thing.
I have two server rooms, each with a pair of servers. Each server has 4 x 25 Gbps ports, and these are connected through MLAG (Multi-chassis Link Aggregation Group). When I perform an iperf3 test between the server rooms, I notice that MLAG is working, as traffic is sent from two switches and received on one bonding interface. The traffic distribution works, but it’s not as efficient as expected.
Traffic TX (transmitting) is only sent through one bonding interface.
Traffic RX (receiving) is distributed across two bonding interfaces.
This same behavior occurs when switching the test direction – the traffic sent through one bonding interface on one side is received across two bonding interfaces on the other side.
Even though I am using LACP with Layer 3 + Layer 4 hashing, the distribution of traffic is still uneven. The TX traffic always favors a single bond, while RX is properly spread across two bonds.
But when I run the test between two servers in the same server room, it works well, even if, let's say, one server has only 2 x SFP28 links: I can achieve 50 Gbps.
As far as Rx goes, it’s up to the transmitter (i.e. the pair of switches in the MLAG config) to decide which particular LACP link to use according to its local configuration (I certainly hope that an MLAG switch will use the “local” link instead of passing traffic via ICCP to the other MLAG switch). Note that LACP bond partners can work with different Tx hash settings and algorithms … which is absolutely fine, the Tx hash algorithm is simply about spreading Tx traffic between the available physical links and has nothing to do with Rx.
As far as Tx goes, it’s up to the Tx hash, which selects the physical LACP link used to transmit each Ethernet frame. But hashing doesn’t guarantee even distribution over all available links. E.g. with 2 links and L3+L4 hashing, the hash function takes dst-address, src-address, dst-port and src-port and computes a hash over this “quadruplet”. As long as the resulting hash is e.g. odd, all frames will use link #0 (and likewise link #1 when the hash is even). Changing any of those 4 values by a single digit doesn’t guarantee that the hash will flip between {odd,even}. However, it’s about statistics: when there are many connections going on simultaneously, traffic will likely be distributed more evenly. OTOH, all frames belonging to the same L4 connection will always pass over the same physical link.
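A small Python sketch of that behaviour, in case it helps. The hash here (`pick_link`, built on SHA-256) is purely illustrative and is an assumption of mine; real bonding drivers and switch chips use their own, much simpler functions. The IP addresses and ports are made up too.

```python
import hashlib
from collections import Counter

def pick_link(src_ip, dst_ip, src_port, dst_port, n_links=2):
    """Toy L3+L4 Tx hash: map a flow 4-tuple to a member link index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

# One connection (e.g. a single iperf3 stream): every frame has the same
# 4-tuple, so every frame is sent on the same physical link.
single = {pick_link("10.0.0.1", "10.0.1.1", 40000, 5201) for _ in range(1000)}

# Many connections (here: 1000 different source ports) spread out
# statistically, but there is no guarantee of an exact 50/50 split.
many = Counter(
    pick_link("10.0.0.1", "10.0.1.1", 40000 + i, 5201) for i in range(1000)
)

print("links used by one flow:", len(single))
print("flows per link with many connections:", dict(many))
```

This is also why a single-stream iperf3 test never exceeds one member link; running several parallel streams gives the hash something to distribute.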
Also beware that with L2 Tx hashing, traffic going via a gateway always has the router’s MAC address as its destination … which means that traffic from an LACP host (e.g. a server) towards the internet will use a single physical link. So when configuring a server that will communicate with clients via a gateway, it’s very sensible to use L3 hash or (even better) L3+L4 hash.
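The gateway effect can be shown with the same kind of toy hash. Again this is only a sketch under assumptions: `pick_link` is a hypothetical hash function, and the MAC addresses, IP addresses and ports are invented for the example.

```python
import hashlib

def pick_link(*fields, n_links: int = 2) -> int:
    """Toy Tx hash over whatever header fields the policy looks at."""
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

SERVER_MAC = "02:00:00:00:00:01"
GATEWAY_MAC = "02:00:00:00:00:fe"   # every routed frame is addressed to this MAC
SERVER_IP = "10.0.0.10"

# 100 remote clients behind the gateway, each with its own IP and port.
clients = [(f"203.0.113.{i}", 50000 + i) for i in range(1, 101)]

# L2 policy: only the MAC pair is hashed, and it is identical for all clients.
l2_links = {pick_link(SERVER_MAC, GATEWAY_MAC) for _ in clients}

# L3+L4 policy: IPs and ports differ per client, so the hash varies.
l3l4_links = {pick_link(SERVER_IP, ip, 443, port) for ip, port in clients}

print("L2 hash uses links:", sorted(l2_links))
print("L3+L4 hash uses links:", sorted(l3l4_links))
```

With the L2 policy all routed traffic stays on one link no matter how many clients there are, while the L3+L4 policy can use both.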
I'm not sure if it is correct. I have exactly the same setup as in the picture.
But the ICCP link on Site B is "alternate" on the bridge, and the root bridge is shown as the MLAG ID on Site A.
BTW: In the MLAG documentation, they say that hardware offloading must be disabled.
Do you know if this happens automatically? Should I disable HW offloading on the whole bridge, only on the MLAG uplinks, or only on the ICCP link?
I have done:
/interface/ethernet/switch/port set [find] l3-hw-offloading=yes
I don’t understand what you mean by “You haven’t marked the blue links as a LAG with an MLAG ID”.
The blue lines are normal interfaces (in this case sfp-sfplus1) without MLAG or LAG - right?
The role of the blue links depends on the particular setup. If nothing special is done about them, then they are active/backup, handled by xSTP. If they are configured as LACP bonds (between both pairs of MLAG-configured switches), then the way they are used depends on how the switches operate ... but they will likely be used in parallel (reducing the need to pass huge amounts of traffic over both ICCP links).
Now it's up and running. My mistake was not using MLAG between the two sites. I thought normal links would be OK, but then one of the lines gets selected as alternate.
When I use MLAG between the sites, all links are designated.