Is my planned MLAG setup correct?

Hi there,

I have a question about a planned MLAG setup with CRS518 switches.

I would like to connect some PVE clusters that are located in two different locations.
Each node is connected to both switches on its side.
There should be two edge-disjoint dark fibre 100G connections from site to site.

Is my drawing correct? Can this be implemented on the MikroTik switches using MLAG? The examples in the documentation do not quite cover this case.
[Attached image: HA.drawio(1).png]

If you have just two servers, it’s not worth the complexity of an MLAG config on the switches.

Simpler solution:
You can make it work without the “ICCP” links between the switches. On the server side, configure a static LAG interface (bond0) in balance-rr mode, which is Linux-specific. No LAG/LACP configuration is needed on the switch side.

I wouldn’t recommend static bonding. LACP takes just a few more lines of config and provides active link monitoring.

The short answer to your question is that it should work and your diagram looks good.

I did a similar configuration in the core of my network, with two CRS317s in one MLAG setup feeding six routers, and another MLAG pair made up of a 312 and a 354 feeding my home/office router (a 2116). Then I connected the two MLAG pairs to each other using LACP, and it works great.

One thing is missing. You haven’t marked the blue links as a LAG with an MLAG ID (it can be the same for both sides if you like).

The blue links themselves are a LAG between the MLAG pairs. To the switches, they will need to be configured just like the links going to the PVE stacks.

Keep in mind that MLAG is hopelessly broken on MikroTik.

I'm planning a similar setup. One question about your picture: the blue and red connections result in a loop, so one link is “alternate”, right?

In an MLAG setup, the interconnect between the cooperating switches is not an active/backup kind of link. It's the ICCP link (see the MLAG manual), which is always active and passes traffic between the switches that wouldn't otherwise reach its destination. For example: a server has an LACP connection to two MLAG-configured switches (the server is not aware of MLAG; it uses the LACP bond as if all member links were connected to a single switch), while a client is simply connected to one of the switches. The server might then send frames for the client via the "wrong" LACP link, and the ICCP link between the switches is used to pass those frames towards their destination. Depending on the physical layout of all connected devices, a considerable portion of switched traffic may therefore cross the ICCP link, which in turn means that the ICCP link has to be relatively fast.

Hey, I did a similar thing.
I have two server rooms, each with a pair of servers. Each server has 4 x 25 Gbps ports, and these are connected through MLAG (Multi-chassis Link Aggregation Group). When I perform an iperf3 test between the server rooms, I notice that MLAG is working, as traffic is sent from two switches and received on one bonding interface. The traffic distribution works, but it’s not as efficient as expected.

Traffic TX (transmitting) is only sent through one bonding interface.
Traffic RX (receiving) is distributed across two bonding interfaces.
This same behavior occurs when switching the test direction – the traffic sent through one bonding interface on one side is received across two bonding interfaces on the other side.
Even though I am using LACP with Layer 3 + Layer 4 hashing, the distribution of traffic is still uneven. The TX traffic always favors a single bond, while RX is properly spread across two bonds.

But when I run the test within the same server room between two servers, where, say, one server has only 2 x SFP28 links, it works well and I can achieve 50 Gbps.

As far as Rx goes, it’s up to the transmitter (i.e. the pair of switches in the MLAG config) to decide which particular LACP link to use, according to its local configuration (I certainly hope that an MLAG switch will use the “local” link instead of passing traffic via ICCP to the other MLAG switch). Note that LACP bond partners can work with different Tx hash settings and algorithms, which is absolutely fine: the Tx hash algorithm is simply about spreading Tx traffic across the available physical links and has nothing to do with Rx.

As far as Tx goes, it’s up to the Tx hash, which selects the physical LACP link used to transmit each Ethernet frame. But hashing doesn’t guarantee even distribution over all available links. E.g. with 2 links and L3+L4 hashing, the hash function takes dst-address, src-address, dst-port and src-port and computes a hash over this “quadruplet”. As long as the resulting hash is, say, odd, all frames will use link #0 (and likewise link #1 when the hash is even). Changing any of those 4 values by a single digit doesn’t guarantee that the hash will flip between {odd,even}. However, it’s about statistics: when many connections are active simultaneously, traffic will likely be distributed more evenly. OTOH, all frames belonging to the same L4 connection will always use the same physical link.

Also beware that with L2 Tx hashing, traffic going via a gateway is always hashed on the router’s MAC address, which means that traffic from an LACP host (e.g. a server) towards the internet will use a single physical link. So when configuring a server that will communicate with clients via a gateway, it’s very sensible to use an L3 hash or (even better) an L3+L4 hash.

Thank you!

I'm not sure that is correct. I have exactly the same setup as in the picture.
But the ICCP link on Site B is "alternate" on the bridge, and the root bridge is the MLAG ID of Site A.

BTW: the MLAG documentation says that hardware offloading must be disabled.
Do you know if this happens automatically? Should I disable HW offloading on the whole bridge, only on the MLAG uplinks, or only on the ICCP link?
I have done:

/interface/ethernet/switch/port set [find] l3-hw-offloading=yes

but I get the same alternate-port issue on the ICCP link.


Best Regards,
YAN
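
A note on the offloading question: as far as the MLAG documentation reads, the requirement concerns L3 hardware offloading, and that is toggled on the switch chip as a whole rather than per bridge or per port. A minimal sketch, assuming a single switch chip at index 0 (an assumption, so check the print output on your own unit first):

```
# Show the switch chip(s) and their current settings.
/interface/ethernet/switch print
# Disable L3 HW offloading for the whole chip; index 0 is an assumption.
/interface/ethernet/switch set 0 l3-hw-offloading=no
```

Bridge (L2) hardware offloading is a separate feature and is not affected by this. Note that setting l3-hw-offloading=yes on the switch ports, as in the command quoted above, does not disable anything. None of this is likely to change the alternate-port role, which is assigned by RSTP.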

I don’t understand what you mean by “You haven’t marked the blue links as a LAG with an MLAG ID”.
The blue lines are normal interfaces (in this case sfp-sfplus1) without MLAG or LAG - right?

Hi!

My problem is the alternate port. In my opinion, there should be no alternate port because of the ICCP link.
Is that right?

Same as in the picture, except that I use QSFP ports instead of SFP.

qsfp28-1-1 is the ICCP peer port on the same site.
qsfp28-2-1 is the connection to the other site.

ICCP VLAN 777
Test-VLAN 1700

Config:

#Site 1 - Switch 1
/interface bridge
add frame-types=admit-only-vlan-tagged name=bridge1 pvid=1700 vlan-filtering=yes
/interface bridge mlag
set bridge=bridge1 peer-port=qsfp28-1-1
/interface bridge port
add bridge=bridge1 interface=qsfp28-1-1 pvid=777
add bridge=bridge1 frame-types=admit-only-vlan-tagged interface=qsfp28-2-1 pvid=1700
/interface bridge vlan
add bridge=bridge1 untagged=qsfp28-1-1 vlan-ids=777
add bridge=bridge1 tagged=qsfp28-1-1,qsfp28-2-1 vlan-ids=1700
/system identity
set name=site1-net-csw1

[admin@site1-net-csw1] > /interface/bridge/port monitor numbers=0,1
               interface: qsfp28-1-1               qsfp28-2-1              
                  status: in-bridge                in-bridge               
             port-number: 2001                     1                       
                    role: root-port                alternate-port          
               edge-port: no                       no                      
     edge-port-discovery: yes                      yes                     
     point-to-point-port: yes                      yes                     
            external-fdb: no                       no                      
            sending-rstp: yes                      yes                     
                learning: yes                      no                      
              forwarding: yes                      no                      
        actual-path-cost: 200                      200                     
          root-path-cost: 200                      200                     
       designated-bridge: 0x8000.D4:01:C3:F3:4B:7F 0x8000.D4:01:C3:F3:4B:7F
         designated-cost: 0                        0                       
  designated-port-number: 2                        2002                    
        hw-offload-group: switch1                  switch1     
[admin@site1-net-csw1] > /interface/bridge/mlag/monitor
       status: connected        
    system-id: D4:01:C3:F3:55:C0
  active-role: secondary

#Site 1 - Switch 2
/interface bridge
add frame-types=admit-only-vlan-tagged name=bridge1 pvid=1700 vlan-filtering=yes
/interface bridge mlag
set bridge=bridge1 peer-port=qsfp28-1-1
/interface bridge port
add bridge=bridge1 interface=qsfp28-1-1 pvid=777
add bridge=bridge1 frame-types=admit-only-vlan-tagged interface=qsfp28-2-1 pvid=1700
/interface bridge vlan
add bridge=bridge1 untagged=qsfp28-1-1 vlan-ids=777
add bridge=bridge1 tagged=qsfp28-1-1,qsfp28-2-1 vlan-ids=1700
/system identity
set name=site1-net-csw2

[admin@site1-net-csw2] > /interface/bridge/port monitor numbers=0,1
               interface: qsfp28-1-1      qsfp28-2-1              
                  status: in-bridge       in-bridge               
             port-number: 1               2                       
                    role: designated-port root-port               
               edge-port: no              no                      
     edge-port-discovery: yes             yes                     
     point-to-point-port: yes             yes                     
            external-fdb: no              no                      
            sending-rstp: yes             yes                     
                learning: yes             yes                     
              forwarding: yes             yes                     
        actual-path-cost: 200             200                     
          root-path-cost:                 200                     
       designated-bridge:                 0x8000.D4:01:C3:F3:4B:7F
         designated-cost:                 0                       
  designated-port-number:                 2                       
        hw-offload-group: switch1         switch1 
[admin@site1-net-csw2] > /interface/bridge/mlag/monitor
       status: connected        
    system-id: D4:01:C3:F3:55:C0
  active-role: primary
  
#Site 2 - Switch 3
/interface bridge
add frame-types=admit-only-vlan-tagged name=bridge1 pvid=1700 vlan-filtering=yes
/interface bridge mlag
set bridge=bridge1 peer-port=qsfp28-1-1
/interface bridge port
add bridge=bridge1 interface=qsfp28-1-1 pvid=777
add bridge=bridge1 frame-types=admit-only-vlan-tagged interface=qsfp28-2-1 pvid=1700
/interface bridge vlan
add bridge=bridge1 untagged=qsfp28-1-1 vlan-ids=777
add bridge=bridge1 tagged=qsfp28-1-1,qsfp28-2-1 vlan-ids=1700
/system identity
set name=site2-net-csw3

[admin@site2-net-csw3] > /interface/bridge/port monitor numbers=0,1
            interface: qsfp28-1-1      qsfp28-2-1     
               status: in-bridge       in-bridge      
          port-number: 2001            2002           
                 role: designated-port designated-port
            edge-port: no              no             
  edge-port-discovery: yes             yes            
  point-to-point-port: yes             yes            
         external-fdb: no              no             
         sending-rstp: yes             yes            
             learning: yes             yes            
           forwarding: yes             yes
     actual-path-cost: 200             200
     hw-offload-group: switch1         switch1        
[admin@site2-net-csw3] > /interface/bridge/mlag/monitor 
       status: connected        
    system-id: D4:01:C3:F3:4B:7F
  active-role: secondary

#Site 2 - Switch 4
/interface bridge
add frame-types=admit-only-vlan-tagged name=bridge1 pvid=1700 vlan-filtering=yes
/interface bridge mlag
set bridge=bridge1 peer-port=qsfp28-1-1
/interface bridge port
add bridge=bridge1 interface=qsfp28-1-1 pvid=777
add bridge=bridge1 frame-types=admit-only-vlan-tagged interface=qsfp28-2-1 pvid=1700
/interface bridge vlan
add bridge=bridge1 untagged=qsfp28-1-1 vlan-ids=777
add bridge=bridge1 tagged=qsfp28-1-1,qsfp28-2-1 vlan-ids=1700
/system identity
set name=site2-net-csw4

[admin@site2-net-csw4] > /interface/bridge/port monitor numbers=0,1
            interface: qsfp28-1-1      qsfp28-2-1     
               status: in-bridge       in-bridge      
          port-number: 1               2              
                 role: designated-port designated-port
            edge-port: no              no             
  edge-port-discovery: yes             yes            
  point-to-point-port: yes             yes            
         external-fdb: no              no             
         sending-rstp: yes             yes            
             learning: yes             yes            
           forwarding: yes             yes            
     actual-path-cost: 200             200            
     hw-offload-group: switch1         switch1         
[admin@site2-net-csw4] > /interface/bridge/mlag/monitor 
       status: connected        
    system-id: D4:01:C3:F3:4B:7F
  active-role: primary

Best Regards,
YAN

But one of the blue links is an alternate port?

The role of the blue links depends on the particular setup. If nothing special is done about them, they are active/backup, handled by xSTP. If they are configured as LACP bonds (between both pairs of MLAG-configured switches), the way they are used depends on how the switches operate, but they will likely be used in parallel (reducing the need to pass huge amounts of traffic over both ICCP links).

Hi mkx,

It's now up and running. My mistake was not using MLAG between the two sites. I thought normal links were OK, but then one of the links gets selected as alternate.
When I use MLAG between the sites, all links are designated 😎

FYI: I'm using RouterOS 7.17beta4.
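
For anyone landing here later, the change described above could look roughly like this on Site 1 - Switch 1. This is a sketch derived from the config posted earlier in the thread, not the poster's actual final config: the bond name bond-site2 and mlag-id=20 are made-up values. The same commands, with the same mlag-id, would be needed on the peer switch of the pair, and the equivalent on both Site 2 switches.

```
# Take the inter-site port out of the bridge so it can become a bond member.
/interface bridge port
remove [find interface=qsfp28-2-1]
# LACP bond over the inter-site link; the mlag-id has to match on both
# switches of the MLAG pair. bond-site2 and mlag-id=20 are hypothetical.
/interface bonding
add name=bond-site2 mode=802.3ad mlag-id=20 slaves=qsfp28-2-1
# Add the bond to the bridge in place of the physical port.
/interface bridge port
add bridge=bridge1 interface=bond-site2 frame-types=admit-only-vlan-tagged pvid=1700
# VLAN 1700 is now tagged on the ICCP peer port and on the bond.
/interface bridge vlan
set [find vlan-ids=1700] tagged=qsfp28-1-1,bond-site2
```

With the bond carrying the same mlag-id on both switches of a pair, the two inter-site links should behave as one LACP LAG between the MLAG pairs, so RSTP has no second path to put into the alternate role. It is probably wise to make this change from a console or in safe mode, since it briefly removes the inter-site port from the bridge.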