MLAG and STP

Hello,

Got the setup below:

While the LACP (MLAG) to the Fortigates works perfectly, I’ve got an issue when it comes to connecting to the ISP. The problem is that the second I enable the second link, everything goes down.

If I have a single link active to the ISP, everything works as expected though. Does anyone have any idea what’s going on?

P.S.: I know that it would be best to LACP on the ISP side too, but that’s currently out of the question, unfortunately.

P.S.2.: I don’t know what equipment the ISP has there, and I don’t know if my devices are connected to a single switch / a chassis / two independent switches.

Thank you!

Without explicitly setting the bonding or aggregation capabilities on the ISP side, you’re likely creating a loop and shutting the ports down from either your end or the other end, or both. Never do that to an ISP (it breaks a lot of things when people loop our customer-facing networks up).
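To illustrate why the second link matters, here is a minimal sketch of flooding on plain bridges with no STP and no LAG. It is a toy model, not how any particular MLAG implementation handles its peer link; the names `sw1`, `sw2`, `isp` and the function `frames_in_flight` are made up for the example.

```python
from collections import defaultdict

def frames_in_flight(links, source, hops=10):
    """Flood one broadcast frame from `source` and count the copies
    still circulating after `hops` forwarding steps.

    Assumption: every node is a plain bridge that floods a frame out of
    every port except the one it arrived on (no STP, no LAG, no MLAG rules).
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # The source sends one copy out of each of its ports.
    in_flight = [(source, n) for n in neighbours[source]]
    for _ in range(hops):
        in_flight = [
            (here, nxt)
            for (prev, here) in in_flight
            for nxt in neighbours[here]
            if nxt != prev  # never send back out of the ingress port
        ]
    return len(in_flight)

# Hypothetical topologies: two local switches with an inter-switch link.
single_uplink = [("sw1", "sw2"), ("sw1", "isp")]
dual_uplink = [("sw1", "sw2"), ("sw1", "isp"), ("sw2", "isp")]

print(frames_in_flight(single_uplink, "sw1"))
print(frames_in_flight(dual_uplink, "sw1"))
```

With the single uplink the frame dies out after one hop. With both uplinks the three devices form a ring, so copies keep circulating until something blocks a port or shuts it down.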

The better way to make this work is to have a single switch connected to the ISP, then have that switch run an LACP LAG into your MLAG stack. That’s what I’ve done in a couple of places where I only have a single connection in/out of the redundant portion of the network. Sure, that switch becomes a single point of failure, but chances are the ISP’s CPE is also a single point of failure.

Well, does this mean that if I have MLAG enabled on a bridge, all the redundant connections (2 x links to a device) need to be LACP?

Why doesn't STP work like it would normally work with a single switch? From what I know (I might be wrong), an MLAG makes both switches identify themselves with the same bridge ID, so in theory, the ISP should see two links going to the same “device”, so it should disable one interface, right?

Also, I guess that the correct way to fix this is to have an LACP to the ISP (already asked if that's supported), right?

P.S. Can't I have my STP instance block one of the ports to the ISP?! I mean, just like the question above, the ISP either has a single switch, so STP should see the loop, or there are two switches interconnected, where STP should also work fine.

Thank you, I really appreciate the feedback.

Personally I’ve not had much luck relying on (R)STP to reliably handle any kind of failover. Not that it can’t work, but it quickly becomes a rat’s nest and doesn’t scale well. On the ISP network, I usually do that at Layer 3. If it’s a Layer 2 thing, I do what I can with LACP. I do have a couple of cases where the client (VMware ESXi) handles it, or where we do an active/standby bond. But I control both ends of all those links, which is not the situation in this case.

STP only works when all bridges (switches or otherwise) are participating in STP. That is, if all the ports can see STP packets coming and going, they can build up an internal idea of what the switched network looks like. But ISPs typically drop/ignore STP packets coming from customer ports for security and stability. Therefore, your MLAG stack won’t see/know that it’s got multiple ports plugged into the same system, unless it is using loop detection, where it “sees” the same MAC address coming from multiple ports. That can cause all kinds of undesirable behavior, ranging from sporadic disruption to complete shutdown of the network.
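As a rough idea of what that kind of loop detection is looking at, here is a small sketch that counts how often a source MAC moves between ports. It is only an illustration: the function name `find_flapping_macs`, the threshold, and the sample data are assumptions, and real switches implement this in their own way.

```python
def find_flapping_macs(observations, threshold=3):
    """Return the MACs that moved between ports at least `threshold` times.

    `observations` is an ordered list of (source_mac, ingress_port) pairs,
    i.e. what a bridge would learn from the frames it receives.
    """
    last_port = {}  # MAC -> port it was last learned on
    moves = {}      # MAC -> number of times it changed port
    for mac, port in observations:
        if mac in last_port and last_port[mac] != port:
            moves[mac] = moves.get(mac, 0) + 1
        last_port[mac] = port
    return {mac for mac, count in moves.items() if count >= threshold}

# Hypothetical sample: one MAC bouncing between two ISP-facing ports,
# and one host that stays put on a LAN port.
seen = [
    ("aa:aa:aa:aa:aa:aa", "isp1"),
    ("aa:aa:aa:aa:aa:aa", "isp2"),
    ("bb:bb:bb:bb:bb:bb", "lan1"),
    ("aa:aa:aa:aa:aa:aa", "isp1"),
    ("aa:aa:aa:aa:aa:aa", "isp2"),
    ("bb:bb:bb:bb:bb:bb", "lan1"),
]
print(find_flapping_macs(seen))
```

A MAC that keeps hopping between the two ISP-facing ports is the usual sign that the same frames are arriving over both links.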
