Avoiding VPLS Routing Loop without Spanning Tree

I had a topology like this:

SW1 --- Mikrotik A ----- Mikrotik C --- SW3
 |                          |
SW2 --- Mikrotik B ---------+

The network was actually a bit more complex than this - but it’s the general idea - the two Mikrotiks (A & B) on the left were intended to be redundant. I wanted to use VPLS to bridge SW1/SW2 with SW3. There is an obvious loop, and the obvious way to avoid problems would be to enable spanning tree.

However, something between SW1 and SW2 ate spanning tree packets, and I eventually gave up on getting them through. So I came up with this ugly hack, which might be useful to someone else. That said, it still carries some risks, so use with care. It’s definitely not the right way to do this.

Basically, I disable the VPLS BGP entry on either Mikrotik A or Mikrotik B when the other is active. I can figure out which one is active based on the VRRP state. So if Mikrotik A is the master in VRRP, it will have active VPLS tunnels, while Mikrotik B will have them disabled.

The heart of it is this script:

# Bring VPLS up or down based on VRRP state

# We look for VPLS interfaces in bridges by looking for /interface vpls bgp-vpls
# entries where bridge-horizon=1, so use a different bridge horizon for entries
# you don't want impacted by this script.

# Name of the VRRP interface that decides which router is "VPLS Master"
:local vrrp "vrrp209";

:if ( [ /interface vrrp find where master=yes running=yes name=$vrrp ] != "" ) do={
    # We are *UP*
    /interface vpls bgp-vpls enable [ find where bridge-horizon=1 disabled=yes ]
} else={
    # We are *DOWN*
    /interface vpls bgp-vpls disable [ find where bridge-horizon=1 disabled=no ]
};

If this script runs while vrrp209 is the running VRRP master, it enables every /interface vpls bgp-vpls entry with bridge-horizon=1. Otherwise, it disables every such entry. I can create entries with bridge-horizon=2 (for instance) if I don’t want them impacted by this script.

This script is executed at startup, and whenever VRRP goes up or down (via an on-master or on-backup in the /int vrrp entry). The VRRP is really just used as a semaphore to determine which router is “VPLS Master”. I also run this script every minute or so just to have sort of a belt-and-suspenders approach, because bad things are going to happen if VPLS is enabled on both Mikrotik A and Mikrotik B!
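The wiring described above might look roughly like the following sketch. It assumes the script is saved under the name vpls-vrrp; that name and the two scheduler entry names are made up for illustration and do not come from my actual config. The boot-time run and the one-minute poll are kept as two separate scheduler entries so that the run at startup does not depend on the polling interval.

```
# Assumption: the script above is stored in /system script as "vpls-vrrp"
# (hypothetical name). Run it on every VRRP state change.
/interface vrrp set [ find name="vrrp209" ] \
    on-master="/system script run vpls-vrrp" \
    on-backup="/system script run vpls-vrrp"

# Run once at boot (entry name is hypothetical).
/system scheduler add name="vpls-vrrp-boot" start-time=startup interval=0s \
    on-event="/system script run vpls-vrrp"

# Belt and suspenders: re-check every minute (entry name is hypothetical).
/system scheduler add name="vpls-vrrp-poll" interval=1m \
    on-event="/system script run vpls-vrrp"
```

The same configuration goes on both Mikrotik A and Mikrotik B, since each router only looks at its own VRRP state.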

The bridge that the VPLS goes into on Mikrotik A and Mikrotik B is running spanning tree with a forwarding delay of 20s (the default). This gives me a little bit of slack on a router reboot, if the VPLS was up when the system was shut down - hopefully my on-start script executes before the bridge comes out of that state.

Now, if for some reason both routers thought they were the VRRP master (like, for instance, VRRP packets were being eaten somewhere), then bad things would happen and you would have a loop and likely some sort of switching meltdown. Spanning tree - assuming your switches don’t eat spanning tree - is probably the right way to do this (or, better yet, layer 3 VPN). I’d be very careful putting this in production somewhere.

It would be kind of nice if, one day, Mikrotik supported an alternative loop detection method for layer 2 - something like the loop protection on HP switches, where the switch sends out a broadcast packet and shuts down any port on which it hears that packet come back. I imagine they implemented that for reasons similar to why I implemented this atrocity - some switches aren’t transparent to spanning tree and/or don’t support it. You can also run into spanning tree's diameter limits if you need to chain enough switches together (not good practice, but it happens).

Check the documentation about split horizon settings for your bridge.
This is what split horizon is for.

I thought about split horizon too, but I couldn’t find a configuration that avoided both a loop when a broadcast is sent from SW3 and duplicate packets arriving at the SW1/SW2 site for that same broadcast.

Am I missing something? What bridge ports would be in what horizon?

I think you should break the link between switches and put a direct link between A and B, then put all VPLS peers onto a common horizon.

Or probably better would be to connect A & B together directly as well, and use (R)STP between those two routers and the two switches.
Then C puts A and B on the same split horizon, and A and B do not peer with VPLS - just normal bridging.
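As a rough sketch of what that could look like on C, assuming statically configured VPLS interfaces: the bridge name bridge-vpls and the interface names vpls-to-A, vpls-to-B and ether2 are hypothetical. With BGP-signalled VPLS, as in the original post, the horizon would instead be set through bridge-horizon on the /interface vpls bgp-vpls entry.

```
# On Mikrotik C. All bridge and interface names here are hypothetical.
# Ports that share a horizon value never forward frames to each other,
# so a frame arriving from A is not sent on to B, and vice versa.
/interface bridge port add bridge=bridge-vpls interface=vpls-to-A horizon=1
/interface bridge port add bridge=bridge-vpls interface=vpls-to-B horizon=1

# The port towards SW3 has no horizon set, so it exchanges frames with both.
/interface bridge port add bridge=bridge-vpls interface=ether2
```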

If A & B both have connections to C, whenever there is a broadcast from SW3, C will send that broadcast (or a frame for an unlearned destination) to both A&B. A&B would both send those to SW1/SW2. SW1 would send its copy to SW2, while SW2 would send its copy to SW1. Now both SW1 and SW2 would send the broadcast out all the ports, including the ports to A&B. A&B would then send the broadcast to C (2x). This is assuming the link between A&B is either blocked (RSTP, for instance) or non-existent.

It’s not quite a layer 2 loop in the normal sense, but there’s a lot of extra packets floating around the network.

If a broadcast is sent on SW3:

  • Original packet (broadcast) received by SW3 from workstation
  • SW3 sends packet to all ports including C (except original source)
  • C sends to A&B
  • A sends to SW1
  • B sends to SW2
  • SW1 sends to all ports except A’s (all devices on SW1 get a copy). This includes SW2.
  • SW2 sends to all ports except B’s (all devices on SW2 get a copy). This includes SW1.
  • SW1 sends second copy to all ports except SW2 (all devices on SW1 get a copy). This includes A.
  • SW2 sends second copy to all ports except SW1 (all devices on SW2 get a copy). This includes B.
  • A sends its new copy to C.
  • B sends its new copy to C.
  • C sends the copy from A to SW3
  • C sends the copy from B to SW3
  • SW3 sends C’s A copy to all ports except C
  • SW3 sends C’s B copy to all ports except C

So a machine on SW3 (behind C) gets 3 copies of every broadcast sent from another SW3-connected machine: the original plus the two that come back via A and B. A machine on SW1 or SW2 gets two copies.

No matter how I play around with spanning tree on the Mikrotiks or with split horizons, I can’t see a way to avoid this using split horizon and/or RSTP.

Of course the obvious fix is to fix SW1/SW2 so they do RSTP, but sadly that wasn’t an option for me.