Disabling RSTP - good idea?

mysz0n · November 17, 2020, 9:12am

Is disabling RSTP on a network of about 1200 MikroTik devices a good idea? One of the admins on our network claims that with RSTP turned off the network is more stable and newly connected radio links start responding to pings immediately, and that there is no “unnecessary” broadcasting by devices. Is that true? Recently we have been seeing random drops of PPPoE sessions to our CCRs (the PPPoE servers), and I wonder whether this could be related. At the moment, RSTP is disabled on about 600 devices.

johnnyy · November 17, 2020, 9:45am

I think it is. I faced a similar issue some time ago.

sindy · November 17, 2020, 12:10pm

If you are absolutely sure that an L2 loop can never occur in your network, you don’t need STP at all. But the thing is that even with a strictly tree-formed topology, you can never be absolutely sure about this unless unused ports are filled with concrete, cables are glued to the used ports, and no one ever touches the configuration.

The reason why radio links start responding to pings faster when STP is disabled is this: when a member port of a bridge with STP enabled goes physically up, the bridge waits for some time to see whether STP BPDUs arrive on that port from the outside, and opens the port for forwarding of all traffic only after that guard interval elapses (or after a BPDU is received and the path is found safe). With wireless interfaces, the catch is that whenever no client is associated to an interface in AP mode, that interface appears as “down” to the bridge it is a member of. So if a single client roams from one AP to another on the same bridge, there is a gap in the client’s connectivity. If more clients are associated to both APs, the APs stay up, so STP doesn’t disable forwarding on them.

So if the fact that a device freshly connected to a previously unused port of a bridge only starts responding to a ping after 15 seconds (by default) is considered a “stability” issue, then yes, switching off STP will make the network more “stable”. But for me, “instability” begins when someone creates an L2 loop, the entire bandwidth of the links forming that loop gets occupied by a broadcast storm, and no services work. The purpose of STP is to prevent these scenarios.

You can disable STP selectively per port by setting edge=yes bpdu-guard=yes under /interface bridge port. This prevents BPDUs from being sent on that port and skips the guard interval when the port comes physically up. It still sort of protects against L2 loops: if you connect such a port to another one where STP is enabled, the first BPDU received on the port with bpdu-guard activated shuts it down until you manually restart it.
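
A minimal sketch of that per-port setting, assuming a bridge called bridge1 and a port called ether5 (both names are placeholders, not taken from this thread):

```
# Assumed names: bridge1 and ether5 are placeholders, adjust to your setup.
/interface bridge port
# Mark ether5 as an edge port and let bpdu-guard shut it down on the first BPDU received
set [find bridge=bridge1 interface=ether5] edge=yes bpdu-guard=yes
# Show the current STP role and edge state of that port
monitor [find interface=ether5] once
```

Only do this on ports where you never expect another STP-speaking bridge; uplinks between your own switches should stay regular STP ports.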

Regarding “unnecessary broadcasting”, I wonder what exactly this means. A 53-byte BPDU is sent on each link every 2 seconds or so; what a bandwidth hog. And although its destination MAC address is a multicast one, it is from a “link-local” range, i.e. in a network where all switches/bridges are 802.1D compliant, frames with these destination MAC addresses do not get past the first recipient. But if you set protocol-mode=none on a bridge, it treats frames with these destination MAC addresses just like any other ones, i.e. it broadcasts them to all ports. In more complex topologies, this may confuse devices connected to such a bridge that do run STP.

So I cannot see how disabling STP in the whole network could cause PPPoE sessions to be dropped. But if it is disabled in only part of the network, BPDUs from multiple isolated islands of devices still running STP may leak through the non-STP bridges to other devices also running STP. Each of those BPDUs indicates a different root bridge ID, so the recipients recalculate the spanning tree at each BPDU arrival, which may cause L2 traffic interruptions.
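
One way to check for this, as a rough sketch (bridge1 is a placeholder name; run it on a device that still has STP enabled):

```
# bridge1 is an assumed bridge name, not taken from this thread.
# Show the root bridge this device currently believes in
/interface bridge monitor bridge1 once
# List the bridges on this device that have STP switched off and therefore flood BPDUs
/interface bridge print where protocol-mode=none
```

If the root-bridge-id reported by the first command keeps changing between runs, or neighbouring devices disagree about it, BPDUs from different islands are probably reaching the same bridge.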

But regardless of all the above: what makes you run a single L2 network consisting of 1200 elements? Does this figure also include all the client devices, or only the switches?

bpwl · November 17, 2020, 1:17pm

What I learned about STP and RSTP comes from a large Cisco switch network. I don’t know whether all of it applies to MikroTik STP and RSTP, or still applies today.
But I have some concerns about the limitations of RSTP, and certainly of the older STP.
We built a fully redundant industrial switched network with core, distribution and edge switches (1300 edge ports).
Downtime had to be avoided (this was after a major network outage caused by a slowly converging STP transition).

What I remember … (from 20 years ago)

-STP has a diameter limit of 7 hops. The longest path in the network should not exceed 7 hops.
-RSTP has a larger limit. The maximum allowed diameter for RSTP is 40. The “Max hops” default is 20 in MikroTik! (You hit this limit faster than expected.)

With STP, a state change of ANY port will trigger a state transition (root switch selection, loop-free tree calculation, and setting of the STP port states).
With RSTP, a state change of an “edge” port will not trigger a state transition. And as edge ports are not included in the tree calculation, convergence goes much faster.
Priority must be set carefully. Root switch selected should be one of the 2 core switches in that network. Path cost is used to steer what ports will be disabled.
During the state transition all or most traffic stops in the network.
STP and RSTP are two different environments, that decide independently. RSTP is (was) not compatible between different brand implementations.
Long chains (redundant loops) bring instability. Short redundant loops should be formed.
BPDU guard was added to stop development engineers from attaching their own switch to the corporate network (which initiated a spanning tree transition whenever one of their ports changed state).
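
In RouterOS terms, the priority and path cost part could look roughly like this. It is only a sketch: bridge1 and ether2 are placeholder names and the values are illustrative, not recommendations.

```
# Placeholder names and illustrative values, adjust to your own topology.
# On the intended root (one of the two core switches): the lowest priority wins the election
/interface bridge set bridge1 protocol-mode=rstp priority=0x1000
# On the second core switch: next-lowest priority, so it takes over if the root fails
/interface bridge set bridge1 protocol-mode=rstp priority=0x2000
# On any switch: raise the cost of a backup uplink so that it is the port that gets blocked
/interface bridge port set [find interface=ether2] path-cost=100
```

All other bridges keep the default priority (0x8000), so none of them can become root as long as a core switch is reachable.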

My Mikrotik L2 setup: (wifi for Internet access)

Avoid STP (i.e. “STP” as the protocol mode selection).
WLAN interfaces should be “edge” ports for RSTP (Edge set to “yes”, not “auto”, on the bridge port, if wifi bridging is not used? Disable ‘bridge’ in AP-bridge mode if not needed?)
But: in the APs and PtMP concentrators I use the same ‘Horizon’ value on all bridge ports except the uplink port to separate them from each other, so there I just set the STP protocol mode to “none”.
My L2 network diameter is 15, with 29 MikroTik devices. (There are multiple L2 networks, but they are separated at L3.)
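
As a rough sketch of that AP setup (bridge1, wlan1, wlan2 and ether1 are placeholder names, and the horizon value 1 is arbitrary):

```
# Placeholder names: bridge1 with uplink ether1 and client-facing ports wlan1, wlan2.
# No spanning tree on this bridge
/interface bridge set bridge1 protocol-mode=none
/interface bridge port
# Ports sharing a horizon value never forward frames to each other
set [find bridge=bridge1 interface=wlan1] horizon=1
set [find bridge=bridge1 interface=wlan2] horizon=1
# The uplink port ether1 keeps the default (no horizon), so it can talk to all of them
```

Since the client-facing ports can only exchange traffic with the uplink, a loop between two of them cannot forward frames through this bridge.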