CRS328-4-20S-4S+ RSTP issues on SFP ports

Hey everyone!

We have a mixed-vendor network with 15 edge switches at the other end of fiber runs. The CRS328-4-20S-4S+ is our central switch. Most of the

Ever since we set up RSTP, we have seen weird issues about once or twice per year but were never quite able to track down the problem. Since it was such a rare occurrence, we just left it be.

This past week I was able to track it down: whenever a fiber link (SFP) drops, its neighbor goes into a discarding loop. It senses a loop, discards, comes back and screams topology change. All the switches on the network follow and finish the tchange, but the port in question goes into blocking again, and so on and so forth.

So for instance, if the switch on the other end of SFP6 goes dark because someone pulled the plug, SFP5 goes into a blocking/tchange/blocking/tchange loop. Likewise, when the link on SFP9 drops, the switch on SFP10 goes into a loop.
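For anyone trying to reproduce this, here is a minimal Python sketch of how such a blocking/tchange loop could be spotted from collected port-state events. The `(timestamp, port, state)` tuple format, the function name `find_flapping_ports` and the thresholds are all assumptions for illustration, not something RouterOS or any other vendor exports in this shape.

```python
from collections import defaultdict


def find_flapping_ports(events, window_s=60.0, min_transitions=4):
    """Return ports that changed state at least `min_transitions` times
    within any `window_s`-second span.

    `events` is an iterable of (timestamp_seconds, port_name, state)
    tuples. This format is hypothetical; adapt it to your own logs.
    """
    changes = defaultdict(list)  # port -> timestamps of state changes
    last_state = {}
    for ts, port, state in sorted(events):
        if last_state.get(port) != state:
            if port in last_state:  # first observation is not a change
                changes[port].append(ts)
            last_state[port] = state

    flapping = set()
    for port, stamps in changes.items():
        start = 0
        # Sliding window over the sorted change timestamps.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window_s:
                start += 1
            if end - start + 1 >= min_transitions:
                flapping.add(port)
                break
    return sorted(flapping)


# Made-up sample data: sfp5 flips every few seconds, sfp9 changes once.
sample_events = [
    (0, "sfp5", "forwarding"),
    (2, "sfp5", "discarding"),
    (5, "sfp5", "forwarding"),
    (8, "sfp5", "discarding"),
    (11, "sfp5", "forwarding"),
    (0, "sfp9", "forwarding"),
    (300, "sfp9", "discarding"),
]
print(find_flapping_ports(sample_events))  # prints ['sfp5']
```

A single state change, like a port going down once, stays below the threshold, so only the repeating pattern gets reported.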

I updated the switch to ROS 7.22.2 along with its firmware: no change. RSTP is correctly set up afaict: the CRS has a priority of 4096, long hex, and all others have a priority of 32768. Hello times and everything else are at their defaults, and it only happens on SFP ports with a neighbor connected. If there’s nothing connected to the neighbor port and that switch goes dark, then nothing happens. When the switch that went dark comes back, it sends out a single TCHANGE, all the rest respond and that is the end of it.
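As a sanity check on the priorities above, here is a minimal Python sketch of how STP/RSTP picks the root: the bridge with the lowest bridge ID wins, comparing priority first and MAC address as the tie-breaker. The switch names and MAC addresses are made up for illustration. Note that 4096 is 0x1000 and 32768 is 0x8000 in hex.

```python
def bridge_id(priority, mac):
    """Bridge ID as compared by STP/RSTP: priority first, then MAC."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID.

    `bridges` maps a name to a (priority, mac) tuple.
    """
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Hypothetical bridges: names and MACs are placeholders, not real devices.
bridges = {
    "crs328": (0x1000, "02:00:00:00:00:0a"),  # 4096
    "edge-a": (0x8000, "02:00:00:00:00:01"),  # 32768, lower MAC
    "edge-b": (0x8000, "02:00:00:00:00:02"),  # 32768
}
print(elect_root(bridges))  # prints crs328
```

With these values the CRS wins the election even though an edge switch has a lower MAC, so the priorities as described should make it root. The sketch says nothing about why a neighbor port would flap.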

If I restrict TCN on a port and its neighbor goes dark, the TCHANGE gets caught and the only port affected is that port. I.e., with TCN restricted, one switch dying takes another with it but not the whole network.

Gemini tells me this neighbor reacting poorly thing is a known issue with Marvell 98DX chips. Is this true?!? Has anyone else experienced this?

Update:

Today we did some further testing. Of the 21 switches on the network, there are 3 MikroTiks, 11 UniFis, a legacy Cisco and a couple of other brands, some L2, some not. The problem is related to UniFi switches only.

If we unplug a switch, no matter the brand, and its neighbor is a UniFi, that UniFi will port flap, scream TCHANGE at every chance it gets and throw the whole network into disarray due to constant FDB flushes.

Any suggestions? Anyone?