CRS106 – RSTP loop protection unreliable – broadcast storm using 6.39.3

Good afternoon guys,
I hope you can help me with this:

Description:
In a project I am deploying a number of CRS106-1C-5S switches in a fiber ring topology for redundancy.
There are a number of “outer rings” with junction switches connected via SFP3 and SFP4.
In addition, between these there are “subrings” connected using SFP1 and SFP2, and the subrings are tied to the junction switches.
Thus each switch has 2 or 4 fibers connected, providing 2 levels of redundancy and maintaining connectivity even during power outages or breakdowns.

I have been informed that since RouterOS v6.38, RSTP also works with the CRS switch chip:
https://wiki.mikrotik.com/wiki/Manual:CRS_examples#Spanning_Tree_Protocol
Therefore the SFP ports are set up in switch groups instead of as individual bridge ports.

Each end of the fiber ring is connected to the SFP ports of a CRS212-1G-10S-1S+.
I have set up VLANs using the switch chip in the CRS212. SFP1 to SFP4 are in a switch group with SFP1 as master, and the master port is added to an RSTP bridge.

On the CRS106s I have used VLANs over Ether1, configured under /interface vlan.
SFP2 to SFP4 are added as slaves with SFP1 as master port, and SFP1 is added to an RSTP bridge.
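
For reference, the CRS106 setup described above corresponds roughly to the following. This is a minimal sketch rather than an export from the actual switches: the names bridge1 and vlan10 and the VLAN ID 10 are placeholders, the interface names are written as I refer to them in this post, and the syntax assumes the master-port model used in RouterOS 6.39.x.

```
# CRS106, RouterOS 6.39.x syntax (master-port based switching)
/interface bridge
add name=bridge1 protocol-mode=rstp

# SFP2 to SFP4 are switched in hardware with SFP1 as master port
/interface ethernet
set [ find default-name=sfp2 ] master-port=sfp1
set [ find default-name=sfp3 ] master-port=sfp1
set [ find default-name=sfp4 ] master-port=sfp1

# Only the master port is added to the RSTP bridge
/interface bridge port
add bridge=bridge1 interface=sfp1

# VLAN interface on ether1; name and vlan-id are placeholders
/interface vlan
add name=vlan10 interface=ether1 vlan-id=10
```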

Before deploying, I tested the switches on my desk (6.38.7), and as far as I could see RSTP worked as expected. I could run multiple fibers between the router, and upon pulling one of them nothing more happened than the connection dropping for a few seconds before it was restored over another fiber.
No loops.

Problem:
After deployment, the switches behaved as expected, indicating that RSTP was working.
In order to benefit from displaying optical Rx levels in the Dude, I upgraded a number of them to 6.40.4 and rebooted them.
Unfortunately, this caused a broadcast storm (loop) coming into the CRS212 at wire speed, preventing me from accessing any of the switches.

Long story short, we managed to kill the loop by disconnecting various fibers, and then downgraded the switches (including the CRS212) to 6.39.3. After that everything seemed fine, and we kept the optical levels via SNMP.
Then I upgraded the firmware on a couple of them to 3.41 and rebooted, and the loop was back again.

Since then we have managed to isolate a few of the switches and kill the loop, but it suddenly reappears, for instance upon pulling or re-inserting a fiber, or when rebooting one or more of the switches.
At other times the same actions (pulling/inserting fibers or rebooting) do no harm; I just see the device itself and some adjacent devices become temporarily unavailable due to “re-routing” in RSTP.
The problem is that I cannot see any pattern in what causes the loops.


My questions are:

1. Is there a reason that 6.39.3 should behave differently from 6.38.7 with regard to RSTP, i.e. should I stick to the latter? Or does RSTP work the same way in these versions?
2. Is it possible that the VLAN setup in the switch chip of the CRS212 is the problem, i.e. that the CRS106s and the CRS212 do not “collaborate” with regard to RSTP loop protection?
3. Are there any measures I can apply to get full L2 redundancy without loops?