Connection to CAPsMAN suddenly interrupted

Dear fellow forum members,

Today I had a weird experience. I use my D53G-5HacD2HnD as CAPsMAN, with a “cap ac” managed by it. The cap is connected to ether2 of the D53G-5HacD2HnD.

Today the connection to the cap was interrupted. I have never seen this before. The logs on the CAPsMAN device were flooded over a longer period (30 min to 1 h, I guess) with this (I have enabled debug logs for almost every topic):

5HacD2HnD

 2025-03-06 12:31:22 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:23 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:24 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:25 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:26 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:27 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:28 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:29 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:30 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:31 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:32 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:33 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:31:34 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:31:35 bridge,stp GENERAL: ether2:0 discarding

As a first try I unplugged the Ethernet cable on ether2. This already resolved it.

5HacD2HnD

 2025-03-06 12:45:30 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:45:32 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:45:32 bridge,stp GENERAL: ether2:0 discarding
 2025-03-06 12:45:34 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:45:34 interface,info ether2 link down
 2025-03-06 12:45:34 interface,info GENERAL: ether2 link down
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/publish
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-06 12:45:34 route,debug,calc GENERAL: route/calc/publish
 2025-03-06 12:45:39 interface,info ether2 link up (speed 1G, full duplex)
 2025-03-06 12:45:39 interface,info GENERAL: ether2 link up (speed 1G, full duplex)
 2025-03-06 12:45:39 bridge,stp GENERAL: ether2:0 becomes Designated
 2025-03-06 12:45:39 bridge,stp GENERAL: ether2:0 learning
 2025-03-06 12:45:39 bridge,stp GENERAL: ether2:0 forwarding
 2025-03-06 12:45:39 bridge,stp GENERAL: ether2:0 TCHANGE start
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/publish
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/publish
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-06 12:45:39 route,debug,calc GENERAL: route/calc/publish
 2025-03-06 12:45:42 bridge,stp GENERAL: ether2:0 TCHANGE over
 2025-03-06 12:45:49 caps,info cap@78:9A:XX:XX:XX:XX%*9 joined
 2025-03-06 12:45:49 caps,info GENERAL: cap@78:9A:XX:XX:XX:XX%*9 joined
 2025-03-06 12:45:49 wireless,info provision cap@78:9A:XX:XX:XX:XX%*9 radio 78:9A:XX:XX:XX:X1
 2025-03-06 12:45:49 wireless,info provision cap@78:9A:XX:XX:XX:XX%*9 radio 78:9A:XX:XX:XX:X2

Link up again. CAP joined again. Wifi back in game.

The logs on “cap” show this from today:

cap ac

 2025-03-06 10:01:32 interface,info ether1 link down
 2025-03-06 10:01:40 caps,info disconnected from capsman@XX:XX:XX:XX:XX:D5%*5, failed to connect
 2025-03-06 10:01:43 interface,info ether1 link up (speed 1G, full duplex)
 2025-03-06 10:02:09 interface,info ether1 link down
 2025-03-06 10:02:20 interface,info ether1 link up (speed 1G, full duplex)
 2025-03-06 10:02:25 caps,info selected CAPsMAN capsman@XX:XX:XX:XX:XX:D5%*5
 2025-03-06 10:02:25 caps,info connected to capsman@XX:XX:XX:XX:XX:D5%*5
 2025-03-06 11:47:38 caps,info disconnected from capsman@XX:XX:XX:XX:XX:D5%*5, failed to connect
 2025-03-06 12:45:33 interface,info ether1 link down
 2025-03-06 12:45:39 interface,info ether1 link up (speed 1G, full duplex)
 2025-03-06 12:45:48 caps,info selected CAPsMAN capsman@XX:XX:XX:XX:XX:D5%*5
 2025-03-06 12:45:49 caps,info connected to capsman@XX:XX:XX:XX:XX:D5%*5

And now my question: what happened here? An RSTP hiccup? Does anyone have an educated guess? My log is in memory and limited to the usual 1000 lines, so I have no idea what was logged before the “ether2:0 discarding” log flood.
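For anyone who wants to quantify such a flood from an exported log, here is a minimal Python sketch. It assumes the log was saved as plain text in the same format as the lines pasted above; the function names `summarize_stp` and `stuck_ports` and the `min_changes` threshold are made up for illustration and are not part of RouterOS.

```python
import re
from collections import Counter

# Matches lines such as:
#   2025-03-06 12:31:22 bridge,stp GENERAL: ether2:0 learning
STP_LINE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"bridge,stp (?:GENERAL: )?(?P<port>\S+?):\d+ "
    r"(?P<state>discarding|learning|forwarding)\s*$"
)

def summarize_stp(lines):
    """Count logged STP state changes per port name."""
    counts = {}
    for line in lines:
        m = STP_LINE.match(line)
        if m:
            counts.setdefault(m["port"], Counter())[m["state"]] += 1
    return counts

def stuck_ports(counts, min_changes=4):
    """Ports that changed state at least min_changes times but never logged 'forwarding'."""
    return sorted(
        port for port, c in counts.items()
        if sum(c.values()) >= min_changes and c["forwarding"] == 0
    )

if __name__ == "__main__":
    sample = [
        " 2025-03-06 12:31:22 bridge,stp GENERAL: ether2:0 learning",
        " 2025-03-06 12:31:23 bridge,stp GENERAL: ether2:0 discarding",
        " 2025-03-06 12:31:24 bridge,stp GENERAL: ether2:0 learning",
        " 2025-03-06 12:31:25 bridge,stp GENERAL: ether2:0 discarding",
    ]
    counts = summarize_stp(sample)
    print(counts, stuck_ports(counts))
```

A port that keeps bouncing between learning and discarding without ever reaching forwarding shows up in `stuck_ports`, which matches what the flood above looks like.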

PS:
D53G-5HacD2HnD running 7.17.2
cap ac running 7.18.1

Your D53G-5HacD2HnD logs look a lot like the kind of mess I was seeing in my logs on the RB2011. Note that the RB2011 was not directly to blame. By upgrading it to v7.16.2 or higher, it became intolerant of the problem that was actually in another box. Look at the hardware connected to ether2 on your D53G-5HacD2HnD. Is that your cAP-ac?

Blending sources, the cAP-ac has an IPQ-4018 SOC which has a Qualcomm Switch built into it. MikroTik shows the switch as being a QCA8327, but the true 8327 dates back to Atheros. After Qualcomm acquired Atheros, they came out with the QCA8337. According to an AI search, whether the IPQ-4018 really has an 8327 or an 8337 is unclear. I would go to the cAP-ac’s /Switch/Settings and see what RouterOS thinks the switch is. If it thinks the switch is a QCA8337, then I would say you are having the exact same problem.
Solutions:
a) Downgrade the cAP-ac to v7.17.2 and alter the config to use software bridging instead of hardware switching. Fast forward is fine.
b) Alternatively, downgrade the D53G-5HacD2HnD to v7.13.5, which tolerates the extra BPDU packets and creates a stable RSTP tree despite them.

The only device connected to D53G-5HacD2HnD/ether2 is the cap-ac.

/interface/ethernet/switch/print 
Columns: NAME, TYPE
# NAME     TYPE        
0 switch1  Atheros-8327

You’re pointing out something that I noticed a long time ago but was never able to identify the cause of. Up until version 7.13.5, I had no issues with RSTP. I skipped 7.14, but with 7.15, I initially had problems with the bridge and RSTP when used with other RouterOS devices acting as “station clients” connected to the D53G-5HacD2HnD. Bridge ports would randomly stop forwarding after an undefined period. As far as I recall, this improved significantly with 7.16. In 7.15 I explicitly set the bridge priority to ensure that the D53G-5HacD2HnD bridge is always elected as the root bridge - that helped as well.

http://forum.mikrotik.com/t/v7-15-3-stable-is-released/176334/446
http://forum.mikrotik.com/t/v7-15-3-stable-is-released/176334/390
http://forum.mikrotik.com/t/rstp-blocking-dhcp/175629/3
http://forum.mikrotik.com/t/v7-14-3-stable-is-released/174007/264

Is the cAP-ac what is on ether2?

I used the Packet Sniffer function in the MikroTik to “stream” the packets to Wireshark running on my computer. This allowed me to see that there were multiple sources of STP BPDU packets coming into the port even though it should have been point-to-point. If ether2 is connected to the equivalent of a dumb bridge, and you have multiple RSTP-enabled bridges connected to that dumb bridge, then the problem is likely that intolerance of multiple BPDU sources. This might be related to MikroTik’s work on point-to-point Ethernet. Having just thought of this, I haven’t done any testing, but you could try playing with the Point-to-Point setting on the STP tab of the Bridge Port settings for ether2.
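The check I did by eye in Wireshark can be sketched in a few lines of Python. This assumes you have exported the incoming frames of the port as (source MAC, destination MAC) pairs; the function names and the MAC addresses in the example are hypothetical. The only fixed fact used is that STP/RSTP BPDUs are sent to the bridge group address 01:80:C2:00:00:00.

```python
# IEEE 802.1D bridge group address, the destination of STP/RSTP BPDUs
STP_GROUP_MAC = "01:80:c2:00:00:00"

def bpdu_senders(frames):
    """Return the set of source MACs that sent frames to the STP group address.

    frames: iterable of (src_mac, dst_mac) string pairs taken from a capture
    of frames arriving on a single port.
    """
    return {src.lower() for src, dst in frames if dst.lower() == STP_GROUP_MAC}

def looks_point_to_point(frames):
    """True if at most one neighbour is sending BPDUs into the port."""
    return len(bpdu_senders(frames)) <= 1

if __name__ == "__main__":
    # Hypothetical capture: two different bridges send BPDUs into the same port
    capture = [
        ("AA:AA:AA:00:00:01", "01:80:C2:00:00:00"),
        ("AA:AA:AA:00:00:02", "01:80:C2:00:00:00"),
        ("AA:AA:AA:00:00:01", "FF:FF:FF:FF:FF:FF"),
    ]
    print(sorted(bpdu_senders(capture)), looks_point_to_point(capture))
```

If more than one sender shows up on a link that should be point-to-point, something between the two bridges is passing BPDUs through.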

Cap ac is only connected to the Chateau via ether1.

Cap/ether1-----------Chateau/ether2

Mikrotik just got back to me…
In order to have hardware switching and VLANs, on both the Atheros 8327 and the QCA8337 you must set up the VLAN configuration in the Switch menu. On MikroTik’s suggestion, I enabled independent-learning for VLAN1 (my default untagged network) in the Switch VLAN configuration. This corrected the BPDU packets leaking through the QCA8337.
It corrected another problem I had just found with the Atheros 8327. I found that if I set edge=no on the bridge port that was currently root, it would change to designated, and the alternate port on the other side of the CPU Bridge (RB2011) would change to root. I did not leave it that way long enough to find out if it really created a loop. Enabling independent-learning for VLAN1 on the Atheros 8327 corrected that problem too.

Is this even a thing when no VLANs are in use? I guess not, because the only BPDU packets should be from (implicit) VLAN 1, right?

BPDUs for STP and RSTP are untagged (because, unlike MSTP, STP and RSTP are VLAN-agnostic). It’s then up to switch-chip magic how untagged frames (including BPDUs) are handled so that they’re delivered to the (software) bridge … untagged.

Of course, when VLANs are not used, this is not an issue.
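To make “untagged BPDU” concrete, here is a small Python sketch that classifies a raw Ethernet frame. It relies on the standard framing: destination 01:80:C2:00:00:00, an 802.3 length field followed by the LLC header 0x42 0x42 0x03 for STP/RSTP, and TPID 0x8100 for an 802.1Q tag. The function name and the returned labels are made up for illustration.

```python
STP_GROUP_MAC = bytes.fromhex("0180c2000000")  # bridge group address
VLAN_TPID = 0x8100                             # 802.1Q tag protocol identifier
LLC_STP = b"\x42\x42\x03"                      # DSAP, SSAP, control for STP/RSTP

def classify_frame(frame: bytes) -> str:
    """Classify a raw Ethernet frame (starting at the destination MAC)."""
    if len(frame) < 14 or frame[0:6] != STP_GROUP_MAC:
        return "not-stp-address"
    type_or_len = int.from_bytes(frame[12:14], "big")
    if type_or_len == VLAN_TPID:
        return "tagged"
    # Values up to 1500 are an 802.3 length, so an LLC header follows
    if type_or_len <= 1500 and frame[14:17] == LLC_STP:
        return "untagged-bpdu"
    return "other"

if __name__ == "__main__":
    src = bytes.fromhex("aaaaaa000001")  # hypothetical sender
    untagged = STP_GROUP_MAC + src + (39).to_bytes(2, "big") + LLC_STP + bytes(36)
    tagged = STP_GROUP_MAC + src + bytes.fromhex("81000001") + bytes(40)
    print(classify_frame(untagged), classify_frame(tagged))
```

A normal RSTP BPDU should come out as `untagged-bpdu`; how the switch chip forwards such a frame towards the CPU is the part that depends on the switch configuration.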

The issue happened again today, even though both devices are currently on 7.17.2. I was watching a video stream right when the wifi dropped, so I could look at the logs right away. It started at “06:18:50”.

2025-03-09 05:33:52 dhcp,info defconf assigned 192.168.0.4 for xx:xx:xx:xx:xx:52
2025-03-09 06:02:07 wireless,info xx:xx:xx:xx:xx:B0@cap-wifi2 disconnected, connection lost, signal strength -77
2025-03-09 06:18:50 caps,info disconnected cap@xx:xx:xx:xx:xx:xx%*9, connection interrupted
 2025-03-09 06:18:50 caps,info GENERAL: disconnected cap@xx:xx:xx:xx:xx:B7%*9, connection interrupted
 2025-03-09 06:18:51 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:18:51 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-09 06:18:51 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-09 06:18:51 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-09 06:18:51 route,debug,calc GENERAL: route/calc/publish
 2025-03-09 06:18:51 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-09 06:18:53 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:18:55 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:18:56 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:18:57 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:18:58 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:18:59 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:19:00 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:19:01 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:19:02 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:19:03 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:19:04 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:19:05 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:19:06 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:19:07 bridge,stp GENERAL: ether2:0 discarding
 2025-03-09 06:19:08 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:19:09 bridge,stp GENERAL: ether2:0 discarding

This time I looked at “interface/bridge/port monitor” and “/interface/ethernet”. Nothing suspicious other than port ether2 constantly toggling between “learning: no” and “learning: yes”, always with “forwarding: no”.

                   ;;; defconf        
            interface: ether2         
               status: in-bridge      
          port-number: 2              
                 role: designated-port
            edge-port: no             
  edge-port-discovery: yes            
  point-to-point-port: yes            
         external-fdb: no             
         sending-rstp: yes            
             learning: yes            
           forwarding: no            
     actual-path-cost: 20000          
     hw-offload-group: switch1

This time I just did “/interface/ethernet/disable ether2” and “/interface/ethernet/enable ether2”. Working again.

 2025-03-09 06:43:08 interface,info ether2 link down
 2025-03-09 06:43:08 system,info device changed by ssh:user@192.168.x.x/action:4 (/interface set ether2 disabled=yes; /interface ethernet set [ find ] disabled=yes; /queue interface set ether2)
 2025-03-09 06:43:08 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-09 06:43:08 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-09 06:43:08 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-09 06:43:08 route,debug,calc GENERAL: route/calc/publish
 2025-03-09 06:43:08 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-09 06:43:12 system,info device changed by ssh:user@192.168.x.x/action:5 (/interface set ether2 disabled=no; /interface ethernet set [ find ] disabled=no; /queue interface set ether2)
 2025-03-09 06:43:13 interface,info ether2 link up (speed 1G, full duplex)
 2025-03-09 06:43:13 bridge,stp GENERAL: ether2:0 becomes Designated
 2025-03-09 06:43:13 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-09 06:43:13 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-09 06:43:13 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-09 06:43:13 route,debug,calc GENERAL: route/calc/publish
 2025-03-09 06:43:13 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-09 06:43:14 bridge,stp GENERAL: ether2:0 learning
 2025-03-09 06:43:14 bridge,stp GENERAL: ether2:0 forwarding
 2025-03-09 06:43:14 bridge,stp GENERAL: ether2:0 TCHANGE start
 2025-03-09 06:43:14 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-09 06:43:14 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-09 06:43:14 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-09 06:43:14 route,debug,calc GENERAL: route/calc/publish
 2025-03-09 06:43:14 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-09 06:43:17 bridge,stp GENERAL: ether2:0 TCHANGE over
 2025-03-09 06:43:21 caps,info cap@xx:xx:xx:xx:xx:B7%*9 joined

What is happening here? I still don’t have a clue how to debug this further.

I can’t tell exactly which ROS version introduced this issue. I was running a “wap ax” instead of the “cap ac” for the whole 7.17 cycle, IIRC. Before 7.17 I did not face this issue. Last week I switched back to the “cap ac” and now this issue pops up. Maybe some change in 7.17 is causing the problem.

Someone here has this “learning/discarding” flood as well: http://forum.mikrotik.com/t/strange-mstp-behaviour-crs328-rb760igs-rbm33g-hap-ac-2-mmips-specific/159525/1

But over there it appears right after connecting. Here it took 2-3 days to trigger the issue.

It is getting annoying now:

 2025-03-13 16:36:35 bridge,stp GENERAL: ether2:0 discarding
 2025-03-13 16:36:35 bridge,stp GENERAL: ether2:0 learning
 2025-03-13 16:36:35 bridge,stp GENERAL: ether2:0 forwarding
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/publish
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-13 16:36:35 route,debug,calc GENERAL: route/calc/publish
 2025-03-13 16:40:07 caps,info disconnected cap@XX:XX:XX:XX:XX:B7%*9, connection interrupted
 2025-03-13 16:40:07 caps,info GENERAL: disconnected cap@XX:XX:XX:XX:XX:B7%*9, connection interrupted
 2025-03-13 16:40:07 bridge,stp GENERAL: ether2:0 discarding
 2025-03-13 16:40:07 bridge,stp GENERAL: ether2:0 learning
 2025-03-13 16:40:07 bridge,stp GENERAL: ether2:0 forwarding
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/publish
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/merge/input/route
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/merge/route
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/fwp/merge
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/publish
 2025-03-13 16:40:07 route,debug,calc GENERAL: route/calc/cleanup/route
 2025-03-13 16:40:08 route,debug,calc GENERAL: route/calc/publish
 2025-03-13 16:40:18 caps,info cap@XX:XX:XX:XX:XX:B7%*9 joined
 2025-03-13 16:40:18 caps,info GENERAL: cap@XX:XX:XX:XX:XX:B7%*9 joined
 2025-03-13 16:40:18 wireless,info provision cap@XX:XX:XX:XX:XX:B7%*9 radio XX:XX:XX:XX:XX:B9
 2025-03-13 16:40:18 wireless,info provision cap@XX:XX:XX:XX:XX:B7%*9 radio XX:XX:XX:XX:XX:BA

I really want to know more about what is wrong with RSTP and understand why the port is flapping between forwarding and discarding.

I still had one 7.18.2 device on my network acting in station mode (a map lite connecting my LAN Lexmark printer over WLAN). I disconnected the map about 10 days ago and have not experienced this bridge RSTP issue since. It seems like something in 7.18.x causes these problems. There is still a hap lite in station mode connected as well, but it is running ROS v6.