Switch bleeding tagged multicast/broadcast frames from other vlan. Bug?

Hello,

I have a MikroTik hEX PoE running RouterOS 7.16 (the latest version) and I have noticed some weird behaviour: frames leak between VLANs for a short time after a cable is connected.

I have created two VLANs: VLAN 90 for internal networking and VLAN 100 for some CCTV stuff.

I believe this is what I have configured:

  • Ether1: access port in vlan 90 (192.168.123.0/24)
  • Ether2: access port in vlan 100 (172.16.100.0/24)
  • Ether3: access port in vlan 100 (172.16.100.0/24)
  • Ether4: access port in vlan 100 (172.16.100.0/24)
  • Ether5: access port in vlan 100 (172.16.100.0/24)

Here is how I reproduce it. While my laptop is still disconnected from the network, I start Wireshark and begin capturing on its wired port. Then I plug that port into an access port in VLAN 100 (ether2, for example). For roughly the first 5 seconds I see frames with an 802.1Q tag of 90 in my capture! It looks like the switch is leaking tagged frames from another VLAN onto my access port.

All the tagged frames I see on my VLAN 100 access port appear to be multicasts/broadcasts from VLAN 90. They only ever show up during roughly the first 5 seconds of the capture (total capture time 20 s).

This is the config on my switch that I believe is relevant for this issue:

/interface bridge
add admin-mac=D4:01:C3:96:3E:3D auto-mac=no name=bridge1 port-cost-mode=short


/interface bridge port
add bridge=bridge1 ingress-filtering=no interface=ether1 internal-path-cost=10 path-cost=10
add bpdu-guard=yes bridge=bridge1 edge=yes ingress-filtering=no interface=ether2 internal-path-cost=10 path-cost=10
add bpdu-guard=yes bridge=bridge1 edge=yes ingress-filtering=no interface=ether3 internal-path-cost=10 path-cost=10
add bpdu-guard=yes bridge=bridge1 edge=yes ingress-filtering=no interface=ether4 internal-path-cost=10 path-cost=10
add bpdu-guard=yes bridge=bridge1 edge=yes ingress-filtering=no interface=ether5 internal-path-cost=10 path-cost=10


/interface vlan
add interface=bridge1 name=CCTV_VLAN vlan-id=100
add interface=bridge1 name=INTERNAL_VLAN vlan-id=90

/ip address
add address=192.168.123.101/24 interface=INTERNAL_VLAN network=192.168.123.0
add address=172.16.100.254/24 interface=CCTV_VLAN network=172.16.100.0


/interface ethernet switch port
set 0 default-vlan-id=90 vlan-mode=secure
set 1 default-vlan-id=100 vlan-mode=secure
set 2 default-vlan-id=100 vlan-mode=secure
set 3 default-vlan-id=100 vlan-mode=secure
set 4 default-vlan-id=100 vlan-mode=secure
set 5 vlan-mode=secure


/interface ethernet switch vlan
add independent-learning=yes ports=ether1,switch1-cpu switch=switch1 vlan-id=90
add independent-learning=yes ports=ether2,ether3,ether4,ether5,switch1-cpu switch=switch1 vlan-id=100

Note that you would expect vlan-header=always-strip on the Ethernet switch ports. However, since the hEX PoE uses the QCA8337 switch chip, that setting doesn’t matter according to the docs:

QCA8337 and Atheros8327 switch chips ignore the vlan-header property and uses the default-vlan-id property to determine which ports are access ports. The vlan-header is set to leave-as-is and cannot be changed while the default-vlan-id property should only be used on access ports to tag all ingress traffic.

On QCA8337 and Atheros8327 switch chips, a default vlan-header=leave-as-is property should be used. The switch chip will determine which ports are access ports by using the default-vlan-id property. The default-vlan-id should only be used on access/hybrid ports to specify which VLAN the untagged ingress traffic is assigned to.

In QCA8337 and Atheros8327 chips when vlan-mode=secure is used, it ignores switch port vlan-header options. VLAN table entries handle all the egress tagging/untagging and works as vlan-header=leave-as-is on all ports. It means what comes in tagged, goes out tagged aswell, only default-vlan-id frames are untagged at the egress port.

When I do set ‘always strip’ on ether1 to ether5, I still see the ARPs etc. from VLAN 90 on my access port in VLAN 100, but this time without the 802.1Q tag 90. This is a bug, right?
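
For reference, a sketch of how that setting can be applied per switch port. The port names are taken from the list above. According to the documentation quoted earlier, the QCA8337 is expected to ignore this property in vlan-mode=secure, so treat it as an experiment rather than a fix:

```routeros
/interface ethernet switch port
# Request tag stripping on egress for each access port
set ether1 vlan-header=always-strip
set ether2 vlan-header=always-strip
set ether3 vlan-header=always-strip
set ether4 vlan-header=always-strip
set ether5 vlan-header=always-strip
# Review the resulting per-port settings
print
```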




Has anybody noticed something similar, or can anyone explain this behaviour? It feels like a bug to me. What do you think?

Those “5 seconds” suggest to me that the ports have Fast Forward enabled by default. Since that is the default state, it does not appear in the configuration export.

You probably need to explicitly disable fast forward on the VLAN ports.

https://help.mikrotik.com/docs/display/ROS/Bridging+and+Switching#BridgingandSwitching-FastForward
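
A minimal sketch of that check and change, assuming the bridge is called bridge1 as in the posted config. Note that in RouterOS fast-forward is a property of the bridge itself rather than of the individual ports:

```routeros
# Show the bridge properties, including fast-forward
/interface bridge print
# Explicitly turn fast forward off on the bridge
/interface bridge set bridge1 fast-forward=no
```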

Hello,

Thank you for your reply. Your insight is appreciated. I checked the link you posted. One of the conditions for fast forward to become active is that hardware offloading must be disabled. As far as I know that is not the case for me: hardware offloading is active, since I am switching everything on the switch chip.

[admin@Mikrotik hEX PoE] /interface/bridge/settings> print
              use-ip-firewall: no
     use-ip-firewall-for-vlan: no
    use-ip-firewall-for-pppoe: no
              allow-fast-path: yes
      bridge-fast-path-active: yes
     bridge-fast-path-packets: 1024
       bridge-fast-path-bytes: 223788
  bridge-fast-forward-packets: 0
    bridge-fast-forward-bytes: 0



[admin@Mikrotik hEX PoE] /interface/bridge/port> print
Flags: I - INACTIVE; H - HW-OFFLOAD
Columns: INTERFACE, BRIDGE, HW, PVID, PRIORITY, PATH-COST, INTERNAL-PATH-COST, HORIZON
#    INTERFACE  BRIDGE   HW   PVID  PRIORITY  PATH-COST  INTERNAL-PATH-COST  HORIZON
0  H ether1     bridge1  yes     1  0x80             10                  10  none
1 IH ether2     bridge1  yes     1  0x80             10                  10  none
2 IH ether3     bridge1  yes     1  0x80             10                  10  none
3 IH ether4     bridge1  yes     1  0x80             10                  10  none
4 IH ether5     bridge1  yes     1  0x80             10                  10  none

I don’t know all the differences between FastTrack, fast path and fast forward yet (still quite new to the MikroTik world).
The page you linked states that fast forward is not used when the switch chip does the forwarding. Fast path apparently is still active in my setup, though, so I might be misunderstanding that. I will do some research on fast path, fast forward etc. to get a better understanding, but for now it doesn’t look like that’s the culprit.

Fast Forward is disabled when hardware offloading is enabled. Hardware offloading can achieve full write-speed performance when it is active since it will use the built-in switch chip (if such exists on your device), fast forward uses the CPU to forward packets. When comparing throughput results, you would get such results: Hardware offloading > Fast Forward > Fast Path > Slow Path.

Anyway, I tried disabling fast path just now to see if it would make any difference:

[admin@Mikrotik hEX PoE] > /interface/bridge/settings print
              use-ip-firewall: no
     use-ip-firewall-for-vlan: no
    use-ip-firewall-for-pppoe: no
              allow-fast-path: no
      bridge-fast-path-active: no
     bridge-fast-path-packets: 1105
       bridge-fast-path-bytes: 234578
  bridge-fast-forward-packets: 0
    bridge-fast-forward-bytes: 0
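
The change behind the output above can be sketched as follows. allow-fast-path is a global bridge setting, so no bridge name is needed:

```routeros
# Disable fast path for all bridges, then confirm the new state
/interface bridge settings set allow-fast-path=no
/interface bridge settings print
```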

I reran the test (plugging the computer back into the bridge with Wireshark running), but I could still see the VLAN 90 packets during the first 5 seconds. It’s really weird IMO.

/interface ethernet switch vlan
add independent-learning=yes ports=ether1,switch1-cpu switch=switch1 vlan-id=90
add independent-learning=yes ports=ether2,ether3,ether4,ether5,switch1-cpu switch=switch1 vlan-id=100

Are you sure independent-learning is required?

/interface/bridge print
Check whether fast-forward is “no”. If it is “yes” (or missing from the export, since default values such as “yes” are not printed), that might be the cause of those 5 seconds.

The lines
bridge-fast-forward-packets: 0
bridge-fast-forward-bytes: 0

in your output might be misleading, because I think they only count packets that reach the CPU. If the bridge fast-forward setting is “yes”, the switch chip will probably fast forward those packets without bothering the CPU.

disable_fast_forward.png
On the same switch as in the screenshot (a CRS326-24G-2S+, also with VLANs), “/interface/bridge/settings print” shows different information than “/interface/bridge print”:

> /interface/bridge/settings/ print
              use-ip-firewall: no
     use-ip-firewall-for-vlan: no
    use-ip-firewall-for-pppoe: no
              allow-fast-path: yes
      bridge-fast-path-active: yes
     bridge-fast-path-packets: 150145
       bridge-fast-path-bytes: 11193740
  bridge-fast-forward-packets: 0
    bridge-fast-forward-bytes: 0

Hello,

Thank you for the reply. I disabled it as you suggested:

[admin@Mikrotik hEX PoE] > /interface/bridge print
Flags: X - disabled, R - running
 0 R name="bridge1" mtu=auto actual-mtu=1500 l2mtu=1598 arp=enabled arp-timeout=auto mac-address=D4:01:C3:96:3E:3D protocol-mode=rstp fast-forward=no igmp-snooping=no auto-mac=no admin-mac=D4:01:C3:96:3E:3D
     ageing-time=5m priority=0x8000 max-message-age=20s forward-delay=15s transmit-hold-count=6 vlan-filtering=no dhcp-snooping=no port-cost-mode=short mvrp=no max-learned-entries=auto

Then I redid the test. Unfortunately the behaviour didn’t change.

Well, I don't need it per se, but different VLANs should have different MAC address tables, right? I prefer having two independent tables for my VLANs, so I don't run into issues when the same MAC appears in both VLANs.

Just to test it out, I disabled independent learning on both VLANs and redid the test. I still saw tagged VLAN 90 packets for roughly the first 5 seconds on ether2.
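
A sketch of how independent learning can be switched off for both VLAN table entries, assuming the entries are matched by the vlan-id values from the config above:

```routeros
/interface ethernet switch vlan
# Fall back to a shared MAC table for both VLAN entries
set [find vlan-id=90] independent-learning=no
set [find vlan-id=100] independent-learning=no
# Confirm the change
print
```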

Independent learning is not about the same MAC appearing in different VLANs (in the majority of implementations that’s the way it is anyway). Independent learning might matter with MSTP (or something like that), where the same MAC address may be used in different VLANs and be connected to different switch ports (in which case a common FDB may screw things up). I’m not highly experienced in this regard, but so far I’ve never seen a use case where independent learning really made a difference (and if anybody knows of such a case, I’d be very interested to learn about it).

Thank you for your insight. However, I believe this would lead us off topic.


Does anybody have another idea why I see tagged VLAN 90 packets on my access port in VLAN 100 for the first 5 seconds?

Hello,

For everybody interested: it turns out this is indeed a bug. Upgrading to 7.17beta4 resolved it on my device.

Added in 7.17beta2:
bridge - enable faster HW offloading when detect-internet is disabled;
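
Since the changelog entry ties the faster hardware offloading to detect-internet being disabled, it may be worth checking that setting after upgrading. A sketch, assuming detect-internet is not needed on this device:

```routeros
# Show which interface lists detect-internet is currently watching
/interface detect-internet print
# Disable detect-internet by pointing it at the built-in empty list
/interface detect-internet set detect-interface-list=none
# Confirm the installed RouterOS version
/system resource print
```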

Cheers!