Mesh configuration - pulling my hair out!

Hi all,

New MikroTik user here.

I am trying to do a simple configuration:

SW/AP1 ← mesh → SW/AP2

On SW/AP1,
VLAN10: SSID1, Ether1, Int10 (VLAN interface for IP and DHCP)
VLAN20: SSID2, Ether2, Int20 (VLAN interface for IP and DHCP)
VLAN30: SSID3, Int30 (VLAN interface for IP and DHCP)

All 3 SSIDs are managed by CAPsMAN.

On SW/AP2,
VLAN10: SSID1 (called SSID1_2 so I can distinguish between the two APs) and Ether1
VLAN20: SSID2 (called SSID2_2)

Both SSIDs are managed by CAPsMAN.

I created a dynamic mesh (WDS) between the two APs. The mesh is up and both SW/APs show a WDS interface (though its numbering changes over time).

Connecting to any SSID on SW/AP1 works flawlessly. However, connecting to an SSID on SW/AP2 fails: I see the registration in CAPsMAN (so the Wi-Fi client is associated), but the client never gets a DHCP address.

I tried a bunch of things, some of which ended up making the AP inaccessible, such as adding “bridge” to the mesh ports.

The documentation I found says to add the WDS ports to the bridge as tagged. I tried; it does not work.

Can someone point me to the document that shows how to use the mesh to connect two bridges with VLAN together?

Thank you!

OK. I think I got it working. In a fashion.

MikroTik Audience, RouterOS 6.49.10

CAUTION - I locked myself out a couple of times. After that, I assigned an L3 address to a physical interface to avoid shooting myself in the foot.

Mesh

/interface mesh add name="inter-ap" auto-mac=yes admin-mac=18:FD:74:FA:22:00

Security profile and Wlan interface

/interface wireless security-profile add name="mesh-security" mode=dynamic-keys authentication-types=wpa2-psk unicast-ciphers=aes-ccm group-ciphers=aes-ccm wpa2-pre-shared-key="moomeshmoo"
/interface wireless set 2 name="wlan3" mode=ap-bridge ssid="mesh-test" vlan-mode=no-tag vlan-id=1 wds-mode=dynamic-mesh wds-default-bridge=inter-ap wds-ignore-ssid=no bridge-mode=enabled hide-ssid=no security-profile=mesh-security
/interface mesh port add interface=wlan3 mesh=inter-ap

Mesh interface

/interface vlan add name="mesh.1" vlan-id=1 interface=inter-ap

Add the mesh interface to the bridge
NOTE: I did that before enabling vlan-filtering on the bridge.

/interface bridge port add interface=mesh.1 bridge=bridge pvid=1

Add the VLAN memberships
NOTE: I have no idea why there is no way to add or remove a single interface to/from a VLAN membership.

/interface bridge vlan set bridge=bridge vlan-ids=1 tagged="" untagged=bridge,mesh.1
/interface bridge vlan set bridge=bridge vlan-ids=10 tagged=mesh.1,bridge,vlan.10 untagged=""

From here, you should be able to see your traffic tagged on the other side. For example, 192.168.100.1/24 is the L3 address on the vlan.10 interface on R1, and 192.168.100.2/24 is the L3 address on the vlan.10 interface on R2.

R1

[admin@R1] > /ping 192.168.100.2
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                           
    0 192.168.100.2                                           timeout                                                          
    1 192.168.100.2                              56  64 0ms  
    2 192.168.100.2                              56  64 1ms  
    3 192.168.100.2                              56  64 1ms  
    4 192.168.100.2                              56  64 0ms

Packet capture on R2

 0 time=4.22 num=1 direction=rx src-mac=18:FD:74:FA:21:FD dst-mac=FF:FF:FF:FF:FF:FF vlan=10 
   interface=mesh.1 protocol=arp size=46 cpu=2 fp=no 

 1 time=4.22 num=2 direction=tx src-mac=18:FD:74:FA:22:34 dst-mac=18:FD:74:FA:21:FD vlan=10 
   interface=mesh.1 protocol=arp size=46 cpu=2 fp=no 

 2 time=5.21 num=3 direction=rx src-mac=18:FD:74:FA:21:FD dst-mac=FF:FF:FF:FF:FF:FF vlan=10 
   interface=mesh.1 protocol=arp size=46 cpu=2 fp=no 

 3 time=5.21 num=4 direction=tx src-mac=18:FD:74:FA:22:34 dst-mac=18:FD:74:FA:21:FD vlan=10 
   interface=mesh.1 protocol=arp size=46 cpu=2 fp=no 

 4 time=5.212 num=5 direction=rx src-mac=18:FD:74:FA:21:FD dst-mac=18:FD:74:FA:22:34 vlan=10 
   interface=mesh.1 src-address=192.168.100.1 dst-address=192.168.100.2 protocol=ip ip-protocol=icmp 
   size=74 cpu=2 fp=no ip-packet-size=56 ip-header-size=20 dscp=0 identification=3140 fragment-offset=0 
   ttl=255 

 5 time=5.213 num=6 direction=tx src-mac=18:FD:74:FA:22:34 dst-mac=18:FD:74:FA:21:FD vlan=10 
   interface=mesh.1 src-address=192.168.100.2 dst-address=192.168.100.1 protocol=ip ip-protocol=icmp 
   size=74 cpu=2 fp=no ip-packet-size=56 ip-header-size=20 dscp=0 identification=46406 
   fragment-offset=0 ttl=64 

 6 time=5.215 num=7 direction=rx src-mac=18:FD:74:FA:21:FD dst-mac=18:FD:74:FA:22:34 vlan=10 
   interface=mesh.1 src-address=192.168.100.1 dst-address=192.168.100.2 protocol=ip ip-protocol=icmp 
   size=74 cpu=2 fp=no ip-packet-size=56 ip-header-size=20 dscp=0 identification=3141 fragment-offset=0 
   ttl=255

On R2, the bridge host table shows the entries:

Flags: X - disabled, I - invalid, D - dynamic, L - local, E - external 
 #       MAC-ADDRESS        VID ON-INTERFACE                  BRIDGE                 AGE                 
 0   DL  18:FD:74:FA:22:34      bridge                        bridge                
 1   DL  18:FD:74:FA:22:35      wlan1                         bridge                
 2   DL  18:FD:74:FA:22:36      wlan2                         bridge                
 3   DL  18:FD:74:FA:22:37      mesh.1                        bridge                
 4   D   18:FD:74:FA:22:00    1 mesh.1                        bridge                 45s                 
 5   DL  18:FD:74:FA:22:34    1 bridge                        bridge                
 6   DL  18:FD:74:FA:22:35    1 wlan1                         bridge                
 7   DL  18:FD:74:FA:22:36    1 wlan2                         bridge                
 8   DL  18:FD:74:FA:22:37    1 mesh.1                        bridge                
 9   D   18:FD:74:FA:21:FD   10 mesh.1                        bridge                 2m16s               
10   DL  18:FD:74:FA:22:34   10 bridge                        bridge                
11   DL  18:FD:74:FA:22:37   10 mesh.1                        bridge                
12   D   DC:A6:32:4A:EF:3C   10 ether2                        bridge                 4m22s
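To check a host table like this one for VLAN leakage, you can parse the `/interface bridge host print` output and look for a dynamically learned MAC that shows up under more than one VID. Below is a minimal Python sketch. The regex assumes the column layout shown above (index, flags, MAC, optional VID, on-interface, bridge), and `parse_hosts` and `learned_on_several_vlans` are hypothetical helper names, not anything RouterOS provides.

```python
import re
from collections import defaultdict

# Rows copied from the R2 "/interface bridge host print" output above.
SAMPLE = """\
 0   DL  18:FD:74:FA:22:34      bridge                        bridge
 4   D   18:FD:74:FA:22:00    1 mesh.1                        bridge                 45s
 8   DL  18:FD:74:FA:22:37    1 mesh.1                        bridge
 9   D   18:FD:74:FA:21:FD   10 mesh.1                        bridge                 2m16s
11   DL  18:FD:74:FA:22:37   10 mesh.1                        bridge
12   D   DC:A6:32:4A:EF:3C   10 ether2                        bridge                 4m22s
"""

# index, flags, MAC, optional VID, on-interface, bridge (age is ignored)
ROW = re.compile(r"^\s*\d+\s+([A-Z]+)\s+([0-9A-F:]{17})\s+(\d+)?\s*(\S+)\s+\S+")


def parse_hosts(text):
    """Return (flags, mac, vid, interface) tuples; vid is None when the column is blank."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            flags, mac, vid, iface = m.groups()
            rows.append((flags, mac, int(vid) if vid else None, iface))
    return rows


def learned_on_several_vlans(rows):
    """MACs learned dynamically (no 'L' flag) under more than one VID."""
    vids = defaultdict(set)
    for flags, mac, vid, _iface in rows:
        if "L" not in flags and vid is not None:
            vids[mac].add(vid)
    return {mac: v for mac, v in vids.items() if len(v) > 1}


if __name__ == "__main__":
    print(learned_on_several_vlans(parse_hosts(SAMPLE)))  # {} for the table above
```

Local (L) entries are skipped on purpose: the bridge's own MACs legitimately appear in every VLAN.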

“In a fashion” because this is unstable: it stays up for about 5 minutes, then the pings start to time out and it stays down for about 5 minutes. Something clearly has a value of 300 s. It could be hwmp-prep-lifetime, ageing-time, or something else I have not found.

I found that on my Audience, if I do not set vlan-filtering=yes on the bridge, I cannot set the untagged VLAN to anything other than 1. Maybe I did something wrong in my config.

Any ideas welcome!

More tests …

Neither hwmp-prep-lifetime nor ageing-time changed a thing, but I saw something that could be a bug.

Aging seems to work differently for the ARP/switch-port MAC tables than for mesh-learned MACs.

For ARP and switch-port MAC entries, an entry is purged once it has aged, i.e. once the ARP/IP or switch-port/MAC association has been inactive for a period of time. For mesh-learned MACs, the age seems to be counted from the first time the MAC was learned, regardless of activity.
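The two aging behaviours can be compared with a small model. This is a hypothetical Python sketch, not RouterOS code: the 300 s limit is assumed from the ~5-minute period observed earlier, and the "absolute" variant only encodes the suspected mesh FDB behaviour described above.

```python
from dataclasses import dataclass

AGE_LIMIT = 300  # seconds; assumed from the ~5-minute period observed above


@dataclass
class Entry:
    learned_at: int  # time the MAC was first learned
    last_seen: int   # time of the last frame that refreshed the entry


def expired_sliding(entry, now, limit=AGE_LIMIT):
    """Bridge/ARP style: expire after `limit` seconds of inactivity."""
    return now - entry.last_seen >= limit


def expired_absolute(entry, now, limit=AGE_LIMIT):
    """Suspected mesh FDB style: expire `limit` seconds after first learning."""
    return now - entry.learned_at >= limit


def first_expiry(expired, duration=600):
    """Ping once per second from t=0; return the first second the entry expires, or None."""
    entry = Entry(learned_at=0, last_seen=0)
    for t in range(1, duration + 1):
        if expired(entry, t):
            return t
        entry.last_seen = t  # the ping at time t refreshes the entry
    return None


if __name__ == "__main__":
    print(first_expiry(expired_sliding))   # None: continuous traffic keeps it alive
    print(first_expiry(expired_absolute))  # 300: expires despite the traffic
```

With a ping every second, the sliding timer never fires, while the absolute one fires at 300 s no matter how much traffic flows. That would be consistent with the symptom, but it remains a model of the observation, not of the actual HWMP+ code.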

On R1 (192.168.100.1, 18:FD:74:FA:21:FD), I have a continuous ping to R2 (192.168.100.2, 18:FD:74:FA:22:34).

At first, everything is fine, and on R2 the bridge host table and the mesh FDB show that the entries were just learnt.

> /interface bridge host print
Flags: X - disabled, I - invalid, D - dynamic, L - local, E - external 
 #       MAC-ADDRESS        VID ON-INTERFACE                         BRIDGE                         AGE                 
 9   D   18:FD:74:FA:21:FD   10 mesh.1                               bridge                         0s                  
10   DL  18:FD:74:FA:22:34   10 bridge                               bridge

> /interface mesh fdb print detail 
Flags: A - active, R - root 
    mesh=inter-ap type=larval mac-address=18:FD:74:FA:21:FD lifetime=0s age=3s metric=0 seq-number=0 
A  mesh=inter-ap type=local mac-address=18:FD:74:FA:22:34 age=3s metric=0 seq-number=22

After a few moments, the MAC on the bridge has not aged; however, the entry in the mesh FDB has. The local entry for 18:FD:74:FA:22:34 has also aged.

 > /interface bridge host print
Flags: X - disabled, I - invalid, D - dynamic, L - local, E - external 
 #       MAC-ADDRESS        VID ON-INTERFACE                         BRIDGE                         AGE                 
 9   D   18:FD:74:FA:21:FD   10 mesh.1                               bridge                         0s    
10   DL  18:FD:74:FA:22:34   10 bridge                               bridge 
 
 > /interface mesh fdb print detail 
Flags: A - active, R - root 
    mesh=inter-ap type=larval mac-address=18:FD:74:FA:21:FD lifetime=0s age=1m58s metric=0 seq-number=0 
 A  mesh=inter-ap type=local mac-address=18:FD:74:FA:22:34 age=1m58s metric=0 seq-number=22

When the ping on R1 times out, the entry for the local MAC on R2 shows as “unknown”, even though frames were being sent every second.

/interface mesh fdb print detail 
Flags: A - active, R - root 

 AR mesh=inter-ap type=neighbor mac-address=18:FD:74:FA:22:00 on-interface=wds4 age=8h55m37s metric=50 seq-number=2860 
    mesh=inter-ap type=unknown mac-address=18:FD:74:FA:22:34 lifetime=0s age=8s metric=0 seq-number=0

The FDB on both R1 and R2 shows the same thing:

R1:   inter-ap                  unknown  18:FD:74:FA:22:34                                  0s           5s 
R2:    inter-ap                     unknown  18:FD:74:FA:22:34                                     0s           7s

So, the current questions are:

  • How is the aging defined for the FDB entries?
  • Why does a local entry go into the “unknown” state?
  • Am I doing something wrong in my configuration?

Notes:

I tried enabling mesh-portal on one of the devices, but that does not change the behavior.

Pinging either the tagged (VLAN 10) or the untagged (VLAN 1) interface fails, and both interfaces have the same MAC address. This may indicate that the mesh handles MAC information outside of the VLAN-tagged frame. Meaning: it could be impossible to have the same MAC on different VLANs and different devices, which would make the mesh useless for creating a trunk between two devices. To be confirmed by a mesh expert. As far as I know, it is not possible to change the MAC address of an interface.
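The hypothesis above can be illustrated with a toy forwarding table. This is a hypothetical Python model of two keying schemes, not a description of how HWMP+ actually stores its entries: if a table were keyed on the MAC alone, the untagged bridge interface and vlan.10 (which share 18:FD:74:FA:22:34 on R2) would collapse into one entry, whereas a VLAN-aware bridge keeps one entry per (VID, MAC).

```python
MAC = "18:FD:74:FA:22:34"  # R2: shared by the untagged bridge interface and vlan.10

# Frames from the same MAC seen on VLAN 1 and then on VLAN 10.
FRAMES = [(1, MAC), (10, MAC)]


def build_table(frames, key):
    """Learn each frame into a dict keyed by key(vid, mac); the value records the VLAN seen last."""
    table = {}
    for vid, mac in frames:
        table[key(vid, mac)] = vid
    return table


mac_only = build_table(FRAMES, lambda vid, mac: mac)           # hypothetical MAC-only table
vlan_aware = build_table(FRAMES, lambda vid, mac: (vid, mac))  # bridge-style (VID, MAC) table

if __name__ == "__main__":
    print(mac_only)    # one entry: VLAN 10 overwrote VLAN 1
    print(vlan_aware)  # two independent entries
```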

This is symmetric: when pinging continuously from R1 to R2, the MAC associated with the vlan.10 interface on R2 goes to the “unknown” state in the mesh FDB. When pinging from R2 to R1, it is the MAC of the vlan.10 interface on R1 that goes to the “unknown” state.

Removing mesh.1 from the tagged interfaces on VLAN 10 does not change the behavior.

More tests.

The configuration of the WDS master interface has a field (wds-default-bridge) to select the default bridge into which the WDS interfaces are put. Interestingly enough (or is the name misleading?), there is no bridge setting for a specific WDS interface, which raises the question: why call it a default bridge then? Anyway …

Flags: X - disabled, R - running 
...
 2  R name="wlan3" mtu=1500 l2mtu=1600 mac-address=18:FD:74:FA:22:00 arp=enabled interface-type=QCA9984 mode=ap-bridge ssid="mesh-test" frequency=5500 band=5ghz-a/n/ac channel-width=20/40mhz-XX secondary-frequency="" scan-list=default 
      wireless-protocol=any vlan-mode=no-tag vlan-id=1 wds-mode=dynamic-mesh wds-default-bridge=bridge wds-ignore-ssid=no bridge-mode=enabled default-authentication=yes default-forwarding=yes default-ap-tx-limit=0 
      default-client-tx-limit=0 hide-ssid=no security-profile=mesh-security compression=no

The WDS port is automatically added to the bridge as untagged in VLAN 1, and has to be added as tagged to every other VLAN it should carry.

/interface bridge vlan set 1 tagged=bridge,vlan.10,wds3

Again, why not have a command such as /interface bridge vlan 1 tagged-add=…? After this change, the WDS interface shows up in both VLAN 1 and VLAN 10:

 #   BRIDGE   VLAN-IDS  CURRENT-TAGGED  CURRENT-UNTAGGED
 0   bridge   1                         bridge
                                        wlan2
                                        wlan1
                                        wds3
 1   bridge   10        bridge
                        vlan.10
                        wds3
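Without a tagged-add, the workaround is to read the current member list and write the full list back with set, as done above for wds3. The list arithmetic itself is simple. Here is a hypothetical Python sketch of it; tagged_add and tagged_remove are illustrative names, not RouterOS commands:

```python
def _members(value):
    """Split a RouterOS-style comma-separated member list, dropping empty items."""
    return [m for m in (s.strip() for s in value.split(",")) if m]


def tagged_add(current, *ifaces):
    """Return `current` with `ifaces` appended, keeping order and skipping duplicates."""
    members = _members(current)
    for iface in ifaces:
        if iface not in members:
            members.append(iface)
    return ",".join(members)


def tagged_remove(current, *ifaces):
    """Return `current` without `ifaces`."""
    return ",".join(m for m in _members(current) if m not in ifaces)


if __name__ == "__main__":
    print(tagged_add("bridge,vlan.10", "wds3"))  # bridge,vlan.10,wds3
```

The resulting string is what goes into tagged= in the set command. Because the whole list is rewritten, forgetting one existing member silently removes it from the VLAN, which is exactly why an add/remove primitive would be safer.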

The ping is rock solid and ran for more than 30 minutes without a single drop. There does not seem to be any VLAN confusion: MAC addresses seen on one VLAN are not seen on the others. This indicates that the issues I had previously were not caused by the WDS interfaces and links, but by the HWMP+ layer that sits on top of them.

The above works fine as long as the WDS interface does not change, which in practice limits this setup to static WDS. That is not an issue with a few APs, but it can be a challenge in larger deployments: a full mesh between n nodes needs n(n-1)/2 links, so the number of connections grows as n^2.
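To put numbers on that growth, here is a quick sketch. It assumes every pair of APs gets its own static WDS link.

```python
def full_mesh_links(n):
    """Number of point-to-point WDS links needed to connect every pair of n APs."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (2, 3, 5, 10, 20):
        # each AP terminates n - 1 of those links
        print(f"{n:>2} APs: {full_mesh_links(n):>3} links, {n - 1:>2} per AP")
```

Ten APs already need 45 links, each AP terminating 9 of them.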

Not using HWMP+ means that STP has to keep the network loop-free, at the risk of ending up with a suboptimal topology. Set your bridge priorities correctly!
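As a reminder of why priorities matter: STP elects as root the bridge with the lowest bridge ID, which compares the priority first and uses the MAC address only as a tie-breaker. Below is a minimal Python sketch. The bridge MACs for R1 and R2 are assumptions taken from the addresses seen in the captures above, and 0x8000 is assumed to be the default bridge priority.

```python
DEFAULT_PRIORITY = 0x8000  # assumed default bridge priority


def bridge_id(priority, mac):
    """Bridge ID as compared by STP: priority first, then the MAC as a tie-breaker."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def elect_root(bridges):
    """bridges: name -> (priority, mac). The lowest bridge ID becomes the root."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Assumed bridge MACs, taken from the captures above.
R1_MAC = "18:FD:74:FA:21:FD"
R2_MAC = "18:FD:74:FA:22:34"

if __name__ == "__main__":
    # Equal priorities: the lower MAC wins, so the root is chosen by accident.
    print(elect_root({"R1": (DEFAULT_PRIORITY, R1_MAC), "R2": (DEFAULT_PRIORITY, R2_MAC)}))  # R1
    # A lower priority on R2 makes it the root regardless of MAC.
    print(elect_root({"R1": (DEFAULT_PRIORITY, R1_MAC), "R2": (0x4000, R2_MAC)}))  # R2
```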

Conclusions

  • Bridging VLANs over WDS works, but only without HWMP+, which exhibits a strange 5-minute issue.
  • Dynamic WDS interfaces may change names, which can cause disconnections. Use static WDS only.
  • Drawing your topology is important if you don’t want an absurdly high number of WDS connections to create and/or maintain.
  • … and to select your root and backup root bridges.

Lastly, I would appreciate it if anyone could look into this and check whether I made any mistakes in my tests. I can post any config snippets as needed.