Trouble with vlan & routing

Update: FIXED. Entries in the config that were removed are now commented out in the main post.


Our end-goal is to be able to rate-limit all the individual ports on a switch, so that our customer’s tenants can better share bandwidth. I THOUGHT I had it working well, but ran into a major wrinkle once it was brought to the site.

The main Mikrotik M1 currently has the configuration below; it gets its internet on ether1 via PPPoE (or, for my testing, via DHCP). Switch S (transparently) tags all traffic on its ports: port 1 is the trunk, port 2 uses VLAN “2”, port 3 uses VLAN “3”, and so on up to port 24 using VLAN “24”. I had a Windows laptop WL plugged in to any given port on the switch, and its bandwidth was correctly limited to my specs. The devices connecting directly to the switch are unaware of the VLANs; the switch does all that for them.

The wrinkle came when a tenant plugged in a router instead of a device or switch. (Sure there’s some potential double-nat involved, but that doesn’t create any issues here.) When they did that, NONE of the devices could route to the internet. I got the devices M1 and S back and replicated the setup with a second Mikrotik M2 and the laptop WL.

Works: WL → M1 directly ; WL gets an ip in 192.168.88.x
Works: WL → M1 directly, but with WL picking vlan XX ; WL gets an ip in 192.168.1XX.x
Works: WL → M2 → M1 ; M2 gets an ip in 192.168.88.x, and WL gets an ip from M2 – note, “double nat but with working internet”
Works: WL → S → M1 ; WL gets an ip in 192.168.1XX.x
DOES NOT WORK: WL → M2 → S → M1

In that final not-working case, I see that M2’s dhcp query is being received by M1 and an ip is allocated from the pool, but M2 isn’t applying it. And so there’s no routing. The only difference I can see in the config is that the connection to M1 is via vlanXX instead of via the bridged ether2-5 ports.
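For anyone reproducing this, the lease state on both routers can be compared roughly as below. This is only a sketch: it assumes the test port on the switch maps to vlan2 and that M2 is a Mikrotik whose WAN port is ether1, neither of which is stated above. A lease that stays in status "offered" on M1 while the client on M2 keeps showing "searching..." would suggest the offer is leaving M1 but never being accepted by M2.

```
# On M1: list the leases handed out by the per-VLAN DHCP server
# (vlan2 is an assumed example; use the server matching the switch port under test)
/ip dhcp-server lease print where server=vlan2

# On M2: show what the DHCP client believes it has
# (assumes M2 runs RouterOS and its DHCP client sits on the WAN port)
/ip dhcp-client print detail
```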

There seems to be something fundamental that I’m missing, but I can’t see it. Help?


/interface vlan
add interface=bridge name=vlan2 vlan-id=2
add interface=bridge name=vlan3 vlan-id=3
...
add interface=bridge name=vlan24 vlan-id=24
/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN
/ip pool
add name=dhcp ranges=192.168.88.10-192.168.88.254
add name=vlan2 ranges=192.168.102.10-192.168.102.254
add name=vlan3 ranges=192.168.103.10-192.168.103.254
...
add name=vlan24 ranges=192.168.124.10-192.168.124.254
/ip dhcp-server
add address-pool=dhcp disabled=no interface=bridge name=defconf
add address-pool=vlan2 disabled=no interface=vlan2 name=vlan2
add address-pool=vlan3 disabled=no interface=vlan3 name=vlan3
...
add address-pool=vlan24 disabled=no interface=vlan24 name=vlan24
/queue simple
add max-limit=4M/16M name=vlan2 target=vlan2
add max-limit=16M/64M name=vlan3 target=vlan3
...
add max-limit=64M/512M name=vlan24 target=vlan24
/interface bridge port
add bridge=bridge comment=defconf interface=ether2
add bridge=bridge comment=defconf interface=ether3
add bridge=bridge comment=defconf interface=ether4
add bridge=bridge comment=defconf interface=ether5
### add bridge=bridge comment=defconf interface=vlan2
### add bridge=bridge comment=defconf interface=vlan3
### ...
### add bridge=bridge comment=defconf interface=vlan24
/ip neighbor discovery-settings
set discover-interface-list=LAN
/interface list member
add comment=defconf interface=bridge list=LAN
add comment=defconf interface=ether1 list=WAN
add comment=defconf interface=vlan2 list=LAN
add comment=defconf interface=vlan3 list=LAN
...
add comment=defconf interface=vlan24 list=LAN
/ip address
add address=192.168.88.1/24 comment=defconf interface=ether2 network=192.168.88.0
add address=192.168.102.1/24 interface=vlan2 network=192.168.102.0
add address=192.168.103.1/24 interface=vlan3 network=192.168.103.0
...
add address=192.168.124.1/24 interface=vlan24 network=192.168.124.0
/ip dhcp-client
add comment=defconf disabled=no interface=ether1
/ip dhcp-server network
add address=192.168.88.0/24 comment=defconf gateway=192.168.88.1
add address=192.168.102.0/24 gateway=192.168.102.1
add address=192.168.103.0/24 gateway=192.168.103.1
...
add address=192.168.124.0/24 gateway=192.168.124.1
/ip dns
set allow-remote-requests=yes
/ip dns static
add address=192.168.88.1 comment=defconf name=router.lan
/ip firewall filter
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=accept chain=input comment="allow ssh,http,https" dst-port=22,80,443 in-interface=ether1 protocol=tcp
add action=accept chain=input comment="allow ssh,http,https" dst-port=22,80,443 in-interface=pppoe-out1 protocol=tcp
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP" protocol=icmp
add action=accept chain=input comment="defconf: accept to local loopback (for CAPsMAN)" dst-address=127.0.0.1
add action=drop chain=input comment="defconf: drop all not coming from LAN" in-interface-list=!LAN
add action=accept chain=forward comment="defconf: accept in ipsec policy" ipsec-policy=in,ipsec
add action=accept chain=forward comment="defconf: accept out ipsec policy" ipsec-policy=out,ipsec
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related disabled=yes
add action=accept chain=forward comment="defconf: accept established,related, untracked" connection-state=established,related,untracked
add action=drop chain=forward comment="defconf: drop invalid" connection-state=invalid
add action=drop chain=forward comment="defconf: drop all from WAN not DSTNATed" connection-nat-state=!dstnat connection-state=new in-interface-list=WAN
/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN

I think you’ve misunderstood a bit in how the bridge and VLANs work.

Each /interface bridge row actually creates two distinct logical objects - the bridge itself (a software model of a switch chip) and a virtual L2 port connected to that bridge, see the diagrams below:

Hardware switch and router:

  hw switch                    hw router
 ___________                ________________
|           |              |                |
|         P0[==============]ether3    ether1[
|           |              |                |
|         P1[              |          ether2[
|           |              |________________|
|         P2[
|           |
|         P3[
|___________|

An approximate equivalent with a software bridge looks as follows:

  bridge part                 router part
 ___________                ________________
|           |              |                |
|    unnamed[==============]bridge    ether1[
|           |              |                |
|     ether3[              |          ether2[
|           |              |________________|
|     ether4[
|           |
|     ether5[
|___________|

An /interface vlan is basically just a pipe, which tags frames passing in one direction and untags frames passing in the other one. The IP configuration is attached to its tagless end.

So what you have actually done is that

  • you’ve attached the tagged ends of all the /interface vlan to the virtual interface connected to the virtual bridge (which is correct), by specifying interface=bridge on the rows of the /interface vlan table
  • you’ve made the tagless ends of all the /interface vlan member ports of the virtual bridge (which is wrong as you’ve thus looped the pipes back to the bridge), by listing them in the /interface bridge port table
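As a sketch of the intended layout for a single VLAN, using only rows that already appear in the posted config (vlan2 picked as the example): the tagged end is attached through interface=bridge, the IP configuration lives on the VLAN interface itself, and there is no matching row in the bridge port table.

```
# Tagged end of the pipe: attached to the bridge's own virtual port
/interface vlan
add interface=bridge name=vlan2 vlan-id=2

# Tagless end of the pipe: this is where the IP configuration goes
/ip address
add address=192.168.102.1/24 interface=vlan2 network=192.168.102.0
/ip dhcp-server
add address-pool=vlan2 disabled=no interface=vlan2 name=vlan2

# Deliberately absent:
#   /interface bridge port add bridge=bridge interface=vlan2
# That row would loop the tagless end back into the bridge.
```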

The IP configuration for anything connected to a bridge should always be attached to the virtual port sharing its name with the bridge. However, in reality it effectively works the same even if attached to any of the other member ports of the bridge, so your IP addresses for all the VLANs are effectively attached directly to the bridge.
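In the posted config the 192.168.88.1/24 address sits on ether2, which is a member port of the bridge. A possible tidy-up, assuming that is the only address row on ether2, would be to move it to the bridge interface. This is optional housekeeping and not one of the fixes for the DHCP problem.

```
# Optional tidy-up on M1 (assumption: exactly one address row is attached to ether2)
/ip address
set [find where interface=ether2] interface=bridge
# Confirm the address now shows interface=bridge
print
```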

I’ve got no precise idea what exactly doesn’t work. Most likely the DHCP offer sent from the IP address attached to the tagless end of the VLAN interface takes a shortcut, bypassing the VLAN pipe, so it doesn’t get tagged and thus the external switch cannot deliver it to the access port through which the request came in. Nor do I know why this wasn’t an issue when the tenants connected their gear via switches (so the gear was getting IP addresses directly from you).

But first of all remove all the interface=vlanX bridge=bridge rows from the /interface bridge port table. Then test again, and if it still doesn’t work, we’ll have a look further.
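One way to do that removal in a single step is sketched below. It assumes every VLAN interface name starts with "vlan", as in the posted config, and that no other bridge port matches that pattern; printing first lets you check what will be removed.

```
/interface bridge port
# Review the rows that match before touching anything
print where interface~"^vlan"
# Remove every bridge port row whose interface name starts with "vlan"
remove [find where interface~"^vlan"]
# Only ether2..ether5 should remain
print
```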

I will be able to continue work on Monday. In the meantime, your explanation does NOT explain why the config does indeed work when it’s a non-router connecting to the switch’s ports; the DHCP query fully works in that case. The DHCP only fails when it’s a router (Mikrotik, D-Link, etc.) connecting to the switch.

Sure it doesn’t, I’ve already written that myself. I’ve merely told you what is clearly wrong in your setup at first glance; whether it is the actual cause of the issue from the OP or not has to be tested. Even if it is not, it still needs to be fixed as that may bring some surprises in future.

But there is one more point to have a look at, which dawned on me as you’ve used the keyword “Windows”: in the OP, you explicitly state that the external switch tags the traffic on its ports, but you do not mention untagging. I’ve seen quite a few switches where you had to configure each direction separately; if this is one of them, the tagged frames received on ether1 may be egressing through ether2..ether24 still tagged. E.g. on some D-link switch I can remember, you had to specify whether a given port is a tagged or untagged member of a given VLAN, but this setting only affected untagging in the egress direction; to tag the ingress packets, you had to set the PVID on the port, and it only affected the ingress direction. So you could even end up with two different VLAN IDs being untagged on egress and tagged on ingress.

Now most Windows network card drivers strip the VLAN tags from received frames before handing them over to the upper layers of the networking stack. In other words, Windows doesn’t care whether a packet arrives in a tagged or a tagless frame and only looks at the MAC and IP addresses. Networking equipment normally doesn’t do this, which could explain why the Windows laptop accepts the DHCPOFFER while the routers don’t, if the switch indeed doesn’t untag it on egress.
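If the router plugged into the switch is a Mikrotik, one way to test this theory is to watch the incoming frames with the packet sniffer while the DHCP client retries. The sketch below assumes ether1 is that router's WAN port facing the switch. A VLAN ID shown for the received frames would mean the switch is not untagging on egress.

```
# On the downstream router (M2), assumed to face the switch on ether1.
# Watch the VLAN column of the live output while the DHCP client is searching;
# frames from an access port should arrive without a VLAN ID.
/tool sniffer quick interface=ether1
```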

Thank you so much for your help. It was definitely a MUCH better experience here than the “hurr hurr hurr learn how to internetz dude” one I had on Reddit. But I’ll at least be able to update my post there with the pair of fixes required to make it work. (Because SOME day somebody else will run into a similar issue, and google is forever lol)

But in short:

  1. All of the 23 “vlan” entries under “/interface bridge port” were removed; and
  2. The 802.1Q VLAN entries on the switch were, per port, changed from “Tagged” to “Untagged”

I would have at least gotten the switch’s setup working properly in the first place, if only Microsoft’s implementation were done to spec/standards, instead of that unexpected thing where it ignores the tags. But removing entries from the “interface bridge port” would never have occurred to me, as I was “only” copying somebody else’s almost-the-same-but-not-quite implementation, and I guess they didn’t do the same “update post” thing I always try to do.