I have added the bridge/bridge port to the vlan table, as you suggested. The dhcp server now works!
After adding the bridge cpu port as a tagged port, the vlan table looks like this:
[admin@MikroTik] /interface bridge vlan> print
Flags: X - disabled, D - dynamic
 #   BRIDGE   VLAN-IDS   CURRENT-TAGGED   CURRENT-UNTAGGED
 0   bridge   20         bridge           vlan20
 1   bridge   30         bridge           vlan30
                                          ether3
 2 D bridge   1                           bridge
There is a dynamic entry added for pvid=1, untagged port=bridge. If I'm not mistaken, that is because /interface bridge has pvid=1, and vlan-filtering=yes.
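To double-check that reasoning, I think the two settings and the resulting dynamic entry can be inspected as follows. This is only a sketch; it assumes the bridge is named `bridge`, as in the table above.

```
# show the bridge's own settings, including pvid and vlan-filtering
/interface bridge print detail where name=bridge
# list only the dynamically added vlan table entries
/interface bridge vlan print where dynamic
```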
So I think I'm beginning to grasp this.
Now I know how to create vlans with bridge filtering, but I have doubts about what is happening exactly.
If you have time, please read these and correct me if I'm wrong.
1. **port** is a layer 2 concept. Anything that has a MAC address is an ethernet port. Physical interfaces: ether2, ether3 and ether4 are ports. The bridge itself is a port (it has a MAC address). The virtual interfaces vlan20 and vlan30 are also ports.
2. When I add a port to a bridge (/interface bridge port add) then the bridge's internal port is connected to the port of the interface. The term "port" is somewhat distorted in RouterOS. A real layer 2 ethernet port is called an interface (with a MAC address) in RouterOS. What RouterOS calls the "port of a bridge" is in fact a connection between the bridge's internal port and the interface's port.
3. Inside a single RouterOS instance, one interface can be connected to at most one bridge. All virtual interfaces connected to the same bridge share the same MAC address with the bridge. (I guess it is more efficient that way?) This includes tunnel, vlan and wlan interfaces etc. Physical ethernet interfaces always have their own unique MAC addresses, even when they are connected to the same bridge.
4. When vlan-filtering is disabled on the bridge, then /interface bridge vlan table is not used, and any packet can leave on any port. (Or maybe it is used for tagging/untagging packets? But not for filtering.)
5. When vlan-filtering is enabled on the bridge, then /interface bridge vlan table is used to control which packet can leave on which port, based on the vlan id of the packets. When we say that a packet "can leave the bridge on interfaceX", we actually mean that the packet can be copied to the port of the interfaceX interface, making it an egress port for the packet. The source MAC address of the copied packet is modified to match the MAC address of the interfaceX interface in this process.
The bridge is always implemented in software. (In contrast, switch is usually implemented in hardware.) Bridge based vlan filtering is also implemented in software - it requires the packet to be copied from the switch chip to the CPU, where it is processed. (This processing includes not just the vlan based filtering, but also execution of IP firewall rules, when it is enabled for the bridge.) The copying of the packet into the CPU is represented by the packet "leaving on the bridge CPU port".
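A few read-only commands that, I believe, show the things described in the points above. This is a sketch under assumptions: the bridge is named `bridge`, and vlan20 and vlan30 were created under /interface vlan. As far as I know, the H flag in the bridge port list marks ports that are hardware offloaded.

```
# ports of the bridge with their pvid; the H flag should mark hardware offload
/interface bridge port print
# MAC address of the bridge and of the vlan interfaces
/interface bridge print detail
/interface vlan print detail
# whether IP firewall processing is enabled for bridged traffic (use-ip-firewall)
/interface bridge settings print
```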
My misunderstanding came from an apparently wrong assumption about packet processing in bridges. I had this mental construct:
1. packet enters the bridge
2. CPU processes the packet
3. packet leaves the bridge
Now I believe this is happening instead:
1. the packet **enters** the bridge on an interface that is added as a port to the bridge
2. the packet **leaves** on the bridge CPU port
3. the CPU processes the packet
4. the packet **enters** on the bridge CPU port
5. the packet **leaves** the bridge on one or more interfaces that are added as ports to the bridge
It seemed counter-intuitive to me at first, because **the packet leaves before it enters**. But I guess it just depends on how you look at it: it enters the CPU when it leaves the bridge, and vice versa. We just don't say that "it enters the CPU" when we are talking about bridging, because packets can only enter and leave on ports, and "port" is a layer 2 networking concept. So we are not looking at it from the CPU's point of view.
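One way I could try to verify this picture is to log frames in the bridge filter chains. This is a sketch, assuming I understand the chains correctly: `input` is for frames delivered to the router itself through the bridge, `output` is for frames the router sends out through the bridge, and `forward` is for frames bridged between ports. The log prefixes are made up, and I suspect frames handled entirely by hardware offload would not show up here.

```
/interface bridge filter
# frames handed from the bridge to the CPU
add chain=input action=log log-prefix="br-in"
# frames the CPU sends back out through the bridge
add chain=output action=log log-prefix="br-out"
# frames bridged from one port to another
add chain=forward action=log log-prefix="br-fwd"
```

These rules are meant only for a short test and should be removed afterwards, since logging every frame is noisy.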
I hope I have the correct concepts and a good view about how it works, but I'm going to write down an example, just to check that.
1. dhcp server2 sends a packet (for example, DHCPOFFER)
2. the packet **enters** the port of the vlan20 interface. It gets assigned vlan-id=20, because vlan20 interface has pvid=20.
3. the packet **enters** the bridge at the same time as a tagged packet, because vlan20 is added to the bridge, and the bridge's internal port is added to the bridge as a tagged port.
4. then the packet **leaves** the bridge on the CPU bridge port. This requires that the bridge CPU port is present as a tagged port in the bridge vlan table for vlan-ids=20, as in my table above.
5. the CPU processes the packet. If ip firewall is turned on for the bridge, then firewall rules are executed. They may change or block the packet. The vlan id of the packet is not changed.
6. the packet **enters** the bridge on the CPU bridge port again, as a tagged packet with vlan-id=20
7. if there is an entry for the destination MAC address of the packet in the bridge's host table, then it is used to determine the outgoing interface for the packet. Otherwise flooding occurs, and the bridge will use all possible ports to send out the packet. For any possible egress port, the vlan table is checked again. In any case, ports that are not listed in the vlan table for the vlan id (=20) of the packet are excluded.
8. the packet is copied to the selected port(s) and **leaves** the bridge again
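To compare this walk-through with what the router actually does, I assume the host table and the vlan table can be inspected while the DHCP exchange is running. A minimal sketch using only read-only commands:

```
# learned MAC addresses and the bridge port each one was learned on
/interface bridge host print
# which ports are currently tagged or untagged members of each vlan
/interface bridge vlan print
```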
Some things that I'm not sure about:
* I'm not sure if the packet actually re-enters the bridge on the CPU port. If it does, then is it subject to ingress filtering? I suspect that packets coming from the CPU are not subject to ingress filtering on the bridge.
* I'm not sure if vlan-filtering happens in the CPU before it enters the bridge on the cpu port, or after that. Or maybe packets do not enter on the CPU port, but they are directly created in the bridge, by the CPU?
* I did not try yet, but I guess I need to enable ip firewall on the bridge to separate vlans from each other? Or maybe, if the source and the destination IP addresses are on different subnets, then layer 3 routing happens? So if I choose different subnets for different vlans, then maybe I don't need to turn on ip filtering on the bridge?
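For the first and the third point, these are the settings I would experiment with. This is a sketch, not a tested configuration. It assumes that vlan20 and vlan30 each have their own subnet, so that traffic between them is routed and passes through the normal IP firewall forward chain rather than the bridge.

```
# ingress filtering is a per-port setting...
/interface bridge port set [find interface=ether2] ingress-filtering=yes
# ...and the bridge itself has the same setting, which I assume applies to the CPU port
/interface bridge set bridge ingress-filtering=yes
# block routed traffic between the two vlans in both directions
/ip firewall filter
add chain=forward in-interface=vlan20 out-interface=vlan30 action=drop
add chain=forward in-interface=vlan30 out-interface=vlan20 action=drop
```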
And finally, I wonder what would be the point of adding the bridge's internal CPU port to the bridge as an untagged port?
E.g. instead of this:
/interface bridge vlan
add bridge=bridge tagged=ether4,bridge untagged=ether2,vlan20 vlan-ids=20
add bridge=bridge tagged=ether4,bridge untagged=ether3,vlan30 vlan-ids=30
I can also do this:
/interface bridge vlan
add bridge=bridge tagged=ether4 untagged=ether2,vlan20,bridge vlan-ids=20
add bridge=bridge tagged=ether4 untagged=ether3,vlan30,bridge vlan-ids=30
But it makes no sense to me, because internally all packets are "tagged" (they always have a vlan-id), and the CPU always sees that. It must mean something (because it is allowed by RouterOS).
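My current guess is that an untagged entry for the bridge only matters together with the bridge's own pvid, just like the dynamic entry for vlan 1 above. A sketch of how I might test that guess, using a hypothetical vlan id 99 that is not part of my configuration. I have not run this, and changing the pvid could cut off management access that goes through the bridge.

```
# hypothetical test: change the bridge's own pvid to an otherwise unused vlan id
/interface bridge set bridge pvid=99
# I would expect the dynamic entry to follow the new pvid, with the bridge as untagged port
/interface bridge vlan print where dynamic
# restore the original value afterwards
/interface bridge set bridge pvid=1
```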