You’re missing the concept of VLAN interfaces and how services (L5 and above) interact with the lower layers.
All services (e.g. web server, winbox server) work on top of IP packets (mostly on top of TCP or UDP). Now, the ROS L3 layer works with untagged ethernet frames. If ethernet frames are tagged, the tags have to be manipulated before the frames are passed on to L3: stripped on the way up and added on the way down. And VLAN tag manipulation is the (only) task of VLAN interfaces.
So you do need appropriately configured VLAN interfaces (and the bridge interface has to be a tagged member of the corresponding VLANs), but they don’t need IP addresses set. The SoHo default config limits the ingress interfaces through which MAC winbox is possible, so when working with VLANs the VLAN interfaces have to be allowed. The bridge interface doesn’t matter, except if it has pvid set, and then only for that particular VLAN. That’s why it works for VID=1: this is the implicit default setting for all bridge ports, including the bridge’s CPU-facing port, which carries the same name as the bridge itself.
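
As a rough sketch of what that could look like: the VLAN ID (10) and the names `bridge`, `ether1` and `vlan10` are only placeholders for illustration, and I'm assuming the default config's `LAN` interface list is still what the MAC server uses. Adjust everything to your actual setup.

```
# Hypothetical names and VID: bridge, ether1, vlan10, VLAN ID 10.

# VLAN interface on top of the bridge; it handles the tag for VID 10.
# Note that no IP address is assigned to it.
/interface vlan
add name=vlan10 vlan-id=10 interface=bridge

# The bridge itself (its CPU-facing port) has to be a tagged member of
# the VLAN, next to the trunk port. This table only takes effect when
# vlan-filtering is enabled on the bridge.
/interface bridge vlan
add bridge=bridge vlan-ids=10 tagged=bridge,ether1

# Allow MAC winbox through the VLAN interface by adding it to the
# interface list that the MAC server accepts (LAN in the default config).
/interface list member
add list=LAN interface=vlan10

# Check which interface list MAC winbox is restricted to.
/tool mac-server mac-winbox print
```

If your MAC server points at a different interface list, add the VLAN interface to that list instead.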