[SOLVED] Wi-Fi Broadcast ARP/UDP unexpectedly throttled/blocked

Are there any default settings in RouterOS, perhaps in CAPsMAN, that would cause Layer-2 broadcast packets (ARP, UDP broadcast, etc.) to be throttled over Wi-Fi?

I’m seeing precisely that: ARP requests to clients on my RouterBOARD Wi-Fi network are delivered to the Wi-Fi client only once every 10-30 seconds (90% broadcast packet loss). The same goes for UDP broadcast packets. ARP isn’t so much of a problem (it’s an annoyance because it leads to unnecessary client-to-client latency), but the blocked UDP broadcast packets are a significant nuisance because they prevent the kids’ favorite game, Minecraft, from discovering other users on the same Wi-Fi. It uses UDP broadcast messages to discover other gamers on the same Wi-Fi / L2 network.
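For anyone who wants to reproduce the discovery traffic without the game, here is a minimal Python sketch of a UDP broadcast probe. The /24 prefix and the payload are assumptions for illustration only (the payload is not the real Minecraft discovery packet); the 192.168.251.255 address and port 19132 are taken from the capture later in this thread.

```python
import ipaddress
import socket


def broadcast_address(cidr: str) -> str:
    """Return the directed broadcast address of the subnet containing `cidr`."""
    return str(ipaddress.ip_network(cidr, strict=False).broadcast_address)


def send_discovery_probe(cidr: str, port: int, payload: bytes = b"probe") -> int:
    """Send one UDP datagram to the subnet broadcast address.

    The payload is a placeholder (assumption), not a real game packet.
    Returns the number of bytes handed to the kernel.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Without SO_BROADCAST the OS refuses to send to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        return sock.sendto(payload, (broadcast_address(cidr), port))
    finally:
        sock.close()
```

Calling `send_discovery_probe("192.168.251.10/24", 19132)` from a wired host while sniffing on a Wi-Fi client is one way to check whether such frames cross the AP at all.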

When connected to other Wi-Fi access points (TP-Link 802.11AC, or 802.11b Raspberry Pi Hostapd based wi-fi) there is no problem with broadcast packets being filtered or blocked in any way. Only when the devices are on the RouterOS based Wi-Fi network do I see this behavior.

IP firewall on my routerOS devices is blank - only a few ACCEPT rules, and no drop rules. No rules in the filter table.

ARP requests to the Ethernet based management IP of the routerboard flow uninhibited. The management IP is on a software based VLAN, just like the Wi-Fi.

The path of arping packets to the wi-fi clients looks like this:

Wi-Fi client -> SSID:MC -> [capsman tunnel] -> BR_VLAN_44 (bridge) -> VLAN_44 -> ETH1 (Trunk/tagged uplink) -> minecraft1 server eth0 -> bridge br0 -> arping command.

In the CAPsMAN datapath, client-to-client forwarding is enabled, ARP is enabled, and local-forwarding is off (CAP traffic is tunneled to the CAPsMAN routerboard).


I’m looking for help thinking of what mechanisms RouterOS has in place to throttle/block L2 broadcast packets (destination MAC ff:ff:ff:ff:ff:ff).

Thanks!

2020-05-25 UPDATE: Added [SOLVED] - I tracked this back to what looks like a bug in openvpn. When I bridge with the server-side openvpn, it loses ARP packets quite consistently. However when I bridge with client-side openvpn it works perfectly. I have since shifted my topology so that I can bridge with a client of the openvpn tunnel and it worked like a charm. It wasn’t until I traced the packet loss to the tunnel (ARP packets being sent into client side of tunnel, not showing up on the server-side of the tunnel, but showing up on all the other client interfaces) that I was able to apply this workaround.

Similar to:

I’m focusing right now on the bridge, ports in particular, and this mysterious “learn” option, as well as the other flood options…

interesting… “Traffic Storm Control” and a “storm-rate” parameter… (url)

Tested - Under Switch → port → ether1 (or any other port) I see “limit broadcasts” CHECKED/ENABLED (default). Unchecked it, then re-tested. No change: ARP: 27/68 received, gameplay: fail (users can’t see each other).

Bridge port shows “auto” for MAC Learning parameter. Maybe this needs to be set to “yes”, so set it, capsman remote CAP interface bridges show “auto” but grayed out. Unclear how to update them.

Tested - multicast-helper=full on Wi-Fi interface - Changing this from default to “full” results in 100% ARP response rate (improvement), yet in terms of gameplay, users still can’t see each other (UDP broadcast packets still getting dropped?)

Tested - Torch on VLAN - shows all gameplay broadcasts. However Torch on the Wi-Fi interface showed one packet, and not again for at least 30 seconds. The gameplay search packets in question are broadcast four times per second.
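To put numbers on these observations, here is a small Python sketch of the loss arithmetic. The 30-second window is taken as a lower bound from the Torch observation above, so the real loss may be slightly higher.

```python
def loss_percent(received: int, sent: int) -> float:
    """Percentage of frames that never arrived, given sent and received counts."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (1 - received / sent) * 100


# Search packets go out 4 times per second; Torch on the Wi-Fi interface
# showed 1 packet in (at least) 30 seconds.
expected = 4 * 30
print(round(loss_percent(1, expected), 1))   # 99.2

# The earlier arping test: 27 replies out of 68 requests.
print(round(loss_percent(27, 68), 1))        # 60.3
```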

When users are on the same Wi-Fi hosted by the CAPsMAN network there are no issues. The issues present themselves when a user on my MikroTik Wi-Fi network is hosting a game, and other users from across the other side of the VLAN try to join. The VLAN is bridged between Wi-Fi and Ethernet. Via Ethernet, packets are all present as expected (ARP, broadcast, etc.). Broadcast packets seem to be getting lost either going out the Wi-Fi interface or coming back. Likely the former, because return packets would be L2 unicast whereas the outbound packet would have a broadcast destination MAC (ff:ff:ff:ff:ff:ff). Could be bridge or Wi-Fi settings, or a combination of both. Will try to conjure up some experiments tomorrow to perform on a new MikroTik with default configs to try and narrow this down a bit further.

“IP firewall on my routerOS devices is blank - only a few ACCEPT rules, and no drop rules. No rules in the filter table.”

IP firewall normally is only for routed connections. Where in this path do you use routing (= 2 interfaces not on the same bridge) and where do you use bridging?

Reason for the question:
Routers do not pass arp, broadcasts or multicasts (unless special settings for multicast)
Bridges will normally flood arp, broadcast and multicast, unless special settings (like IGMP). The flooding for unicast is for all unknown MAC addresses.
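The flooding rule described above can be sketched as a toy model in Python. This is only an illustration of generic learning-bridge behavior, not RouterOS code; the port names are hypothetical, and IGMP snooping, split horizon and VLAN filtering are deliberately left out.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast and multicast MACs (I/G bit set in the first octet)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def egress_ports(dst_mac, ingress_port, fdb, ports):
    """Return the set of bridge ports a frame is sent out of.

    fdb maps learned unicast MACs (lower case) to the port they were seen on.
    """
    dst = dst_mac.lower()
    if not is_group_address(dst) and dst in fdb:
        # Known unicast: forward to the learned port only,
        # and never back out of the port the frame arrived on.
        out = fdb[dst]
        return set() if out == ingress_port else {out}
    # Broadcast, multicast and unknown unicast: flood to all other ports.
    return set(ports) - {ingress_port}
```

With ports `{"wlan-mc", "vlan44"}`, a frame to ff:ff:ff:ff:ff:ff arriving on `vlan44` is flooded to `wlan-mc`, which is why a plain bridge by itself should not be dropping these frames.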

VLAN-44 → eth1 ??? , minecraft-1 server eth0 ???

Broadcast packets on WiFi are transmitted without handling of link errors. When you do a normal unicast packet exchange (e.g. a download), the receiving end of the WiFi link sends back “I have received that” when it receives some data, and the sending end watches that and re-sends the data when it does not receive such an acknowledgement.
With broadcast (and multicast) data, this is not happening. So, when a lot of data is lost on the link (e.g. because the signal is weak or there is a lot of interference) you will lose a lot of the broadcast traffic while normal traffic may still appear to be fine.
When you look in Wireless->Registration there is a TX/RX CCQ % column you can add (or you can open each entry and find that info on the Signal tab) which gives an average % of the packets that go through on the first attempt in each direction. That is for unicast traffic but it should be similar for broadcast.

This problem is inherent to WiFi. There is a solution for multicast (“convert multicast to unicast”) but not for broadcast.
Programs should not use broadcast to send bulk data, it is only intended for infrequent information transmissions like “I am here, my name is…” or “I need your MAC address (ARP)”.
Bulk game data should be sent as multicast or unicast.

The only thing that I would add to what pe1chl already said is that broadcast traffic in wireless networks is always sent using the basic data rate (i.e. the slowest allowed data rate for the given network), so sending a lot of broadcast traffic will significantly degrade the performance of the whole network.

bpwl,

Granted, routers do not pass arp, but you gave me another idea - maybe the lost UDP broadcast packets are getting mis-directed to another interface. Below is the path in question, where UDP broadcasts are observable all the way from right to left until the last Wi-Fi link.

MC Wi-FI → BR_VLAN44 → VLAN_44 → ETH1 Trunk on routerboard (tagged) → backbone switch → Core router with an L2 VPN server → L2 VPN client (minecraft1)

I can log in and check for broadcasts every point along the way and I see them get all the way from minecraft1, to VLAN44, to the bridge on the mikrotik, but then they are gone when I torch the Wi-Fi link.

As I describe this I’m realizing I need to work on reproducing this issue on a stand-alone routerboard, for simplicity’s sake. I’ll add this to my TODO list.


pe1chl,

multicast-helper=full, despite its name, does seem to also help with broadcast. The name is a misnomer, but the following forum post gave me the idea, and I was able to confirm that it did improve ARP broadcast packet response rate (not exactly my problem, but very similar). I can’t explain why it helped ARP broadcast response rate (100%) but not UDP broadcasts. Both use L2 destination ff:ff:ff:ff:ff:ff:

“I’ve heard a rumour that in order to make broadcasts working reliably over wlan, one should set multicast-helper=full on AP’s wireless interface (works for broadcasts as well despites the name of option).”

The UDP broadcast that’s failing to reach the Wi-Fi client is only broadcast for brief periods, as players are about to join, in order to build the list of nearby players on the network. I would be willing to sacrifice battery consumption during these brief periods (multicast-helper=full) for the game functioning correctly, however even with multicast-helper=full the problem persists. This is very confusing to me.


andriys,

Great info, I didn’t know that broadcast packets are sent at basic-rate. I know that players only view the nearby-players list (the list that’s built using UDP broadcast) as they are about to enter the game, so the cost of bogging down the Wi-Fi speed during that time is low, but certainly worth noting, thank you.



I’m going to focus my next effort on reproducing the issue on a stand-alone routerboard, to help eliminate a lot of the variables in my current topology. It is admittedly quite complex, and although I have done my best to rule out most of the complexities, it is still difficult to describe the topology and get support for this sort of thing.

So one can speed up the broadcast/multicast by raising (with caution!) the minimum basic rate. 6Mbps could become 12 Mbps if all can follow.
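As a rough illustration of why the basic rate matters, the Python sketch below computes only the time the frame bits themselves occupy the air. It is a simplification under stated assumptions: preamble, 802.11 header overhead, inter-frame spacing and contention are ignored, and the 75-byte size is simply the length of the discovery frames in the capture posted in this thread.

```python
def payload_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Microseconds needed to clock out the frame bits at the given PHY rate.

    Ignores preamble, MAC overhead and contention (simplifying assumption).
    """
    return frame_bytes * 8 / rate_mbps


for rate in (1, 6, 12):
    print(rate, "Mbps ->", payload_airtime_us(75, rate), "us")
# 1 Mbps -> 600.0 us
# 6 Mbps -> 100.0 us
# 12 Mbps -> 50.0 us
```

Doubling the lowest basic rate halves this part of the airtime, at the cost of clients at the edge of coverage no longer decoding broadcasts (and beacons) reliably.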

Also the multicast/broadcast seems to be buffered.
"Another problem with multicast/broadcast traffic is that the AP needs to buffer this traffic if any associated clients are in powersave mode. The AP transmits the buffered data on a periodic interval known as the DTIM, which leads to bursts of traffic on the network. The AP also prioritizes buffered traffic over all other traffic, so it essentially blocks everything else while it’s being sent."
(from: https://wyebot.com/2019/02/13/multicast-broadcast-traffic-worry-part1/, not 100% correct info in this blog, it is the lowest basic rate, not the legacy rates)
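To get a feel for the buffering delay this implies, here is a small Python sketch. The beacon interval of 100 TU is an assumed (commonly used) value and the DTIM periods are example values, not settings read from any device in this thread; the only fixed fact is that one 802.11 time unit (TU) is 1024 microseconds.

```python
TU_MICROSECONDS = 1024  # one 802.11 time unit


def max_dtim_delay_ms(beacon_interval_tu: int, dtim_period: int) -> float:
    """Longest time a group-addressed frame may wait for the next DTIM beacon.

    Applies when at least one associated client is in power save, so the AP
    holds broadcast/multicast frames until the DTIM.
    """
    return beacon_interval_tu * TU_MICROSECONDS * dtim_period / 1000


print(max_dtim_delay_ms(100, 1))  # 102.4
print(max_dtim_delay_ms(100, 3))  # 307.2
```

A delay of this size causes bursts but not the tens of seconds of silence seen in Torch, so DTIM buffering alone would not explain the symptom.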

Could the amount of broadcast/multicast be reduced? (no forwarding, bridge horizon, …?)

This is what is in the wiki: https://wiki.mikrotik.com/wiki/Manual:Multicast_detailed_example#Multicast_and_Wireless

Ok but I think the MikroTik “multicast helper” can only be used on point-to-point links, right? (and not on access points where a bunch of clients are connected)
Or does it re-send every broadcast/multicast packet to every connected client?

I thought that the “convert multicast to unicast” thing that some other manufacturers do will only handle multicast in conjunction with the IGMP snooping that they do, to know which clients are interested in which multicast traffic (so they can copy and send it to each client). That would not work for broadcast, or again it would have to copy it to all connected clients.

Yes, it does.


As far as I know, Mikrotik implemented IGMP snooping on bridges only, unfortunately. Seeing IGMP snooping helping multicast helper to be more efficient would have been superb, of course.
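The "copy to every connected client" behavior discussed above can be modeled in a few lines of Python. This is a conceptual sketch only, not MikroTik's implementation: the frame is a plain dict and the client list stands in for the AP's registration table.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast and multicast MACs (I/G bit set in the first octet)."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def unicast_copies(frame: dict, client_macs: list) -> list:
    """Turn one group-addressed frame into one unicast frame per client.

    Unicast frames pass through unchanged. The sender itself gets no copy.
    """
    if not is_group_address(frame["dst"]):
        return [frame]
    return [
        dict(frame, dst=mac)          # same payload, rewritten destination
        for mac in client_macs
        if mac != frame["src"]
    ]
```

Each copy is then an ordinary unicast frame, so it is acknowledged and retried like any other unicast, at the price of N transmissions for N clients.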

OK Here’s the latest,

I am pretty sure multicast/broadcast IS actually getting through Wi-Fi and to the client. The Wi-Fi client (one of several Android tablets) must be getting the broadcasts, because it is trying to ARP for the client who sent the broadcast (presumably so it can respond to the broadcast). However, the ARP responses are not being received. I’m drawing that conclusion because the device continuously sends ARP who-has requests, something it wouldn’t do unless it does not yet have an ARP entry for the device. If this ARP were successful then no doubt the unicast response would reach the requestor and the player would show up in the list.

So the question is, why is the Wi-Fi client (“Player B”) not seeing the ARP responses from clients (eg. Player “A”) on the local MC Wi-Fi?

Below is a brief capture of broadcast traffic and port 19132 (Minecraft RakNet protocol) showing a client, “A” (A.A.A.A/AA:AA:AA:AA:AA:AA), looking for other players. Remote player “C” responds and is visible in the players list; they are on a remote Wi-Fi network a few hundred miles away, on the other side of VLAN44. Player “B”, who is local, never shows up in the player list. From the capture we can see that player B is struggling to resolve an ARP entry for “A”, while from the perspective of A’s Wi-Fi network (where the capture was taken) A is indeed responding to each ARP request.

Device “D” is the Raspberry Pi where the capture was taken, also broadcasting the Wi-Fi that player “A” is on (And “C” via L2 VPN), bridged to the Routerboard Wi-Fi that “B” is on. “D” runs a script that I wrote, which polls for active games every few seconds. Player “B” is not only invisible to player “A” but also to device “D”, with similar ARP behavior for both - Player “B” never seems to see the ARP “is-at” responses from “D” or the devices connected to its Wi-Fi (which is bridged to VLAN44).

13:01:07.708753 AA:AA:AA:AA:AA:AA > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: A.A.A.A.43366 > 192.168.251.255.19132: UDP, length 33
13:01:07.724373 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:07.755211 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:07.756067 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:08.253083 CC:CC:CC:CC:CC:CC > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: C.C.C.C.50665 > 192.168.251.255.19132: UDP, length 33
13:01:08.551898 BB:BB:BB:BB:BB:BB > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 119: B.B.B.B.5353 > 224.0.0.251.5353: 19 [3q] PTR (QM)? _674A0243._sub._googlecast._tcp.local. PTR (QM)? _8E6C866D._sub._googlecast._tcp.local. PTR (QM)? _googlecast._tcp.local. (77)
13:01:08.710938 AA:AA:AA:AA:AA:AA > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: A.A.A.A.43366 > 192.168.251.255.19132: UDP, length 33
13:01:08.720319 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:08.789349 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:08.790763 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:09.135696 DD:DD:DD:DD:DD:DD > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: D.D.D.D.45200 > 192.168.251.255.19132: UDP, length 33
13:01:09.139372 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has D.D.D.D tell B.B.B.B, length 42
13:01:09.139446 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:09.140330 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:09.254783 CC:CC:CC:CC:CC:CC > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: C.C.C.C.50665 > 192.168.251.255.19132: UDP, length 33
13:01:09.256926 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has C.C.C.C tell B.B.B.B, length 42
13:01:09.317692 CC:CC:CC:CC:CC:CC > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply C.C.C.C is-at CC:CC:CC:CC:CC:CC, length 28
13:01:09.544996 BB:BB:BB:BB:BB:BB > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 119: B.B.B.B.5353 > 224.0.0.251.5353: 20 [3q] PTR (QM)? _674A0243._sub._googlecast._tcp.local. PTR (QM)? _8E6C866D._sub._googlecast._tcp.local. PTR (QM)? _googlecast._tcp.local. (77)
13:01:09.712435 AA:AA:AA:AA:AA:AA > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: A.A.A.A.43366 > 192.168.251.255.19132: UDP, length 33
13:01:09.723459 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:09.798049 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:09.799014 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:10.138788 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has D.D.D.D tell B.B.B.B, length 42
13:01:10.138865 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:10.139851 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:10.254028 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has C.C.C.C tell B.B.B.B, length 42
13:01:10.260419 CC:CC:CC:CC:CC:CC > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: C.C.C.C.50665 > 192.168.251.255.19132: UDP, length 33
13:01:10.313856 CC:CC:CC:CC:CC:CC > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply C.C.C.C is-at CC:CC:CC:CC:CC:CC, length 28
13:01:10.547314 BB:BB:BB:BB:BB:BB > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 119: B.B.B.B.5353 > 224.0.0.251.5353: 21 [3q] PTR (QM)? _674A0243._sub._googlecast._tcp.local. PTR (QM)? _8E6C866D._sub._googlecast._tcp.local. PTR (QM)? _googlecast._tcp.local. (77)
13:01:10.722082 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:10.751619 AA:AA:AA:AA:AA:AA > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: A.A.A.A.43366 > 192.168.251.255.19132: UDP, length 33
13:01:10.869897 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:10.871567 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:11.138937 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has D.D.D.D tell B.B.B.B, length 42
13:01:11.139014 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:11.140028 DD:DD:DD:DD:DD:DD > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply D.D.D.D is-at DD:DD:DD:DD:DD:DD, length 28
13:01:11.253988 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has C.C.C.C tell B.B.B.B, length 42
13:01:11.266981 CC:CC:CC:CC:CC:CC > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: C.C.C.C.50665 > 192.168.251.255.19132: UDP, length 33
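A quick way to summarize a capture like the one above is to count ARP requests per (asker, target) pair and ARP replies per answering IP. The Python sketch below does that for tcpdump text in the format shown; the regular expressions are written against exactly this output style and would need adjusting for other tcpdump options. The sample lines are copied from the capture.

```python
import re
from collections import Counter

ARP_REQUEST = re.compile(r"Request who-has (\S+) tell (\S+),")
ARP_REPLY = re.compile(r"Reply (\S+) is-at (\S+),")

SAMPLE = """\
13:01:07.724373 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:07.755211 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
13:01:08.253083 CC:CC:CC:CC:CC:CC > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 75: C.C.C.C.50665 > 192.168.251.255.19132: UDP, length 33
13:01:08.720319 BB:BB:BB:BB:BB:BB > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 56: Request who-has A.A.A.A tell B.B.B.B, length 42
13:01:08.789349 AA:AA:AA:AA:AA:AA > BB:BB:BB:BB:BB:BB, ethertype ARP (0x0806), length 42: Reply A.A.A.A is-at AA:AA:AA:AA:AA:AA, length 28
"""


def summarize_arp(lines):
    """Count ARP requests per (asker, target) and replies per answering IP."""
    requests, replies = Counter(), Counter()
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match:
            target, asker = match.groups()
            requests[(asker, target)] += 1
            continue
        match = ARP_REPLY.search(line)
        if match:
            replies[match.group(1)] += 1
    return requests, replies


requests, replies = summarize_arp(SAMPLE.splitlines())
for (asker, target), count in requests.items():
    print(f"{asker} asked for {target} {count}x, {replies[target]} replies seen")
```

A pair whose request count keeps growing even though replies are seen at the capture point is the pattern described here: the replies exist on this segment but apparently never reach the asker.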

Did you happen to disable the default-forwarding property on your wireless interface? Or forwarding property for a particular client via access list? Just guessing…

I see default-forwarding=yes for Wi-Fi devices. Below is the bridge config for VLAN44 on the Capsman server

/interface bridge
add ageing-time=5m arp=enabled arp-timeout=auto auto-mac=yes dhcp-snooping=no disabled=no fast-forward=no forward-delay=15s igmp-snooping=yes igmp-version=2 last-member-interval=1s last-member-query-count=2 max-message-age=20s membership-interval=4m20s mtu=auto multicast-querier=no \
    multicast-router=temporary-query name=BR_VLAN44_MC priority=0x8000 protocol-mode=rstp querier-interval=4m15s query-interval=2m5s query-response-interval=10s startup-query-count=2 startup-query-interval=31s250ms transmit-hold-count=6 vlan-filtering=no

and a few other sections with “forward” in the config

/caps-man datapath
add arp=enabled bridge=BR_VLAN44_MC client-to-client-forwarding=yes local-forwarding=no name=datapath-BR_VLAN44_MC

queue snippet

/queue interface
set VLAN44-MINECRAFT queue=no-queue

I’ve tested three Android tablets and a Galaxy S10 phone. I’ve tested all four of them on MC: two work when running a Minecraft server and two don’t. When it works, players on the bridged Wi-Fi (Minecraft) can see them, and when it doesn’t work, other players don’t see them. Players on the same Wi-Fi can always see each other. The three tablets are on the same software, OS, and patch level. The S10 is also up to date on all software, OS and patches. The fact that half work and half don’t baffles me, yet I’m able to drill down to the packet level and see the problem is with ARP responses not being received either by the application or device/OS. I’ll keep digging but I wanted to add this baffling observation.

Can you elaborate on this “same Wi-Fi” thing please? Do you mean associated with the same CAP in your CAPsMAN?

There are two Wi-Fi networks here: MINECRAFT and MC. MINECRAFT is hosted by the Pi, and bridged to VLAN44 via a L2VPN. MC is new, broadcast by routerboard and two CAPs, using Capsman and local-forwarding=no so it’s tunneled through the routerboard running Capsman. Both Wi-Fi networks are bridged to VLAN44.

I just connected a non-MikroTik Wi-Fi AP in bridge mode to a switchport that I provisioned to VLAN44, and I am seeing the same behavior. I’m going to focus my search on the VLAN itself and the devices. Perhaps based on what I find I’ll be able to make a tweak in the MikroTik network settings to work around it, but I do not suspect my MikroTik Wi-Fi/routerboard equipment is at fault in any way. The hints provided in this thread helped me narrow my search a bit further. I need to find out why certain devices continuously search via ARP for their neighbor. Whether it’s a Wi-Fi client problem or something in my bridge/VLAN settings, I’ll be working to narrow it down and will look forward to posting back to this thread with whatever I find.