I have multiple TP-Link APs behind a Mikrotik router. When my Android client roams to a different AP, an existing stream hangs for 5-10 seconds before resuming (resulting in dead air). The client connects to the new AP as soon as it roams, and I can continue to ping it without interruption, but something is keeping the existing connections from being passed on in a timely manner.
With one exception, this happens only to IPv6 connections. IPv4 connections are normally passed on seamlessly.
The one exception is a particular AP: when the client switches from 5 GHz to 2.4 GHz, the existing connections hang permanently and never resume, regardless of whether they are IPv4 or IPv6.
It's an RB750Gr3 router, currently at v7.21.1. I have the default Mikrotik firewall rules installed. I'm not doing anything fancy. Any idea what might be happening, or whether there are settings on the Mikrotik that could affect this behavior?
This is more likely to be an issue on the APs than on the router, assuming you have all APs connected to the same switch/bridge.
You may also want to check if the client gets a new IPv6 address when it roams. That could indicate it disconnected and reconnected rather than actually roaming. Check if “fast roaming” is enabled on the APs.
The APs are on the same switch. I tried connecting them directly to the Mikrotik as well, but it made no difference.
I checked while roaming, and the IPv6 address remains the same. There’s no fast roaming, but the client is connecting to all the APs quickly. Even though the existing IPv6 stream hangs, I can continue to ping the phone without interruption.
The IPv6 stream goes to an IPv6 WireGuard endpoint. If I toggle WireGuard off and on, it reconnects right away.
You are pinging the phone on its IPv6 address and that works all the time?
What I notice is that with SLAAC it can take some time before the device realizes it can use IPv6. That could re-trigger when the device gets disconnected and re-connected. IPv4 is not affected by that because it uses DHCP which immediately answers on a RENEW.
When a station disconnects and reconnects, any data buffered in the old AP is lost. To make things worse, this process is done entirely by the station, and until the station successfully reconnects to the new AP, any data "in flight" will still hit the previous AP and its buffers.
That causes major data loss, and all kinds of weird things happen then. What exactly happens depends on the properties of the connection (e.g. TCP vs UDP) and on the application layer. Recovery can be lengthy, again depending on the exact mechanism in charge of recovery.
But that should be similar for both IPv4 and IPv6. Since APs are, in principle, L2 devices (a bridge between Ethernet and WiFi), they work with Ethernet (MAC) addresses, so there shouldn't be any difference in how they handle Ethernet frames with different payload types. And since all your APs are connected to the same L2 domain, the router (the entity that actually cares about IPv4 vs IPv6) should not care if a station moves from one AP to another.
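To illustrate the point about L2 forwarding, here is a toy Python sketch of a MAC-learning bridge. It is not any vendor's implementation; the class name, port names and MAC addresses are all made up for the example. The only thing it is meant to show is that learning and forwarding look at MAC addresses and ports, never at the payload type.

```python
class LearningBridge:
    """Toy model of a MAC-learning bridge (hypothetical, for illustration only)."""

    def __init__(self):
        self.mac_table = {}  # source MAC -> port it was last seen on

    def receive(self, port, src_mac, dst_mac, ethertype):
        # Learning uses only the source MAC and the ingress port.
        self.mac_table[src_mac] = port
        # Forwarding uses only the destination MAC; ethertype is never consulted.
        return self.mac_table.get(dst_mac, "flood")


IPV4, IPV6 = 0x0800, 0x86DD          # standard EtherType values
PHONE = "aa:aa:aa:aa:aa:01"          # made-up addresses
ROUTER = "bb:bb:bb:bb:bb:01"

bridge = LearningBridge()

# Phone is associated with AP1 and sends a frame towards the router.
bridge.receive("ap1", PHONE, ROUTER, IPV6)
before_v4 = bridge.receive("router", ROUTER, PHONE, IPV4)
before_v6 = bridge.receive("router", ROUTER, PHONE, IPV6)

# Phone roams; its first frame now arrives on the AP2 port.
bridge.receive("ap2", PHONE, ROUTER, IPV4)
after_v4 = bridge.receive("router", ROUTER, PHONE, IPV4)
after_v6 = bridge.receive("router", ROUTER, PHONE, IPV6)

print(before_v4, before_v6)  # ap1 ap1
print(after_v4, after_v6)    # ap2 ap2
```

In this model a single frame from the phone on the new port moves both IPv4 and IPv6 traffic at once, which is why a difference between the two protocols points away from plain L2 forwarding.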
When a station actually roams with the FT bells and whistles, both involved APs can move buffered data to the new serving AP, so packet loss can be avoided. The actual move of the station's WiFi link between the two APs is also much faster, so there is less in-flight data hitting the previous AP, which by itself makes the recovery process less demanding (and thus faster).
So based on the above, I suspect that your TP-Link APs don't handle buffering well, and that you'd see similar pauses with an IPv4 connection that carries a constant data flow of considerable throughput.
My experience is that when a station moves from one AP to another where the SSID (and security properties) are the same, the station assumes it's still in the same L2 broadcast domain and doesn't perform "connect protocols" such as the DHCP handshake and similar.
I was pinging the IPv4 address, and it works all the time. So I know it has successfully roamed to the new AP.
I tried pinging the IPv6 address, and it does indeed drop out. So even though the IPv6 address remains the same, you may be correct about SLAAC causing the interruption. So it’s not just existing connections, it’s IPv6 altogether that drops out for 5-10 seconds on the Android client.
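For anyone who wants to put numbers on this kind of test, here is a minimal Python sketch that turns a log of ping results into outage windows. It assumes you have already collected one (timestamp, reply received) sample per probe for each address family; the function name and the sample data below are made up for illustration and are not measurements from my network.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (time in seconds, was a reply received?)


def find_outages(samples: List[Sample], min_duration: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) windows during which no reply was received.

    start is the time of the first lost probe; end is the time of the next
    successful probe, or of the last lost probe if replies never resume.
    Windows shorter than min_duration are ignored.
    """
    outages = []
    start = None
    last_lost = None
    for t, ok in samples:
        if not ok:
            if start is None:
                start = t
            last_lost = t
        elif start is not None:
            if t - start >= min_duration:
                outages.append((start, t))
            start = None
    # Handle a log that ends while replies are still missing.
    if start is not None and last_lost - start >= min_duration:
        outages.append((start, last_lost))
    return outages


# Hypothetical one-probe-per-second logs taken while walking between two APs.
ipv4_log = [(float(t), True) for t in range(15)]
ipv6_log = [(float(t), not (4 <= t <= 10)) for t in range(15)]

print(find_outages(ipv4_log))  # []
print(find_outages(ipv6_log))  # [(4.0, 11.0)]
```

Running both address families side by side during the same roam makes it easy to see whether only IPv6 goes quiet and for how long.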
Yeah, that sounds familiar. When I open the network info quickly after connecting, I can see a state where the device has IPv4 but not IPv6. IPv6 comes up later.
Maybe see if the “RA Delay” setting in IPv6→ND is related to it. Probably not; its default is 3 seconds.
Still, if what @mkx wrote is true, there should be nothing like this when just roaming to another AP with the same SSID. That is what you have, right? Not different SSIDs that are all known to the client?
For what it’s worth, I think it was an issue with the APs. I replaced them all with a mesh system running in AP mode, and I can roam without losing the IPv6 now.
Normally, I connect everything to some unmanaged switches that are connected to the Mikrotik router. With the new mesh APs connected to the unmanaged switch, I got some “bridge RX looped packet” errors and “mac connection syn timeout” errors.
These seem to have gone away once I plugged the mesh APs directly into the Mikrotik.