Wifi-qcom-ac issues detailed description.

Hello,

I have a MikroTik infrastructure with an RB1100AHx4 main router (RouterOS 6.49.13) and an hEX secondary router (RouterOS 7.17rc3).
The main router acts as a router, firewall and CapsMan controller.
The secondary router acts as a CapsMan2 controller, for tests and for the migration from RouterOS 6 to RouterOS 7 and from wireless to wifi-qcom-ac.

Downstream I have about 10 CRS354-48G-4S+2Q+RM switches with MSTP configured, 14 cAP ac and two hAP ax^2.
The entire network is separated with VLANs and each customer has its own VLAN and SSID.
All devices are spread on multiple floors, some adjacent, two of them are not.

When using the wireless driver and Capsman, I made sure that the OFDM rates were configured like below:

/caps-man rates
add basic=24Mbps name=OFDM supported=24Mbps,36Mbps,48Mbps,54Mbps

This made roaming near perfect by using signal quality to deter sticky clients and clients that would otherwise connect to a far away AP.
It also forces clients to roam to a nearby 2.4GHz or 5GHz radio with minimal disruption whenever a radar is detected on a DFS channel, even though there is no Fast Roaming or 802.11r/k/v.

On another floor I have a similar but separate setup, which I migrated to wifi-qcom-ac and CapsMan2 with the following features configured:

WNM&RRM enabled (also the defaults)
FT and ft-over-ds, unique ft-mobility-domain per SSID and a ft-r0-key-lifetime of 12 hours (Windows 10 default for WPA2-Enterprise)
Neighbor groups for each SSID
WPA2-Enterprise with CCMP encryption and group encryption (default settings).

Tx power limit at 20dBm (including antenna gain) for 2.4GHz and 23dBm (including antenna gain) for 5GHz. I tried lower Tx power, but maximum speed could not be reached, so Wi-Fi cell capacity would be negatively impacted. I couldn’t convince the building owners to double the number of APs per floor, which would have let me use a much lower Tx power per radio.

I didn’t use WPA3 because I wanted to roll back with minimum customer disruption.

There are two Wi-Fi networks: one uses WPA2-Enterprise, the other WPA2-Personal, configured in the same way except for the authentication.

Everything is working perfectly, both Android and Windows 10 devices roam from one AP to another on either the WPA2-Enterprise or WPA2-Personal network.

Seeing that everything was OK, I migrated the rest of the building APs from wireless to wifi-qcom-ac and enrolled them into CapsMan2 with a setup very similar to the one described above, the major difference being that the building uses only WPA2-Personal (at least for now).

Also, each floor has its own neighbor group for every SSID, so that clients get AP neighbor recommendations within the floor and not from adjacent floors.

Soon I started having issues similar to the ones I had with CapsMan before limiting the caps-man rates. Some Windows 10 devices would completely avoid close APs and connect to APs with a very low signal, around -88dBm to -90dBm. Of course, at this signal level the connection is unstable, and after 20-30s the client would roam to another AP that was also far away or had a low signal (like an AP on the upper or lower floor), eventually remaining connected even though the connection was slow. After a while it would roam to a closer AP, where it would stay for about 30-40 minutes (sometimes for hours), and then start the whole roaming cycle all over again, disrupting the user’s work.

It is possible that WNM sets the Disassociation Imminent bit (I couldn’t verify this), triggered by high airtime use on the 2.4GHz or 5GHz radio, or by some other reason.

The same problematic clients would sometimes connect to 2.4GHz and then disconnect, only to connect to 2.4GHz again. Manually removing the connection from the controller would make them switch to 5GHz immediately, but not always.

Not all clients would behave this way. Some of them would remain connected to the same AP that other clients would actively avoid for no apparent reason. I’ve seen that mostly Intel AX adapters (2xx series) seem to have this issue, but only on WPA2-Personal.

It is as if the neighbor groups are completely ignored and some clients make bogus roaming decisions that make no objective sense. Usually the problematic clients have at least one other close AP on both the 2.4GHz and 5GHz radios, so there are better roaming candidates.

That’s why it is PARAMOUNT that higher minimum bitrates (at least 24Mbps) can be used and configured in the wifi-qcom-ac driver. The DSSS and low OFDM modulations must be manually disabled, since they can pass through walls and even between floors with enough integrity to be decoded and used to connect to sub-optimal APs.

I really like the benefits added by wifi-qcom-ac, but without more fine-grained settings it’s hard to use it in a multi-AP setup without WPA2-Enterprise or WPA3-Enterprise.

I understand that roaming is a client decision, but signal quality can be used to force a client to connect to the closest AP regardless of 802.11 roaming extensions.

I’m willing to collaborate with MikroTik in order to solve these issues. Until then I had to roll back to wireless and CapsMan; there were just too many bad roaming decisions that even access lists could not solve, and in some situations access lists actually made them worse.

Before I even read all of this, there is no Wifiwave2 in 7.17rc

Hello Normis,

I know, I upgraded the APs from 7.16.2 to 7.17rc3 (hoping that it would improve my situation). I’ve just checked the extra packages for the ARM architecture.

RouterOS 7.17rc3 has both wireless and wifi-qcom-ac drivers.

I’m sorry if I missed something.

Wifiwave2 is named WiFi now, I’ll correct my post accordingly.

I have no idea how clients decide which AP to connect to. I assume they evaluate beacon frames before actually connecting to an AP. The parameters they use to decide where to connect probably differ from device to device. But it is odd that some of your clients prefer APs with a very weak signal. Maybe the reported airtime utilization of some APs is very high, even with a better signal, and clients avoid such APs? Or maybe it is all due to FT? Maybe it helps to disable FT and rely solely on 802.11v/k.

I have no idea either. All I know is that setting 24Mbps as the minimum rate makes far away APs invisible, and the same goes for upper floor or lower floor APs.
All APs are ceiling mounted.

I tried disabling FT, neighbor groups, RRM and WNM; the issue remains.
Windows 10 does not support FT for WPA2-Personal according to Microsoft:
https://learn.microsoft.com/en-us/windows-hardware/drivers/network/fast-roaming-with-802-11k--802-11v--and-802-11r
Quote: “…Windows 10 supports Fast BSS Transitions over networks using 802.1X as the authentication method. Pre-Shared Key (PSK) and Open Networks are currently not supported…”

As far as I know, beacons are sent at the lowest rate, and the lowest rates have much less strict SNR requirements, which would “trick” clients into thinking that the connection is better than it actually is. The receive sensitivity for the lowest rate is much higher than for higher rates.

I also have daily radar events that force some APs to stop transmitting on that channel, but since the available APs are limited when using the wireless package, clients have no other choice but the closest APs, not the far away ones.

PS: On all implementations, wifi-qcom-ac or wireless, Android devices behave as they should, fast roaming works, and the closest AP is always selected and there are no random disconnects.

Yes, looking at the wireless specifications, lower rates operate at higher Tx power.

“1MBit/s 26 -100”
https://mikrotik.com/product/cap_ac#:~:text=1MBit/s,-100

Yes, kind of, each rate has a max Tx power in order to align with the transmit spectral mask. If you go over, you will go off spec and quite possibly damage the card.

But regardless of the max Tx power for that rate, the total power (Tx power + antenna gain) must not go beyond the regulatory domain limit. The limit varies with the channel; here is an output from the wireless package:

/interface/wireless/info/country-info romania
ranges: 2402-2482/b,g,gn20,gn40(20dBm)
2417-2457/g-turbo(20dBm)
5170-5250/a,an20,an40,ac20,ac40,ac80,ac160,ac80+80(23dBm)/passive,indoor
5170-5330/a,an20,an40,ac20,ac40,ac80,ac160,ac80+80(20dBm)/dfs,passive,indoor
5250-5330/a,an20,an40,ac20,ac40,ac80,ac160,ac80+80(20dBm)/dfs,passive,indoor
5490-5710/a,an20,an40,ac20,ac40,ac80,ac160,ac80+80(27dBm)/dfs,passive
5190-5310/a-turbo(20dBm)/dfs
5180-5300/a-turbo(20dBm)/dfs
5520-5680/a-turbo(27dBm)/dfs,passive
5510-5670/a-turbo(27dBm)/dfs,passive
902-927/b,g,g-turbo,gn20,gn40(30dBm)

So the card will limit the Tx power to stay within the regulatory domain. For example, if a cAP ac uses ch 36 (5180MHz) in 802.11ac mode, where the max power is 23dBm, the card would have a max Tx power of 23-2.5=20.5dBm even on the lowest modulation (2.5dBi being the antenna gain). When using the highest modulation, MCS9, the card’s max Tx power is 18dBm, so the total power will be 18+2.5=20.5dBm.
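The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "total power = Tx power + antenna gain" rule from this post, not how the driver actually computes it. The function names are made up for the example, and the 24dBm per-rate ceiling in the usage note is a hypothetical value, not a cAP ac specification.

```python
# Minimal sketch of the regulatory power arithmetic described above.
# Function names are hypothetical; values come from the post unless noted.

def max_conducted_power_dbm(eirp_limit_dbm: float, antenna_gain_dbi: float) -> float:
    """Highest radio output that keeps Tx power + antenna gain within the limit."""
    return eirp_limit_dbm - antenna_gain_dbi


def total_power_dbm(tx_power_dbm: float, antenna_gain_dbi: float) -> float:
    """Total radiated power: radio output plus antenna gain."""
    return tx_power_dbm + antenna_gain_dbi


def effective_tx_power_dbm(rate_max_dbm: float, eirp_limit_dbm: float,
                           antenna_gain_dbi: float) -> float:
    """The radio uses the lower of its per-rate ceiling and the regulatory cap."""
    return min(rate_max_dbm, max_conducted_power_dbm(eirp_limit_dbm, antenna_gain_dbi))
```

With the numbers from the post, max_conducted_power_dbm(23, 2.5) gives 20.5 and total_power_dbm(18, 2.5) gives 20.5. A low rate with an assumed 24dBm ceiling would be capped to 20.5dBm, while MCS9 at 18dBm is not capped at all.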

One way to create a compact Wi-Fi cell is to use the max Tx power of the highest modulation, combined with a higher minimum rate (12-24Mbps, I prefer 24Mbps), to limit the AP range to the region of interest only.

Since different channels have different max power levels, this will give all APs the same overall range and not give the clients the wrong impression that an AP is actually closer than another, just because it’s operating at a higher tx power.

Hello,

I’ve thought about this a little bit more and looked at the logs for more clues, and my conclusion is:

Due to radar events the 5GHz radios have to stop transmitting, which leads to optimal APs being avoided due to signal interruptions. If only the closest APs are visible, then the clients have only one or two AP choices on either the 2.4GHz or 5GHz radios, and only 2.4GHz if a CAC is being conducted on the 5GHz radios.

The only clients that won’t roam to far away APs are the ones that have that SSID configured only on APs near them.

The floor where I have WPA2-Enterprise and one guest WPA2-Personal network has better AP coverage, and the configured SSIDs are only on that floor and nowhere else, so whatever AP is selected, the signal would be decent. WPA2-Enterprise connected devices would sometimes choose a sub-optimal AP but would roam back to an optimal one after a while.

Why do some clients roam to far away APs? Because they can decode their beacons and because of a bad roaming implementation. Anything below -72dBm should be avoided like the plague if there are any closer APs.
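To make the rule I have in mind concrete, here is a rough Python sketch of how I think a sane client should filter its scan results. It is not how any real client or driver is implemented. The AP names and the function name are hypothetical, and the -72dBm floor is simply the value mentioned above.

```python
# Sketch of the "avoid anything below -72dBm if a closer AP exists" rule.
# AP names and the function name are hypothetical.

def roaming_candidates(scan: dict[str, int], floor_dbm: int = -72) -> dict[str, int]:
    """Keep only APs at or above the floor; fall back to everything if none qualify."""
    strong = {ap: rssi for ap, rssi in scan.items() if rssi >= floor_dbm}
    return strong if strong else dict(scan)


scan = {"ap-same-floor": -58, "ap-upper-floor": -88, "ap-lower-floor": -90}
best = max(roaming_candidates(scan), key=scan.get)
```

With this scan the upper and lower floor APs are dropped and "ap-same-floor" is selected. The weak APs are only considered when nothing better is in range.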

Regardless of WNM&RRM, the client background scans would reveal multiple AP choices, even ones that are woefully inadequate. This leads the clients to connect to far away APs and then try each one of them until they establish a relatively reliable link, even though the speed would be very low. Since the signal level is low, the client would eventually decide to roam to the strongest AP within its reach, but it could take some time, leading to customer dissatisfaction.

As long as the beacons are sent at 1Mbps for 2.4GHz and 6Mbps for 5GHz, these low modulations can be easily decoded beyond the building walls and floors (even though the floors are made of concrete) due to their lower SNR requirements and the very high receive sensitivity.

A typical business laptop (HP Probook 455 G10) has a receive sensitivity of -93dBm for 1Mbps and -86dBm for 6Mbps, but only -72dBm for 54Mbps. This very high receive sensitivity at low rates gives the clients the ability to hear sub-optimal APs and select them for a while.
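As a simplified illustration, the sensitivity figures above can be turned into a small Python check of which rates a client can still decode at a given signal level. It only compares the signal level against the receive sensitivity and ignores noise floor and interference. The function name is made up, and the table contains only the three figures quoted above.

```python
# Receive sensitivity per rate (Mbps -> dBm), figures quoted in the post
# for the HP Probook 455 G10. Simplification: noise and interference are ignored.
SENSITIVITY_DBM = {1: -93, 6: -86, 54: -72}


def decodable_rates(rssi_dbm: float, sensitivity: dict[int, int] = SENSITIVITY_DBM) -> list[int]:
    """Return the rates (in Mbps) whose sensitivity threshold the signal meets."""
    return sorted(rate for rate, floor in sensitivity.items() if rssi_dbm >= floor)
```

At -88dBm, decodable_rates(-88) returns only the 1Mbps rate, which is exactly why a 1Mbps beacon from another floor is still heard while a higher-rate beacon at the same level would not be.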

Another reason to increase the minimum rates is faster transmission of multicast and management frames.

If the MikroTik developers would be so kind as to enable manual rates just like in the wireless package, that would be great!

@infabo, I’ve read some of your posts regarding some of the issues I’ve experienced.

Thank you for the recommendations, like management-protection and connect-priority.

Disabling management-protection really fixed a lot of issues. Maybe the SA Query timeout is too short when an AP has high airtime use.

Also, the default connect-priority may be an issue when using ft-over-ds, since a STA would authenticate to another AP while still being connected to the current one. However, it doesn’t seem to be an issue when management-protection is disabled.

I’ve found that because I have two CapsMan controllers, one on RouterOS 6 and another on RouterOS 7 (some APs do not support wifi-qcom-ac), the APs with wifi-qcom-ac would disconnect when a discovery interface was used, probably because during the discovery process they would connect to the wrong controller. I fixed it by removing the discovery interface from the config and using only the IP address of the respective controller on all APs.

Essentially there were 3 issues:
APs would disconnect from the controllers leading to “bogus” roaming.
Clients would get SA Query timeouts when Protected Management Frames are activated, leading to disconnects. The fix was to disable PMF and use WPA2 only.
Intel chipsets would get the most SA Query timeouts. I’m curious if this issue persists on RouterOS 7.18.