capAX and capAC fail under load in a "real world" environment (school) with 30+ clients

I am posting this as a new topic because the subject line of my original topic does not fully reflect the problems we had. Original topic: http://forum.mikrotik.com/t/cap-ac-at-school-seeing-huge-drop-in-performance-when-more-than-25-clients/183667/8
Maybe someone with similar problems will find this useful. The forum user maigonis was very helpful and appears to have gotten a school functioning with capsMAN and capAX. But we could not spend any more time on this ongoing issue.
We are not new to Mikrotik. We have been using Mikrotik routers, APs, switches and P2P links for 10 years and are Mikrotik certified. We like them, and they are our primary equipment.

Environment: K-12 Private School with approximately 550 students, multiple buildings, 50 staff
WiFi Devices: capAC and capAX
Other Network Devices: all Mikrotik Routers and Switches and P2P
WiFi Management: tried both capsMAN and direct device configuration

Problems (WiFi):

  • Random client drops (eventually solved by abandoning capsMAN and doing direct configurations on the APs)


  • All clients on a “loaded” AP losing bandwidth (other campus locations OK). Problem solved with Unifi APs.

Solutions Attempted:
Problem 1 - Random Client Drops:

  • Switched from capAX to capAC due to the number of forum posts about compatibility problems with some client devices. This did not change the random client drops.


  • Abandoned capsMAN configuration. We have experience with capsMAN, we know there is “overhead”, but we like it for management. We know how to configure capsMAN for best performance. capsMAN config was solid, but when we did direct config on the AP (no capsMAN), the random client drops stopped.

Problem 2 - Loaded APs losing bandwidth:

  • The APs where clients were losing bandwidth did not “look” loaded: CPU was at 2%, client connections were good, and there were 30-60 clients. Problems came up most often when students were taking an online test. That is not a really heavy load (no video streaming), yet with all the kids taking an online test at once, the available bandwidth for the clients would come crashing down. This affected only the clients on that AP; clients on other, unloaded APs had normal bandwidth. Core router queues were normal.


  • We tried every tweak and config recommendation we could find on these forums or on Reddit. We tried both capAC and capAX in the heavily used classrooms. The problem just kept coming back (at the worst times: during an important lesson or test).


  • capAX had compatibility problems with a lot of the iPads, even when we adjusted the configuration for “max compatibility”.

What finally worked?

  • After 9 months of tweaking, replacing devices, testing in office and “testing” in live classrooms - with countless un-billable hours wasted…


  • We installed 5 Unifi U6 APs in the classrooms with the most issues. Everything worked. No issues the next day or 5 days out.


  • I don’t even like Unifi: the “dumbed down” GUI management, the lack of controls, the lack of meaningful data, the push to get all Unifi devices. These are the reasons we switched the school to Mikrotik the year after we took over IT management.


  • But, the Unifi APs clearly worked where Mikrotik APs were failing. And with very little configuration and time.

How is it that Unifi works so smoothly in this mildly loaded environment, with a central controller/manager, and Mikrotik fails? Mikrotik has so many amazing products and abilities, but it cannot handle 40 clients on one capAX (with local config, not capsMAN) when all the clients (students) need to take an online test?

  • First capsMAN disappointed us; now the actual devices (capAX and capAC) have left us completely disillusioned. So much (unbillable) time was lost in the last 9 months on this one client because of undocumented Mikrotik issues.


  • We will still use Mikrotik for router/switches/P2P, but anywhere there may be a “load” on the WiFi, we will have to go with something else. This time it was Unifi, but we will keep an open mind.

So ends this postmortem of this issue.
-Keith

Sad to hear this…

Based on my experience with Unifi, here are a few things they do differently:

Their security settings are (or at least were) dumbed down by default. It is a known issue in general that WPA3, management frame protection (MFP) and fast transition (FT) can cause problems: not just “can't connect”, but also disconnects after a while, for multiple reasons. SAE (the WPA3 authentication method) can cause authentication-related issues, and FT can cause roaming-related issues. This is not a MikroTik-specific issue; all vendors suffer from it. So, for maximum compatibility, you can set (if you haven't done so already):

/interface wifi security
add authentication-types=wpa2-psk disabled=no management-protection=disabled name=Administracija wps=disable ft=no

This goes WPA2-only, with MFP and FT off (as Unifi does). It is just an example: set the same options in the GUI, or edit the command before applying it to CAPsMAN or to a standalone AP. By the way, your CAPsMAN vs. standalone result is most likely related to roaming problems, as CAPsMAN puts all its APs in one steering group. Be aware that changing security settings like this (going from WPA2/WPA3 with MFP to WPA2-only, and vice versa) can cause station connectivity problems: because of the security-related changes to the SSID, you may need to reconnect the stations (forget the network and re-enter the password).

Random client drops can be security-related, as I posted, but the ax lineup (as I mentioned before) also has its own issues. Do update to 7.19.1. I haven't finished my testing yet, but it looks like there are changes related to this. I can't say for sure that the driver has finally changed, but I already see improvements/changes.

Bandwidth-related issues can be caused by overlapping channels and high airtime load on them. While ax can cause disconnects (which, for example, can make iPads roam to an AP behind a wall or connect to the 2.4 GHz band), it performs well on throughput (up to 40 MHz wide channels; 80 MHz has issues). I haven't experienced any throughput-related issues, but my ax experience is limited (the school I mentioned is on the ac lineup). Unifi can pick a better channel by default, and so performs better.
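To illustrate why channel selection matters when many APs sit in adjacent classrooms, here is a minimal Python sketch of greedy channel assignment: each AP takes the 40 MHz channel least used by the APs it can hear. The channel list (four non-DFS 40 MHz channels, named by their primary channel) and the room layout are assumptions for illustration; real channel availability depends on the regulatory domain, and this is not how Unifi or RouterOS actually select channels.

```python
from collections import Counter

# Assumption: four non-DFS 5 GHz 40 MHz channels, identified by primary channel
# (36/40, 44/48, 149/153, 157/161). Check your regulatory domain before using these.
CHANNELS_40MHZ = [36, 44, 149, 157]


def assign_channels(neighbors: dict[str, set[str]]) -> dict[str, int]:
    """Greedy assignment: each AP picks the channel least used by neighbors
    that already have one. APs that hear the most neighbors go first."""
    assignment: dict[str, int] = {}
    order = sorted(neighbors, key=lambda ap: len(neighbors[ap]), reverse=True)
    for ap in order:
        used = Counter(assignment[n] for n in neighbors[ap] if n in assignment)
        # min() returns the first channel on ties, so the result is deterministic
        assignment[ap] = min(CHANNELS_40MHZ, key=lambda ch: used[ch])
    return assignment


if __name__ == "__main__":
    # Hypothetical layout: four classrooms in a row; each AP hears rooms up to two doors away
    hears = {
        "room1": {"room2", "room3"},
        "room2": {"room1", "room3", "room4"},
        "room3": {"room1", "room2", "room4"},
        "room4": {"room2", "room3"},
    }
    for ap, ch in sorted(assign_channels(hears).items()):
        print(ap, ch)
```

Rooms that cannot hear each other (room1 and room4 here) may reuse a channel; rooms that can hear each other end up on different channels as long as there are enough channels to go around.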

The load you are describing is minimal. Online tests (just text and pictures) do not put much load on an AP, so without knowing more details, it makes me think of a bad RF environment (not blaming you): either AP channel selection or channel load in general, which can have many causes. 30-60 clients with good modulations do not generate enough idle/management RF load to halt a channel; it is something else, or (most likely) multiple issues. How can Unifi work better? I can't say; I don't have the statistics and tests needed to identify the specific issue. But Unifi usually dumbs down its settings, and sometimes even works around the 802.11 standards, to make their APs an easy install and a “set and forget” experience (a “builder's dream”, as it's called in the industry). Now, when you try to fix things by making changes, you can break everything even more: as I mentioned, security-related changes can require connecting to the WiFi again with a password, which is not easy to do on hundreds of school devices.

I do feel the pain and know how it is when the network has issues while students are taking exams, participating in local olympiads (might not be the right word, but it's basically a city- or state-wide competition in various subjects for the best students, with prizes etc., now done online), or simply can't use the WiFi in daily learning as the curriculum requires. It's not a 2-hour fix, even if you decide to go Unifi-only. Good deployment practices still apply, and you still have work to do.

Best of luck. This experience will not be pleasant, but you will learn more about high-density WiFi networks and how to plan, deploy and configure them.

All of the above is valid, but just step back: try all of the above, or buy Unifi, which just works. That's the crux of the issue here, and it's sad, as I really want Mikrotik to succeed.

@Keith
I think Mikrotik should get in touch with you directly. This really is a sad story. Not because of the happy ending (Unifi APs), but because of all the “production testing”, which, I feel, must have been frustrating for everyone.

Please also bring this to MikroTik's attention. I'm unsure if they will care, even about us certified professionals and those of us who've been supporting them for 10+ years [ourselves included]. I feel MikroTik does not care about the “professional” market; they now want to stay in their niche market: low-cost hardware for home environments in third-world countries.

We've also had very similar issues deploying capAX hardware in such environments. As soon as we replaced it with another AP vendor [Cambium], the issues went away. We spent endless time tweaking the CAPsMAN configuration and still had disconnects or poor performance. We had a better experience with AC hardware under CAPsMAN, although it was not perfect.

Also, 7.19.1 comes far too late for them to only now be fixing their existing AX hardware, whether driver-related or not. It's the same issue time and time again with MikroTik software QC: the hardware is excellent on paper… but the software is holding it back 2+ years after release. That's not acceptable for these types of deployments, where end users are the beta testers, especially large-scale PRODUCTION deployments.

We no longer recommend MikroTik WiFi, and at this time we are barely deploying their routers [due to loss of confidence and software QC].

I also feel, and it can be shown from hardware breakdowns [look at the FCC hardware breakdowns], that Unifi and other vendors have better antenna designs and shielding.

That's why I started buying TP-Link APs :frowning: and will likely buy a Zyxel WiFi 7 device at some point.

Yeah, been down that road as well. Unsure about the Zyxel though. We’ve moved to Cambium and are not looking back.

Cambium is really a different ballpark, but then again, what price do you put on stability and performance?

Some vendors really build for density and they perform very well.

Right now I have two setups. At home I have a full Mikrotik setup consisting of: ATL18, RB5009, CRS310 and 4 wAP ax units, where 3 wAP ax units are APs controlled by CAPsMAN (@anav's favourite here) and one is a client device for my cameras in the garage.

The second is at another house, which is full Unifi: their UCG Max, 2 PoE Lite switches, one Flex 2.5G PoE, 2 U6+ and one U7 Pro AP.

I'm really happy with my wAP ax units; the setup is rock solid, and I have a lot of IoT devices, wireless cameras, laptops, tablets etc. When I had a cAP ax it was a different story: lots of dropouts, disconnections etc.

Routers and switches are rock solid. No issues whatsoever.

Unifi: well, everything is more user-friendly (which I thought I would like, but apparently once you work with Mikrotik that is no longer the case), BUT their UCG Max gateway, which also serves as the NVR for the cameras, runs really, really hot: around 78 °C in winter (about 172 °F in freedom units) and, now that summer is slowly but surely kicking in, 82 °C (about 180 °F in freedom units).

I had 3 USW Lite 8 PoE switches but replaced one with a USW Flex 2.5G 8 PoE, because one of the Lite units started acting up after only 3 months: the trunk port fell from 1 Gbps to 100 Mbps, and the switch started rebooting for no reason. Now, on the same cable, I get a 2.5G uplink without a problem.

APs are rock solid and U7 Pro is really great AP.

All in all, the wAP ax is a step in the right direction for Mikrotik, but they really need to up their game in the wireless segment, as well as in routers and switches. The UCG Max has 5 ports, all 2.5G… The Flex 2.5G PoE has 8 2.5G PoE++ ports and one 10G RJ45/SFP+ combo port. Internet speeds, even at home, are getting faster and faster; it's not uncommon in Croatia to have 1G fibre, and in bigger cities 2G or even 10G is available.

I think it's time for Mikrotik to ditch 1G ports on their more premium devices like the hAP ax, RB5009 etc. and start using 2.5G. I also really hope that Mikrotik's WiFi 7 devices will not have 1G ports…

A home setup, no matter how many IOT devices you have, can’t really be compared with a busy school/office scenario. The demand, connects, disconnects will be vastly different.

I agree. I can understand the OP here, because I had problems with the cAP ax in a home scenario, and I know a lot of IT guys who like Mikrotik but avoid their wireless at all costs.

I will agree that in large BYOD networks anyone can bring anything and expect it to connect, so the network needs to be configured for maximum compatibility (if that is your goal).

The general issue I see is that people think custom ASICs, gold-plated antennas or other stuff will magically make it work: that a Cisco AP will hold 300 devices and somehow, from a 40 MHz channel, provide usable internet to everyone who needs it. That is not the case. A 40 MHz wide channel on the ax lineup can at best give you 573*0.7 ≈ 401 Mbps of theoretical speed, but if we count in additional headroom to manage connected devices and environmental issues (including people moving and device paths to the AP being blocked or distorted, resulting in lower modulation and an increased retry count in general), it most likely drops to 573*0.7/2 ≈ 200 Mbps. How does one expect 200 Mbps to be enough for 300 devices? That is about 0.67 Mbps per station (web browsing today needs at least 2-4 Mbps per station). It's a hyperbolic example, but sometimes it feels like people think like that.
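The arithmetic above can be written down as a small Python sketch. The 0.7 efficiency factor and the divide-by-2 headroom are the rules of thumb from this post, not measured constants, and the function names are made up for illustration.

```python
def usable_throughput_mbps(phy_rate_mbps: float, efficiency: float = 0.7,
                           headroom_divisor: float = 2.0) -> float:
    """Rule of thumb from the post: ~70% of the PHY rate is achievable in ideal
    conditions, roughly halved again for management overhead, lower
    modulations and retries in a real classroom."""
    return phy_rate_mbps * efficiency / headroom_divisor


def per_station_mbps(phy_rate_mbps: float, stations: int) -> float:
    """Equal split of the usable throughput if every station is busy at once."""
    return usable_throughput_mbps(phy_rate_mbps) / stations


if __name__ == "__main__":
    ax_40mhz = 573.0  # 2-stream 802.11ax peak PHY rate on a 40 MHz channel
    print(f"ideal:       {ax_40mhz * 0.7:.0f} Mbps")
    print(f"real world:  {usable_throughput_mbps(ax_40mhz):.0f} Mbps")
    print(f"300 devices: {per_station_mbps(ax_40mhz, 300):.2f} Mbps each")
```

Both factors are exposed as parameters so they can be replaced with numbers measured on a specific site.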

Mikrotik makes basic devices, but that can be enough. I have posted about my journey building a network for a 1000-student school with the ac lineup (hAP ac2, cAP ac and wAP ac used) and how I succeeded. The key is proper deployment and configuration: make sure your stations have good modulations (under load, not just when idle), the load is reasonably split, the channels used by the APs are clean, and traffic shaping is in place. As a result, the ac lineup holds up well in the school environment: it can output 140 Mbps under pressure (400*0.7/2 = 140 Mbps with ac modulations) and keeps stations connected even when the radio's max peer setting is reached. There was an event in the school's back yard, and at that point only one wAP ac was online, so it took it all: 129 stations constantly connected to the 5 GHz radio (the 2.4 GHz band is off there). Most devices were idle, as it was an event. I measured the airtime load from another AP on that channel; I don't remember the exact number, but it was low, maybe around 20-30%.
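The same rule of thumb can be turned around for planning: divide the usable throughput by the bandwidth each busy station needs. The sketch below assumes the 2-4 Mbps per-station figure mentioned earlier in the thread and treats every station as busy at once, which is pessimistic (at the event described above, most of the 129 stations were idle). Function names are hypothetical.

```python
import math


def usable_mbps(phy_rate_mbps: float) -> float:
    """Rule of thumb from the thread: 70% of PHY rate, halved for real-world headroom."""
    return phy_rate_mbps * 0.7 / 2


def max_active_stations(phy_rate_mbps: float, need_per_station_mbps: float) -> int:
    """Stations that can all be busy at once, each getting the requested rate."""
    # Small epsilon so float rounding (e.g. 69.9999999) does not lose a station
    return math.floor(usable_mbps(phy_rate_mbps) / need_per_station_mbps + 1e-9)


if __name__ == "__main__":
    ac_40mhz = 400.0  # 2-stream 802.11ac peak PHY rate on a 40 MHz channel
    print(f"usable: {usable_mbps(ac_40mhz):.0f} Mbps")
    for need in (2.0, 4.0):
        print(f"{need:.0f} Mbps each -> {max_active_stations(ac_40mhz, need)} busy stations")
    # Equal share if all 129 stations from the event were busy at the same time
    print(f"129 busy stations -> {usable_mbps(ac_40mhz) / 129:.2f} Mbps each")
```

Under these assumptions one ac AP can keep roughly 35-70 stations busy at browsing-level rates, which is why splitting the load across APs matters more than the raw number of associated stations.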

So it can be done with basic hardware; the other important part is software, and this is where MT currently has issues on the ax lineup. Other vendors can be better because of this: not just driver/code stability, but also configuration, i.e. they configure stuff for you. In my Cisco example, the improvement under the hood could be a simple, well-configured queue, which can be done manually in ROS (CAKE can do magic). So for me the devices can stay mostly the way they are (I really hope to see lower-power 6E/7 APs for high-density networks, as 2-3 dBi is fine for an 8x6 m classroom; high-gain antennas are too sensitive, making it harder to organize channels because APs “hear” too far, and causing roaming issues because stations stick to APs further away), but the software needs to be fixed/improved.
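CAKE itself is considerably more involved (flow queueing combined with active queue management), but the core reason a fair queue helps on a busy AP can be sketched as a max-min fair split. Light users get everything they ask for, and the remaining capacity is shared equally among the heavy users, so one bulk download cannot starve a room of students taking a test. This is a minimal Python sketch with hypothetical demands, not RouterOS or CAKE code.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair allocation (water-filling) of `capacity` among stations."""
    alloc = {sta: 0.0 for sta in demands}
    remaining = capacity
    # Serve the smallest demands first; unsatisfied stations share what is left equally
    pending = sorted(demands, key=demands.get)
    while pending and remaining > 0:
        share = remaining / len(pending)
        sta = pending[0]
        if demands[sta] <= share:
            # Small demand: fully satisfied, the leftover goes back to the pool
            alloc[sta] = demands[sta]
            remaining -= demands[sta]
            pending.pop(0)
        else:
            # Everyone left wants more than an equal share: split evenly and stop
            for s in pending:
                alloc[s] = share
            remaining = 0.0
            pending = []
    return alloc


if __name__ == "__main__":
    # Hypothetical classroom AP: 140 Mbps usable, 30 students on a test page,
    # and one device pulling a large update
    demands = {f"student{i}": 2.0 for i in range(30)}
    demands["updater"] = 500.0
    alloc = max_min_fair(140.0, demands)
    print(f"each student: {alloc['student0']:.1f} Mbps, updater: {alloc['updater']:.1f} Mbps")
```

A plain FIFO queue gives no such guarantee: whoever sends the most packets gets the most of the link, which matches the "everyone on this AP slows down" symptom described earlier in the thread.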

However, there is room for hardware improvement without adding gold antennas and additional ASICs. For example, 4x4 is a thing and does help even 2x2 devices at volume (I need to do more testing), and I'm waiting for 6 GHz band support, WiFi 7 MLO etc., to further improve connectivity for larger numbers of devices per AP.
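For reference, the 573 and 400 Mbps figures used in this thread are the standard peak PHY rates for two spatial streams on a 40 MHz channel (802.11ax MCS 11 with 0.8 µs guard interval, and 802.11ac MCS 9 with short guard interval). The Python sketch below reproduces them from subcarrier counts and shows that the peak rate scales linearly with the number of spatial streams. How much of a 4x4 AP's extra streams 2x2 clients can actually use (for example via MU-MIMO) is exactly the open testing question above, so the 4-stream numbers are only an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhyMode:
    name: str
    data_subcarriers: int     # data (non-pilot) subcarriers in the channel
    bits_per_subcarrier: int  # 256-QAM = 8, 1024-QAM = 10
    coding_rate: float
    symbol_us: float          # OFDM symbol duration including guard interval


# 40 MHz channel, highest MCS of each standard
HE40_MCS11 = PhyMode("802.11ax 40 MHz MCS11, 0.8 us GI", 468, 10, 5 / 6, 12.8 + 0.8)
VHT40_MCS9 = PhyMode("802.11ac 40 MHz MCS9, short GI", 108, 8, 5 / 6, 3.2 + 0.4)


def phy_rate_mbps(mode: PhyMode, streams: int) -> float:
    bits_per_symbol = mode.data_subcarriers * mode.bits_per_subcarrier * mode.coding_rate
    # Bits per microsecond equals Mbit/s
    return streams * bits_per_symbol / mode.symbol_us


if __name__ == "__main__":
    for mode in (VHT40_MCS9, HE40_MCS11):
        for streams in (2, 4):
            print(f"{mode.name}, {streams} streams: {phy_rate_mbps(mode, streams):.1f} Mbps")
```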