802.11ac severe speed degradation with ROS above 6.45.9 (LTS)

Hi all

I want to share something I finally tracked down to the source (or so I believe) yesterday evening.
Thanks to Kev for bringing this performance issue to my attention. Kev noticed it on his hAP AC3, I noticed/replicated the same on my hAP ac and started digging…

I run a hAP ac on a 350/35 line from my ISP. There is no fancy config on the MikroTik: a really basic NAT setup, with the Ethernet ports and wifi linked on the LAN bridge. I tend to run the LTS branch these days.

When I run on gigabit ethernet the speedtest.net flatlines immediately at 350-380 download as expected. Problems begin when I go onto wifi - I noticed it depends on which RouterOS is running on the device. I have tested it multiple times, over and over… taking several days to test each variant in different network conditions (time of day) and different speedtest.net servers (also tested against fast.com which is Netflix AFAIR).

Anyway, here are common things that are fairly constant across all the tests:

  • radio interface is set up as AP Bridge, 5GHz-only-AC, channel auto, width 20/40/80MHz XXXX, country UK, regulatory domain
  • my Mac shows that radio link speed is floating between 1170Mbps and 1300Mbps

Here is where things get strange - depending on version of RouterOS (all below are LTS):

  • RouterOS 6.44.6 - wired speed 350+, wireless speed 350+ Mbps
  • RouterOS 6.45.9 - wired speed 350+, wireless speed 350+ Mbps
  • RouterOS 6.46.8 - wired speed 350+, wireless speed 50-80 bursting to 90-100 Mbps for short durations of time
  • RouterOS 6.47.9 - wired speed 350+, wireless speed 35-50 bursting to 70-80 Mbps for short durations of time

I was upgrading and downgrading ROS versions and monitoring what changes in behaviour, without touching config on the device and finally settled on 6.45.9 as the last LTS release working at full speed.

Another observation:
I run a DNS cache/resolver/forwarder on the MikroTik, handing queries off to Pi-hole running on my home server, so the MikroTik asks Pi-hole and all machines ask the MikroTik. Since the update to 6.46.8 I started getting odd DNS timeouts, and my network monitoring scripts light up the dashboard like a Christmas tree several times a day. With ROS 6.46.8 or above it happens every 10-15 min, sometimes on and off for hours. Since going back to ROS 6.45.9 the problem has disappeared completely.

Now, I have no clue what may be the reason for this, I hope some really smart people here will be able to work this out. For now I run 6.45.9 LTS and I’m happy with the outcome, but my OCD kicks in and I need to understand what caused it. May it be a bug in ROS, maybe some driver change, maybe something else? Fingers crossed someone will know the answer.

Cheers
Tom
Screenshot 2021-05-03 at 10.31.21.png

Difficult one. Just some hints

(0. My favorite ROS version is 6.45.6 , yes stable, but really stable)

  1. With channel on “auto” and “XXXX” sideband you leave quite some variation in your setup that is not related to the performance of a certain release, but dependent on the channel and sideband selected. Either verify what freq are chosen, or set it fixed to a channel and to a specific sideband like Ceee.
  2. You looked at the Mac wireless interface status, but there is more information in the MT AP “Registration” table (detailed view, or print in CLI). The interface rate is expressed in Mbps, not as an MCS index, but that conversion is easy to do (mcsindex.com). AFAIK the hAP ac3 has only 2 chains (NSS of 2).
  3. I get higher speeds between 2 MT devices, and via Btest on client. “Speedtest.net” includes quite some more technology in the test path. What is to blame for the performance hit? Even if it boils down to ROS, then what part of it ?
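
As a side note on hint 2, the Mbps-to-MCS conversion can also be done by hand. Below is a minimal Python sketch that derives 802.11ac (VHT) rates from the standard modulation and coding table and lists which MCS/stream/guard-interval combinations match a reported link rate. The function names are my own, and the sketch does not filter out the few MCS/stream/width combinations that the standard disallows, so treat it as a rough aid only.

```python
from fractions import Fraction

# Data subcarriers per channel width for 802.11ac (VHT).
DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}

# VHT MCS index -> (coded bits per subcarrier, coding rate).
VHT_MCS = {
    0: (1, Fraction(1, 2)),  # BPSK
    1: (2, Fraction(1, 2)),  # QPSK
    2: (2, Fraction(3, 4)),
    3: (4, Fraction(1, 2)),  # 16-QAM
    4: (4, Fraction(3, 4)),
    5: (6, Fraction(2, 3)),  # 64-QAM
    6: (6, Fraction(3, 4)),
    7: (6, Fraction(5, 6)),
    8: (8, Fraction(3, 4)),  # 256-QAM
    9: (8, Fraction(5, 6)),
}


def vht_rate_mbps(mcs, streams, width_mhz=80, short_gi=True):
    """PHY rate in Mbps for one MCS / spatial-stream / width combination."""
    bits, coding = VHT_MCS[mcs]
    # Symbol duration in microseconds: 3.6 with short GI, 4.0 with long GI.
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    bits_per_symbol = DATA_SUBCARRIERS[width_mhz] * bits * coding * streams
    return float(bits_per_symbol / symbol_us)


def candidates(rate_mbps, width_mhz=80, max_streams=3, tolerance=1.0):
    """List (mcs, streams, short_gi) combinations matching a reported rate."""
    found = []
    for streams in range(1, max_streams + 1):
        for mcs in VHT_MCS:
            for short_gi in (True, False):
                rate = vht_rate_mbps(mcs, streams, width_mhz, short_gi)
                if abs(rate - rate_mbps) <= tolerance:
                    found.append((mcs, streams, short_gi))
    return found


print(vht_rate_mbps(9, 3))   # 1300.0
print(candidates(1170))      # [(8, 3, True), (9, 3, False)]
```

So the 1170 to 1300 Mbps reported by the Mac would correspond to three spatial streams at MCS 8 or 9 on an 80 MHz channel, assuming the client really is linked at 80 MHz.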

Thanks for the response!

Yes, I am fully aware of that and good comments… but even with sideband fluctuations, how otherwise to explain such massive variance in actual bandwidth?

When radio link shows sync over 1gbps (screenshot taken just now), fully saturated ISP line will be 35% of it… yet with certain versions of ROS I am around 5% struggling to reach 10%, so this is massive impact here :frowning:
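
For anyone who wants to redo the percentages above with their own numbers, here is a tiny Python sketch. It assumes a round 1000 Mbps link rate (the sync rate is "over 1gbps") and uses the throughput figures from my first post; the function name is just for illustration.

```python
def share_of_link(throughput_mbps, link_rate_mbps=1000.0):
    """Measured throughput as a percentage of the reported radio link rate."""
    return 100.0 * throughput_mbps / link_rate_mbps


print(share_of_link(350))  # 35.0 -> ISP line fully saturated
print(share_of_link(50))   # 5.0  -> typical result on the affected ROS versions
print(share_of_link(100))  # 10.0 -> best-case burst on the affected ROS versions
```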

When I have a bit more time I will actually set the sideband to fixed and re-run tests across ROS versions. I doubt this will make any difference to be honest, just looking at relative numbers above.

Tom
Screenshot 2021-05-03 at 11.31.31.png

So it seems not to be the wifi connection. Difficult but very interesting case. Looking forward to more information on your experiments. Maybe look into more details, like CPU load, Profile, and a Sniffer or Wireshark dump to see the packet timing (which side is delayed?).

Can you please post your configuration while running 6.47.9? (/export compact)
Do you see any difference between configuration of 6.45.9 and 6.47.9 at all if you compare full /export?
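
If it helps, the two exports can be compared with a few lines of Python instead of by eye. This is only a sketch: it assumes you saved the output of /export on each version to a text file, the file names are hypothetical, and the sample lines in the usage part are made up for illustration. Lines starting with "#" are skipped because the export header carries a timestamp and version string that always differ.

```python
import difflib


def diff_exports(old_text, new_text,
                 old_name="export-6.45.9.rsc", new_name="export-6.47.9.rsc"):
    """Return unified-diff lines between two RouterOS export dumps."""
    def clean(text):
        # Ignore comment lines (export header) and trailing whitespace.
        return [line.rstrip() for line in text.splitlines()
                if not line.startswith("#")]

    return list(difflib.unified_diff(
        clean(old_text), clean(new_text),
        fromfile=old_name, tofile=new_name,
        lineterm="", n=0,
    ))


# Made-up sample data, only to show the output shape.
old = "# header for 6.45.9\n/interface wireless\nset wlan1 band=5ghz-onlyac\n"
new = "# header for 6.47.9\n/interface wireless\nset wlan1 band=5ghz-onlyac example-param=yes\n"
for line in diff_exports(old, new):
    print(line)
```

With real files you would read each export with open(path).read() and pass the two strings in; an empty result means the exports differ only in their header comments.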

I’m sorry, but I really don’t have that much time for digging to find root cause - I’d like to, but… no time :frowning:
The only thing I do when testing is upgrade ROS or downgrade - no other changes at all. I have stable config running for years that I do not touch at all (unless ROS edits them up and down with update/downgrade and doesn’t tell me). As such, my post is a statement of a fact - something observed and tested multiple times.

BTW I just found one of my other devices (RB951Ui-2HnD) also being affected in the same way. Device worked for 2 years spotless, at some point I upgraded ROS to 6.47.9 and issues started but I did not connect them to the ROS change, blamed it on radio congestion in the area. Downgraded firmware yesterday - problem disappeared, performance back to one from before upgrade. Again, no config changes, only ROS version downgrade.

I would like to know if this is a general performance problem. I do see issues with 6.47.9 at home (my test bed), but I’m not sure if they are related to MT. The issues are non-specific and different for all devices. For example, a relatively old laptop with only an 802.11n interface sees the wAP ac in 1/2 of the scans and the hAP ac2 in 1/3 of the scans (one scan every 8 seconds in WinFi), while the Draytek AP and the BBOX from the ISP are seen all the time, so the laptop never picks one of the MT APs. A pure 802.11ac stick sees them all the time in WinFi. Is it related to the MT band setting (“ac only”, N/AC, A/N/AC)? Changing that didn’t help. Even “Wifi Analyzer” on my recent Android smartphone misses the MT APs until one is very close to them, and it never misses the other APs. Is it something with “Wifi Analyzer”, something with “WinFi”, or something in the MT AP beacon?

There is more in the config than the user set configuration. From release to release sometimes “defaults” in the config are changed. So indeed you need “export verbose” to know all parameters, and if they are changed or not. ARM cpu frequency stepping is only since 6.48, so it is not in scope, yet.

EDIT: downgrading to 6.45.6 didn’t change anything here. My client devices just prefer other brands of APs for connections. Or is it because they can estimate the capacity of those APs (QBSS field in the beacon, which is missing in the MT beacon)?

@bpwl - interesting find here.

What about with the latest 6.47.10 release: do you still have the same issue of devices not seeing and/or not picking the MT wifi?

Perhaps we can send this over to MT regarding MT AP’s not having QBSS field in their beacon.

The hAp ac2 is on release 6.47.10 , and there is no change in behavior compared to 6.45.6.. QBSS is not there with MT.

Even if I try hard to create “laboratory conditions”, this is not really the case. It is my home installation, with ISP modem, Draytek and hAp Lite in the garage, hAP ac2 in the living room and the wAP ac out in the veranda. hAP ac2 , wAp ac and Draytek have common SSID’s set. they use channel 1, 6 and 11 respectively. The ISP modem jumps sometimes to another channel. hAP Lite is on channel 6.

Where I use my clients, the signal strength is almost equal in the living room, except for a weaker wAP ac. Even if the signal of the hAP ac2 is somewhat stronger than the Draytek (signal reading is delayed on Android) … PC and smartphone connect to the Draytek, not the expected hAP ac2.

Too many parameters and influences to conclude anything. (Except that it is annoying behavior.) e.g. basic rate in ISP modem and Draytek is 1 Mbps, and that cannot be changed (even if it is set to operate in g/n or n-only.). Bringing down the TX power in the Draytek from 80% to 20% convinces the laptop to connect to the hAp ac2, but not the smartphone. (while changing SSID’s). So it is just not a good enough experiment ! Better controlled environment and better measuring/analyzing tools are needed.

To be clear: I have no degradation issues with any connection!

I do personally miss that channel utilization information in MT. “What is your expected throughput if you only know the interface rates and CCQ?” One can be expecting way too much, and even select the wrong channel. I know there is “Freq Usage” in MT, but that stops transmissions. The MT should already know the channel utilization from the CCA process, while working.
Klembord-2.jpg

@tmiklas

Could you try 6.47.10 Long-term release and see if speeds improved? I noticed an improvement on my devices.

Or even go to the stable branch and install 6.48.3? Curious if there are any differences.

Run speed tests a few times, usually the first test will be lower.
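
One way to make repeated runs comparable is to drop the first result and look at the median of the rest. A minimal Python sketch, assuming you note the download figures by hand; the numbers in the example are invented, not measurements from this thread.

```python
from statistics import median


def summarise_runs(results_mbps, warmup=1):
    """Summarise speed test runs, ignoring the first `warmup` results."""
    usable = list(results_mbps)[warmup:]
    if not usable:
        raise ValueError("need more runs than warm-up runs")
    return {"median": median(usable), "min": min(usable), "max": max(usable)}


# Invented example: the first run is lower and gets discarded.
print(summarise_runs([40, 350, 360, 355]))
# {'median': 355, 'min': 350, 'max': 360}
```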

What if you enable WMM on your wireless interfaces? This should be QBSS? Let me know whether you already had WMM enabled.

I have WMM enabled. Should check the WMM info in the beacon… just will do it …

WMM is clearly in the Routerboard (hAP ac2) beacon.
.
Klembord-3.jpg
.
And I see the normal values for “CW min” etc.
.
Klembord-2.jpg
.
But the QBSS is missing, as can be seen from any other AP, Like the Draytek (also same WMM information here)
.
Klembord-4.jpg
.
Interesting information, for client device, and network operators.
.
Klembord-5.jpg

@bpwl Thanks for the great info, as always!..

Perhaps MikroTik can make a few comments here, or I send in a ticket for their review of this thread…

I have a feeling we will have to wait for RouterOS 7 & the new hardware that supports it, along with the new wifi driver implementations. But maybe MT will give us a present and fix lingering issues in the 6.XX train that is supported on the current wireless hardware [probably only fixes for ARM CPU based devices]??

Gut feeling routerOS 7 wireless will ONLY support ARM based CPU’s, and we will see further improvement in hardware + features… wishful??

I have an upcoming client wireless refresh for their locations. I am to quote MikroTik, but I am unsure. I know MT RouterOS very well and am comfortable with it. However, I might have to quote the Aruba Instant On range of access points instead. It is a 50/50 decision.

The wifiwave2 package for RouterOS v7.x consists of binary firmware drivers from chipset vendors, which MikroTik has started to integrate into RouterOS, that’s all. It has nothing to do with the drivers used for RouterOS v6.

For which environment do you want to replace your access points? Higher density locations like a school, university, enterprise building, small homes? MikroTik is not meant for the first 3 ones.
Have you already created a requirement list for your new access points?
e.g.

  • IEEE 802.11k
  • IEEE 802.11v
  • IEEE 802.11r
  • IEEE 802.11w
  • Airtime Fairness feature support (https://www.tp-link.com/ae/support/faq/2095/)
  • Fully compliant with IEEE 802.11ax specification
  • DL MU-MIMO
  • UL-MU-MIMO
  • BSS Coloring
  • Target Wake Time
  • Something like ZeroWait DFS
  • Upload and download OFDMA with up to 37 resource units when 80 MHz channel is used
  • Central management software or embedded controller?
  • Should the access points forward the traffic to a central controller via tunnel?
  • On Premise or cloud management?
  • Possibility to deny client-to-client connections
  • Powered by 802.3af, 802.3at or passive PoE?
  • Which multicast optimizations
  • How many clients do the access points support in reality you are currently quoting for?
  • Are there any high density benchmarks available for that specific access point model you are looking for?

Understood about your recommendations. I am also aware of the vendor wireless driver binaries for RouterOS v7. It should help once MikroTik starts using vendor wireless drivers over their own, or they learn from them.
No, I am not rolling out MikroTik wireless to enterprise environments; that is not their space. I reserve MikroTik wireless for small business clients [small restaurants, pizzerias, doctors’ offices, etc.]

I’ve also had success with TP-Link Omada APs [EAP series] over Ubiquiti – I do not trust UBT anymore. However, when I have a restaurant client that has both indoor and outdoor wireless requirements, I tend to lean toward MikroTik. CapsMan works great.

The enterprise rollouts I’ve done were Aerohive as well as Aruba. But in the SMB space there are so many options…

I didn’t believe it. But the findings in this post are true.

I’ve had several clients on a single access point from our tower. On AP4 we have 20 and on AP6 we have 14. All clients are approximately 10-13 km from the tower. All are MikroTik with NV2, 3 ms, fixed 65% download. AC only. Ceee 80 MHz.

All have good to mid signal - -40 to -57 RX.

Recently there have been complaints and reports from my installers.

I’ve tried new firmware, I’ve adjusted settings. I finally had resigned myself to the fact that it’s possible that after 5-6 years an AP with heavy traffic may just slowly degrade and the radio chipset has been “burned” and we are scheduled next week to replace the netmetals listed in my report, on the tower @ 6500’ elevation, at 180’ off the ground. (tower crews are expensive)

I’m astonished that after reading this post I did nothing but downgrade firmware. And Now the Finale:

Clients that were struggling to get 15-20 mbit TCP, or 40mbit UDP this morning…

after the downgrade - Now get 140mbps TCP and 201 mbps UDP on a send test from the AP to the client. From the Ethernet interface, I get 280mbit to my head end 20gbps fiber downtown.

I didn’t adjust settings, I didn’t downgrade the client software.. I didn’t even downgrade the routerboard firmware. just downgraded to 6.45.9. Problem solved.

Mikrotik… What the… (I mean.. Please read my post and please please help, as there are clearly issues with the newest “stable” and long term software with regards to the Radio Performance.)

Also - my 2 cents.. I do use mikrotik for enterprise clients too.. a school uses our service for a 300mbps fdx 60 GHZ link. With mikrotik and Enterprise.. just buy double. Now you have a redundant system that performs just as well, and if/when a failure occurs your gear migrates over. Still less than a single unit of “enterprise” gear that will also fail. It all fails.. eventually. I just got a 45K Aviat Microwave system… the newest best I could buy.. Software defined radio… QAM4096.. the best I could get with the MAX licensing… Aviat sent me undersized power supplies that burned the gig ports, and ultimately I had to power the radios with 48vdc x 10 amp - though the system says it’s 3 amp… Even spending that much and they sent undersized crap that doesn’t work.. so for 45K the best of the best still has problems… Cheers!

Downgraded one of my home wAPs from 6.49.1 to 6.45.9 to test this.
The AP is a CAP on a 1100AHx4DE.
Tested both versions on 5 GHz at ~5 m distance with direct line of sight and could not observe any differences.

6.49.1
Screenshot_20211204-071551_Speedtest.jpg
6.45.9
Screenshot_20211204-072340_Speedtest.jpg

At some point in time (it could have been something like 2 years ago, quite likely after 6.45) Mikrotik started to observe country regulations about allowed Tx power (and some other details, such as DFS). As country regulations mostly restrict WiFi devices to some pretty low EIRP, this means most APs reduced their Tx power. With high-gain antennae the Tx power drop is even greater. Antenna gain settings used to be pretty free before; after that point in time it is not possible to set the antenna gain on devices with factory-attached antennae to a value lower than the real gain.

Which likely means that your PtMP devices transmit at much lower power when running 6.49 than they did previously. You can verify the currently used Tx powers using command /interface wireless monitor to see if that’s actually the case.
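
To put rough numbers on this: EIRP is Tx power plus antenna gain, so under a fixed EIRP cap every extra dBi of antenna gain takes one dB away from the allowed Tx power. A minimal Python sketch follows; the 30 dBm cap, 19 dBi gain and 27 dBm "old" Tx power are made-up illustration values, not taken from any regulation, datasheet or device in this thread, and the function names are my own.

```python
def max_tx_power_dbm(eirp_limit_dbm, antenna_gain_dbi):
    """Highest Tx power that keeps Tx power + antenna gain within the EIRP cap."""
    return eirp_limit_dbm - antenna_gain_dbi


def power_ratio(drop_db):
    """Linear factor by which radiated power falls for a drop given in dB."""
    return 10 ** (drop_db / 10)


# Made-up example values, for illustration only.
eirp_limit = 30   # dBm, hypothetical regulatory cap
gain = 19         # dBi, hypothetical built-in antenna
old_tx = 27       # dBm, hypothetical Tx power before the cap was enforced

new_tx = max_tx_power_dbm(eirp_limit, gain)
print(new_tx)                        # 11
print(power_ratio(old_tx - new_tx))  # about 39.8 times less radiated power
```

A drop of that size would hardly matter at 5 m with line of sight, but on a link of 10 km or more it could easily push clients down to much lower data rates.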

A reduction in maximum Tx power would explain the difference in observed behaviour between OP’s case and the test by @inteq … the latter being conducted in ideal radio conditions and the former where there’s a very considerable loss in signal strength due to the distance between AP and station.

Note that the current behaviour is what it should have been all the time. Mikrotik (and other vendors) were forced to correct the behaviour by country regulators, most notably the FCC, and there’s nothing you can do to make Mikrotik revert the changes. The only thing to do is to keep running old ROS (and live with the consequences, the FCC might not like it) or re-design your wireless network so that it works within country regulations.

At some point in time

Yes, indeed, …

I’m young in MT experience, but I have seen those last changes. Most of my installations are still running 6.45.6 because it is very stable. The antenna gain setting is still there, but it cannot be set lower than the gain of the built-in antenna. This minimum antenna-gain limit has been updated in more devices since then. So for Europe (ETSI) escaping was not possible with a built-in antenna, and is not wanted.

To find out if things changed for a specific unit, one will have to check regulatory setting (/interface wireless info country-info) and antenna gain setting in the different ROS releases.

Meanwhile, my TP-Link EAP245 v3 hums along with no issues and expected performance…
Very happy with MT routing, home wifi not so much. I quickly learned that the wrath of university students, poker players, and a significant other is not worth the fun of playing bpwl’s “explore the minutia of a broken wifi design” game.