WAP AX out of memory issues

We have had a Mikrotik WAP AX for a month now, and we have run into out-of-memory problems with it. It serves about 50 WiFi clients, most of them 2.4 GHz N devices, plus a few 5 GHz AX devices (phones). No CAPsMAN.
We also have about 40 scripts running on it.
Every 5 to 10 days the WAP AX runs out of memory, the CPU load gets higher and higher (maybe zram?), then it becomes unresponsive, and either it reboots itself or we reboot it due to service outage.
It runs the latest 7.20.5. We have tried most 7.20 versions since we got it, but all had the same issue.
The WAP AX has about 45 MB RAM free out of 256 MB when doing nothing. This gets less and less over time, then it crashes.
Yesterday we migrated back to the 10-year-old WAP AC, which has only 64 MB RAM and uses the old wireless drivers, and it can serve the above requirements without any problems.

Has anyone else experienced such issues with the WAP AX?

This type of issue is down to software, not hardware. As others are not complaining of 7.20.x exhausting its memory over a comparable timescale, it points away from RouterOS and to the scripts. As the old unit is working, this suggests looking at differences between scripts on the 2 units.

I have multiple wAP AX running for months but only as AP.
Zero problems.

So it would definitely be worthwhile to look at the scripts being applied.

40 of them ???

As a workaround: add 1 script to reboot every 2 or 3 days ?
Doesn't solve the problem but might take away the unexpected part of the reboot.

What are those scripts doing? I've never really heard of them running out of memory...

The same scripts run on the old WAP AC without any problems. A few of them run every minute, others run once or twice a day, turning lights on or off (via calling web-urls), monitoring if our phones are connected to the AP, deciding if we are at home or not etc. It is part of our home automation.

How many wifi clients do you have on the WAP AC and how much free RAM? We have about 50 IOT devices (ESP8266 and ESP32) sitting on the 2.4 GHz radio 24/7. Not much traffic, but they are connected.

I suspect it is this new style wifi-qcom wireless package, as this is the main difference from the old style wireless that is on the WAP AC. I’ve read a thread here on this forum, that the 128 MB WAP AC version had out of memory issues with new wifi-qcom-ac, but not with the old wireless. It can also be some wifi configuration, that is causing the issue in our case.

We bought the WAP AX second hand, the configuration was cleared, and we updated to RouterOS 7.20.x. I don’t think being second hand has anything to do with the problem.

The WiFi interfaces have been disabled since yesterday, with only a few scripts running, to see whether it is the scripts that are causing the issue. It has 41 MB RAM free. We are monitoring the situation.

The new AX drivers do lock physical memory for Wi-Fi use, more than on an "ac" model. I'm no expert, but at the higher data rates more memory buffer is needed to keep it fast, so you're starting with less. Newer "ax" models also have more memory, so it's a bit of an apples-to-oranges comparison.

I've never seen a memory leak caused by the engine that runs scripts. And I doubt the scripts themselves are the cause per se.

The problem may be in what the scripts are doing. They likely invoke CLI commands and/or enable/disable things. So it is possible that some feature or action triggered by a script has a memory leak. But I suspect the same leak would happen if it were done via the CLI, so blaming the scripting engine is premature here.

Now at 40 scripts, that makes it hard to narrow down which one has some problematic thing... But I doubt memory leak is due to using some scripting syntax, so you'll have to look at what these scripts are doing with more granularity and/or disable/enable some scripts to find which one is causing this.
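To make the disable/enable approach concrete, here is a rough sketch in Python (not RouterOS script) of halving the set of enabled scripts each round. It assumes exactly one script triggers the leak and that the leak reproduces reliably. `find_leaking_script` and `leaks_with` are hypothetical names; `leaks_with` stands for you enabling only the given scripts, watching free memory for as long as it takes to be sure, and answering yes or no.

```python
from typing import Callable, List, Sequence


def find_leaking_script(
    scripts: Sequence[str],
    leaks_with: Callable[[List[str]], bool],
) -> str:
    """Narrow down a single leaking script by halving the enabled set.

    Assumption: exactly one script in `scripts` triggers the leak.
    `leaks_with(enabled)` is a manual observation step, not a RouterOS API.
    """
    if not scripts:
        raise ValueError("need at least one script name")
    candidates = list(scripts)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Enable only `half`, disable the rest, then observe the device.
        if leaks_with(half):
            candidates = half
        else:
            # The culprit must be among the scripts that were disabled.
            candidates = candidates[len(half):]
    return candidates[0]
```

With 40 scripts this needs at most 6 observation rounds instead of 40, but each round still takes as long as the leak needs to show up, so it pays to graph free memory and stop a round as soon as the trend is clear.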

Memory Leaks.

These are caused by processes which do not fully terminate but remain resident in memory. In the compiled environment this arises from running a process and not releasing the memory when it is done, or from plain recursive running of a process which calls itself either directly or indirectly. So recursion is good for parsing a finite data structure but really problematic for open-ended processes such as finding the square root of something to an undefined precision.

In the scripted environment, something similar applies, although it may not be obvious. For example, a script could call a process multiple times and never actually close any of them. We might be looking at a script which has just finished and is zombified because an exit is not called: on one hardware platform RouterOS might just clear the debris and recover the memory space, and on another this might not happen. We might be looking at a timed job which finishes correctly and is cleared on one platform but not on another.

Nonetheless I would personally try 7.19.4 or 7.19.6 or even an older release; no one knows if the recent 7.20.x has some peculiar behaviour in some particular case.

@LFi
Which Ros version does the Wap AC run?

With the new download section of mikrotik.com, only listing 7.20.x releases….

Haven’t seen any memory leaks on any of the WAP AX’s I have running. I would test scripts first.

If possible create a supout.rif when these CPU spikes start. Then inspect it using the supout.rif viewer on mikrotik.com.

We installed 7.19.6 on WAP AX, and it starts with even less free memory (26MB) than with 7.20.5 (42MB).

WAP AC runs 7.20.5, and it runs fine. It has 12.2 MB free RAM out of 64 MB total RAM. It has 29 Wifi clients connected, running the same scripts as the WAP AX did.

We have tried that; unfortunately the WAP AX crashed while creating the supout.rif.

Since you mention 64 MB of RAM, it must be the old MIPSBE version of the wAP ac.

Which also means you run the “old” wireless drivers which require less ram.

My wAPs ax only have 3 bridge rules and nothing else besides being controlled by CAPsMAN. However, they always run at 87%-91% of RAM usage, which is quite high to be honest.

I hope the next generation has more RAM, but also that at some point a memory profiling tool is introduced.

My wAP ax has 256MB RAM and is consuming around 220MB. It's not changing over time so I'm not worried.

If device is not crashing with some sort of OOM condition, then seeing 95% of RAM used means that ROS is utilizing resources as intended.
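A rough way to tell "high but steady" from "slowly leaking" is to fit a straight line through free-memory samples and look at the slope. The sketch below is Python (not RouterOS script); the function names are made up for illustration and the sample values are invented, not taken from anyone's device. The samples could come from whatever is already graphing the device, e.g. SNMP polling.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in days, free memory in MB)


def leak_rate_mb_per_day(samples: List[Sample]) -> float:
    """Least-squares slope of free memory against time (negative = shrinking)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def days_until_exhausted(samples: List[Sample], floor_mb: float = 0.0) -> Optional[float]:
    """Days from the last sample until free memory hits floor_mb, or None if not shrinking."""
    slope = leak_rate_mb_per_day(samples)
    if slope >= 0:
        return None
    _, last_free = samples[-1]
    return (last_free - floor_mb) / -slope


# Invented example data: one device losing free memory, one holding steady.
leaking = [(0.0, 45.0), (1.0, 40.0), (2.0, 35.0), (3.0, 30.0)]
steady = [(0.0, 36.0), (1.0, 36.0), (2.0, 36.0)]
print(leak_rate_mb_per_day(leaking), days_until_exhausted(leaking))
print(leak_rate_mb_per_day(steady), days_until_exhausted(steady))
```

A straight line is only a first approximation: real graphs are noisy and a leak may speed up under load, so treat the result as a hint about the trend rather than a prediction of when the device will fall over.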

OTOH my Audience (with also 256MB RAM) uses around 130MB of RAM. Both are capsman-controlled, the big difference is wifi driver used (wifi-qcom vs. wifi-qcom-ac). It does seem that wifi-qcom-ac is geared towards 128MB RAM devices (e.g. the "normal" hAP ac2)

WAP AC with 7.20.5 also crashed with out of memory. It survived about 9 days.

See the graphs, the same pattern as with the WAP AX, CPU load starts to grow, device starts to not respond to SNMP (resulting in the choppy memory graph), then reboot, memory usage falls back.

The last known good version is 7.19.4 on the WAP AC, that was the version running fine before the WAP AX arrived. Then WAP AC was upgraded to 7.20.4 then to 7.20.5. So I think there must be some sort of regression affecting us, and may be related to the scripts.

We will go back to 7.19.4 on the WAP AC and test again, and if it works, test WAP AX also with 7.19.4.

Since I am not allowed to post two pics in one post (which I was able to do in the beginning of the thread) I post the memory graph here.

I can see that the "CPU consumption" was going on for a whole day, so if you see it again, check what in particular is using up the CPU cycles (run the CPU profiler). And if you do see something weird, create a supout.rif and attach it to the ticket you open with MT support.