Memory leak CCR2116 v7.19.4 (fixed!)

We have multiple CCR2116 routers running (4 to be exact), but on one of them we have a strange memory leak. I noticed it in v7.19.1 as well, but it may have started earlier; I don't know for sure.

Every 3 to 4 weeks, when memory usage reaches around 35%, the router runs into problems.

The routing table gets stuck (frozen): routes that were in the routing table just before the critical point keep working, but they are no longer visible and no longer updated. Traffic still flows.

BGP and OSPF stop working completely (their config is also no longer visible in WinBox): no route updates come in, and routes are no longer announced.

The router is only used for BGP, OSPF and VRRP.
No firewall or session tracking is active.
No L3HW offload is active.

To get it working again I have to reboot the unit.

Do they all run the same ROS version?
Did they all go through the same upgrade path? Some leftovers could be hidden deep in the rules.
Can you export the config, netinstall the device and import the configuration?

Just answering myself after a few months: I found the cause/solution of the problem.

For anyone who may have the same issue:

We have around 160 BGP peering sessions (transit and peering), and we had them all set to “alone” for input and output affinity. But, I think since a certain version update, BGP is limited to 100 unique processes; see “Routing Protocol Multi-core Support” in the RouterOS section of the MikroTik Documentation.

After setting input to “remote-as” and output to “input” for most of the peering sessions, and setting the transit sessions to “afi” for input and “input” for output, we have about 40 processes left.
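For anyone who wants to estimate their own process count before changing settings, here is a rough Python sketch of how the affinity values group sessions. It is a simplified model and not RouterOS code: the `Session` class, the `process_key` and `count_processes` helpers and the example data are all made up for illustration. It assumes that input and output count as separate processes unless output affinity is “input”, and it only covers the four affinity values mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical description of one BGP session (not a RouterOS object)."""
    name: str
    remote_as: int
    afi: str              # address family label, e.g. "ip" or "ipv6"
    input_affinity: str   # "alone", "remote-as" or "afi"
    output_affinity: str  # "alone", "remote-as", "afi" or "input"


def process_key(session: Session, direction: str) -> tuple:
    """Return a key identifying the process this direction would run in.

    Assumption: sessions that produce the same key share one process.
    """
    if direction == "input":
        affinity = session.input_affinity
    else:
        affinity = session.output_affinity

    if affinity == "input":
        if direction == "input":
            raise ValueError("'input' is only valid as an output affinity")
        # Output runs in the same process as this session's input.
        return process_key(session, "input")
    if affinity == "alone":
        return ("alone", session.name, direction)
    if affinity == "remote-as":
        return ("remote-as", session.remote_as, direction)
    if affinity == "afi":
        return ("afi", session.afi, direction)
    raise ValueError(f"affinity not covered by this sketch: {affinity!r}")


def count_processes(sessions) -> int:
    """Count the distinct process keys over all sessions and directions."""
    keys = set()
    for session in sessions:
        keys.add(process_key(session, "input"))
        keys.add(process_key(session, "output"))
    return len(keys)


# Made-up example: six peering sessions spread over three remote AS numbers.
peering = [
    Session(f"peer{i}", 64500 + i % 3, "ip", "remote-as", "input")
    for i in range(6)
]
print(count_processes(peering))
```

Feeding in the real session list (exported by hand) gives only an estimate to compare against the limit of 100; the actual count on the router may differ from this model.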

This fixed the problem!
No more memory leak, no more ever-growing memory usage and no more problems/freezes.

OK, I understand that there is a limit on the number of processes, but what I do not understand is why exceeding that limit would cause a leak. Maybe you want to report a bug (open a ticket) so someone can look at that.

BTW, you have many BGP peers. Did you ever notice a tendency for several BGP connections to close when one of them hits a trigger to close (peer reboot, connection down for >holdtime)? You can see in the (refreshed) connections window that several peers have the same uptime even when there is no common cause for the disconnect.

I have been fighting that problem for quite some time now. Support has suggested changing those affinity settings, but it doesn’t help.

I think, but this is an assumption/feeling, that on every reconnect the peer ends up in another one of the existing 100 processes, and that somewhere the “old process” does not clean up its memory.

It doesn’t look like I have the problem you describe. The peers with the same uptime have the same remote AS, connected over multiple exchanges.

Oh, thanks for the info. I still cannot understand what is happening, but one clue is that all the peers affected by this are L2TP/IPsec tunnels connected to the same local IP. These are backup connections in a head-office-to-branch-office network where the main traffic goes over GRE/IPsec and L2TP/IPsec is used for backup over LTE. It appears that (in some cases) when a peer disconnects, all peers with the same local address (but a different remote address) disconnect at the same time.
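To check for this pattern without eyeballing the connections window, a small Python sketch like the one below could group sessions by local address and flag those whose uptimes are nearly identical. Everything in it is an assumption for illustration: the `simultaneous_drops` helper, the 5 second tolerance and the `(name, local_address, uptime_seconds)` input format are made up, and the data would have to be copied from the router by hand.

```python
from collections import defaultdict


def simultaneous_drops(peers, tolerance_s=5):
    """Find peers on the same local address with near-identical uptime.

    peers: iterable of (name, local_address, uptime_seconds) tuples.
    Returns a list of (local_address, [peer names]) for every group of
    two or more peers whose uptimes are chained within tolerance_s.
    """
    by_local = defaultdict(list)
    for name, local_address, uptime in peers:
        by_local[local_address].append((uptime, name))

    clusters = []
    for local_address, entries in by_local.items():
        entries.sort()
        current = [entries[0]]
        for entry in entries[1:]:
            if entry[0] - current[-1][0] <= tolerance_s:
                # Close enough to the previous uptime: same reconnect event.
                current.append(entry)
            else:
                if len(current) > 1:
                    clusters.append((local_address, [n for _, n in current]))
                current = [entry]
        if len(current) > 1:
            clusters.append((local_address, [n for _, n in current]))
    return clusters


# Made-up sample: two branch tunnels reconnected together, one did not.
sample = [
    ("branch-a", "10.0.0.1", 100),
    ("branch-b", "10.0.0.1", 102),
    ("branch-c", "10.0.0.1", 5000),
    ("branch-d", "10.0.0.2", 101),
]
print(simultaneous_drops(sample))
```

A group that shows up repeatedly without a shared upstream cause would support the idea that the disconnects are tied to the local address.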

As long as I am not sure that is resolved, I hesitate to update another installation where a lot more peers are over L2TP/IPsec… it is still running RouterOS v6.