CRS317 Crashing v6.43

Hi all,
We have a CRS317 that we’ve recently put into our NOC. I jumped the gun a little as we needed to get it in, so I initially set it up with v6.43rc66. However, I noticed that the system was crashing, usually about every 36 hours.
The other night I went to site (as it had crashed again) and upgraded to v6.43. I thought it seemed stable, but this morning it crashed again. I see v6.43.1 is out now, but nothing in the changelog appears to address this. The user in post #41 of this topic also reports crashing: http://forum.mikrotik.com/t/v6-41-current/114978/1 Note: We need to use v6.43 because of Q-in-Q.

The good thing is that when it crashes, it’s only the CPU that crashes; the switch hardware seems to keep passing traffic and operating as normal. I had hoped this would be the case in the (presumed unlikely) event of a crash. There isn’t necessarily any major amount of traffic going through it when it crashes either: this morning it went down at about 5:30, and expected traffic flow around that time is only 20-30 Mb. Since only the CPU goes down, it doesn’t seem to matter what the switch chips are doing. Also, the first time it happened, the Watchdog Timer rebooted the switch. I have since turned this feature off, as we would rather manually power cycle the switch at a time that causes minimal impact.
Has anyone else noticed this behaviour?


SOLVED: See post #4 below.

I’m seeing similar behaviour on a different device: a hAP ac2. The architecture (ARM) is the same, though.

As I’m using it as a router, I can’t really leave it hung until I can get to it, so I set up the watchdog with a remote IP to ping as well. I’m getting watchdog reboots at all sorts of odd hours, so I don’t think it correlates much with traffic. Until I set up the remote IP check, it usually did not reboot itself. Existing connections kept flowing, but new connections (either towards the internet or to the router itself) failed, so cutting power was the only way to reboot the device. Most of the time it doesn’t create an autosupout file …

Thanks for your comment - very interesting to learn that someone else with an ARM CPU is seeing a similar/same result.

I’ve just spoken with our supplier and we’re going to RMA the unit - I’ll be sure to post if the replacement still has the same problem.

SOLVED:

This issue has now been resolved in v6.43.4: “bridge - fixed possible memory leak when VLAN filtering is used”.

After several very helpful support emails from MikroTik, I was made aware of a memory leak, resulting in the “Free Memory” under Resources getting lower and lower until the CPU would crash and go unresponsive.
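For anyone wanting to check whether their own unit shows the same downward trend, here is a rough sketch in Python of how the leak rate could be estimated from a few free-memory readings noted down by hand. The function name and the sample figures are my own illustrative assumptions, not values from this switch or anything MikroTik provides; it simply fits a straight line through the readings and extrapolates to zero.

```python
from typing import Optional, Sequence


def hours_until_memory_exhausted(
    hours: Sequence[float], free_mib: Sequence[float]
) -> Optional[float]:
    """Fit a least-squares line through (time, free memory) samples.

    Returns the estimated number of hours after the last sample at which
    free memory would reach zero, or None if free memory is not shrinking.
    Hypothetical helper for illustration only.
    """
    n = len(hours)
    if n < 2 or n != len(free_mib):
        raise ValueError("need at least two samples of equal length")

    mean_t = sum(hours) / n
    mean_m = sum(free_mib) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("samples must be taken at different times")

    # Slope is the change in free memory per hour (negative if leaking).
    slope = sum(
        (t - mean_t) * (m - mean_m) for t, m in zip(hours, free_mib)
    ) / var_t
    if slope >= 0:
        return None  # free memory is flat or growing, no leak trend

    intercept = mean_m - slope * mean_t
    zero_at = -intercept / slope  # time at which the fitted line hits zero
    return zero_at - hours[-1]


# Made-up readings (hours since boot, free memory in MiB), NOT real data.
sample_hours = [0, 6, 12, 18]
sample_free = [900, 750, 600, 450]
print(hours_until_memory_exhausted(sample_hours, sample_free))  # 18.0
```

A straight-line fit is only a crude approximation, and a device will likely become unresponsive before free memory actually reaches zero, so treat the result as an upper bound at best.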

I upgraded over 3 days ago, and this now looks stable with plenty of free memory.

Cheers!