Our ISP is testing Mikrotiks as customer home routers. Currently we have 60 hAP ac2 (ROS 6.47.7) devices running and we manage them through TR-069.
Every time I change the config script and push it to our 60 devices, a small percentage (usually 2-3 devices) don't boot up. The affected devices are different each time, and their logs are always the same. These are the last log entries I get from such a device (the devices that boot up properly log exactly the same entries):
2020-11-18 04:05:29 -0700 MST system,info resetting system configuration
2020-11-18 04:05:29 -0700 MST tr069,info performing config overwrite
Then it fails to boot, and the customer needs to physically reboot the device to bring it back. The watchdog doesn't seem to trigger, because no matter how long we wait, the device never reboots by itself.
When I was testing in the lab, I had some episodes where a reboot or a config overwrite left the device unresponsive (no Winbox, no SSH, nothing at all until I unplugged it and plugged it back in), but it seemed random and I thought it was a problem with that specific device.
I really don't know whether the crash happens while the device is shutting down or while it is booting up.
Does anyone have any ideas?