Hello,
Over the last 2 weeks I’ve upgraded several RB2011s to ROS 6.38.5 and the latest firmware (3.33). All of them dial an OpenVPN connection to a central server for management purposes. Also, all have a watchdog rule to reboot if they cannot ping the VPN management IP.
Yesterday, that management IP became unavailable for a minute and 3 of the routers became unresponsive. They were answering pings, but that’s all; I was no longer able to log in to them and they were not routing traffic. We have set up a syslog server, and I could see there that all 3 of them were logging this every 10s before fully crashing: “cannot ping address , rebooting”.
The only thing that helped was a power cycle, and I noticed the boot time was fairly long (about 1 minute) compared to how they boot up after I issue a reboot command.
On one of the routers I could see these lines in the log after it was rebooted: router rebooted/kernel failure in previous boot.
The routers have been running the same basic configuration (the notable differences are LAN range and OpenVPN certificates) for over a year. We have not made any changes lately, except upgrading ROS from various 6.3x versions. We have frequent watchdog reboots due to some issues with our uplinks (that’s the reason the watchdog is essential in our setup), but this never happened before; there are about 15 routers that have been running the same basic configuration for 1-2 years and I’ve never had a router freeze like that, and now there are 3…
Is anyone else seeing this behavior? Any idea what I could do, besides disabling the watchdog?
I still have one that I have not been able to power cycle yet, in case you can suggest any test I could run or info I could extract after it reboots.
Thank you!

