Repeated spontaneous "rebooted without proper shutdown by watchdog timer"

Sun Sep 08, 2019 10:09 pm

Over the past couple of years, we have seen an increasing number of devices spontaneously reboot, followed by the "rebooted without proper shutdown by watchdog timer" log entry. We aren't using the watchdog function to ping anything, so I can only assume this is a built-in or hardware watchdog triggering the reboot.

Needless to say, this is extremely disruptive--and especially intolerable for SIP users. We don't want to have to regress to 6.40 or earlier to increase the stability of these devices. Has anyone else been seeing this problem? Or better still, found a solution for this problem?

FWIW, on most of our production RouterBoards, this never happens; it happens only with a few specific RouterBoards, in some cases on a daily basis:

10/13 OmniTIK 5 PoE ac (RouterBOARD OmniTIK PG-5HacD)
3/18 hEX PoE (960PGS)
3/5 PowerBox Pro (960PGS)
3/8 SXT SA5 ac (SXT G-5HPacD)
1/3 hAP ac (RouterBOARD 962UiGS-5HacT2HnT)
1/1 RB921GS-5HPacD r2 (921GS-5HPacD r2) mANT19S

So, 77% of our 13 installed OmniTIK 5 PoE ac devices have this problem; 26% of our 23 installed RB960PGS boards (hEX PoE and PowerBox Pro combined); 38% of our 8 installed SXT SA5 ac devices; etc. Other devices do this occasionally but it's rare.
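For anyone who wants to check the arithmetic or plug in their own fleet numbers, here is a quick sketch of how those percentages fall out of the list above. The dictionary keys and the `rate` helper are just names made up for this example; the counts are the ones from the list.

```python
# Affected / installed counts, copied from the list above.
counts = {
    "OmniTIK 5 PoE ac": (10, 13),
    "hEX PoE (960PGS)": (3, 18),
    "PowerBox Pro (960PGS)": (3, 5),
    "SXT SA5 ac": (3, 8),
    "hAP ac": (1, 3),
    "RB921GS-5HPacD r2": (1, 1),
}


def rate(affected, installed):
    """Return the affected share as a whole-number percentage."""
    return round(100 * affected / installed)


# Per-model rates.
rates = {model: rate(a, n) for model, (a, n) in counts.items()}

# The hEX PoE and PowerBox Pro share the 960PGS board, so combine them.
pgs_models = ["hEX PoE (960PGS)", "PowerBox Pro (960PGS)"]
pgs_affected = sum(counts[m][0] for m in pgs_models)
pgs_installed = sum(counts[m][1] for m in pgs_models)
pgs_rate = rate(pgs_affected, pgs_installed)

for model, pct in rates.items():
    print(f"{model}: {pct}%")
print(f"All 960PGS boards: {pgs_affected}/{pgs_installed} = {pgs_rate}%")
```

Note that the two 960PGS products look quite different when taken separately, which the combined figure hides.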

The OmniTIK 5 PoE ac is by far the most affected. The first of those started spontaneously rebooting a year or two ago--and as we have upgraded ROS in the hope of clearing up these reboots, they have only become more frequent and have started occurring on other devices. In terms of configuration, for the most part the affected devices are running a handful of mangle rules and Queue Trees on some or all of their interfaces for SIP; bridging interfaces; and running nv2 in ap-bridge mode.

Sending supouts to MT won't help; I have already been told that supouts generated by watchdog timeouts don't include debug information that would be useful in determining the cause of the timeout. The only way to get useful debug information is to disable the watchdog, wait for the device to lock up, and physically power-cycle the device--NOT an option in a production environment.

Re: Repeated spontaneous "rebooted without proper shutdown by watchdog timer"

Tue Sep 10, 2019 10:19 am

Yes, repeated restarts here too, anywhere from a few times a day to once every few days, for the last 3-4 months.

RouterBOARD 962UiGS-5HacT2HnT

load: about 20-100 MBps, sometimes 200-300 MBps (reboot times do not correlate with load)
temp: +50°C
HDD: 3.2 MB free of 16 MB, 0.00% bad blocks
memory: 102.4 MB free of 128 MB
CPU load: 5-10-20%

I have been given advice that:
- my router rules are improper (stupid?),
- someone is running a DDoS against me

It would be nice if I could log (internally or to a remote syslog server) something useful and related to these restarts.
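On the remote syslog idea, a minimal sketch of the receiving side in Python, assuming the router has been configured to send its log to this host over plain UDP syslog (check the RouterOS logging documentation for the sending side; nothing here is MikroTik-specific). The function names and the `routers.log` path are made up for this example. Keep in mind that if the device locks up hard before the watchdog fires, there may be nothing useful sent before the reboot, so this only captures whatever the router manages to emit.

```python
import socket
from datetime import datetime, timezone


def open_listener(host="0.0.0.0", port=514):
    """Bind a UDP socket. 514 is the standard syslog port and usually needs root."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def format_entry(data, sender_ip, received_at):
    """Turn one raw datagram into a single log line with our own timestamp."""
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{received_at.isoformat()} {sender_ip} {text}"


def receive_once(sock):
    """Block until one datagram arrives and return it as a formatted line."""
    data, (sender_ip, _port) = sock.recvfrom(4096)
    return format_entry(data, sender_ip, datetime.now(timezone.utc))


def run(path="routers.log", host="0.0.0.0", port=514):
    """Append every received message to a file until interrupted."""
    sock = open_listener(host, port)
    with open(path, "a", encoding="utf-8") as log:
        while True:
            log.write(receive_once(sock) + "\n")
            log.flush()  # keep the file current so the last lines before a reboot survive


# To start collecting: run()
```

Stamping each line with the receiver's own clock is deliberate: the router's clock may be wrong right after an unclean reboot, so the receive time is the more trustworthy one for lining up events.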

