Repeated spontaneous "rebooted without proper shutdown by watchdog timer"

Hotz1 · Sun Sep 08, 2019 10:09 pm

Over the past couple of years, we have seen an increasing number of devices spontaneously reboot, followed by the "rebooted without proper shutdown by watchdog timer" log entry. We aren't using the watchdog function to ping anything, so I can only assume this is a built-in or hardware watchdog triggering the reboot.

Needless to say, this is extremely disruptive--and especially intolerable for SIP users. We don't want to have to regress to 6.40 or earlier to increase the stability of these devices. Has anyone else been seeing this problem? Or better still, found a solution for this problem?

FWIW, on most of our production RouterBoards, this never happens; it happens only with a few specific RouterBoards, in some cases on a daily basis:

10/13 OmniTIK 5 PoE ac (RouterBOARD OmniTIK PG-5HacD)
3/18 hEX PoE (960PGS)
3/5 PowerBox Pro (960PGS)
3/8 SXT SA5 ac (SXT G-5HPacD)
1/3 hAP ac (RouterBOARD 962UiGS-5HacT2HnT)
1/1 RB921GS-5HPacD r2 (921GS-5HPacD r2) mANT19S

So, 77% of our 13 installed OmniTIK 5 PoE ac devices have this problem; 26% of our 23 installed RB960PGS boards (hEX PoE and PowerBox Pro combined); 38% of our 8 installed SXT SA5 ac devices; etc. Other devices do this occasionally but it's rare.

The OmniTIK 5 PoE ac is by far the most affected. The first of those started spontaneously rebooting a year or two ago--and as we have upgraded ROS in the hope of clearing up these reboots, they have only become more frequent and started occurring on other devices. In terms of configuration, for the most part they're running a handful of mangle rules and Queue Trees on some or all of their interfaces for SIP; bridging interfaces; running nv2 in ap-bridge mode.

Sending supouts to MT won't help; I have already been told that supouts generated by watchdog timeouts don't include debug information that would be useful in determining the cause of the timeout. The only way to get useful debug information is to disable the watchdog, wait for the device to lock up, and physically power-cycle the device--NOT an option in a production environment.
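
For readers who want to check the settings involved, here is a minimal sketch of the watchdog options under /system watchdog. The property names are assumed from RouterOS v6 and may differ on other versions; the last command reproduces the "disable the watchdog" step described above and is only sensible on a bench unit, since a hung device will then stay hung until it is power-cycled.

```
# Show the current watchdog configuration
/system watchdog print

# Keep automatic supout generation enabled after a watchdog reboot
/system watchdog set automatic-supout=yes

# Bench units only: disable the watchdog timer so a lockup can be observed.
# The device will NOT recover on its own after this.
/system watchdog set watchdog-timer=no
```

Setting watchdog-timer=yes again restores the default behaviour.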

ieleja · Tue Sep 10, 2019 10:19 am

Yes, repeated restarts, ranging from a few times a day to once every few days, over the last 3-4 months.

RouterBOARD 962UiGS-5HacT2HnT
6.45.5

Load: about 20-100 MBps, sometimes 200-300 MBps (reboot times do not correlate with load)
Temp: +50°C
HDD: 3.2 MB free of 16 MB, 0.00% bad blocks
Memory: 102.4 MB free of 128 MB
CPU load: 5-10-20%

The advice I get is that:
- my router rules are improper (stupid?),
- someone is running a DDoS

It would be nice if I could log (internally or to a remote syslog server) something valuable and related to these restarts.
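
As a rough sketch of the remote logging asked about above, the following sends the main log topics to a syslog server. The address 192.0.2.10 and the action name remotesyslog are placeholders, not values from this thread, and there is no guarantee that a hard lockup leaves any log entry before the watchdog fires.

```
# Define a remote logging action (192.0.2.10 is a placeholder address)
/system logging action add name=remotesyslog target=remote remote=192.0.2.10 remote-port=514

# Forward the common topics to that action
/system logging add topics=info action=remotesyslog
/system logging add topics=warning action=remotesyslog
/system logging add topics=error action=remotesyslog
/system logging add topics=critical action=remotesyslog
```

Because the entries leave the router as they are written, whatever was logged just before a reboot survives on the server even if the router's in-memory log is lost.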

Hotz1 · Wed Sep 18, 2019 8:18 am

Some of our devices that do this are carrying no traffic at all; e.g., an SXT Lite5 whose far-end counterpart has been powered down for months.

aagneth · Wed Nov 20, 2019 10:53 am

I have exactly the same experience.
Even the numbers and percentages match exactly. The OmniTIK ac is the worst.
"/interface bridge settings set use-ip-firewall=no" helps, but it's not a solution.
We need this feature because of mangling.
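
For anyone who wants to test the same workaround, a minimal sketch follows. It assumes you can do without firewall processing of bridged traffic while testing, which, as noted above, is not the case if you depend on mangling bridged packets.

```
# Check the current bridge settings
/interface bridge settings print

# Workaround: stop passing bridged traffic through the IP firewall
/interface bridge settings set use-ip-firewall=no

# Revert when mangle rules on bridged traffic are needed again
/interface bridge settings set use-ip-firewall=yes
```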

helipos · Fri Nov 22, 2019 2:18 am

I've had a device or two do this; they were helped a LOT by upgrading the device firmware.
You could try the same.
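
A minimal sketch of that upgrade, assuming "device firmware" here means the RouterBOOT bootloader rather than the RouterOS package itself:

```
# Compare current-firmware with upgrade-firmware
/system routerboard print

# Stage the RouterBOOT upgrade (asks for confirmation)
/system routerboard upgrade

# The new firmware is applied on the next reboot
/system reboot
```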

unique82 · Wed Jul 29, 2020 11:14 am

Hey guys, any solution?
I can confirm that the OmniTIK ac PoE has been affected in large numbers in our network. We are using 6.44.3 for both ROS and the bootloader and are still getting the same thing.
I have even run into cases (several times by now) where we installed a brand-new OmniTIK and it started rebooting by watchdog all the time; we replaced the cable, the RJ45 connectors and the power supply, and nothing helped. So we installed another one right next to it with the same ROS and firmware version and the same setup, and that one worked fine.
I'm getting really upset by this. We are installing SXT 90° sectors instead now, but in some cases we need omnidirectional antennas, and this situation is becoming unbearable.
Over the last year I have been thinking more and more that we have to shift from a MikroTik network to UBNT.
Well, I'm going to try the latest ROS version etc. again,

but it's obvious to me that this is 95% some HW/manufacturing problem...

ieleja · Sat Oct 31, 2020 12:12 am

For me this behavior stopped without any visible reason. The configuration is now even more complicated, with proper dual-WAN redundancy; uptime had reached 55 days on firmware 6.47.3, which I have just upgraded to 6.47.7.

I think I have the MikroTik software developers to thank for this.
