I have a system that is being rebooted by the watchdog timer. I know the reason for the reboot, and I have to live with it. But my question is: why is the watchdog timer not doing the reboot properly? Every time it reboots I get an error telling me it rebooted without proper shutdown. Does the watchdog timer just power down the unit and then power it back up? Why is it not going through the proper reboot procedure? Is there a script that does the same as the watchdog but reboots it properly, at least for loss of pings? A kernel failure causing a reboot error is understandable, but not a loss of ping response. Thanks for any suggestions.
Hello
The watchdog reboots the system because it has become unresponsive, e.g. the CPU is overloaded or the system has crashed. In such a situation a proper shutdown is not possible, and it will indeed generate a warning on restart.
I am doing it based on pings, not unresponsiveness. So it should shut down normally and reboot normally. Or does the watchdog just power it down?
I guess there are plenty of ways of detecting unresponsiveness. Pinging some other network device is one of those. If you don’t like how it reacts to failed pings, then don’t use this test for the watchdog, or don’t use the watchdog to react to ping timeouts.
I know what it is supposed to do, and I want it to do that. What I am trying to understand is HOW the watchdog reboots the system. Does it go through a real reboot, or does it just power down the device? If it is supposed to be rebooting normally, then I have a problem. If it just cuts the power briefly, then what I am seeing is normal. I just want to make sure one way or another.
I think the watchdog is a hardware function.
It has some hardware circuits for detection.
So it will cut the power when some serious problem occurs.
Thanks shiyiqiang08
If the device becomes unresponsive, it might be for a really severe reason (like the kernel being stuck in some loop), and in such a case it’s impossible to perform a proper shutdown. I guess it’s not up to the watchdog routine to discover the severity of the unresponsiveness … a watchdog reboot is the last possibility to get the device responsive again, after all.
But then, one really wonders whether the system was completely stuck before the watchdog started the reboot … if you had logs saved to a USB storage (remote syslog probably doesn’t help here), you’d see something like this:
Oct/07/2018 15:52:34 watchdog,error,critical watchdog cannot ping address 192.168.42.11, rebooting
Oct/07/2018 15:52:48 missed 21 messages while could not open log file
The second line is there because 21 log messages were generated after the device started to boot, before the USB storage became available again. What I’d really like is for those messages to actually be flushed to the disk log (they are available in memory) instead of a notice that there were that many messages …
Why are you not using the Reboot function (proper shutdown) instead of the watchdog?
The watchdog pulls the power of a device, and will never do a clean shutdown!
The watchdog must work even if everything is stuck or blocked, unresponsive, overrunning, overheating, etc.
So it's a HW power cut in the most primitive (= reliable) way.
In other words, it is like pulling the power cord of your router out of the socket and plugging it in again.
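To illustrate the idea described above, here is a toy model in Python of how a hardware watchdog timer generally behaves. It is only a sketch: the class and method names (`ToyWatchdog`, `kick`, `tick`) are made up for illustration and are not taken from any router's firmware. The point is that the countdown runs independently of the software, and when it expires there is no step where shutdown routines get a chance to run.

```python
class ToyWatchdog:
    """Toy model of a hardware watchdog timer (hypothetical, illustration only)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.reset_fired = False

    def kick(self):
        # Healthy software calls this periodically to restart the countdown.
        self.remaining = self.timeout_ticks

    def tick(self):
        # The countdown advances regardless of what the software is doing.
        if self.reset_fired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            # No shutdown hooks run here: the reset simply happens.
            self.reset_fired = True


wd = ToyWatchdog(timeout_ticks=3)
wd.tick()
wd.kick()          # software is alive, countdown restarts
for _ in range(3):
    wd.tick()      # software stops kicking, countdown expires
print(wd.reset_fired)
```

Because the reset path never calls back into the software, the system comes up afterwards looking exactly as if the power had been cut.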
This is what I wanted to know. The reason I am not using a normal reboot is because I need this to reboot when communication is lost. The other end occasionally does not get any traffic thru this radio (still trying to figure that out), but rebooting this AP fixes the problem.
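For what it's worth, the "reboot properly on loss of pings" logic asked about at the top could be sketched roughly like this. This is Python for a generic Linux host, not something that runs on the router itself, and it assumes the `ping` command accepts `-c` and `-W` and that `systemctl reboot` is available; on the device in question the same logic would have to be written in its own scripting language. The function names and defaults are made up for the example, and the address is the one from the log excerpt above.

```python
import subprocess
import time


def host_alive(address, timeout_s=2):
    """Send one ping using the system ping command (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def clean_reboot():
    """Ask the OS for an orderly reboot (assumes a systemd-based host)."""
    subprocess.run(["systemctl", "reboot"], check=False)


def check_and_reboot(address, attempts=5, delay_s=1.0,
                     ping=host_alive, reboot=clean_reboot):
    """Reboot cleanly only if every one of `attempts` pings fails.

    Returns True if a reboot was requested, False otherwise.
    """
    for i in range(attempts):
        if ping(address):
            return False          # link is fine, nothing to do
        if i < attempts - 1:
            time.sleep(delay_s)   # wait a bit before the next try
    reboot()
    return True


if __name__ == "__main__":
    # Run this periodically (e.g. from a scheduler).
    check_and_reboot("192.168.42.11")
```

The difference from the watchdog is that the reboot here goes through the normal shutdown path, so it only helps while the system is still healthy enough to run the script. The watchdog remains the fallback for real lockups.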