Watchdog biting on an unreliable connection - queue issue

Hi everyone,

I’m looking for some advice on the best way to ensure that my watchdog ping packet gets prioritized above all other traffic.

This is what I’ve tried:
[username@device] > /queue tree print
Flags: X - disabled, I - invalid
0 name="watchdogQueue" parent=global packet-mark=criticalPacket limit-at=1G queue=default priority=1 max-limit=1G burst-limit=0 burst-threshold=0 burst-time=0s bucket-size=0.1
1 name="otherPackets" parent=global packet-mark=no-mark limit-at=1k queue=default priority=8 max-limit=10k burst-limit=0 burst-threshold=0 burst-time=0s bucket-size=0.1
2 name="importantQueue" parent=global packet-mark=importantPacket limit-at=1k queue=default priority=4 max-limit=500M burst-limit=0 burst-threshold=0 burst-time=0s bucket-size=0.1

[username@device] > /ip firewall mangle print
Flags: X - disabled, I - invalid, D - dynamic
0 ;;; mark watchdog as critical
chain=postrouting action=mark-connection new-connection-mark=criticalConnection passthrough=yes protocol=icmp dst-address=1.2.3.4 connection-mark=no-mark log=no log-prefix=""
1 ;;; set critical dscp
chain=postrouting action=change-dscp new-dscp=63 passthrough=yes connection-mark=criticalConnection log=no log-prefix=""
2 ;;; mark critical packets
chain=postrouting action=mark-packet new-packet-mark=criticalPacket passthrough=no connection-mark=criticalConnection log=no log-prefix=""
3 ;;; mark winbox as important
chain=postrouting action=mark-connection new-connection-mark=importantConnection passthrough=yes protocol=tcp connection-mark=no-mark dst-port=8291 log=no log-prefix=""
4 ;;; mark winbox as important
chain=postrouting action=mark-connection new-connection-mark=importantConnection passthrough=yes protocol=tcp connection-mark=no-mark src-port=8291 log=no log-prefix=""
5 ;;; mark ssh as important
chain=postrouting action=mark-connection new-connection-mark=importantConnection passthrough=yes protocol=tcp connection-mark=no-mark src-port=22 log=no log-prefix=""
6 ;;; mark ssh as important
chain=postrouting action=mark-connection new-connection-mark=importantConnection passthrough=yes protocol=tcp connection-mark=no-mark dst-port=22 log=no log-prefix=""
7 ;;; set important dscp
chain=postrouting action=change-dscp new-dscp=50 passthrough=yes connection-mark=importantConnection log=no log-prefix=""
8 ;;; mark important packets
chain=postrouting action=mark-packet new-packet-mark=importantPacket passthrough=no connection-mark=importantConnection log=no log-prefix=""
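
For comparison, here is a minimal sketch of marking the watchdog ping directly, assuming the watchdog target is 1.2.3.4 as above. Because the ping is generated by the router itself, it can be matched in the output chain and given the same criticalPacket mark in a single step, without the connection-mark stage. This is an alternative route to the same mark, not a fix on its own; the comment text is an illustrative placeholder.

```
# match the router's own watchdog ping as soon as it is generated
# (1.2.3.4 is the watchdog target from the rules above)
/ip firewall mangle
add chain=output protocol=icmp dst-address=1.2.3.4 \
    action=mark-packet new-packet-mark=criticalPacket passthrough=no \
    comment="watchdog ping (illustrative)"
```

The packet mark set in output stays on the packet through postrouting, so the existing watchdogQueue would still pick it up.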

As the connection can be very flaky, I’m connecting over SSH, since Winbox uses a lot of data and can break the connection on its own.

Based on the /queue tree print stats output, I can see that it is dropping packets as expected in the “otherPackets” queue, i.e. all unmarked packets.

I’ve also confirmed that the icmp connection in question is correctly marked in the /ip firewall connections tab.

But after a few minutes I’m getting a shutdown/reboot alert on my SSH session because the watchdog has bitten, and then the system goes down for a reboot anyway.

What else can I do to ensure that the watchdog ping goes out above all else?

First prize would be a way to do this without having to set a specific max-limit as I’m using a mobile connection and the signal strength is always fluctuating.
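
One thing worth checking: RouterOS queue trees use HTB, where priority only decides how leaf queues under the same parent share bandwidth once that parent reaches its max-limit. In the tree above, all three queues sit directly under global, which has no limit of its own, so priority=1 never actually competes with priority=8. On a mobile link the real bottleneck is usually the modem or carrier, where local marks have no effect. Below is a minimal sketch of a tree where the priorities can apply. It assumes the mobile interface is named lte1, with a conservative 2M upload ceiling and limit-at values that are all hypothetical and would need tuning to the actual link.

```
# assumes the existing global queues are removed first to avoid name clashes
# hypothetical parent: caps total upload leaving the mobile interface (lte1)
/queue tree
add name=uplink parent=lte1 max-limit=2M
# leaf queues now share one limited parent, so their priorities compete
add name=watchdogQueue parent=uplink packet-mark=criticalPacket priority=1 limit-at=64k max-limit=2M
add name=importantQueue parent=uplink packet-mark=importantPacket priority=4 limit-at=512k max-limit=2M
add name=otherPackets parent=uplink packet-mark=no-mark priority=8 limit-at=64k max-limit=2M
```

The trade-off is exactly the one raised above. The parent max-limit has to sit below what the mobile link really delivers, or the queues never fill and the modem's own buffer decides what gets dropped.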

My personal view (I’m sure many around here will disagree) is that ICMP, with so many network admins (and “admins”) blocking it, is inherently unreliable. Thus it’s unfit to rely on as a device watchdog unless you control all the devices involved. E.g. it is probably fine to ping some other local device that runs reliably (it could be a managed Ethernet switch in the same LAN … that’s what I’m using). Of course you have to use a remote address when checking a PtP link (e.g. PPPoE, VPN or something similar), but in that case a ping failure should not reboot the router; it should only restart the interface being monitored.
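
The interface-restart approach described above can be sketched with Netwatch. This assumes the remote end is 1.2.3.4 and the mobile interface is named lte1; both are placeholders, and the interval, timeout and delay values are guesses to adjust. When the host stops answering, Netwatch runs its down-script, so only the link is bounced instead of the whole router.

```
# hypothetical host and interface names; tune interval/timeout to the link
/tool netwatch
add host=1.2.3.4 interval=30s timeout=2s \
    down-script="/interface disable lte1; :delay 5s; /interface enable lte1" \
    comment="bounce LTE instead of rebooting"
```

Netwatch runs down-script on the transition to down. If a single bounce does not bring the link back, it will not keep retrying until the host has been seen up again.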

I agree. It would be great if MikroTik gave us more options under /system watchdog than just ICMP, but it is the only option I have at the moment.
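
Until more options appear, one workaround is to turn off the built-in ping check and script a watchdog that reboots only when several independent targets all stop answering. That way a single host filtering ICMP is much less likely to trigger a reboot. Below is a minimal sketch; the script and scheduler names, the target addresses, the 5-minute interval and the 10-minute boot grace period are all hypothetical placeholders.

```
# stop the built-in watchdog from rebooting on a single ping target
/system watchdog set watch-address=none

/system script
add name=multiWatchdog source={
    # hypothetical targets: use hosts you expect to answer ICMP
    :local targets {"1.2.3.4"; "192.0.2.10"; "198.51.100.20"}
    :local replies 0
    :foreach t in=$targets do={
        # in scripts, /ping returns the number of replies received
        :set replies ($replies + [/ping address=$t count=3])
    }
    # reboot only if nothing answered and the router has been up for a
    # while, so a slowly connecting link after boot cannot cause a reboot loop
    :if (($replies = 0) && ([/system resource get uptime] > 00:10:00)) do={
        :log warning "multiWatchdog: no target answered, rebooting"
        /system reboot
    }
}

# run the check every 5 minutes
/system scheduler
add name=multiWatchdog interval=5m on-event="/system script run multiWatchdog"
```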

I’m seeing some peculiar results: my pings work well, then just start timing out and keep timing out for a long time. Normally the device reboots during this time, but sometimes a reply arrives in time and averts the reboot.

The strange thing is that while this ping is timing out I’m connected to the router over the Internet and my ssh session is still responding just fine. So something must be prioritizing other connections over the ICMP connections. Any ideas?
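
One way to narrow down where the ping goes missing, assuming the watchdog target is 1.2.3.4, is to add counting-only mangle rules for the echo requests leaving the router and the replies coming back. The comments are arbitrary labels. Queue trees only shape traffic the router sends, so if the outgoing counter keeps rising while the incoming one stalls, the replies are being lost or delayed upstream (carrier, far end, or ICMP rate limiting on the target), where local priorities cannot help.

```
# action=passthrough only counts matching packets; it changes nothing
/ip firewall mangle
add chain=output protocol=icmp dst-address=1.2.3.4 action=passthrough \
    comment="watchdog echo out"
add chain=prerouting protocol=icmp src-address=1.2.3.4 action=passthrough \
    comment="watchdog echo in"

# compare the packet counters while the pings are timing out
/ip firewall mangle print stats where comment~"watchdog echo"
```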