NETWATCH IS BROKEN!!!

I cannot get netwatch to work properly. I'm using a Starlink and a cellular Netgear hotspot connected to an L009 on 7.11.12, and netwatch is just the most unreliable thing I've ever seen.

 1   ;;; Internet Test - WAN1
     host=1.1.1.1 type=icmp interval=15s 
     up-script=/ip route enable [find where comment=WAN1] 
     down-script=/ip route disable [find where comment=WAN1]  test-script="" 
     packet-interval=50ms packet-count=30 thr-loss-percent=100% http-codes="" 
     status=up 

 2   ;;; Internet Test - WAN2
     host=1.0.0.1 type=icmp interval=15s 
     up-script=/ip route enable [find where comment=WAN1-21] 
     down-script=/ip route disable [find where comment=WAN1-21] test-script="" 
     packet-interval=50ms packet-count=30 thr-max=800ms thr-loss-percent=100% 
     http-codes="" status=up

These are the most forgiving netwatch rules ever. I'm running a ping from the router to 1.1.1.1 and there are no drops at all. I even set the loss threshold to 100%, and it still executes the down script when the netwatch loss percentage is only 30%.

What is the deal? I would really appreciate some engagement with this; I've recently put all my failover eggs in this netwatch basket. I know recursive routing is out there, but I've seen that do some super weird stuff, and when I saw it, it was on a fiber connection. So I don't know, is there something I'm entering wrong here?

This is a peculiarity of MikroTik: the parameter doesn't appear to exist in the configuration, but in fact it does.
https://help.mikrotik.com/docs/display/ROS/Netwatch

In fact, the default settings are as follows
Screenshot_27.jpg
Triggering is not based on the number of lost packets, but on Thr Avg. Run a long ping and set your thresholds according to what you observe.
PS: These default settings cannot be seen on the device itself; they are not displayed even with the verbose option, and are described only in the documentation (and who reads it?).

Thank you so much. What a critical piece of information that I did not know and I guess maybe I glossed over when trying to figure this out. I felt that it was triggered by any of the values that were declared. Now knowing it’s the “Avg” I can adapt to that.

Really appreciate it. I’ll make the changes this morning and get back to post.

It's still triggering down, and I have Thr. Avg set to 600.00 ms.

I don't understand. :(
Screenshot 2023-11-13 at 8.06.20 AM.png

Here is one that is a failure, along with how the ping looks from the terminal. Is a ping from the terminal a safe way to gauge how to configure netwatch?
Screenshot 2023-11-13 at 8.12.56 AM.png
Screenshot 2023-11-13 at 8.13.54 AM.png

You didn’t show which thresholds you set.
It will be triggered when any threshold is reached.
If you claim to have observed an average ping of 600ms, you can set such thresholds.
Screenshot_31.jpg
Then the main indicator will be 85% packet loss. If false alarms continue, increase the thresholds further.
In the last screenshot above, the standard deviation threshold was the one that triggered: the threshold was 250, and the value shown in the screenshot is 329.
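
For reference, here is a minimal sketch of setting every ICMP threshold explicitly from the terminal, so that none of them silently stays at its hidden default. The values are illustrative placeholders, not the ones from the screenshot. Only thr-max and thr-loss-percent appear in the print output earlier in this thread; the other parameter names (thr-avg, thr-stdev, thr-jitter) are assumed from the Netwatch documentation, so confirm them with tab completion on your RouterOS version.

```
# Illustrative values only: derive your own from a long ping on each link.
# thr-avg, thr-stdev and thr-jitter are assumed parameter names (see note above).
/tool netwatch set [find comment="Internet Test - WAN1"] \
    thr-avg=600ms thr-max=1500ms thr-stdev=500ms thr-jitter=1500ms \
    thr-loss-percent=85%
```

Since any single threshold can trigger the down state, each one should sit comfortably above the worst value the link shows while it is still usable.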

I tried those values. Thank you again for engaging; I've been battling this for a few weeks now, and I'm already seeing a massive improvement. Now I feel better about how I should experiment with this further.

Is there maybe a log that shows exactly which conditions triggered the down? Or maybe a way to add a script that shows more detail?

You can use variables in your script to read it. See docs:
https://help.mikrotik.com/docs/display/ROS/Netwatch#Netwatch-Probestatistics/variables

It just has the count of failures ($"failed-tests"), but inside each up/down/test script you can use "get" on each of the netwatch entry's configured thresholds and compare them with the real-time variables passed to the script.
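
A rough sketch of what such a down-script could look like for the WAN2 probe from the first post. It logs the probe statistics so you can see which value crossed its threshold. The variable names ($"loss-percent", $"rtt-avg", $"rtt-max", $"rtt-stdev", $"rtt-jitter") are assumed from the probe statistics section of the linked docs and should be verified on your version; the route command and thr-max are taken from the config shown above.

```
# Down-script body for the "Internet Test - WAN2" entry (paste into the Down field).
# Disable the route first, so a logging problem cannot block the failover itself.
/ip route disable [find where comment=WAN1-21]

# Build one log line from the statistics Netwatch passes to the script.
# Variable names are assumptions based on the docs; an undefined one prints as empty.
:local msg ("WAN2 down: loss-percent=" . $"loss-percent")
:set msg ($msg . " rtt-avg=" . $"rtt-avg" . " rtt-max=" . $"rtt-max")
:set msg ($msg . " rtt-stdev=" . $"rtt-stdev" . " rtt-jitter=" . $"rtt-jitter")
:set msg ($msg . " failed-tests=" . $"failed-tests")
:log warning $msg

# Read a configured threshold with "get" to compare it against the values above.
:local thrMax [/tool netwatch get [find comment="Internet Test - WAN2"] thr-max]
:log warning ("WAN2 configured thr-max=" . $thrMax)
```

After the next false alarm, the log should show the measured values next to the configured threshold, which makes it easier to decide which threshold to raise.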