This has come up a few times for me (and others) when using netwatch with a type=icmp check. Basically, netwatch icmp is more “picky” than the type=simple check, since it also checks RTT statistics (avg/max/stdev) as well as jitter. This is useful when netwatch decides if a host= is “up” or “down”, since latency is an important consideration (vs. just whether a ping came back at some point).
The downside is that netwatch with ICMP will use default values for the RTT and jitter “thresholds”, so folks often wonder why it’s “not working” when it’s just that the default value is too low. More recent docs suggest:
Default Netwatch values are always used - even if they were not defined by the user. Make sure to check the “status” page of the probe to see if the default thresholds are appropriate for your use case. Default threshold values can be found under the “probe options” section on this page.
I’ve previously set all the netwatch values “by hand”, based on the “Status” page, to avoid the above defaults. And it’s generally a better idea to THINK about “how bad is too bad” for the particular upstream being monitored, since not all links are the same.
I wanted a script to update a netwatch’s RTT/jitter threshold values (specifically thr-jitter, thr-max, thr-stdev, thr-avg) by some percentage from the last test’s ACTUAL value. Below is a first attempt at a script function to do this, $scalenetwatch.
If the function at the bottom of this post is loaded, it works like this:
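As a rough sketch of that by-hand approach, assuming an ICMP netwatch for host 1.1.1.1 already exists: dump the entry to see the latest readings next to the thresholds in effect, then set the four thr- values explicitly. The numbers below are placeholders, not recommendations.

```routeros
# show the latest readings (rtt-*) and the thresholds in effect (thr-*) for the entry
:put [/tool/netwatch get [/tool/netwatch find host=1.1.1.1 type=icmp]]

# set the thresholds by hand - placeholder values, pick ones that fit your link
/tool/netwatch set [find host=1.1.1.1 type=icmp] thr-avg=50ms thr-max=100ms thr-jitter=20ms thr-stdev=20ms
```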
# create a new ICMP netwatch if needed
/tool/netwatch add type=icmp host=1.1.1.1 comment=scaletest interval=3s
# then use "find" to locate one to update (here it's find'ing the one above)
# and call the $scalenetwatch function, which sets the RTT and jitter thresholds to +25% of the last test
$scalenetwatch [/tool/netwatch find comment=scaletest]
This will output:
$scalenetwatch [/tool/netwatch find comment=scaletest]
using default denom=4 or +25% of current value to set netwatch thresholds for ICMP RTT
hint: use $$scalenetwatch denom=4 [/tool/netwatch find host=1.1.1.1]
denom=<num> is the adjustment expressed as: 1 / <num>
so denom=2 means 1/2 or 50% - default: denom=2 or 25%
using adjustment of 25%
changed 1.1.1.1 rtt-avg = 00:00:00.011417500 [ diff: -2ms old: 00.013786 ]
changed 1.1.1.1 rtt-jitter = 00:00:00.000827500 [ diff: 0ms old: 00.000957 ]
changed 1.1.1.1 rtt-max = 00:00:00.011918750 [ diff: -2ms old: 00.014274 ]
changed 1.1.1.1 rtt-stdev = 00:00:00.000303750 [ diff: 0ms old: 00.000315 ]
To be more conservative than the default 25%, there is a denom= parameter: denom=2 means 50%, and denom=1 means double the current values. So here is the same netwatch adjusted by 50%:
$scalenetwatch denom=2 [/tool/netwatch find comment=scaletest]
using adjustment of 50%
changed 1.1.1.1 rtt-avg = 00:00:00.013786500 [ diff: 2ms old: 00.011461 ]
changed 1.1.1.1 rtt-jitter = 00:00:00.000957 [ diff: 0ms old: 00.000897 ]
changed 1.1.1.1 rtt-max = 00:00:00.014274 [ diff: 2ms old: 00.012040 ]
changed 1.1.1.1 rtt-stdev = 00:00:00.000315 [ diff: 0ms old: 00.000312 ]
update 1.1.1.1 done
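To make the denom= arithmetic concrete, here is a small sketch using a made-up reading of 10 ms. The global name base is hypothetical; the expression itself mirrors what the script's scaletime helper does, working in nanoseconds. Per the code below, a negative denom subtracts instead (denom=-4 is 25% lower), and denom=0 copies the reading unchanged.

```routeros
# 10 ms expressed in nanoseconds, the unit the script converts to
:global base 10000000
# denom=4 adds base/4, i.e. 25%: 10000000 + 2500000 = 12500000 ns
:put [:totime "$($base + ($base / 4))ns"]
# denom=2 adds base/2, i.e. 50%: 10000000 + 5000000 = 15000000 ns
:put [:totime "$($base + ($base / 2))ns"]
# denom=1 adds base/1, i.e. 100%: 10000000 + 10000000 = 20000000 ns
:put [:totime "$($base + ($base / 1))ns"]
```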
Here is the code for the $scalenetwatch function. Please comment below if you have any suggestions or bug reports.
:global scalenetwatch do={
    # scaletime: returns the first arg, a time, increased by 1/denominator, computed in nanoseconds
    :local scaletime do={:return [:totime "$([:tonsec [:totime $1]] + ([:tonsec [:totime $1]]/$2))ns"]}
    # setchg: sets one threshold on the netwatch entry and prints what changed
    :local setchg do={
        :local attrs [/tool/netwatch/get $1]
        # map the reading name to its threshold name, e.g. rtt-avg becomes thr-avg
        :local attname "thr-$[:pick $2 4 12]"
        :local prev ($attrs->$attname)
        :local diffms (([:tonsec [:totime $3]] - [:tonsec [:totime $prev]])/1000000)
        [[:parse "/tool/netwatch set $1 $attname=$3"]]
        :put "changed $($attrs->"host")\t$2 = $3 \t [ diff: $($diffms)ms old: $[:pick $prev 6 99] ]"
    }
    # the first arg must be the .id of an existing netwatch entry
    :local nwattrs
    :do { :set nwattrs [/tool/netwatch get $1] } on-error={
        :error "\$$0 requires an .id of netwatch - use [/tool/netwatch find host=1.1.1.1 type=icmp] or similar as arg"
    }
    # default 1/4 or 25%
    :local ldenom 4
    :if ([:typeof [:tonum $denom]]~"num") do={
        :set ldenom [:tonum $denom]
    } else={
        :put "using default denom=4 or +25% of current value to set netwatch thresholds for ICMP RTT"
        :put "\thint: use \$$0 denom=4 [/tool/netwatch find host=1.1.1.1] "
        :put "\t denom=<num> is the adjustment expressed as: 1 / <num>"
        :put "\t so denom=2 means 1/2 or 50% - default: denom=4 or 25%"
    }
    :if ($ldenom > 0) do={
        :put "using adjustment of $(100 / $ldenom)%"
    } else={
        :local perc
        # handle denom=0, which causes divide by zero error
        :do { :set perc (100 / $ldenom) } on-error={ :set perc 0}
        :put "warn: negative adjustment of $perc% - you may get failed tests"
    }
    # copy each RTT reading, scaled by 1/denom, into the matching thr- threshold
    :foreach k,v in=$nwattrs do={
        :if ($k~"rtt-avg|rtt-jitter|rtt-max|rtt-stdev") do={
            # denom=0 means no scaling: use the reading as-is
            :if ($ldenom = 0) do={ $setchg $1 $k $v } else={ $setchg $1 $k [$scaletime $v $ldenom] }
        }
    }
    :put "update $($nwattrs->"host") done"
    :return [/tool/netwatch get $1]
}
TEST CASES
:global nw [/tool/netwatch find comment=scaletest]
:put [$scalenetwatch $nw denom=2]
:put [$scalenetwatch $nw denom=0]
:put [$scalenetwatch $nw denom=-4]
:put [$scalenetwatch $nw]
_Open Issues:
- I know a % would be better than using denom= (e.g. 1/<num>). The script got more complex since /tool/netwatch has a funky CLI, and that was before I could figure out the math for % with the “time” data types (e.g. you cannot multiply a time type in script). But it would actually be better if this local function could “adjust time by some percentage”:_
:local scaletime do={:return [:totime "$([:tonsec [:totime $1]] + ([:tonsec [:totime $1]]/$2))ns"]}
_If someone has ideas on how to adapt it to use a percentage, I’m open to suggestions. I might be missing some shortcut in these conversions, but it took a second to come up with the above. I gave up trying to figure out a percentage and took the “PCC approach” of using denominators (denom=) since that was simpler.
- I’m not sure what to do about thr-stdev. It currently uses the same % as the others, but that’s not great since a reading may have no history, especially after a change._
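On the percentage question, here is a minimal, untested sketch of one way it might work: do the multiply on the nanosecond integer rather than on the time value, then divide by 100. The $scalepct name and its two positional arguments (a time, then a whole-number percentage to add) are hypothetical. It leans on the same :tonsec and :totime conversions as scaletime above, and assumes the comment=scaletest netwatch from earlier exists.

```routeros
# hypothetical helper: first arg is a time, second arg is the percentage to add
:global scalepct do={
    # convert the time to an integer count of nanoseconds so it can be multiplied
    :local ns [:tonsec [:totime $1]]
    # add the percentage using integer math, then convert back to a time
    :return [:totime "$($ns + (($ns * [:tonum $2]) / 100))ns"]
}

# usage: print the last rtt-avg reading of the test netwatch raised by 25%
:global nw [/tool/netwatch find comment=scaletest]
:put [$scalepct ([/tool/netwatch get $nw]->"rtt-avg") 25]
```

If this holds up, a negative percentage should lower the value the same way a negative denom does, and 0 would leave it unchanged without the divide-by-zero special case.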