On wireless links it is a real problem to get notified when a link goes bad. A link sometimes degrades long before it actually goes down: it still works and packets are not lost, but throughput is terrible and pings are 20-50 ms on a PtP link.
We need a notification (e-mail is fine) when this happens. Netwatch can’t help here because packets are not lost; they are still delivered, just too slowly to keep the client happy. The various statistics from the wireless interface seem too variable to set meaningful thresholds on.
It seems to me that the only reliable indicator of this problem is the average ping over about 2-3 minutes exceeding a predefined value for that link. Usually the problem shows up as pings of 10-20 ms on PtP links. That makes the Dude’s ping useless here, because its time resolution seems to be 10 ms: all pings in the Dude are 0, 10, 20, … ms, which makes them useless for averaging.
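The check described above (average RTT over the last few minutes compared with a per-link limit) can be sketched in Python as follows. This is a minimal sketch, assuming RTT samples with sub-millisecond resolution arrive from some source that is not shown; the class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class RttWindow:
    """Sliding window of (timestamp_s, rtt_ms) samples covering the last window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.samples = deque()

    def add(self, t: float, rtt_ms: float) -> None:
        self.samples.append((t, rtt_ms))
        # Drop samples older than the window, measured from the newest sample.
        while self.samples and self.samples[0][0] < t - self.window_s:
            self.samples.popleft()

    def average(self) -> Optional[float]:
        # No samples yet means no verdict.
        if not self.samples:
            return None
        return sum(rtt for _, rtt in self.samples) / len(self.samples)

    def is_slow(self, threshold_ms: float) -> bool:
        avg = self.average()
        return avg is not None and avg > threshold_ms
```

For a 2-3 minute window you would create `RttWindow(window_s=150)` and call `is_slow(10)` for a link whose limit is 10 ms; old samples age out, so a link that recovers stops reporting as slow once the window has refilled.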
So the question is: is there a way to get actual ping times (or any other round-trip times) between two MikroTik routers, via a script, the Dude, or any other means (SNMP?), that can be collected, averaged and compared to a threshold?
You can get the minimum (min-rtt), maximum (max-rtt) or average (used above) times, or even lost packets (sent - received). Play with them and see which gives you the best indication of a faulty connection.
You can set the interval, you can set the timeout and you can make any script you want.
Now you have to decide when you consider a link to be down, bad, poor, moderate or fast.
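One way to turn those counters into the categories above is a small classifier. This Python sketch is illustrative only: the fields are modeled on the flood-ping counters named in the post (sent, received, min-rtt, average, max-rtt), how they are fetched from the router is not shown, and every threshold is a made-up placeholder to tune per link.

```python
from dataclasses import dataclass


@dataclass
class FloodPingResult:
    sent: int
    received: int
    min_rtt: float  # ms
    avg_rtt: float  # ms
    max_rtt: float  # ms

    @property
    def lost(self) -> int:
        # Lost packets are the ones sent but never answered.
        return self.sent - self.received


def classify(r: FloodPingResult) -> str:
    # Hypothetical thresholds; checked from worst to best.
    if r.received == 0:
        return "down"
    loss = r.lost / r.sent
    if loss > 0.10 or r.avg_rtt > 20:
        return "bad"
    if loss > 0.01 or r.avg_rtt > 10:
        return "poor"
    if r.avg_rtt > 3:
        return "moderate"
    return "fast"
```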
To WirelessRudy: thanks for that information, I should have read the docs more carefully. This might be a good workaround. It still doesn’t give the actual RTT, but it can give approximations: I could count how many RTTs fall between certain threshold values, and I think it is quite possible to build what I need on top of this. Still, it seems more complicated and less accurate than what bburley posted above.
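The threshold-counting approximation can be modeled as a histogram: if each probe is only known to be faster or slower than a handful of timeouts, you get bucket counts instead of RTTs, and only a bounded estimate of the mean. This is a hypothetical Python model of that idea, not netwatch code; the function names and the choice that an RTT exactly equal to a timeout counts as "within" it are assumptions.

```python
from bisect import bisect_left
from typing import List, Optional


def bucket_counts(rtts_ms: List[float], timeouts_ms: List[float]) -> List[int]:
    """Bucket i holds probes with timeouts_ms[i-1] < rtt <= timeouts_ms[i];
    the extra last bucket holds probes slower than every timeout.
    timeouts_ms must be sorted ascending."""
    counts = [0] * (len(timeouts_ms) + 1)
    for rtt in rtts_ms:
        counts[bisect_left(timeouts_ms, rtt)] += 1
    return counts


def upper_bound_avg(counts: List[int], timeouts_ms: List[float]) -> Optional[float]:
    """Upper bound on the mean RTT, using each bucket's upper edge.
    The open-ended last bucket cannot be bounded, so give up if it is used."""
    total = sum(counts)
    if total == 0 or counts[-1]:
        return None
    return sum(c * t for c, t in zip(counts, timeouts_ms)) / total
```

With timeouts of 5, 10 and 20 ms, a probe at 12 ms is only known to lie in (10, 20], which is why this approach is coarser than measuring the RTT directly.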
To psamsig & bburley: thanks a lot, this is definitely what I will try. It looks like exactly what I need!
Netwatch is perfect: just read the documentation, find the timeout argument, set it to something small like 10 ms, and write the up/down scripts carefully. For example, the on-down script disables and re-enables the link and sets a variable with some info; the on-up script sends an e-mail with the info saved in that variable (this assumes the downtime is short). If the e-mail is empty, the router was rebooted between the down and up scripts.
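The down/up pattern above relies on a variable that survives between the two scripts but not across a reboot. Here is a small Python model of that logic, offered only as an illustration; it is not RouterOS script, and the class, method names and message texts are hypothetical.

```python
from typing import Optional


class NetwatchNotifier:
    """Model of the on-down / on-up pattern (not RouterOS code).
    saved_info stands in for a router variable that lives in memory:
    it survives between the two scripts but is lost on reboot."""

    def __init__(self) -> None:
        self.saved_info: Optional[str] = None

    def on_down(self, info: str) -> None:
        # In the post, this is also where the link is disabled and re-enabled.
        self.saved_info = info

    def on_up(self) -> str:
        # Build the e-mail body from what on_down stored, then clear it.
        info, self.saved_info = self.saved_info, None
        if not info:
            return "link recovered, no saved info (router probably rebooted)"
        return f"link recovered; went down at: {info}"

    def reboot(self) -> None:
        # Simulates a reboot: in-memory variables are lost.
        self.saved_info = None
```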
Yes, netwatch is perfect; maybe I just misunderstood something. This is how I see it: netwatch is meant for another purpose, monitoring up/down events. I don’t want to monitor link up/down status; I already do that with the Dude. What I want is to detect when the average RTT on a channel over the last 10 minutes exceeds, say, 10 ms.
I don’t care about sporadic fluctuations and packet loss. Again, I can see those in the Dude, and a client will forgive me a 10-second problem. What I really need is to see that a link has had consistently high latency for quite a long period. I want an e-mail saying “the link between A and B has been quite slow for 30 minutes already, take a look at it”. That matters because clients experience slow internet during this time even when there is no packet loss at all, and they hate slow internet.
I could do this with a series of netwatch entries with different timeouts, a separate script for each one counting how many times it was executed, and some math afterwards. But why should I, if I can get the same numbers measured directly by a series of flood pings every 1-2 minutes? Again, I don’t need absolute values, and I don’t care about individual up/down moments and fluctuations at this point; I need averages over minutes. Taking all that into consideration, is it still better to do this with netwatch? What did I miss?
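The requirement above (alert once the averaged RTT has stayed above a limit for a long time, ignore short spikes, do not repeat the alert) is a simple hysteresis check. This Python sketch is hypothetical: it assumes one averaged RTT value arrives every minute or two, for example from a flood ping, and it leaves the actual e-mail sending out.

```python
from typing import Optional


class SustainedLatencyAlarm:
    """Fire once when the averaged RTT stays above threshold_ms for at
    least hold_s seconds; re-arm after it drops back below the threshold."""

    def __init__(self, threshold_ms: float, hold_s: float) -> None:
        self.threshold_ms = threshold_ms
        self.hold_s = hold_s
        self.high_since: Optional[float] = None
        self.fired = False

    def update(self, t: float, avg_rtt_ms: float) -> bool:
        """Feed one averaged sample; return True only when an alert should be sent."""
        if avg_rtt_ms <= self.threshold_ms:
            # Latency recovered: reset so the next episode can alert again.
            self.high_since = None
            self.fired = False
            return False
        if self.high_since is None:
            self.high_since = t
        if not self.fired and t - self.high_since >= self.hold_s:
            self.fired = True
            return True
        return False
```

For the case in the post, `SustainedLatencyAlarm(threshold_ms=10, hold_s=1800)` fed every 2 minutes sends a single alert after 30 minutes of high latency, while a 10-minute spike never triggers it.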
Well, you can configure netwatch with a specific latency value, like those 10 ms. After 3 probes time out, netwatch executes the on-down script. That does not mean the link is down; it means none of the ICMP probes it sent were within the configured parameters. When one of the probes succeeds while the status is down, the status changes and the on-up script is executed. It has nothing to do with link up/down or anything else, only with whether the probes are within the configured parameters, and that is exactly what you are looking for.
For example, say you have three routers A<---->B<---->C and want to netwatch C from A. If you induce latency on B so that it exceeds 50 ms, and set up netwatch on A with a timeout of 10 ms, it will be in state=down even though the link is working without problems and the ICMP probes are merely throttled on B.
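The behaviour described in the last two posts can be checked with a simplified Python model: a probe fails if it is lost or slower than the timeout, 3 consecutive failures trigger on-down, and one good probe triggers on-up. This is a hypothetical model of the described behaviour, not netwatch’s actual implementation; in particular, starting in the "up" state is a simplification.

```python
from typing import List, Optional


class NetwatchModel:
    """Simplified model of netwatch as described in the thread."""

    def __init__(self, timeout_ms: float, fail_count: int = 3) -> None:
        self.timeout_ms = timeout_ms
        self.fail_count = fail_count
        self.failures = 0
        self.state = "up"
        self.events: List[str] = []

    def probe(self, rtt_ms: Optional[float]) -> None:
        """rtt_ms is None for a lost probe; records 'on-down'/'on-up' events."""
        ok = rtt_ms is not None and rtt_ms <= self.timeout_ms
        if ok:
            self.failures = 0
            if self.state == "down":
                self.state = "up"
                self.events.append("on-up")
        else:
            self.failures += 1
            if self.state == "up" and self.failures >= self.fail_count:
                self.state = "down"
                self.events.append("on-down")
```

Replaying the A-B-C example, five probes at 50 ms against a 10 ms timeout put the model into `state == "down"` with a single on-down event, even though no probe was lost; one 2 ms probe then brings it back up.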
If you would like to monitor the link’s quality, why not rely on the CCQ values? (Especially if the link carries traffic.)
I think that with the API it’s not too difficult to do the following:
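if ($linkTraffic > 1Mbps)
{
    # the link already carries traffic, so CCQ can be read as-is
    get CCQ values
    if ($ccq < 80%) alert
}
else
{
    # idle link: generate traffic first, then read CCQ
    flood ping
    get CCQ values
    if ($ccq < 80%) alert
}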
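The same logic as runnable Python, with the router access passed in as plain functions because the post doesn’t say which API client is used. `read_ccq`, `flood_ping` and `alert` are hypothetical hooks; the 1 Mbps and 80% figures come from the pseudocode. Both branches end with the same CCQ check, so the sketch only makes the flood ping conditional.

```python
from typing import Callable


def check_ccq(
    traffic_bps: float,
    read_ccq: Callable[[], float],
    flood_ping: Callable[[], None],
    alert: Callable[[float], None],
    min_traffic_bps: float = 1_000_000,
    min_ccq: float = 80.0,
) -> bool:
    """Return True if an alert was raised."""
    # On an idle link, generate traffic first so CCQ reflects current conditions.
    if traffic_bps <= min_traffic_bps:
        flood_ping()
    ccq = read_ccq()
    if ccq < min_ccq:
        alert(ccq)
        return True
    return False
```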