The problem I have is that when the device being probed goes down, the latency returns a negative value, “-1”. The high latency probe thinks this is OK because it is not above my latency threshold of 75ms. I would like the high latency probe to show as down if the latency value returned is negative.
This way, if the device is down the snmp_description function will return an empty string (zero length) and the probe will error with a return value of “Timeout” (or whatever else you put in there).
If the device is up, the snmp_description function will return the description of the device, so the “if” statement will be true and the probe you are interested in will then be evaluated.
Function:
Name: ping_rtt
Description: Returns the round-trip time of a ping request to the FirstAddress of a device
Code: round(array_element(ping(device_property("FirstAddress")), 0))
Probe:
Name: latency
Type: Function
Agent: Default
Available: and(device_property("FirstAddress") <> "", ping_rtt()>-1)
Error: if(and(ping_rtt()>-1, ping_rtt()<200), "", if(ping_rtt()>-1, concatenate("Latency above 200ms with ", ping_rtt(), "ms"), "down"))
Value: ping_rtt()
Unit: ms
It displays the current latency in the error message (syslog) when above the limit and it shows down state when device is unavailable. You could even replace the ping probe with it.
You can also use the function to display the rtt to a device on a map, just add [ping_rtt()] to the Label.
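To make the Available and Error lines of the latency probe easier to reason about, here is a rough Python model of the same logic. This is not Dude syntax and cannot be pasted into The Dude; the Python function names and the example address are made up for illustration. It assumes ping_rtt() yields -1 when the device does not answer, and the 200 ms limit is taken from the Error line above.

```python
LIMIT_MS = 200  # threshold used in the probe's Error line


def available(first_address: str, rtt: int) -> bool:
    # Mirrors: and(device_property("FirstAddress") <> "", ping_rtt() > -1)
    return first_address != "" and rtt > -1


def error(rtt: int) -> str:
    # Mirrors the nested if() on the Error line; "" means "no error".
    if -1 < rtt < LIMIT_MS:
        return ""
    if rtt > -1:
        return f"Latency above {LIMIT_MS}ms with {rtt}ms"
    return "down"


# Walk through a down device, a fast reply, and a slow reply.
for rtt in (-1, 0, 75, 250):
    print(rtt, available("10.0.0.1", rtt), repr(error(rtt)))
```

The point of the model is that a reply of -1 never reaches the threshold comparison, so a down device is reported as "down" instead of passing as low latency.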
Just FYI, in The Dude -1 is shown as “FALSE” in various probes on the wiki and in some manuals, but -1 is not FALSE. When designing a probe, make sure that 0 is used for false, and if 0 is a valid result, add one to the result in the function and subtract one on the error line.
For example, I have a Cisco CPU probe, and the CPU can average 0% utilization.
Function Cisco_CPU_a - returns "False" if not available, returns CPU + 1 if available.
if(string_size(oid("1.3.6.1.4.1.9.2.1.57.0", 10, 5)), oid("1.3.6.1.4.1.9.2.1.57.0", 10, 5) + 1, "False")
Probe Cisco_CPU - detects "False" on the available line and subtracts the added 1 on the error line.
Available: Cisco_CPU_a() <> "False"
Error: if(Cisco_CPU_a() <> "False", if(Cisco_CPU_a() - 1 < 80, "", concatenate("Warning: high CPU = ", Cisco_CPU_a() - 1, "%")), "Cisco Device down")
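The add-one trick can be sketched outside The Dude as well. The Python below is only a model of the idea, not Dude code: the raw argument stands in for whatever the oid() call returns (assumed here to be an empty string when the device does not answer), and the Python function names are invented for this sketch.

```python
def cisco_cpu_a(raw: str):
    # Returns the CPU value shifted up by one, or "False" when unavailable.
    # Shifting keeps a legitimate 0% reading from looking like "false".
    return int(raw) + 1 if len(raw) > 0 else "False"


def cisco_cpu_error(raw: str, limit: int = 80) -> str:
    shifted = cisco_cpu_a(raw)
    if shifted == "False":
        return "Cisco Device down"
    cpu = shifted - 1  # undo the offset before comparing and reporting
    return "" if cpu < limit else f"Warning: high CPU = {cpu}%"


# A 0% reading, a high reading, and a device that did not answer.
for raw in ("0", "85", ""):
    print(repr(raw), repr(cisco_cpu_error(raw)))
```

With the offset, an idle CPU comes back as 1 rather than 0, so it stays distinguishable from the unavailable case, and the error line reports the real percentage after subtracting one.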
It seems that yolodrew was right. I have tested the ping function in a lab environment: it returns -1 for false/down, and 0 or more for the RTT in ms, as the first element of its result.
But thanks for the heads up!
This is actually one of the things that really annoys me about The Dude. I have to test everything in order to determine how something works so I can write something I need. There is no documentation whatsoever and they seem not to care so much about bug fixes and much needed features.
And the tool is amazing! I have used many tools, from Cacti to IBM Tivoli Netcool OMNIBus & Proviso, but this tool has somehow grown on me.
Yep totally agree, I spent a bunch of time learning how probes and functions work and made every mistake imaginable. Most of the stuff is documented here.
The latency probe is working well.
For some reason I’m unable to use this probe with a remote agent; the ping is made from the local Dude server and not from the agent.
Maybe someone has an idea on how to accomplish the following:
Adding a latency number to a link (maybe even coloring it on a high value), or even doing a remote ping from agents and displaying it on the map? Maybe even creating a graph of it.
I seem to have a problem where a particular device says the Latency probe is “not available” and then later says it is available, while at the same time the ping probe never had any issues. It is very frustrating because of the false alerts. I have made individual probes to see if perhaps the first address or ping_rtt were throwing errors, but they don’t seem to have an issue. (I am refreshing the DynDNS on this device every minute.)
I also have the same issue with trying to use the probe with a remote agent as above… just doesn’t work. I even tried adding the function and probe to the remote agent manually, and it still doesn’t work.
fbsdmon, thanks for the support.
But what do you mean by the word “label” in “You can also use the function to display the rtt to a device on a map, just add [ping_rtt()] to the Label.”?