I’m trying to use The Dude and I’ve run into a problem that I can’t solve by myself.
The thing is, I’m monitoring voltage and temperature across our network. All devices that can report voltage and temperature are shown on the map and graphed.
I guess the probe is fine, but notification is my problem.
I’ve set up an email notification, but I receive lots of emails every time SNMP fails to get its value.
So I get a notification about “not-available”. I don’t really know why, but I guess it’s because our MikroTik routers sometimes get rebooted by the watchdog and become unavailable for a while.
How can I receive a notification only when the value matches the IF condition set in the probe?
I don’t want to receive all of those “unavailable” notifications, because I only want to pay attention when there is a real error.
Another question about that: how can I avoid being notified multiple times if the device has multiple probes?
E.g., the ping probe should act as a master probe. If ping is down, I don’t care about the others, because a failed ping means the device is down, so the other probes will probably fail as well.
It would work like parents: if we give a parent device a probe interval shorter than the child’s normal probe interval, I won’t get notified about the children, just about the parent. That seems logical: if the parent is down, of course the children will be down as well.
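For what it’s worth, the master-probe rule can be expressed in a few lines. This is a minimal sketch in Python of the logic only, assuming each probe can name one master probe it depends on; the `Probe` class, its `depends_on` field, and `alerts_to_send` are hypothetical names for illustration, not anything The Dude exposes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """Hypothetical probe record (not a Dude data structure)."""
    name: str
    down: bool
    depends_on: Optional[str] = None  # name of the master probe, if any


def alerts_to_send(probes: list[Probe]) -> list[str]:
    """Return the names of down probes whose master probe is not also down."""
    by_name = {p.name: p for p in probes}
    alerts = []
    for p in probes:
        if not p.down:
            continue
        master = by_name.get(p.depends_on) if p.depends_on else None
        # If the master (e.g. ping) is down too, its own alert already
        # covers this failure, so this probe stays silent.
        if master is not None and master.down:
            continue
        alerts.append(p.name)
    return alerts


probes = [
    Probe("ping", down=True),
    Probe("snmp-voltage", down=True, depends_on="ping"),
    Probe("snmp-temperature", down=True, depends_on="ping"),
]
print(alerts_to_send(probes))  # ['ping']
```

Because each probe only checks its direct master, chains (site router, then switch, then device) suppress correctly as long as every level is marked down.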
This would be super handy. We have sites, and it would be good to set a master probe so that if ‘x’ is down we are not notified about ‘y’, etc. When we do quick maintenance, we get bombarded with notifications for SNMP probes that time out.
A button in The Dude to quickly silence all e-mail alerts would be cool too.
I agree that it would be nice to prevent multiple timeout notifications. From a programming standpoint, though, I don’t think choosing just one master probe would do the trick. When you power cycle a device, all services go down. Depending on where each probe is in its polling cycle, a different probe could be the first to time out each time. You almost need to use the probe down count to delay the notification long enough for all of the services to go down during a typical power cycle. Once the first probe reaches the end of its count, The Dude should send a single notification that includes all other services currently in a down count, and then terminate the down count on those services to prevent duplicate notifications.
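Here is a rough sketch of that down-count idea in Python, just to show the logic is simple. It assumes a fixed threshold of consecutive failed polls per probe; `DownCountNotifier` and `record` are hypothetical names, and The Dude does not provide this behaviour or API.

```python
from __future__ import annotations


class DownCountNotifier:
    """One notification per outage, gated by a per-probe down count (hypothetical)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold               # failed polls before notifying
        self.down_counts: dict[str, int] = {}    # probe -> consecutive failures
        self.notified: set[str] = set()          # probes already reported

    def record(self, probe: str, ok: bool) -> list[str] | None:
        """Record one poll result; return the probes to report now, or None."""
        if ok:
            # Probe recovered: reset it so a later outage can notify again.
            self.down_counts.pop(probe, None)
            self.notified.discard(probe)
            return None
        self.down_counts[probe] = self.down_counts.get(probe, 0) + 1
        if probe in self.notified or self.down_counts[probe] < self.threshold:
            return None
        # First probe to reach the threshold: report it together with every
        # other probe that is currently counting down, and mark them all as
        # notified so their counts cannot trigger separate emails.
        batch = sorted(p for p in self.down_counts if p not in self.notified)
        self.notified.update(batch)
        return batch


n = DownCountNotifier(threshold=2)
n.record("ping", ok=False)          # ping count 1
n.record("snmp", ok=False)          # snmp count 1
print(n.record("ping", ok=False))   # ['ping', 'snmp'] in one notification
print(n.record("snmp", ok=False))   # None, already covered
```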
What we need is the ability to script notification processing so that all probe status change events within a specific interval are combined into a single notification. Basically, we need to be able to accumulate notifications over a period of time and send a single notification with the cumulative data at set intervals. Scripting should let the user choose trigger conditions for immediate notification, such as certain conditions or combinations of probe timeouts. This would also allow the master/slave setup mentioned above.
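To make the request concrete, this is a minimal Python sketch of an accumulate-and-flush notifier, assuming a fixed time window and a user-supplied `immediate` predicate for events that must go out at once. `Event`, `NotificationBatcher`, `add`, and `tick` are hypothetical names; this is not how The Dude works today.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    device: str
    probe: str
    status: str        # e.g. "down" or "up"
    timestamp: float   # seconds


@dataclass
class NotificationBatcher:
    window: float  # seconds to accumulate events before sending
    immediate: Callable[[Event], bool] = lambda e: False  # user trigger rule
    pending: list[Event] = field(default_factory=list)
    window_start: float | None = None

    def add(self, event: Event) -> list[Event] | None:
        """Queue an event; return a batch if it must be sent right away."""
        if self.window_start is None:
            self.window_start = event.timestamp
        self.pending.append(event)
        if self.immediate(event):
            return self.flush()
        return None

    def tick(self, now: float) -> list[Event] | None:
        """Call periodically; returns the batch once the window has elapsed."""
        if self.window_start is not None and now - self.window_start >= self.window:
            return self.flush()
        return None

    def flush(self) -> list[Event] | None:
        """Hand over everything accumulated so far as one notification."""
        batch, self.pending, self.window_start = self.pending, [], None
        return batch or None
```

Usage would look like `NotificationBatcher(window=60.0, immediate=lambda e: e.probe == "ping")`: SNMP timeouts during maintenance collect into one email per minute, while a ping failure is sent immediately together with whatever had accumulated.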