Hello all,
I posted about this issue a while ago and received some helpful advice regarding the notification delay option. Now, after playing around with this for a while, I've found that the delay feature does not appear to do what I need it to.
Here’s an example of a usual network:
Cable Modem → Router → Switch → AP1 - AP24
My dependencies are set so that all the APs are children of the switch, the switch is a child of the router and the router is a child of the cable modem. With default settings, if the cable modem goes down, I receive down notifications for nearly all devices (and corresponding up notifications). I assume this is because The Dude is polling child services at different times than parent services, thus finding an AP down before it realizes the cable modem or router is down. This is a common problem in other SNMP and network monitoring suites as well, and the solution that one in particular (Intermapper) implements, with much success, is a delay option.
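To make the dependency idea concrete, here is a minimal Python sketch of the filtering I would expect from a parent/child setup. It is only an illustration, not how The Dude or Intermapper is implemented: the device names, the `PARENT` map, and both function names are hypothetical. Given the set of devices currently seen as down, it reports only those with no ancestor that is also down.

```python
from typing import Dict, Optional, Set

# Hypothetical dependency map for the example network: child -> parent.
PARENT: Dict[str, str] = {"router": "cable-modem", "switch": "router"}
PARENT.update({f"ap{i}": "switch" for i in range(1, 25)})


def has_down_ancestor(device: str, down: Set[str], parent: Dict[str, str]) -> bool:
    """Walk up the dependency chain and check whether any ancestor is down."""
    node: Optional[str] = parent.get(device)
    while node is not None:
        if node in down:
            return True
        node = parent.get(node)
    return False


def devices_to_report(down: Set[str], parent: Dict[str, str] = PARENT) -> Set[str]:
    """Keep only the down devices that are not explained by a down ancestor."""
    return {d for d in down if not has_down_ancestor(d, down, parent)}
```

With every device down, `devices_to_report` returns only `{"cable-modem"}`. The catch described above is timing: this filter only helps once the poller has actually noticed that the parent is down, which is why a delay is needed on top of it.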
In Intermapper, I give all child devices an email delay of 3 minutes. The probing interval is still 30 seconds or so. Commonly, Intermapper will find a child device down, update the device's status in the Intermapper client software to down and then wait 3 minutes. If Intermapper finds that no parent devices are down within those 3 minutes, it then sends the down message. If Intermapper finds that a parent device is down inside the 3-minute window, but that parent device then comes back up within the window, then NO emails are sent for the child device.
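The delay behavior described above can be sketched roughly as follows. This is a model of what I observe from the outside, not Intermapper's actual code: the `Outage` type and `child_emails` function are hypothetical, times are in seconds, and two cases the description does not cover are my own assumptions (a parent that stays down also suppresses the child, and a child that recovers by itself inside the window sends nothing).

```python
from dataclasses import dataclass
from typing import List, Optional

DELAY_SECONDS = 180.0  # the 3-minute email delay on child devices


@dataclass
class Outage:
    start: float                 # when the device was first seen down
    end: Optional[float] = None  # when it was seen up again; None if still down


def child_emails(child: Outage, parent: Optional[Outage],
                 delay: float = DELAY_SECONDS) -> List[str]:
    """Return the emails sent for a child, decided once the delay has run out."""
    window_end = child.start + delay

    # A parent outage overlapping the child's delay window suppresses everything.
    if parent is not None:
        started_in_time = parent.start <= window_end
        still_relevant = parent.end is None or parent.end >= child.start
        if started_in_time and still_relevant:
            return []

    # Assumption: a child that recovers before the delay expires sends nothing.
    if child.end is not None and child.end <= window_end:
        return []

    # Otherwise "down" is sent, and "up" only ever follows a sent "down".
    emails = ["down"]
    if child.end is not None:
        emails.append("up")
    return emails
```

The property that matters for this thread is the last step: an "up" email can only be produced after a "down" email was actually sent for the same device.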
In contrast, the behavior I’m experiencing with The Dude is as follows:
I have the delay on child devices set to 3 minutes. If the parent device goes down and The Dude finds a child device down first, it starts the 3-minute timer. If the parent device is then detected as down and comes back up again, the last notification email for the child device is the “up” message. Even if both the child device and the parent device are back up by the end of the 3-minute timer, the “up” notification is still sent. Therefore, when an entire network goes down (with 40 access points), I get flooded with hundreds of service-up notifications for devices that were never even considered down (and for which down messages were never sent).
Is this intended functionality? It really makes the product unusable for us if the basic dependency features don’t work well. It essentially makes email notification useless, since we’re sifting through hundreds of pointless emails when all we care about is that the cable modem went down. We’re only monitoring 6 of our 40 properties and I can’t imagine implementing the rest of them without finding a solution to this problem.
Unfortunately, this leaves us in a bind since we love The Dude and find that it really is unmatched in many areas. It is especially difficult for us because we’ve already settled on deploying Mikrotiks to all of our properties and love that we automatically get an onsite agent for monitoring. In fact, the only product that comes close to The Dude’s mapping features is Intermapper, which is a solid offering, but would require port forwards (or an onsite hardware agent) for everything behind the router that we want to monitor.
Does anybody have any thoughts on this issue? Has anyone had this experience and come up with a solution? I’d love it if The Dude’s developers could chime in as well.
Many thanks!