Child -> Parent relationships between devices and email

Hey all, new user to The Dude and I am amazed by how powerful it is already. Keep up the great work!

Now on to my issue. My department manages large hotel network installations, usually consisting of one or two guest routers and between 10 and 100 access points. At my previous employer, we used Intermapper, which also supported parent → child relationships to help control the number of emails sent when the main modem/router/etc. goes down. Right now, at one of our bigger properties (30+ APs), when the Charter cable modem goes down, the Mikrotik and all APs become inaccessible along with it. The Dude then proceeds to send out emails for all services on all child devices of the cable modem, essentially making the email notifications useless. All APs have their parent set to the Mikrotik, which in turn is parented to the cable modem.

Does anyone have any strategies to avoid this? One thing I tried was to set the polling interval on all child devices to 2 minutes and the interval on primary devices to 20 - 30 seconds. This was done in the hope that The Dude realizes the modem is down BEFORE realizing any of the access points are down.
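For a rough feel of how much the interval change can help, here is a small Python sketch. It rests on assumptions that do not come from The Dude itself: each device's next poll is taken to land at a uniformly random point within its own interval, polls are independent of each other, and a poll notices the outage immediately (no timeouts or retries). The function name `p_any_child_first` is made up for illustration.

```python
def p_any_child_first(child_interval, parent_interval, n_children):
    """Chance that at least one child is polled (and seen down) before the parent.

    Assumptions (hypothetical model, not The Dude's scheduler):
    - the parent's next poll happens at t ~ Uniform(0, parent_interval)
    - each child's next poll happens at Uniform(0, child_interval), independently
    - child_interval >= parent_interval > 0
    """
    if not (child_interval >= parent_interval > 0):
        raise ValueError("expected child_interval >= parent_interval > 0")
    ratio = parent_interval / child_interval
    k = n_children
    # P(no child polled before the parent) = E[(1 - t / child_interval) ** k],
    # averaged over the parent's poll time t. The integral has a closed form:
    p_none = (1 - (1 - ratio) ** (k + 1)) / (ratio * (k + 1))
    return 1 - p_none


one_ap = p_any_child_first(child_interval=120, parent_interval=30, n_children=1)
thirty_aps = p_any_child_first(child_interval=120, parent_interval=30, n_children=30)
```

Under these assumptions, with a 30 second parent interval and 2 minute child intervals, a single AP is noticed before the modem about 12.5% of the time, but with 30 APs at least one of them is noticed first roughly 87% of the time. So interval tuning on its own probably cannot guarantee the ordering.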

Intermapper handled this situation by introducing a delay option to notifications. You could set the delay to a few minutes and even if Intermapper found that the child device was down before its parent, it would wait the “delay” amount of time before sending any email, thus giving Intermapper time to discover that the parent is down.
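To make the delay idea concrete, here is a minimal Python sketch of that kind of logic. It is not Intermapper's or The Dude's implementation; the function `alerts_to_send`, the device names, and the timestamps are all hypothetical. A device only produces an email once it has been down for `delay` seconds, and only if none of its ancestors is known to be down at that moment.

```python
def alerts_to_send(down_since, parent_of, now, delay):
    """Return the devices that should trigger an email at time `now`.

    down_since: device name -> time (seconds) it was first seen down
    parent_of:  device name -> parent name (None for the top device)
    delay:      seconds a device must stay down before any email is sent
    """
    # Only outages that have already been detected by `now` are known.
    known_down = {d: t for d, t in down_since.items() if t <= now}
    alerts = []
    for device, since in known_down.items():
        if now - since < delay:
            continue  # still inside the delay window, keep waiting
        # Walk up the parent chain; stay quiet if any ancestor is down.
        ancestor = parent_of.get(device)
        seen = set()  # guards against an accidental loop in the parent map
        suppressed = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in known_down:
                suppressed = True
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            alerts.append(device)
    return sorted(alerts)


# Hypothetical site: APs behind the Mikrotik, which sits behind the modem.
PARENT_OF = {"modem": None, "mikrotik": "modem", "ap1": "mikrotik", "ap2": "mikrotik"}
# The APs happen to be detected as down before their parents.
DOWN_SINCE = {"ap1": 0, "ap2": 10, "mikrotik": 40, "modem": 60}
```

With no delay, a check at 30 seconds would email for both APs because neither parent is known to be down yet. With a 180 second delay, nothing is sent at 30 seconds, and a check at 240 seconds reports only the modem.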

Any thoughts on mimicking this functionality?

Thanks!

There is a dependency setup. Check the top right corner and switch from layer link to layer dependency. Draw your dependencies…

I have not used this feature.

The dependency layer simply shows me the child → parent relationships I’ve already defined using the “parent” field in device settings. The problem occurs when The Dude finds the access points down before the modem or guest router.

Nice, just thought you missed something… uhh damn :( So there are serious issues with using the default negative cache times. I found that having a higher cache time than negative cache time is very beneficial when dealing with false positives as well. The underlying issue is how long you wait to send the first email and the fact that negative cache time is 300 seconds. Since a failed ping will cause a device to be down for 5 minutes due to negative cache time, you have to delay the first email to at least a little longer than that for The Dude to have a chance to clean up the mess before sending it out.

AND there is a bug-feature as well: a single failed SNMP read registers in the negative cache for 300 seconds, and the device will be down for 300 seconds. It seems subsequent SNMP polling is ignored (which seems like odd behavior). So I highly recommend forcing all functions and probes to have a low negative cache time (~10 seconds), lower than the probe interval. Which makes negative cache time unused? I digress…

So before doing anything, set the email delay to 15 minutes and test by breaking the router, to see if The Dude is smart enough to clear out all incorrect emails when the delay is high enough. Then, if that works, go fix all the functions and probes and set your email delay to 3 minutes?
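The timing argument can be written down as a toy check in Python. This is only a model of the behavior described in this post (one failed probe keeps a device down for the whole negative cache time), not a description of The Dude's internals, and `false_alarm_emailed` is a made-up name.

```python
def false_alarm_emailed(negative_cache_s, email_delay_s):
    """True if a single failed probe at t=0 would still produce an email.

    Toy model: the failed probe keeps the device marked down for
    `negative_cache_s` seconds, and the email goes out at `email_delay_s`
    if the device is still marked down at that point. A delay exactly
    equal to the cache time is treated as too short, to be conservative.
    """
    return email_delay_s <= negative_cache_s


# 300 s negative cache, 3 minute delay: the false alarm gets emailed.
short_delay = false_alarm_emailed(negative_cache_s=300, email_delay_s=180)
# 300 s negative cache, 15 minute delay: the device recovers first.
long_delay = false_alarm_emailed(negative_cache_s=300, email_delay_s=900)
# ~10 s negative cache, 3 minute delay: also safe.
low_cache = false_alarm_emailed(negative_cache_s=10, email_delay_s=180)
```

In this model a 3 minute delay only becomes safe once the negative cache time has been lowered, which is why the 15 minute test comes first.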

GL
Lebowski

Thanks for the tip! To be honest I didn’t even realize there was a delay option for notifications. I don’t know how I missed it but I’m playing around with it now.

Thanks again!