I am using e-mail notifications from Dude about nodes going down and up.
When a service is unstable, it's very useful to have a programmed delay before notifications are sent.
Unfortunately, this delay is configurable only for the DOWN event and does not affect the related UP messages.
Is there a workaround to not send the UP message if the related DOWN message is still delayed and will be canceled automatically?
Or is this a suggestion for Dude development?
As I understand it, currently the only way to affect notifications when a service is flapping is through:
Probe Interval, Probe Timeout, Probe Down Count on the device/probe level.
If you configure a delay, you will get the notification anyway, because Dude doesn’t check the current probe status before it sends a repeated notification.
So you may try increasing Probe Interval or Probe Down Count to get rid of false positives.
By “programmed delay” I mean exactly those things (a combination of them) that you mention: Interval, Timeout, Down Count.
So I run into the following situations:
#1
The probe really goes down, and after some 10-15 min I get a DOWN notification.
The service is restored and I get an UP notification IMMEDIATELY.
That's OK.
#2
The probe goes down for a shorter period than in situation #1, let's say 5-8 min, so no DOWN notification is sent at all. That's still fine.
When the probe is UP again, I get an UP notification without any related DOWN message before it.
That's not OK, because with a flapping service I receive a huge number of useless UP messages during the day.
What's needed is a way to filter out those UP messages for which no DOWN message was sent.
Maybe I am doing something wrong?
Or is such a feature not implemented yet?
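To make the requested behaviour concrete, here is a minimal sketch in Python of the filtering logic I have in mind. It is not a Dude feature or API; the names `NotificationFilter`, `on_probe_result` and `run` are hypothetical, and the model assumes DOWN is sent only after `down_count` consecutive failed probes.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationFilter:
    """Suppress UP notifications that have no preceding DOWN notification."""
    down_count: int            # consecutive failed probes before DOWN is sent
    failures: int = 0          # current run of failed probes
    down_sent: bool = False    # True once DOWN was sent for the current outage
    sent: list = field(default_factory=list)  # notifications actually sent

    def on_probe_result(self, ok: bool) -> None:
        if not ok:
            self.failures += 1
            # Send DOWN once, when the failure run reaches the threshold.
            if self.failures >= self.down_count and not self.down_sent:
                self.down_sent = True
                self.sent.append("DOWN")
            return
        # Probe succeeded: send UP only if DOWN went out for this outage.
        if self.down_sent:
            self.sent.append("UP")
        self.failures = 0
        self.down_sent = False


def run(results, down_count):
    """Feed a sequence of probe results (True = OK) and return what was sent."""
    f = NotificationFilter(down_count=down_count)
    for ok in results:
        f.on_probe_result(ok)
    return f.sent
```

With this logic, situation #1 (a long outage) still produces a DOWN followed by an UP, while situation #2 (an outage shorter than the threshold) produces nothing at all.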
Dude Outage time and Service down time
For example, for a 10-minute down period we may use:
Probe Interval = 30 seconds
Probe Timeout = 5 seconds
Probe Down Count = 20
In that case we should:
a) Get a DOWN notification after 30 * 20 = 600 seconds (10 minutes)
b) Get an UP notification at most 30 seconds after the service comes back online.
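The same arithmetic as a small Python sketch, in case you want to try other values. The function name `notification_delays` is made up for illustration, and like the calculation above it ignores Probe Timeout and assumes DOWN is sent after Probe Down Count consecutive failed probes.

```python
def notification_delays(probe_interval_s: int, probe_down_count: int):
    """Return (seconds until DOWN is sent, worst-case seconds until UP is sent)."""
    # DOWN is sent after `probe_down_count` failed probes, one per interval.
    down_delay_s = probe_interval_s * probe_down_count
    # UP is sent on the first successful probe, so at most one interval later.
    max_up_delay_s = probe_interval_s
    return down_delay_s, max_up_delay_s


down_delay_s, max_up_delay_s = notification_delays(30, 20)
print(f"DOWN after {down_delay_s} s, UP within {max_up_delay_s} s")
```

For Probe Interval = 30 and Probe Down Count = 20 this gives 600 seconds for DOWN and at most 30 seconds for UP, matching a) and b).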