Dude Outage time and Service down time

Hello!

The Dude has an Outages tab, where we can see:
Dude-Outages-duration-and-Service-down-duration-01.png
Status: Outage state
Time: When the outage started
Duration: How long the outage lasted
Service: Failed service

When using this information, we should keep the polling settings in mind: Probe Interval, Probe Timeout, and Probe Down Count.
We can use the following variables inside notifications: [Service.TimeSinceChanged], [Service.TimeLastUp], [Service.TimeLastDown]

Service [Probe.Name] on [Device.Name] is now [Service.Status] at [TimeAndDate] - Service.TimeSinceChanged: [Service.TimeSinceChanged] - Service.TimeLastUp: [Service.TimeLastUp] - Service.TimeLastDown: [Service.TimeLastDown] - Service.TimeUp: [Service.TimeUp] - Service.TimeDown: [Service.TimeDown]

For example if we have:
Polling:
Probe Interval = 10 seconds
Probe Timeout = 5 seconds
Probe Down Count = 3

We assume that polling starts at 00:00:00 and that the service is already unreachable when Poll 1 runs:

  1. Poll 1 at 00:00:10 with a 5-second timeout - Failed - Down count = 1, counting starts at 00:00:10
  2. Poll 2 at 00:00:20 with a 5-second timeout - Failed - Down count = 2, increased at 00:00:20
  3. Poll 3 at 00:00:30 with a 5-second timeout - Failed - Down count = 3, threshold is reached at 00:00:30

The notification is sent according to the “Notification: Delay” setting. If it is 00:00:00, you will receive the message that the service is down at 00:00:30.
But in our example the service was already down at Poll 1, at 00:00:10.
The variable [Service.TimeSinceChanged] will show us the service inaccessibility time minus one Probe Interval: 00:00:30 - 00:00:10 = 00:00:20
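
The timeline above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post, not of how The Dude works internally: the function name `poll_timeline` is made up for illustration, and it assumes every poll fails and that Poll 1 runs one Probe Interval after polling starts.

```python
from datetime import timedelta

def poll_timeline(probe_interval, probe_down_count):
    """Return (first_failure, threshold_reached) as offsets from polling start.

    Assumption (from the example above): Poll 1 runs one Probe Interval
    after polling starts, and every poll fails.
    """
    interval = timedelta(seconds=probe_interval)
    first_failure = interval                         # Poll 1
    threshold_reached = interval * probe_down_count  # Poll number "Probe Down Count"
    return first_failure, threshold_reached

first_failure, threshold = poll_timeline(probe_interval=10, probe_down_count=3)
time_since_changed = threshold - first_failure

print(first_failure)       # 0:00:10
print(threshold)           # 0:00:30
print(time_since_changed)  # 0:00:20
```

Probe Timeout does not appear in the calculation, which matches the example: only Probe Interval and Probe Down Count move the timestamps.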

In this case, the Duration we see under the Outages tab shows only the service down time as detected by polling, which depends on the polling settings.
If we want to calculate the real service down time, we can use the following formulas:

Service down time = [Service.TimeSinceChanged] + Probe Interval + Outage Duration
or
Service down time = (Probe Interval * Probe Down Count) + Outage Duration
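
As a quick check, here are both formulas written out in Python with all values in seconds. The function names are made up for illustration, and the Outage Duration of 300 seconds is a hypothetical value, not something taken from a real Outages tab. With the polling settings from the example, both formulas give the same result.

```python
def down_time_from_variable(time_since_changed, probe_interval, outage_duration):
    """First formula: [Service.TimeSinceChanged] + Probe Interval + Outage Duration."""
    return time_since_changed + probe_interval + outage_duration

def down_time_from_settings(probe_interval, probe_down_count, outage_duration):
    """Second formula: (Probe Interval * Probe Down Count) + Outage Duration."""
    return probe_interval * probe_down_count + outage_duration

# Example settings from this post; outage_duration is a hypothetical value.
probe_interval = 10
probe_down_count = 3
time_since_changed = 20   # 00:00:30 - 00:00:10
outage_duration = 300     # hypothetical Duration read from the Outages tab

a = down_time_from_variable(time_since_changed, probe_interval, outage_duration)
b = down_time_from_settings(probe_interval, probe_down_count, outage_duration)
print(a, b)  # 330 330
```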

Conclusions

  1. The Probe Timeout value doesn’t affect the Probe Interval; it is only used for service verification.
  2. The Outages duration shows the service down time without considering Probe Interval and Probe Down Count.
  3. Probe Interval and Probe Down Count only determine the point at which the Down state starts.

Note: In the presented example, we assume that the service is not flapping and stays down.

Used information

  1. The_Dude_v6/Services
  2. The_Dude_v6/Notifications
  3. Help me on Dude Probe Interval


    Thank you!

Hi @eriitguy, I ran some tests and this does not seem to be correct for the ping probe:

For example if we have:

Polling:
Probe Interval = 10 seconds
Probe Timeout = 5 seconds
Probe Down Count = 3

Ping probe:
Retry count = 3
Retry interval = 1s

You will receive the message that the service is down at 00:00:23

Poll 1 = poll from 00:00:00, failed at 00:00:03 due to ping probe settings
Poll 2 = poll from 00:00:10, failed at 00:00:13 due to ping probe settings
Poll 3 = poll from 00:00:20, failed at 00:00:23 due to ping probe settings → Notification

With the default settings, you will receive the message that the service is down at 00:02:03
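
The numbers in my test fit a simple calculation, sketched below in Python. This is inferred from the timestamps above, not taken from The Dude documentation: I assume the first poll starts at 00:00:00, and that a ping poll is reported as failed after Retry count * Retry interval seconds. The function name is made up, and the "default" values in the second call (Probe Interval 30 s, Probe Down Count 5, Retry count 3, Retry interval 1 s) are my assumption, chosen because they reproduce 00:02:03.

```python
from datetime import timedelta

def ping_notification_offset(probe_interval, probe_down_count,
                             retry_count, retry_interval):
    """Seconds from the first poll until the down notification.

    Assumptions: the first poll starts at offset 0, every poll fails,
    and each ping poll fails after retry_count * retry_interval seconds.
    """
    last_poll_start = (probe_down_count - 1) * probe_interval
    ping_failure_delay = retry_count * retry_interval
    return last_poll_start + ping_failure_delay

# Settings from the test above
tested = ping_notification_offset(10, 3, 3, 1)
print(timedelta(seconds=tested))    # 0:00:23

# Assumed default settings (see the note above)
defaults = ping_notification_offset(30, 5, 3, 1)
print(timedelta(seconds=defaults))  # 0:02:03
```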