Netwatch - ICMP Unreachable

Curious about this function of the ICMP probe.
I set up my NATted router and ran a traceroute to 1.1.1.1, the first hop being the main router of course.
I figured 7 hops would get me to the internet but not quite reach Cloudflare.

So I set a 7-hop TTL on the rule and the interface came back as DOWN.
Then I set a 12-hop TTL and the interface came back as UP.

So I can confirm there is some information there to be gleaned but I think something is not right.

Here is the text from docs:

accept-icmp-time-exceeded=yes can be used together with a manually set low ttl value to monitor Internet connectivity, without relying on a specific endpoint.

For example, you can monitor a public IP address, but that address can filter your ICMP request, or just become unreachable itself, if the Netwatch probe is using this address to monitor Internet connectivity this would cause a false alarm.

To make sure you can reach the Internet, it's generally enough to make sure you can reach a device a few routing hops away. Low time to live value will expire in transit to the specified host you want to monitor - each router passing the ICMP packet will subtract "1" from TTL value, upon TTL reaching 0, ICMP "time exceeded" packet will be generated, and sent back to the Netwatch probe. If all other fail thresholds are not broken, this response will be considered a success.
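The TTL countdown described in that paragraph can be sketched in a few lines of Python. This is only an illustration of the mechanism, not how Netwatch is implemented: the function name `send_probe` and the five-entry example path (with made-up addresses in front of 1.1.1.1) are assumptions.

```python
def send_probe(path, ttl):
    """Simulate one ICMP echo request sent with the given TTL.

    `path` is the ordered list of hop addresses; the last entry is the
    monitored host. Returns (reply_type, replying_address).
    """
    if not path or ttl < 1:
        raise ValueError("need a non-empty path and a TTL of at least 1")
    for hop_number, address in enumerate(path, start=1):
        if hop_number == len(path):
            # The packet arrived at the monitored host itself.
            return ("echo-reply", address)
        ttl -= 1  # every router on the way subtracts 1
        if ttl == 0:
            # TTL ran out at this router, so it answers instead of the host.
            return ("time-exceeded", address)


# Hypothetical path: 4 routers, then the monitored host.
PATH = ["192.168.88.1", "10.0.0.1", "198.51.100.1", "203.0.113.1", "1.1.1.1"]

print(send_probe(PATH, 4))  # TTL too small to reach the host
print(send_probe(PATH, 5))  # TTL just large enough
```

With a TTL of 4 the reply comes from the fourth router as "time exceeded"; with 5 it comes from the host itself. Per the docs, the first case is what accept-icmp-time-exceeded=yes is supposed to count as a success.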

Is MT actually stating that a DOWN status means you can reach the internet??? That seems bogus to me. In fact I tried one hop, where the hop was my main router, and got the same result: DOWN status.
I tried a couple of hops, which just hit my ISP locally, and got the same result: the host came back as down.

So I am not sure what this "time exceeded" return message indicates, but it appears to be useless, as there is no differentiation between reaching my router, reaching the ISP, or reaching the internet but not the end canary. It all appears to give the same result: the host is shown as down. Only when I increase the number of hops and actually reach the host is the host shown as UP.

In other words, it does not appear to be an effective way of determining whether I have internet connectivity without actually hitting a DNS canary site.
What am I missing??

Can you try to set ttl=30 ?

To what end? When I used 12, i.e. enough to reach the configured host 1.1.1.1, the interface came UP.

If I enable "accept icmp time exceeded" with a small TTL (4), I get status UP.

Can see the icmp time exceeded packets coming back.

Seems quite useful.

I get status down if something fails less than 4 hops away, i.e. when no ICMP is returned.
I assume it might also be down if it was getting ICMP destination unreachable back.
(It likely ignores everything except time exceeded.)

The way I understand it, is you assign the connection an amount of "hop points", let's say 5.

A hop point is spent at each successful hop.

When there are no more hop points, the error packet is generated, and the setting should mean "consider this particular time to live exceeded error (no more hop points) as a success or UP".

This means that the ICMP packet reached the 5th address in your traceroute towards the canary address you use.

And if this happened, you have a working connection to:

  1. your router/modem <- first hop usually
  2. your ISP <- second hop usually
  3. some ISP "hub" or "concentrator" <- third hop usually
  4. some point along the route to your canary
  5. another point along the same route

That is quite simple.

But when you are running a Netwatch ICMP probe, the "other" settings are active anyway, so I believe those need to be fine-tuned to NOT cause a DOWN "accidentally".

Wrong, you have no certainty whatsoever that it will be done...

For a very simple reason:
To prevent DDoS and other threats, a response to a packet with such an unusually low TTL isn't necessarily generated...
Otherwise, simply sending a packet with a TTL of 1 and a spoofed IP would generate DDoS traffic to the victim IP...

Okay Jacklaz, I will bite. I was leaning towards useless because I get DOWN with a setting of 1 hop (I have reached my main router from my NATted router, USELESS), DOWN for 2 hops (I have reached my ISP's first local address), DOWN for 3 hops (I have reached my ISP's second local address), DOWN for 4 hops (I have reached an out-of-'state' ISP address) and DOWN for 5 hops (my ISP in yet a different province).

I added hops until I reached 1.1.1.1 and then the probe came back as UP. So if there is no clear cut distinction between any of the DOWNS prior to reaching the canary itself, the functionality is useless.

In other words, the selected number of hops may not be right, and the OP will never know whether a DOWN result comes from having reached the internet or simply from having reached further along the ISP's chain of hops. Also, there are often 3-6 hops for my ISP and only 1 hop between it and Cloudflare.
So the setting would have to be too precise to be useful for most users. And how often does that route change as well?

However, maybe some other setting is not allowing the differentiation of getting the icmp message back......???

I have entered a support ticket for technical support (explanation of docs etc.)

Yep, this is my suspicion.

The "other" settings in Netwatch are so complex and partly interconnected that it is difficult to say which one could cause the "false" down.

It is also quite possible that some settings on your router (though I doubt it) or at the ISP (as rextended suggested) prevent the error return message, which in this case is the expected one, from ever reaching the netwatch/ping.

I don't have any RouterOS device running 7.x (it would need to be newer than 7.16 or so, when the feature was introduced), but from my Windows machine ping -i works:

C:\>ping -i 12 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:

Reply from 8.8.8.8: bytes=32 time=48ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 47ms, Maximum = 48ms, Average = 47ms

C:\>ping -i 11 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:

Reply from 8.8.8.8: bytes=32 time=47ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115
Reply from 8.8.8.8: bytes=32 time=47ms TTL=115

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 47ms, Maximum = 47ms, Average = 47ms

C:\>ping -i 10 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:

Reply from 108.170.232.169: TTL expired in transit.
Reply from 108.170.232.169: TTL expired in transit.
Reply from 108.170.232.169: TTL expired in transit.
Reply from 108.170.232.169: TTL expired in transit.

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

(8.8.8.8 is 11th hop from me in traceroute)

C:\>tracert 8.8.8.8

Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  [Redacted]
  2     *        *        *     Request timed out.
  3    44 ms    42 ms    43 ms  [Redacted]
  4    43 ms    43 ms    44 ms  [Redacted]
  5    48 ms    47 ms    47 ms  172.19.184.70
  6    46 ms    45 ms    47 ms  172.19.177.62
  7    47 ms    47 ms    47 ms  [Redacted]
  8    47 ms    46 ms    47 ms  74.125.51.148
  9    48 ms    48 ms    48 ms  108.170.255.203
 10    48 ms    51 ms    48 ms  108.170.232.169
 11    47 ms    47 ms    46 ms  dns.google [8.8.8.8]

Trace complete.

OR, the return message works every time, as it's only hop based, and thus quite frankly is not really all that useful. We wrongly think that the result should magically be UP, when in fact the only UP result will be when reaching the canary. This method would rely on ensuring the hop count reaches zero only when the ISP is no longer involved (aka handover to a third party).

Sorry anav, you posted while I was updating my previous post with the results of ping and traceroute from Windows.
It seems that lowering the TTL does produce a "TTL expired" error.
You could try the same to test how it behaves, i.e. whether the issue is RouterOS related or not.

Please explain to me how this functionality is at all useful. How do I know whether the internet has been reached when I get the same result whether I have reached the router, the ISP, or the internet?
The only differentiation is when I reach the canary and I get interface UP.

The error might be the same, but it happens at a different point in the chain, at the hop you selected.

In my example:
ping -i 10 8.8.8.8
the result
Reply from 108.170.232.169: TTL expired in transit.
means:
I couldn't reach 8.8.8.8 after exactly 10 hops.

And:
ping -i 9 8.8.8.8
returns:
Reply from 108.170.255.203: TTL expired in transit.
which means:
I couldn't reach 8.8.8.8 after exactly 9 hops.

If there is an error/interruption BEFORE the given number of hops, the error message would be different (host unreachable, no route to host, No reply, etc.) as the TTL would still be greater than 0.

In other words, the error message is saying "there was no network error until the number of hops you selected expired".

Good, my understanding is consistent with what you have stated and seen and what I have seen.
Thus, there is no sure way of knowing, at any given time, whether the number of hops is sufficient to get past any NATted routers and the ISP and actually HOP OUT onto the WWW before hitting the canary.

Thus we have to look at the results to see if useful.

a. interface is down (could be the upstream router, the ISP, or the internet before the canary)
RELIES on an accurate estimate of the # of hops to the canary.

b. interface is up, we have reached too far: the canary
Q. is this so bad, or any different from pinging the canary, or better than pinging???

c. WHAT is the result when a hop at the ISP is not responding?? OR, as you state,
host unreachable, no route to host, no reply, etc.

What does netwatch SEE or DO? (the probe doesn't provide those details)

Are you saying netwatch should be set up to monitor response messages and take action on those instead? I don't see that option.

Are you saying we should make scripts that capture those responses and take action?

I am not saying anything, I am trying to explain what I observe.

If you have (say) google at 11 hops distance, it is unlikely that it will get much nearer.

Let's say that through some strange internet magic it is one day only 9 hops away.
Now if you ping with 8 hops there are only two possibilities:

  1. you don't have a connection at one of hops 2-7[1] and you get an error that is NOT "TTL expired in transit" <- this should make Netwatch DOWN
  2. you have a connection up to hop 8 and the router at hop 8 replies "TTL expired in transit" <- this should make Netwatch UP

In BOTH cases you don't know anything on what happens (if it happens) beyond hop 8.
But you count on the fact that routers near google tend to be up and working.

A normal ping may give a "false" down if ONLY the canary is down but the internet connection is actually up.

This "reduced TTL" ping should avoid that particular false negative (but of course it introduces a "false" positive, because the last part of the traceroute remains untested).

[1] from your PC, assuming that you have a router that is hop1
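To make the two cases and the trade-off concrete, here is a small Python sketch. It is a toy model under stated assumptions, not Netwatch's real logic: the function `netwatch_status` is made up, it assumes a broken hop simply produces no usable reply, that a "TTL expired in transit" reply counts as UP only when accept-icmp-time-exceeded is enabled, and that none of the other thresholds are breached.

```python
def netwatch_status(hops_up, ttl, accept_time_exceeded):
    """Toy model of the probe result.

    hops_up[i] is True if hop i+1 is working; the last entry is the canary.
    """
    for hop_number, is_up in enumerate(hops_up, start=1):
        if not is_up:
            # Break before the TTL ran out: no echo reply, no time-exceeded.
            return "DOWN"
        if hop_number == len(hops_up):
            # Echo reply from the canary itself.
            return "UP"
        if hop_number == ttl:
            # This router answers "TTL expired in transit".
            return "UP" if accept_time_exceeded else "DOWN"


all_up = [True] * 9                          # canary is 9 hops away
canary_down = [True] * 8 + [False]           # only the canary fails
isp_break = [True] * 4 + [False] + [True] * 4  # failure at hop 5

for name, hops in [("all up", all_up), ("canary down", canary_down), ("break at hop 5", isp_break)]:
    full = netwatch_status(hops, ttl=30, accept_time_exceeded=True)
    reduced = netwatch_status(hops, ttl=8, accept_time_exceeded=True)
    print(f"{name}: normal ping={full}, ttl=8 ping={reduced}")
```

In this model the "canary down" row is the only one where the two probes disagree: the normal ping reports DOWN while the ttl=8 probe stays UP, which is exactly the false alarm the reduced TTL is meant to avoid, at the cost of never testing anything beyond hop 8.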

The distance varies, it is not fixed, in fact...

Exactly, and my observation is the reverse: the interface is down if the expired message is received, and the interface is up if the probe reaches the canary.

( also check my post here for another probe dns topic - Netwatch DNS Probe - Attempt to Replace ICMP )

@anav
Are you saying that the behaviour of netwatch is the same with :
accept-icmp-time-exceeded=yes
OR
accept-icmp-time-exceeded=no

?
:confused:

I have never tried accept-icmp-time-exceeded=no, as wouldn't that be like simply not using that feature?? Will have a quick look.

YES, the results are identical, but it may be my testing method.
I have a netwatch rule of type icmp with host 1.1.1.1.
The only parameters I use are TTL and the check box for Accept ICMP Time Exceeded.

I have the netwatch table ( showing all rules and the status on one side of the screen ) and I have
the actual netwatch rule on the other half, and tools traceroute open as well.

All I do is vary the TTL (number of hops) and monitor the status

Yep, the setting "no" should be default.

Then the possibility that some of the other values trigger the down should be checked.

That is the tricky part. Besides the deceptive view you have, ALL the default netwatch settings remain active, and any of these:

Property                                       Description
packet-interval (Default: 50ms)                The time between ICMP-request packet send
packet-count (Default: 10)                     Total count of ICMP packets to send out within a single test
packet-size (Default: 54 (IPv4) or 54 (IPv6))  The total size of the IP ICMP packet
thr-max (Default: 1s)                          Fail threshold for rtt-max (a value above thr-max is a probe fail)
thr-avg (Default: 100ms)                       Fail threshold for rtt-avg (round trip time-avg)
thr-stdev (Default: 250ms)                     Fail threshold for rtt-stdev
thr-jitter (Default: 1s)                       Fail threshold for rtt-jitter
thr-loss-percent (Default: 85.0%)              Fail threshold for loss-percent
thr-loss-count (Default: 4294967295(max))      Fail threshold for loss-count

may play a role in the down.
I would exclude the last one (thr-loss-count) and the packet-size but all the others may play a role.

We must find a set of values that surely won't trigger anything.
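As a rough way to reason about which threshold could trip, here is a Python sketch that checks one test's round-trip times against the defaults from the table above. It is an approximation under assumptions, not Netwatch's actual computation: the function name `failed_thresholds` is made up, stdev is taken as the population standard deviation, jitter as max minus min, and thr-loss-count is left out since its default is effectively unreachable.

```python
from statistics import mean, pstdev

# Defaults as listed in the table above, times in milliseconds.
DEFAULTS = {
    "thr-max": 1000.0,
    "thr-avg": 100.0,
    "thr-stdev": 250.0,
    "thr-jitter": 1000.0,
    "thr-loss-percent": 85.0,
}


def failed_thresholds(rtts_ms, packets_sent=10, thresholds=DEFAULTS):
    """Return the names of the thresholds that one test would break.

    rtts_ms holds the round-trip times (ms) of the replies that came back.
    """
    lost = packets_sent - len(rtts_ms)
    loss_percent = 100.0 * lost / packets_sent
    failed = []
    if loss_percent > thresholds["thr-loss-percent"]:
        failed.append("thr-loss-percent")
    if not rtts_ms:
        return failed  # nothing came back, so there are no RTT statistics
    stats = {
        "thr-max": max(rtts_ms),
        "thr-avg": mean(rtts_ms),
        "thr-stdev": pstdev(rtts_ms),               # assumption: population stdev
        "thr-jitter": max(rtts_ms) - min(rtts_ms),  # assumption: max - min
    }
    failed += [name for name, value in stats.items() if value > thresholds[name]]
    return failed


print(failed_thresholds([47.0] * 10))   # fast, stable link
print(failed_thresholds([120.0] * 10))  # slower link, all replies received
print(failed_thresholds([]))            # no replies at all
```

If Netwatch evaluates the thresholds roughly like this, a path averaging about 120 ms would break thr-avg at its 100ms default no matter which replies are accepted, so raising the thresholds well above the observed RTTs would be one way to rule them out while testing.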

Here is my traceroute and the status I got for each setting.
Note: based on the traceroute results, it appears that the WWW is only reached for sure when solid green is returned, aka hop 7?

Followed by pings with 2, 4 and 9 hops (all down) and then 10 (up), which reaches the canary.
