diabolusss
just joined
Topic Author
Posts: 14
Joined: Tue Sep 27, 2022 10:55 am
Location: Latvia, Riga

Netwatch run traceroute on link down fails event

Tue Jan 10, 2023 11:55 am

Hello!
I've configured Netwatch to additionally ping the host in the On Down event, which works flawlessly. I've also configured a script, nwatch-ltegateway, that checks the first n hops to the given host.
It works when called from the Test or On Up events, but when it is called from On Down, the On Down event fails with no error in the log.
I believe it should work, if only because ping works as expected, but I'm new to ROS scripting, so I'm surely missing something...

Please help me find the reason for the On Down failure when nwatch-ltegateway is called from there, and possibly fix it.
ROS 7.6 stable
Netwatch runs at a 10 s interval with a 4 s timeout.
UPD230113: Thanks to @zainarbani I captured an error, and thanks to @rextended the real cause was identified and fixed. The scripts below have been updated.

Netwatch.OnDown:
:local hostip 1.0.0.1

:local countup value=0
:while (($countup < 10) && ([:ping address=$hostip interval=1 count=3]=0)) do={
   :set countup value=($countup+1);
   :log error "Ping No $countup failed to $hostip";
   /system script run nwatch-ltegateway;
}

:if ($countup < 10) do={
   :log info "Host $hostip recovered after ping No $countup."; 
} else={
   :log warning "Generating supout.rif before lte reset"
   /system script run nwatch-report;

   :log error "<alert> Host $hostip not recovered, will reset..."; 
   /system script run ltereset;
}

nwatch-ltegateway, with "Don't require permissions" checked:
#check lte gateway
:local hostip 1.1.1.1
:local ttl value=1;

:local traceGwResult
:local lteGatewayL
:local gwLoss

:do {
 :set  traceGwResult [tool/traceroute address=$hostip max-hops=$ttl count=5 duration=4  as-value];
 :if ($ttl > 1) do={
   :set traceGwResult [:pick $traceGwResult ($ttl-1)];
 }
 
 :set lteGatewayL ($traceGwResult->"address")
 :set gwLoss  ($traceGwResult->"loss")

   :set ttl value=($ttl+1);
} while=(($ttl < 4) && ($gwLoss = 1000));

:if ($gwLoss > 0) do={    
  :set ttl value=($ttl-1);
  :if ($gwLoss = 1000) do={
     :log error "LTE:  $ttl hops are unreachable";
  } else={
     :log warning "LTE:  Link is unstable (gw:$lteGatewayL, hop:$ttl, loss: $gwLoss)";
  } 

  /system script run nwatch-ltestats;
}

:local traceStr  [:tostr $traceGwResult]
:log debug  "<trace> traceroute: ($traceStr)";

nwatch-ltestats with "Don't require permissions" checked:
/int lte monitor numbers=0 duration=1 once do={
  :set $caband [:tostr $"ca-band"];

 :log debug ("<trace> (rsrp:$rsrp;rsrq:$rsrq;rssi:$rssi;sinr:$sinr;uptime:".$"session-uptime".";cqi:$cqi;cellid:".$"current-cellid".";enb-id:".$"enb-id".";phy-cell:".$"phy-cellid".";band:".$"primary-band".";ca-band:$caband;operator:".$"current-operator".";data-class:".$"data-class".";modulation:".$"dl-modulation".";mcs:$mcs;revision:$revision;ri:$ri;sector:".$"sector-id".")");
}
Last edited by diabolusss on Fri Jan 13, 2023 7:11 pm, edited 2 times in total.
 
zainarbani
Frequent Visitor
Posts: 54
Joined: Thu Jul 22, 2021 9:42 am
Location: Pati, Indonesia

Re: Netwatch run traceroute on link down fails event

Thu Jan 12, 2023 8:26 am

You're probably comparing (in the :if) against nothing.

Can I see the results for both the up and the down LTE state from:
:put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
 
diabolusss
just joined
Topic Author
Posts: 14
Joined: Tue Sep 27, 2022 10:55 am
Location: Latvia, Riga

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 2:13 am


Thanks for the tip, but when I run the checks directly I get sane results:
[colt@uTik] > :put $test 

[colt@uTik] > :put ($test > 0)
false
[colt@uTik] > :put ([:len $test]=0)
true
I'll try to catch the On Down event; the result may well be empty there.
Basically, traceroute returns a valid result if hop 1 does not ignore ICMP packets; if it does ignore them, traceroute returns nothing (a timeout occurs). That's why I now increase the hop count in a do-while loop; initially there was no loop at all.

[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=308;best=228;last=398;loss=0;sent=5;status=;std-dev=64;worst=398
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1043;best=271;last=1403;loss=0;sent=5;status=;std-dev=869;worst=2592
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1502;best=1063;last=2320;loss=0;sent=5;status=;std-dev=454;worst=2320
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1521;best=411;last=566;loss=0;sent=5;status=;std-dev=859;worst=2493
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1053;best=368;last=4294967295;loss=600;sent=5;status=;std-dev=685;worst=1738
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.111.255.1;avg=1122;best=347;last=1784;loss=0;sent=5;status=;std-dev=795;worst=2319
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.111.255.1;avg=1716;best=501;last=4294967295;loss=400;sent=5;status=;std-dev=861;worst=2400
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1221;best=812;last=812;loss=0;sent=5;status=;std-dev=346;worst=1850
[colt@uTik] > :put [/tool traceroute address=1.0.0.1 max-hops=1 count=5 as-value]
.id=*1;address=10.87.255.1;avg=1011;best=181;last=610;loss=0;sent=5;status=;std-dev=590;worst=1909
[colt@uTik] > :put [/tool traceroute address=8.8.8.8 max-hops=9 count=5 as-value]
.id=*1;address=10.87.255.1;avg=739;best=222;last=427;loss=0;sent=5;status=;std-dev=799;worst=2332;.id=*2;address=10.192.72.37;avg=319;best=213;last=342;loss=0;sent=5;status=;std-dev=101;worst=497;.id=*3;address=10.241.140.193;avg=272;best=193;last=378;loss=0;sent=5;status=;std-dev=68;worst=378;.id=*4;address=84.15.16.29;avg=247;best=192;last=380;loss=0;sent=5;status=;std-dev=68;worst=380;.id=*5;address=;last=4294967295;loss=1000;sent=5;status=;.id=*6;address=;last=4294967295;loss=1000;sent=5;status=;.id=*7;address=142.250.229.217;avg=400;best=266;last=268;loss=0;sent=5;status=;std-dev=113;worst=526;.id=*8;address=142.251.48.39;avg=399;best=272;last=379;loss=0;sent=5;status=;std-dev=88;worst=546;.id=*9;address=8.8.8.8;avg=414;best=343;last=401;loss=0;sent=5;status=;std-dev=47;worst=479

I captured an error by calling the script from the terminal while LTE was down. First I set timeout=1 for traceroute, then I tried timeout=0.1. Both times the script froze for the same period until the error was thrown.
[colt@uTik] > /system/script/run nwatch-ltegateway-on-down
Script Error: cannot compare if string is more than time interval
[colt@uTik] > /system/script/run nwatch-ltegateway-on-down
Script Error: cannot compare if string is more than time interval
[colt@uTik] > /system/script/run nwatch-ltegateway-on-down
Script Error: cannot compare if string is more than time interval

Finally, it worked after adding a duration value. However, it's strange that even in the terminal it failed with the error mentioned above whenever duration was greater than 1. I could limit the timeout further, but a response from the first hop may arrive after more than 1 second, so even this setting would not receive those packets...
 
 :set  traceGwResult [tool/traceroute address=$hostip max-hops=$ttl count=5 duration=1 timeout=1 as-value];

PS. What does "cannot compare if string is more than time interval" even mean, and what causes it? (I couldn't find any description of it by searching, only that it's related to time comparison.)
 
zainarbani
Frequent Visitor
Posts: 54
Joined: Thu Jul 22, 2021 9:42 am
Location: Pati, Indonesia

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 8:59 am

I'm not sure how, but this may explain it.

With the connection totally down:
[admin@MikroTik] > # I'm not in an LTE environment, I just disabled the WAN interface
[admin@MikroTik] > :global trc [/tool traceroute address=8.8.4.4 max-hops=1 count=5 as-value]
[admin@MikroTik] > :put $trc
.id=*1;address=;last=4294967295;loss=1000;sent=5;status=
[admin@MikroTik] > :global gwLoss [:pick $trc 5]
[admin@MikroTik] > :put $gwLoss
(no output)
[admin@MikroTik] > :if ($gwLoss > 0) do={:put "ok"}
Script Error: cannot compare if string is more than time interval

[admin@MikroTik] > # Skip error, let script finish
[admin@MikroTik] > :do { :if ($gwLoss > 0) do={:put "ok"} } on-error={:put "Skip Error"}
Skip Error

[admin@MikroTik] > # Let's say the connection is unstable
[admin@MikroTik] > :set gwLoss 3
[admin@MikroTik] > :do { :if ($gwLoss > 0) do={:put "ok"} } on-error={:put "Skip Error"}
ok
As you can see, the likely root cause is that the traceroute output differs when the connection is totally down.
 
rextended
Forum Guru
Posts: 11982
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Netwatch run traceroute on link down fails event  [SOLVED]

Fri Jan 13, 2023 10:05 am

It's time to stop programming with your feet.
I tried to fix the script in the OP, but it is so buggy that I gave up.

Also, in your post, :global gwLoss [:pick $trc 5] must be written as :global gwLoss ($trc->"loss")
Using positional access with a fixed index on fields that have names does not work and is bad practice.
(Note: as-value on traceroute is available only in v7)
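A minimal sketch of the by-name access described above, assuming RouterOS v7. The array literal is hand-built to mimic the fully-down output shown earlier in the thread (.id=*1;address=;last=4294967295;loss=1000;sent=5;status=); in a real script it would come from /tool traceroute ... as-value. The :typeof guard is an extra safety check added for illustration, not something used by the posters.

```routeros
{
    # Hand-built stand-in for a traceroute result on a fully-down link
    :local trc {".id"="*1";"address"="";"last"=4294967295;"loss"=1000;"sent"=5;"status"=""}

    # Read the field by its name, not by a fixed position
    :local gwLoss ($trc->"loss")

    # Guard against a missing field before doing a numeric comparison
    :if ([:typeof $gwLoss] = "nothing") do={
        :log warning "traceroute result has no loss field"
    } else={
        :if ($gwLoss > 0) do={
            :put "loss: $gwLoss"
        }
    }
}
```

The outer { } keeps the :local variables in scope when the block is pasted into the terminal.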

The forum is full of samples, examples, and snippets about "as-value" on various commands.
It's time you and the OP studied those examples.
Last edited by rextended on Fri Jan 13, 2023 10:33 am, edited 2 times in total.
 
zainarbani
Frequent Visitor
Posts: 54
Joined: Thu Jul 22, 2021 9:42 am
Location: Pati, Indonesia

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 10:30 am

Yeah, that's correct.
 
rextended
Forum Guru
Posts: 11982
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 10:34 am

Addendum to my previous post:

Using "on-error" should be extremely rare; if you need it, you have almost certainly made a logical or syntax error.

A rare legitimate example: with ":resolve www.example.com", if the name does not exist the script stops with "failure: dns name does not exist", so you are forced to use on-error to skip the problem.
To avoid on-error you could use :execute (among other approaches), but we already know that, and in this case on-error is acceptable so as not to overcomplicate the script.
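A minimal sketch of the :resolve case mentioned above, using :do { } on-error={ } so that a missing DNS name is logged and skipped instead of stopping the whole script. www.example.com is only a placeholder host.

```routeros
# Skip a failed DNS lookup instead of aborting the script
:do {
    :local ip [:resolve "www.example.com"]
    :log info "www.example.com resolved to $ip"
} on-error={
    :log warning "www.example.com could not be resolved, skipping"
}
```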

It is also a mistake to blindly rely on what I write.
I'm not immune to mistakes either.
 
zainarbani
Frequent Visitor
Posts: 54
Joined: Thu Jul 22, 2021 9:42 am
Location: Pati, Indonesia

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 10:48 am

Well, idk Rex, "Script Error: cannot compare if string is more than time interval" was completely new to me.
When I found that the :if was comparing against nothing, on-error={} somehow came to mind :lol: probably because I use it a lot.
 
rextended
Forum Guru
Posts: 11982
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 11:03 am

trc is an array containing .id=*1;address=;last=4294967295;loss=1000;sent=5;status=

:global gwLoss [:pick $trc 5]
defines a global variable called gwLoss that contains exactly nothing, because the wrong [:pick $trc 5] returns literally nothing

:put $gwLoss
(no output)
no output, because gwLoss contains nothing

:if ($gwLoss > 0) do={:put "ok"}
Script Error: cannot compare if string is more than time interval

See for yourself: the interpreter is trying to compare apples with potatoes...
if nothing > 0...
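To see exactly what is being compared, the values can be inspected with :typeof before any numeric comparison. The array is again a hand-built stand-in for the down-link traceroute output, so the printed types are for illustration only and may differ from a live result; note that the error text mentions a string, so checking is better than guessing.

```routeros
{
    # Hand-built stand-in for the down-link traceroute output above
    :local trc {".id"="*1";"address"="";"last"=4294967295;"loss"=1000;"sent"=5;"status"=""}

    # Positional pick versus named access
    :local byIndex [:pick $trc 5]
    :local byName ($trc->"loss")

    # Print the type of each value before comparing it with a number
    :put ("by index: " . [:typeof $byIndex])
    :put ("by name: " . [:typeof $byName])
}
```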
 
diabolusss
just joined
Topic Author
Posts: 14
Joined: Tue Sep 27, 2022 10:55 am
Location: Latvia, Riga

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 5:31 pm

Thank you, @rextended, for the useful tip. After switching to accessing the field by name, the script worked like a charm.

P.S. This forum contains a huge number of scripts, and it's hard to find the one you need in limited time, especially when the cause of the problem is unknown and the scripting language is new to you. That's why I asked for help. However, I suppose I wasn't clear enough: I didn't ask anyone to fix my script, I asked for help finding the issue (bugs) so that I could fix it on my own. So thank you, @rextended and @zainarbani, for doing that.

@rextended, it would be very kind of you to tell me a little more about the bugs you noticed while trying to fix the script.
 
rextended
Forum Guru
Posts: 11982
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 7:06 pm

info: put one space after #, for readability
info: do not put ; at the end

warning: indent the code, or it is not understandable to others
for example
:do {
 :set  traceGwResult [tool/traceroute address=$hostip max-hops=$ttl count=5 duration=4  as-value];
 :if ($ttl > 1) do={
   :set traceGwResult [:pick $traceGwResult ($ttl-1)];
 }
 
 :set lteGatewayL ($traceGwResult->"address")
 :set gwLoss  ($traceGwResult->"loss")

   :set ttl value=($ttl+1);
} while=(($ttl < 4) && ($gwLoss = 1000));

is more readable this way (other issues inside are not fixed):
:do {   :set  traceGwResult [tool/traceroute address=$hostip max-hops=$ttl count=5 duration=4  as-value];
        :if ($ttl > 1) do={
            :set traceGwResult [:pick $traceGwResult ($ttl-1)];
        }
        :set lteGatewayL ($traceGwResult->"address")
        :set gwLoss  ($traceGwResult->"loss")
        :set ttl value=($ttl+1);
} while=(($ttl < 4) && ($gwLoss = 1000));

warning: name variables appropriately: ttl is not a TTL but a max-hops value, and they are not the same thing...
max-hops=$ttl

info: if - is not part of the number (i.e. it is the subtraction operator), better put spaces around it, for readability
[:pick $traceGwResult ($ttl-1)]
=>
[:pick $traceGwResult ($ttl - 1)]


info: better to use "and" instead of && for readability; use & | etc. for bitwise operations
(($ttl < 4) && ($gwLoss = 1000))
=>
( ($ttl < 4) and ($gwLoss = 1000) )

warning: do not use "value=" and similar when not needed
:set ttl value=($ttl-1);
=>
:set ttl ($ttl - 1)

The errors go away once you fix the script.
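For reference, a sketch of the loop from nwatch-ltegateway with the style suggestions above applied (spaces around operators, "and" instead of &&, no "value=", no trailing ";"). The traceroute logic is deliberately left as in the OP; the variable names maxhops, gwResult and gwAddress are only illustrative choices.

```routeros
# check LTE gateway (style clean-up only; traceroute logic unchanged)
:local hostip 1.1.1.1
:local maxhops 1

:local gwResult
:local gwAddress
:local gwLoss

:do {
    :set gwResult [/tool traceroute address=$hostip max-hops=$maxhops count=5 duration=4 as-value]
    # With more than one hop, keep only the entry for the last hop probed
    :if ($maxhops > 1) do={
        :set gwResult [:pick $gwResult ($maxhops - 1)]
    }
    # Access the fields by name
    :set gwAddress ($gwResult->"address")
    :set gwLoss ($gwResult->"loss")
    :set maxhops ($maxhops + 1)
} while=( ($maxhops < 4) and ($gwLoss = 1000) )
```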
 
diabolusss
just joined
Topic Author
Posts: 14
Joined: Tue Sep 27, 2022 10:55 am
Location: Latvia, Riga

Re: Netwatch run traceroute on link down fails event

Fri Jan 13, 2023 11:29 pm

@rextended, thank you for such a descriptive answer.
I appreciate your assistance; however, I can't agree with everything you mentioned, so I hope you won't mind if I comment on some quotes.
Of course, I'm taking your previous note into account, too.
It is also a mistake to blindly rely on what I write.
I'm not immune to mistakes either.

info: do not put ; at the end
Based on https://wiki.mikrotik.com/wiki/Manual:Scripting or https://help.mikrotik.com/docs/display/ROS/Scripting
The end of command line is represented by the token “;” or NEWLINE. Sometimes “;” or NEWLINE is not required to end the command line.
Why do you suggest not putting ";" at the end of a command?
I personally prefer to use it the same way I use a full stop to end a sentence. (Of course, anyone may notice that the scripts provided here are not consistent in this matter, but that's because I didn't write them from scratch; they're more like a Frankenstein… And because the scripting rules are loose, I haven't fixed that.)
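To illustrate the documentation quoted above: ";" and NEWLINE are interchangeable as command terminators, so both blocks below should behave the same and the choice is purely stylistic. Wrapping the commands in { } keeps the :local variables in scope when pasted into the terminal.

```routeros
# Three commands on one line, separated by ";"
{ :local a 1; :local b 2; :put ($a + $b) }

# The same commands, each ended by a NEWLINE instead
{
    :local a 1
    :local b 2
    :put ($a + $b)
}
```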

warning: indent or is not understandable to the others
Yeah, sorry about that. Winbox doesn't handle tabs, and I was a bit too lazy to edit it from the terminal, let alone copy-paste it into an external editor... I don't like that either, but inside "code" tags it looks normally indented.

warning: call the variable on pertinent way: ttl is not a ttl, but a max-hops, not the same things...
max-hops=$ttl
While I agree that max-hops is not the same as TTL, traceroute alters the TTL value to identify hops, so IMHO in this context the names are interchangeable, and ttl is shorter.
Also, I think "max-hops" is not a good option, because it can't be used without quotes, which I personally don't like and always avoid.
Valid characters in variable names are letters and digits. If the variable name contains any other character, then the variable name should be put in double quotes.
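A short illustration of the quoting rule quoted above: a hyphenated name such as max-hops has to be written in double quotes both where it is declared and where it is read, while maxhops needs no quotes. Both names are only examples.

```routeros
{
    # A name with a hyphen must be double-quoted when declared and when read
    :local "max-hops" 3
    :put $"max-hops"

    # A name made only of letters and digits needs no quotes
    :local maxhops 3
    :put $maxhops
}
```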

Once again, thank you all for your time and patience.
 
rextended
Forum Guru
Posts: 11982
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Netwatch run traceroute on link down fails event

Sat Jan 14, 2023 12:08 am

All my suggestions, apart from the error you fixed, are based on the fact that when you ask someone for help and the script is unreadable, they give up...

Superfluous signs that can be omitted are better left out; on the other hand, sometimes it is better to write more rather than less.
For example:
[find where ( (a = 1) and (b = 2) )] is better than the generic [find a = 1 and b = 2], which is not readily readable for someone who did not write the script and must understand what happens.
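A sketch of the explicit-grouping style in a real find, assuming the router has interfaces of type lte; the filter itself is only an example and is not taken from the OP's scripts.

```routeros
# List enabled LTE interfaces, with each condition grouped explicitly
:foreach i in=[/interface find where ( (type="lte") and (disabled=no) )] do={
    :put [/interface get $i name]
}
```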


Also this is better:
/int lte monitor numbers=0 duration=1 once do={
    :local caband $"ca-band"
    :local uptime $"session-uptime"
    :local cellid $"current-cellid"
    :local enbid  $"enb-id"
    :local phycel $"phy-cellid"
    :log debug "<trace> (rsrp:$rsrp;\
                         rsrq:$rsrq;\
                         rssi:$rssi;\
                         sinr:$sinr;\
                         uptime:$uptime;\
                         cqi:$cqi;\
                         cellid:$cellid;\
                         enb-id:$enbid;\
                         phy-cell:$phycel)"
}

instead of
/int lte monitor numbers=0 duration=1 once do={
  :set $caband [:tostr $"ca-band"];

 :log debug ("<trace> (rsrp:$rsrp;rsrq:$rsrq;rssi:$rssi;sinr:$sinr;uptime:".$"session-uptime".";cqi:$cqi;cellid:".$"current-cellid".";enb-id:".$"enb-id".";phy-cell:".$"phy-cellid".")");
}

(and probably a :foreach loop or :tostr is better than "writing it all by hand"...)
 
diabolusss
just joined
Topic Author
Posts: 14
Joined: Tue Sep 27, 2022 10:55 am
Location: Latvia, Riga

Re: Netwatch run traceroute on link down fails event

Mon Jan 16, 2023 11:15 am

That's fair, I'll note that.

(and probably a :foreach function, or :tostr is better, instead of "write all by hands"...)
That's a good point, but in this case I select only some of the variables, so :foreach would need an additional check, which brings a little more unwanted complexity.
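One way to use :foreach without the extra check is to iterate over a fixed list of wanted field names rather than over every field. This is a sketch only: it assumes that "once as-value" is supported on /interface lte monitor in your RouterOS version (please verify), and the list of fields is just an example.

```routeros
{
    # Hypothetical whitelist of monitor fields to log
    :local wanted {"rsrp";"rsrq";"rssi";"sinr";"cqi";"current-cellid";"enb-id";"phy-cellid"}

    # Assumption: "once as-value" returns the readings as a key-value array
    :local mon [/interface lte monitor numbers=0 once as-value]

    :local line ""
    :foreach key in=$wanted do={
        # A missing field becomes an empty string via :tostr
        :set line ($line . $key . ":" . [:tostr ($mon->$key)] . ";")
    }
    :log debug "<trace> ($line)"
}
```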


Hi, it's best to use software that can monitor your whole network and also the health of your internet link. I am an ISP too and have used two software packages, which did only 20% of what I wanted. I came to realise that they are more focused on the administration side than on the network itself and the customer, so I decided to build my own software, which runs my whole network and communicates with customers via WhatsApp through an automated task system.
That sounds interesting, and the images look nice; however, would you mind sharing at least what you have built? You haven't mentioned whether it is open source, freeware, or paid. And I believe it would be appropriate to list some of the solutions you used before deciding to build your own masterpiece :) (no irony here).

I completely agree with you, not least because I previously had a Huawei router with an LTE modem, and because of its instability I started to build my own network monitor (Huawei has a good API, but doesn't allow as much freedom as MikroTik). However, I'm not an ISP and don't have enough resources (free time and motivation, at least) to devote myself completely to this task, so I haven't finished it... Then I switched to MikroTik, which allowed me to move most of those API functions onto the router itself; right now that is more feasible than rewriting my monitor.
On the other hand, if I had more devices to control (or were an ISP), I believe building a custom monitor would be the way to go...
