
Interface Watchdog script

Posted: Fri Jun 16, 2017 2:06 am
by NetWorker
Time and again it has been brought up that the Netwatch tool should include an interface parameter, as the ping tool does. I've been living without this feature, but recently we added a new ISP to our corporate network. We run a load balancing system with ping-checked routes. Our dedicated lines are of very low bandwidth, so we hired a 30 Mbit home line that runs over DSL (using a pppoe client). The problem is that sometimes, for some (extremely annoying) reason, the connection hangs at the ISP's side. You can usually still ping the gateway but nothing else. Really annoying. So the pppoe stays connected and the routes stay up, which means we're dropping most of our http traffic. Did I mention how annoying this is?

Anyways, the problem, as annoying as it is, only happens about once a week and when I'm in the office. But today I wasn't and sure enough, got a call from the boss that "the internet isn't working".

Long story short, I've had it with this so I wrote a script. Should've done that in the first place, I know. But even though annoying, the problem wasn't really high priority. QoS ensured that critical traffic still went over the dedicated lines. I've tested this afternoon and it's now in production. Doesn't mean it's bug free, and as usual is provided as is, no warranties, and yada yada yada. You guys know the drill. I just wanted to share since many shared scripts on here saved me time and again from doing most of the brain work.

The script is composed of three parts that I'll post consecutively, since there seems to be a maximum number of characters allowed per post. Schedule the first (the counter reset) at startup and the second (the watchdog) to run at your preferred interval; the third is called by the watchdog. With an interval of 8-10 minutes, between 24 and 30 minutes will pass before the final action is taken, if it is required.
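
As a rough sketch of that scheduling, assuming the scripts are saved under /system script with the names used in this thread (the scheduler entry names and the 10 minute interval are only examples, pick your own):

```
# Run the counter reset once at boot (entry name is an assumption).
/system scheduler add name="intctrreset-at-boot" start-time=startup on-event="intctrreset"
# Run the watchdog every 10 minutes (entry name and interval are assumptions).
/system scheduler add name="intwatchdog-every-10m" interval=10m on-event="intwatchdog"
```

The reset script itself doesn't need a scheduler entry, since the watchdog calls it when needed.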

Comments, bugs, mods, complaints, advice, free beer, etc. all welcome!

Re: Interface Watchdog script

Posted: Fri Jun 16, 2017 2:07 am
by NetWorker
intctrreset
#Interface counter reset
#Feel free to use or modify as needed.
#Hope this saves you work, trouble or time.
#Regards, Networker.

#This script defines and sets to zero the global variable that we will later use to define when
#to disable the interface or reset the router as needed.

#Note, if you need to watch multiple interfaces, I encourage you to run each script for each interface.
#Checking multiple interfaces in one script could get you into trouble with global variables, multiple ifs, etc.
#In other words, run int1ctrreset, int1watchdog, int1reset, int2ctrreset, and so on.

#IMPORTANT: schedule this script at STARTUP! Otherwise, the global variable won't be initialized
#and all scripts that depend upon it might not work, even though it is defined in each one.

:global intctr;

:set intctr 0;

Re: Interface Watchdog script

Posted: Fri Jun 16, 2017 2:09 am
by NetWorker
intwatchdog
       #Interface Watchdog
       #Feel free to use or modify as needed.
       #Hope this saves you work, trouble or time.
       #Regards, Networker.

       #This script will ping a remote address from a particular interface and its gateway.
       #This way, we don't bombard the remote host with icmp requests, and make sure we have
       #a working link, before deciding the interface is actually down. First we define four
       #variables you may want to modify: the interface name of the interface you want to watch,
       #the remote address to ping, the name of the interface reset script and the name of the
       #reset counter script.

       #Script logic is as follows:
       #First, check if interface is enabled. If it is, ping remote. If it responds (else),
       #execute counter reset script. If no response, ping gateway. If no response, execute
       #interface reset script. If there is a response, ping remote again. If no response,
       #execute interface reset, else (i.e. remote does respond), reset counter.


       #Interface name
       :local intname "Inet";
       #remote ip to ping
       :local remote 8.8.8.8;
       #interface reset script name
       :local intreset "intreset";
       #counter reset script name
       :local ctrres "intctrreset";

       #----------------------------

       :local gateway [/ip address get [find where interface=$intname] network];
       #When using a pppoe client, the gateway is the network address. But if you're
       #using a dhcp client, you'd want to use the following instead:
       #[/ip dhcp-client get [find where interface=$intname] gateway]

       #----------------------------
       :if ([/interface get $intname disabled] = no) do={

       :if ([/ping $remote count=2 interface=$intname]=0) do={
           :if ([/ping $gateway count=10 interface=$intname]=0) do={:execute $intreset; }\
               else={ :if ([/ping $remote count=10 interface=$intname]=0) do={
                   :execute $intreset; } else={:execute $ctrres; }
               }
           } else={:execute $ctrres;}
       }
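
The comparisons against 0 in the watchdog rely on /ping returning the number of replies received when it is called inside square brackets. If you want to check that on your own router before trusting the script, a one-liner like this in the terminal should print that number (8.8.8.8 is just the remote used above):

```
# Prints how many of the 2 echo requests were answered.
:put [/ping 8.8.8.8 count=2]
```

A result of 0 is what the watchdog treats as "no response".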

Re: Interface Watchdog script

Posted: Fri Jun 16, 2017 2:15 am
by NetWorker
intreset
#Interface Reset

#Feel free to use or modify as needed.
#Hope this saves you work, trouble or time.
#Regards, Networker.

#The following script will reset the defined interface. It also checks the number of
#resets performed. Past that number, the script will permanently disable said interface
#and send an email. You may want to replace the /interface disable action with
#/system reboot. But since I run a multiwan system, I prefer to just disable and e-mail.

:global intctr;

#Maximum times to reset before permanent action
:local maxreset 2;

#Interface name
:local intname "Inet";

#email address
:local mailaddr "admin@test.com";


:local sysname [/system identity get name];
:if ($intctr<$maxreset) do={
:log warning "Watchdog: $intname will be reset. It has already been reset $intctr times.";
/interface disable $intname;
:delay 30;
/interface enable $intname;
:set intctr ($intctr+1);
} else={
:log error "$intname has been reset $intctr times and will now be disabled."
/interface disable $intname;
:delay 2;
/tool e-mail send to=$mailaddr subject="$sysname - Interface $intname disabled" \
body="Interface $intname on $sysname has been disabled because it has been reset \
$maxreset times by the watchdog. Contact sysadmin, ISP or verify interface setup."
}

Re: Interface Watchdog script

Posted: Thu Jun 22, 2017 6:18 pm
by NetWorker
Guys, a word of caution. I've also been running this script for our VPN links now, since they tend to crash along with the pppoe client. That is, our dedicated lines stay up but ovpn connections over the DSL line crash.

Anyways, long story short, the ovpn client drops the connection but doesn't try to reconnect. It remains at "terminating". The thing is, since the IP address and gateway are dynamically assigned when the connection comes up, the watchdog script will fail when trying to look the gateway up ("no such item"). Also, the ping tool hangs for some reason when using the crashed interface. So, using the fact that the route only gets assigned when the connection is up, I simplified the watchdog script as follows:
#Interface Watchdog
#Feel free to use or modify as needed.
#Hope this saves you work, trouble or time.
#Regards, Networker.

#This script will ping a remote address that is reached through the watched interface.
#Since the route only exists while the connection is up, there is no gateway lookup
#and no interface parameter on the ping. First we define four
#variables you may want to modify: the name of the interface you want to watch,
#the remote address to ping, the name of the interface reset script and the name of the
#reset counter script.

#Script logic is as follows:
#First, check if interface is enabled. If it is, ping remote. If it responds (else),
#execute counter reset script. If no response, ping remote again with 20 pings.
#If no response, execute interface reset, else (i.e. remote does respond),
#reset counter.

#----------modify from here-------------------------

#Interface name
:local intname "ovpn-client";
#remote ip to ping
:local remote 10.0.9.3;
#interface reset script name
:local intreset "int2reset";
#counter reset script name
:local ctrres "int2ctrreset";

#----------modify up to here------------------------


:if ([/interface get $intname disabled] = no) do={
:if ([/ping $remote count=2]=0) do={ 
    :if ([/ping $remote count=20 ]=0) do={
            :execute $intreset; } else={:execute $ctrres; }
     } else={:execute $ctrres; }
}
edit: forgot one of the else in the script, sorry.
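
If you'd rather keep the gateway lookup from the first version, one possible way around the "no such item" failure is to wrap the lookup in :do ... on-error. This is only a sketch and I haven't run it against a crashed ovpn client; the log text is made up, and it does nothing about the ping tool hanging on the crashed interface:

```
#Sketch only: guard the gateway lookup so the script doesn't abort.
:local intname "ovpn-client";
:local gateway "";
:do {
    :set gateway [/ip address get [find where interface=$intname] network];
} on-error={
    :log warning "Watchdog: no address found on $intname so the gateway check is skipped.";
}
```

After this block, an empty $gateway means the lookup failed and the interface has no address assigned.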

Re: Interface Watchdog script

Posted: Thu Jun 22, 2017 6:30 pm
by NetWorker
Quick update when using multiple reset counters. In the interface reset script I just used the global variable. But if you have multiple global variables, instead of editing the variable name everywhere in all the scripts, you can use a local variable and just change the name of the global at the beginning and at the end of each script as follows:
#Interface Reset
#Feel free to use or modify as needed.
#Hope this saves you work, trouble or time.
#Regards, Networker.


#The following script will reset the defined interface. It also checks the number of
#resets performed. Past that number, the script will permanently disable said interface
#and send an email. You may want to replace the /interface disable action with
#/system reboot. But since I run a multiwan system, I prefer to just disable and e-mail.

#match global variable to interface counter here as well as at the end of the script
:global int2ctr;
:local intctr $int2ctr;

#Maximum times to reset before permanent action
:local maxreset 2;
#Interface name
:local intname "ovpn-client";
#email address
:local mailaddr "test@testdomain.com";

:local sysname [/system identity get name];

:if ($intctr<$maxreset) do={
:log warning "Watchdog: $intname will be reset. It has already been reset $intctr times.";
/interface disable $intname;
:delay 30;
/interface enable $intname;
:set intctr ($intctr+1);
} else={
:log error "$intname has been reset $intctr times and will now be disabled."
/interface disable $intname;
:delay 2;
/tool e-mail send to=$mailaddr subject="$sysname - Interface $intname disabled" \
body="Interface $intname on $sysname has been disabled because it has been reset \
$maxreset times by the watchdog. Contact sysadmin, ISP or verify interface setup."
}

#match global 
:set int2ctr $intctr
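
The matching counter reset script for this interface (called int2ctrreset in the watchdog above) would presumably just be the first script with the global renamed. A minimal sketch under that assumption:

```
#Interface counter reset for the second interface (assumed script name: int2ctrreset)
:global int2ctr;
:set int2ctr 0;
```

As with the original, schedule it at startup so the counter exists before the watchdog runs.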