
Netwatch style script

Posted: Wed Mar 14, 2012 11:24 pm
by gsloop
I have created a fairly extensive script, using some of the concepts I've seen in other scripts.

This script does several things.

1) It will ping a host x number of times and will change the metric of the specified route from 1 to 3 when y number of pings aren't returned. [i.e. If <8/10 pings return, consider the route down.]

This, IMO, is a lot better than just a black/white ping return test. [i.e. If we got ANY pings back, consider the pipe up.]

Also, by setting the RTT [interval] value, you can do some rough RTT evaluation, and consider the pipe down if fewer than x of y packets return within a specific RTT. [You can't tell what the average RTT was, but you can set the RTT to be, say, 300 ms. If the packet doesn't come back in <300ms it will be counted as a "missed" ping. So, if 300ms is your maximum acceptable RTT, you would know that the ICMP response was either outside that RTT or lost altogether.]
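
The x-of-y test with an RTT cutoff can be sketched roughly like this. This is only an illustrative Python model of the decision, not the RouterOS script itself; the function names, the use of None for a lost ping, and the sample RTT values are all assumptions made for the example.

```python
from typing import Optional, Sequence

def count_good_pings(rtts_ms: Sequence[Optional[float]], max_rtt_ms: float) -> int:
    """Count replies that came back inside max_rtt_ms. None means no reply at all."""
    return sum(1 for rtt in rtts_ms if rtt is not None and rtt < max_rtt_ms)

def route_is_up(rtts_ms: Sequence[Optional[float]], max_rtt_ms: float, min_good: int) -> bool:
    """The route counts as up only if at least min_good pings were good."""
    return count_good_pings(rtts_ms, max_rtt_ms) >= min_good

# Ten hypothetical pings: one lost (None) and one slower than the 300 ms cutoff.
samples = [20.0, 35.0, None, 310.0, 40.0, 25.0, 30.0, 28.0, 22.0, 45.0]
print(count_good_pings(samples, 300.0))   # 8
print(route_is_up(samples, 300.0, 8))     # True, since 8 of 10 meets the threshold
```

Note that a late reply and a lost reply are indistinguishable here, which matches the limitation described above.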

It will set the primary route to a metric of 3 when it's down and return it to a metric of 1 when it's "up" - this allows you to continue to test the route to see if it comes back up - without taking the route down and de-activating the interface.

[This is also helpful if you, like I, gather stats on external interfaces with something like smokeping - this way the interface will continue to respond, even if it's officially down. This allows testing, troubleshooting and statistics gathering to go on even when the interface is "down."]

The "backup" route should be set with a metric of 2.
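
To see why the 1/2/3 metrics give fail-over without disabling anything, here is a small Python model (not RouterOS code). The route names "primary" and "backup" and both helper functions are made up for illustration; the only rule assumed is that the route with the lowest metric carries the traffic.

```python
from typing import Dict

def set_primary_metric(routes: Dict[str, int], primary_up: bool) -> Dict[str, int]:
    """Return a copy of the routes with the primary metric set to 1 (up) or 3 (down)."""
    updated = dict(routes)
    updated["primary"] = 1 if primary_up else 3
    return updated

def active_route(routes: Dict[str, int]) -> str:
    """The route with the lowest metric is the one in use."""
    return min(routes, key=lambda name: routes[name])

routes = {"primary": 1, "backup": 2}
print(active_route(routes))                             # primary
print(active_route(set_primary_metric(routes, False)))  # backup
```

The primary route is still present at metric 3, so it can keep being tested while the backup carries the traffic.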

2) It will email you status when the specified route goes down or up.

3) It will update a DynDNS record [optionally] with the new IP of the current default route. [This way you can re-point VPN pipes etc based on a DNS record. (I have a script to repoint an IPSec tunnel on ROS that I'm nearly done with too.)]

The code is extensively documented, and I've done quite a lot of testing - so it should be pretty solid.

It will write to sd or flash to handle the IP state across reboots. [Very similarly to my DynDNS script.]
It won't make changes to DynDNS unless the IP has changed [which will happen when a route is considered down.]
It doesn't write to flash unless the state changes. [Route goes up or down.] So, very short cycle/run times won't stress flash any more than lengthy cycle times.
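
The "only write when something changed" idea looks roughly like this in Python. It is a sketch of the pattern, not the script's actual code; the save_if_changed helper, the temporary directory, and the example addresses are assumptions (the addresses come from documentation ranges).

```python
import os
import tempfile

def save_if_changed(path: str, value: str) -> bool:
    """Write value to path only if it differs from what is stored. Returns True if a write happened."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == value:
                return False          # same state as last run, so skip the write
    except FileNotFoundError:
        pass                          # first run: the file does not exist yet
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(value)
    return True

state_file = os.path.join(tempfile.mkdtemp(), "lastip.txt")
results = [save_if_changed(state_file, ip)
           for ip in ["203.0.113.10", "203.0.113.10", "198.51.100.7"]]
print(results)  # [True, False, True]
```

However often the check runs, the number of writes depends only on how often the state actually flips.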

This script was created on ROS 5.12 and tested on a 450G.
It should work on other devices, and probably other ROS versions too, but I can't say which.

If you do use this, I'd appreciate feedback.

1) Did it work? If not, what kinds of things were a problem?
2) What hardware did you run it on?
3) What ROS version?
4) Comments, questions
5) If you modified the code, could you submit it back so I can, perhaps, integrate the changes?

Finally, Karma would be nice.

2012-03-21 - Updated to v1.0.0.8 [If you have a prior version, please use the newest version, as the old one has a show-stopper bug.]

2012-03-26 - Updated to v1.0.2.4 - MANY changes.
-Will create files needed for dyndns updates if they don't exist, so the user doesn't need to hand-create/install them.
-Documentation changes on how to create a mangle rule, and the script now modifies that rule so that the ping traffic continues to flow over the primary connection when it goes down. That way we keep testing the "down" connection until it comes back up and the metric is changed back.
-Adds the ability to use a FQDN as the ping target.
-Moves :resolve script commands to "late" in the script, so that failures won't crash the most important parts of the script.
-Many other changes and bug fixes.

This should be really solid. I've done a lot of testing.
I've left quite a lot of :puts for debugging in the code - I'll probably remove them at some point, but I've tried to limit :log output except where it's useful. The log output probably won't change much.

** Update 2012/04/01 **
Minor modifications. No update needed for prior users. Better support for new users, in terms of file initialization.

The comments in the header of the code are very long - sorry - but they have to be for documentation purposes. At least they're not as terse as a man page! :)

Please let me know how it works for you. Additions, comments, code additions/changes are all welcome.
Karma would be nice!

Re: Netwatch style script

Posted: Thu Mar 15, 2012 10:37 am
by rviteri
Can the pings go out through certain interfaces?

For example, can you force the ping out via the WAN1 interface?

Or do you have to select an IP address that is only routable via the WAN1 interface?

Re: Netwatch style script

Posted: Fri Mar 16, 2012 10:22 pm
by gsloop
You're right.

[I hadn't thought about when the connection is down, the route changes...and thus we'll have a problem.]

I'll work on it, but it will be a few days before I have the time, I suspect.


Re: Netwatch style script

Posted: Wed Mar 21, 2012 7:56 pm
by gsloop
Ok, updated the script.

Sorry for the oversight.

It should work correctly now.


Re: Netwatch style script

Posted: Sat Mar 24, 2012 2:17 pm
by rviteri
great, I'll check it out!

Re: Netwatch style script

Posted: Sat Mar 24, 2012 2:28 pm
by rviteri
Hi, did you change line 174 only?

:set vPingReturns [/ping $vPingDest interface=$vPrimaryInterface interval=$vPingInterval count=$vPingCount size=$vPingSize];

Did you add interface=$vPrimaryInterface to force it through the gateway that is down?

Re: Netwatch style script

Posted: Mon Mar 26, 2012 4:01 am
by gsloop
I've extensively tested this over the last week and I have a few updates to make.

There is a mangle rule you'll need and I'm trying to finalize one that will allow you to use a FQDN instead of an IP address.

[But POS that ROS scripting language is...a :resolve that doesn't resolve and has an "error" will crash the whole script. So, I'm doing some tinkering to move the :resolve pieces to the end where they are less critical and less likely to bomb the whole thing.]

I hope to have it all done tonight. But we'll see.


[As for your question, I'm not sure if that's the only line I changed. I think it is, but not sure. But the modified version that is coming is vastly better and addresses many issues that the current version doesn't. So, I'd wait for it. :) ]

Re: Netwatch style script

Posted: Mon Mar 26, 2012 12:01 pm
by gsloop
Sorry for the double-post...

But wanted to update the thread.

New version. See notes at the end of the initial post for details, and for download.


Re: Netwatch style script

Posted: Mon Mar 26, 2012 5:20 pm
by rviteri
cool will check it out

Re: Netwatch style script

Posted: Mon Mar 26, 2012 6:42 pm
by gsloop
Someone PM'd me this...
"Could this be used in conjunction with a PCC load balancing script to perform remote checks on a domain and to take any routes offline where the ADSL line has failed but the ADSL router is still up (causes the MT to route traffic to non working lines)?"

I'm not sure exactly what part you're wondering would work...but in general yes.

I'm sure it could. I haven't used the PCC stuff, since all my "dual" connection links are so unequal I don't load-balance, I just use fail-over - so there may be nuances to things I'm not aware of...but it should work, I think.

You don't need to use the "gateway" as the ping destination. [And I most definitely don't want to use the gateway, since I see many failures where the gateway is available but upstream to the internet is hosed.] Pick something that's pretty reliable/available farther upstream. When it goes down, then your link is down.

[Picking something further upstream with more correlation with a "real" outage gets easier with the FQDN option, IMO.]

In looking at what PCC does, briefly, I can't see why this wouldn't work, perhaps with a tweak or two to add/modify code/function to accomplish what you want.


Re: Netwatch style script

Posted: Mon Apr 02, 2012 6:19 am
by gsloop
Update to script - updates to comments to make a bug less likely to bite you. Also a modification to a file variable preset.
(All related to the "lastip.txt" file. If the file is stored on RB flash, don't prefix with a '/' [slash])

Re: Netwatch style script

Posted: Mon Jul 09, 2012 12:56 pm
by Nomis
This is great, will be sure to use it.

I have a very similar issue.

I have an ADSL (Broadband) and a 3G connection, and they are configured on a 411U running 5.6, purely because the modem does not like any other version of RouterOS.

I have successfully gotten them to fail over, but need a way to know when the ADSL or 3G is up. Your script looks like it will do the job perfectly, only I am not running DynDNS and not issuing DHCP from the router, as that is done by a server and firewall.
Can the DynDNS and DHCP parts be removed?

ADSL is set to distance 1 and 3G is set to distance 2 and each is set to
 0 A S  ADSL    1
 1   S  3G      2
 2 ADC  ether1  0
 3 A S  ether1  1

Thanking You in Advance for your reply
Karma is gifted :)

Re: Netwatch style script

Posted: Tue Jul 10, 2012 9:37 pm
by gsloop
Read the script header/docs.
DynDNS is optional.
DHCP is also not required - this will work on static IP's.

I'll try to come back and reply when I have a little more time - but don't know when that will be.


Re: Netwatch style script

Posted: Wed Jul 11, 2012 5:06 pm
by Nomis
Thanks for your reply.

I have made all the changes, and the script sends a mail when the ADSL route's distance changes to 3. But it won't change it back from 3 to 1 when the ADSL route is operational again.

I am sure that your script is utilising two ports, as wan1 and wan2 are mentioned.
Remember I only have one Ethernet port on my 411, so all traffic via the 3G (PPP) and the ADSL (PPPoE) passes through it, as well as being passed on to the rest of the LAN.

I also assume you have to set up a schedule to run the script.

Again, thank you for all your assistance.

Re: Netwatch style script

Posted: Tue Jul 17, 2012 5:57 pm
by gsloop
Follow-up for assistance of others using the script.


So, you're controlling fail-over with the RB, even though the actual device connected to the WAN links is NOT the RB.

[I'd think doing the fail-over on the router actually connected to the WAN links would be the place to do it.]

That point notwithstanding....

Check the logs and run the script interactively. It can give you a lot of information about what's going on, and where things are failing.

If I were to guess what's wrong, I would guess that once the route changes, there's no way for the pings to get to the destination and return over the new route. Thus, the primary leg will never return to service.

[There are some notes in the docs that apply to this: Essentially, if you have two eth legs (say A and B), you have to make sure the script keeps checking the availability of the A leg over the A connection. If it checks it over the B connection, it may return the A leg to service when it's still down, or, if it can't get to the destination at all, it will NEVER return the A leg to service, since it will always appear down, even when it's not.]
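
Both failure modes can be seen in a toy Python model of the test ping (illustrative only; ping_replies and its parameters are invented for this example and are not part of the script). "Pinned" means the ping is forced out the primary (A) leg no matter which route is currently active.

```python
def ping_replies(primary_ok: bool, backup_ok: bool, active: str, pinned: bool) -> bool:
    """Return True if a test ping would get a reply.

    pinned=True  : the ping always leaves via the primary leg.
    pinned=False : the ping follows whichever route is currently active.
    """
    leg = "primary" if pinned else active
    return primary_ok if leg == "primary" else backup_ok

# A is down and traffic has failed over to B.
print(ping_replies(primary_ok=False, backup_ok=True, active="backup", pinned=False))  # True: A wrongly looks up
print(ping_replies(primary_ok=False, backup_ok=True, active="backup", pinned=True))   # False: A correctly looks down

# A has recovered, but the target is not reachable over B.
print(ping_replies(primary_ok=True, backup_ok=False, active="backup", pinned=False))  # False: A never returns to service
print(ping_replies(primary_ok=True, backup_ok=False, active="backup", pinned=True))   # True: A is restored
```

Only the pinned test is actually measuring the A leg, which is why the ping has to keep going out over the primary connection.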

So, check:
1) Do the pings go out where you think they should and route as you think they should? Do they come back?
2) Check the output of the script when run interactively, and also the log output.

I suspect you'll see what's going wrong and then you can figure out what additions to make to your configuration to make it work better...


Re: Netwatch style script

Posted: Wed Jan 09, 2013 1:39 am
by pgarred
I'm sure this has an obvious answer, but being a proverbial noob here, I'm lacking the final answer. How do I set the script to run as needed? I have the script edited and ready to apply. I realize it's a function of netwatch, but I'm unsure of how to set the script to run.

Your help is greatly appreciated.


Re: Netwatch style script

Posted: Thu Jan 22, 2015 11:51 am
by marting
Has anyone used this script on 6.x yet?