I am using a failover script that pings an outside host (for example, Google DNS). The router also has bandwidth-based failover set up.
All is fine except when a power failure happens or a reboot is required. The failover works by increasing the route distances, and this is where the problem lies. After a power outage or reboot the script's global variables are reset to zero, but the routes keep their increased distances. The script sends pings through the defunct ISP1, finds there is still a problem, and increases the route distances again, say from 3 to 6, and so on.
:local InterfaceISP1 ether1-WAN1
:local InterfaceISP2 ether2-WAN2
:local ISP1 Interface1
:local ISP2 Interface2
:local ISP2s Interface2
:local PingTarget 8.8.8.8
:local FailTreshold 15
:local DistanceIncrease1 2
:local DistanceIncrease2 1
:local DistanceIncrease2s 2
:global PingFailCountISP1
:global PingFailCountISP2
:if ([:typeof $PingFailCountISP1] = "nothing") do={:set PingFailCountISP1 0}
:if ([:typeof $PingFailCountISP2] = "nothing") do={:set PingFailCountISP2 0}
:local PingResult
:set PingResult [ping $PingTarget count=1 interface=$InterfaceISP1]
:put $PingResult
:if ($PingResult = 0) do={
:if ($PingFailCountISP1 < ($FailTreshold + 2)) do={
:set PingFailCountISP1 ($PingFailCountISP1 + 1)
:if (($PingFailCountISP1 = $FailTreshold) && (/ip route find comment=$"ISP1" distance="<3")) do={
:log warning "ISP1 has a problem en route to $PingTarget - increasing distance of routes."
:foreach i in=[/ip route find comment=$"ISP1"] do=\
{/ip route set $i distance=([/ip route get $i distance] + $DistanceIncrease1)}
:log warning "Route distance increase finished."
/tool e-mail send to="my@email.com" \
subject=([/system identity get name] . "_WAN1_Down")\
password=password user=user\
body="$[/system identity get name] WAN1 Down Increasing Routes on $[/system clock get date] at $[/system clock get time]"\
server=[:resolve "smtp.live.com"] start-tls=yes port=587\
from="my@router.com"
:log info "Failure Sent to E-mail"
}
}
}
:if ($PingResult = 1) do={
:if ($PingFailCountISP1 > 0) do={
:set PingFailCountISP1 ($PingFailCountISP1 - 1)
:if ($PingFailCountISP1 = ($FailTreshold -1)) do={
:log warning "ISP1 can reach $PingTarget again - bringing back original distance of routes."
:foreach i in=[/ip route find comment=$"ISP1"] do=\
{/ip route set $i distance=([/ip route get $i distance] - $DistanceIncrease1)}
:log warning "Route distance decrease finished."
/tool e-mail send to="my@email.com" \
subject=([/system identity get name] . "_WAN1_UP")\
password=password user=user\
body="$[/system identity get name] WAN1 Up decreasing Routes on $[/system clock get date] at $[/system clock get time]"\
server=[:resolve "smtp.live.com"] start-tls=yes port=587\
from="my@router.com"
:log info "Failure Sent to E-mail"
}
}
}
:delay 5s
:set PingResult [ping $PingTarget count=1 interface=$InterfaceISP2]
:put $PingResult
:if ($PingResult = 0) do={
:if ($PingFailCountISP2 < ($FailTreshold + 2)) do={
:set PingFailCountISP2 ($PingFailCountISP2 + 1)
:if (($PingFailCountISP2 = $FailTreshold) && (/ip route find comment=$"ISP2" distance="<3")) do={
:log warning "ISP2 has a problem en route to $PingTarget - increasing distance of routes."
:foreach i in=[/ip route find comment="ISP2"] do=\
{/ip route set $i distance=([/ip route get $i distance] + $DistanceIncrease2)}
:log warning "Route distance increase finished."
/tool e-mail send to="my@email.com" \
subject=([/system identity get name] . "_WAN2_Down")\
password=password user=user\
body="$[/system identity get name] WAN2 Down Increasing Routes on $[/system clock get date] at $[/system clock get time]"\
server=[:resolve "smtp.live.com"] start-tls=yes port=587\
from="my@router.com"
:log info "Failure Sent to E-mail"
}
}
}
:set PingResult [ping $PingTarget count=1 interface=$InterfaceISP2]
:put $PingResult
:if ($PingResult = 0) do={
:if ($PingFailCountISP2 < ($FailTreshold + 2)) do={
:set PingFailCountISP2 ($PingFailCountISP2 + 1)
:if (($PingFailCountISP2 = $FailTreshold) && (/ip route find comment=$"ISP2s" distance="<3")) do={
:log warning "ISP2 has a problem en route to $PingTarget - increasing distance of routes."
:foreach i in=[/ip route find comment="ISP2s"] do=\
{/ip route set $i distance=([/ip route get $i distance] + $DistanceIncrease2s)}
:log warning "Route distance increase finished."
/tool e-mail send to="my@email.com" \
subject=([/system identity get name] . "_WAN2_Down")\
password=password user=user\
body="$[/system identity get name] WAN2 Down Increasing Routes on $[/system clock get date] at $[/system clock get time]"\
server=[:resolve "smtp.live.com"] start-tls=yes port=587\
from="my@router.com"
:log info "Failure Sent to E-mail"
}
}
}
:if ($PingResult = 1) do={
:if ($PingFailCountISP2 > 0) do={
:set PingFailCountISP2 ($PingFailCountISP2 - 1)
:if ($PingFailCountISP2 = ($FailTreshold - 1)) do={
:log warning "ISP2 can reach $PingTarget again - bringing back original distance of routes."
:foreach i in=[/ip route find comment="ISP2"] do=\
{/ip route set $i distance=([/ip route get $i distance] - $DistanceIncrease2)}
:log warning "Route distance decrease finished."
/tool e-mail send to="my@email.com" \
subject=([/system identity get name] . "_WAN2_Up")\
password=password user=user\
body="$[/system identity get name] WAN2 UP Decreasing Routes on $[/system clock get date] at $[/system clock get time]"\
server=[:resolve "smtp.live.com"] start-tls=yes port=587\
from="my@router.com"
:log info "Failure Sent to E-mail"
}
}
}
I have added the line in bold to try to resolve this, but my scripting skills are very limited.
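For what it is worth, here is a minimal sketch of one way the "only raise routes that are still at their normal distance" idea could be written so that it survives a reboot. It reuses the `ISP1` and `DistanceIncrease1` names from the script above; `NormalBelow` is a hypothetical name, with its value taken from the `<3` test in that script. It checks each route's current distance instead of relying on the fail counter, and it is untested against the full script.

```
# Sketch only: raise ISP1 route distances only for routes still at their normal distance.
:local ISP1 "Interface1"
:local DistanceIncrease1 2
# Hypothetical cut-off, taken from the "<3" test in the script above
:local NormalBelow 3

:foreach i in=[/ip route find comment=$ISP1] do={
    :local d [/ip route get $i distance]
    # Skip routes that were already raised before the reboot
    :if ($d < $NormalBelow) do={
        /ip route set $i distance=($d + $DistanceIncrease1)
    }
}
```

Because the decision is based on the distance stored in the route itself, a counter reset to zero after a power cut should not cause the distance to be increased a second time.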
I am using Netwatch with static routes. The check IP has a static route through its WAN, plus a second route to a blackhole with distance 99. If the respective WAN is not available, the blackhole route becomes active and does not let the ping go out via the other WANs.
Default routes to the WANs have distance 12 for WAN1, 13 for WAN2, 14 for WAN3 and so on…
Netwatch runs a command like this when the host comes up (among others for logging, setting SMTP servers and so on…):
/ip route set distance=12 [find distance=22];
and the reverse when it goes down.
There are no problems with restarts or with keeping any variables. It does not matter what state the connections are in when Netwatch first runs. If the WAN is down, the blackhole eats the ping and Netwatch moves the distance. When the gateway becomes reachable, the static route to the check address becomes active and the blackhole route inactive. Netwatch then sends its ping through WAN1 to its gateway. Once the address is reachable, Netwatch changes the default route distance for this WAN back from 22 to 12 and it becomes the first default route again. The same happens with the routes to the other WANs.
Of course, each of the distances 12, 13, 14 and 22, 23, 24 is used by only one route.
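Roughly, the WAN1 part of that setup could look like the sketch below. All addresses are placeholders (203.0.113.1 as the check host, 192.0.2.1 as the WAN1 gateway), the comments are made up for illustration, and the `type=blackhole` form is an assumption based on RouterOS v6 syntax; newer versions may write the blackhole route differently.

```
# Check host is reachable only through WAN1's gateway (placeholder addresses)
/ip route add dst-address=203.0.113.1/32 gateway=192.0.2.1 distance=1 comment="check-WAN1"
# Fallback: if the route above is inactive, drop the ping instead of leaking it via another WAN
/ip route add dst-address=203.0.113.1/32 type=blackhole distance=99 comment="check-WAN1-blackhole"
# Default route for WAN1 at its normal distance
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.1 distance=12 comment="default-WAN1"
# Netwatch flips the default route distance between 12 (up) and 22 (down)
/tool netwatch add host=203.0.113.1 interval=5s \
    up-script="/ip route set distance=12 [find distance=22]" \
    down-script="/ip route set distance=22 [find distance=12]"
```

WAN2 would repeat the same pattern with its own check host, gateway, and the distance pair 13/23.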
I would love to see the scripting for that. But in the meantime, could you confirm whether what I did is correct and, if it's not, kindly give me a hint as to how to correct it? The reason is that I know the script works. It has been tried and tested over long periods of time.
Basically, you set Netwatch to ping some host through the WAN. When there is a problem en route, Netwatch sees the host as down and runs the down script, and when the issue is resolved Netwatch runs the up script.
Exactly. Sorry, I hadn't time to sit at the computer to write more details. Using my phone mainly… But you understood well. Does it work for you, or do you need some additional help?
That's it. But a 5 minute interval looks quite long for failover, don't you think? I'm using 5 seconds. And you cannot use the same IP for checking more than one WAN. It is better to use some IP that is not important to you, so you don't mind if it is not accessible.
Yes, I have my static routes set up. You're right, the 5 minute interval was too long and I have since set it to 30 seconds. I may have found a small issue.
I am using bandwidth-based failover, so if something happened to WAN2 and the script saw that WAN1 was saturated and started sending data to WAN2, the client would start to get timeouts.
My question is: can the Netwatch script be directed to ping out a specific WAN port, so I can monitor each port individually?