fail over - ping from interface

spookman · March 6, 2011, 7:13am

Hi all

I have an RB750 configured with PCC and connected to two upstream providers. Everything works great, except that when one of the providers goes down, failover does not happen. This is because check-gateway only checks the next-hop gateway, and since I am connected through an ISP-controlled DSL modem, I can't configure the RB750 with the PPPoE details.

So I have written a script that periodically checks, say, Google, and if it can't reach it, it disables the main route and enables the secondary one. The problem is that on the next check it can reach Google again through the secondary route, so it fails back, and on the check after that it fails over again. So it flip-flops.
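A toy Python model of the flip-flop (an illustration, not RouterOS script) makes the cause visible. It assumes the probe leaves through whichever route is currently active. The hypothetical `pinned` flag models the fix: forcing the probe out of the primary link no matter which route is active, which is what specifying the interface or routing table for the ping is meant to achieve.

```python
# Toy model of the failover flip-flop (illustration only, not RouterOS script).
# Assumption: unless `pinned` is True, the probe follows whichever route is
# currently active, so once the backup is active the probe "succeeds" via it.

def simulate(checks, primary_up, backup_up=True, pinned=False):
    """Return the active route ('primary' or 'backup') after each check."""
    active = "primary"
    history = []
    for _ in range(checks):
        if pinned or active == "primary":
            reachable = primary_up   # probe goes out the primary link
        else:
            reachable = backup_up    # probe follows the backup route
        # Reachable -> restore main route; unreachable -> fail over.
        active = "primary" if reachable else "backup"
        history.append(active)
    return history


print(simulate(4, primary_up=False))               # ['backup', 'primary', 'backup', 'primary']
print(simulate(4, primary_up=False, pinned=True))  # ['backup', 'backup', 'backup', 'backup']
```

With the probe pinned to the primary link, a dead primary keeps failing the check, so the router stays on the backup instead of bouncing.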

Using the Winbox ping tool, I can specify the interface to ping from and I get the desired result, but if I do the same in the script, I get no response.

Running this from the console does not work:

/ping 196.25.1.1 interface=ether1 interval=2 routing-table=GW1 count=3

Fail-Over Script

 0   name="GW1" owner="admin" policy=ftp,reboot,read,write,policy,test,winbox,password,sniff,sensitive last-started=Mar/05/2011 22:37:45 run-count=25
     source=
       :local i 0; {:do {:set i ($i + 1)} while (($i < 3) && ([/ping 196.25.1.1 interval=2 routing-table=GW1 count=3 interface=ether1]<=1))};
       :if ($i>=3) do={
       :log info "WAN1 Down";
       /ip route disable [find comment=GW1];
       /ip route enable [find comment="Backup1"];
       } else={ :log info "WAN1 UP";
       /ip route enable [find comment=GW1];
       /ip route disable [find comment="Backup1"];
       }

IP ROUTE

 0 A S  ;;; GW1
        dst-address=0.0.0.0/0 gateway=10.0.0.1 gateway-status=10.0.0.1 reachable ether1 check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=GW1
1 X S  ;;; Backup1
        dst-address=0.0.0.0/0 gateway=10.0.1.1 gateway-status=10.0.1.1 reachable ether2 distance=10 scope=30 target-scope=10 routing-mark=GW1
2 A S  ;;; GW2
        dst-address=0.0.0.0/0 gateway=10.0.1.1 gateway-status=10.0.1.1 reachable ether2 check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=GW2
3 X S  ;;; Backup2
        dst-address=0.0.0.0/0 gateway=10.0.0.1 gateway-status=10.0.0.1 reachable ether1 distance=10 scope=30 target-scope=10 routing-mark=GW2
4 ADC  dst-address=10.0.0.0/24 pref-src=10.0.0.100 gateway=ether1 gateway-status=ether1 reachable distance=0 scope=10
5 ADC  dst-address=10.0.1.0/24 pref-src=10.0.1.100 gateway=ether2 gateway-status=ether2 reachable distance=0 scope=10
6 ADC  dst-address=10.10.0.0/24 pref-src=10.10.0.1 gateway=ether5 gateway-status=ether5 reachable distance=0 scope=10

IP FIREWALL MANGLE

add action=mark-connection chain=prerouting comment="" disabled=yes new-connection-mark=CM-GW1 passthrough=yes per-connection-classifier=src-address-and-port:2/0
add action=mark-connection chain=prerouting comment="" disabled=yes new-connection-mark=CM-GW2 passthrough=yes per-connection-classifier=src-address-and-port:2/1
add action=mark-connection chain=prerouting comment="CM for GW1" disabled=no in-interface=ether5 new-connection-mark=GW1 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0
add action=mark-connection chain=prerouting comment="CM for GW2" disabled=no in-interface=ether5 new-connection-mark=GW2 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1
add action=mark-connection chain=output comment="CM for GW1 - output" connection-mark=no-mark disabled=no new-connection-mark=GW1 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0
add action=mark-connection chain=output comment="CM for GW2 - output" connection-mark=no-mark disabled=no new-connection-mark=GW2 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1
add action=mark-connection chain=input comment="CM input GW1" connection-mark=no-mark disabled=no in-interface=ether1 new-connection-mark=GW1 passthrough=yes
add action=mark-connection chain=input comment="CM input GW2" connection-mark=no-mark disabled=no in-interface=ether2 new-connection-mark=GW2 passthrough=yes
add action=mark-routing chain=prerouting comment="RM for GW1" connection-mark=GW1 disabled=no in-interface=ether5 new-routing-mark=GW1 passthrough=yes
add action=mark-routing chain=prerouting comment="RM for GW2" connection-mark=GW2 disabled=no in-interface=ether5 new-routing-mark=GW2 passthrough=yes
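The PCC rules above split new connections by hashing each connection's addresses and ports and keeping the remainder: `both-addresses-and-ports:2/0` marks connections whose hash mod 2 is 0 as GW1, and `2/1` marks the rest as GW2. A rough Python model, assuming SHA-256 as a stand-in for RouterOS's own internal hash (the real hash differs), shows the key property: every packet of a given connection maps to the same gateway. The LAN address and ports in the example are hypothetical.

```python
import hashlib


def pcc_remainder(src_ip, src_port, dst_ip, dst_port, denominator=2):
    # Stand-in hash (assumption): RouterOS uses its own internal hash function.
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % denominator


def connection_mark(src_ip, src_port, dst_ip, dst_port):
    # Remainder 0 matches the "CM for GW1" rule, remainder 1 matches "CM for GW2".
    if pcc_remainder(src_ip, src_port, dst_ip, dst_port) == 0:
        return "GW1"
    return "GW2"


# Same 4-tuple, same mark, every time (hypothetical LAN host behind ether5).
print(connection_mark("10.10.0.50", 51515, "196.25.1.1", 80))
```

Because the decision depends only on the 4-tuple, a connection never switches gateways mid-stream, while many different connections spread across both.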

Any help would be appreciated.

janisk · March 7, 2011, 9:07am

The interface setting for ping does not work for IPv4; it is an IPv6-only feature, there so that you can ping link-local addresses. If it works for IPv4, that is only a coincidence, and it can (and will) break even without a reboot or version upgrade.
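The reasoning is that an IPv6 link-local address (fe80::/10) is only unique on its own link, so the same address can exist behind several interfaces and ping has to be told which one to use; an ordinary IPv4 destination is resolved through the routing table instead. A small Python sketch of that rule, with a hypothetical helper name:

```python
import ipaddress


def needs_interface(address: str) -> bool:
    """True if pinging `address` requires naming the outgoing interface.

    IPv6 link-local addresses (fe80::/10) are only unique per link, so the
    link must be given explicitly. Other destinations go through routing.
    """
    ip = ipaddress.ip_address(address)
    return ip.version == 6 and ip.is_link_local


print(needs_interface("fe80::1"))     # True
print(needs_interface("196.25.1.1"))  # False
```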

So you could try pinging an IPv6 link-local (LL) address instead.

spookman · March 8, 2011, 7:58am

Thanks for the help, guys. I found a solution that does not even require a script:

http://wiki.mikrotik.com/wiki/Advanced_Routing_Failover_without_Scripting
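As far as I understand the linked article, the idea is recursive routing: a host route sends a well-known remote address out through one specific ISP's gateway, and a default route uses that remote address as its gateway with check-gateway=ping. The default route goes inactive when the remote host stops answering through that ISP, and the next-best default route takes over. Because the host route pins the check to one ISP, it cannot succeed through the backup, which avoids the flip-flop. A Python model of just the selection step, using hypothetical documentation addresses as check targets:

```python
from dataclasses import dataclass


@dataclass
class DefaultRoute:
    name: str
    check_target: str  # remote host reachable only via this ISP (pinned host route)
    distance: int


def active_route(routes, reachable):
    """Lowest-distance default route whose check target answers, else None.

    `reachable` maps check target -> bool and stands in for the result of
    check-gateway=ping over the pinned host route.
    """
    usable = [r for r in routes if reachable.get(r.check_target, False)]
    if not usable:
        return None
    return min(usable, key=lambda r: r.distance).name


routes = [
    DefaultRoute("via ISP1", "192.0.2.1", 1),     # hypothetical check target
    DefaultRoute("via ISP2", "198.51.100.1", 2),  # hypothetical check target
]
print(active_route(routes, {"192.0.2.1": True, "198.51.100.1": True}))   # via ISP1
print(active_route(routes, {"192.0.2.1": False, "198.51.100.1": True}))  # via ISP2
```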

jcremin · May 2, 2011, 1:29pm

Just came across this post and wanted to share what I use.

I have the following script, scheduled to run every 10 seconds:

{
# Globals keep their values between scheduled runs
:global bandwidthsource
:global switchseconds

# Keep the timer between 0 and 60
:if ($switchseconds>60) do={:set switchseconds 60}
:if ($switchseconds<0) do={:set switchseconds 0}

# Timer ran down while on MAIN: make the backup route preferred
:if ($bandwidthsource="MAIN" && $switchseconds<=10) do={
:set bandwidthsource "BACKUP";
/ip route set [find comment="Backup Connection"] distance=5;
:log error "SWITCHED TO BACKUP BANDWIDTH"}

# Primary has answered long enough while on BACKUP: switch back
:if ($bandwidthsource="BACKUP" && $switchseconds>=50) do={
:set bandwidthsource "MAIN";
/ip route set [find comment="Backup Connection"] distance=15;
:log warning "SWITCHED TO PRIMARY BANDWIDTH"}

# On MAIN: count down when both pings fail, count back up otherwise
:if ($bandwidthsource="MAIN" && $switchseconds>10) do={
:if ([/ping x.x.x.x count=2]=0) do={:set switchseconds ($switchseconds - 10);
:log error "SWITCHING TO BACKUP BANDWIDTH IN $switchseconds SECONDS"} else={:set switchseconds ($switchseconds + 10);
:if ($switchseconds<60) do={:log error "SWITCHING TO BACKUP BANDWIDTH IN $switchseconds SECONDS"}}}

# On BACKUP: count up only when both pings succeed
:if ($bandwidthsource="BACKUP" && $switchseconds<50) do={
:if ([/ping x.x.x.x count=2]=2) do={:set switchseconds ($switchseconds + 10);
:log warning "PRIMARY BANDWIDTH UP FOR $switchseconds SECONDS.  WILL SWITCH WHEN IT GETS TO 60 SECONDS"} else={:set switchseconds ($switchseconds - 10);
:if ($switchseconds>0) do={:log warning "PRIMARY BANDWIDTH UP FOR $switchseconds SECONDS.  WILL SWITCH WHEN IT GETS TO 60 SECONDS"}}}

}
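To check how the timer behaves, here is a Python transcription of one scheduled run of the script above (a model of its logic, not RouterOS code). It assumes `ping_replies` is the number of replies (0 to 2) that `ping count=2` returns on that run. The blocks execute in order within a single run, so the run that switches to backup also does the backup-side ping.

```python
def run_once(state, ping_replies):
    """One 10-second run of the failover script; mutates and returns state."""
    # Clamp the timer to 0..60 at the start of each run
    state["seconds"] = max(0, min(60, state["seconds"]))

    if state["source"] == "MAIN" and state["seconds"] <= 10:
        state["source"] = "BACKUP"      # backup route distance -> 5
    if state["source"] == "BACKUP" and state["seconds"] >= 50:
        state["source"] = "MAIN"        # backup route distance -> 15

    if state["source"] == "MAIN" and state["seconds"] > 10:
        # On MAIN: count down only when both pings fail
        state["seconds"] += -10 if ping_replies == 0 else 10
    if state["source"] == "BACKUP" and state["seconds"] < 50:
        # On BACKUP: count up only when both pings succeed
        state["seconds"] += 10 if ping_replies == 2 else -10
    return state


def simulate_timer(replies, source="MAIN", seconds=60):
    """Return the active source after each run for a sequence of ping results."""
    state = {"source": source, "seconds": seconds}
    return [run_once(state, r)["source"] for r in replies]


# Primary down for six runs, then back up for six runs
print(simulate_timer([0] * 6 + [2] * 6))
```

In this model, five failed runs take the timer from 60 to 10 and the sixth run switches to backup; after five good runs bring it back to 50, the next run switches to main. A short blip such as `[0, 0, 2, 2]` never leaves MAIN.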

I have two routes to my bandwidth: my main source has a distance of 10 and the comment “Main Connection”, and my backup source has a distance of 15 and the comment “Backup Connection”. The basic function of the script is to ping a certain IP address every 10 seconds. Another route forces pings to that IP address to always go through the main connection. There is also a 60-second delay for switching between sources, to keep it from flopping back and forth too quickly. The timer starts at 60 seconds, and each time a ping to the IP address fails, it is reduced by 10 seconds. Once it reaches 0, the script changes the distance of the backup route (found by its comment) to 5, making it the preferred route. The script keeps running, and when pings start going through again, it simply counts back up to 60, then switches the distance back to 15.
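The probe route keeps working after the switch because a more specific route wins before distance is even compared; distance only decides between routes to the same destination prefix. A simplified Python model of that selection (it ignores gateway reachability and routing marks; the probe address and its host route are hypothetical stand-ins for x.x.x.x and the extra route described above):

```python
import ipaddress

PROBE = "203.0.113.10"  # hypothetical stand-in for the probe address x.x.x.x


def routing_table(backup_distance):
    # (comment, destination, distance)
    return [
        ("Main Connection", "0.0.0.0/0", 10),
        ("Backup Connection", "0.0.0.0/0", backup_distance),
        ("Probe via main", PROBE + "/32", 10),  # hypothetical pinned host route
    ]


def select_route(routes, dst):
    """Longest matching prefix wins; among equal prefixes, lowest distance wins."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in ipaddress.ip_network(r[1])]
    best = max(matches, key=lambda r: (ipaddress.ip_network(r[1]).prefixlen, -r[2]))
    return best[0]


print(select_route(routing_table(15), "8.8.8.8"))  # Main Connection
print(select_route(routing_table(5), "8.8.8.8"))   # Backup Connection
print(select_route(routing_table(5), PROBE))       # Probe via main
```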

I also have a second script that runs on startup to ensure that the bandwidth source and timer are reset:

:global switchseconds 60
:global bandwidthsource MAIN

It seems to work quite well for me, and it writes its messages to the system log, so I can go back and check whether it has switched back and forth at all. If the timer starts counting down and then back up again without switching, I know that something upstream dropped for a few seconds, but not long enough to trigger the switchover. If I see a lot of “SWITCHING TO BACKUP BANDWIDTH IN 50 SECONDS” messages, I know that I'm getting a lot of random packet loss, but not enough to cause any major problems.

The only real problem with this script is its dependence on the IP address you choose to ping. I'm using the IP of a server of mine in a datacenter, but once in a while the datacenter does something that knocks it offline, or has a small outage of its own. That causes me to switch to backup bandwidth even though nothing is actually wrong. A slightly better method would be for the script to check two IP addresses and require both to fail before changing the timer, but I haven't had time to figure that out yet.
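The two-target idea can be sketched in Python as a decision function that the timer would consult instead of a single ping result. It assumes each target has its own host route pinned to the main connection; the `min_failed` parameter is hypothetical and allows a k-of-n rule, with the default requiring every target to fail.

```python
def primary_failed(reply_counts, min_failed=None):
    """Decide whether the main link is down from several probe targets.

    reply_counts: replies received from each target on this run.
    min_failed: how many targets must get zero replies (default: all of them).
    """
    if not reply_counts:
        raise ValueError("need at least one probe target")
    if min_failed is None:
        min_failed = len(reply_counts)
    failed = sum(1 for n in reply_counts if n == 0)
    return failed >= min_failed


print(primary_failed([0, 2]))  # False: one server is down, the link is fine
print(primary_failed([0, 0]))  # True: both targets unreachable
```

In the RouterOS script, this check would take the place of the single `[/ping x.x.x.x count=2]=0` test in the MAIN block.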