Dual FailOver Script takes too long to switch over!

Posted: Thu Dec 15, 2016 9:30 pm
by Bumbaa
Hello, I am going to explain the scenario and what is happening. I have a dual-WAN setup with static routes, Wan#1 and Wan#2. By default, Wan#1 is the main WAN and Wan#2 is the failover WAN.

There is also an L2TP tunnel from the MikroTik to the server side, which is restarted when Wan#1 fails over to Wan#2. The problem is that when Wan#1 fails and traffic switches to Wan#2, the restarted tunnel sometimes takes 10-15 seconds to come up and is sometimes instant. Browsing behaves the same way: after Wan#1 fails over to Wan#2, it sometimes takes 10-15 seconds before I can browse and is sometimes instant. After the failover I can always ping immediately, both from the MikroTik and from my computer (ping in cmd), but I still can't browse or get the tunnel to connect right away.

When Wan#2 fails back to Wan#1, the tunnel always rebuilds right away and browsing works straight away too.

Why does the tunnel sometimes take 10-15 seconds to rebuild after Wan#1 fails over to Wan#2, and sometimes rebuild instantly (same with browsing), even though I can always ping a target?

Any help is appreciated.

Thanks,
Bumbaa

This is the script:
#########################################################
#########################################################
##      Dual Wan FailOver For Telephony V0.3           ##
##                                                     ##
##              Creator: Dimension10                   ##
##              Date: 14/12/16                         ##
##                                                     ##
#########################################################
#########################################################

#Wan interface name
:local InterfaceISP1 Wan#1

:local InterfaceISP2 Wan#2

#Routing Marks
:local Route1 Wan#1
:local Route2 Wan#2

# Gateway of ISP1 and ISP2

:local GatewayISP1 192.168.15.5

:local GatewayISP2 192.168.2.1

#Target to test gateway
:local PingTarget 8.8.8.8

#Stores the number of ping replies received
:local PingResult 

#L2TP Tunnel name
:local Tunnel Julien

#Up/down
:local Up 10

#Variable
:local Test1

#Global Variable to see if the script is running
:global Running

#Count of Checkping
:local Counter 0

########DO NOT MODIFY ANYTHING BELOW THIS COMMENT#########

#Check status of main Wan#1
:log warning "test"
:set PingResult [/ping $PingTarget interface=$InterfaceISP1 routing-table=$Route1 count=5]
:if (($PingResult = 5) || ($PingResult = 4)) do={
    #With 4 of 5 replies, test once more before deciding
    :if ($PingResult = 4) do={
        :set PingResult [/ping $PingTarget interface=$InterfaceISP1 routing-table=$Route1 count=5]
        :log warning "$PingResult"
        :if ($PingResult != 5) do={
            #Fail over: prefer the Wan#2 route and restart the tunnel
            :log warning "Increasing the route distance of interface $InterfaceISP1 - the main Wan will be $InterfaceISP2"
            :foreach i in=[/ip route find gateway=$GatewayISP1] do={ /ip route set $i distance=5 }
            :foreach i in=[/ip route find gateway=$GatewayISP2] do={ /ip route set $i distance=1 }
            /interface disable $Tunnel
            :delay 500ms
            /interface enable $Tunnel
            :log warning "Distance of route $InterfaceISP1 has been increased to 5 and tunnel restarted"
            #Stay on Wan#2 until the counter reaches 10: a fully answered test adds 1, any other result subtracts 1 (never below 0)
            :while ($Counter < 10) do={
                :if ([/ping $PingTarget interface=$InterfaceISP1 routing-table=$Route1 count=5] = 5) do={
                    :set Counter ($Counter + 1)
                } else={
                    :if ($Counter > 0) do={ :set Counter ($Counter - 1) }
                }
                :log warning "$Counter"
            }
            #Fail back: prefer the Wan#1 route again and restart the tunnel
            :log warning "Restoring route distance of $InterfaceISP1"
            :foreach i in=[/ip route find gateway=$GatewayISP1] do={ /ip route set $i distance=1 }
            :foreach i in=[/ip route find gateway=$GatewayISP2] do={ /ip route set $i distance=5 }
            :delay 1
            /interface disable $Tunnel
            :delay 1
            /interface enable $Tunnel
        }
    }
} else={
    #Fewer than 4 replies: fail over immediately (same steps as above)
    :log warning "Increasing the route distance of interface $InterfaceISP1 - the main Wan will be $InterfaceISP2"
    :foreach i in=[/ip route find gateway=$GatewayISP1] do={ /ip route set $i distance=5 }
    :foreach i in=[/ip route find gateway=$GatewayISP2] do={ /ip route set $i distance=1 }
    /interface disable $Tunnel
    :delay 500ms
    /interface enable $Tunnel
    :log warning "Distance of route $InterfaceISP1 has been increased to 5 and tunnel restarted"
    :while ($Counter < 10) do={
        :if ([/ping $PingTarget interface=$InterfaceISP1 routing-table=$Route1 count=5] = 5) do={
            :set Counter ($Counter + 1)
        } else={
            :if ($Counter > 0) do={ :set Counter ($Counter - 1) }
        }
        :log warning "$Counter"
    }
    :log warning "Restoring route distance of $InterfaceISP1"
    :foreach i in=[/ip route find gateway=$GatewayISP1] do={ /ip route set $i distance=1 }
    :foreach i in=[/ip route find gateway=$GatewayISP2] do={ /ip route set $i distance=5 }
    :delay 1
    /interface disable $Tunnel
    :delay 1
    /interface enable $Tunnel
}


My Mikrotik cfg:
# software id = LYZE-IR97
#
/interface ethernet
set [ find default-name=ether2 ] name=Phone-Port
set [ find default-name=ether1 ] name=Wan#1
set [ find default-name=ether3 ] name=Wan#2
set [ find default-name=ether4 ] master-port=Phone-Port name=ether4-slave-local
set [ find default-name=ether5 ] master-port=Phone-Port name=ether5-slave-local
/interface l2tp-client
add connect-to=184.107.96.115 max-mru=1410 max-mtu=1410 name=Luminet password=\
    user=Julien
set [ find default=yes ] supplicant-identity=MikroTik
/ip dhcp-server option
add code=66 name=Polycom value="'tftp://172.24.0.1'"
add code=42 name=DHCP value="'172.24.0.1'"
/ip dhcp-server option sets
add name=set1 options=Polycom,DHCP
/ip hotspot profile
set [ find default=yes ] html-directory=flash/hotspot
/ip pool
add name=dhcp ranges=172.16.
/ip dhcp-server
add address-pool=dhcp disabled=no interface=Phone-Port name=default
/snmp community
add addresses=67/32 name="$"
/ip address
add address=172.16.156.1/24 comment="default configuration" interface=\
    Phone-Port network=172.16.156.0
/ip cloud
set ddns-enabled=yes
/ip dhcp-client
add add-default-route=no comment="default configuration" dhcp-options=\
    hostname,clientid disabled=no interface=Wan#1
add add-default-route=no dhcp-options=hostname,clientid disabled=no interface=\
    Wan#2
/ip dhcp-server network
add address=172.16.156.0/24 comment="default configuration" dhcp-option=Polycom \
    gateway=172.16.156.1 netmask=24
/ip dns
set allow-remote-requests=yes
/ip dns static
add address=172.16.156.1 name=router
/ip firewall address-list
add address=66. list=access
add address=66.1 list=access
add address=172.16.200.0/24 list=access
/ip firewall nat
add action=masquerade chain=srcnat comment="default configuration" \
    out-interface=all-ethernet
/ip firewall service-port
set ftp disabled=yes
set tftp disabled=yes
set irc disabled=yes
set h323 disabled=yes
set sip disabled=yes
set pptp disabled=yes
/ip route
add distance=1 gateway=192.168.15.5 routing-mark=Wan#1
add distance=1 gateway=192.168.2.1 routing-mark=Wan#2
add distance=1 gateway=192.168.2.1
add distance=1 gateway=192.168.15.5
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www address=0.0.0.0/0,190/32,7.30/32
set ssh disabled=yes
set api disabled=yes
set winbox address=172.16.200.0/24,30/32,66.158.190/32
set api-ssl disabled=yes
/system clock
set time-zone-name=America/Toronto
/system identity
set name=Labo
/system logging
add topics=script
/system routerboard settings
set cpu-frequency=850MHz protected-routerboot=disabled
/system scheduler
add interval=1s name=schedule1 on-event=script1 policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive start-time=\
    startup
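
A side note on the script and scheduler above: the scheduler starts script1 every second, while a single run can last much longer (each ping test sends 5 packets, and the failover branch loops until Wan#1 is stable again), so several copies may end up running at once. The script declares a Running global but never sets it. Below is a minimal sketch of how that flag could serve as a guard, assuming overlapping runs are not wanted. It is not part of the original script, and the log texts are placeholders.

```
:global Running
# An unset global has type "nothing"; treat that as "not running"
:if ([:typeof $Running] = "nothing") do={ :set Running false }

:if ($Running = false) do={
    :set Running true
    :do {
        :log info "WAN check started"
        # ... the WAN check and failover logic would go here ...
    } on-error={ :log warning "WAN check aborted with an error" }
    # Clear the flag even if the body failed, so the next run is not blocked
    :set Running false
}
```

Globals are lost on reboot, so the flag starts out unset again after a restart.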

Re: Dual FailOver Script takes too long to switch over!

Posted: Fri Dec 16, 2016 4:39 pm
by mrz
Because there are still old connection tracking entries and route cache entries that need to time out.

Re: Dual FailOver Script takes too long to switch over!

Posted: Fri Dec 16, 2016 6:01 pm
by Bumbaa
Hello, thanks for the fast reply!

OK, cool. Is there a way to force the timeout or refresh the entries with a command, or is there no option other than to wait?

This is a picture of the log when Wan#1 fails over to Wan#2. You can see that the tunnel is still trying to connect from the Wan#1 port; it hasn't switched to the Wan#2 port.
DualWanSetup.jpg
What is the solution for this?

Cheers,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Fri Dec 16, 2016 9:00 pm
by pe1chl
Try to arrange your routing rules and firewall in such a way that those packets with wrong source address cannot go out and are rejected e.g. with "host unreachable".
Maybe this will make the L2TP setup fail quickly.

Re: Dual FailOver Script takes too long to switch over!

Posted: Mon Dec 19, 2016 3:56 pm
by Bumbaa
Isn't there a "flush route cache" command or something along those lines?


Thanks,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 12:45 am
by che
I had the same question once regarding a dual-WAN L2TP tunnel, and my workaround was to add a script routine (netwatch would work as well) that resets the connection tracking table on gateway switch events:
/ip firewall connection tracking set enabled=no
/ip firewall connection tracking set enabled=yes
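
Disabling and re-enabling connection tracking clears every tracked connection on the router. A narrower variant, sketched here as an assumption rather than something tested in this thread, removes only the entries that point at the L2TP server (184.107.96.115, taken from the l2tp-client config in the first post):

```
# Remove only the tracked connections towards the L2TP server.
# dst-address in the connection table includes the port, hence the ~ (regex) match.
/ip firewall connection remove [find dst-address~"184.107.96.115"]
```

Other connections stay in the table, so they are not interrupted by the cleanup itself.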

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 1:28 am
by Bumbaa
So you have a script routine/netwatch that detects when the Gateway switches and the script does what you quoted?

Just making sure I understood clearly and I will try it tomorrow.

Thanks,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 2:12 am
by che
I see you already have a scheduler that runs a script every second. You can just add those two lines I've posted to your script in the places where the code reacts to a dead gateway and where it comes back to the original state (I didn't examine your script).
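
As a sketch of where those two lines could go, here is the failover step from the script in the first post with the connection tracking reset placed after the route change and before the tunnel restart. The variable values are copied from that script; the placement itself is an assumption, since the script was not examined in this reply.

```
:local GatewayISP1 192.168.15.5
:local GatewayISP2 192.168.2.1
:local Tunnel Julien

# Prefer the Wan#2 default route
:foreach i in=[/ip route find gateway=$GatewayISP1] do={ /ip route set $i distance=5 }
:foreach i in=[/ip route find gateway=$GatewayISP2] do={ /ip route set $i distance=1 }

# Reset connection tracking so old entries tied to Wan#1 are dropped
/ip firewall connection tracking set enabled=no
/ip firewall connection tracking set enabled=yes

# Restart the L2TP client so it reconnects over the new route
/interface disable $Tunnel
:delay 500ms
/interface enable $Tunnel
```

The same two lines would go in the fail-back step, right after the distances are swapped back.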

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 7:12 pm
by Bumbaa
Sadly it did not work. The L2TP client is still sending from a dead wan port and it takes about 10-20 seconds for it to reset and realize that it should not be sending.


Thanks,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 7:14 pm
by mrz
You can try to disable route cache in ip settings. Maybe it helps
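
For reference, a sketch of that setting from the terminal, assuming a RouterOS v6 release where route-cache is available under /ip settings:

```
# Turn off the IPv4 route cache
/ip settings set route-cache=no
# Check the current value
/ip settings print
```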

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 8:05 pm
by Bumbaa
That did not do anything :( . To add more information: it switches over automatically if the tunnel is closed before Wan#1 is lost. If it isn't closed before Wan#1 loses its connection, the MikroTik waits for a timeout or something and keeps trying to send requests to the server side of the tunnel from the dead WAN port.

Any suggestion is welcomed.

Thanks,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 8:19 pm
by pe1chl
What did you do with my suggestion?

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 8:45 pm
by Bumbaa
Hey, I wasn't sure how to do it exactly and properly, so I did it this way:

I activated a firewall rule that drops all packets with source Wan#1 going to the L2TP server.

The tunnel client was still trying to re-establish the connection from the dead WAN port :/.

With the Wan#1 port disconnected, you could see the firewall still blocking packets from the dead WAN port trying to reach the L2TP server. After 10-20 seconds, the L2TP client started connecting from the backup Wan#2.

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 20, 2016 9:17 pm
by pe1chl
Don't use drop. Use reject.
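
A sketch of what such a reject rule might look like. The assumptions: the Wan#1 address comes from 192.168.15.0/24 (guessed from the 192.168.15.5 gateway; the real address is assigned by DHCP and is not shown in the thread), the L2TP server is 184.107.96.115 as in the posted config, and the tunnel packets are generated by the router itself, so the rule sits in the output chain.

```
# Reject, rather than silently drop, L2TP packets that try to leave via Wan#2
# while still carrying a Wan#1 source address
/ip firewall filter add chain=output action=reject reject-with=icmp-host-unreachable \
    src-address=192.168.15.0/24 dst-address=184.107.96.115 out-interface="Wan#2" \
    comment="reject L2TP packets leaving Wan#2 with a Wan#1 source address"
```

Whether the L2TP client reacts to the rejection any faster than to a drop is exactly what would need testing.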

Re: Dual FailOver Script takes too long to switch over!

Posted: Thu Dec 22, 2016 1:19 am
by Bumbaa
Will try it out tomorrow.

Thanks,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 27, 2016 3:50 pm
by Bumbaa
Sadly, it does the same thing with reject and continues to send tunnel requests from the dead WAN port's IP.

Cheers,
Bumbaa

Re: Dual FailOver Script takes too long to switch over!

Posted: Tue Dec 27, 2016 7:11 pm
by pe1chl
That is a pity. But you could ask MikroTik to put that on the list of things to fix. A "network unreachable" reply to an outgoing tunnel setup attempt should cause the tunnel to fail immediately, not after retrying.

However, I also think you are a bit too picky. Failover after 15 seconds is quite good; failover procedures usually take longer when they rely on missed keepalive packets. Failover is meant to restore service after a critical connection is interrupted, not to build a reliable internet service out of several crappy ones.