dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Mikrotik failover script strange behavior

Fri Dec 03, 2021 6:04 pm

Hi all!

I wrote a script for seamless failover (useful for WebRTC), and it switches from ISP1 to ISP2 flawlessly.
But I noticed something strange.
When I send 3 pings with an interval of 100ms, every 3 hours there is no response from the destination IP, so an unnecessary failover occurs.
When I send 3 pings with an interval of 200ms, there is no response from the destination IP every 6 hours.
What could be the cause of this?
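:local i false
:local j true
:local ipaddr 1.1.1.1

#ISP1 TTL must equal ISP2 TTL

# i = true while failed over to ISP2, j = true while on ISP1; the loop never ends
:do {
    # Fail over when, while still on ISP1, no distance=1 route exists,
    # or when none of the 3 pings sent out ether1 is answered
    :if ((($j = true) && ([:len [/ip route find distance=1]] = 0)) || ([/ping $ipaddr count=3 size=32 interval=200ms interface=ether1] = 0)) do={
        :if ($j = true) do={
            # Move ISP1's DHCP default route behind ISP2 and flush tracked connections
            /ip dhcp-client set [find interface=ether1] default-route-distance=3
            /ip firewall connection remove [find]
            :set i true
            :set j false
            :log warning "ISP1 down! Failover!"
        }
    } else={
        :if ($i = true) do={
            # ISP1 answers again: restore its default route priority
            /ip dhcp-client set [find interface=ether1] default-route-distance=1
            /ip firewall connection remove [find]
            :set i false
            :set j true
            :log warning "ISP1 up!"
        }
    }
} while=(true)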
(MikroTik does not document the return values of commands; for example, ping returns the number of successful pings.
I also managed to crash RouterOS 6.49 with a similar script.)
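A minimal sketch for correlating lost probes with the clock, assuming (as stated above) that `/ping` returns the number of replies received. The target and ping parameters are copied from the script; the log message text is arbitrary.

```routeros
:local ipaddr 1.1.1.1
# Number of replies received (0..3), per the undocumented return value noted above
:local replies [/ping $ipaddr count=3 size=32 interval=200ms interface=ether1]
:if ($replies < 3) do={
    # Record partial or total loss together with the router's clock time
    :log info ("probe to " . $ipaddr . ": " . $replies . "/3 replies at " . [/system clock get time])
}
```

Logging every lossy round, not only the ones that trigger a failover, shows whether the drops really land at fixed times of day.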
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 7:44 pm

Does this happen with any destination IP, or just 1.1.1.1? Have you tried making the destination IP an array, so you ping a different destination each time, and only fail over if all of the destinations come back with no reply? This will help with false positives.
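A sketch of that suggestion in RouterOS script, assuming three arbitrary public resolvers as targets (the list is hypothetical). The replies from all targets are summed, and ISP1 is treated as down only if no target answers via ether1.

```routeros
# Hypothetical probe targets
:local targets {1.1.1.1; 8.8.8.8; 9.9.9.9}
:local alive 0
:foreach t in=$targets do={
    # Add up the replies received from each target through ether1
    :set alive ($alive + [/ping $t count=2 interval=100ms interface=ether1])
}
:if ($alive = 0) do={
    # Only now would the failover commands from the script run
    :log warning "All probe targets unreachable via ether1"
}
```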
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 8:19 pm

Yes, this happens with any IP. I tried rotating 2 IP addresses (RouterOS does not have a swap command, so I need 3 variables to rotate them).
So if 3 pings to one IP fail, the script switches to the second IP. Also, if I ping the first IP 2 times and then the second IP 2 times, the same timeout happens every 3 hours if the sum of the ping intervals is less than 500ms; if it is greater, there is a timeout every 6 hours. But the timeouts do not depend on the time when the script was started.
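Rotating targets does not strictly need a swap or a third variable. A minimal sketch, assuming an array of two targets and an index advanced with the `%` (modulo) operator; the loop runs forever like the original script, and the log text is illustrative.

```routeros
:local targets {1.1.1.1; 8.8.8.8}
:local idx 0
:do {
    # Pick the current target from the array
    :local target ($targets->$idx)
    :local replies [/ping $target count=2 interval=200ms interface=ether1]
    :if ($replies = 0) do={ :log warning ("no reply from " . $target) }
    # Advance to the next target, wrapping around at the end of the array
    :set idx (($idx + 1) % [:len $targets])
} while=(true)
```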
If there is a timeout every 3 hours, the timeout moments are constant: 8:46:37, 11:46:37, 14:46:37, 17:46:37, 20:46:37, 23:36:47, 2:46:37, 5:46:37.
If the timeouts are every 6 hours: 9:54:06, 15:54:06, 21:54:06, 3:54:06.
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 8:47 pm

Is ICMP stateless on the MikroTik? (I know the protocol itself is.) Maybe you're filling up the table; if that is the case, maybe you can turn off connection tracking for ICMP or reduce its timeout.
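Two ways to act on that idea, as a sketch assuming the probe target 1.1.1.1 from the script and a RouterOS version that has the raw table: exempt the router's own probes from connection tracking, or shorten the ICMP tracking timeout. The 5s value is an arbitrary example; check the current setting with `/ip firewall connection tracking print` before changing it.

```routeros
# Option 1: exempt the router's own ICMP probes to 1.1.1.1 from connection tracking
/ip firewall raw add chain=output protocol=icmp dst-address=1.1.1.1 action=notrack comment="failover probe"
# Option 2: keep tracking ICMP but expire the entries sooner (example value)
/ip firewall connection tracking set icmp-timeout=5s
```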
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 9:07 pm

I do not know how to check whether ICMP is stateless or not.
I have fasttrack enabled by default, and I created a reject filter for 1.1.1.1 for testing, which is disabled.
Also, I have a route to 1.1.1.1 in /ip route.

Edit: I disabled fasttrack. Waiting to see if something pops up in the log.
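For reference, one way to check whether ICMP is being tracked is to look for ICMP entries in the connection table while the script runs. The second command disables every filter rule whose action is fasttrack-connection, which is what the edit above describes doing.

```routeros
# Show tracked ICMP connections; entries for 1.1.1.1 mean the probes are being tracked
/ip firewall connection print where protocol=icmp
# Disable every fasttrack rule in the filter table
/ip firewall filter disable [find action=fasttrack-connection]
```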
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 10:59 pm

OK, so there was no unnecessary failover at 21:54:06. It seems that fasttrack tracks ICMP packets. So perhaps I can enable fasttrack and delete the route to 1.1.1.1 in /ip route, and also the reject filter for 1.1.1.1. When you have the interface option set in ping, you do not need the additional route.
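A sketch of that cleanup, assuming the test route was added as 1.1.1.1/32 and the reject rule matches on dst-address=1.1.1.1. Both matchers are guesses about how those entries were created, so check them with `print` first.

```routeros
# Remove the test host route to 1.1.1.1 (assumed to be a /32 route)
/ip route remove [find dst-address=1.1.1.1/32]
# Remove the disabled reject rule for 1.1.1.1
/ip firewall filter remove [find dst-address=1.1.1.1 action=reject]
# The probe stays on ISP1 through the interface parameter instead of a route
/ping 1.1.1.1 count=3 interval=200ms interface=ether1
```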

Tomorrow I will try to reduce the ping interval to 100ms and will post the results here. I will leave the current config as is, just in case I missed something.
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 11:16 pm

If this fixes the problem, I would recommend creating a rule that fasttracks everything except ICMP, if possible. Or, even better, find out which table you are overflowing and either enlarge it or reduce its timeout.
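A sketch of both suggestions. As far as I know, fasttrack only applies to forwarded traffic (the default fasttrack rule lives in the forward chain), so pings originated by the router itself pass through the output chain and should not be fasttracked in the first place. The rule below excludes ICMP and would need to be moved to where the existing fasttrack rule sits; the `print` shows the connection-tracking settings, which in the versions I have seen include the current and maximum number of tracked entries.

```routeros
# Fasttrack established/related forwarded connections, except ICMP
/ip firewall filter add chain=forward action=fasttrack-connection connection-state=established,related protocol=!icmp comment="fasttrack except ICMP"
# Show connection-tracking settings and counters
/ip firewall connection tracking print
```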
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Fri Dec 03, 2021 11:28 pm

Thanks, I will research this and post the results of the tests.
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Sat Dec 04, 2021 11:26 pm

I was wrong. The unnecessary failover returned at 3:54:04, and again 6 hours after that. I reduced the sum of the ping intervals below 500ms, enabled fasttrack, and got the same results, now every 3 hours.
I enabled the debug, route, interface and firewall logging topics, and I concluded that the ping timeout does not come from the MikroTik router, but rather from my Cisco modem/router or from the internet.
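For anyone reproducing this, the logging described above can be set up with one memory rule per topic. A sketch; the debug topic is very verbose, so the rules are best removed afterwards.

```routeros
# One rule per topic, all written to the in-memory log
/system logging add topics=route action=memory
/system logging add topics=interface action=memory
/system logging add topics=firewall action=memory
/system logging add topics=debug action=memory
```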

Here is a screenshot of the log: https://imgur.com/knWDcmL

How could I mitigate this problem? Obviously I need to check the time, and let a series of pings to the test IP fail. Any suggestions?
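One way to let a series of pings fail without checking the time is to require several consecutive failed rounds before failing over. A minimal sketch, assuming a hypothetical threshold of 2 rounds; each extra round adds detection delay, which works against the sub-second goal.

```routeros
:local ipaddr 1.1.1.1
# Hypothetical: number of consecutive failed rounds required before failing over
:local threshold 2
:local failures 0
:do {
    :if ([/ping $ipaddr count=3 size=32 interval=100ms interface=ether1] = 0) do={
        :set failures ($failures + 1)
    } else={
        # Any reply resets the counter
        :set failures 0
    }
    # Act once, on the round that reaches the threshold
    :if ($failures = $threshold) do={
        # The failover commands from the original script would go here
        :log warning "ISP1 down after consecutive failed rounds, failing over"
    }
} while=(true)
```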
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Sat Dec 04, 2021 11:43 pm

Can you run mtr for 6 hours at the same 200ms interval and see what you find?
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 12:06 am

I can run MikroTik Traceroute for 6 hours, but I am not an expert; I have already invested 2 months in this script, for which experts wanted 500 euros from me.
I am getting impatient.
OK, sorry, I won't do this now, but tomorrow before 9am. Please describe the details of mtr or provide a link.
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 12:29 am

Here is a newer, commercial version of what I am asking you to do.

It's like ping and traceroute all in one, so we will know where the ping fails.

https://www.pingplotter.com/products/free.html
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 12:42 am

Thanks, I will try this. I will also research mtr. Sorry if I am impatient, but I did not anticipate these problems, because these problems should not exist. Commercial LABS solutions won't help, nor will their price tag.
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 11:00 pm

Ok, I did the traceroute, here are the results:
(Screenshot of the traceroute results, not preserved.)
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 11:05 pm

So it looks like you have some issues upstream, but nothing that would cause your problem.

So it appears you only have a problem when the pings come from the router, not when they pass through the router. That still leads me to believe some kind of table or memory limit is filling up in the router itself. Have you looked into doing the failover with recursive routes instead?

viewtopic.php?f=23&t=157048
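A sketch of the recursive-route setup discussed in the linked thread, using placeholder gateways 192.168.1.1 (ISP1) and 192.168.2.1 (ISP2) that would need to match the real ones. With ISP1 on DHCP, the DHCP client's own default route would also have to be turned off (for example with add-default-route=no). As I understand it, check-gateway=ping only probes every 10 seconds, so on its own this will not reach sub-second failover.

```routeros
# Host route: reach the check target 1.1.1.1 only through ISP1's gateway (placeholder)
/ip route add dst-address=1.1.1.1/32 gateway=192.168.1.1 scope=10
# Primary default route, resolved recursively through 1.1.1.1 and monitored by ping
/ip route add dst-address=0.0.0.0/0 gateway=1.1.1.1 target-scope=11 check-gateway=ping distance=1
# Backup default route through ISP2 (placeholder gateway)
/ip route add dst-address=0.0.0.0/0 gateway=192.168.2.1 distance=2
```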
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 11:44 pm

Thanks for your answer. It seems that you are correct. My guess is that ping somehow causes a stack overflow, because the command's return value must be stored on the stack. Modern C-like languages compare values on the stack; they do not pop the values off into registers.

Another thing I forgot to mention, because I thought it was irrelevant: at first, when I set the ping interval to 100ms and the count to 4, I got failovers at or near 8:45:37 and then every 3 hours, but when I reduced the count to 3 and increased the interval to 200ms, I had failovers every 6 hours from 9:54:06. This would suggest that, again, something in the stack is filling up. I did not think the ping count had anything to do with this; I focused only on the time interval. I will try to replicate this.

I could bypass this unnecessary failover by checking, whenever a failover is triggered, whether the time is between 9:54:03 and 9:54:06. If it is, do nothing (no failover); if not, fail over. The same goes for the other moments.
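A sketch of that time check, assuming the window start times below (the 6-hour pattern reported earlier, minus 3 seconds) and a 3-second window. The array and log messages are hypothetical, and the actual failover commands are elided.

```routeros
# Hypothetical window start times; each window lasts 3 seconds
:local windows {"03:54:03"; "09:54:03"; "15:54:03"; "21:54:03"}
:local now [/system clock get time]
:local suppress false
:foreach w in=$windows do={
    :local start [:totime $w]
    # Inside [start, start + 3s]: treat the lost pings as the known glitch
    :if (($now >= $start) && ($now <= ($start + 3s))) do={ :set suppress true }
}
:if ($suppress = true) do={
    :log info "Probe loss inside a known window, failover skipped"
} else={
    # The failover commands from the original script would go here
    :log warning "ISP1 down! Failover!"
}
```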

I know how to do recursive routing. If I implemented it in a script to achieve sub-1-second failover, I would have the same overflow problem.

I need seamless failover, to have smooth video call, in case I have a power or internet outage.
 
smyers119
Member Candidate
Posts: 232
Joined: Sat Feb 27, 2021 8:16 pm
Location: USA

Re: Mikrotik failover script strange behavior

Sun Dec 05, 2021 11:52 pm

You're only going to get that kind of seamless failover from an SD-WAN solution. I use ZeroTier with a gateway in the cloud for something similar.
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Mon Dec 06, 2021 12:18 am

Yes, I know about SD-WAN and cloud routers, but the problem is that I intend to do online teaching with a company that does not accept VPNs, and I have already been accepted with my current (ISP1) internet IP range and location. Because of this, I do not have the option of using a service with a different location or a cloud location for failover. Also, a yearly LTE plan costs me more, has a different IP range and is slower.
So I bought this MikroTik router because I had heard good things about it, and decided to program it (it took me 2 months) to do sub-1-second failover. I can do failover in under 100ms and take over my WebRTC session only if ISP1 TTL = ISP2 TTL, because ISP2's TTL is higher. So I will investigate this overflow; I will try to mitigate it by trial-and-error command execution (perhaps something clears the stack) or by checking the system time on every failover event.
I mean, I could use this script as it is, but I do not want unnecessary failovers, because they would expose my secondary IP. I could also send supout.rif to MikroTik.
 
dave12
newbie
Topic Author
Posts: 31
Joined: Sat Oct 09, 2021 2:35 pm

Re: Mikrotik failover script strange behavior

Tue Dec 07, 2021 2:28 pm

I did not manage to replicate the unnecessary failover at 8:46:37 repeating every 3 hours. The ping count does not affect the moment of failover.

I concluded that if the sum of the ping intervals is below 500ms, there is an unnecessary failover every 3 hours; if the sum is over 500ms, every 6 hours.
The time at which the failover occurs seems to be determined by the script start time or something similar.

Now I have a failover every 24 hours at 6:54:06. Clearly the script is causing an overflow somewhere in RouterOS; I do not think it is a stack overflow. I am out of ideas and patience.

I won't reboot the router (it is running off a 12V battery; interestingly, at 12V it heats up much less than at 24V), and I will send supout.rif to MikroTik.
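For reference, the support file can be generated from the terminal with the command below; the resulting supout.rif appears in the router's Files list.

```routeros
# Writes supout.rif to the router's file storage
/system sup-output
```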
