Failover based on bandwidth, not on route gateway ping

Hi Forum Gurus,

Sometimes an internet service fails only partially: the gateway still answers pings, so the router thinks all is well, while the link turns into a severe bottleneck that causes massive timeouts for our customers.

The idea is to use bandwidth-based load balancing for both load balancing and failover.

My idea is to combine some kind of timer with some kind of traffic monitor to achieve this.

For example:
ISP1 has a partial failure that lets only a few bytes of data through. A traffic-monitoring script sees that only a few bytes got through during a set time period. When that period expires, the script changes a key load-balancing mangle rule and forces all traffic out through the good ISP2 until ISP1 has been fully restored.

Any thoughts? Is there a script out there that can monitor bandwidth consumption and adjust the mangle rule accordingly?

Thanks in advance…

Just an idea: put another MikroTik device somewhere outside your network and have a script regularly test both WANs against it, if that is possible.

Thanks for the response, Jarda, but I don’t understand.

Any Takers?

I was thinking of a script to do the following:

The set bitrate is the minimum threshold below which the ISP service is considered down.

step 1: check the bitrate at 2-second intervals and compare it against the set bitrate
outcome 1: if higher, repeat step 1
outcome 2: if lower, record the bitrate, wait 4 seconds, then go to step 2
step 2: check the bitrate again and compare it against the value recorded in step 1, outcome 2
outcome 1: if above the set bitrate, go back to step 1
outcome 2: if below the set bitrate, go to step 3
step 3: check the bitrate at 1-second intervals for 15 seconds
outcome 1: if it is above the set bitrate at the end of the 15 seconds, go to step 1
outcome 2: if it is below the set bitrate at the end of the 15 seconds, save that bitrate, change the mangle rule, and go to step 4
step 4: check the bitrate at 5-second intervals and compare it against the bitrate saved in step 3, outcome 2
outcome 1: if higher, change the mangle rule back and go to step 1
outcome 2: if lower, repeat step 4
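The steps above can be approximated with a scheduled script that keeps counters in global variables. This is a minimal sketch, not a finished solution: it assumes the interface is named WAN1, the mangle rule carries the comment "Load-Balancing here", and the routing marks are wan1_rout/wan2_rout (names that appear later in this thread). The 50 kbit/s threshold, the 10 low samples needed to fail over, the 5 good samples needed to revert, and the script and variable names are all hypothetical. Instead of separate 4 s, 15 s and 5 s stages, it simply requires a run of consecutive samples in each direction.

```routeros
# Hypothetical script "wan1-check", run by the scheduler every 2 seconds.
/system script add name=wan1-check source={
  :global wan1LowCount
  :global wan1GoodCount
  :global wan1Failed
  # Globals are empty after a reboot, so initialise them on the first run
  :if ([:typeof $wan1LowCount] != "num") do={ :set wan1LowCount 0 }
  :if ([:typeof $wan1GoodCount] != "num") do={ :set wan1GoodCount 0 }
  :if ([:typeof $wan1Failed] != "bool") do={ :set wan1Failed false }

  # Hypothetical threshold in bits per second
  :local threshold 50000
  :local rate 0
  /interface monitor-traffic interface=WAN1 once do={ :set rate $"rx-bits-per-second" }

  # Count consecutive low or good samples; one sample of the other kind resets the run
  :if ($rate < $threshold) do={
    :set wan1LowCount ($wan1LowCount + 1)
    :set wan1GoodCount 0
  } else={
    :set wan1GoodCount ($wan1GoodCount + 1)
    :set wan1LowCount 0
  }

  # 10 low samples at 2 s intervals is roughly 20 s below the threshold
  :if ((!$wan1Failed) && ($wan1LowCount >= 10)) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=wan2_rout
    :set wan1Failed true
    :log warning "FO Debug: wan1 below threshold for ~20s, switching to wan2"
  }
  # 5 good samples (roughly 10 s) before moving traffic back
  :if ($wan1Failed && ($wan1GoodCount >= 5)) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=wan1_rout
    :set wan1Failed false
    :log warning "FO Debug: wan1 restored, switching back"
  }
}
/system scheduler add name=wan1-check interval=2s on-event="/system script run wan1-check"
```

One caveat: once new connections are moved off WAN1, its received rate may stay low simply because little is routed there any more, so the restore branch needs some traffic (for example a probe) still flowing through WAN1.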

My questions: can a script store a small piece of data on the MikroTik router and retrieve it later for comparison? And how would I turn this into a script? I am very green when it comes to writing scripts, so I need an expert's help.

thanks
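On the question of storing data: a value kept in a `:global` variable stays in the router's memory until reboot, and any later script that declares the same name with `:global` can read it back. A minimal sketch, assuming an interface named WAN1; the variable name wan1SavedRate is hypothetical.

```routeros
# Script A: take one sample of WAN1's receive rate and keep it in a global variable.
:global wan1SavedRate
/interface monitor-traffic interface=WAN1 once do={ :set wan1SavedRate $"rx-bits-per-second" }

# Script B, run later: declare the same global to read the stored value back.
:global wan1SavedRate
:log info ("Stored WAN1 rate: " . $wan1SavedRate . " bps")
```

The current globals can be listed with `/system script environment print`; they do not survive a reboot.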

Hi,
I meant this: put a MikroTik device somewhere with good internet throughput, but outside your network.
Then you can measure the achievable speed to that device with the bandwidth test. You can store the results in variables and set the WAN1:WAN2 ratio accordingly.

You also have to account for the actual traffic generated by your network, which will influence your measurements.
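The measurement step could look roughly like this. It is a sketch under several assumptions, all hypothetical: 203.0.113.10 is the remote MikroTik with its bandwidth-test server enabled, a /32 static route sends traffic to it out of WAN1, and btest/secret are its credentials. Measuring WAN2 needs its own path to the test device, and turning the two results into a WAN1:WAN2 ratio is left out.

```routeros
{
  :local wan1Rate 0
  # The do={} block runs repeatedly while the 10-second test is in progress; keep the latest average
  /tool bandwidth-test address=203.0.113.10 user=btest password=secret protocol=tcp direction=receive duration=10s do={ :set wan1Rate $"rx-total-average" }
  :log info ("WAN1 bandwidth test average: " . $wan1Rate . " bps")
}
```

Because the test itself loads the link, running it too often will eat into the customers' bandwidth and, as noted above, the customers' traffic will skew the result.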

I have a similar setup: two WAN connections that I want to fail over between.

The first is very reliable and stable, but slow, and expensive to use if you exceed its low data cap.
The second is much, much cheaper to use and much, much faster, but about as stable as a wind vane.
We do not want load balancing at all, strictly failover.


Problem #1
In many cases I will need to use DHCP clients on the WAN connections, but unfortunately in RouterOS 6.12 on an RB2011 you cannot adjust the distance on a dynamically created route.

Problem #2
In all cases, the modems will present a gateway even if there is no connection to the internet past the modem, so check-gateway, as I understand it, doesn’t meet our needs.


Everything I’ve found on failover uses static IPs and check-gateway.

My Google-fu has failed me, and I’m putting my head through the wringer trying to develop a script that will handle this for me.

We plan to deploy about a hundred of these routers over the next two years, so I really need to find a solution before my boss decides to go with a more expensive, less capable product.

I can post my script if anyone wants it, but I’m hoping a smarter person than me has a better solution.

I can’t help feeling that this is something so glaringly obvious to experienced users that no one ever felt the need to document it.

LL

This probably belongs in another thread, as the problem is different: failover is not balancing. To your problems:

  1. Not true, it is possible. Look again at the DHCP client settings…
  2. You have to check internet reachability, not the gateway. So you need netwatch, and you change the default gateway distances when pings to the internet stop coming back. Search the forum or my old posts…
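Both points can be sketched as follows, assuming ether1 is the primary WAN and ether2 the backup, both running DHCP clients; the interface names, the 8.8.8.8 probe host and the 192.168.100.1 gateway address are hypothetical placeholders. The DHCP client's `default-route-distance` sets the distance of the dynamic default route it installs, and a netwatch probe pinned to WAN1 by a host route raises that distance when the internet behind the modem stops answering.

```routeros
# The primary WAN gets the lower distance, so its DHCP default route is preferred.
/ip dhcp-client set [find interface=ether1] default-route-distance=1
/ip dhcp-client set [find interface=ether2] default-route-distance=2

# Host route so the probe always leaves through WAN1 (placeholder gateway address).
/ip route add dst-address=8.8.8.8/32 gateway=192.168.100.1 comment="WAN1 probe"

# When the probe host stops answering, push WAN1's default route behind WAN2; restore it when pings return.
/tool netwatch add host=8.8.8.8 interval=10s timeout=2s down-script="/ip dhcp-client set [find interface=ether1] default-route-distance=3" up-script="/ip dhcp-client set [find interface=ether1] default-route-distance=1"
```

With DHCP the WAN1 gateway can change, so the host route has to be kept in step with the gateway the DHCP client actually receives.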

Anyone?

Have you already tried anything?

Yes. I ended up using the traffic monitor in the same way I do for my bandwidth-based load balancing.

Here is the traffic-monitor script:

/tool traffic-monitor
# Monitor1: WAN1 receive rate drops below the threshold -> send traffic out Data2
add comment="Data1->Data2 Failover" interface="WAN1" name=tmon5 threshold=50 traffic=received trigger=below \
    on-event=":log warning \"FO Debug: Data1 Failed Switching to Data2\"\r\n/ip firewall mangle set [find comment=\"Load-Balancing here\"] new-routing-mark=Data2-rout"
# Monitor2: WAN1 receive rate rises above the threshold -> send traffic back out Data1
add comment="Data2->Data1 Failover" interface="WAN1" name=tmon6 threshold=50 traffic=received trigger=above \
    on-event=":log warning \"FO Debug: Data1 OK\"\r\n/ip firewall mangle set [find comment=\"Load-Balancing here\"] new-routing-mark=Data1-rout"

This isn’t what I was looking for, but it works. What I really need is a timer combined with the traffic monitor that detects when the bandwidth has stayed below the set threshold for a designated amount of time, reacts accordingly, and reverts when service is restored. The point of the timer is to change the rule only when a partial failure occurs, not every time the bandwidth dips below the threshold. Without the timer the rule changes constantly, and I only want it to change when needed.

Thanks
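The timer described above can be approximated by letting the two traffic monitors only record the link state in a global variable and having a scheduler count how long that state has lasted. A minimal sketch, assuming the existing monitors tmon5/tmon6 on WAN1 and the Data1-rout/Data2-rout marks from the post; the script name wan1-timer, the globals wan1Low, wan1LowSecs and wan1Failed, and the 30-second hold time are hypothetical.

```routeros
# Make the existing monitors only record state instead of switching the route.
/tool traffic-monitor
set [find name=tmon5] on-event=":global wan1Low true"
set [find name=tmon6] trigger=above on-event=":global wan1Low false"

# Runs once per second and counts how long WAN1 has stayed below the threshold.
/system script add name=wan1-timer source={
  :global wan1Low
  :global wan1LowSecs
  :global wan1Failed
  :if ([:typeof $wan1LowSecs] != "num") do={ :set wan1LowSecs 0 }
  :if ($wan1Low = true) do={ :set wan1LowSecs ($wan1LowSecs + 1) } else={ :set wan1LowSecs 0 }
  # Fail over exactly once, after 30 consecutive seconds below the threshold
  :if ($wan1LowSecs = 30) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=Data2-rout
    :set wan1Failed true
    :log warning "FO Debug: Data1 low for 30s, switching to Data2"
  }
  # Revert once the monitor reports traffic above the threshold again
  :if (($wan1Low = false) && ($wan1Failed = true)) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=Data1-rout
    :set wan1Failed false
    :log warning "FO Debug: Data1 OK, switching back"
  }
}
/system scheduler add name=wan1-timer interval=1s on-event="/system script run wan1-timer"
```

Short dips reset the counter, so the rule only changes after a sustained drop. As with any receive-rate check, WAN1 still needs some traffic after the switch for tmon6 to ever see it recover.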

Is it possible to modify this script to get what I am looking for?

# RouterOS v6 names are used: rx-bits-per-second and target (older releases: received-bits-per-second, target-addresses)
/system script
add name="tl_down" source={
  :local d [/system clock get date]
  :local t [/system clock get time]
  /interface monitor-traffic [/interface find name="interface"] once do={
    :if ($"rx-bits-per-second" > 1048576) do={
      /queue simple add name=(username . "_" . [:pick $d 4 6] . "-" . [:pick $d 0 3] . "-" . [:pick $d 7 11] . "-" . [:pick $t 0 9]) limit-at=262144/262144 max-limit=262144/262144 target="ip address" priority=1
      :log warning "Traffic limit added for ip address/network/interface"
    }
  }
}
add name="tl_remove_down" source={
  /interface monitor-traffic [/interface find name="interface"] once do={
    :if ($"rx-bits-per-second" < 1048576) do={
      /queue simple remove [/queue simple find target="ip address"]
      :log warning "Traffic limit removed for ip address/network/interface"
    }
  }
}

Thanks
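Both scripts use `monitor-traffic ... once`, so each run takes a single sample and nothing happens unless something runs them repeatedly. A minimal sketch of scheduling them, with hypothetical scheduler names and a 10-second interval. To turn them into failover, the `/queue simple` commands would be replaced by the mangle `set` command used in the traffic monitors. Note that `tl_down` as written adds another queue on every run while traffic stays above the limit, because it never checks whether a queue already exists.

```routeros
# Run both checks every 10 seconds (scheduler names are hypothetical).
/system scheduler
add name=run-tl_down interval=10s on-event="/system script run tl_down"
add name=run-tl_remove_down interval=10s on-event="/system script run tl_remove_down"
```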

I am trying this script with my traffic monitor. Can anyone point out what’s wrong with it? I am really ignorant when it comes to scripting, so any help would be greatly appreciated.

:if ($duration = 5s) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=wan2_rout
    :log warning "FO Debug: wan1 Failed, switching to wan2"
}

Thanks
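Nothing in the snippet above defines `$duration`, so the condition cannot measure how long the link has been down. One way to get a 5-second requirement is to take five one-second samples in a loop and switch only if every one is low. This is a sketch, assuming the interface is WAN1, the same threshold=50 value as the earlier traffic monitors, and that it is run from a scheduler rather than from a traffic-monitor on-event; the restore direction would use the same pattern with `>` and wan1_rout.

```routeros
# Blocks for roughly 5 seconds per run.
{
  :local lowSamples 0
  :for i from=1 to=5 do={
    :local rate 0
    /interface monitor-traffic interface=WAN1 once do={ :set rate $"rx-bits-per-second" }
    # Count the samples that fall below the threshold
    :if ($rate < 50) do={ :set lowSamples ($lowSamples + 1) }
    :delay 1s
  }
  # Only switch if WAN1 was low for the whole window
  :if ($lowSamples = 5) do={
    /ip firewall mangle set [find comment="Load-Balancing here"] new-routing-mark=wan2_rout
    :log warning "FO Debug: wan1 Failed, switching to wan2"
  }
}
```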