Please forgive me if I don’t make much sense, I’m new to networking.
I was able to convince the bosses that we need an actual router rather than a home router. We went with MikroTik for the lower cost (vs. Cisco) and the amount of power you get. During setup I have run into a configuration question that I can’t figure out on my own. I have a Cloud Core Router that I would like to use for automated dual-WAN failover. I have looked over the failover script, and even tried to use it, but I am having difficulty getting it to function properly.
I have the router set up so that it can reach the internet through either ISP. However, I can only ping the internet through one ISP at a time: whichever route is currently active. With ISP_1 set to a distance of 1 and ISP_2 set to a distance of 2, I can ping from ISP_1 but not ISP_2 unless I either edit the distance of ISP_1 or disable that route (which then means I can’t ping from ISP_1). I have tried using a distance of 1 on ISP_2, but the same problem persists. I am not able to have more than one route active at the same time. This is a problem because the script needs to be able to ping out of both ISPs at the same time in order to function properly. When I ran the script, it increased the distance of ISP_2 because it thought the connection was down.
Typically you would ping the ISP gateway to test the link. If the ISP link networks have unique numbering, you will avoid this problem, since both connections will be active as far as their own link networks are concerned.
If a script forces which interface to use, then it can ping addresses using either ISP. However, if you are pinging from a client, the router will only use the currently active ISP. If you ping the ISP gateways, that traffic will go via the corresponding ISP interface, because the routes to those link nets do not overlap and so both can be active.
This is not what is happening on my router. I posted a screenshot in my first post. In the screenshot, ISP_1 is active and ISP_2 is not. I am not able to ping anything from the router on ISP_2 while it is not active.
It is my understanding that if ISP_1 should fail and the distance of its route is increased, the script will not be able to see when ISP_1 is back up, because it will not be able to ping out of the interface whose route is not active. When I manually set the ISP_1 route to a higher distance to test this theory, I am proven correct. http://i.imgur.com/Uc61WYQ.png
How do I make sure that when ISP_1 returns to normal service, the router will switch back to using it as the default route?
Try using the ISP gateway addresses as the test addresses. You should always be able to ping those using the relevant ISP connections when they are available even if the default route for that ISP is not active.
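For illustration, gateway-based checking can be sketched with check-gateway=ping on the default routes themselves. The gateway addresses 203.0.113.1 and 198.51.100.1 and the comments are placeholders, not values from this thread, so treat this as a rough sketch rather than a tested configuration.

```
/ip route
# Placeholder gateways: 203.0.113.1 (ISP_1) and 198.51.100.1 (ISP_2)
add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping comment="ISP_1 default"
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=2 check-gateway=ping comment="ISP_2 default"
```

With this, the router pings each gateway itself and marks a route inactive when its gateway stops answering, so the distance=2 route takes over. It only tests reachability of the gateway, not anything beyond it.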
Use some address on the internet for testing and create an individual static route for it. Then you can check it even when the default route is not active or has a high distance.
So our ISP had an outage last night and, predictably, the script didn’t detect it because it was checking the gateway, not an actual internet IP address. There wasn’t a problem with the gateway; it was something internal to the ISP.
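A minimal sketch of such per-ISP test routes, assuming two different test hosts so that each one is pinned to one ISP. The hosts 8.8.8.8 and 8.8.4.4 and the gateways 203.0.113.1 and 198.51.100.1 are placeholders chosen for the example, not values from this thread.

```
/ip route
# Placeholder test hosts and gateways; substitute your own
add dst-address=8.8.8.8/32 gateway=203.0.113.1 comment="test host via ISP_1"
add dst-address=8.8.4.4/32 gateway=198.51.100.1 comment="test host via ISP_2"

# Run from the router: each ping follows its /32 route,
# regardless of which default route is currently active
/ping 8.8.8.8 count=3
/ping 8.8.4.4 count=3
```

Because a /32 route is more specific than 0.0.0.0/0, it wins the route lookup for that one address no matter what distance the default routes have.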
I created a static route for an IP address as suggested, but it doesn’t seem to work when I test the ping on the gateway that is not active.
Well, in this case you need a second rule that sends the ping to a blackhole if the ISP has an outage.
For example:
/ip route
add check-gateway=ping comment="testing ip address" distance=1 dst-address=ip.ip.ip.ip/32 gateway=gw.gw.gw.gw
add comment="testip address - blackhole" distance=99 dst-address=ip.ip.ip.ip/32 type=blackhole

The blackhole rule is normally not active, as it has a high distance, but when the gateway is not accessible, the checking route becomes invalid and no longer counts. Without the blackhole, the normal default route (0.0.0.0/0) then decides where the checking ping is sent. The ping fails on the first try, so netwatch switches to the second default route. The testing address then becomes reachable through the second default route, so netwatch can “think” that the first route is up again and switch back to it. The result is flapping, with the default routes being switched back and forth.
If you add a blackhole for this specific address, you ensure that all packets to the testing address are sent either through the first WAN (if the gateway appears to be responding) or to the blackhole. That means no false-positive pings when the first WAN does not have access to the internet.
On the other hand, there could be a false negative if the testing address stops responding even though the WAN connection to the internet is fine. Then you have to change your testing IP…
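As a rough sketch of how netwatch could drive the switch once the test address is pinned to the first WAN: the host 8.8.8.8, the route comment "ISP_1 default", and the interval, timeout, and distance values are all assumptions made for this example, and the script presumes the first default route was created with that comment.

```
/tool netwatch
# Placeholder test host and route comment; adjust to your own setup
add host=8.8.8.8 interval=10s timeout=1s comment="ISP_1 internet check" \
    down-script="/ip route set [find comment=\"ISP_1 default\"] distance=3" \
    up-script="/ip route set [find comment=\"ISP_1 default\"] distance=1"
```

When the test host stops answering, the ISP_1 default route is pushed behind a distance=2 ISP_2 route. When it answers again, ISP_1 is restored as the preferred route.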
Actually, I apparently messed up when I created the static route the first time. Simply creating the static route (correctly) allowed the script to function as intended. Thank you for your input.