 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

most effective failover?

Sat Sep 17, 2022 7:47 pm

Which scenarios are you using for dual-WAN failover? Plain routes, iBGP, or something else?
Any kind of example would be helpful :)

I am trying to find a replacement for this kind of failover:
/ip route
add check-gateway=ping distance=1 gateway=8.8.8.8
add check-gateway=ping distance=2 gateway=8.8.4.4
add distance=2 dst-address=8.8.4.4/32 gateway=192.168.2.1 scope=10
add distance=1 dst-address=8.8.8.8/32 gateway=192.168.1.1 scope=10
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Sat Sep 17, 2022 8:19 pm

It depends on what kind of WAN you have - unless you've got public addresses also in LAN and your own AS number, the only thing you can do is to speed up the failure detection using the method described by means of some scripts pinging the canary addresses more frequently than in those hardcoded 10s intervals.

If you trust the availability of your favourite data center much more than the one of your ISP's uplinks, you can run a CHR in that datacenter, use OSPF and BFD to connect one tunnel per each WAN to that CHR, and do the NAT there.
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Sat Sep 17, 2022 9:43 pm

What is wrong with it?
What has changed in your requirements??
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Sun Sep 18, 2022 12:48 am

It depends on what kind of WAN you have - unless you've got public addresses also in LAN and your own AS number, the only thing you can do is to speed up the failure detection using the method described by means of some scripts pinging the canary addresses more frequently than in those hardcoded 10s intervals.

If you trust the availability of your favourite data center much more than the one of your ISP's uplinks, you can run a CHR in that datacenter, use OSPF and BFD to connect one tunnel per each WAN to that CHR, and do the NAT there.
I mostly have a public static IP on one WAN interface and a NAT-ed IP on the other.
The problem is when the gateway itself is reachable and pingable, but there is an ISP problem and traffic doesn't get beyond it. Then this kind of failover doesn't work, and that's why I'm trying to find something better.


maybe to add some netwatch?
/tool netwatch
add down-script="/ip route disable [find dst-address=0.0.0.0/0 gateway=8.8.8.8]\r\
    \n:log error \"ISP_1 is down\"\r\
    \n/ip firewall connection remove [find]\r\
    \n" host=1.1.1.1 interval=30s up-script="/ip route enable [find dst-address=0.0.0.0/0 gateway=8.8.8.8]\r\
    \n:log error \"ISP_1 is up\"\r\
    \n"
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Sun Sep 18, 2022 11:22 am

I mostly have a public static IP on one WAN interface and a NAT-ed IP on the other.
So the typical SOHO scenario with no own AS number and with two ISPs, i. e. the public addresses from which you connect to servers in the internet are different and thus a failover to the secondary uplink means that all existing sessions break down.

The problem is when the gateway itself is reachable and pingable, but there is an ISP problem and traffic doesn't get beyond it. Then this kind of failover doesn't work, and that's why I'm trying to find something better.
That's surprising - this setup (recursive routing where the "canary" (path transparency check) IP addresses are routed via the actual gateways and everything else is routed via the "canary" IPs) deals exactly with the issue you describe, i.e. that the actual gateway stays up but the network behind it loses connection to the rest of the internet. If that happens, the check-gateway ping stops getting responses from the canary IP (virtual gateway) and the route thus becomes inactive.

So if this "doesn't work" for you, something else must be broken (I can e.g. imagine that ping keeps getting through an uplink but other traffic doesn't), or "doesn't work" must mean something else than how I understand it.

maybe to add some netwatch?
...
Ah, the /ip firewall connection remove [find] in your netwatch script maybe gives a hint on what "doesn't work" means? Whereas TCP sessions time out eventually once the remote server stops responding, UDP sessions (IPsec, L2TP, SIP, ...) that get refreshed from the LAN side more frequently than once in 3 minutes stay stuck with the same reply-dst-address. If this is indeed the issue you need to address, do concentrate at that - use a scheduled script to remove these connections whenever the route through their respective WAN becomes inactive.

There is also a follow-up question - what to do when the WAN becomes available again. The answer to this one depends on the usage strategy of the WANs. If the strategy is load distribution, nothing needs to be done - connections that migrated to WAN B due to failure of WAN A may be left running via WAN B even after WAN A recovers. If the strategy is pure backup because WAN B is more expensive and/or offers less bandwidth than WAN A, the script has to remove connections from WAN B once WAN A recovers, but maybe after some guard time rather than immediately.

As compared to netwatch, a scheduled script gives you more flexibility in what it tracks. So e.g. it can inspect the current state of all the WANs at each run, compare it with the state detected during the previous one, and execute actions best matching the particular state change detected (there may be more than two WANs and more than one usage strategy).
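
A minimal sketch of such a scheduled script body, for the pure-backup case. It assumes the primary default route carries the hypothetical comment "ISP_1" and that 8.8.8.8 is pinned to the primary WAN by its own /32 route; both are illustrative choices, not taken from anyone's running config.

```routeros
# Body of a script to be run from /system scheduler every few seconds.
# Assumptions: the default route via the primary WAN has comment="ISP_1";
# 8.8.8.8 is reachable only through the primary WAN.
:local canary 8.8.8.8
:local replies [/ping $canary count=3]
:local rt [/ip route find comment="ISP_1"]
:local isDisabled [/ip route get $rt disabled]

# Canary lost while the route is still enabled: fail over
:if (($replies = 0) && (!$isDisabled)) do={
    /ip route disable $rt
    # drop tracked connections so they re-establish via the backup WAN
    /ip firewall connection remove [find]
    :log warning "ISP_1 down, switched to backup"
}

# Canary back while the route is disabled: fail back
:if (($replies > 0) && $isDisabled) do={
    /ip route enable $rt
    :log warning "ISP_1 up again"
}
```

Because the script only touches the route when the state actually changes, it avoids a configuration write on every run. A guard time before failing back could be added by counting consecutive successful runs in a global variable.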
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Sun Sep 18, 2022 3:37 pm

there is no l2tp or any tunneling on those routers, pure LAN and WAN usage.
WAN_B is only for backup, it's usually a copper ISP, and WAN_A is fiber.

So my direct problem is this: first, when I ping that site from The Dude, it takes about 60 packets before the internet comes back, even though everything seems OK.
So if I test it myself and just unplug the cable from the ISP modem, my route table shows that gw 192.168.0.1 (for example) is still alive and keeps that route enabled even though there is no internet.
With these routes I can't do it like that.
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Sun Sep 18, 2022 4:38 pm

there is no l2tp or any tunneling on those routers, pure LAN and WAN usage.
That was just an example of long-term UDP connections that need to be treated specially to properly migrate to the backup WAN. A continuous ping is yet another example of the same - if you run a continuous ping from the LAN side, it keeps updating the existing tracked connection so the packets keep getting translated to the IP address of the dead WAN until you either stop pinging or remove that connection so that the next echo request packet could create a new one although it has the same ping ID.

So if I test it myself and just unplug the cable from the ISP modem, my route table shows that gw 192.168.0.1 (for example) is still alive and keeps that route enabled even though there is no internet.
With these routes I can't do it like that.
Of course the route dst-address=8.8.8.8 gateway=192.168.0.1 remains active - the Ethernet interface stays up if you disconnect the cable from the other side of the modem. But the route dst-address=0.0.0.0/0 gateway=8.8.8.8 must go down if your scope and target-scope values on all the routes involved are set up properly, i.e. if no other route to 8.8.8.8, which could be used by the 0.0.0.0/0 via 8.8.8.8 one, is available in the system.

I use this at multiple places and there is no issue - in 10 seconds at the latest the loss of internet access via the WAN gets noticed and the 0.0.0.0/0 via 8.8.8.8 route gets deactivated. So I'd suggest you run /ip route print interval=1s where gateway=8.8.8.8 and watch the status flags while you repeat the experiment with disconnecting the DSL cable. You should see the A flag disappear. If it does, the mechanism itself works and the issue you encounter is the surviving tracked connections. If it doesn't, something is wrong with the scope and target-scope values and 8.8.8.8 remains reachable via some other route.
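
For reference, a sketch of one recursive pair with the scope values written out explicitly. The addresses follow the ones used in this thread; target-scope=11 on the default route is an assumption, the point being that it should be at least the scope of the /32 route the gateway is resolved through.

```routeros
/ip route
# real next hop: the canary is reachable only via the WAN1 gateway
add dst-address=8.8.8.8/32 gateway=192.168.0.1 scope=10
# virtual next hop: default route resolved recursively through the canary
add dst-address=0.0.0.0/0 gateway=8.8.8.8 check-gateway=ping distance=1 target-scope=11

# watch the flags while unplugging the uplink behind the modem
/ip route print interval=1s where gateway=8.8.8.8
```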
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Sun Sep 18, 2022 4:51 pm

Okay, your failover is a bit confused, in the sense that there is no need to check the failover through a public DNS server site.
The reason being, if the primary is down, then if the secondary has no access, regardless you have no internet.
However, perhaps there is some logic to knowing ????

Please try these settings (no recursive check on the failover WAN), checking two DNS sites (Cloudflare, then Quad9) for the primary WAN.
Pay attention to all of the scope entries!!
.......
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=192.168.1.1 scope=10 target-scope=11
+++++++++++++++++++
add comment=SecondaryISP distance=5 dst-address=0.0.0.0/0 gateway=192.168.2.1 scope=10 target-scope=30
......


If perchance you want to check both routes recursively, then try the following (the primary WAN checks Cloudflare and the secondary WAN checks Quad9):
....
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=5 dst-address=9.9.9.9/32 gateway=192.168.2.1 scope=10 target-scope=11
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Mon Sep 19, 2022 11:58 am

there is no l2tp or any tunneling on those routers, pure LAN and WAN usage.
That was just an example of long-term UDP connections that need to be treated specially to properly migrate to the backup WAN. A continuous ping is yet another example of the same - if you run a continuous ping from the LAN side, it keeps updating the existing tracked connection so the packets keep getting translated to the IP address of the dead WAN until you either stop pinging or remove that connection so that the next echo request packet could create a new one although it has the same ping ID.

So if I test it myself and just unplug the cable from the ISP modem, my route table shows that gw 192.168.0.1 (for example) is still alive and keeps that route enabled even though there is no internet.
With these routes I can't do it like that.
Of course the route dst-address=8.8.8.8 gateway=192.168.0.1 remains active - the Ethernet interface stays up if you disconnect the cable from the other side of the modem. But the route dst-address=0.0.0.0/0 gateway=8.8.8.8 must go down if your scope and target-scope values on all the routes involved are set up properly, i.e. if no other route to 8.8.8.8, which could be used by the 0.0.0.0/0 via 8.8.8.8 one, is available in the system.

I use this at multiple places and there is no issue - in 10 seconds at the latest the loss of internet access via the WAN gets noticed and the 0.0.0.0/0 via 8.8.8.8 route gets deactivated. So I'd suggest you run /ip route print interval=1s where gateway=8.8.8.8 and watch the status flags while you repeat the experiment with disconnecting the DSL cable. You should see the A flag disappear. If it does, the mechanism itself works and the issue you encounter is the surviving tracked connections. If it doesn't, something is wrong with the scope and target-scope values and 8.8.8.8 remains reachable via some other route.
Thanks, I will try it. It's a bit hard to find a moment, because I have a dozen routers. But I'll keep watching and try to find the problem.

Can I just solve this by removing all connections when WAN1 goes down/up?
Last edited by tomislav91 on Mon Sep 19, 2022 12:02 pm, edited 1 time in total.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Mon Sep 19, 2022 12:01 pm

Okay, your failover is a bit confused, in the sense that there is no need to check the failover through a public DNS server site.
The reason being, if the primary is down, then if the secondary has no access, regardless you have no internet.
However, perhaps there is some logic to knowing ????

Please try these settings (no recursive check on the failover WAN), checking two DNS sites (Cloudflare, then Quad9) for the primary WAN.
Pay attention to all of the scope entries!!
.......
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=192.168.1.1 scope=10 target-scope=11
+++++++++++++++++++
add comment=SecondaryISP distance=5 dst-address=0.0.0.0/0 gateway=192.168.2.1 scope=10 target-scope=30
......


If perchance you want to check both routes recursively, then try the following (the primary WAN checks Cloudflare and the secondary WAN checks Quad9):
....
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=5 dst-address=9.9.9.9/32 gateway=192.168.2.1 scope=10 target-scope=11

I will try this, but I'm not sure: are those DNS servers safe?
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Mon Sep 19, 2022 8:00 pm

Hmm, only billions of people use them, so yes, they are safe. You are not sending any data to them, simply checking whether they are reachable.
Just ensure that under /ip dns you use different DNS servers than these ones, dont remember why though.
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Mon Sep 19, 2022 8:41 pm

dont remember why though.
Because if you use x.x.x.x as the "canary" address (to monitor the transparency of a particular WAN), it is only reachable through that WAN. So strictly speaking you can also use the canary address as a DNS server, but not as the only one.
 
S8T8
Member Candidate
Posts: 123
Joined: Thu Sep 15, 2022 7:15 pm

Re: most effective failover?

Tue Sep 20, 2022 1:09 am

Apologies for adding a not-too-relevant question. I tried to set up failover but never understood how to do it properly using @anav's example. I have a PPPoE and a DHCP client connection (or a static IP and a DHCP client connection). Should it work this easy way?
/interface pppoe-client add add-default-route=yes default-route-distance=1 interface=ether1...
/ip dhcp-client add add-default-route=yes default-route-distance=2 interface=ether2...
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Tue Sep 20, 2022 1:35 am

I would change the PPPoE connection and the WAN2 connection so that add-default-route=no.

Then you can add the routes needed manually as per the examples.
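
A sketch of that change, assuming the PPPoE client is named pppoe-out1 (a hypothetical name) and the DHCP client runs on ether2 as in the question:

```routeros
/interface pppoe-client set [find name=pppoe-out1] add-default-route=no
/ip dhcp-client set [find interface=ether2] add-default-route=no

/ip route
# for PPPoE, the interface name can serve as the gateway of the /32 canary route
add dst-address=1.0.0.1/32 gateway=pppoe-out1 scope=10
add dst-address=0.0.0.0/0 gateway=1.0.0.1 check-gateway=ping distance=2 target-scope=11
```

With a DHCP WAN the gateway address can change with the lease, so the manually added routes for that WAN would need to be kept in sync, for example from the DHCP client's lease script.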
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 9:00 am

Apologies for adding a not-too-relevant question
If you have to piggyback a topic, choose a long-sleeping one rather than making a live one a multi-threaded mess. Or, even better, just create your own and wait for the moderators to approve it, it rarely takes longer than a day.

To your question - @anav's advice will work in a limited number of cases. If you want a better one, do what I've suggested above.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 12:47 pm

I will change it. I'd also like a faster/better way to switch the WAN manually. If I have problems with an ISP (some pings go to 200+) and I want to switch manually, I must disable the routes in IP > Routes, and that's the only way. Can we somehow make that easier for the support team? The problem is that if you don't do it fast, it gets stuck.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 12:52 pm

dont remember why though.
Because if you use x.x.x.x as the "canary" address (to monitor the transparency of a particular WAN), it is only reachable through that WAN. So strictly speaking you can also use the canary address as a DNS server, but not as the only one.
So if I use DNS servers for IP-ROUTE, I shouldn't use those same servers for IP-DNS?
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 1:52 pm

So if I use DNS servers for IP-ROUTE, I shouldn't use those same servers for IP-DNS?
You can, but while the WAN whose transparency is monitored using 8.8.4.4 has no connection to the internet, DNS requests to 8.8.4.4 fail, so the DNS clients switch to the other DNS server. Most of them do not use round robin but keep using the same server as long as it is reachable; once it stops responding, they move to the next one in the list and keep using it even if the previous one becomes reachable again. So if the primary link fails now and then whereas the backup is stable, your DNS traffic stays on the backup link.
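
In config terms this could look as follows, assuming 8.8.8.8 is the canary pinned to one WAN and 8.8.4.4 the one pinned to the other, as in the first post:

```routeros
/ip dns
# each server is reachable through one WAN only, so list both
set servers=8.8.8.8,8.8.4.4
```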
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Tue Sep 20, 2022 2:16 pm

Interesting, in any case there are lots of decent DNS servers out there.
My advice is to stick with the good DNS servers for your IP DNS service, like Cloudflare/Quad9,
and use Google for the route checking.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 2:39 pm

So a better idea is to use, for example, 1.0.0.1 and 9.9.9.9 for checking in IP-ROUTE, and 8.8.4.4 and 8.8.8.8 for IP-DNS?

From your post
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=192.168.1.1 scope=10 target-scope=11
+++++++++++++++++++
add comment=SecondaryISP distance=5 dst-address=0.0.0.0/0 gateway=192.168.2.1 scope=10 target-scope=30



or recursive



/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=5 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=5 dst-address=9.9.9.9/32 gateway=192.168.2.1 scope=10 target-scope=11
Which is better for our setup? Should we add some connection-tracking removal or something? Is there any way to change routes manually that is faster than disabling them in the route list?
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 3:00 pm

Should we add some connection-tracking removal or something? Is there any way to change routes manually that is faster than disabling them in the route list?
One thing is how fast you detect that a route needs to be disabled, another thing is how fast you actually disable it. With the recursive routing approach, the detection takes place once every 10 seconds and the actual disabling is immediate; with scripting, the detection may be faster (like once per second but it may take longer now and then) and the actual disabling will take slightly more time (but still well below a second) and will cause a configuration update (i.e. a disk write operation).

Dynamic routing protocols can be even faster but they cannot be used with SOHO-type WANs.

No matter which approach you use and how fast the routes get disabled, existing connections will not be removed automatically. The only case when connections are removed automatically is if their reply-dst-address has been set using an action=masquerade rule and gets lost (DHCP lease expiration, interface physically down, ...). So yes, removal of existing connections is needed so that persistent connections can re-establish via the backup WAN. /ip dhcp-client release or /interface ethernet disable etherX ; /interface ethernet enable etherX is faster than /ip firewall connection remove [find where ...] provided that you use action=masquerade rules for src-nat.
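
A sketch of the two removal variants mentioned above. The WAN address 203.0.113.5 and the interface name ether1 are placeholders, not values from this thread.

```routeros
# Variant 1: remove only connections that were NATed to the failed WAN's address
# (the dots are left unescaped, so the pattern is slightly looser than an exact match)
/ip firewall connection remove [find where reply-dst-address~"^203.0.113.5"]

# Variant 2 (only with action=masquerade rules): bounce the WAN interface
/interface ethernet disable ether1
/interface ethernet enable ether1
```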
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 3:05 pm

Should we add some connection-tracking removal or something? Is there any way to change routes manually that is faster than disabling them in the route list?
One thing is how fast you detect that a route needs to be disabled, another thing is how fast you actually disable it. With the recursive routing approach, the detection takes place once every 10 seconds and the actual disabling is immediate; with scripting, the detection may be faster (like once per second but it may take longer now and then) and the actual disabling will take slightly more time (but still well below a second) and will cause a configuration update (i.e. a disk write operation).

Dynamic routing protocols can be even faster but they cannot be used with SOHO-type WANs.

No matter which approach you use and how fast the routes get disabled, existing connections will not be removed automatically. The only case when connections are removed automatically is if their reply-dst-address has been set using an action=masquerade rule and gets lost (DHCP lease expiration, interface physically down, ...). So yes, removal of existing connections is needed so that persistent connections can re-establish via the backup WAN. /ip dhcp-client release or /interface ethernet disable etherX ; /interface ethernet enable etherX is faster than /ip firewall connection remove [find where ...] provided that you use action=masquerade rules for src-nat.
So would this be faster? First, use src-nat instead of masquerade? I assume connections behave differently depending on whether the action is masquerade or src-nat.
On top of that, maybe add a netwatch for e.g. 8.8.8.8 with two scripts, UP and DOWN: UP enables WAN1 and DOWN disables it.
Do I understand it correctly?
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Tue Sep 20, 2022 3:19 pm

I am not sure. As a general rule, using action=src-nat instead of masquerade ONLY applies to a static, fixed WAN IP.
It may have some benefit in not hanging onto older sessions for a long time, but that is beyond my memory/knowledge scope.
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 3:24 pm

masquerade causes the auto-destruction of connections if their reply-dst-address disappears, but it has no relationship to the speed of detection. It can only speed up the removal of connections if combined with disabling and re-enabling the interface bearing the address.

You can replace recursive routing by what you suggest (netwatch with on-up and on-down) to get faster detection. And you can include the connection removal to the on-down script.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 3:42 pm

masquerade causes the auto-destruction of connections if their reply-dst-address disappears, but it has no relationship to the speed of detection. It can only speed up the removal of connections if combined with disabling and re-enabling the interface bearing the address.

You can replace recursive routing by what you suggest (netwatch with on-up and on-down) to get faster detection. And you can include the connection removal to the on-down script.
So, i can do it like this.

add this to down script to remove connection tracking
/ ip firewall connection {:foreach r in=[find] do={remove $r}}
and use this kind of ip-route which anav propose
/ip route
add check-gateway=ping distance=2 dst-address=0.0.0.0/0 gateway=1.0.0.1 scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=192.168.1.1 scope=10 target-scope=11
add check-gateway=ping distance=3 dst-address=0.0.0.0/0 gateway=9.9.9.9 scope=10 target-scope=12
add distance=3 dst-address=9.9.9.9/32 gateway=192.168.1.1 scope=10 target-scope=11
+++++++++++++++++++
add comment=SecondaryISP distance=5 dst-address=0.0.0.0/0 gateway=192.168.2.1 scope=10 target-scope=30
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 3:58 pm

add this to down script to remove connection tracking
/ ip firewall connection {:foreach r in=[find] do={remove $r}}
You don't need the foreach, the remove can work with a list directly:
/ip firewall connection remove [find]
and use this kind of ip-route which anav propose
If you are going to disable and enable the routes manually, they need not be recursive via the virtual gateways:
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1
add dst-address=8.8.8.8 gateway=192.168.0.1
add dst-address=8.8.8.8 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
add dst-address=8.8.4.4 gateway=192.168.1.1
add dst-address=8.8.4.4 type=blackhole distance=20
You need the canary addresses to use only a single WAN each, hence the blackhole routes: 8.8.8.8 must not become reachable via 192.168.1.1 if the WAN through which 192.168.0.1 is accessible goes down.
 
anav
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: most effective failover?

Tue Sep 20, 2022 4:10 pm

@tomislav please post your config at the end when its all working correctly.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 20, 2022 8:50 pm

add this to down script to remove connection tracking
/ ip firewall connection {:foreach r in=[find] do={remove $r}}
You don't need the foreach, the remove can work with a list directly:
/ip firewall connection remove [find]
and use this kind of ip-route which anav propose
If you are going to disable and enable the routes manually, they need not be recursive via the virtual gateways:
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1
add dst-address=8.8.8.8 gateway=192.168.0.1
add dst-address=8.8.8.8 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
add dst-address=8.8.4.4 gateway=192.168.1.1
add dst-address=8.8.4.4 type=blackhole distance=20
You need the canary addresses to use only a single WAN each, hence the blackhole routes: 8.8.8.8 must not become reachable via 192.168.1.1 if the WAN through which 192.168.0.1 is accessible goes down.
I need to change it manually only when I notice that the ISP signal is bad, ping is high, or something like that, but it needs to change automatically when the ISP is down.
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Tue Sep 20, 2022 10:57 pm

it needs to change automatically when the ISP is down.
That's what your netwatch will take care of.
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Wed Sep 21, 2022 9:15 am

it needs to change automatically when the ISP is down.
That's what your netwatch will take care of.
So... are these the final thoughts?
Set IP-DNS to, for example, 8.8.8.8 and 8.8.4.4

Gateway for WAN1 is 192.168.0.1
Gateway for WAN2 is 192.168.1.1
add dst-address=0.0.0.0/0 gateway=192.168.0.1
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2
add dst-address=1.1.1.1 gateway=192.168.1.1
add dst-address=1.1.1.1 type=blackhole distance=20
netwatch script, is it better to ping google dns or some dns used in ip-route (9.9.9.9 and 1.1.1.1)?
UP script enable WAN1_interface
DOWN script disable WAN1_interface and
/ip firewall connection remove [find]
 
sindy
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Wed Sep 21, 2022 9:25 am

netwatch script, is it better to ping google dns or some dns used in ip-route (9.9.9.9 and 1.1.1.1)?
The very essence of the method of detecting WAN availability (transparency through the ISP all the way to the internet) is that you let netwatch ping an address that is only reachable through the WAN being tested, so if internet stops being reachable through that WAN, netwatch stops getting the ping responses.

UP script enable WAN1_interface
DOWN script disable WAN1_interface and
/ip firewall connection remove [find]
You cannot disable the WAN1_interface forever using the on-down script - if you do that, on-up will never happen. on-up must only enable the default route via WAN1; on-down must disable the default route via WAN1 (but not the /32 route to the monitored address!). It can also disable the WAN1_interface as a way to quickly remove the firewall connections, but it has to re-enable it again immediately.
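
Put together, a netwatch entry following these rules might look like this. It is a sketch: the comment "ISP_1" on the default route via WAN1 and 9.9.9.9 as the address pinned to WAN1 are assumptions for illustration.

```routeros
/tool netwatch
add host=9.9.9.9 interval=5s \
    up-script="/ip route enable [find comment=\"ISP_1\"]" \
    down-script="/ip route disable [find comment=\"ISP_1\"]\r\n/ip firewall connection remove [find]"
```

Only the default route is toggled; the /32 route to 9.9.9.9 stays enabled so that netwatch can notice when WAN1 recovers.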
 
tomislav91
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Wed Sep 21, 2022 9:36 am

netwatch script, is it better to ping google dns or some dns used in ip-route (9.9.9.9 and 1.1.1.1)?
The very essence of the method of detecting WAN availability (transparency through the ISP all the way to the internet) is that you let netwatch ping an address that is only reachable through the WAN being tested, so if internet stops being reachable through that WAN, netwatch stops getting the ping responses.

UP script enable WAN1_interface
DOWN script disable WAN1_interface and
/ip firewall connection remove [find]
You cannot disable the WAN1_interface forever using the on-down script - if you do that, on-up will never happen. on-up must only enable the default route via WAN1; on-down must disable the default route via WAN1 (but not the /32 route to the monitored address!). It can also disable the WAN1_interface as a way to quickly remove the firewall connections, but it has to re-enable it again immediately.
So, is this the corrected one?
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
add dst-address=1.1.1.1 gateway=192.168.1.1
add dst-address=1.1.1.1 type=blackhole distance=20

on-up
/ip route enable [find comment="ISP_1"]

on-down
/ip route disable [find comment="ISP_1"]
/ip firewall connection remove [find]

Which address should I use for this?
that is only reachable through the WAN being tested
I am not sure what is reachable only through that wan?
 
User avatar
genesispro
Member
Member
Posts: 300
Joined: Fri Mar 14, 2014 12:33 pm

Re: most effective failover?

Wed Sep 21, 2022 1:55 pm

It depends on what kind of WAN you have - unless you've got public addresses also in LAN and your own AS number, the only thing you can do is to speed up the failure detection using the method described by means of some scripts pinging the canary addresses more frequently than in those hardcoded 10s intervals.

If you trust the availability of your favourite data center much more than the one of your ISP's uplinks, you can run a CHR in that datacenter, use OSPF and BFD to connect one tunnel per each WAN to that CHR, and do the NAT there.
@sindy I have been looking for such a setup (OSPF -> CHR) for a while now. Would you share some information on how to accomplish that? I mean how to set up OSPF/BFD. Thank you in advance.
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Wed Sep 21, 2022 3:27 pm

Which address should I use for this?
that is only reachable through the WAN being tested
I am not sure what is reachable only through that wan?
When routing a packet using a routing table, among all active routes whose dst-address matches the destination address of the packet, the one with the longest dst-address prefix is chosen. So although a route to 0.0.0.0/0 is active, if a route to 9.9.9.9 (i.e. with a /32 prefix) is active too, the latter will be chosen for a packet whose destination address is 9.9.9.9.

As the route to 9.9.9.9 via 192.168.0.1 becomes inactive if the interface goes down, we need the blackhole route to 9.9.9.9 that becomes active next (as it has a higher distance than the one via 192.168.0.1); without this blackhole route, if the ISP1 interface went down, packets to 9.9.9.9 would be delivered using the remaining default route via 192.168.1.1. That would not be much of an issue in your particular setup, but in general it's something you don't want to happen.
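
To make the route selection above concrete, here is a minimal annotated sketch of the same four routes, using the addresses from this thread. It assumes RouterOS v6 syntax (in v7 the blackhole route is written with a blackhole flag rather than type=blackhole), and the comment strings on the canary routes are only illustrative.

```
/ip route
# Default routes: ISP_1 is preferred (default distance 1), ISP_2 is the backup.
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
# Canary /32: the longest prefix wins, so pings to 9.9.9.9 leave via ISP_1 only.
add dst-address=9.9.9.9/32 gateway=192.168.0.1 comment="canary via ISP_1"
# Fallback for the canary: active only while the route above is inactive,
# so the pings are dropped instead of leaking out through ISP_2.
add dst-address=9.9.9.9/32 type=blackhole distance=20 comment="canary blackhole"
# Show which of the two canary routes is currently active.
/ip route print where dst-address=9.9.9.9/32
```

With the ISP_1 interface up, the print should show the route via 192.168.0.1 as active; with the interface down, the blackhole should take over.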
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Wed Sep 21, 2022 3:46 pm

I have been looking for such a setup (OSPF -> CHR) for a while now. Would you share some information on how to accomplish that? I mean how to set up OSPF/BFD. Thank you in advance.
Same remark as to @S8T8 - a separate topic would be better. So if my response here is not suitable, please create a new one. With your amount of karma, it should not have to wait for moderation.

To the subject - what level of detail do you need?

Roughly:
  • BFD does not work in ROS 7 yet (7.5 is the stable version at the time of writing).
  • create two tunnels, one via each WAN, to the CHR. L2TP, IPIP, GRE - whatever you prefer and can handle the WAN topology. I tend to use L2TP as it can provide MTU of 1500 bytes no matter what. If there is NAT and you don't use IPsec as a transport, there's no other choice than L2TP (neither PPTP nor SSTP are suitable for various reasons).
  • attach a /30 or larger subnet to each of these tunnels at each end (OSPF insists on a subnet and Mikrotik doesn't support /31 subnets) and set up OSPF as such, where the CHR advertises itself as a default gateway. On the "homeTik", you have to make sure that the default route obtained via OSPF will be used for the traffic you want to be sent via the tunnel - requirements may differ so you may or may not need multiple routing tables.
  • once OSPF as such is working, just add the tunnel interfaces to the BFD list at both ends and set the probe rates and timeouts as needed. BFD can only be run along OSPF or BGP links as only these protocols can handle the BFD status change notifications.
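
A rough sketch of the last two bullets on the home router, assuming RouterOS v6 (since BFD is reported above as not yet working in v7). The tunnel interface names l2tp-wan1 and l2tp-wan2 and the 10.255.x.x/30 subnets are made up for illustration, and the BFD timers are only example values, so check everything against the documentation for your version.

```
# Home router side; the CHR holds the other address of each /30.
/ip address
add address=10.255.1.2/30 interface=l2tp-wan1
add address=10.255.2.2/30 interface=l2tp-wan2
# Run OSPF on both tunnel subnets.
/routing ospf network
add network=10.255.1.0/30 area=backbone
add network=10.255.2.0/30 area=backbone
# Let OSPF react to BFD state changes on the tunnel interfaces.
/routing ospf interface
add interface=l2tp-wan1 network-type=point-to-point use-bfd=yes
add interface=l2tp-wan2 network-type=point-to-point use-bfd=yes
# BFD probe rate and timeout: detection time is roughly interval x multiplier.
/routing bfd interface
add interface=l2tp-wan1 interval=0.2s min-rx=0.2s multiplier=5
add interface=l2tp-wan2 interval=0.2s min-rx=0.2s multiplier=5
```

The CHR side mirrors this with its own addresses, and additionally has to advertise the default route (the distribute-default setting of the OSPF instance in v6).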
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Wed Sep 21, 2022 4:10 pm

Which address should I use for this?
I am not sure what is reachable only through that wan?
When routing a packet using a routing table, among all active routes whose dst-address matches the destination address of the packet, the one with the longest dst-address prefix is chosen. So although a route to 0.0.0.0/0 is active, if a route to 9.9.9.9 (i.e. with a /32 prefix) is active too, the latter will be chosen for a packet whose destination address is 9.9.9.9.

As the route to 9.9.9.9 via 192.168.0.1 becomes inactive if the interface goes down, we need the blackhole route to 9.9.9.9 that becomes active next (as it has a higher distance than the one via 192.168.0.1); without this blackhole route, if the ISP1 interface went down, packets to 9.9.9.9 would be delivered using the remaining default route via 192.168.1.1. That would not be much of an issue in your particular setup, but in general it's something you don't want to happen.
I understand that, but which IP to use in netwatch? You said
that is only reachable through the WAN being tested
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Wed Sep 21, 2022 4:15 pm

I understand that, but which IP to use in netwatch? You said
that is only reachable through the WAN being tested
Since you have defined a route to 9.9.9.9 to test ISP1, netwatch must test 9.9.9.9.
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Wed Sep 21, 2022 4:26 pm

I understand that, but which IP to use in netwatch? You said
Since you have defined a route to 9.9.9.9 to test ISP1, netwatch must test 9.9.9.9.
Thanks, the whole setup is
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
add dst-address=1.1.1.1 gateway=192.168.1.1
add dst-address=1.1.1.1 type=blackhole distance=20

/tool netwatch
add host=9.9.9.9 interval=30s \
    down-script="/ip route disable [find comment=\"ISP_1\"]\r\n/ip firewall connection remove [find]" \
    up-script="/ip route enable [find comment=\"ISP_1\"]"
 
User avatar
anav
Forum Guru
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada
Contact:

Re: most effective failover?

Wed Sep 21, 2022 9:42 pm

What would be the purpose of the other DNS inquiry 1.1.1.1 ?
The script does not refer to it and I don't see it being used and can probably be removed.

Also, if the idea was to be better than check-gateway (which checks every 10 seconds), why not set netwatch to every 5 seconds instead of 30?

/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
/tool netwatch
add host=9.9.9.9 interval=5s \
    down-script="/ip route disable [find comment=\"ISP_1\"]\r\n/ip firewall connection remove [find]" \
    up-script="/ip route enable [find comment=\"ISP_1\"]"
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Fri Sep 23, 2022 9:36 am

What would be the purpose of the other DNS inquiry 1.1.1.1 ?
The script does not refer to it and I don't see it being used and can probably be removed.

Also, if the idea was to be better than check-gateway (which checks every 10 seconds), why not set netwatch to every 5 seconds instead of 30?

/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
/tool netwatch
add host=9.9.9.9 interval=5s \
    down-script="/ip route disable [find comment=\"ISP_1\"]\r\n/ip firewall connection remove [find]" \
    up-script="/ip route enable [find comment=\"ISP_1\"]"

You were the first to suggest using two checks, in post #8, and @sindy did the same in post #26.
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Fri Sep 23, 2022 10:00 am

@tomislav91, @anav has drawn my attention (via another channel) to this post by @rextended - in other words, I was wrong and remove [find] is not reliable. But it is also not sufficient to use just :foreach x in=[find ...] do={remove $x} - the remove must use another find, because removing an empty list is not handled as an error whereas removing a non-existent item is. So making use of the properties of masquerade and switching the interface off and on is a brutal method of removing the stuck connections, but probably a lot faster. On the other hand, the slow way may spread the CPU load more evenly over time.
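
As a minimal sketch of the "brutal" variant described above, an on-down script could look like the following. The interface name ether1-WAN1 is hypothetical, and the approach assumes that the NAT rule on that interface is action=masquerade, so that the tracked connections through it are dropped when the interface goes down.

```
# on-down script (sketch): stop using ISP_1 as default, then bounce the interface.
/ip route disable [find comment="ISP_1"]
# Bouncing the interface makes masquerade flush the connections using it.
/interface disable [find name="ether1-WAN1"]
:delay 1s
# Re-enable right away, otherwise the canary can never answer and on-up never fires.
/interface enable [find name="ether1-WAN1"]
```

Only the default route stays disabled afterwards; the /32 route to the monitored address becomes active again as soon as the interface is back up.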
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Fri Sep 23, 2022 10:05 am

@tomislav91, @anav has drawn my attention (via another channel) to this post by @rextended - in other words, I was wrong and remove [find] is not reliable. But it is also not sufficient to use just :foreach x in=[find ...] do={remove $x} - the remove must use another find, because removing an empty list is not handled as an error whereas removing a non-existent item is. So making use of the properties of masquerade and switching the interface off and on is a brutal method of removing the stuck connections, but probably a lot faster. On the other hand, the slow way may spread the CPU load more evenly over time.
yeah i read it.
So it's better to use
/ip fire conn
:foreach idc in=[find where timeout>60] do={
 remove [find where .id=$idc]
}
which script is more usable of those two?
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
add dst-address=1.1.1.1 gateway=192.168.1.1
add dst-address=1.1.1.1 type=blackhole distance=20

Or
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?  [SOLVED]

Fri Sep 23, 2022 10:20 am

which script is more usable of those two?
It depends on the use case:
  • if the second WAN is used solely as a backup and there is no further backup, there is no point in tracking availability of the second WAN because there's nothing you can do when it becomes unavailable.
  • if you have some traffic that should prefer WAN1 but use WAN2 if WAN1 fails, and also some traffic that should prefer WAN2 and use WAN1 if WAN2 fails, then tracking availability of WAN2 makes sense.
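
For the second case, a minimal sketch of tracking both WANs could look like this, assuming the six-route variant from this thread (canary 9.9.9.9 via ISP_1, canary 1.1.1.1 via ISP_2, each with its blackhole fallback). The 5s interval is just an example value, and the policy routing that makes some traffic prefer WAN2 is not shown.

```
/tool netwatch
# WAN1: canary 9.9.9.9 is routed only via 192.168.0.1.
add host=9.9.9.9 interval=5s \
    up-script="/ip route enable [find comment=\"ISP_1\"]" \
    down-script="/ip route disable [find comment=\"ISP_1\"]"
# WAN2: canary 1.1.1.1 is routed only via 192.168.1.1.
add host=1.1.1.1 interval=5s \
    up-script="/ip route enable [find comment=\"ISP_2\"]" \
    down-script="/ip route disable [find comment=\"ISP_2\"]"
```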
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Fri Sep 23, 2022 10:23 am

which script is more usable of those two?
It depends on the use case:
  • if the second WAN is used solely as a backup and there is no further backup, there is no point in tracking availability of the second WAN because there's nothing you can do when it becomes unavailable.
  • if you have some traffic that should prefer WAN1 but use WAN2 if WAN1 fails, and also some traffic that should prefer WAN2 and use WAN1 if WAN2 fails, then tracking availability of WAN2 makes sense.
yeah, it's just pure backup. I will try with this
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
Thanks a lot!
 
User avatar
anav
Forum Guru
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada
Contact:

Re: most effective failover?

Fri Sep 23, 2022 4:08 pm

Hi tomi, my comment on rejigging your config was based on the new non-recursive CHANGED config you presented. You cannot compare my comments on that , to some post made way back when talking about a completely different config. Just keep that in mind for future configurations. When you drastically change Routes, previous comments relating to a different config are probably no longer germane!!!

Good question, and waiting for Sindy's reply...........

/ip fire conn
:foreach idc in=[find where timeout>60] do={
remove [find where .id=$idc]
}

But I thought rextended has this......

:global previousIP "18.11.23.33"

/ip fire conn
:foreach idc in=[find where timeout>60 and reply-dst-address~$previousIP] do={
remove [find where .id=$idc]
}

So my question is: why is yours missing the part in blue (the reply-dst-address~$previousIP condition)?
How do we add this previous variable in netwatch?



Another option.... :-)))))

/ip fire conn
:foreach idc in=[find where timeout>60 and reply-dst-address~$previousIP] do={
:do {remove $idc} on-error={:nothing}
}

At least the text speaks more plainly.......
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Fri Sep 23, 2022 4:35 pm

So my question is: why is yours missing the part in blue (the reply-dst-address~$previousIP condition)?
If you don't mind collateral damage, it works as well - removing all connections, not just those using the dead WAN, does the job too (and the collateral damage is not fatal and will heal automatically).
 
User avatar
rextended
Forum Guru
Forum Guru
Posts: 12330
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy
Contact:

Re: most effective failover?

Sat Sep 24, 2022 12:59 am

(thanks for mentioning me)

Please @anav do not add on-horror-resume-next to my script... 🤮

Another alternative is to delete every connection that is NOT from the current IP, directly where the IP is obtained:

If the WAN IP comes from any PPP interface, put this in the on-up script of the PPP profile used:
/ip fire conn remove [find where timeout>60 and (!(reply-dst-address~[:tostr $"local-address"]))]

If the WAN IP comes from a dhcp-client, put this in the on-lease script of the dhcp-client used:
/ip fire conn remove [find where timeout>60 and (!(reply-dst-address~[:tostr $"lease-address"]))]

With two or more WANs, all the IPs must be considered, like:
/ip fire conn remove [find where timeout>60 and (!(reply-dst-address~$wlanoneip)) and (!(reply-dst-address~$wlantwoip))]

It is also possible to create an array with all the WAN IPs and, after converting it to a string,
search inside the string to decide whether the IP of the tracked connection is present (kept) or not (deleted).
/ip fire conn remove [find where timeout>60 and (!($stringofallwanips~reply-dst-address))]
 
User avatar
anav
Forum Guru
Forum Guru
Posts: 20818
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada
Contact:

Re: most effective failover?

Sat Sep 24, 2022 4:07 pm

Okay now you are just showing off LOL............
When I have some coherent questions in order to improve some guides, will ask them.
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Tue Sep 27, 2022 11:54 am

so you all consider that this netwatch will do the job
/ip fire conn
:foreach idc in=[find where timeout>60] do={
 remove [find where .id=$idc]
}
2022-09-27 14_04_48-Window.png
2022-09-27 14_05_01-Window.png
It's also doable with the previous IP, but for my setup I don't see that being more effective than this one. Keep it simple but working.
Better than my first setup config, I think.

Do you think I might get a better result with this (if 0.1 and 1.1 are my WAN addresses)?
:global wanoneip "192.168.0.1"
:global wantwoip "192.168.1.1"
/ip fire conn remove [find where timeout>60 and (!(reply-dst-address~$wanoneip)) and (!(reply-dst-address~$wantwoip))]
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Wed Oct 11, 2023 10:04 am

Which address should I use for this?
I am not sure what is reachable only through that wan?
When routing a packet using a routing table, among all active routes whose dst-address matches the destination address of the packet, the one with the longest dst-address prefix is chosen. So although a route to 0.0.0.0/0 is active, if a route to 9.9.9.9 (i.e. with a /32 prefix) is active too, the latter will be chosen for a packet whose destination address is 9.9.9.9.

As the route to 9.9.9.9 via 192.168.0.1 becomes inactive if the interface goes down, we need the blackhole route to 9.9.9.9 that becomes active next (as it has a higher metric than the one via 192.168.0.1); if it wasn't for this blackhole route, if the ISP1 interface went down, packets to 9.9.9.9 would be delivered using the default route via 192.168.0.1. Which would not be much of an issue in your particular setup, but in general it's something you don't want to happen.
sindy, can you expand a little more on your answer about the blackhole? So if the script is:
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.0.1 comment="ISP_1"
add dst-address=9.9.9.9 gateway=192.168.0.1
add dst-address=9.9.9.9 type=blackhole distance=20
add dst-address=0.0.0.0/0 gateway=192.168.1.1 distance=2 comment="ISP_2"
and ISP_1 goes down, what do we need the blackhole for?
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Wed Oct 11, 2023 12:12 pm

If ISP_1 "goes down" in the sense that the internet is not reachable via ISP_1 but the physical interface connected to the CPE provided by ISP_1 stays up (which is your main concern as per post #4), you don't need the blackhole route because in such a situation, the route to 9.9.9.9 via 192.168.0.1 remains active.
But if ISP_1 "goes down" in terms that their CPE goes down and thus the port of the Mikrotik connected to that CPE goes down too, the specific route to 9.9.9.9 via 192.168.0.1 becomes inactive, and if there is no other specific route to 9.9.9.9, the packets to 9.9.9.9 start being routed using the currently active route with a shorter prefix that also matches 9.9.9.9. So depending on the rest of your routing, the netwatch may keep getting its ping responses from 9.9.9.9 and therefore keep thinking ISP_1 is working all the way to the internet.

In your particular case above, when the physical interface to the ISP_1 CPE goes down, all the traffic migrates to ISP_2, which is what you want in general, but without the blackhole route to 9.9.9.9, the netwatch would keep seeing 9.9.9.9 as reachable so it would take no action.

With the recursive next-hop search approach, the blackhole route to the canary IP address (9.9.9.9) is not necessary because correct settings of scope and target-scope on the routes prevent the wrong route from being used recursively.
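
For comparison, a minimal sketch of the recursive next-hop variant mentioned above, with the gateways and canary addresses used in this thread. The scope and target-scope values follow the pattern commonly used for this kind of failover and are an assumption about your RouterOS version, so verify them before relying on it.

```
/ip route
# Canary addresses, each reachable only through its own ISP gateway.
add dst-address=9.9.9.9/32 gateway=192.168.0.1 scope=10
add dst-address=1.1.1.1/32 gateway=192.168.1.1 scope=10
# Default routes resolved recursively through the canaries; check-gateway
# pings the canary, so a dead upstream deactivates the default route.
add dst-address=0.0.0.0/0 gateway=9.9.9.9 distance=1 target-scope=11 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=1.1.1.1 distance=2 target-scope=11 check-gateway=ping
```

Note that this brings back the fixed check-gateway probing interval discussed earlier in the thread, which is the reason the netwatch approach was chosen here.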
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Thu Oct 12, 2023 9:19 am

If ISP_1 "goes down" in the sense that the internet is not reachable via ISP_1 but the physical interface connected to the CPE provided by ISP_1 stays up (which is your main concern as per post #4), you don't need the blackhole route because in such a situation, the route to 9.9.9.9 via 192.168.0.1 remains active.
But if ISP_1 "goes down" in terms that their CPE goes down and thus the port of the Mikrotik connected to that CPE goes down too, the specific route to 9.9.9.9 via 192.168.0.1 becomes inactive, and if there is no other specific route to 9.9.9.9, the packets to 9.9.9.9 start being routed using the currently active route with a shorter prefix that also matches 9.9.9.9. So depending on the rest of your routing, the netwatch may keep getting its ping responses from 9.9.9.9 and therefore keep thinking ISP_1 is working all the way to the internet.

In your particular case above, when the physical interface to the ISP_1 CPE goes down, all the traffic migrates to ISP_2, which is what you want in general, but without the blackhole route to 9.9.9.9, the netwatch would keep seeing 9.9.9.9 as reachable so it would take no action.

With the recursive next-hop search approach, the blackhole route to the canary IP address (9.9.9.9) is not necessary because correct settings of scope and target-scope on the routes prevent the wrong route from being used recursively.
thanks, so we have 2 possible situation, first is that my cpe is not reachable, then we need blackhole? And if my cpe is not reachable then we need blackhole ?
So how can i improve my firewall settings further on, can we prevent maybe to users use port scanners for sniffing etc? Or it should be best to have some real firewall behind it :D
 
User avatar
sindy
Forum Guru
Forum Guru
Posts: 10745
Joined: Mon Dec 04, 2017 9:19 pm

Re: most effective failover?

Thu Oct 12, 2023 12:12 pm

thanks, so we have 2 possible situation, first is that
my cpe is not reachable, then we need blackhole?
And if
my cpe is not reachable then we need blackhole ?
I'm not sure what you actually wanted to express, so please reword that.

So how can i improve my firewall settings further on, can we prevent maybe to users use port scanners for sniffing etc? Or it should be best to have some real firewall behind it :D
In the context of this topic, this is about as relevant as a question what is the best way to replace brake pads.

The firewall functionality of Mikrotik only goes as far as L4 (protocols and ports). Regarding what you can use it for, first of all you have to decide who is the potential attacker and who is the potential target, so your firewall rules will look different if the router is used to connect unrelated customers to the internet vs if it is a border router of a SOHO network vs if it is a mid-network router on public addresses. If you don't mind breaking into users' privacy, firewalls that can break into TLS connections and can detect malware inside the TLS payload can identify and block much more threats than the Mikrotik. If you have a modern antivirus on the endpoints doing the same thing, you can stay with Mikrotik as an "up to L4" firewall; if you don't, you indeed have to consider a device whose primary purpose is security. But again, they can break into TLS sessions flowing through them but not into VPNs the users set up on the endpoints, so the endpoint antivirus is in a better position from this point of view - until the VPN is built into the browser and the browser doesn't provide an API for the antivirus to scan the payload. Which is the same API one would use to spy on the payload, so go figure.
 
tomislav91
Member
Member
Topic Author
Posts: 303
Joined: Fri May 26, 2017 12:47 pm

Re: most effective failover?

Mon Oct 16, 2023 1:40 pm

In the context of this topic, this is about as relevant as a question what is the best way to replace brake pads.
Sure.
I'm not sure what you actually wanted to express, so please reword that.
If my ISP_CPE is not available, then I need that blackhole route?
Regarding what you can use it for
Mostly about accessing the internet, and allowing users to "see" internal servers through some OpenVPN interfaces. The idea is to prevent, as much as possible, a breach from inside or outside.
So, if something malicious got into some of the offices, the aim is to prevent scans, like NMAP, Advanced IP Scanner, Angry IP Scanner, or any other type of scan.
The firewall functionality of Mikrotik only goes as far as L4 (protocols and ports).
Can't I use L2 functionality for blocking, even though it's all connected to the same MikroTik? The switches are not smart, but yeah, L2 traffic doesn't go through the router itself...
