Netwatch failover won’t work because route to external IP gets bypassed

I can’t understand what is happening with the router. I’m trying to set up failover with two WAN connections.

these are my routes:

/ip route
add distance=1 gateway=10.0.2.1
add distance=2 gateway=192.168.10.1
add distance=1 dst-address=8.8.4.4/32 gateway=192.168.10.1
add distance=1 dst-address=8.8.8.8/32 gateway=10.0.2.1
add distance=1 dst-address=10.0.0.0/24 gateway=192.168.10.1
add distance=1 dst-address=10.0.1.0/24 gateway=192.168.10.1
add distance=1 dst-address=172.16.0.0/24 gateway=vlan100-clients
add distance=1 dst-address=192.168.14.0/24 gateway=vlan14-restaurant
add distance=1 dst-address=192.168.15.0/24 gateway=vlan15-aps

and netwatch:

/tool netwatch
add down-script="ip route disable [find dst-address=0.0.0.0/0 gateway=10.0.2.1\
    ]\
    \n\r\
    \n/ip firewall connection remove [find]\r\
    \n/ip firewall mangle { disable [/ip firewall mangle find new-packet-mark~\
    \"lte3_client\"] }\r\
    \nlog error \"ISP_lte3 is down!\"\
    \n" host=8.8.8.8 interval=10s up-script="ip route enable [find dst-address\
    =0.0.0.0/0 gateway=10.0.2.1]\
    \n\r\
    \n/ip firewall mangle { enable [/ip firewall mangle find new-packet-mark~\
    \"lte3_client\"] }\r\
    \nlog error \"ISP_lte3 is up!\"\r\
    \n\
    \n"
add down-script="ip route disable [find dst-address=0.0.0.0/0 gateway=192.168.\
    10.1]\
    \n\r\
    \n/ip firewall connection remove [find]\r\
    \nlog error \"ISP_res_rtr is down!\"\
    \n" host=8.8.4.4 interval=10s up-script="ip route enable [find dst-address\
    =0.0.0.0/0 gateway=192.168.10.1]\
    \n\r\
    \nlog error \"ISP_res_rtr is up!\""

but I’m able to ping 8.8.8.8 even if I unplug the 10.0.2.1 interface. It passes through the alternative route, so netwatch can’t see that the link went down

how is this possible?

thank you
failover.rsc (7.2 KB)

When you ping a public IP address, the router sends it over whichever route is still available once failover is set up… that is the aim of failover!
If you use netwatch in failover mode to check whether “something is unplugged”, then ping a local IP address on that link instead (like the upstream router).
If you ping a public address it will still be reachable (you have a generic distance 2 route via 192.168…).
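To illustrate the point above: when 10.0.2.1 becomes unreachable, the 8.8.8.8/32 route goes inactive and the lookup falls through to the remaining default route. One commonly used way to stop that fall-through is a higher-distance blackhole route for each probe address. This is only a sketch, assuming RouterOS v6 syntax (the `type=blackhole` form also used later in this thread) and the probe addresses from the original post. The comments and distance values are illustrative.

```
/ip route
# Existing host routes from the post pin each probe to one gateway:
#   8.8.8.8/32 via 10.0.2.1, 8.8.4.4/32 via 192.168.10.1
# Assumed addition (RouterOS v6 syntax): a higher-distance blackhole per probe.
# It only becomes active when the host route with distance=1 goes inactive.
add comment="drop ISP1 probe when 10.0.2.1 is unreachable" distance=20 dst-address=8.8.8.8/32 type=blackhole
add comment="drop ISP2 probe when 192.168.10.1 is unreachable" distance=20 dst-address=8.8.4.4/32 type=blackhole
```

With these in place, a ping to 8.8.8.8 should fail when the primary link is unplugged instead of succeeding through 192.168.10.1, so netwatch would see the host as down.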

You can also do the recursive GW definition as in this post:
http://forum.mikrotik.com/t/advanced-routing-failover-without-scripting/136599/1

Question: Why would you need the netwatch script? In my opinion it does unnecessary things that can be done directly in the routing table…

based on this:
http://forum.mikrotik.com/t/dual-wan-failover-using-recursive-routing/148232/17

After you read, understand, and check it, paste this into the terminal:

{
:global isp1gateway 10.0.2.1
:global isp2gateway 192.168.10.1

/ip dhcp-client
set [find] add-default-route=no

/ip dns
set allow-remote-requests=yes servers=1.1.1.1,8.8.8.8

/ip route
remove [find where dynamic=no]

add comment="ISP1 is preferred Gateway" distance=1 gateway=$isp1gateway
add comment="ISP2 is alternative Gateway" distance=2 gateway=$isp2gateway

add comment="1.1.1.1 must be reachable only from ISP1" distance=1 dst-address=1.1.1.1/32 gateway=$isp1gateway scope=10
add comment="8.8.8.8 must be reachable only from ISP2" distance=1 dst-address=8.8.8.8/32 gateway=$isp2gateway scope=10

add check-gateway=ping comment="Check if reachable 1.1.1.1 = ISP1 Working" distance=1 gateway=1.1.1.1
add check-gateway=ping comment="Check if reachable 8.8.8.8 = ISP2 Working" distance=1 gateway=8.8.8.8

add check-gateway=ping comment="If ISP1 fail, still check when is reachable again 1.1.1.1" distance=2 gateway=1.1.1.1
add check-gateway=ping comment="If ISP2 fail, still check when is reachable again 8.8.8.8" distance=2 gateway=8.8.8.8

add comment="Virtual ping to maintain router calc for ISP1" distance=20 dst-address=1.1.1.1/32 type=blackhole
add comment="Virtual ping to maintain router calc for ISP2" distance=20 dst-address=8.8.8.8/32 type=blackhole

add distance=1 dst-address=172.16.0.0/24 gateway=vlan100-clients
add distance=1 dst-address=192.168.14.0/24 gateway=vlan14-restaurant
add distance=1 dst-address=192.168.15.0/24 gateway=vlan15-aps

/tool netwatch
remove [find]
}
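The recursive part of the script above rests on three pieces that are easy to miss, so here is a reduced sketch for one ISP only. It assumes RouterOS v6, where static routes default to scope=30 and target-scope=10; newer versions may need different values and use a different blackhole syntax. The addresses are the ones from the script, and the layout is only illustrative.

```
/ip route
# 1. Host route to the probe address via the real next hop.
#    scope=10 makes it eligible for resolving other routes' gateways.
add dst-address=1.1.1.1/32 gateway=10.0.2.1 scope=10
# 2. Default route whose gateway is the probe address, not the modem.
#    target-scope=10 is the v6 default and is written out only for clarity.
add dst-address=0.0.0.0/0 gateway=1.1.1.1 target-scope=10 check-gateway=ping distance=1
# 3. Fallback for the probe address, so pings to 1.1.1.1 are dropped
#    rather than leaking out through the other ISP when route 1 is inactive.
add dst-address=1.1.1.1/32 type=blackhole distance=20
```

The idea is that check-gateway pings 1.1.1.1 through ISP1 only, so the default route goes inactive when the internet behind the modem is gone, not just when the modem itself stops answering.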

The whole config is messed up, so getting to a properly set up failover is going to be difficult.
Let's clean up what we can, and then troubleshooting one area will be much easier.

(1) The first thing to point out is that your bridge setup is erroneous.
a. the VLANs' interface should be the BRIDGE, not ether2
b. you define six VLANs and then provide only 4 pools, 4 server networks, and 3 DHCP servers etc. AKA → your VLAN setups are incomplete!!
c. I see only four addresses for six VLANs and only one WAN address; I was expecting a second, backup WAN address?
d. I see two admin networks. Do you really need two? And a third one, .102., which is not defined anywhere??? (OKAY, I see it's for VPN access??)

(2) Keep the MAC WinBox server (mac-winbox) for WinBox access, but
this one should be set to none as it's a security risk.
/tool mac-server
set allowed-interface-list=MANAGEMENT

(3) What is the purpose of having this enabled.. WHY??
/ip upnp interfaces
add interface=lte3 type=external
add interface=ether2 type=internal

(4) I don't understand your failover routing at all??
Can you explain the purpose of each line, so I can understand why you added them as such??

(5) If this is the sum total of your firewall rules and this device is directly connected to the internet you should be fired LOL. In other words you should disconnect immediately and at least install the basic defaults.
/ip firewall filter
add action=drop chain=output dst-address=8.8.8.8 out-interface=ether2

(6) Where is your SOURCE NAT RULE???
(7) There are no bridge vlan settings???
(8) With respect to (7), which ports are trunk ports and which ports are access ports on your router???

In summary, much confusion is also caused by ether2 seemingly being many things, AKA a bridge port but also the secondary WAN port etc…
A well-labelled network diagram will clear much up!!

here is a config attempt (and where missing pieces are identified) assuming ether2 is simply another port to be used and not a wan port.
Also one has to better define the following
a. who has access to the router itself to config the router
b. who needs access to the internet
c. who needs access to shared devices, perhaps a printer for example.

/interface bridge
add admin-mac=08:55:31:D0:6B:99 auto-mac=no comment=defconf disabled=yes \
    name=bridge
/interface ethernet
set [ find default-name=ether1 ] name=lte3
/interface vlan
add interface=bridge name=vlan11-reception vlan-id=11
add interface=bridge name=vlan12-beach vlan-id=12
add interface=bridge name=vlan13-telephones vlan-id=13
add interface=bridge name=vlan14-restaurant vlan-id=14
add interface=bridge name=vlan15-aps vlan-id=15
add interface=bridge name=vlan100-clients vlan-id=100
add interface=bridge name=vlantrusted vlan-id=10
/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN
add name=MANAGEMENT
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip ipsec peer
add name=l2tpserver passive=yes
/ip ipsec proposal
set [ find default=yes ] enc-algorithms=3des
/ip pool
add name=pool-vlan11-reception ranges=192.168.11.100-192.168.11.254
add name=pool-vlan12-beach ranges=192.168.12.100-192.168.12.254
add name=pool-vlan13-telephones ranges=192.168.13.100-192.168.13.254
add name=pool-vlan14-restaurant ranges=192.168.14.14-192.168.14.254
add name=pool-vlan15-aps ranges=192.168.15.15-192.168.15.254
add name=pool-vlan100-clients ranges=192.168.100.10-192.168.100.254
add name=pool-trusted ranges=192.168.10.2-192.168.10.254
add name=pool-vpn ranges=192.168.102.100-192.168.102.254
/ip dhcp-server
add address-pool=pool-vlan11-reception disabled=no interface=vlan11-reception \
    name=dhcp-vlan11-reception
add address-pool=pool-vlan12-beach disabled=no interface=vlan12-beach name=\
    dhcp-vlan12-beach
add address-pool=pool-vlan13-telephones disabled=no interface=\
    vlan13-telephones name=dhcp-vlan13-telephones
add address-pool=pool-vlan14-restaurant disabled=no interface=vlan14-restaurant \
    name=dhcp-vlan14-restaurant
add address-pool=pool-vlan15-aps disabled=no interface=vlan15-aps name=\
    dhcp-vlan15-aps
add address-pool=pool-vlan100-clients disabled=no interface=\
    vlan100-clients name=dhcp-vlan100-clients
add address-pool=pool-trusted disabled=no interface=\
    vlantrusted name=dhcp-vlantrusted
/ppp profile
add dns-server=192.168.102.1 local-address=192.168.102.1 name=ipsec_vpn
/queue type
add kind=pcq name=PCQ_lte3_download pcq-classifier=dst-address pcq-rate=1500k
add kind=pcq name=PCQ_lte3_upload pcq-classifier=src-address pcq-rate=700k
/queue tree
add disabled=yes max-limit=38M name=lte3_client_download packet-mark=\
    lte3_client_download parent=global queue=PCQ_lte3_download
add disabled=yes max-limit=9M name=lte3_client_upload packet-mark=\
    lte3_client_upload parent=global queue=PCQ_lte3_upload
/interface bridge port  (INCOMPLETE ????)
add bridge=bridge comment=defconf interface=ether2
add bridge=bridge comment=defconf interface=ether3
add bridge=bridge comment=defconf interface=ether4
add bridge=bridge comment=defconf interface=ether5
add bridge=bridge comment=defconf interface=ether6
add bridge=bridge comment=defconf interface=ether7
add bridge=bridge comment=defconf interface=ether8
add bridge=bridge comment=defconf interface=ether9
add bridge=bridge comment=defconf interface=ether10
add bridge=bridge comment=defconf interface=sfp1
/interface bridge vlan
WHERE ARE BRIDGE VLAN SETTINGS???
/ip neighbor discovery-settings
set discover-interface-list=LAN
/interface l2tp-server server
set authentication=mschap1,mschap2 default-profile=ipsec_vpn enabled=yes
/interface list member
add comment=defconf interface=bridge list=LAN
add comment=defconf interface=lte3 list=WAN  {where is second wan??}
add interface=vlantrusted list=MANAGEMENT
/ip address
add address=192.168.11.1/24 interface=vlan11-reception network=192.168.11.0
add address=192.168.12.1/24 interface=vlan12-beach network=192.168.12.0
add address=192.168.13.1/24 interface=vlan13-telephones network=192.168.13.0
add address=192.168.14.1/24 interface=vlan14-restaurant network=192.168.14.0
add address=192.168.15.1/24 interface=vlan15-aps network=192.168.15.0
add address=192.168.100.1/24 interface=vlan100-clients network=192.168.100.0
add address=192.168.10.2/24 comment=defconf interface=vlantrusted network=\
    192.168.10.0
add address=10.0.2.10/24 interface=lte3 network=10.0.2.0
/ip cloud
set ddns-enabled=yes
/ip dhcp-client
add comment=defconf interface=lte3
/ip dhcp-server network
add address=192.168.11.0/24 dns-server=192.168.11.1 gateway=192.168.11.1
add address=192.168.12.0/24 dns-server=192.168.12.1 gateway=192.168.12.1
add address=192.168.13.0/24 dns-server=192.168.13.1 gateway=192.168.13.1
add address=192.168.14.0/24 dns-server=192.168.14.1 gateway=192.168.14.1
add address=192.168.15.0/24 dns-server=192.168.15.1 gateway=192.168.15.1
add address=192.168.100.0/24 dns-server=192.168.100.1 gateway=192.168.100.1
add address=192.168.10.0/24 dns-server=192.168.10.1 gateway=192.168.10.1
/ip dns
set allow-remote-requests=yes servers=1.1.1.1
/ip firewall address-list
add address=192.168.102.0/24 list=adminaccess
add address=192.168.xx.x/24 list=adminaccess
add address=192.168.xx.y/24 list=adminaccess
add address=192.168.yy.x/24 list=adminaccess
etc....
/ip firewall filter
NEEDS DEFAULT FIREWALL RULES
/ip firewall nat
needs SRCNAT RULE
/ip ipsec identity
add generate-policy=port-override peer=l2tpserver
/ip ipsec policy
set 0 dst-address=0.0.0.0/0 src-address=0.0.0.0/0
/ip route
WAN1 ROUTE  ping gateway distance=5
WAN2 ROUTE  distance=10
/ppp secret
add name=chris profile=ipsec_vpn remote-address=192.168.103.2 service=l2tp
/system clock
set time-zone-name=Europe/Athens
/system identity
set name=rec-rtr
/system logging
add action=disk topics=critical
add action=disk topics=error
add action=disk topics=info
add action=disk topics=warning
/system ntp client
set enabled=yes primary-ntp=216.239.35.0 secondary-ntp=216.239.35.4
/system ntp server
set enabled=yes
/tool mac-server
set allowed-interface-list=NONE
/tool mac-server mac-winbox
set allowed-interface-list=MANAGEMENT

ANAV… I think he mistyped the parts of recursive.

The config needs a makeover and I don't think there was an attempt at recursive. Let's walk before running, okay crawl first!!!

Hello everyone, first of all thank you that you took the time to review my config, I really appreciate it!


I thought that this route would restrict the 8.8.8.8 IP


add distance=1 dst-address=8.8.8.8/32 gateway=10.0.2.1



I decided to use a netwatch script because if the 10.0.2.1 router loses its Internet connection, the route is still considered valid. With netwatch I can ping an internet address and make sure that the 10.0.2.1 router still has Internet access. I tried recursive routing failover, but when I tested it I had some problems with it (sometimes it would not use the primary route even if that route was up).
I plan to try the following solution. Thank you rextended, you even took the time to write me a script, you are great! I will post the results as soon as I can test this in a lab.


Hi anav, thank you for taking the time to review my whole config


1a) I don't think I need a bridge, WAN (10.0.2.1) is connected to ether1 (access port) and everything else is connected to ether2 (hybrid port)
1b) true, but some of the vlans use only static addresses
1c) some of the vlans were moved to the other router 192.168.10.1, that router serves as backup
1d) I only really need two, but it made my life easier to include another one while I was building the network, 192.168.102.0 is the VPN network.


You are right, I misunderstood this setting, thus allowing telnet access, I was able to remedy that with the firewall but now I understand this better


True, I don't need UPNP




/ip route
add distance=1 gateway=10.0.2.1
add distance=2 gateway=192.168.10.1

add two routes to the internet with different distance


add distance=1 dst-address=8.8.4.4/32 gateway=192.168.10.1
add distance=1 dst-address=8.8.8.8/32 gateway=10.0.2.1

route 8.8.4.4 through the secondary gateway and 8.8.8.8 through the primary (I thought this would be restrictive and 8.8.8.8 would only be routed to 10.0.2.1)


/tool netwatch
add down-script="ip route disable [find dst-address=0.0.0.0/0 gateway=10.0.2.1\
    ]\
    \n\r\
    \n/ip firewall connection remove [find]\r\
    \n/ip firewall mangle { disable [/ip firewall mangle find new-packet-mark~\
    \"lte3_client\"] }\r\
    \nlog error \"ISP_lte3 is down!\"\
    \n" host=8.8.8.8 interval=10s

if 8.8.8.8 goes down, disable the 10.0.2.1 route and clear all connections


    up-script="ip route enable [find dst-address\
    =0.0.0.0/0 gateway=10.0.2.1]\
    \n\r\
    \n/ip firewall mangle { enable [/ip firewall mangle find new-packet-mark~\
    \"lte3_client\"] }\r\
    \nlog error \"ISP_lte3 is up!\"\r\
    \n\
    \n"

if 8.8.8.8 comes back up, enable the 10.0.2.1 route
the rest has to do with queues and is irrelevant


no I posted this before completing the setup, I didn't need it for the purposes of this post, but I'm sure you will find a more valid reason for me to get fired :slight_smile:


I don't need one, the next router does NAT


I don't need them, there's only one hybrid port, bridge VLAN settings would make more sense if I used the router as a switch, I use a layer 2 switch for that


eth1 is WAN access and eth2 is hybrid


this is actually the case, I only use ether2


ether2 is also a WAN port albeit secondary and I don't need to clearly define it because I didn't use the default firewall configuration (in the final setup)


I believe I covered this when I finished the configuration and I don't feel that I need help with that

Okay post your latest complete config to compare to the diagram etc…
/export hide-sensitive file=anynameyouwish

Screenshot 2021-06-04 at 4.25.14 PM.png
i-wish-for-world-peace.rsc (9.81 KB)

I forgot to mention that all my ISP provided IP’s are dynamic, two of them aren’t even public, noted in the diagram with ‘BEHIND NAT’

I don't quite get the network diagram.
Just to confirm: are you showing two instances of the same router, to differentiate between the one dynamic WAN IP (not NATted, COSMOTE top bubble)
and the two dynamic WAN IPs that are NATted (lower two COSMOTE bubbles)?

OR

Do you have two routers one for COSMOTE1 and a second router for COSMOTE2/3

Two routers:

rec-rtr uses one router with a lte cat6 modem (10.0.2.1)
the 192.168.10.2 router serves the important hosts and is located in building one

res-rtr uses two routers with lte cat6 modems (10.0.1.1, 10.0.0.1)
the 192.168.10.1 router is in building two

the lte routers just do NAT and are too weak to do anything else

Are the two main routers physically connected by ethernet? If so how have you decided to connect them??

The routers are connected via ethernet, a wireless 60G ptp link, eth2 :wink:

Switches.png

Not what I would call a beginner network LOL.
That is some major work you have!! Bravo, I would be running away LOL

What I was really asking was whether the two routers share a subnet, as I am not conversant with how best to connect two devices like this.
I assume you need to route some users or devices at Layer 3 so they can see each other.

thank you!!!

still I’m a web dev, self trained in Mikrotik, hence the beginner thread :slight_smile:

the two routers use the 192.168.10.0/24 subnet to route traffic between them. e.g.


/ip route
add distance=1 dst-address=10.0.0.0/24 gateway=192.168.10.1
add distance=1 dst-address=10.0.1.0/24 gateway=192.168.10.1

routes for the lte routers of res-rtr


add distance=1 dst-address=192.168.14.0/24 gateway=192.168.10.1
add distance=1 dst-address=192.168.15.0/24 gateway=192.168.10.1

routes for the ap and restaurant vlans in res-rtr

example:
192.168.11.101 sends packet to 10.0.0.1 with gateway 192.168.11.1
the switch (rec-swi) adds a vlan id 11 tag to the packet

192.168.11.1 (aka 192.168.10.2 aka rec-rtr) gets the packet from eth2
rec-rtr removes the vlan tag from the packet
rec-rtr uses the route

/ip route
add distance=1 dst-address=10.0.0.0/24 gateway=192.168.10.1

and sends the packet to 192.168.10.1 from eth2 (src address is unchanged)

192.168.10.1 (aka res-rtr) receives the packet from eth3
res-rtr uses a dynamic route created when I assigned the 10.0.0.10 ip to eth1 interface (src address is unchanged)

10.0.0.1 (aka lte1-cosmote) receives the packet and decides to reply (dst address is 192.168.11.101)
lte1-cosmote uses the route

/ip route
add distance=1 dst-address=192.168.11.0/24 gateway=10.0.0.10

and sends the packet to 10.0.0.10 (aka res-rtr)

192.168.10.1 (aka res-rtr) receives the packet from the eth1 interface with address 10.0.0.10, uses the route

/ip route
add distance=1 dst-address=192.168.11.0/24 gateway=192.168.10.2

and sends the packet to 192.168.10.2 via eth3

192.168.10.2 (aka rec-rtr) receives the packet from eth2
rec-rtr adds a vlan tag with id 11 and sends it to eth2

192.168.11.101 receives the packet after the switch (rec-swi) access port removes the vlan tag

So is the question how to set up failover for the router with two modems?

In basic terms
/ip route
add dst-address=0.0.0.0/0 gateway=ISP1gatewayIP check-gateway=ping distance=5
add dst-address=0.0.0.0/0 gateway=ISP2gatewayIP distance=10


In this scenario all traffic will go out isp1 and if it goes down ISP2 will take over.
Normally this would be useless with the same ISP on both links, but I will assume that the two links have different sources, equipment, and dependencies, which makes it a feasible idea.

Next you want to do recursive so it looks slightly different.
/ip route
add comment=PrimaryRecursive distance=5 dst-address=1.0.0.1/32 gateway=ISP1gatewayIP
add comment=SecondaryWAN distance=10 gateway=ISP2gatewayIP
add check-gateway=ping distance=5 gateway=1.0.0.1

If you wanted to have the router check two different dns addresses for extra redundancy.
add comment=PrimaryRecursive distance=5 dst-address=1.0.0.1/32 gateway=ISP1gatewayIP
add comment=PrimaryRecursive distance=8 dst-address=9.9.9.9/32 gateway=ISP1gatewayIP
add comment=SecondaryWAN distance=10 gateway=ISP2gatewayIP
add check-gateway=ping distance=5 gateway=1.0.0.1
add check-gateway=ping distance=8 gateway=9.9.9.9


Is this what you have setup??


Actually my question was, when the gateway 10.0.2.1 is down why can I still ping 8.8.8.8 when I have this route

add distance=1 dst-address=8.8.8.8/32 gateway=10.0.2.1

I value all your answers but I still don’t understand how this happens. Is the route disabled when the gateway is unreachable? Does the router fall back to the lowest-distance 0.0.0.0/0 route when another route fails? Did I mess up my config? How does this work?



I took the liberty of rewriting this so I can understand it better

{

:global isp1gatewayip 10.0.2.1
:global isp2gatewayip 192.168.10.1

/ip route
#add check-gateway=ping comment="ISP1 is preferred Gateway" distance=5 gateway=$isp1gatewayip
#add comment="ISP2 is alternative Gateway" distance=10 gateway=$isp2gatewayip

add comment="PrimaryRecursive" distance=5 dst-address=1.0.0.1/32 gateway=$isp1gatewayip
add comment="SecondaryWAN" distance=10 gateway=$isp2gatewayip
#add check-gateway=ping distance=5 gateway=1.0.0.1

add comment="PrimaryRecursive" distance=5 dst-address=1.0.0.1/32 gateway=$isp1gatewayip
add comment="PrimaryRecursive" distance=8 dst-address=9.9.9.9/32 gateway=$isp1gatewayip

add comment="SecondaryWAN" distance=10 gateway=$isp2gatewayip
add check-gateway=ping distance=5 gateway=1.0.0.1
add check-gateway=ping distance=5 gateway=9.9.9.9

}

so if I understand this correctly, you only test the isp1gateway, and you use two external IPs to test against, with ascending distances.

take a look at the solution proposed by rextended

Notice the blackhole routes and the use of scope (I have to read about scope to understand this). If you follow the links provided you will see that the blackhole routes are important for this type of solution. I plan to test this out in a lab first because the router is now in production. Netwatch, although frowned upon, is working with my clumsy firewall rule:

add action=drop chain=output dst-address=8.8.8.8 out-interface=ether2
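The rule above works because netwatch pings are generated by the router itself and so pass through the output chain. When the 8.8.8.8/32 route goes inactive and the ping would leave via ether2, the rule drops it and netwatch sees the host as down. A possible mirror rule for the second probe is sketched below. It is an assumption, not part of the posted config, and it takes lte3 to be the name of the primary WAN interface as in the config shown earlier in the thread.

```
/ip firewall filter
# Already in place per the post: the ISP1 probe must never leave via ether2.
#   add action=drop chain=output dst-address=8.8.8.8 out-interface=ether2
# Assumed mirror rule: keep the ISP2 probe (8.8.4.4) off the primary link.
# lte3 is the name given to ether1 in the config posted earlier in the thread.
add action=drop chain=output dst-address=8.8.4.4 out-interface=lte3 comment="ISP2 probe only via ether2"
```

Without the mirror rule, the 8.8.4.4 probe could still succeed through the primary link when the secondary gateway is unreachable, which is the same bypass described in the first post, only in the other direction.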



I’m not sure I understand, I posted my current config attached along with a screenshot of the netwatch script for ISP1gateway. I used netwatch.