Simple failover on dhcp server

Hi all, I have installed a wireless bridge with LHGG 60 GHz antennas. The two RouterBOARDs run separate DHCP servers, with DHCP snooping on the link, so that half of the IP addresses are assigned by one antenna and the other half by the other. I have two internet accesses: an ADSL router in one building and an LTE connection in the other. Is it possible to configure the system to switch the gateway in the routing table and in the DHCP servers when the LTE connection goes down? I cannot install a single router with two WAN ports because of the distance between the two internet accesses, and I don't want to waste the bridge's bandwidth by centralizing the routing. I don't know whether this is the best solution, so any advice is welcome. Thanks to all for your time,
Gerry

I am not sure I understand your setup or what you want to achieve.

Normally (with both internet connections working) clients in building A go through internet access A and clients in building B go through internet access B (lte)?

And when the LTE connection is down you want clients in building B to switch to internet access A?

While switching to a different route is (relatively) quick (particularly if you delete connections so that they are re-established), changing the gateway served by DHCP won't take effect until the leases are renewed or expire; otherwise you would need to disconnect all clients and renegotiate the DHCP leases, I believe.

I would say that what you need to look at is VRRP.

Assuming you now have physical.ip.A and physical.ip.L as the addresses of the two routers on sites A (ADSL) and L (LTE) respectively, you would add two VRRP interfaces to each router, one with virtual.ip.A and the other with virtual.ip.L. The priorities would be set so that, as long as the ADSL uplink is up, the VRRP interface bearing virtual.ip.A is the master on site A, whereas the VRRP interface bearing virtual.ip.L is the master on site L as long as the LTE interface is up. The DHCP server on each site would announce the respective virtual address of that site as the default gateway.

The WAN monitoring script on each site would have to disable the respective VRRP interface when the uplink fails and re-enable it once the uplink recovers. So when the ADSL uplink fails, virtual.ip.A comes up on site L.
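As a rough sketch of the idea above, the site A router could be configured along these lines. The interface names, VRIDs, priorities and addresses are all made up for illustration (192.168.1.1 stands in for virtual.ip.A and 192.168.1.2 for virtual.ip.L), and it assumes both routers are MikroTik devices sharing the same LAN segment on an interface called bridge:

```
# Site A router: preferred master for virtual.ip.A, backup for virtual.ip.L.
# The site L router would mirror this with the two priorities swapped.
/interface vrrp
add name=vrrp-A interface=bridge vrid=1 priority=200
add name=vrrp-L interface=bridge vrid=2 priority=100
/ip address
add address=192.168.1.1/32 interface=vrrp-A
add address=192.168.1.2/32 interface=vrrp-L

# What the WAN monitoring script on site A would do when the ADSL uplink fails:
/interface vrrp disable [find where name="vrrp-A"]
# ...and once the uplink recovers:
/interface vrrp enable [find where name="vrrp-A"]
```

With the local VRRP interface disabled, the other site's instance with the same VRID stops seeing a higher-priority master and takes over the virtual address.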

Thanks to all for the replies. The two buildings each have their internet access on a separate router: one for ADSL (a FRITZ!Box 7390) in building A, and a Zyxel NR5103E for 5G/LTE in building B. Only the DHCP servers are hosted on the antennas, for flexibility. I prefer the LTE connection for speed reasons; the ADSL is needed when the LTE goes down because of traffic limits or other problems. The switching time isn't a problem, as these are homes and nothing is critical. It is more important not to overload the RouterBOARDs on the antennas and not to waste bandwidth on the link, also because the antennas have only one LAN port: if I use them for routing, the usable Ethernet bandwidth goes down. Unfortunately I think I can't use VRRP because my routers aren't from MikroTik.
Gerry

Yes, but I still miss something :confused: .

In a normal situation (both ADSL and LTE up) is :

  1. both buildings A and B access internet via LTE and ADSL is only a backup
    or
  2. building A goes through ADSL and building B goes through LTE

Or - in other words - does only building B need to switch from LTE to ADSL (and of course need to go through the wireless link) or both should switch from LTE to ADSL (so normally there is the traffic on the wireless link between building A and LTE connection)?

LTE is for normal use, ADSL is the backup; the link also carries the LTE traffic to the other building.

The way you describe it you indeed need a scheduled script at each LHGG that will check the transparency of the local WAN by pinging some addresses in the internet via the local ISP router, and if none of them responds, update the gateway item of /ip dhcp-server network with the IP address of the remote ISP router.

Do you have any experience with Mikrotik scripting?

I think that with some clever scripting it is possible to change the gateway provided by the two DHCP servers, but the change in the two MikroTiks' DHCP settings will not actually propagate to the clients until their next DHCP request/negotiation/lease.

So, when the LTE goes down:

  1. all clients in both buildings will have no internet
  2. the netwatch (or whatever) script will change the gateway served by the DHCP to the one of the ADSL
  3. each and every client will need to disconnect and reconnect (wireless devices) or renew IP config (cabled devices) or it will remain with the old gateway (and no internet)

And the same in reverse should happen when the LTE connection becomes available again.

For “unmanned” devices (let’s say IoT devices) the only way out would probably be making their DHCP lease very short (thus creating a lot of otherwise unneeded reconnections).
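As a rough sketch of what shortening the lease would look like (assuming the DHCP server is named defconf, as in a default configuration; both the name and the 10-minute value are only placeholders):

```
# Hand out 10-minute leases so that clients pick up a changed gateway sooner.
# Clients normally try to renew at about half the lease time.
/ip dhcp-server set [find where name="defconf"] lease-time=10m
```

Leases that were already handed out keep their old duration, so the shorter value only applies from each client's next renewal onwards.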

It doesn’t seem to me “practical” or even doable in practice.

Do check your Zyxel device. I cannot speak specifically about your NR5103E, but similar models do have some “failover” settings (what Zyxel in their simplicity call “WAN backup”).

See here (page 105):
https://download.zyxel.com/NR5101/user_guide/NR5101_Version%204.40%20Ed%204.pdf

If you have that option (and it actually works), the problem would be solved the “right” way (a “main” router that is the “fixed gateway” and changes its route when a connection fails).
There would still be more traffic from building A and back when the ADSL is active, but that is probably acceptable.

Thanks all for the help. I think the same about the scripting solution; unfortunately I have no experience writing code for it, so I have to read some documentation. Unfortunately the WAN backup option isn't present on my router. Is the default DHCP lease time of 10 minutes short enough for the clients to renew their IP configuration? I would prefer to avoid having the main router on one side, because if the electricity on that side or the link goes down I cannot browse without modifying the IP configuration. I also have two managed switches from Zyxel that can be rebooted remotely via their interface.

A “normal”/common Netwatch script would do.
Usually:
The on-down script/commands would change the route to the alternate one
The on-up script/commands would restore the previous route.

In your case, instead of changing/enabling/disabling routes, you would change the gateway in the dhcp server setting.

Provided that you now have something like:

/ip pool
add name=default-dhcp ranges=10.10.0.100-10.10.0.199
/ip dhcp-server
add address-pool=default-dhcp interface=bridge lease-time=1d name=defconf
/ip dhcp-server network
add address=10.10.0.0/24 dns-server=10.10.0.1 gateway=10.10.0.1

You only need to edit the gateway value, but if you also have the dns-server set you may want to change that too, so in practice it is (IMHO) easier to add a second, disabled entry and enable it (while disabling the current one), and vice versa.

I.e., something like:

/ip dhcp-server network
add address=10.10.0.0/24 dns-server=10.10.0.1 gateway=10.10.0.1
add disabled=yes address=10.10.0.0/24 dns-server=10.10.0.2 gateway=10.10.0.2

so the code for the on-down should amount to:

/ip dhcp-server network set [ find gateway="10.10.0.1" ] disabled=yes
/ip dhcp-server network set [ find gateway="10.10.0.2" ] disabled=no

And the opposite for the on-up.

You can test the commands on terminal, before putting them into netwatch scripts, so that you can evaluate the effects beforehand.
If the above or a variation of it will work, then you will have to think about the settings of the Netwatch (interval, type of check,etc.) that can trigger the execution of the script.
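For illustration only, a complete Netwatch entry could look roughly like the following. It uses the simpler variant of editing the gateway and dns-server values in place rather than enabling and disabling entries. The probed host, interval and timeout are arbitrary example values, and 10.10.0.1 / 10.10.0.2 are the main and backup gateways from the example above:

```
/tool netwatch
add host=1.1.1.1 interval=30s timeout=2s comment="main uplink check" \
    down-script="/ip dhcp-server network set [find where address=\"10.10.0.0/24\"] gateway=10.10.0.2 dns-server=10.10.0.2" \
    up-script="/ip dhcp-server network set [find where address=\"10.10.0.0/24\"] gateway=10.10.0.1 dns-server=10.10.0.1"
```

A longer interval means slower detection of an outage but fewer probes; the right trade-off depends on how quickly the clients renew their leases anyway.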

Copy-paste the following commands into a command line window on each LHGG - it is not dangerous as the scheduler row will be added as disabled and a script does nothing until you execute it manually or using the scheduler (or another script).

/system script add name=test-wan source=":local locGw 10.2.1.1\
    \n:local remGw 10.2.1.254\
    \n:local dsn [ip dhcp-server network find where address~\"10.2.1.0/24\"]\
    \n:local targets {8.8.8.8;9.9.9.9;1.1.1.1}\
    \n:local outcome 0\
    \n\
    \n:foreach target in=\$targets do={\
    \n  :if (\$outcome=0) do={\
    \n    :set outcome [:ping \$target count=1]\
    \n  }\
    \n}\
    \n/ip dhcp-server network {\
    \n  :if ((\$outcome=0) and ([get \$dsn gateway]!=\$remGw)) do={\
    \n    set \$dsn gateway=\$remGw\
    \n  }\
    \n  :if ((\$outcome>0) and ([get \$dsn gateway]!=\$locGw)) do={\
    \n    set \$dsn gateway=\$locGw\
    \n  }\
    \n\
    \n}"

/system scheduler add disabled=yes interval=1m name=test-wan on-event=test-wan start-time=startup

Then edit the scripts on each LHGG so that the LAN IP addresses of the local and remote router and the DHCP server network would match your environment. Then set the gateway on the /ip dhcp-server network row to some value that matches neither the local gateway nor the remote one, and run the script manually. If running the script fixes the gateway item properly, you can enable the scheduler item.

Each LHGG's own default gateway must always be the local one; if that interferes with something you already use (like updating the DynDNS address of the MikroTik), you will have to add a dedicated routing table and make the ping in the script use that routing table.

Thanks to all for the possible solution. I have another question: ping and netwatch use the system gateway for routing, so I think I have to set it first in the script, otherwise I get an infinite loop. Is that correct?

Sorry, I don’t understand the question, can you try expanding on it?

EDIT:

Wait, maybe I understand what you mean.

You decide which (presumably high availability) internet address you use for netwatch, then make a static /32 route to it, with the gateway of choice.
Example, to have netwatch ping 1.1.1.1 through gateway 10.0.0.1:
/ip route add dst-address=1.1.1.1/32 gateway=10.0.0.1
like it happens with recursive failover checking.
Of course the chosen destination address will only be reachable through that given gateway, so it cannot be used (as an example) as general purpose DNS.

I have found this solution (sorry, I'm a noob in this environment): a basic toggle in the down script of netwatch.

:local gw5g 192.168.1.2
:local gwadsl 192.168.1.1
:local gwact [/ip route get number=0 gateway ]

:if ($gwact != $gw5g) do={
	/ip dhcp-server network set gateway=$gw5g numbers=0
	/ip route set gateway=$gw5g numbers=0
} else={
	/ip dhcp-server network set gateway=$gwadsl numbers=0
	/ip route set gateway=$gwadsl numbers=0
}

Do you think this is a good way? Thanks to all again
Gerry

As said it would IMHO be advisable to add a /32 route for the destination and leave the general 0.0.0.0/0 gateway alone.
The gateway set in your LHGG devices is only used for (say) NTP or something else. Unless you expect days-long interruptions of the LTE, the two wireless devices can live without an internet connection until the main LTE connection is re-established; the network devices will get the “current” gateway from DHCP.

You are missing a / in the second command inside your else statement.

Generally speaking, it is not a good idea to use item numbers in a script. The only exception is when there is surely a single item in the list, but even then it is a bad habit; use the find or find where syntax instead.
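A small sketch of the difference, assuming there is exactly one static default route and one DHCP network for 192.168.1.0/24 (the addresses are taken from the script above):

```
# Fragile: depends on the item's position in the last printed list
#   /ip route set gateway=192.168.1.2 numbers=0
# More robust: select the item by one of its properties
/ip route set [find where dst-address="0.0.0.0/0"] gateway=192.168.1.2
/ip dhcp-server network set [find where address="192.168.1.0/24"] gateway=192.168.1.2
```

If more than one route matches (for example a dynamic default route added by a DHCP client), the find condition has to be narrowed further, e.g. by matching on a comment.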

I am not sure I understand the logic of your script.

If the netwatch down script is triggered, it means that the current gateway/connection (gw5g) is not working, no matter which is the current gateway the only thing you can do is set the other one (gwadsl).

If/when the netwatch up script is triggered (as well no matter what is the current gateway) it means that the “main” gateway/connection is working and thus you set the “main” gateway (gw5g) as active.

I.e. the choice of the gateway need not be conditional, as the condition is already in the triggering of the scripts.

OK thanks, I have corrected the / in that line (sorry, I don't understand the syntax; can you explain these line openers?). I have tried, but if I create a script with ping, the routing goes via the system gateway and fails (does netwatch also use the system gateway?). My installation is very basic, and in my config only one item is listed in the route table. I have also written this script to put in the up section:

:local gw5g 192.168.1.2
:local gwadsl 192.168.1.1
:local gwact [/ip route get number=0 gateway ]
:local targets {8.8.8.8;9.9.9.9;1.1.1.1}
:local outcome 0

:if ($gwact != $gw5g) do={
	/ip route set gateway=$gw5g numbers=0
	:foreach target in=$targets do={
		:if ($outcome=0) do={
		:set outcome [:ping $target count=1]
		}
	
	:if ($outcome>0) do={
		/ip dhcp-server network set gateway=$gw5g numbers=0
		} else={
			/ip route set gateway=$gwadsl numbers=0
			}
	}
}

The down script toggles the gateways, and the up script tests whether gw5g is up and uses it if it is OK.

Is it possible to put a dynamic variable in the gateway settings of the routing table and the DHCP server that can be modified via a netwatch script? Is flash wear a problem when toggling the parameters hundreds of times a day?

The / only means “start from root”: without it the command is a “relative path”, with it an “absolute path”.
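For example, both of the following print the DHCP networks; the only difference is where the path starts:

```
# Absolute path: works no matter which menu you are currently in
/ip dhcp-server network print

# Relative path: first move into the /ip menu, then continue from there
/ip
dhcp-server network print
```

Inside scripts the absolute form is the safer habit, since the result does not depend on the menu level at which the line is executed.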

The good thing about RoS (and not only) scripting is that everyone can (within limits) write anything the way he/she likes :slight_smile: (including overcomplicating it with superfluous conditionals).

The example Sindy posted, making use of the scheduler, runs every x time so it needs both a check and a conditional execution inside the script.

Netwatch runs - like scheduler - every x time as well but in itself it is already both a check AND provides conditional execution based on previous status.
IF the check (netwatch ping) succeeds AND the previous status was “up”, nothing happens.
IF the check (netwatch ping) succeeds BUT the previous status was “down”, the up script is executed.
IF the check (netwatch ping) fails AND the previous status was “up”, the down script is executed.
IF the check (netwatch ping) fails BUT the previous status was “down”, nothing happens.

So, starting from the lte $gw5g gateway up and running, nothing happens, no matter how much time passes and how many times the netwatch probes it.
Then, the first time the netwatch runs and finds the gateway connection down, it executes the down script.
What is the only possible remedy to the $gw5g down? Switching the DHCP gateway to $gwadsl. (no IF’s, no BUT’s), so now the $gwadsl is the current gateway for the network and (hopefully) the network devices reach the internet via this alternative gateway. The $gw5g gateway for the device (in /IP route) has not been changed, the device has no internet and the next netwatch run will result in down state (i.e. nothing will happen).
The first subsequent run of netwatch that succeeds will run the “up” script. And what is the intended setting when $gw5g is working? To use it, so the “up” script switches the DHCP gateway back from $gwadsl to $gw5g.
No need for checks or conditional execution.

As said, if you also want to switch the gateway in route of the device you can add a narrow route for the destination, so that netwatch will always use this latter.

About checking multiple destinations, it is a good idea, though 1.1.1.1 or 9.9.9.9 tend to be much more online than your lte connection will be, but with the newish netwatch options with ICMP probe, I would rather experiment with accept-icmp-time-exceeded=yes and a low TTL value. In practice you switch from “check if I am able to reach a specific destination” to “check if the few first hops towards destination are reachable”, so, ultimately a more accurate check of “I can get past my ISP”.
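A tentative sketch of such a probe, assuming a RouterOS 7 version whose Netwatch supports ICMP probes; ttl=3 is only a guess at “the first few hops” and would need to be tuned to the actual path, and the interval is an arbitrary example value:

```
# The probe counts as "up" if any router within 3 hops answers with
# "time exceeded", even though 1.1.1.1 itself is never reached.
/tool netwatch
add type=icmp host=1.1.1.1 ttl=3 accept-icmp-time-exceeded=yes interval=30s
```

The TTL has to be large enough to get past the local ISP router, otherwise the probe would succeed even when the uplink itself is down.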

The only reason why I prefer scheduled scripts to netwatch is that netwatch can monitor only a single host, so if that host becomes unreachable (and I have seen even 8.8.8.8 to be regionally unreachable), we get a “false positive”. And as soon as you start trying to combine multiple netwatch instances together, a scheduled script becomes simpler :slight_smile:

Many thanks to all for the help. I have configured the gateway in the routing table to point to the 5G router and put this simple script in the netwatch up and down scripts, with the 5G gateway for up and the ADSL gateway for down:

ip dhcp-server network set gateway=192.168.1.X [find where address="192.168.1.0/24"]

It works very well. I hadn't understood that the up or down script is executed only once, on a state change. This way the default gateway fixed to the 5G router contributes to the state machine that prefers the 5G gateway. It isn't extremely elegant, but it works and it is simple.