VRRP Riddle [Help Needed]

Hey there,

I need some help from you guys. I am trying to set up a simple VRRP between two RBs. The problem is that when the master drops, the slave takes over but the clients do not come back up.
The slave itself can ping out. I am not sure why this happens.
Master: 10.50.10.1/24
Slave: 10.50.10.2/24
VRRP LAN: 10.50.10.3/32 (mask disappears once set)
Same ID, same interval.

Slave is a clone from a backup so, both have DHCP server of 10.50.10.0/24.

The ISP’s modem has 4 WAN ports, so each RB is connected to the modem independently. As a result, the RBs receive different public IPs.

Does this have something to do with routes?

If you have really restored a backup rather than imported an export, you have cloned also the MAC addresses, so this is the first point to clarify.

Well, I did that. Export the Master backup, restore that backup using import from PC. Both RBs show their respective MAC addresses at Winbox.

I know that the gateway may be the problem. The VRRP is x.x.x.3.
I am using the Master RB (which is x.x.x.1) as DNS and Webproxy server. Both RBs have x.x.x.1 as gateway. Should I use x.x.x.3 instead? If so, how can I tie the DNS and Webproxy to it seamlessly?
Can we use DHCP Relay as x.x.x.3 so VRRP works properly without disturbing the static clients?

If previous is not possible, do we also have to delete the ClientIDs at Leases so VRRP can identify the devices?

You can do either an import of .rsc file (which is a plaintext script) or a restore of .backup file (which is a compressed and ciphered binary file).
If you did the latter and it didn’t rewrite MAC addresses, something must have changed in the way how the restore is done.


I am lost. The idea of VRRP is that the addresses used by external devices are the virtual ones. So the two Mikrotiks using VRRP to backup each other should have their physical addresses e.g. like .253 and .254, and the virtual one should be .1. Or you have to configure the hosts (or the DHCP server) to have .3 as gateway and DNS.

You can use the virtual address as gateway for the other devices as a gateway is stateless. DNS is halfway to stateless - if the primary Tik stops responding and the DNS requests start coming to the secondary one, its cache will be empty so some DNS requests regarding records cached by the primary one will be forwarded to upstream server, but the client won’t notice anything but slightly higher response time. But DHCP is completely stateful, so unless you’d find a way to synchronize the leases granted by the primary to the secondary (I don’t know any such way), the clients may get different addresses from the secondary than which they got from the primary.

I don’t use webproxy but I’d say it is similar to DNS behaviour; however, as you have different internet connection with a different public IP, existing connections will be broken and will have to be re-established by the client.


As said above. DNS works the same from the perspective of the client regardless on which Tik the virtual IP is currently up. Webproxy will work the same but existing TCP sessions will break when VRRP moves the virtual IP.


If you mean a DHCP relay from these Tiks to some external DHCP server, then it doesn’t matter which of the Tiks forwards the client’s requests. But it just moves the issue one floor higher - if that server dies, you’re left without the DHCP service. So for full redundancy, you need two DHCP servers which synchronize the leases.


VRRP does not know anything about DHCP or the external devices’ IDs. It just moves the virtual addresses among the VRRP group members. If the virtual address migrates, the virtual MAC address migrates too, so the external devices do not notice a change. The VRRP group member which becomes active will have to use ARP to determine the MAC addresses of those external devices, but that’s nothing to worry about either.
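
A minimal RouterOS sketch of the addressing discussed so far, keeping the thread's addresses (.1 and .2 physical, .3 virtual). The interface name `ether2`, the VRRP interface name `vrrp-lan`, the VRID and the priorities are assumptions for illustration and are not taken from the poster's actual configuration.

```
# Router 1 (preferred master: higher priority wins)
/ip address add address=10.50.10.1/24 interface=ether2
/interface vrrp add name=vrrp-lan interface=ether2 vrid=49 priority=200
/ip address add address=10.50.10.3/32 interface=vrrp-lan

# Router 2 (backup: same VRID, lower priority)
/ip address add address=10.50.10.2/24 interface=ether2
/interface vrrp add name=vrrp-lan interface=ether2 vrid=49 priority=100
/ip address add address=10.50.10.3/32 interface=vrrp-lan
```

With this layout the clients should only ever be pointed at 10.50.10.3, never at .1 or .2.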

First of all, thank you for your dedication in explaining all of this. I apologize in advance because I am struggling to figure everything out (I am a teacher, nothing related to networking).


You can do either an import of .rsc file (which is a plaintext script) or a restore of .backup file (which is a compressed and ciphered binary file).
If you did the latter and it didn’t rewrite MAC addresses, something must have changed in the way how the restore is done.

Yes, I used the “.backup” one.


I am lost. The idea of VRRP is that the addresses used by external devices are the virtual ones. So the two Mikrotiks using VRRP to backup each other should have their physical addresses e.g. like .253 and .254, and the virtual one should be .1. Or you have to configure the hosts (or the DHCP server) to have .3 as gateway and DNS.

Yes, both Tiks have their respective addresses within the same network, and so does the virtual one. They’re .1, .2 and .3.
I was asking about the DHCP server because I use static clients and when the Backup RB comes into play, it is not leasing properly. That’s why I was wondering if it had something to do with the gateway. What I’ve read, it’s recommended to use the gateway of the virtual one so clients may connect properly. I am stuck around this point.


You can use the virtual address as gateway for the other devices as a gateway is stateless. DNS is halfway to stateless - if the primary Tik stops responding and the DNS requests start coming to the secondary one, its cache will be empty so some DNS requests regarding records cached by the primary one will be forwarded to upstream server, but the client won’t notice anything but slightly higher response time. But DHCP is completely stateful, so unless you’d find a way to synchronize the leases granted by the primary to the secondary (I don’t know any such way), the clients may get different addresses from the secondary than which they got from the primary.

So, the recommendation is to use each Tik’s own gateway address as the DNS server (which is how I have configured the Master right now)?
I tried last night to set the virtual one as gateway for all clients and I noticed that the leases changed due to the ClientID. Although I had those clients already static.


I don’t use webproxy but I’d say it is similar to DNS behaviour; however, as you have different internet connection with a different public IP, existing connections will be broken and will have to be re-established by the client.

Alright.


As said above. DNS works the same from the perspective of the client regardless on which Tik the virtual IP is currently up. Webproxy will work the same but existing TCP sessions will break when VRRP moves the virtual IP.

Ok.


If you mean a DHCP relay from these Tiks to some external DHCP server, then it doesn’t matter which of the Tiks forwards the client’s requests. But it just moves the issue one floor higher - if that server dies, you’re left without the DHCP service. So for full redundancy, you need two DHCP servers which synchronize the leases.

I was thinking of this in order to avoid messing with each Tik’s server. I am not sure whether using the Relay on the VRRP would save time or just mean more messing around.


VRRP does not know anything about DHCP or the external devices’ IDs. It just moves the virtual addresses among the VRRP group members. If the virtual address migrates, the virtual MAC address migrates too, so the external devices do not notice a change. The VRRP group member which becomes active will have to use ARP to determine the MAC addresses of those external devices, but that’s nothing to worry about either.

Ok.

I haven’t sung victory yet.

I’m afraid you expect too much from VRRP. It provides nothing more than redundancy on a network segment where two (or more) physical devices can provide to other devices in the same network segment and IP subnet the service of a gateway (or other services which the clients need to contact on a static address regardless of which physical device provides it, like DNS in your case). It doesn’t provide any means to synchronize the context of stateful services between the physical devices; outside the Mikrotik world, such solutions do exist but they complement, not use, VRRP. So if one of the physical devices has to provide the same services (beyond gateway for routing) as the other ones, it must be configured the same way; where context synchronization is required, Mikrotik is not your choice.

Also, in enterprise grade networking, where the LAN-facing side of the physical routers uses VRRP, their WAN-facing side usually uses some dynamic routing protocol and tracking of the VRRP state so that the machine on which the VRRP virtual address is up is advertised as a router to the LAN subnet towards the WAN side. This makes no sense in your case as your two WAN uplinks are SOHO type with a fixed address and NAT, but might be interesting for you if you had a redundant connection where dynamic routing protocol would be supported.

As for the DHCP server - you can run it on both Tiks if you use non-overlapping address pools on them and provision the static leases on both. With overlapping pools you could have address conflicts: if an address was leased to one host by one server and that server died afterwards, the other one would see that address as free and could lease it to another host. But if you do it this way, you can just as well attach the DHCP servers to the physical interfaces, not to the VRRP ones - when a client asks for its IP configuration, it broadcasts a DHCPDISCOVER request, and it must be able to handle answers from multiple servers.
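
A rough sketch of what non-overlapping pools plus identical static leases could look like in RouterOS. The pool names, ranges, the server name `dhcp-lan`, and the MAC and IP address of the example lease are all hypothetical placeholders, not values from the poster's routers.

```
# On Tik 1: dynamic pool covers the lower half of the range
/ip pool add name=lan-pool-a ranges=10.50.10.100-10.50.10.149

# On Tik 2: dynamic pool covers a different, non-overlapping half
/ip pool add name=lan-pool-b ranges=10.50.10.150-10.50.10.199

# On BOTH Tiks: the same static lease, so a client gets the same
# address no matter which server answers (placeholder MAC/IP)
/ip dhcp-server lease add address=10.50.10.20 mac-address=AA:BB:CC:DD:EE:01 server=dhcp-lan comment="tablet"
```

Each static lease has to be kept in sync by hand (or by script) on both routers, since nothing in VRRP replicates it.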

Thanks for the clarification.
Even so, my main reason for using VRRP is to keep the clients at home connected while “rebooting, moving or upgrading the main MK”.
The rest can wait. So far, I haven’t been able to make it work. I even tried lowering the lease time so MK2 could pick up fresh clients, but not even that helped.
I’m stuck on this. I’ve read plenty about setting this up, and people make it look simple and easy, but I’m not sure why I am not able to do so.
Pulling in and out, rebooting, I managed to make my PC hold connection but not in my table or cellphone. That’s why I’m not sure what’s going on.

Now wait. What does “hold connection” mean, and what does “table” mean - a VoIP phone or a desktop PC? If the cellphone is connected using wireless as I suppose, there is not just the DHCP and gateway part, there is also the wireless authentication, which cannot be inherited from one AP to another. So describe each case separately, and detail what it means to “hold connection”.

By “hold connection” I meant being able to keep surfing without problems (within 3-5 seconds after unplugging the Master MK), i.e. plain internet access.
I’m using my router with wireless DumbAP so clients go straight to MK.
Was a typo, it’s “tablet”.

OK, so not in the sense that you wouldn’t have to re-establish the TCP connections.

So both the tablet and the mobile phone are connected to a wireless AP which just “translates wireless to Ethernet”, but already the DHCP is running on the Tik. And there is a switch between the AP and the two Tiks so if one Tik is down the AP can still talk to the other one, correct?

Do the tablet and mobile phone have static leases reserved or do they get dynamic addresses? How long does it take them to recover, if they do at all? The point is that the client knows the IP of the server which has granted them the lease, and when it expires, the client first tries to renew it with that server before resorting to broadcasting again.

And, last point, have you changed the gateway and dns-server items in /dhcp server network to the virtual IP?

OK, so not in the sense that you wouldn’t have to re-establish the TCP connections.

Yes.


So both the tablet and the mobile phone are connected to a wireless AP which just “translates wireless to Ethernet”, but already the DHCP is running on the Tik. And there is a switch between the AP and the two Tiks so if one Tik is down the AP can still talk to the other one, correct?

Yes. Like: ISP > MK1+MK2 > Switch > AP


Do the tablet and mobile phone have static leases reserved or do they get dynamic addresses? How long does it take them to recover, if they do at all? The point is that the client knows the IP of the server which has granted them the lease, and when it expires, the client first tries to renew it with that server before resorting to broadcasting again.

They do have static leases. Normal lease (10min). That may be the case. That’s why lowering the leases to 5s didn’t work.


And, last point, have you changed the gateway and dns-server items in /dhcp server network to the virtual IP?

I did change the gateway in both MK to the same virtual one without luck. I didn’t try to change the DNS to the virtual, instead I kept both same gateways of the MKs as DNS.

It must be something related to the VRRP and the DHCP.

I haven’t found a solution yet.
Any help will be appreciated.

Post the current configuration of both the Mikrotiks.

Here it is:
https://www.dropbox.com/s/6zaybsjogvr007c/VRRP%20Configs.docx?dl=0

What I can see is that you now only deal with the static leases (as the parameter address-pool of /ip dhcp-server is set to the default value static-only), so the fact that the pools for dynamic leases are the same at both routers does not cause any trouble now.

But I can see that on the 450, the default gateway assigned to the DHCP clients is the physical 10.0.50.1, which means that the client devices lose it once that router goes down. On the 750, it is properly set to the virtual address 10.0.50.3. So set it to 10.0.50.3 also on the 450 under /ip dhcp-server network, enable the 450, wait until the DHCP client devices renew their leases while the 450 is enabled, and then disable/disconnect the 450 again. The mobile devices should continue to work normally, like the PC. Is my guess correct that you have set the IP configuration statically on the PC, with default gateway set to 10.0.50.3, whereas the tablet and phone get the gateway address via DHCP?

Off topic, what subject do you teach?

What I can see is that you now only deal with the static leases (as the parameter address-pool of /ip dhcp-server is set to the default value static-only), so the fact that the pools for dynamic leases are the same at both routers does not cause any trouble now.

Yes, I use static to avoid messing with the queues and other stuff.


But I can see that on the 450, the default gateway assigned to the DHCP clients is the physical 10.0.50.1, which means that the client devices lose it once that router goes down. On the 750, it is properly set to the virtual address 10.0.50.3. So set it to 10.0.50.3 also on the 450 under /ip dhcp-server network, enable the 450, wait until the DHCP client devices renew their leases while the 450 is enabled, and then disable/disconnect the 450 again. The mobile devices should continue to work normally, like the PC. Is my guess correct that you have set the IP configuration statically on the PC, with default gateway set to 10.0.50.3, whereas the tablet and phone get the gateway address via DHCP?

Yes, I had to put the 450 back to gateway 10.0.10.1 from 10.0.10.3 in order to take a step back. The problem was that some devices renewed on the 450 (even though it was disconnected) and some others on the 750, even after waiting for the leases to be renewed. That was a mess. I didn’t understand why some devices kept waiting for the 450 instead of linking to the then active 750.
I have set all clients to static leases. Their addresses are set by DHCP, even my PC.


Off topic, what subject do you teach?

I teach English as a second language. I have done all my networking stuff by myself in a trial and error manner :slight_smile: .

Attach the DHCP servers at both machines to the VRRP interface rather than the “physical” interface. The client remembers the IP address of the DHCP server from which it got the lease so it asks it for renewal using DHCPREQUEST to its individual (unicast) address before reverting to broadcasting a DHCPDISCOVER. And it may take a different time with different client implementations before the client gives up trying with the previous server and starts broadcasting again. If you attach the DHCP servers to the VRRP interfaces, only the DHCP server attached to the currently active VRRP interface will be active, but it will inherit the virtual IP address from the VRRP interface so the clients will get the virtual address not only as the default-gateway and dns-server one but also as the leasing-server one. It is still valid that the pools should not overlap to avoid conflicting addresses to be assigned.
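
As a sketch, moving an existing DHCP server onto the VRRP interface is a one-line change on each router. The server name `dhcp-lan` and the VRRP interface name `vrrp-lan` are assumed names for illustration; substitute whatever `/ip dhcp-server print` and `/interface vrrp print` show on the actual routers.

```
# Run on BOTH Tiks: bind the DHCP server to the VRRP interface
# instead of the physical LAN port (names are placeholders)
/ip dhcp-server set [find name=dhcp-lan] interface=vrrp-lan

# Check which interface the server is now attached to
/ip dhcp-server print
```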

Attach the DHCP servers at both machines to the VRRP interface rather than the “physical” interface. The client remembers the IP address of the DHCP server from which it got the lease so it asks it for renewal using DHCPREQUEST to its individual (unicast) address before reverting to broadcasting a DHCPDISCOVER. And it may take a different time with different client implementations before the client gives up trying with the previous server and starts broadcasting again. If you attach the DHCP servers to the VRRP interfaces, only the DHCP server attached to the currently active VRRP interface will be active, but it will inherit the virtual IP address from the VRRP interface so the clients will get the virtual address not only as the default-gateway and dns-server one but also as the leasing-server one. It is still valid that the pools should not overlap to avoid conflicting addresses to be assigned.

Ok, let me wrap this scenario (I don’t want to mess the network):
First, set the DHCP to the VRRP interface.
Then, let clients pick up the addresses given by the VRRP.

Do I also have to set the DHCP server gateway to e.g. 10.50.10.3 on both Tiks? Or just leave each one with its own gateway?
Regarding DNS, Tik1 is the DNS server (10.50.10.1). Should I leave it as it is or make the VRRP address the DNS server as well?

Both the gateway and the dns-server in the /ip dhcp-server network shall also be set to the virtual address 10.50.10.3 so that nothing changes from the client’s perspective when the virtual address migrates between the physical devices.
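
A hedged sketch of that change, to be applied identically on both routers. It assumes the DHCP network entry is 10.50.10.0/24 as in the first post, and that both Tiks are meant to answer DNS queries from the LAN, which requires `allow-remote-requests` to be enabled on each of them.

```
# Run on BOTH Tiks: hand out the virtual address as gateway and DNS
/ip dhcp-server network set [find address=10.50.10.0/24] gateway=10.50.10.3 dns-server=10.50.10.3

# Run on BOTH Tiks: let the router answer DNS queries from clients,
# so DNS keeps working whichever router currently holds 10.50.10.3
/ip dns set allow-remote-requests=yes
```

Clients only pick up the new gateway and DNS values when they next renew, so the old settings can linger until the current leases run out.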