vrrp configuration with fully redundant switches

Hey guys,

I am configuring a failover network setup.

Here is the topology:

https://imgur.com/ZJZqHo1

The routers are CCR2004s, which don't have a switch chip. To avoid wasting CPU on layer 2 traffic, I didn't configure a bridge or VLANs on the routers. The switches have an ICCP link between them; that is the MLAG function under the bridge.

The routers are running eBGP and iBGP for failover.

I configured VRRP on the routers for redundancy, but due to the topology each router has 2 connections from the switches, so I configured 4 VRRP interfaces, one per physical interface. They all have the same VRID, and 2 of the VRRP interfaces have a higher priority than the other 2, so they become master.
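A sketch of that VRRP layout in RouterOS syntax (interface names, addresses, and priority values are hypothetical; adjust to the real ports):

```routeros
# On R01 (higher priority, so both instances become master):
/interface vrrp add name=vrrp-sw1 interface=ether1 vrid=10 priority=200
/interface vrrp add name=vrrp-sw2 interface=ether2 vrid=10 priority=200

# On R02 (lower priority, backup):
/interface vrrp add name=vrrp-sw1 interface=ether1 vrid=10 priority=100
/interface vrrp add name=vrrp-sw2 interface=ether2 vrid=10 priority=100

# The virtual IP lives on the VRRP interface:
/ip address add address=192.0.2.1/24 interface=vrrp-sw1
```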

Under normal operation, the 2 VRRP interfaces on one router become master due to the higher priority. Failover between the routers works as it should. As you can see, there are 2 VRRP interfaces on one router, but on the customer's router I only configured 1 of them as the default gateway. I could configure 2 VRRP IPs on the customer router, but only 1 default route can be active; I should be able to use Netwatch to control that.
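One way the Netwatch idea could look on the customer router (addresses and route comments are hypothetical; the backup route only takes over while the primary VIP is down):

```routeros
# Two default routes; the lower distance wins while both are enabled.
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.1 distance=1 comment=gw-primary
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.2 distance=10 comment=gw-backup

# Netwatch disables the primary route when its VIP stops answering.
/tool netwatch add host=192.0.2.1 interval=5s \
    down-script="/ip route set [find comment=gw-primary] disabled=yes" \
    up-script="/ip route set [find comment=gw-primary] disabled=no"
```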

When I take down 1 switch, 1 VRRP IP goes offline. The other VRRP interface becomes master, so in that case I need to configure 2 IPs on the customer router to cover a switch failure.

Is there a better way to configure this? I think I can simplify the topology by removing 2 of the connections between the routers and switches, so each router has only 1 connection to 1 switch. I can give that a try tomorrow.

Any advice would be appreciated. :slight_smile:

I'm confused about what you're trying to do, but a couple of tips:

Yeah, MLAG with VRRP might do.

I have already reduced the complexity of this topology; there are just 2 links between the core switches and the routers, so I only need 1 VRRP IP. I have also done some failover tests, and it works as it should. When R01 is powered off, the customer router loses pings for about 25 seconds, then traffic resumes.

The downtime when powering off a switch is shorter; only a few pings are lost.

25 seconds sounds like a bit much. Are you using BFD?

I don't know if MLAG on MikroTik is different from other vendors, but the point of using MLAG is having 2 physical devices behave as 1 logical unit.

That is, with MLAG I would normally just configure IP addresses on the VLAN interfaces and have the MLAG cluster do regular L3 routing.

This way you don't need HSRP/VRRP or any other first-hop redundancy protocol (FHRP), which would give you downtime during failover scenarios.

Don't forget to set up load sharing as layer3+layer4 for the upstream and downstream LAGs (through LACP).
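On RouterOS that load-sharing setting would look something like this (bond name is hypothetical):

```routeros
# Hash on L3+L4 headers so individual flows spread across the LAG members.
/interface bonding set [find name=bond1] mode=802.3ad \
    transmit-hash-policy=layer-3-and-4
```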

I was referring to R1-R3 (L3/BGP). L2 VRRP/LAG should kick in pretty much instantly. BTW, what do you mean by upstream LAG in this scenario?

When I use MLAG I set it up as (example with Rx upstream, SWx being MLAG and FWx being downstream):

SW1 connected to R1 with LAG1 (LACP) and R2 with LAG2 (LACP). And FW1 with LAG3 (LACP) and FW2 with LAG4 (LACP). And of course the MLAG-PEER links to SW2.

SW2 connected to R1 with LAG1 (LACP) and R2 with LAG2 (LACP). And FW1 with LAG3 (LACP) and FW2 with LAG4 (LACP). And of course the MLAG-PEER links to SW1.

This way, from the view of, let's say, FW1, it has 1 LAG (LACP) where one cable ends at SW1 and one ends at SW2. To the FW, SW1 and SW2 look and behave as a single switch to which it has 2 cables connected, forming a LAG (through LACP).

Same with FW2, which has 1 LAG with one cable towards SW1 and one cable towards SW2.

And the same goes for R1 and R2.

Cablewise it will be something like:

R1-int1: SW1-int1
R1-int2: SW2-int1

R2-int1: SW1-int2
R2-int2: SW2-int2

FW1-int1: SW1-int3
FW1-int2: SW2-int3

FW2-int1: SW1-int4
FW2-int2: SW2-int4

And then of course the MLAG-PEER lets say:

SW1-int24: SW2-int24
SW1-int23: SW2-int23

Here you can of course add additional physical cables to the LAG (LACP) that forms the MLAG peer link, but the recommended minimum is 2 (so the switches don't end up in a split-brain situation just because the single MLAG-peer cable dies).

So logically the above will become:


R1    R2
||    ||
   SW
||    ||
FW1   FW2

Normally you just do L2 with a setup like the above, but you could do L3 as well by configuring one VLAN per LAG or per group of devices and then configuring IPs on those VLAN interfaces (so the L2 switch becomes an L3 switch).
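A rough RouterOS 7 sketch of that L3-on-the-switch idea (bridge name, bond name, VLAN ID, and addresses are all hypothetical):

```routeros
# VLAN-aware bridge on the MLAG switch.
/interface bridge add name=bridge1 vlan-filtering=yes
/interface bridge port add bridge=bridge1 interface=bond-fw1 pvid=10
/interface bridge vlan add bridge=bridge1 vlan-ids=10 tagged=bridge1

# VLAN interface with an IP, so the switch routes for that segment.
/interface vlan add name=vlan10 interface=bridge1 vlan-id=10
/ip address add address=10.0.10.1/24 interface=vlan10
```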

Let’s see if it’s something @skycanfiya might find interesting. Personally, I’m pretty curious where that 25-second delay is coming from in the current setup.

25 seconds is often related to STP, so I would call that a prime suspect.

Try setting "edge=yes" on all interfaces (just as a test) and see if the downtime drops below 1 second.
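For reference, that test would be applied per bridge port on RouterOS, e.g. (port name is hypothetical):

```routeros
# Mark the port as an edge port so (R)STP skips the listening/learning delay.
/interface bridge port set [find interface=ether1] edge=yes
```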

I am not using BFD. The issue is that the CCR2004 doesn't have a switch chip inside, so any layer 2 traffic increases the CPU load. I did some speed tests: at 1G down and 500M up, CPU sits at 27%. If the router had a switch chip, I could create a bridge and just use RSTP or a bonding interface. It all depends on the CPU impact; maybe the impact is not that big.

The ports connecting to the routers are already detected as edge ports, but I can set that manually and give it a try. That should speed up RSTP convergence. Also, the inter-chassis link is set up with edge=no, point-to-point=yes.

Yeah, I might need to run that test again. I had asked my colleague to ping the customer router I set up, and that's the number he reported.

I just tried disabling the VRRP interface on R01; it actually took 10 seconds for pings to resume.

BFD is pretty lightweight, with adjustable timers for how often control packets are sent, and it doesn't really strain the CPU. It's highly recommended for L3, like the iBGP in your case. You might also check whether your upstream providers or IXPs offer BFD; if so, I'd definitely use it there as well.
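If it helps, a minimal RouterOS 7 sketch for enabling BFD on an iBGP session (connection name and timer values are hypothetical):

```routeros
# BFD with 200 ms timers; a peer is declared down after 3 missed packets.
/routing bfd configuration add interfaces=all min-rx=200ms min-tx=200ms multiplier=3

# Tie BFD to the existing BGP session.
/routing bgp connection set [find name=ibgp-peer] use-bfd=yes
```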

As for the 25-second delay, I’d suggest trying to find the root cause.

Thanks! I tried again: when I ping the customer router from the Internet, it takes 10 seconds; when I ping from the internal network, it only loses a few pings. I might need to check with the ISP; that is probably related to BGP.

The CCR2004's bridge processing speed is similar to its IP routing processing speed, so adding some layer 2 traffic shouldn't affect performance too much. So I have created a bonding interface on each CCR2004 to bond the 2 interfaces from the 2 core switches. I think the meshed topology should be all good.
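For reference, the bonding described above could look like this on each CCR2004 (interface names and the address are hypothetical):

```routeros
# LACP bond across the two links to the MLAG pair.
/interface bonding add name=bond-core mode=802.3ad \
    transmit-hash-policy=layer-3-and-4 slaves=ether1,ether2
/ip address add address=192.0.2.1/24 interface=bond-core
```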

There are a few things on my mind:

  1. Since there is no bridge on the router, all the Internet traffic is on VLAN 1, the default VLAN. We should really segregate the Internet traffic from the default VLAN: I should create a VLAN interface and add VRRP to that VLAN interface, and then I can create a bridge on the router so I can VLAN off the Internet traffic.
    Not sure if there is a better idea than this.

  2. We have a public /24 that we are going to split into a few /26 subnets. I will ask our ISP to set that up so we can advertise those /26s. Then the question comes down to the VRRP IP: can I just use 1 VRRP interface but assign the different subnets to that interface?
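A sketch of how both points could fit together (VLAN ID, names, and addresses are hypothetical; I'd test whether multiple /26s on one VRRP interface fail over together as expected):

```routeros
# Dedicated VLAN for Internet traffic, with VRRP on the VLAN interface.
/interface vlan add name=vlan100-inet interface=bond-core vlan-id=100
/interface vrrp add name=vrrp-inet interface=vlan100-inet vrid=20 priority=200

# Several /26 virtual IPs on the same VRRP interface.
/ip address add address=203.0.113.1/26 interface=vrrp-inet
/ip address add address=203.0.113.65/26 interface=vrrp-inet
```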