I have a half rack of servers in a data center and I am looking to get a high availability setup which provides redundant paths to and from the servers.
Attached is a network diagram of my understanding of how I might achieve this. However I don’t fully understand how to configure BGP and OSPF to handle a failure of one of the routers. Or maybe the topology is inappropriate in some way?
I would really appreciate any advice or commentary you might have regarding the red areas marked “question” in the diagram.
I don’t entirely understand how protocols/routing in the red box areas work.
So my questions would be:
1. Are BGP and OSPF appropriate in the areas I have outlined?
2. Are BGP and OSPF the best tools for handling a failure of one of the routers?
3. If broadly yes to questions 1 and 2, where do the servers look for an IP gateway? The switches? If so, how do I configure the switches for layer 3 redundancy?
@Larsa, after further reading today, I agree, why indeed?
BGP is essential as that’s how I advertise routes to the colo provider.
The reason for exploring OSPF is that I’m trying to understand what my options are for handling failure of one of the routers.
I thought making the switches the IP gateways for the servers and running OSPF between the switches and routers would be an option. However, I am beginning to doubt that, given the Layer 3 limitations of the CRS326 switches.
I think I could use VRRP? However, as far as I can see, that has two issues:
What if the colo provider sends packets destined for the servers via the inactive/backup VRRP partner?
Since you’re not acting as an ISP and the BGP session is only with your colocation provider (as your default gateway), you can skip BGP and just go with OSPF for internal routing. Turn on BFD for OSPF to get quick routing failover; otherwise you’ll be stuck waiting for the default 40-second OSPF dead interval to expire before a failed neighbour is detected.
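To put rough numbers on that difference, here is a minimal Python sketch. It assumes the common OSPF defaults of a 10-second hello and a dead interval of four hellos, plus a BFD session with a 200 ms interval and a multiplier of 5. The BFD values are assumptions for illustration only, so check what your RouterOS version actually negotiates.

```python
# Rough worst-case failure detection times, in seconds.
# All timer values are assumptions for illustration, not read from a device.

OSPF_HELLO_S = 10                 # common OSPF default hello interval
OSPF_DEAD_S = 4 * OSPF_HELLO_S    # dead interval of 4 x hello = 40 s


def bfd_detection_time(interval_ms: int, multiplier: int) -> float:
    """BFD declares the neighbour down after `multiplier` missed packets."""
    return interval_ms * multiplier / 1000.0


ospf_only = OSPF_DEAD_S
with_bfd = bfd_detection_time(interval_ms=200, multiplier=5)  # assumed values

print(f"OSPF alone: up to {ospf_only} s")
print(f"OSPF + BFD: about {with_bfd} s")
```

With these assumed timers, detection drops from tens of seconds to roughly one second, which is the whole point of pairing BFD with OSPF.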
You can also add two VRRP instances to load-balance across both gateways (check out MikroTik’s dual VRRP load balancing example). If you’ve only got two routers, you don’t even need OSPF since VRRP alone will do the job.
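As a rough illustration of the dual-VRRP idea, the Python sketch below models master election for two virtual gateways. The router names, VRIDs and priorities are hypothetical, and the election rule is simplified to "highest priority among the live routers wins".

```python
# Sketch of VRRP master election for two virtual gateways.
# Router names, VRIDs and priorities are hypothetical.

PRIORITIES = {
    # vrid: {router: priority}
    10: {"router-1": 200, "router-2": 100},  # router-1 preferred for gateway A
    20: {"router-1": 100, "router-2": 200},  # router-2 preferred for gateway B
}


def elect_masters(alive):
    """Return {vrid: master}, picking the highest-priority live router."""
    masters = {}
    for vrid, prios in PRIORITIES.items():
        candidates = {r: p for r, p in prios.items() if r in alive}
        masters[vrid] = max(candidates, key=candidates.get) if candidates else None
    return masters


print(elect_masters({"router-1", "router-2"}))  # both routers healthy
print(elect_masters({"router-2"}))              # router-1 has failed
```

While both routers are up, each one is master for a different virtual IP, so pointing half the servers at each gateway spreads the load. When one router fails, the survivor takes over both virtual IPs.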
Agree with @Larsa on routing: you can keep it simple and use OSPF for everything except the BGP peering to your upstream.
However, if you want to keep the CRS326 as the L3 gateway, I would consider not using MLAG and instead using independent ports to your servers on the MikroTik side. MLAG doesn’t support L3 hardware offload, so dropping it lets the CRS326 switches use hw offload and act as a gateway capable of wire-speed traffic. You’ll have to keep the duplicate L3 VLAN interfaces disabled on switch 2 and use netwatch to enable/disable them, since VRRP isn’t supported in hardware offload either. On the server side, most servers support several non-LACP bonding types, just like MikroTik (since it’s Linux-based), which will use whatever path is active and forwarding based on ARP and MAC learning. You’ll need to enable either rapid or multiple spanning tree on the switches to prevent loops and to prefer traffic through switch 1 as the primary. It’s not as clean as MLAG, but the performance is way better.
Alternatively, you can put another set of CCR2004s or CCR2116s in as the L3 gateway for the servers and connect the switches below them. Then you could use MLAG if desired.
Either way, it will give you much better performance between servers as well as to the Internet.
Unfortunately, VRRP forces you back onto the CPU as well. Most people build duplicate L3 IPs on the VLAN interfaces of each switch, keep the IPs on switch 2 disabled, and create a script that detects the outage with netwatch and enables them if needed. It is essentially a way to hack together an L3 hw-offload “friendly” VRRP alternative.
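The logic of that workaround can be sketched in a few lines of Python. This is only a simulation of the decision the netwatch script makes, not RouterOS configuration; the interface names and probe results are hypothetical.

```python
# Simulation of the netwatch-driven failover logic on switch 2.
# Interface names and probe results are hypothetical.

class StandbyGateway:
    def __init__(self, vlan_interfaces):
        # Gateway IPs on switch 2 start disabled while switch 1 is healthy.
        self.enabled = {name: False for name in vlan_interfaces}

    def on_probe(self, primary_reachable: bool):
        """Mimic netwatch up/down hooks: enable the standby IPs only
        while the primary switch is unreachable."""
        for name in self.enabled:
            self.enabled[name] = not primary_reachable


standby = StandbyGateway(["vlan10", "vlan20"])

standby.on_probe(primary_reachable=True)
print(standby.enabled)   # standby IPs stay disabled

standby.on_probe(primary_reachable=False)
print(standby.enabled)   # standby IPs take over as the gateway
```

A real script would probably also want some damping (for example, requiring several failed probes in a row) so that a single lost ping does not bring up duplicate gateway IPs.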
Something that has been suggested to me is Layer 2 bridging from the servers to the routers (via the switches) and then configuring ECMP on the servers, effectively giving them two gateways. What’s your view of that?
Also in terms of the switch failover in the previous scenario, would an anycast IPv4 address on both switches with OSPF managing the distances be a valid approach?
I would personally use another vendor for switching, one that supports stacking.
I don’t quite trust MikroTik yet for advanced switching. They are still buggy.
On the second-hand market there are many options available that are quite cheap.
For example, two Cisco Catalyst 3750X switches. They are cheap enough that you could get a third one to keep as a cold spare in case one dies.
Then you can achieve even better redundancy with 2 uplinks (one from each switch) with LACP to each router, and then downstream, you can do LACP between both switches to your servers.
IMHO you don’t need L3 on the switches (at least with what your diagram shows and your described use case).
You can simply do VRRP on the routers for the ‘lan’ side and have the switches just do L2 VLANs.
Then use eBGP with the datacenter and iBGP between the two routers.
If you don’t want to bother with OSPF for iBGP to work, you can set the next-hop choice to “force self” on the iBGP peers.
This way no matter what fails in your topology, there will be a path (either L3 or L2) to reach your servers without any manual intervention.
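To show why the “force self” next-hop setting removes the need for OSPF, here is a small Python sketch. The addresses are hypothetical (documentation ranges), and it assumes that router B has no route to the subnet router A uses for its upstream peering, which is the situation an IGP would otherwise have to solve.

```python
import ipaddress

# Hypothetical addressing, using documentation ranges.
UPSTREAM_PEER = ipaddress.ip_address("192.0.2.1")     # colo router as seen by router A
ROUTER_A_LAN = ipaddress.ip_address("198.51.100.2")   # router A on the link shared with B
# Assumption: router B only knows its connected subnet and runs no IGP.
ROUTER_B_CONNECTED = [ipaddress.ip_network("198.51.100.0/24")]


def ibgp_next_hop(force_self: bool):
    """Next hop router A puts on routes it passes to router B over iBGP."""
    return ROUTER_A_LAN if force_self else UPSTREAM_PEER


def resolvable(next_hop, connected):
    """Router B can use a route only if it can reach the route's next hop."""
    return any(next_hop in net for net in connected)


for force_self in (False, True):
    nh = ibgp_next_hop(force_self)
    print(f"force_self={force_self}: next hop {nh}, "
          f"usable on router B: {resolvable(nh, ROUTER_B_CONNECTED)}")
```

Without the setting, router B receives a next hop it cannot reach and the route stays inactive. With it, the next hop is router A itself, which sits on a directly connected subnet.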