Hello mducharme,
My topology looks like this:
ISP 1 -- Router A ------- Router C ------- Router D ------- Router F ----- Router B --- ISP 2
So I set up Router A and Router B as BGP routers, and internally Routers A, B, C, D and F all run OSPF. I want to make sure that when ISP 1 is down (it happens sometimes, and right now I'm stuck when it does), all traffic goes to Router B and ISP 2. The main thing I care about is the default route.
I've set up OSPF as I described. My question is: will this work the way I described above?
You can get the redundancy you want to work, yes, but this situation is more complex than most setups people start out with. Unfortunately this view of the network topology is a bit simplistic, since I cannot see everything, but I assume that Routers C, D and F form a core of sorts, with Routers A and B as the edge. You have to think *very* carefully about anything that can go wrong. I can see some challenges:
- If Router A and Router B are going to be doing your BGP, they *need* to be peered with one another, because their routing tables should agree. Normally they would be next to each other, but in this case there are several hops in between. This opens up the possibility of a split-brain problem in the event of certain failures, covered later. Keep in mind that this also means that if a packet arrives at Router B on its way out to the Internet while Router A is up and running properly, Router B will forward it to Router A if Router A is the preferred gateway, because the two routers agree on the best path.
- With a static default route on Router A and Router B, if the BGP peering goes down but the Ethernet link does not, the default route advertised into OSPF will still lead to Router A, because its static default remains active. This can be solved by asking both ISPs to send you only a default route over BGP instead of a full table; that default route then goes away when the BGP peering goes down.
- If you go with the default-route-only approach from the previous point, which is probably best, the problem becomes how to redistribute that default route into OSPF. On paper, RouterOS can do this by having OSPF redistribute the default "if-installed", but in practice this is not always reliable and may not work properly. On my employer's network, we get around this by running a second AS number (a private AS) on our core routers, separate from the public AS on our edge routers. We add a static blackhole default route on the core routers (not the edge) and redistribute it into OSPF; once a packet gets to the core routers, they have the default through BGP, so they route it to whichever border router currently holds the preferred default. In your case, adopting that setup would mean choosing a private AS number in the range 64512 to 65534 (65535 itself is reserved, not private), then setting up BGP on Routers C, D and F with that AS number (you would set up three peerings, between C and D, between D and F, and between C and F, unless you use route reflectors). Then you would peer Router F with Router B (on your public AS) and Router C with Router A (on your public AS). You should also enable the "remove private AS" option on your external peerings with the ISPs on Routers A and B, so that you do not accidentally send them routes with this private AS number in the AS path. I want to stress that this may not be the best setup for you, depending on other issues covered later.
- If you have set up the private-AS design from the previous point, you would then have OSPF redistribute the default route on possibly all of Routers C, D and F, each with a static blackhole default. The OSPF default would steer traffic from the other routers (beyond the five above) to a point where the BGP routes take it the rest of the way. The static blackhole default is, again, just a way of working around the sometimes unreliable "redistribute if-installed" option: it lets you use "redistribute always" on those routers instead, since the static blackhole default is advertised to the other routers as if it were an ordinary static default route.
- The other potential issue I can see here is a split-brain situation. You have to think about what can happen if a link goes down internally. This has two consequences. The first, and probably less problematic, is that if your edge routers lose their peering with each other, they will each install the default from the ISP they are connected to. Which default route Routers C, D and F would then receive in their private AS is harder to predict. In this case traffic may go out the wrong gateway, but you would still have connectivity. The second consequence is that a single link failure could break your network into two islands. Your diagram above looks like a chain, where losing the link between, say, Router A and Router C, Router C and Router D, Router D and Router F, or Router F and Router B would split your network in two. I cannot see what redundancy is built in, because you have not shown your entire network, only those five routers as though they were in a chain. If they really are in a chain as in your diagram, a break anywhere in the chain would split the network into two halves. Both halves would still be able to route traffic out (assuming both Router A and Router B were up), but if all incoming traffic were arriving on Router A, the return traffic for the other half of the network could not get there because of the broken link. The worst case I can see (although I haven't thought through everything) would be losing the link between Router A and Router C while Router A is still up and peered successfully with ISP 1. Router A would then be on an island by itself, and Routers C, D, F and B would still be connected to each other but not to Router A.
In this case, all incoming traffic would still be going to Router A (which is still up and connected to the Internet), but with the link to Router C down it would have no way of getting the packets to the rest of the network, so your failover would not work as desired.
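To illustrate the point above about a static default versus a BGP-learned default on the edge routers, here is a toy Python model, not RouterOS behaviour. `EdgeRouter` and `egress` are hypothetical names, it assumes Router A is the preferred exit, and it assumes no gateway health checking on the static route.

```python
from dataclasses import dataclass


@dataclass
class EdgeRouter:
    """Toy model of an edge router's default route (hypothetical, not RouterOS)."""
    name: str
    default_source: str   # "static" or "bgp"
    link_up: bool = True  # Ethernet link to the ISP
    bgp_up: bool = True   # BGP session with the ISP

    def has_default(self) -> bool:
        if not self.link_up:
            # Either kind of default disappears when the interface goes down.
            return False
        if self.default_source == "static":
            # Assumes no gateway health checking: a static route to a connected
            # next hop stays active while the link is up, whatever BGP is doing.
            return True
        # A default learned from the ISP over BGP is withdrawn with the session.
        return self.bgp_up


def egress(preferred_order):
    """Name of the first edge router, in preference order, that still has a default."""
    for router in preferred_order:
        if router.has_default():
            return router.name
    return None


# ISP 1's BGP session drops but the Ethernet link to it stays up.
static_design = [EdgeRouter("A", "static", bgp_up=False), EdgeRouter("B", "static")]
bgp_design = [EdgeRouter("A", "bgp", bgp_up=False), EdgeRouter("B", "bgp")]

print(egress(static_design))  # A: traffic still heads for the broken ISP 1
print(egress(bgp_design))     # B: the default fails over to ISP 2
```

The only difference between the two designs is where the default comes from, which is exactly why asking the ISPs for a BGP default makes the failover follow the session state.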
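For the private-AS suggestion, a small Python sketch of the two checks involved: whether an ASN falls in the RFC 6996 private-use ranges, and what "remove private AS" is meant to achieve on the ISP-facing peerings. The path-stripping function is a simplified model (vendors differ in exactly when they strip private ASNs), 64500 is a documentation ASN standing in for your public AS, and 65001 is a hypothetical choice for the core.

```python
from typing import List


def is_private_asn(asn: int) -> bool:
    """True for the private-use ASN ranges reserved by RFC 6996 (16-bit and 32-bit)."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def remove_private_as(as_path: List[int]) -> List[int]:
    """Simplified "remove private AS": drop every private ASN from the path.

    Real implementations only strip private ASNs under certain conditions,
    so treat this as an illustration of the goal, not vendor behaviour.
    """
    return [asn for asn in as_path if not is_private_asn(asn)]


PUBLIC_AS = 64500  # documentation ASN (RFC 5398), stand-in for your public AS
CORE_AS = 65001    # hypothetical private AS on Routers C, D and F

path = [PUBLIC_AS, CORE_AS]     # roughly what Router A might send for a core-originated prefix
print(remove_private_as(path))  # [64500]
print(is_private_asn(65535))    # False: 65535 is reserved, not private
```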
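The three core peerings mentioned above (C and D, D and F, C and F) are just a full mesh inside the private AS. This short Python sketch, with a hypothetical helper name, enumerates the sessions and shows why the count grows as n(n-1)/2, which is what makes route reflectors attractive on larger cores.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """Every BGP session needed for a full mesh inside one AS (no route reflectors)."""
    return list(combinations(routers, 2))


core = ["C", "D", "F"]
for a, b in full_mesh_sessions(core):
    print(f"{a} <-> {b}")
# n routers need n * (n - 1) / 2 sessions.
print(len(full_mesh_sessions(core)))  # 3
```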
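One detail the blackhole trick relies on, which the description above leaves implicit, is that the blackhole must not win while a real BGP default exists on the core router. Here is a Python sketch of that selection on Router C, assuming the blackhole is given a higher distance than the BGP routes. The eBGP and iBGP distances of 20 and 200 are assumed typical RouterOS defaults, and 250 for the blackhole is a hypothetical value. It compares only the static and BGP candidates on one router and leaves out the OSPF defaults the core routers also hear from each other.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    protocol: str
    distance: int            # lower wins
    next_hop: Optional[str]  # None means blackhole


# Assumed distances: 20 (eBGP) and 200 (iBGP); 250 for the blackhole is hypothetical,
# chosen only so that it loses to any BGP-learned default.
BLACKHOLE = Route("static-blackhole", 250, None)


def active_default(candidates: List[Route]) -> Route:
    """The default route the core router would actually use: lowest distance wins."""
    return min(candidates, key=lambda r: r.distance)


# Router C: eBGP default from Router A (ISP 1), iBGP default learned towards F (ISP 2).
normal = [BLACKHOLE, Route("ebgp", 20, "A"), Route("ibgp", 200, "F")]
isp1_down = [BLACKHOLE, Route("ibgp", 200, "F")]
no_internet = [BLACKHOLE]

print(active_default(normal).next_hop)       # A
print(active_default(isp1_down).next_hop)    # F (towards Router B and ISP 2)
print(active_default(no_internet).next_hop)  # None: traffic is dropped rather than looped
```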
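The worst-case split brain above can be checked with a simple connectivity search. This Python sketch assumes the five routers are linked exactly as in the diagram (a chain, nothing else), removes the A-C link, and reports which routers can still reach each other and which ISP each island can use.

```python
from collections import deque

ROUTERS = ["A", "C", "D", "F", "B"]
CHAIN = [("A", "C"), ("C", "D"), ("D", "F"), ("F", "B")]
UPLINKS = {"A": "ISP 1", "B": "ISP 2"}


def islands(nodes, links):
    """Split the routers into groups that can still reach each other (BFS)."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


# Worst case: the A-C link fails while A stays peered with ISP 1.
after_failure = [link for link in CHAIN if link != ("A", "C")]
for group in islands(ROUTERS, after_failure):
    isps = sorted(UPLINKS[r] for r in group if r in UPLINKS)
    print(sorted(group), "->", isps)
# ['A'] -> ['ISP 1']
# ['B', 'C', 'D', 'F'] -> ['ISP 2']
```

Inbound traffic that ISP 1 still delivers to Router A lands on the first island, while everything it is destined for sits on the second.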
The split-brain problem described above seems problematic to me, and there are potentially other challenges that I haven't considered. Before you roll out a specific topology, you have to think carefully, for every possible failure X, about what weirdness can start happening if X occurs. The edge routers should have redundant connectivity to each other to remove the possibility of a split brain. Another option is to alter the design above so that you originate the public prefixes from a more internal router, say Router C, D or F, or some combination of those. For more information see
https://www.noction.com/knowledge-base/ ... figuration under the heading "Where to originate prefixes".
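To make the point about redundant connectivity between the edge routers concrete, this Python sketch tests every single-link failure and lists the ones that would split the network. The A-B link is hypothetical: it stands for any extra path between the edge routers, not necessarily a direct cable. In the chain as drawn every link is a single point of failure; with the extra path, none is.

```python
ROUTERS = ["A", "C", "D", "F", "B"]
CHAIN = [("A", "C"), ("C", "D"), ("D", "F"), ("F", "B")]
# Hypothetical extra path giving the edge routers redundant connectivity to each other.
WITH_EDGE_LINK = CHAIN + [("A", "B")]


def connected(nodes, links):
    """True if every router can still reach every other one."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbour in adjacency[stack.pop()] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == set(nodes)


def single_link_failures_that_split(nodes, links):
    """Links whose loss alone would break the network into islands."""
    return [link for link in links
            if not connected(nodes, [other for other in links if other != link])]


print(single_link_failures_that_split(ROUTERS, CHAIN))
# [('A', 'C'), ('C', 'D'), ('D', 'F'), ('F', 'B')]
print(single_link_failures_that_split(ROUTERS, WITH_EDGE_LINK))  # []
```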