BGP routers not reflecting all routes

We have two BGP sessions with two independent ISPs at two different locations and a multihop BGP session between our two routers. We have configured route reflection on our inter-site peering but notice that some routes are not distributed. I assume that this is simply due to the routes not currently being active.

Site A:

/routing bgp instance
set default as=37105 router-id=48.29.21.1
/routing bgp peer
add name=TATA remote-address=208.160.18.81 remote-as=26937 ttl=default
add multihop=yes name=siteB nexthop-choice=force-self \
    remote-address=48.29.23.1 remote-as=37105 route-reflect=yes \
    ttl=default update-source=lo

Site B:

/routing bgp instance
set default as=37105 router-id=48.29.23.1
/routing bgp peer
add name=Level3 remote-address=209.62.70.53 remote-as=28100 ttl=default
add multihop=yes name=siteA nexthop-choice=force-self \
    remote-address=48.29.21.1 remote-as=37105 route-reflect=yes \
    ttl=default update-source=lo

Site A learns the route from its ISP but installs the route via Site B, so has both routes ready. Site B doesn’t install the backup route via site A as the route via its ISP is preferred:

site A:

[admin@siteA] > ip route print detail where dst-address in 41.79.4.0/24
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip,
       b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
 0 ADb  dst-address=41.79.4.0/24 gateway=48.29.23.1 gateway-status=48.29.23.1 recursive via 198.19.12.54
        ether4 distance=200 scope=40 target-scope=30 bgp-as-path="28100,1299,37148,37209" bgp-local-pref=100
        bgp-origin=igp bgp-communities=28100:1,28100:13 received-from=siteB

 1  Db  dst-address=41.79.4.0/24 gateway=208.160.18.81 gateway-status=208.160.18.81 reachable via  ether3
        distance=20 scope=40 target-scope=10 bgp-as-path="26937,6453,1299,37148,37209" bgp-origin=igp
        bgp-communities=6453:86,6453:2000,6453:2100,6453:2101,26937:4000,26937:11020 received-from=TATA

site B:

[admin@siteB] > ip route print detail where dst-address in 41.79.4.0/24
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip,
       b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
 0 ADb  dst-address=41.79.4.0/24 gateway=209.62.70.53 gateway-status=209.62.70.53 reachable via  ether5
        distance=20 scope=40 target-scope=10 bgp-as-path="28100,1299,37148,37209" bgp-weight=40
        bgp-origin=igp bgp-communities=28100:1,28100:13 received-from=Level3

Is there something I can do to announce inactive routes so that site B can recover more quickly in the event of its ISP losing connectivity?

Are you trying to reflect a full table to somewhere other than the two BGP border routers?

When using iBGP, it is normal not to see all the routes.

Suppose destination X is preferable through router A.
Router A will send prefix X to router B using iBGP.
Router B will compare ispA->X with its own ispB->X path. If router B prefers ISP-A, then router B will withdraw B->X from iBGP. Thus Router A won’t see the B->X route in its table.

So basically, in router A, you will only see the routes from B which B considers to be better than the routes via isp A.

Sometimes, there are routes where A would use ISP A, but B would use ISP B.

Suppose destination Y is roughly the same through both ISPs. Each router would get as far in the path selection algorithm as the step where an eBGP path is preferable to an iBGP path. In this scenario, both routers A & B would choose their own direct ISP’s path to destination Y, and both would send the prefix Y to the other router. So both routers would show an active route to their directly-connected ISP, and an inactive route via the opposite router’s ISP.
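To make the two cases above concrete, here is a minimal sketch of the steady state between two border routers. It is a deliberately simplified model, not how RouterOS implements path selection: it only compares AS-path length and then prefers eBGP over iBGP, and it ignores weight, local-pref, MED and the rest. The names `Path`, `best` and `advertises_over_ibgp` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    exit_isp: str     # ISP the traffic would leave through
    as_path_len: int  # number of ASes in the AS path (shorter wins)
    ebgp: bool        # True when learned directly from the ISP


def best(paths):
    # Simplified decision: shortest AS path first, then eBGP over iBGP.
    return min(paths, key=lambda p: (p.as_path_len, not p.ebgp))


def advertises_over_ibgp(len_via_a, len_via_b):
    """Steady state for two border routers, A on ISP-A and B on ISP-B."""
    a_own = Path("ISP-A", len_via_a, ebgp=True)
    b_own = Path("ISP-B", len_via_b, ebgp=True)
    # iBGP copies keep the AS path unchanged but count as iBGP-learned.
    a_best = best([a_own, Path("ISP-B", len_via_b, ebgp=False)])
    b_best = best([b_own, Path("ISP-A", len_via_a, ebgp=False)])
    # A router announces the prefix to its iBGP peer only while its own
    # eBGP path is its best path; otherwise it withdraws and stays silent.
    return {"A": a_best.ebgp, "B": b_best.ebgp}


print(advertises_over_ibgp(5, 4))  # AS-path lengths from the 41.79.4.0/24 output
print(advertises_over_ibgp(4, 4))  # equal-length paths
```

With the AS-path lengths from the 41.79.4.0/24 output (5 via TATA, 4 via Level3), only B announces the prefix, which matches site B having a single route while site A has two. With equal lengths, both routers announce, and each ends up with an active eBGP route plus an inactive iBGP one.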

This may seem a bit confusing, but it makes sense once you add a third router, C, that peers over iBGP with both A and B. Looking at router C, it wouldn’t make sense to see an advertisement from router B that says to go via router A, because router A will have already sent this same prefix to router C. Since router B agrees that A is the best exit, it stays out of the conversation regarding that particular route. If A loses its route to destination X, it will withdraw that route from C and B. Router B would then notice that it has only one choice left, using ISP B, at which point it will tell routers A and C “hey, guys, I can reach destination X!”
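The failover sequence with a third router can be sketched in the same spirit. This is only a toy model under a stated assumption: AS-path length is the sole thing that differs between the border routers' paths, so a border router stays silent whenever another border router holds a strictly shorter path. The function name `announcers` is hypothetical.

```python
def announcers(ebgp_path_len):
    """Return the border routers that announce a prefix over iBGP.

    ebgp_path_len maps a border router name to the AS-path length of
    the path learned from its own ISP, or None if that path is gone.
    """
    available = {r: n for r, n in ebgp_path_len.items() if n is not None}
    if not available:
        return set()
    shortest = min(available.values())
    # Routers tied for the shortest path each prefer their own eBGP path
    # and announce it; every other border router prefers the iBGP-learned
    # path and therefore says nothing.
    return {r for r, n in available.items() if n == shortest}


before = announcers({"A": 3, "B": 4})     # A has the better path
after = announcers({"A": None, "B": 4})   # A's ISP path is withdrawn
print(before, after)
```

Before the failure, router C only hears about the prefix from A. Once A loses its ISP path, B becomes the sole announcer, which is the moment it speaks up to A and C.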

Whatever the case, when looking at router C’s table, you would know that every route with just one gateway is one that both border routers agree is best via that ISP. For any route with two gateways, router C picks its favorite of the two, and any eBGP peers of router C would only see the route that C chose.

Think of a prefix announcement in BGP to be equivalent to the statement: “If you give me a packet for destination xyz, then here’s what I am going to do with it.” So given that, it makes complete sense for router B not to bother wasting everyone’s time by saying “I would give packets for xyz to router A.” Router B just shuts up and remains silent about xyz as long as it wouldn’t carry the packet to ISP B at all.

Also - it looks like your iBGP is set up to use “next hop self” (nexthop-choice=force-self). There are reasons to change the next hop address of prefixes in iBGP, but in general you don’t want to do that; you want to leave the next hop address (gateway address) as the IP address of the eBGP peer. This, of course, requires that the /30 link addresses on the ISP circuits themselves be active in your OSPF table so that iBGP can resolve the next hop IP accordingly. Also, you don’t need to set multihop on iBGP, since iBGP peers are not expected to be directly connected. Multihop is a “special case” for eBGP, which defaults to only speaking to a directly-connected peer.

The reason is that BGP is “big picture” stuff, and in general, you want the internal distance to the gateway route to be available as a factor in iBGP neighbors’ route-making decisions.

Think of it like choosing an airline ticket. You might find an amazingly cheap flight to Paris, but if the airport is a 3-day drive from your house, you would probably take a more expensive ticket from an airport nearer your home. The “next hop” IP in BGP is basically the “airport”, and the BGP path attributes are analogous to the characteristics of the flight - how long it takes, how many layovers, how long, what airport, how expensive, etc.
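In code, the airline analogy amounts to a tiebreak on the interior cost to the next hop. The sketch below assumes the ordering from the standard BGP decision process, where interior cost is compared only after the BGP attributes tie; whether a particular router implementation applies that step is worth checking in its documentation. The function `pick_exit` and the cost figures are invented for illustration.

```python
def pick_exit(candidates):
    """Pick an exit from (exit_name, as_path_len, igp_cost_to_next_hop) tuples.

    The AS-path length (the "flight") is compared first; the IGP cost
    to the next hop (the "drive to the airport") breaks ties.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))[0]


# Same AS-path length through both ISPs: the nearer next hop wins.
print(pick_exit([("ISP-A", 4, 10), ("ISP-B", 4, 250)]))

# A shorter AS path still wins even when its next hop is farther away.
print(pick_exit([("ISP-A", 5, 10), ("ISP-B", 4, 250)]))
```

The first call picks ISP-A and the second picks ISP-B, so the interior distance only matters once the "big picture" attributes are equal.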

Lastly, you don’t need to use route-reflect=yes. BGP routers will automatically send the best routes they learned via eBGP to every iBGP neighbor (except as in my previous post). Route reflection is a special case where you have a central router that peers iBGP with a group of routers. By default, a router will not repeat iBGP-learned routes to its other iBGP neighbors unless you enable route reflection. Using route reflectors is a method to cut down the number of peerings required inside your network. Each border router will peer with (usually) two route reflectors. The route reflectors act like clearing houses for your network’s global routing policy, and distribute the best routes of their cluster to each other and to other route reflectors. The border routers won’t need to peer with each and every router in your BGP deployment - just the route reflectors.
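A quick way to see how much peering route reflectors save is to count sessions. This is a rough sketch assuming every client peers with every reflector and the reflectors are fully meshed among themselves; the function names and router counts are illustrative only.

```python
def full_mesh_sessions(routers):
    # Every router peers with every other router exactly once.
    return routers * (routers - 1) // 2


def reflector_sessions(clients, reflectors):
    # Each client peers with each reflector, plus a mesh among reflectors.
    return clients * reflectors + full_mesh_sessions(reflectors)


# Ten BGP speakers in total: a full mesh versus 8 clients on 2 reflectors.
print(full_mesh_sessions(10), reflector_sessions(8, 2))
```

For ten routers this works out to 45 sessions in a full mesh against 17 with two reflectors. With only two border routers, as in this thread, the full mesh is a single session, so there is nothing for reflection to save.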

Many thanks for the time and effort you invested in your replies, I’ve certainly learnt something and have updated the settings on my iBGP peers.

Your post perfectly explains why both routers initially have an active and backup route and then remove the backup routes when the other side chooses it as a preferred path.