How to designate some path as lower preference

We have a private network with some MikroTik routers running BGP (all different private AS numbers, so eBGP)
which have links between them (not a star or full mesh).
Several of the routers also have a GRE tunnel over internet to a Linux system running quagga bgpd.
One of the routers has a link to that system.

We would like to arrange it so that the links between the routers (and between the router and Linux system)
are preferred over the GRE tunnel, even when that means more hops along those links. The tunnel should
only be used when parts of the network become unreachable due to link failures.

I have read the article about filters and setting weights, and also found the use of the “set distance” to set
the distance of the route appearing in the table, but I have not been able to set things up in such a way
that routing works when everything is up. Routes on “far away” routers point in the wrong direction and
end up in ping-pong when locally a preference is made to not use the GRE tunnel.

How is one supposed to configure this in BGP? Or is it not possible to do it this way?

Is it correct that you would use “bgp prepend” as a trick to achieve this result?

Prepend is a good way to control which path traffic will come inbound to your AS and local preference works well for selecting the outbound link.

However, if you control both ASes, then I would opt for communities and probably use local preference in both ASes to control the traffic.

Here is an excellent overview of communities in service provider networks

https://www.nanog.org/meetings/nanog40/presentations/BGPcommunities.pdf
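
As a rough illustration of the community idea in RouterOS v6 filter syntax: tag everything learned over the GRE tunnel with a community, and have the routers further along lower local_pref for routes that carry that tag. Unlike local_pref, a community travels with the route across eBGP sessions. This is only a sketch. The chain names gre-in and radio-in and the community value 65000:100 are made up for the example, and the rules would have to sit ahead of any accept or discard rules in those chains.

```routeros
/routing filter
# On the router that terminates the GRE tunnel: mark routes learned over it
add chain=gre-in set-bgp-communities=65000:100 action=passthrough
# On routers further along: de-prefer anything that carries the mark
add chain=radio-in bgp-communities=65000:100 set-bgp-local-pref=50 action=passthrough
```

Routes without the tag keep the default local_pref of 100, so they win whenever a tunnel-free path exists.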

Thanks for your reply!
With a prepend on incoming and outgoing at each router towards the GRE tunnel it looks like it is
working now.
I will study the community feature to see how we can make use of that, although it is not immediately
clear to me how it would be used in this case.
What I already learned is that by “not telling others what you know” it is very easy to create routing
problems. Instead of traffic flowing along another path than we would prefer, it does not flow at all
anymore…
Fortunately it is not a critical network (…)
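
For reference, a prepend on incoming and outgoing routes towards the GRE tunnel might look roughly like this in RouterOS v6 filter syntax. This is a sketch, not the actual configuration of this network: the chain names gre-in and gre-out and the prepend count of 3 are assumptions, and the rules are meant for the in and out chains of the GRE peer only, placed ahead of the accept rules.

```routeros
/routing filter
# Lengthen the AS path of routes learned over the GRE tunnel
add chain=gre-in set-bgp-prepend=3 action=passthrough
# Lengthen the AS path of routes advertised over the GRE tunnel
add chain=gre-out set-bgp-prepend=3 action=passthrough
```

The count only needs to be large enough that the tunnel path is longer than the longest path over the regular links.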

I would just use local_pref - on each peer, add a rule to the in-filter chain for that peer (you should use a dedicated in-filter for each peer) and set local_pref=120 for prefixes you would like to prefer this peer’s path for. Since you control all routers, then you’ll be doing the same thing at the other end, and your policy will be followed to a tee.
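
Something along these lines, in RouterOS v6 filter syntax. It is a sketch: radio-in stands for the dedicated in-filter chain of the peer you want to prefer, and 44.0.0.0/8 is only an example range.

```routeros
/routing filter
# Prefer this peer's path for the listed prefixes (default local_pref is 100)
add chain=radio-in prefix=44.0.0.0/8 prefix-length=8-32 set-bgp-local-pref=120 action=accept
```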

Unfortunately this does not work correctly!
When all routes are available it selects the route with the highest local_pref, when that route is disrupted due
to link problems it switches to the lower preference route OK, but when the better route returns it does not
change all the routes back to the higher preference path but at least some of them remain pointing the
wrong way.
With bgp-prepend this problem does not occur…

Why is this?

Another question more or less related to this:

Part of the network looks like this:

internet------R1---------R2---------R3--------R4
              |                               |
              +------------GRE----------------+

R1 has a default route and all routers have Default Originate - If Installed.
When all links are up, R2 has default pointing to R1, R3 has default pointing to R2,
R4 has 2 defaults: pointing to R3 active, pointing to R1 (via GRE) available.
This is just fine! Traffic for default flows R4->R3->R2->R1 and the local networks
connected to the 4 routers all have a default route, all learned via BGP.
(no static default routes are in place)

Now when the link between R2 and R3 is cut, R4 loses the default to R3, and
switches to its default via GRE to R1. However, R3 now does not have a default
route anymore! I would expect that it would get one pointing to R4.
All the other routes have that behaviour (R3 routes via R4), but the default route
at R3 is lost.

Is there a logical explanation for this?

Is there a particular reason that you’re using BGP for all of this? OSPF would be much better suited if you control all of the routers and it’s all a single “autonomous system” of routers - meaning that they should all share the same routing policy. Put a very bad cost on the GRE tunnel (more than the total cost for the R1->R2->R3->R4 path - i.e. if these links all cost 10, the path costs 30, so make the GRE cost 40 or more - 100 would be totally safe). Originate default from R1 and let the others choose the best path to reach R1 based on the costs.
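
On a RouterOS v6 router at the far end of the tunnel, the cost part of that could be sketched like this. The interface names ether1 and gre-tunnel1 are placeholders, and the rest of the OSPF setup (instance, networks, areas) is left out.

```routeros
/routing ospf interface
# Regular link: low cost
add interface=ether1 cost=10
# GRE tunnel: costs more than the whole regular path, so it is only a fallback
add interface=gre-tunnel1 cost=100
```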

Something about this BGP design just triggers my “this is wrong” feeling - ideally, you would originate the default prefix only one time at R1 and let the others pass it along - I know Mikrotik’s iBGP implementation won’t forward a default prefix (I hope they fix this) - but maybe eBGP will forward an eBGP-learned default prefix (I’m away from my lab so I can’t test this right now). Basically, when each router originates a default prefix, it’s based on the authority of a default GW’s presence in IGP (static, ospf, rip, etc) but in your case, this “igp” authority is only based on BGP, so it seems strange that a router would learn a default from BGP, and then use this as an authoritative “igp” source to generate a new default prefix.

I think I understand now why the local_pref method does not work.
There appears to be no “cost” metric in BGP other than the path length (number of hops).
So when you set local_pref somewhere it is used (as the name implies) locally, but the fact that the decision
was made on local_pref is not communicated to the next router so the network as a whole does not understand
this. E.g. R4 knows it should prefer R3 over the path via GRE to R1, but R3 does not understand that R4 is
making this decision and when the situation at R2-R3 changes it sees no reason to go back to the preferred
situation routing via R2 rather than via R4.

With the prepend it works ok because the path length is always known everywhere so R3 can make the proper
decision. It appears we have to go with the bgp-prepend to get correct behaviour.

W.r.t. using BGP: this is in a network where we control some local routers but not the entire network.
And earlier in other parts of the network the decision was made to use BGP. (it is the same network that
docmarius is also operating in, the amprnet also called hamnet). It is a network based on radio links where we
want to add some redundancy by using internet as a fallback, but it is “not fun” to route over internet as we are
radio amateurs. Hence this strange requirement of not using the shortest path.

I agree that BGP is probably not the best routing protocol for this, because there is also little room for adding in relative
path quality figures, e.g. for a link that is rather long and has some packet loss. I have some experience
with Cisco EIGRP, which has quality and delay metrics that can be tweaked in more detail. OSPF is similar to
that, I think.

I still don’t understand the default route issue because it appears to work OK one way and not the other. R3 takes
its default from R2 (which has taken it from R1) but not from R4 (that has also taken it from R1). But the setup is
basically the same on all routers, with the same BGP instance (different AS number on all the routers) and peer
settings and the same routing filters, all of them are like this:

/routing filter
add action=accept chain=amsterdam-in prefix=44.0.0.0/8 prefix-length=8-32
add action=accept chain=amsterdam-in prefix=0.0.0.0 prefix-length=0
add action=discard chain=amsterdam-in
add action=accept chain=amsterdam-out prefix=44.0.0.0/8 prefix-length=8-32
add action=accept chain=amsterdam-out prefix=0.0.0.0 prefix-length=0
add action=discard chain=amsterdam-out

Why this works OK over one link and not over the other is a mystery to me. All the detail info I can get about routes
and advertisements looks similar everywhere, and I have tried both with setting local_pref and with bgp-prepend
at R4 and there is no difference in this behaviour.

This is pretty much the case here, because you’re using eBGP: local_pref does propagate between BGP-speaking routers, but only over iBGP.
In BGP, the decision list is like this:

  1. highest weight (internal to any one router only / semi-proprietary metric)
  2. highest local_pref (internal to any one ASN)
  3. shortest AS-PATH
  4. route’s origin ‘authority’ (my word) → igp > egp > incomplete
    (did the originator of the route do so based on an IGP route, an EGP peer’s route (and who uses EGP anymore?), or just inject it arbitrarily?)

In your scenario, steps 1 and 2 are always a tie - weight doesn’t communicate and all routers are eBGP neighbors so local_pref doesn’t propagate - and all routes are going to be originated the same way, so 4 will always be a tie too.

That leaves AS-PATH - this also can’t be accumulating on the default gateway metric, because every router originates it - i.e. there’s no AS path. Now, the return paths to the local networks behind each router WILL accumulate AS-PATH lengths…

And at this point, this is going to equate to #-of-hops, just as you said.

Perhaps setting a recursive static default GW address in R2, R3, and R4 would be best - and use an IP address being advertised by R1 as that “recursive default GW.” The other routers’ path-making decision to reach this address will follow the BGP table’s view of how to reach R1… and the AS-PATH will accumulate properly on these. In fact, local-pref could work as well because if R2 sets local-pref=100 on R1, and local-pref=50 on R3, then R2 will prefer R1 for everything. R2 will now pass along its R1 routes to R3. R3 will prefer R2 in the same way, so it will pass the paths from R2 to R4. R4 will receive from both R1 and R3, but will set local_pref=100 on R3, and local_pref=50 on R1. In this case, R4 will not assert any paths from R1 towards R3 because it’s only going to announce things to R3 that wouldn’t go back to R3…

In this case, BGP will behave somewhat like spanning tree - if the R2->R3 link breaks, it should re-converge the way you want. I think it’s the default-originate=if-installed that’s biting you. Try it with the static recursive-lookup default GW being an address advertised by R1.

Well, that appears to be not correct. I have no idea why the default route is treated special in BGP, and not simply
as a route to 0.0.0.0/0 just like any other one, but when I look in the R3 routing table when all links are up I see this:

0 ADb  dst-address=0.0.0.0/0 gateway=44.137.60.5 
        gateway-status=44.137.60.5 reachable via  ether1.vlan51 distance=20 
        scope=40 target-scope=10 bgp-as-path="4220406000,4220406100" 
        bgp-origin=incomplete received-from=utrecht

which in fact is quite similar to any other route:

3 ADb  dst-address=44.137.0.0/16 gateway=44.137.60.5 
        gateway-status=44.137.60.5 reachable via  ether1.vlan51 distance=20 
        scope=40 target-scope=10 bgp-as-path="4220406000,4220406100" 
        bgp-origin=incomplete received-from=utrecht

not anything special like no AS path…
In R2 the default route is like this:

 0 ADb  dst-address=0.0.0.0/0 gateway=44.137.42.1 
        gateway-status=44.137.42.1 reachable via  ether1.vlan57 distance=20 
        scope=40 target-scope=10 bgp-as-path="4220406100" bgp-med=0 
        bgp-origin=incomplete received-from=gw-44-137

and in R4 the two default routes are like this:

 0 ADb  dst-address=0.0.0.0/0 gateway=44.137.62.1 
        gateway-status=44.137.62.1 reachable via  ether3-link-nos distance=20 
        scope=40 target-scope=10 
        bgp-as-path="4220403600,4220406000,4220406100" bgp-origin=incomplete 
        received-from=hilversum 

 1  Db  dst-address=0.0.0.0/0 gateway=44.137.61.17 
        gateway-status=44.137.61.17 reachable via  gre5 distance=20 scope=40 
        target-scope=10 
        bgp-as-path="4220403732,4220403732,4220403732,4220403732,4220406100" 
        bgp-med=0 bgp-origin=incomplete received-from=gw-44-137

All hunky-dory I would say!
But still, it won’t work for the default route and it works fine for all other routes.
Mystery…

I tried to do this but it does strange things too… at R3 I see this after adding the route:

 0 ADb  dst-address=0.0.0.0/0 gateway=44.137.60.5 
        gateway-status=44.137.60.5 reachable via  ether1.vlan51 distance=20 
        scope=40 target-scope=10 bgp-as-path="4220406000,4220406100" 
        bgp-origin=incomplete received-from=utrecht 

 1   S  dst-address=0.0.0.0/0 gateway=44.137.0.1 
        gateway-status=44.137.0.1 unreachable distance=100 scope=30 
        target-scope=10 

 6 ADb  dst-address=44.137.0.1/32 gateway=44.137.60.5 
        gateway-status=44.137.60.5 reachable via  ether1.vlan51 distance=20 
        scope=40 target-scope=10 bgp-as-path="4220406000,4220406100" 
        bgp-origin=incomplete received-from=utrecht

The resulting static route is marked unreachable. The docs say it should be marked recursive. Why not?

Static routes default to being scope=30, target scope = 10.
(this is something I’ve never seen anywhere but in Mikrotik - it befuddles me)

Your recursive default GW should have target-scope >= the scope of BGP-installed routes or else it doesn’t see it as “reachable”
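
With the values from the output above (BGP routes installed with scope=40, and 44.137.0.1 as the address advertised by R1), the static route would look something like this in RouterOS v6. Treat it as a sketch; the exact scope values needed may differ on your setup.

```routeros
/ip route
# target-scope=40 lets the nexthop be resolved through the BGP route (scope=40)
add dst-address=0.0.0.0/0 gateway=44.137.0.1 target-scope=40
```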

As for the AS-PATH on default routes, yes they can collect an AS-PATH, but your output shows me that the intermediate routers aren’t originating default GW, but passing the prefix along to the next ebgp neighbor - which is a good thing actually. In this case, the middle routers which can never be the default GW should not have “originate if-installed” because if there’s a default prefix in BGP, they’ll just pass it along, and if not, then they won’t.

If-installed won’t “lift” the route up into BGP (I imagine BGP hovering above the RIB like a helicopter or something) and then announce to the BGP-speaking universe that it has the default GW unless the route already exists in the RIB. The only AS in your setup that should be originating it (since it passes along between your various ASNs just fine, apparently) is the R1 asn - and the rest will simply use AS-PATH length to choose the best way to reach R1. (unless you supersede AS-path by using local-pref.)

Since default GW is passing through properly, I’d suggest sticking with pure-bgp and not using the recursive lookups, and just disabling the originate=if-installed on R2, R3, and R4. If a router is completely isolated from reaching R1, then it has no business originating a default GW anyway (at least not in the topology you’ve shared).

p.s. - distance 100 would place this route as inferior to BGP anyway, but like I said, this shouldn’t be necessary since the intermediate routers are simply forwarding the default-gw prefix from R1 as they should.

It appears the “default originate” option is misleading at best.
All things I have tried until now only point into the direction that it is a global route filter that has precedence over the
route filter one installs.
“if-installed” apparently can also mean “has been learned from other routers”, not only “is defined statically”.

When I set the “default originate” option to “never” in R2, R3 immediately loses its default route and R4 switches
to the GRE tunnel, just like when cutting R2-R3 but with all the other routes unaffected this time.
There is no observable difference between default originate “if-installed” or “always”.
There is some strange thing that appears to affect the learning of the default route by R3 from R4 only.

I hope to add another router R5 soon that fits between R3 and R1 the same way as R4 does now (radio link to R3
and GRE tunnel to internet), and see what happens. Hopefully this will resolve if there is some problem with R4
or if it is at R3 or in the general idea.

I have now managed to set a recursive default route (after re-reading the paragraph about scope and target-scope
on the Wiki many times) and it causes problems when combined with distributing default routes via BGP.
At first everything seems to be OK (with the links up) and all the static default routes point in the correct direction,
and when R2-R3 is cut the route at R3 switches correctly to R4, but now R4 learns from R3 that it has a default route
that is closer than the one via GRE and sets its default route to R3 causing a loop. What a mess…
Combined with default-originate never it works. The loop does not occur and everything switches correctly for
the first time!

While it is a workaround, I do not really like it because it needs so much care to set all the parameters correctly
on all the routers. Distributing default via BGP has the advantage of being clear and the same on all routers, e.g.
on my own router at home I use two routing tables and the BGP method worked fine but the recursive method cannot
be used because it resolves the nexthop in the wrong (main) routing table. So, different configuration everywhere.

If you use the floating static method, then you definitely want to disable default originate in BGP except in R1 for the very reason you discovered. In fact, you wouldn’t need to distribute a default route into BGP at all, since every router is using a dynamic source (some IP that R1 announces) as the target of a static default GW. So long as the chosen path to reach R1 follows your desired policy, then the static default GW is going to do the same.
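
On R2, R3 and R4 that would come down to something like the following in RouterOS v6. It is a sketch that assumes every peer on those routers should stop originating a default route; narrow the find expression if that is not the case.

```routeros
/routing bgp peer
# Stop originating a default route towards every configured peer
set [find] default-originate=never
```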

Mikrotik’s BGP treats default routes strangely - in Cisco, you would only need to originate it at R1 and be done, but ROS doesn’t pass default gw routes through unless you have distribute default GW enabled - which strictly speaking isn’t quite right because you don’t want to originate it - (i.e. assert yourself as the source) but you want to pass it along if it’s in the table.

From my own experiments recently, I can say that the behavior is to use the global out-filter, and any routes which are accepted by this filter are then subjected to the per-peer out-filter. If you do any modifications to the routes’ attributes in the global-out, they will be present for the per-peer filter to see, but they won’t be in the routing table itself with those attributes yet (which is what I would expect).

Ok… thanks for your continued effort to solve this.
I have no global out-filter defined in any of the routers, but only peer in and out filters and they now are all defined
the same. Before I added that redundant path I had “clever” filters to reduce the number of routes, because the
default route and the 44.0.0.0/8 route would normally serve a leaf node just fine there would be no need to send
all the subnet routes there. However when having redundant paths that may not work well, so I have simplified
those filters at least until everything works well.

There is really something strange with the default route.
I am considering advertising a route to 0.0.0.0/1 and 128.0.0.0/1 instead of a default route at R1 to overcome
these problems. Because I really do not see why 0.0.0.0/0 would have to be treated (or configured) any
differently from all the other routes.
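
One thing to keep in mind with this approach: chains like the ones posted earlier accept only 44.0.0.0/8 and the default route and discard everything else, so the two /1 prefixes would presumably need accept rules of their own. A sketch in the same RouterOS v6 filter syntax, using the amsterdam-in and amsterdam-out chain names from above. The rules have to end up ahead of the final discard rule in each chain, which a plain add at the end would not achieve.

```routeros
/routing filter
# Accept any prefix of length 1, i.e. 0.0.0.0/1 and 128.0.0.0/1
add action=accept chain=amsterdam-in prefix=0.0.0.0/0 prefix-length=1
add action=accept chain=amsterdam-out prefix=0.0.0.0/0 prefix-length=1
```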

I completely agree with that - OSPF seems to treat it differently as well, but this seems to be a standard behavior of OSPF on multiple platforms, though, where in BGP, the default route prefix behaves pretty much the same as any other prefix.

I’m getting insane!! That does not work EITHER!!!
WHY WHY WHY???
Why is a /8 route passed without problem and a /0 or /1 not?
I now remember reading about an issue with BGP and small netmasks in a recent release note, but I cannot
find it anymore. I hope this bug has not returned. All routers are running version 6.34.2

ZeroByte: I asked support about the release note and if it is a known bug…

The recursive static default GW route method should still work for you, though - once you’re able to get the scopes worked out of course.