NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 12:53 pm

I have searched these forums high and low, and cannot find anybody discussing the same (or similar) issue.

Here is the scenario:

Customer has 2 ISPs. One ISP provides 2 gateways -- both exposed to the customer on the same L2 and within the same subnet -- and uses BGP with private ASN for failover (only default route is announced by the provider). The other ISP just uses straight IPoE, static address, no BGP or other routing protocol or tunneling. As a result, there are 3 default routes in the RIB: 2 from the same BGP instance, and one static route. The first ISP is the preferred default and the second is just as a backup if the first is completely down, so the static default route is configured with a higher distance than the BGP-added routes.

This works fine at first:
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADb  0.0.0.0/0          192.168.111.3   192.168.111.1            20
 1   S  0.0.0.0/0                          192.168.254.254          25
 2  Db  0.0.0.0/0          192.168.111.3   192.168.111.2            20
...up until the BGP session that the current active default was learned from dies. At that point, I would expect the second BGP default route to be chosen as active, since it has a lower distance. However, this is not what happens:
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          192.168.254.254          25
 1  Db  0.0.0.0/0          192.168.111.3   192.168.111.2            20
The static route with higher distance is selected as the active one, even though the gateway for the BGP route is seen to be "reachable". If I enable verbose logs, RouterOS tells me that it selected this route, but does not explain why:
21:44:23 route,debug,calc Begin calculation 
21:44:23 route,debug,calc Select route 
21:44:23 route,debug,calc     dst-address=0.0.0.0/0 
21:44:23 route,debug,calc     attributes 
21:44:23 route,debug,calc         protocol=STATIC 
21:44:23 route,debug,calc         distance=25 
21:44:23 route,debug,calc         scope=30 
21:44:23 route,debug,calc         target-scope=10 
21:44:23 route,debug,calc         next-hop= address=192.168.254.254 
21:44:23 route,debug,calc         comment= 
21:44:23 route,debug,calc         origin-type=STATIC 
21:44:23 route,debug,calc End calculation 
However! If I disable the static route, the remaining BGP route is selected (as one would expect), and when I re-enable the static route, the remaining BGP route remains the active route! OR if I disable and re-enable the remaining BGP session while the static route is active, when BGP re-adds the default route to the table, it instantly is chosen as the active one:
02:39:36 route,debug,calc Begin calculation 
02:39:36 route,debug,event Added candidate route 
02:39:36 route,debug,event     dst-prefix=0.0.0.0/0 
02:39:36 route,debug,event     attributes 
02:39:36 route,debug,event         protocol=BGP 
02:39:36 route,debug,event         scope=40 
02:39:36 route,debug,event         preferred-source=192.168.111.3 
02:39:36 route,debug,event         next-hop= address=192.168.111.2 
02:39:36 route,debug,event         origin-type=BGP 
02:39:36 route,debug,event         origin-instance-id=0 
02:39:36 route,debug,event         bgp-peer-router-id=172.26.255.1 
02:39:36 route,debug,event         bgp-peer-flags=0 
02:39:36 route,debug,event         bgp-router-id=172.26.255.2 
02:39:36 route,debug,event         bgp-origin=INCOMPLETE 
02:39:36 route,debug,event         bgp-as-path=64999 
02:39:36 route,debug,event         bgp-as-path-len=1 
02:39:36 route,debug,event         bgp-nexthop=192.168.111.2 
02:39:36 route,debug,event         bgp-localpref=90 
02:39:36 route,debug,event         use-te-nexthop=yes 
02:39:36 route,debug,calc Select route 
02:39:36 route,debug,calc     dst-address=0.0.0.0/0 
02:39:36 route,debug,calc     attributes 
02:39:36 route,debug,calc         protocol=BGP 
02:39:36 route,debug,calc         scope=40 
02:39:36 route,debug,calc         preferred-source=192.168.111.3 
02:39:36 route,debug,calc         next-hop= address=192.168.111.2 
02:39:36 route,debug,calc         origin-type=BGP 
02:39:36 route,debug,calc         origin-instance-id=0 
02:39:36 route,debug,calc         bgp-peer-router-id=172.26.255.1 
02:39:36 route,debug,calc         bgp-peer-flags=0 
02:39:36 route,debug,calc         bgp-router-id=172.26.255.2 
02:39:36 route,debug,calc         bgp-origin=INCOMPLETE 
02:39:36 route,debug,calc         bgp-as-path=64999 
02:39:36 route,debug,calc         bgp-as-path-len=1 
02:39:36 route,debug,calc         bgp-nexthop=192.168.111.2 
02:39:36 route,debug,calc         bgp-localpref=90 
02:39:36 route,debug,calc         use-te-nexthop=yes 
02:39:36 route,debug,calc End calculation 
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADb  0.0.0.0/0          192.168.111.3   192.168.111.2            20
 1   S  0.0.0.0/0                          192.168.254.254          25
I have also reproduced this problem with prefixes other than 0.0.0.0/0. It happens in any situation where you have 2 BGP routes and 1 static route for the same prefix, where the static route has a higher distance, and the active BGP route is removed from the table.
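As a minimal sketch of the behaviour expected here (an illustration only, not RouterOS's actual algorithm; `Route` and `expected_active` are hypothetical names invented for this example), the active route should simply be the reachable candidate with the lowest distance:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    protocol: str          # "bgp" or "static"
    gateway: str
    distance: int
    reachable: bool = True


def expected_active(candidates):
    """Return the reachable candidate with the lowest distance, or None."""
    usable = [r for r in candidates if r.reachable]
    return min(usable, key=lambda r: r.distance, default=None)


# Candidates left in the table after the session to 192.168.111.1 dies.
remaining = [
    Route("static", "192.168.254.254", 25),
    Route("bgp", "192.168.111.2", 20),
]

best = expected_active(remaining)
print(best.protocol, best.gateway)
```

Under this rule the BGP route via 192.168.111.2 wins, which is the opposite of what the router actually does in the failure case described above.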

Do I smell a bug here? Or am I missing something obvious? This is RouterOS 6.34.6 on a CCR.

Thanks,

-- Nathan
 
pe1chl
Forum Guru
Posts: 10195
Joined: Mon Jun 08, 2015 12:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 2:58 pm

It depends on the scope and target scope of the route, I think.
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 3:22 pm

> It depends on the scope and target scope of the route, I think.
target-scope in all cases is the default of 10. The nexthop route for the 2 BGP gateways is a connected route, which by default has a scope of 10. The nexthop route for the static default route is also a connected route and also has a scope of 10. Therefore all nexthop route scopes are within the target-scopes of the default routes, and all of them are tied (all target-scopes are 10, all nexthop route scopes are 10).

Therefore the distance should be the tie-breaker. But it isn't working that way.
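The scope argument can be checked mechanically. This is a minimal Python sketch, assuming the rule that a next hop is resolved only through routes whose scope is less than or equal to the route's target-scope. `nexthop_resolves` is a made-up helper for illustration, not a RouterOS function, and the values are the ones stated in this post:

```python
def nexthop_resolves(target_scope: int, resolving_scope: int) -> bool:
    """True when the resolving route's scope is within the target-scope."""
    return resolving_scope <= target_scope


CONNECTED_SCOPE = 10  # default scope of a connected route, per the post

# target-scope of each default route (all left at the default of 10)
default_routes = {
    "bgp via 192.168.111.1": 10,
    "bgp via 192.168.111.2": 10,
    "static via 192.168.254.254": 10,
}

resolved = {
    name: nexthop_resolves(target_scope, CONNECTED_SCOPE)
    for name, target_scope in default_routes.items()
}
print(resolved)
```

All three next hops resolve under this rule, so scope alone does not separate the candidates.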

Is there a flaw in my logic?

-- Nathan
 
pe1chl
Forum Guru
Posts: 10195
Joined: Mon Jun 08, 2015 12:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 3:39 pm

OK, in that case it should indeed not be the reason.
But in my router, which uses iBGP, the scope and target scope for BGP routes are different, hence the suggestion.
Do you need those static default routes? I operate a number of routers where the default route is distributed via BGP, and after some teething problems it works fine and I have deleted the static routes.
(I don't think I had the problem you describe, but my static routes pointed to the same gateway that BGP would likely choose.)
 
paulct
Member
Posts: 336
Joined: Fri Jul 12, 2013 5:38 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 3:52 pm

Is there just one BGP instance or multiple?
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 4:18 pm

> But in my router using iBGP the scope and target scope for BGP routes are different, hence the suggestion.
This is eBGP (different ASes).
> Do you need those static default routes? I operate a number of routers where the default route is distributed via BGP, and after some teething problems it works fine and I have deleted the static routes.
I must not have explained the situation clearly. This is an end-user's router. They do not (currently) use BGP internally. They don't have their own ASN. The only point of BGP here is as a failover mechanism between two gateways on the one ISP's side...if one gateway dies, then the customer's router starts using the same ISP's other gateway as its default route. They only get a default route announced to them over the BGP sessions from the ISP.

There is a second ISP involved that is used strictly as a backup if ISP #1 dies completely (neither gateway is reachable). They do not and cannot speak BGP to this other ISP. That's where the static route factors in.
> Is there just one BGP instance or multiple?
One instance, 2 peers. And both peers on the ISP's side are in the same subnet, so the same connected route is used as the nexthop in both cases.

-- Nathan
 
paulct
Member
Posts: 336
Joined: Fri Jul 12, 2013 5:38 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 4:50 pm

Stab in the dark here - set one of the filters on one of the peers with a higher local preference - maybe as it is on the same subnet/connected route it is seen as one?
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 4:57 pm

> Stab in the dark here - set one of the filters on one of the peers with a higher local preference - maybe as it is on the same subnet/connected route it is seen as one?
Thanks for the suggestion, but sorry, I should have mentioned earlier that we are already setting localpref via filters to different values on the two defaults learned via BGP, as a means of ensuring that one of the two ISP gateways is preferred over the other when they are both up. So, already done.

...and just to be clear, everything works exactly the way it should between the two BGP default routes in terms of priority, failover, etc., *so long as the static default route doesn't exist or is disabled*.

(EDIT: On a re-read, I take you to mean that maybe between two identical prefixes learned from a single BGP instance that both refer to the same nexthop, everything works as you would expect *within* that BGP instance, but the RIB > FIB election system views both together as a single entity relative to the same prefix learned from other sources? Hmm, maybe...still seems like that behavior wouldn't be called a "feature", though. If that is the case, then changing localpref might not be enough. Perhaps I should try 2 separate subnets/connected routes, or two separate eBGP instances as a last resort...)

-- Nathan
Last edited by NathanA on Thu Oct 13, 2016 5:07 pm, edited 2 times in total.
 
ZeroByte
Forum Guru
Posts: 4047
Joined: Wed May 11, 2011 6:08 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:04 pm

One question got missed: Is this one BGP instance or two instances?

My thoughts - this sounds like the state machine in BGP is getting something wrong (i.e. it's a bug)
I can't quite verbalize exactly what it is my intuition feels, but it's basically that both BGP routes are so close together that the state engine is "forgetting" to check the other BGP route before it checks the floating static route....

I think this bug(?) is probably yet another result of ROS's method of using the RIB as its scratch space for routing protocols. There's no "show ip bgp" equivalent. My mental model of routing is that BGP is its own space, out of which come the "winning" prefixes chosen by best path. After this, the best of BGP must contend with the best of other protocols via the distance metric. Thus a best eBGP route will always trump a best OSPF route, which will always trump a best iBGP route, etc. With the routes pre-mixed in the routing table itself, there seems to arise a cart/horse or chicken/egg situation where care must be taken to re-evaluate all BGP routes before looking at the AD (to say nothing of the OSPF issue I'm always harping about).
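That two-stage mental model can be sketched in Python roughly as follows. This is only an illustration of the model described above, not how RouterOS or any other platform is implemented. All names (`BgpPath`, `Candidate`, `bgp_best`, `rib_best`) are hypothetical, the best-path comparison is reduced to local-pref and AS-path length, and the local-pref of 100 for the first peer is an assumed value (only the 90 appears in the logs earlier in the thread):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPath:
    gateway: str
    local_pref: int
    as_path_len: int


@dataclass(frozen=True)
class Candidate:
    source: str
    gateway: str
    distance: int


def bgp_best(paths):
    """Stage 1: pick one winner inside BGP (highest local-pref, then shortest AS path)."""
    return max(paths, key=lambda p: (p.local_pref, -p.as_path_len), default=None)


def rib_best(bgp_paths, others, bgp_distance=20):
    """Stage 2: the BGP winner competes with other sources on distance alone."""
    candidates = list(others)
    winner = bgp_best(bgp_paths)
    if winner is not None:
        candidates.append(Candidate("bgp", winner.gateway, bgp_distance))
    return min(candidates, key=lambda c: c.distance, default=None)


static = Candidate("static", "192.168.254.254", 25)
paths = [
    BgpPath("192.168.111.1", 100, 1),  # local-pref 100 is an assumption
    BgpPath("192.168.111.2", 90, 1),
]

print(rib_best(paths, [static]).gateway)      # both sessions up
print(rib_best(paths[1:], [static]).gateway)  # first session down
print(rib_best([], [static]).gateway)         # both sessions down
```

In this model the floating static route can only win once BGP has no path left at all, because stage 1 always re-runs over the surviving BGP paths before distances are compared.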

Anyway, if I were in your shoes, I'd try these things to see if they can successfully work around the issue:

Use a filter rule to set the distance of the backup BGP default GW to be 21 (instead of the default 20)

Make the second BGP session be a separate instance of BGP (ick). If this alone doesn't fix it, set the AD for BGP2 as 21. I really feel like this combination would work (but is terrible and shouldn't be necessary)
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:09 pm

> One question got missed: Is this one BGP instance or two instances?
You must have been typing this response up while other discussion was going on. :) This is answered above (one instance).
> My thoughts - this sounds like the state machine in BGP is getting something wrong (i.e. it's a bug)
> I can't quite verbalize exactly what it is my intuition feels, but it's basically that both BGP routes are so close together that the state engine is "forgetting" to check the other BGP route before it checks the floating static route....
I think you and paulct are in a mindmeld. :)
> I wonder what would happen if you were to use a filter rule to set the distance of the backup default GW to be 21 (instead of the default 20)
That's a great idea! I'll try that really quickly...
> I also wonder what would happen if the second BGP session were being run in a separate instance of BGP (ick).
Yes, ick. It will be my test of last resort.

Thanks,

-- Nathan
 
ZeroByte
Forum Guru
Posts: 4047
Joined: Wed May 11, 2011 6:08 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:13 pm

I edited my wording a little while you were quoting me.... As part of the "ick" solution, I really think that a separate BGP instance + AD would be such an explicit lineup that it would be almost the same as having 3 static routes w/ ping tests. How could that fail? Sure, it's a Faustian crime against nature, but if that's what it takes to get something this simple to work, then that's what it takes. A bug report should follow quickly after - email subject: WTF?


EDIT:

In general, I'd say that there is a problem with ROS's routing engine where "preemption" takes place. There are other scenarios where an inferior AD route remains active despite a better one from a different dynamic source. For instance, if OSPF is advertising a redistributed route "if-installed" based on a floating static route, it will ignore this prefix internal to OSPF, so if the same prefix becomes available in OSPF, it will not stop injecting the external route and switch to the OSPF one as it should.

Honestly, I hope these "glitches" are fixed by MikroTik's routing engine overhaul for version 7.
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:14 pm

> > I wonder what would happen if you were to use a filter rule to set the distance of the backup default GW to be 21 (instead of the default 20)
> That's a great idea! I'll try that really quickly...
Sadly, this made no difference. That would have been a nice, easy work-around.

I will have to try the other potential work-arounds later. Regardless of results, looks like I will be opening up yet another support ticket...

-- Nathan
 
paulct
Member
Posts: 336
Joined: Fri Jul 12, 2013 5:38 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:41 pm

A "quick" fix/test would be for your ISP to provide a new subnet. If it works then you know you have a bug, if it still doesn't work you have a bug :D
 
ZeroByte
Forum Guru
Posts: 4047
Joined: Wed May 11, 2011 6:08 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Thu Oct 13, 2016 5:48 pm

> Sadly, this made no difference. That would have been a nice, easy work-around.
I had low hopes for this working, but it was worth trying. I feel more confident in the separate process method's success because being a separate process, it will not be going through any kind of state change when the primary route fails. Thus, it should be immediately available to the process which falls back to the next-highest AD, whereas a second route already available in a process that is "in flux" is not "available" while the choice is made based on AD.
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 11:10 am

Well, everyone, the verdict is in:
  • Using a separate subnet for the other peer -- and thus a separate (connected) route for the nexthop -- did not fix it.
  • Setting up the session with the second peer as a separate BGP instance *did* fix it.
Once a second BGP instance was set up, at that point I did have to change the distance on one of the two BGP-installed routes in order for one to be preferred over the other in a predictable fashion (localpref made no difference).

So it would seem that ZeroByte's instincts were right on the money. And now that I have read a little bit more on ROS's BGP implementation, that this particular edge case / failure mode is possible makes sense; I wasn't aware that of the dynamic routing protocols, BGP (and only BGP) wholly uses the RIB to store all of its information, and does not maintain a separate table in memory at all.

I'll package this up as a support ticket and get it filed with MT soon.

-- Nathan
 
pe1chl
Forum Guru
Posts: 10195
Joined: Mon Jun 08, 2015 12:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 11:31 am

> I'll package this up as a support ticket and get it filed with MT soon.
I suspect a reply like "it will all be fixed in version 7".
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 12:45 pm

> I suspect a reply like "it will all be fixed in version 7".
Boy, I hope not. V7 is clearly a ways off.

-- Nathan
 
pe1chl
Forum Guru
Posts: 10195
Joined: Mon Jun 08, 2015 12:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 2:23 pm

There have been repeated statements that there will be no major fixes to the routing in version 6.
BTW, did you try to enable "redistribute static", accompanied by suitable routing output filters so you don't send all your routes to the ISP?
Maybe the BGP engine is watching static routes more closely when it is aware that it is to redistribute them?
 
ZeroByte
Forum Guru
Posts: 4047
Joined: Wed May 11, 2011 6:08 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 4:48 pm

Personally, I am really hoping that MikroTik's re-implementation uses a dedicated BGP table like other routing platforms do. It just seems that the complicated bookkeeping and decision making, where the router must know the difference between competing BGP routes vs. competing AD metric between protocols, leads to subtle but significant errors in the behavior of the platform.

And I'm with pe1chl on this one - don't hold your breath for a fix in ROSv6. I've found a few other "quirks" like this, and any time I've reported them, I've either been told that it was supposed to act that way, or that it will be fixed in v7's total re-write. When I pressed the issue on the "it's supposed to work that way" items, those were also updated to "fixed in 7" status.

In one way, I can see the point - why waste effort in unravelling an issue that's going to be fixed anyway in a better implementation that they're actively working on? I do say, though, that this only makes sense if V7 is not far away.
 
docmarius
Forum Guru
Posts: 1222
Joined: Sat Nov 06, 2010 12:04 pm
Location: Timisoara, Romania

Re: BGP + static candidate routes: ROS picks the wrong one??

Fri Oct 14, 2016 5:21 pm

Don't get your hopes up too high. I have a reported OSPF bug setting the route on the wrong interface in the "will be fixed in V7" state for 2+ years now.
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Mon Oct 17, 2016 5:39 am

Oh, I'm sorry: I thought we were talking about a *routing* operating system here...y'know, a category of software where it would normally be considered kind of important that core features related to *routing* work properly, hence the name.

I speak facetiously. But only somewhat. :?

Fortunately, this is a simple enough -- if somewhat ugly -- workaround that I'm not actually too bothered.

-- Nathan
 
ignatievia
just joined
Posts: 1
Joined: Tue Oct 12, 2021 4:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Wed Oct 13, 2021 11:39 am

RouterOS 6.48.2
Faced the same problem
3 BGP network routes are backed up with a static one with higher distance
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 1 ADb  10.20.13.0/24                      172.16.4.1               20
 2   S  10.20.13.0/24                      172.16.3.1              210
 3  Db  10.20.13.0/24                      172.16.5.1               20
 4  Db  10.20.13.0/24                      172.16.253.1            200
Right after the active BGP route disappears, the static route becomes active. If I disable the static route, the correct BGP route is chosen from among the remaining ones. However, if I re-enable the static route afterwards, it does not become active.
 
Kamikadze
just joined
Posts: 1
Joined: Mon Nov 01, 2021 2:31 pm
Location: Sofia

Re: BGP + static candidate routes: ROS picks the wrong one??

Tue Feb 22, 2022 9:50 pm

Hello,

hAP ac^2 with RouterOS 6.49.3 (stable)

I have the same issue:

Two BGP sessions, two default routes (distance 20) via BGP, and one static route with a higher distance of 250:

 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADS  0.0.0.0/0                          10.11.10.1              201
 1 ADb  0.0.0.0/0                          MT_CNR_Internet1         20
 2   S  ;;; FAKE Default Route DO NOT DELETE IT
        0.0.0.0/0                          ether1                  250
 3  Db  0.0.0.0/0                          MT_DRC_Internet1         20

When my first BGP session goes down, the static default becomes active even though it has a higher distance. If I disable the static route, the remaining second BGP route becomes active (as one would expect), and when I re-enable the static route, the second BGP default route remains the active route:

/ip route> disable 2
/ip route> enable 2
/ip route> print
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADS  0.0.0.0/0                          10.11.10.1              201
 1 ADb  0.0.0.0/0                          MT_DRC_Internet1         20
 2   S  ;;; FAKE Default Route DO NOT DELETE IT
 
hugleo
newbie
Posts: 33
Joined: Wed Aug 06, 2008 8:20 am

Re: BGP + static candidate routes: ROS picks the wrong one??

Mon Jan 16, 2023 2:16 am

Same problem on a hEX S (RB760iGS) running 6.48.6.
Is this solved in v7?
 
pe1chl
Forum Guru
Posts: 10195
Joined: Mon Jun 08, 2015 12:09 pm

Re: BGP + static candidate routes: ROS picks the wrong one??

Mon Jan 16, 2023 11:56 am

Little did we know when writing the above that by the time v7 really arrived, we would be longing for the v6 days when there were so few routing bugs!
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: BGP + static candidate routes: ROS picks the wrong one??

Mon Jan 16, 2023 12:23 pm

v7 does not have this problem.
