keane
just joined
Topic Author
Posts: 8
Joined: Wed Mar 30, 2022 11:12 am
Location: China Shenzhen

Please ask me about the inaccessibility of BGP routing

Wed Mar 30, 2022 11:46 am

May I ask a question?
I set up an eBGP neighbor relationship between two RouterOS devices.
RouterOS-02 has several networks and advertises them to RouterOS-01 over BGP.
I then see the following behavior: if the RouterOS-02 firewall drops the ICMP packets that RouterOS-01 sends to the RouterOS-02 ether2 interface address,
the BGP routes learned from R2 become unreachable in the R1 routing table.

The RouterOS-01 configuration is as follows:
# jan/02/1970 21:12:43 by RouterOS 6.40.8
# software id = WR7Q-3WD0
# model = RB1100x4
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/routing bgp instance
set default as=63641
/ip address
add address=10.10.10.1/30 interface=ether2 network=10.10.10.0
/ip route
add check-gateway=ping disabled=yes distance=1 dst-address=192.168.199.0/24 \
    gateway=10.10.10.2
add distance=1 dst-address=192.168.254.0/24 gateway=10.10.10.2
/routing bgp peer
add hold-time=30s keepalive-time=10s name=peer1 remote-address=10.10.10.2 \
    remote-as=63642 ttl=default
/system identity
set name=ROS-01
The RouterOS-02 configuration is as follows:
# jan/02/1970 21:12:19 by RouterOS 6.40.8
# software id = HNE1-R9VD
# model = RouterBOARD 1100x4
/interface bridge
add name=bridge1
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/routing bgp instance
set default as=63642
/ip address
add address=10.10.10.2/30 interface=ether2 network=10.10.10.0
add address=192.168.254.254/24 interface=ether2 network=192.168.254.0
add address=192.168.200.1/24 interface=ether4 network=192.168.200.0
add address=192.168.201.1/24 interface=ether4 network=192.168.201.0
/ip firewall filter
add action=drop chain=input dst-address=10.10.10.2 in-interface=ether1 \
    protocol=icmp
/routing bgp network
add network=192.168.200.0/24 synchronize=no
add network=192.168.201.0/24 synchronize=no
/routing bgp peer
add hold-time=30s keepalive-time=10s name=peer1 remote-address=10.10.10.1 \
    remote-as=63641 ttl=default
/system identity
set name=ROS-02
 
eduplant
Member Candidate
Posts: 139
Joined: Tue Dec 19, 2017 9:45 am

Re: Please ask me about the inaccessibility of BGP routing

Thu Mar 31, 2022 9:47 am

I scratched my head looking at this and decided to just test it in the lab myself. Whether I can reach the BGP routes depends entirely on whether the static route (to 192.168.199.0/24 via 10.10.10.2) on ROS-01 is enabled. At some level this makes sense to me because I imagine there is a single next-hop entry in the FIB, and when check-gateway=ping decides that the next-hop is unreachable, this same conclusion carries over to every other route that has the same resolved next-hop. I think the behavior is consistent and in the big picture makes sense even if it's a little unintuitive at first glance. I am having a hard time understanding the administrative purpose of caring about the reachability of the same next-hop via ICMP for some routes and via other means (ARP/recursion) for others.

On the other hand ... from your screenshot it looks like the static route is disabled? Do you still have reachability issues when it's disabled? For reference, I tested this with 6.49.2 because it's what I already had in the lab and I noticed yours is 6.40.8. If you still have reachability issues when the static route is disabled, there may be some bug fixed in the intervening four years of releases where disabling a static route with ICMP next-hop checking forgets to go back and reevaluate the next-hop status in the FIB based on ARP/recursion. I'm not gonna skim through all of those patch notes but at least we know that it behaves correctly on later releases :)
 
keane

Re: Please ask me about the inaccessibility of BGP routing

Fri Apr 01, 2022 4:23 am

Hi, thank you for running the experiment.
I see this behavior on 6.40.x and 6.45.8, but not when I test with 7.1.5, so I think there is a bug in the older versions.
On those two ROS versions, the fault looks like this:
ROS-01 has a static route whose next hop is the IP address of the ROS-02 Ether1 interface, with check-gateway=ping set. When ROS-02 drops the ICMP packets sent to that address, the static route on ROS-01 becomes unreachable, which is normal and reasonable. However, while the static route is unreachable, the BGP routes that ROS-01 received from ROS-02 are unreachable as well. If I then disable check-gateway on the static route on ROS-01, both the static route and the BGP routes return to normal. I don't know why.

Do you understand what I mean? Thank you very much for helping me verify this.
 
eduplant

Re: Please ask me about the inaccessibility of BGP routing

Fri Apr 01, 2022 8:20 am

keane wrote: "However, while the static route is unreachable, the BGP routes that ROS-01 received from ROS-02 are unreachable as well. If I then disable check-gateway on the static route on ROS-01, both the static route and the BGP routes return to normal."

It sounds like you're seeing what I'm seeing, then. I think the behavior might seem a little strange but I don't suspect it's a bug. I know a lot of things are different about the routing/packet handling subsystems of v7.x.x, so I'll test this next and see if I can confirm.

I'll try to explain why I think this behavior is reasonable. The details mostly come down to how packet processing is implemented in most routers: the RIB, the FIB, and the adjacency table. The RIB is mostly [1] what /ip route print shows: it contains all routes that the router knows about from all sources, used or unused. Despite how it looks, the RIB is not used directly to make packet forwarding decisions. Instead, the RIB is cooked down through a variety of rules (active or inactive routes, best route selection, reachability of next-hops, recursive next-hops, policy routing, etc.) into a simplified set of tables: the FIB (containing only the active set of longest-prefix match information and a corresponding adjacency reference for each entry) and the adjacency table (basically physical ports and metadata about how to modify the packet). This is certainly an oversimplification, and some of these details are opaque or unknown to me where RouterOS specifically is concerned. For example, the documentation says RouterOS goes further still by caching FIB lookups for identical flows. MikroTik has some information about this here [2].
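As a rough illustration of the RIB-to-FIB step described above, here is a toy Python sketch. It is not how RouterOS is implemented: `Route`, `build_fib`, and `lookup` are hypothetical names, and the only selection rule modeled is "lowest distance wins per prefix". The prefixes and the 10.10.10.2 next hop come from the configs in this thread; the backup route via 10.10.20.2 is invented purely to show a RIB entry that never reaches the FIB.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # destination, e.g. "192.168.200.0/24"
    gateway: str   # next-hop address
    distance: int  # lower wins when two routes share a prefix
    source: str    # "bgp", "static", ... (known to the RIB only)


def build_fib(rib):
    """Keep the lowest-distance route per prefix; the FIB forgets the source."""
    best = {}
    for route in rib:
        current = best.get(route.prefix)
        if current is None or route.distance < current.distance:
            best[route.prefix] = route
    return {ipaddress.ip_network(p): r.gateway for p, r in best.items()}


def lookup(fib, destination):
    """Longest-prefix match: return the next hop, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


rib = [
    Route("192.168.200.0/24", "10.10.10.2", 20, "bgp"),
    Route("192.168.201.0/24", "10.10.10.2", 20, "bgp"),
    Route("192.168.254.0/24", "10.10.10.2", 1, "static"),
    # Hypothetical backup route: present in the RIB, never installed.
    Route("192.168.200.0/24", "10.10.20.2", 200, "static"),
]
fib = build_fib(rib)
print(len(rib), len(fib))            # 4 3
print(lookup(fib, "192.168.200.7"))  # 10.10.10.2
print(lookup(fib, "172.16.0.1"))     # None
```

Note that the FIB built here maps a prefix straight to a gateway and keeps no record of whether the route was static or learned from BGP.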

From the behavior, what I think is happening is that the check-gateway=ping feature operates directly on the adjacency entry. You might have any number of routes that all point to a single adjacency (in your case 10.10.10.2), but the FIB does not know or care whether those routes came from BGP, static routes, or anything else; it holds only the longest-prefix match information and a single reference to the common adjacency. You may hypothetically have hundreds of thousands of routes that all have the same next hop, and rather than make an identical and wasteful decision on each and every one, the FIB contains many entries that all reference one adjacency. When check-gateway=ping fails a ping test, it seems to go to the adjacency table and either purge or suppress that entire adjacency.
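The shared-adjacency idea can be sketched in a few lines of Python. This is a guess at the shape of the behavior observed in this thread, not RouterOS internals; the `Adjacency` and `Fib` classes and the `gateway_check_result` method are hypothetical names used only for illustration.

```python
class Adjacency:
    """One next hop, shared by every FIB entry that resolves to it."""

    def __init__(self, gateway):
        self.gateway = gateway
        self.usable = True


class Fib:
    def __init__(self):
        self.adjacencies = {}  # gateway -> Adjacency (one object per next hop)
        self.entries = {}      # prefix -> Adjacency (a reference, not a copy)

    def install(self, prefix, gateway):
        if gateway not in self.adjacencies:
            self.adjacencies[gateway] = Adjacency(gateway)
        self.entries[prefix] = self.adjacencies[gateway]

    def gateway_check_result(self, gateway, ping_ok):
        """Record the outcome of a simulated check-gateway=ping probe."""
        self.adjacencies[gateway].usable = ping_ok

    def reachable(self):
        return sorted(p for p, adj in self.entries.items() if adj.usable)


fib = Fib()
fib.install("192.168.199.0/24", "10.10.10.2")  # static, check-gateway=ping
fib.install("192.168.200.0/24", "10.10.10.2")  # learned via BGP
fib.install("192.168.201.0/24", "10.10.10.2")  # learned via BGP

before = fib.reachable()
fib.gateway_check_result("10.10.10.2", ping_ok=False)
after = fib.reachable()
print(before)  # all three prefixes
print(after)   # []
```

Because every entry holds a reference to the same `Adjacency` object, one failed probe takes the BGP prefixes down together with the static one, which matches what both of us see on 6.4x.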

I suspect this was simply an implementation choice. One option is to assume that when the operator wants to alter routing policy and invalidate a next-hop for being unreachable by ICMP echo that they intend for this to be a property of the next-hop adjacency and not a property of the particular route that you configure the check-gateway=ping clause on. The other option is to assume the opposite: that the operator intends for this to only be a property of the route. The second option seems to have dubious practical benefit. If the operator decides that 10.10.10.2 is unreachable, why would they want any other routes that rely on it to be installed in the FIB? Furthermore, you would need to separate all of the routes with check-gateway=ping and those without into two piles when constructing the FIB and the adjacency table. And for what purpose?
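To put the two options side by side, here is a small Python sketch. It assumes a simplified route table of (prefix, gateway, check-gateway flag) tuples; `usable_prefixes` and its `per_adjacency` switch are hypothetical and exist only to contrast the two designs.

```python
def usable_prefixes(routes, failed_gateways, per_adjacency):
    """Return the prefixes that stay usable after some gateways fail a ping check.

    routes: list of (prefix, gateway, check_gateway) tuples.
    per_adjacency=True:  a failed check condemns every route via that gateway.
    per_adjacency=False: only routes that asked for the check are withdrawn.
    """
    usable = []
    for prefix, gateway, check_gateway in routes:
        failed = gateway in failed_gateways
        if failed and (per_adjacency or check_gateway):
            continue
        usable.append(prefix)
    return usable


routes = [
    ("192.168.199.0/24", "10.10.10.2", True),   # static, check-gateway=ping
    ("192.168.200.0/24", "10.10.10.2", False),  # BGP
    ("192.168.201.0/24", "10.10.10.2", False),  # BGP
]
failed = {"10.10.10.2"}

option_one = usable_prefixes(routes, failed, per_adjacency=True)
option_two = usable_prefixes(routes, failed, per_adjacency=False)
print(option_one)  # []
print(option_two)  # ['192.168.200.0/24', '192.168.201.0/24']
```

The second option has to carry the per-route flag all the way down to the forwarding decision, which is the "two piles" bookkeeping mentioned above.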

If this is in fact what's happening, this seems to be the more reasonable of the two choices. For example ... do you have a specific use case for why you want those BGP routes to be reachable even though you are creating the circumstances to invalidate that static route? It's not obvious to me. Maybe what you're trying to accomplish can be accomplished some other way.



[1] The /ip route print detail form of the command is also the only glimpse I know of into what actually happened in the FIB in v6.x.x, so it's not entirely pure. It's mostly just RIB information, though.
[2] https://wiki.mikrotik.com/wiki/Manual:IP/Route
 
keane

Re: Please ask me about the inaccessibility of BGP routing

Fri Apr 01, 2022 1:27 pm

Let me explain my network environment:
I use RouterOS to connect to my operator and establish an eBGP neighbor over this connection.
I receive the BGP routes sent by the operator. At the same time, to guard against some routes being lost,
I added a default route with check-gateway=ping. Recently, the IP address of the interface on the operator's side has often been dropping packets.
That is when the behavior we are discussing occurs: when my default route becomes unreachable, the BGP routes in the routing table become unreachable too, which is how I noticed that check-gateway has this effect.
Of course, this may not be a bug; it may be a feature of RouterOS. :lol:
 
eduplant

Re: Please ask me about the inaccessibility of BGP routing

Fri Apr 01, 2022 8:08 pm

keane wrote: "Let me explain my network environment: I use RouterOS to connect to my operator and establish an eBGP neighbor over this connection. I receive the BGP routes sent by the operator."

Do you only have this one link or do you have other links to the same or a different provider? I’m trying to figure out why you need BGP here if a default suffices.

keane wrote: "At the same time, to guard against some routes being lost, I added a default route with check-gateway=ping. Recently, the IP address of the interface on the operator's side has often been dropping packets."

Why are you losing routes in this circumstance? Is the BGP peer being torn down? If this is your only link then it seems like a static default route without a check-gateway=ping clause is your best choice. You have no choice but to use the link even if it’s unacceptably lossy.

If you have other links and other eBGP sessions, then wouldn’t you rather invalidate this whole path to 10.10.10.2 and use the other link when this link has problems?
 
keane

Re: Please ask me about the inaccessibility of BGP routing

Sat Apr 02, 2022 9:33 am

Hi, let me explain: my ROS has only one BGP connection. Recently, on the line that interconnects with the operator, pings to the operator's IP address lose packets, but the line itself and pings to the operator's other IP addresses are normal. The operator explained that they set a security policy on that IP address, which causes the packet loss, but it does not actually affect the use of the BGP line.
 
eduplant

Re: Please ask me about the inaccessibility of BGP routing

Sat Apr 02, 2022 9:43 am

That makes sense. ICMP is usually the lowest priority for routing hardware to deal with and most operators implement control-plane policing configurations to protect their router CPUs. The result of this is that ICMP through the box to somewhere else is fine, but ICMP to the router itself may suffer under any number of conditions. This is why check-gateway=ping can be dicey and you should really only use it if you have to.

Still, with only one uplink ... why bother with BGP or check-gateway=ping at all? It seems like this whole circumstance goes away if you just have a static default to the provider next-hop.
 
keane

Re: Please ask me about the inaccessibility of BGP routing

Sat Apr 02, 2022 11:05 am

Hi, I do this because I need to advertise my own IP addresses to the operator through BGP, and at the same time I need to pass the BGP routes on to others.
 
eduplant

Re: Please ask me about the inaccessibility of BGP routing

Sat Apr 02, 2022 11:15 am

Ah, I see. So what again is the purpose of check-gateway=ping in your network? If you aren’t invalidating the next-hop via an ICMP check, then it won’t invalidate your BGP routes.
 
keane

Re: Please ask me about the inaccessibility of BGP routing

Sat Apr 02, 2022 12:47 pm

Hi, setting check-gateway is just a habit of mine, and it is now disabled. But I still think it might be a bug in RouterOS :lol:
