JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

RPKI and real life failure scenario

Fri Dec 17, 2021 1:17 pm

Hi,

I've been testing RPKI on ROS 7.1 and it works fine. I followed the suggestions in the documentation and it does what it is supposed to do.

What bothers me is that if you activate rpki-validation in a filter rule, that filter becomes invalid when the RPKI validator is not accessible, and therefore none of the routes received from the adjacent BGP speaker are installed as active routes. I can see the routes in the routing table, but they are marked with "I", which means inactive. If I look at /routing/route/print I can see the routes marked with "Fb", which means filtered. When I switch the RPKI validator back on, the routes become active after some time as they are validated and installed.
In case of a power outage, the router will come back and establish the BGP session, but none of the received routes will be active before they can be verified. If the RPKI validator was down for some time, all ROA objects would be stale and some validators would refuse to load them. Since there is no validator, there are no routes to the outside world, the validator can't refresh TAs (and ROAs), and we have a catch-22.

Is there a way to tell a filter "do validation if RPKI validator is reachable", otherwise install all the received routes and do the validation (and apply valid/invalid/unknown rules) later, when validator is reachable and RTR session established properly?

Cheers, Jan
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: RPKI and real life failure scenario

Fri Dec 17, 2021 1:52 pm

To me it looks like not a good network design if you need BGP routes to connect to the validator to verify the same BGP routes.
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Fri Dec 17, 2021 3:12 pm

mrz wrote:
To me it looks like not a good network design if you need BGP routes to connect to the validator to verify the same BGP routes.

I can connect to validator directly, no worries about that. Problem is if the whole network is cut away from the world because routes received over BGP are not active because validator is not in proper state - and validator can be in weird state because it can't connect to the Internet to fetch the latest ROAs... but guess what? No access to the Internet because routes received from upstreams are not active...

Cheers, Jan
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Fri Dec 17, 2021 3:16 pm

I tested with Routinator after it had been offline for 12 hours. With the setting stale="warn" or stale="accept" it will start, try to connect to the Internet to fetch the new ROAs, and when that fails it just loads the stale objects from the disk and uses those. At that point the router is able to connect to the validator and slowly makes routes active, and then the validator can refresh the ROA database.

Without adding a "stale" directive to the validator, we can easily enter the catch-22 loop.

Cheers, Jan
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: RPKI and real life failure scenario

Fri Dec 17, 2021 3:58 pm

It sounds the same, validator cannot get data because you are relying on validator to use BGP routes that should be validated to get data. It really is a misconfiguration.
And isn't the validator returning "unknown" if there is no record for the route? So you can accept "unknown" prefixes.
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Fri Dec 17, 2021 5:53 pm

mrz wrote:
It sounds the same, validator cannot get data because you are relying on validator to use BGP routes that should be validated to get data. It really is a misconfiguration.
And isn't the validator returning "unknown" if there is no record for the route? So you can accept "unknown" prefixes.

Let me test what routinator does without "stale=accept" directive and offline overnight. Will it load anything? Will it load stale data from the disk? Will router make routes active after I start routinator with stale data? I'll tell you tomorrow.

Cheers, Jan
 
eduplant
Member Candidate
Posts: 139
Joined: Tue Dec 19, 2017 9:45 am

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 10:53 am

Looks like we lost the last two posts here but one of my bigger takeaways was to confirm my understanding of what Jan observed.

Is it really the case from what you observed that routes that 1) survive the inbound filter based on RPKI status + operator policy, 2) survive route selection and get installed into the table, 3) at some later time the RPKI validator group becomes unreachable … these routes get marked invalid?

I follow what mrz was saying and agree that your network design should never allow BGP policy to break reachability from your ASBRs to your RPKI validators. Nonetheless, I’m finding it a little hard to believe that the best course of action regardless of how you lose connectivity to the RPKI validators is to go through the BGP table and invalidate routes that you’ve already accepted. Maybe this isn’t intended behavior or maybe I misunderstood your test results.
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 3:13 pm

Hi,

Yes, we lost my report from a bit more extensive test, so let me try to write that again.

I left my validator VM offline for 12 hours and removed Internet access from it. When I started it again, it tried to refresh ROAs from the Internet repositories, failed, and then loaded whatever was in the cache on disk and started providing RPKI validation service to my test router, so that is a good result.

The RPKI validator is not somewhere remote: it's on the same subnet as the test router. You can lose connectivity to the validator in many ways. A switch dies, the VM where the validator is running dies, or you have a power outage and the validator VM doesn't want to boot anymore... there are many ways to lose access to the validator from your router.

The biggest problem that I see here is that if you add "rpki-verify validator" anywhere in your route filter, things become interesting, especially if you lose connectivity to your validator (for whichever reason). Here are my filter rules when the validator is accessible:
[jan@MK-TEST-lju] /routing/filter/rule> print
Flags: X - disabled, I - inactive
0 chain=bgp_in rule="if (dst==2607:fae0:a000::/36) {accept}"
1 chain=bgp_in rule="if (dst==2607:fae0:2000::/36) {accept}"
2 chain=bgp_in rule="rpki-verify validator"
3 chain=bgp_in rule="if (rpki invalid) { reject } else { accept }"
4 chain=bgp_in rule="if ( protocol bgp ) { accept }"

If I shut down the validator, then after 7200 seconds (the default expiry interval for the RPKI validator in ROS 7.1) I get this:

[jan@MK-TEST-lju] /routing/filter/rule> print
Flags: X - disabled, I - inactive
0 chain=bgp_in rule="if (dst==2607:fae0:a000::/36) {accept}"
1 chain=bgp_in rule="if (dst==2607:fae0:2000::/36) {accept}"
2 I chain=bgp_in rule="rpki-verify validator"
3 chain=bgp_in rule="if (rpki invalid) { reject } else { accept }"
4 chain=bgp_in rule="if ( protocol bgp ) { accept }"

Note the "I" in the filter chain. That little "I" renders the whole filter chain invalid. Completely. Even 2607:fae0:a000::/36 and 2607:fae0:2000::/36 prefixes that I set to unconditionally accept before RPKI clause are now inactive.

[jan@MK-TEST-lju] /ipv6/route> print detail where dst-address=2607:fae0:a000::/36
Flags: D - dynamic; X - disabled, I - inactive, A - active;
c - connect, s - static, r - rip, b - bgp, o - ospf, d - dhcp, v - vpn, m - modem, y - copy;
H - hw-offloaded; + - ecmp
DIb dst-address=2607:fae0:a000::/36 routing-table=main gateway=2607:fae0:a000:14::2
immediate-gw=2607:fae0:a000:14::2%vlan500-to-mk-test distance=200 scope=40 target-scope=30
bgp-local-pref=100 bgp-atomic-aggregate=yes bgp-origin=igp

Is it wise to invalidate the whole filter chain just because RPKI validator is not accessible?

This actually means that if your validator VM goes down for whichever reason, you lose all routes received from BGP peers whose incoming filters use RPKI validation. If your policy is to RPKI-verify all received routes, that usually means cutting yourself off from the global Internet.

My suggestion would be to change the filter behaviour to "if RPKI validator is not reachable just ignore the rpki-verify clause and still process the rest of the filter chain normally."

Cheers, Jan
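
To make the difference between the observed and the suggested behaviour concrete, here is a small Python model of the bgp_in chain from the post above. It is only a sketch under stated assumptions: Rule, build_chain and run_chain are hypothetical names and not RouterOS internals, the final "reject" for a route that no rule matches is an assumption of the model, and a route that never passed through rpki-verify is treated as "unknown".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name: str
    # Returns "accept", "reject", or None to fall through to the next rule.
    action: Callable[[dict], Optional[str]]
    active: bool = True


def build_chain(validator_up: bool, states: Optional[Dict[str, str]] = None) -> List[Rule]:
    """Hypothetical stand-in for rules 0, 2 and 3 of the bgp_in chain."""
    states = states or {}

    def accept_own(route: dict) -> Optional[str]:
        return "accept" if route["dst"] == "2607:fae0:a000::/36" else None

    def rpki_verify(route: dict) -> Optional[str]:
        # Only tags the route; the verdict is left to the next rule.
        route["rpki"] = states.get(route["dst"], "unknown")
        return None

    def rpki_policy(route: dict) -> Optional[str]:
        return "reject" if route.get("rpki", "unknown") == "invalid" else "accept"

    return [
        Rule("accept-own", accept_own),
        Rule("rpki-verify", rpki_verify, active=validator_up),
        Rule("rpki-policy", rpki_policy),
    ]


def run_chain(rules: List[Rule], route: dict, skip_inactive: bool) -> str:
    if not skip_inactive and any(not r.active for r in rules):
        # Behaviour reported in this thread: one inactive rule and
        # every route using the chain ends up filtered.
        return "filtered"
    for rule in rules:
        if not rule.active:
            continue  # suggested behaviour: ignore the rule, keep going
        verdict = rule.action(route)
        if verdict is not None:
            return verdict
    return "reject"  # assumption of this model, not taken from the thread


own = {"dst": "2607:fae0:a000::/36"}
print(run_chain(build_chain(validator_up=False), dict(own), skip_inactive=False))
print(run_chain(build_chain(validator_up=False), dict(own), skip_inactive=True))
```

With skip_inactive=False the model reproduces the report above, where even the unconditionally accepted prefix is lost. With skip_inactive=True the rest of the chain is still processed and routes pass as "unknown" until the validator returns.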
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 3:25 pm

We will think of some solution not to make whole chain invalid after validator goes down.
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 5:02 pm

mrz wrote:
We will think of some solution not to make whole chain invalid after validator goes down.

excellent, thnx!!!

While thinking about this - I would expand a little bit the behaviour description:
"if RPKI validator is not reachable just ignore the rpki-verify clause and still process the rest of the filter chain normally, but when RPKI validator comes back - re-validate all the installed routes and act accordingly."

Cheers, Jan
 
eduplant
Member Candidate
Posts: 139
Joined: Tue Dec 19, 2017 9:45 am

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 7:30 pm

mrz wrote:
We will think of some solution not to make whole chain invalid after validator goes down.

Part of my last reply included a suggestion for the changed behavior and I’ll try to reproduce it in case it’s helpful.

I was curious where the determinations for “valid”, “unknown”, and “invalid” come from. According to the RFC for RTR (6810), the only thing sent to routers is a prefix whitelist of valid prefixes with associated valid lengths; the other two types are derived locally by the router. Importantly, “unknown” doesn’t so much mean “I checked with an RPKI validator and there is no information”. It is simply the third case where nothing is known about the RPKI status of a prefix. To me, it follows that a reasonable implementation would be to give every BGP route an RPKI status of unknown until and unless acted upon by an RPKI validator group.

With my suggestion rpki-verify as a rule action would be removed and only RPKI matchers would remain. However, right now it seems to have two benefits and I think those can still be preserved.

Benefit 1 is that you don’t have to enable RPKI filtering globally on every set of prefixes you learn from every peer, just the ones you learn that pass through a specific filter. This can still be expressed for free through policy. Since every BGP route has an RPKI status, simply use the RPKI information in policy from peers where you care and don’t check it from peers where you don’t.

Benefit 2 is that technically you could have a different pool of RPKI validators for different filters, and therefore check peers against different validity. To keep this behavior, how about moving the RPKI group configuration from /routing/filter/rule rule="rpki-verify <rpki_group>" to an optional /routing/bgp/(template|connection)/input.rpki_group=<rpki_group>? By default, input.rpki_group=default. /routing/bgp/rpki would gain /routing/bgp/rpki/group and /routing/bgp/rpki/validator where the default group would exist with no associated validators. Basically it would be analogous to how /interface/list and /interface/list/members and several other mechanisms are implemented.

Since ROSv7 has a proper, separate RIB-IN, it’s already known which BGP peer populated which BGP route in /routing/route. When new information is received via RTR for each RPKI validator group, the process can go through and only modify the RPKI status for routes learned from peers that are configured for that group. The end result of this would be that when one or more RPKI validator groups become unreachable, the associated BGP routes revert to unknown [1]. New routes learned while the validator groups are unavailable get the default status of unknown as well.

I think many operators will want to still accept unknown routes and either localpref them lower than valid routes or leave them be. Their prefixes and connectivity will survive until validation returns. Either way, RPKI policy actually remains coherent and the operator remains in control of how to handle the failure.

[1] I’d suggest adding a cache timer and not having this happen immediately. Juniper seems to have thought it was a good idea and implemented a local cache time for preserving the RPKI status of routes for a period of time after an RTR session fails: https://www.juniper.net/documentation/u ... ation.html. I couldn’t find an equivalent for Cisco.
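
The cache-timer idea in the footnote can be sketched in a few lines of Python. This is a hypothetical illustration, not how RouterOS or Junos implement it: RpkiCache and its methods are made-up names, the cache stores already-computed states per (prefix, origin AS) instead of raw validator data to keep the example short, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Dict, Optional, Tuple

RouteKey = Tuple[str, int]  # (prefix, origin AS)


class RpkiCache:
    """Keeps the last known RPKI states for hold_time seconds after the RTR session drops."""

    def __init__(self, hold_time: float) -> None:
        self.hold_time = hold_time
        self.states: Dict[RouteKey, str] = {}
        self.session_lost_at: Optional[float] = None

    def session_up(self, states: Dict[RouteKey, str]) -> None:
        # Fresh data from the validator group replaces the cache.
        self.states = dict(states)
        self.session_lost_at = None

    def session_down(self, now: float) -> None:
        # Remember only the first moment the session was lost.
        if self.session_lost_at is None:
            self.session_lost_at = now

    def state(self, prefix: str, origin_asn: int, now: float) -> str:
        expired = (
            self.session_lost_at is not None
            and now - self.session_lost_at >= self.hold_time
        )
        if expired:
            return "unknown"  # revert instead of invalidating the route
        return self.states.get((prefix, origin_asn), "unknown")


# 7200 s is the expiry interval mentioned earlier in this thread.
cache = RpkiCache(hold_time=7200)
cache.session_up({("2001:db8::/32", 64500): "valid"})
cache.session_down(now=100)
print(cache.state("2001:db8::/32", 64500, now=200))
print(cache.state("2001:db8::/32", 64500, now=7300))
```

Routes learned while the session is down simply have no cache entry and are reported as "unknown", which matches the default status proposed above.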
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 8:18 pm

FYI actually all three status types are determined locally on the RTR client. Validator just sends the list of prefixes and originator AS.
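
As an illustration of that local determination, here is a minimal Python sketch of the origin validation procedure from RFC 6811, where the router only holds a list of (prefix, max length, origin AS) entries. The names VRP and rpki_state are hypothetical, the prefixes and AS numbers are documentation values, and RFC 6811's "NotFound" is reported as "unknown" to match the wording used in this thread.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List


@dataclass(frozen=True)
class VRP:
    """One entry received from the validator: prefix, max length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def rpki_state(route_prefix: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Return 'valid', 'invalid' or 'unknown' for a route, following RFC 6811."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if it is in the same address family and the
        # route is equal to or more specific than the VRP prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # AS 0 never matches any route origin.
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn and vrp.asn != 0:
            return "valid"
    # Covered but never matched: invalid. Not covered at all: unknown.
    return "invalid" if covered else "unknown"


vrps = [VRP("2001:db8::/32", 48, 64500)]
print(rpki_state("2001:db8:1::/48", 64500, vrps))
print(rpki_state("2001:db8:1::/48", 64501, vrps))
print(rpki_state("2001:db8::/32", 64500, []))
```

Note the last call: with an empty list, which is all a router has when it has never heard from a validator, every route comes out as "unknown".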
 
eduplant
Member Candidate
Posts: 139
Joined: Tue Dec 19, 2017 9:45 am

Re: RPKI and real life failure scenario

Mon Dec 20, 2021 9:05 pm

mrz wrote:
FYI actually all three status types are determined locally on the RTR client. Validator just sends the list of prefixes and originator AS.

Yeah, you’re technically right. I was mostly focusing on how “unknown” is determined. I can’t find anywhere that requires that “I can’t talk to an RPKI validator” and “I did talk to an RPKI validator and it had no information about this prefix” must be locally implemented differently. The rest of the proposal follows from that.
 
JanZorz
newbie
Topic Author
Posts: 37
Joined: Fri Jan 07, 2011 1:42 pm

Re: RPKI and real life failure scenario

Thu Jan 06, 2022 3:33 pm

eduplant wrote:
Is it really the case from what you observed that routes that 1) survive the inbound filter based on RPKI status + operator policy, 2) survive route selection and get installed into the table, 3) at some later time the RPKI validator group becomes unreachable … these routes get marked invalid?

That's exactly the case... This is nowhere near operations ready.

Cheers, Jan
