We will think of some solution not to make whole chain invalid after validator goes down.
Part of my last reply included a suggestion for the changed behavior, and I’ll try to reproduce it here in case it’s helpful.
I was curious where the determinations for “valid”, “unknown”, and “invalid” come from. According to the RFC for RTR (6810), the only thing sent to routers is a whitelist of valid prefixes with their associated origin AS and maximum lengths; the other two states are derived locally by the router. Importantly, “unknown” doesn’t so much mean “I checked with an RPKI validator and there is no information”. It is simply the third case, where nothing is known about the RPKI status of a prefix. To me, it follows that a reasonable implementation would be to
give every BGP route an RPKI status of unknown until and unless acted upon by an RPKI validator group.
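To make the "derived locally" part concrete, here is a minimal Python sketch of how a router could turn the validated prefix list into the three states, roughly following the origin validation logic described in RFC 6811. The `VRP` class and `rpki_status` function are names I made up for illustration; this is not how RouterOS implements it, and it ignores details such as AS_SET origins.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """One validated entry as delivered over RTR (hypothetical container)."""
    prefix: str
    max_length: int
    asn: int


def rpki_status(route_prefix, origin_asn, vrps):
    """Return 'valid', 'invalid', or 'unknown' for a single route."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # Skip entries of the other address family or that do not cover the route.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering entry matches only if length and origin AS both fit.
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    # Covered but never matched is invalid; not covered at all is unknown.
    return "invalid" if covered else "unknown"
```

Note that with an empty list every route comes out as "unknown", which is exactly the default state I am arguing for when no validator has spoken yet.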
With my suggestion,
rpki-verify as a rule action would be removed and only RPKI matchers would remain. However, the current rule action seems to have two benefits, and I think both can be preserved.
Benefit 1 is that you don’t have to enable RPKI filtering globally on every set of prefixes you learn from every peer, just the ones you learn that pass through a specific filter. This can still be expressed for free through policy. Since every BGP route has an RPKI status, simply use the RPKI information in policy from peers where you care and don’t check it from peers where you don’t.
Benefit 2 is that technically you could have a different pool of RPKI validators for different filters, and therefore check peers against different validity. To keep this behavior, how about moving the RPKI group configuration from
/routing/filter/rule rule="rpki-verify <rpki_group>" to an optional
/routing/bgp/(template|connection)/input.rpki_group=<rpki_group>? By default,
input.rpki_group=default.
/routing/bgp/rpki would gain
/routing/bgp/rpki/group and
/routing/bgp/rpki/validator where the
default group would exist with no associated validators. Basically it would be analogous to how
/interface/list and
/interface/list/member and several other mechanisms are implemented.
Since ROSv7 has a proper, separate RIB-IN, it’s already known which BGP peer populated which BGP route in
/routing/route. When new information is received via RTR for each RPKI validator group, the process can go through and only modify the RPKI status for routes learned from peers that are configured for that group. The end result of this would be that when one or more RPKI validator groups become unreachable, the associated BGP routes revert to unknown [1]. New routes learned while the validator groups are unavailable get the default status of unknown as well.
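Here is a rough Python sketch of the behavior I have in mind, including the cache timer from footnote [1]. Everything in it is hypothetical: the `Rib` and `Route` classes, the `peer_group` mapping (standing in for the proposed `input.rpki_group` setting), and the `hold_seconds` parameter are my own names, and `validate` is any function that maps a route to a status. Time is passed in explicitly so the example does not depend on a real clock.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    origin_asn: int
    peer: str
    rpki: str = "unknown"  # default until a validator group acts on the route


@dataclass
class Rib:
    peer_group: dict  # peer name -> RPKI group name (hypothetical setting)
    routes: list = field(default_factory=list)
    lost_at: dict = field(default_factory=dict)  # group -> time the RTR session was lost

    def learn(self, route):
        # New routes always start as unknown, whether or not validators are up.
        route.rpki = "unknown"
        self.routes.append(route)

    def _routes_for(self, group):
        return [r for r in self.routes
                if self.peer_group.get(r.peer, "default") == group]

    def on_rtr_update(self, group, validate):
        # Fresh data for this group: touch only routes from peers using it.
        self.lost_at.pop(group, None)
        for r in self._routes_for(group):
            r.rpki = validate(r)

    def on_rtr_lost(self, group, now):
        # Remember when the group first became unreachable.
        self.lost_at.setdefault(group, now)

    def expire(self, now, hold_seconds):
        # After the hold time, revert the group's routes to unknown.
        for group, lost in list(self.lost_at.items()):
            if now - lost >= hold_seconds:
                for r in self._routes_for(group):
                    r.rpki = "unknown"
                del self.lost_at[group]
```

In this sketch a validator group going down never marks anything invalid. Its routes keep their last known status for `hold_seconds` and then fall back to unknown, while routes from peers in other groups are not touched at all.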
I think many operators will want to still accept unknown routes and either localpref them lower than valid routes or leave them be. Their prefixes and connectivity will survive until validation returns. Either way, RPKI policy actually remains coherent and the operator remains in control of how to handle the failure.
[1] I’d suggest adding a cache timer and not having this happen immediately. Juniper seems to have thought it was a good idea and implemented a local cache time for preserving the RPKI status of routes for a period of time after an RTR session fails:
https://www.juniper.net/documentation/u ... ation.html. I couldn’t find an equivalent for Cisco.