I can totally confirm the OP's observation. We have nearly 100 filter rules; not counting jumps, there are roughly 22 "ebgp-in" rules and 5 "ebgp-out" rules per peer (some shared). See below.
Changing filters attached to only one peer can cause 10 minutes of 100% CPU load on one core.
The setup is relatively normal, I'd say:
Transit A    Transit B    Peering P
    |            |            |
    +--------+   |   +--------+
             |   |   |
      +------+---+---+------+
      | CCR2216-1G-12XS-2XQ |
      +------+---+---+------+
             |   |   |
    +--------+   |   +--------+
    |            |            |
 iBGP I    Customer C1   Customer C2
So we get full tables from transits A and B, apply bogon filtering (which sadly requires one rule per bogon network, since filtering via address lists doesn't match longer prefixes), community filtering and setting, and some local-preference calculation. Basically, the ebgp-in chain for transit looks like this (each line is one item in /routing/filter/rule, some jumps omitted; see the command sketch after the listing):
delete bgp-communities ^64496:;
set bgp-local-pref 170;
if (dst in 0.0.0.0/8 && dst-len in 8-32) { reject; }
if (dst in 10.0.0.0/8 && dst-len in 8-32) { reject; }
if (dst in 100.64.0.0/10 && dst-len in 10-32) { reject; }
if (dst in 127.0.0.0/8 && dst-len in 8-32) { reject; }
if (dst in 169.254.0.0/16 && dst-len in 16-32) { reject; }
if (dst in 172.16.0.0/12 && dst-len in 12-32) { reject; }
if (dst in 192.0.0.0/29 && dst-len in 29-32) { reject; }
if (dst in 192.0.2.0/24 && dst-len in 24-32) { reject; }
if (dst in 192.168.0.0/16 && dst-len in 16-32) { reject; }
if (dst in 198.18.0.0/15 && dst-len in 15-32) { reject; }
if (dst in 198.51.100.0/24 && dst-len in 24-32) { reject; }
if (dst in 203.0.113.0/24 && dst-len in 24-32) { reject; }
if (dst in 240.0.0.0/4 && dst-len in 4-32) { reject; }
if (dst in 255.255.255.255/32 && dst-len == 32) { reject; }
append bgp-communities 64496:120;
if (bgp-local-pref > 0) { set bgp-local-pref -bgp-path-len; }
if (bgp-communities includes graceful-shutdown) { set bgp-local-pref 0; }
if (bgp-communities includes blackhole) { set blackhole yes; }
if (bgp-communities any-list restrict-hw-offload and not bgp-as-path [[:TOP_AS:]]$) { set suppress-hw-offload yes; }
rpki-verify default; if (rpki invalid) { reject; } else { accept; }
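In case the actual CLI form helps: each of the lines above is one rule item, so a single bogon entry would be added roughly like this (the chain name "ebgp-in-transit" is made up for illustration):
/routing/filter/rule
add chain=ebgp-in-transit rule="if (dst in 10.0.0.0/8 && dst-len in 8-32) { reject; }"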
After the routes are accepted into the RIB, they are forwarded to iBGP and customers, filtered by communities, prefixes and source. The ebgp-out chain essentially looks like this:
if (dst == 192.0.2.0/24) { accept; }
if (dst == 198.51.100.0/24) { accept; }
if (dst == 203.0.113.0/24) { accept; }
if (not dst in 2001:db8::/32 && dst-len in 1-48 && protocol ospf && afi ipv6) { accept; }
if (bgp-communities any-list redistribute-to-customers) { accept; }
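For completeness, the chains are attached to the individual sessions on the BGP connections; roughly like this (connection and chain names are invented, and I'm citing the input.filter / output.filter-chain parameter names from memory):
/routing/bgp/connection
set [ find name=transit-a ] input.filter=ebgp-in-transit
set [ find name=customer-c1 ] output.filter-chain=ebgp-out-customer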
I built a lab with CHR and limited the CPU time the VMs may use. /routing/stats/process indicates that each transit session used ~2.5 minutes of process time, while the customer session took >3 minutes (probably run sequentially). That's a bit surprising, given that in my test setup the customer session exports many routes and only receives one, and exporting (5 filter rules, no best-path selection) should be much faster than importing (22 filter rules plus best-path selection).
Without filters, process times are around 1 minute (except for one transit, which only took 10 s - my guess is it came in first and no best-path selection was necessary).
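(I read those numbers straight from the stats output, roughly like this:)
/routing/stats/process print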
Any hints on how to optimize this setup? Is "if (A || B || C) { accept; }" faster than three separate rules "if (A) { accept; }", "if (B) { accept; }", "if (C) { accept; }"? It's not clear to me how setting affinity could benefit this setup, since the documentation and also the YouTube video only indicate it helps with few cores (of which the CCR2216 surely has enough). And I'm already using input.accept-nlri on customer sessions.
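To make that question concrete, the combined variant of the three customer prefix rules above would look roughly like this (chain name again made up):
/routing/filter/rule
add chain=ebgp-out-customer rule="if (dst == 192.0.2.0/24 || dst == 198.51.100.0/24 || dst == 203.0.113.0/24) { accept; }"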
(Side note I noticed while labbing: the dhcp-client times out while installing a default route on boot, because the routing stack is already busy parsing BGP.)