CCR2216 L3HW with BGP: best configuration?

Hello, I am testing different L3 HW offload configurations on a CCR2216 (RouterOS v7) and would like the community to confirm which design is best.

🔹 Network scenario

  • Device: CCR2216-1G-12XS-2XQ (RouterOS v7.x)
  • Traffic: ~10 Gbps transit + ~600k BGP routes received (full table from transit + IXs)
  • Interfaces:
    • sfp28-1: transit (BGP with provider, local IP 10.0.0.0/30, VLAN 601)
    • sfp28-2: IX #1 (185.1.90.0/24 VLAN 602)
    • sfp28-3: IX #2 (193.149.1.0/24 VLAN 603)
    • sfp28-4: output to customer servers (own ASN + /24)

🔹 Option 1: all uplinks inside the bridge with L3HW

  • All ports (uplinks and clients) are inside bridge SWICH with vlan-filtering=yes and l3-hw-offloading=yes.
  • Each uplink is placed into a dedicated internal VLAN (601, 602, 603) as an access port, and clients are in VLAN 1 (untagged).
  • Results:
    • CPU: ~20% while forwarding 10 Gbps.
    • /routing/route/print count-only where afi=ip and active and hw-offloaded → ~500k routes.
    • /interface/ethernet/switch/l3hw-settings/monitor:
      ipv4-routes-total: 521361
      ipv4-routes-hw:    197064
      ipv4-routes-cpu:   324296
      nexthop-cap:       8192
      nexthop-usage:     136
      
  • Question: Is it normal that RouterOS shows ~500k “hw-offloaded” but the L3HW monitor only reflects ~197k installed in the ASIC? I understand that very large aggregates (/0–/21) stay in CPU by design, but I want to confirm.
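
For reference, the Option 1 layout described above corresponds roughly to the configuration below. This is a reconstructed sketch and not an export from the router: the VLAN interface names (vlan601, vlan602, vlan603) are placeholders, and IP addresses and BGP sessions are left out.

```
# Sketch of Option 1 (assumed layout, not an export).
# Bridge name and PVIDs follow the description; vlan60x names are hypothetical.
/interface/bridge
add name=SWICH vlan-filtering=yes
/interface/bridge/port
add bridge=SWICH interface=sfp28-1 pvid=601
add bridge=SWICH interface=sfp28-2 pvid=602
add bridge=SWICH interface=sfp28-3 pvid=603
# Customer port stays in the default VLAN 1 (untagged)
add bridge=SWICH interface=sfp28-4
/interface/bridge/vlan
# The bridge itself is a tagged member so the VLAN interfaces can route
add bridge=SWICH vlan-ids=601 tagged=SWICH untagged=sfp28-1
add bridge=SWICH vlan-ids=602 tagged=SWICH untagged=sfp28-2
add bridge=SWICH vlan-ids=603 tagged=SWICH untagged=sfp28-3
/interface/vlan
add name=vlan601 interface=SWICH vlan-id=601
add name=vlan602 interface=SWICH vlan-id=602
add name=vlan603 interface=SWICH vlan-id=603
/interface/ethernet/switch
set 0 l3-hw-offloading=yes
```

In this sketch the uplink addresses would sit on the VLAN interfaces and the customer-facing address on the bridge itself.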

🔹 Option 2: uplinks outside the bridge (no L3HW)

  • Only the customer port (sfp28-4) is inside the bridge with l3-hw-offloading=yes.
  • The 3 uplinks are outside the bridge with l3-hw-offloading=no.
  • Results:
    • CPU: ~1% while forwarding 10 Gbps.
    • /routing/route/print count-only where afi=ip and active and hw-offloaded → ~10k routes.
    • /interface/ethernet/switch/l3hw-settings/monitor:
      ipv4-routes-total: 10
      ipv4-routes-hw:    10
      ipv4-routes-cpu:   ~400k
      
  • Here forwarding is done in software (CPU), not the ASIC.
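
The Option 2 layout can be sketched the same way. Again this is a reconstruction under assumptions, not an export: it is not stated above whether VLAN filtering is enabled on the bridge in this variant, so it is omitted here, and IP addresses and BGP sessions are left out.

```
# Sketch of Option 2 (assumed layout, not an export).
/interface/bridge
add name=SWICH
/interface/bridge/port
# Only the customer port is a bridge member
add bridge=SWICH interface=sfp28-4
/interface/ethernet/switch/port
# Uplinks stay outside the bridge with L3HW disabled per port
set sfp28-1 l3-hw-offloading=no
set sfp28-2 l3-hw-offloading=no
set sfp28-3 l3-hw-offloading=no
/interface/ethernet/switch
set 0 l3-hw-offloading=yes
```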

🔹 My questions

  1. Which option is better suited to production and future scaling (40–100 Gbps, millions of PPS)?
  2. Is it correct to assume that Option 1 is the only design that really uses the ASIC, even if it shows ~20% CPU instead of 1%?
  3. Is it expected to see such a difference between the “hw-offloaded” routes in /routing/route vs those in l3hw-settings/monitor (e.g., 500k vs 197k)?
  4. Are there any recommended tweaks to further reduce CPU load in Option 1? (firewall/conntrack, bridge settings, shortest-hw-prefix, etc.)

🔹 Extra info (under load)

  • Option 1: /tool profile cpu=all duration=30s → all cores show ~1–8% usage but there is usually one core close to 90%.
  • Option 2: /tool profile cpu=all duration=30s → all cores ~0–1%.
  • /interface/bridge/settings/print → use-ip-firewall=no, use-ip-firewall-for-vlan=no.
  • No NAT or firewall rules on transit traffic.
  • ipv4-shortest-hw-prefix is usually ~22.
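
For anyone who wants to compare against their own unit, the state described above can be cross-checked with read-only commands along these lines. This is only a sketch: the exact output columns and flags vary between RouterOS v7 releases.

```
# Read-only checks; none of these change the configuration.
# Switch-level and per-port L3HW state
/interface/ethernet/switch/print
/interface/ethernet/switch/port/print
# Bridge membership of each port
/interface/bridge/port/print
# L3HW settings and live route/nexthop counters
/interface/ethernet/switch/l3hw-settings/print
/interface/ethernet/switch/l3hw-settings/monitor
# Routes that RouterOS marks as hardware-offloaded
/routing/route/print count-only where afi=ip and active and hw-offloaded
```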

👉 Has anyone else with a CCR2216 and full tables observed the same? Which design is the recommended practice: all uplinks in the bridge (Option 1) or leaving them outside (Option 2)?

Thanks in advance 🙏
