CCR2216 – L3 HW Offload vs Queue Tree / Traffic Steering

Hi,
We are using a CCR2216 (ROS 7.16.1) as a core router handling ~13 Gbps of traffic across 3 downstream segments (we recently moved from a CCR1072 to the CCR2216). Our requirements are:
- Apply basic bandwidth control (Queue Tree, only 3 queues)
- Redirect some traffic to a cache server

Issue:
With L3 HW offload enabled: throughput is fine (~13 Gbps), but Queue Tree has no real effect.
With L3 HW offload disabled: Queue Tree works correctly, but CPU jumps to ~90–100%.

So currently we cannot combine high throughput (HW offload) with basic QoS.

Is there any way to make Queue Tree work while keeping L3 HW offload enabled?

We expected the CCR2216 to handle this role, but we are currently stuck choosing between performance and features.

Any advice or real-world experience would be appreciated.

Short answer: Queue Trees use CPU. L3 HW offload bypasses CPU. Those two are not compatible.

That said, post your sanitized config here, along with your use case for the queue trees, since it may be possible to use the hardware-offloaded QoS on the CCR2216 to achieve something similar.
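
In the meantime, one thing that may be worth testing is keeping L3 HW offload enabled on the switch chip and disabling it only on the port(s) whose traffic really needs queuing or redirection. As far as I understand it, traffic entering or leaving a port with offloading disabled goes through the CPU, while traffic between the remaining ports stays in hardware. This is only a rough sketch: the port name `sfp28-1` is a placeholder, not taken from your setup, and it only helps if the traffic that needs shaping is a small share of the 13 Gbps.

```
# Keep L3 HW offloading enabled on the switch chip itself
/interface/ethernet/switch/set 0 l3-hw-offloading=yes

# Assumption: sfp28-1 is a placeholder for the port carrying the traffic
# that must be queued or redirected; its traffic would then hit the CPU
/interface/ethernet/switch/port/set [find name=sfp28-1] l3-hw-offloading=no
```

If most of the traffic has to pass through the queues anyway, this will not reduce the CPU load much, so check the CPU and throughput numbers after making the change.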