A single simple queue of PCQ type with a 500M total limit pushes CPU to 100%

I want to set up fair bandwidth sharing for our ~800 users.

I made a PCQ queue and attached it to a simple queue:

/queue type
add kind=pcq name=parent-default pcq-classifier=src-address,dst-address \
    pcq-dst-address6-mask=64 pcq-limit=40KiB pcq-src-address6-mask=64 \
    pcq-total-limit=20000KiB
/queue simple
add dst=sfp1-vlan2486-internet name=bw-limit target=\
    BRAS1_BOND,servers-bridge total-max-limit=495M total-queue=\
    parent-default

Everything works normally most of the time, but at peak times in the evening the CPU is maxed out by queueing.
If I use SFQ or PFIFO, there are no problems.
But if I remember correctly, SFQ has only around 1000 queues, and PFIFO is bad for fair bandwidth sharing.

CCR1036-12G-4G running ROS 6.46.3

Well, queues are CPU-intensive, and if I remember correctly, simple queues use only one core; it is a single-threaded process…
So maybe in your case it would be best to use queue trees…
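
For reference, a queue tree version of the same limit might look roughly like the sketch below. It is untested on this setup and rests on assumptions: the packet mark name bw-limit and the queue name bw-limit-tree are made up for illustration, marking is done in the forward chain on the internet-facing interface, and parent=global is used so that both directions still share one 495M ceiling. Whether this actually lowers CPU load on this router is not confirmed.

/ip firewall mangle
# Assumption: mark everything forwarded to or from the internet interface
add chain=forward in-interface=sfp1-vlan2486-internet action=mark-packet \
    new-packet-mark=bw-limit passthrough=no
add chain=forward out-interface=sfp1-vlan2486-internet action=mark-packet \
    new-packet-mark=bw-limit passthrough=no
/queue tree
# Reuses the existing PCQ queue type; the name bw-limit-tree is hypothetical
add name=bw-limit-tree parent=global packet-mark=bw-limit \
    queue=parent-default max-limit=495M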

I have only one queue on this router. Is queue tree better in this scenario?

Use two queues, one for each interface target, as suggested here:
https://wiki.mikrotik.com/wiki/Tips_and_Tricks_for_Beginners_and_Experienced_Users_of_RouterOS#Queue_slows_down_router
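
A rough sketch of that split, based on the config from the first post. The queue names bw-limit-bras and bw-limit-servers are made up, and the 395M/100M division is only a placeholder assumption: with two separate queues the targets no longer share a single 495M ceiling, so the limits have to be divided according to the real traffic mix.

/queue simple
# Assumption: one queue per target instead of a single queue with both targets
add dst=sfp1-vlan2486-internet name=bw-limit-bras target=BRAS1_BOND \
    total-max-limit=395M total-queue=parent-default
# Placeholder limit; adjust so the two limits add up to the uplink capacity
add dst=sfp1-vlan2486-internet name=bw-limit-servers target=servers-bridge \
    total-max-limit=100M total-queue=parent-default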