CCR1036 high CPU usage from queuing and firewall

I have a CCR1036 running v6.24. To utilize all CPU cores and get full performance, I moved to simple queues.
Apart from the other normal simple queues, I have an address-list named “customers” with about 6000 customers. I use mangle to mark their packets and use those marks in simple queues. The problem begins when this specific rule is enabled: CPU usage on 10 to 15 cores goes above 95% and the overall performance of the router degrades.
I just want a lightweight simple queue as a container for these clients, so that the sum of all queues is correct. This rule is not meant for shaping. I tried the queue types pfifo and pcq; both give the same CPU usage.
Any suggestions?

/queue type
add kind=pcq name="Customers Download" pcq-classifier=dst-address pcq-total-limit=1000000
add kind=pcq name="Customers Upload" pcq-classifier=src-address pcq-total-limit=1000000
add kind=pfifo name=default-500 pfifo-limit=500


/queue simple
add limit-at=0/150M max-limit=0/500M name="Customers down" packet-marks=Customers_download parent=Total_Download priority=8/4 queue=default/default-500 target=""
add limit-at=0/80M max-limit=0/150M name="Customers up" packet-marks=Customers_upload parent=Total_Upload priority=8/4 queue="default/Customers Upload" target=""

/ip firewall mangle
add action=mark-packet chain=forward dst-address-list=Customers new-packet-mark=Customers_download packet-mark=no-mark passthrough=no
add action=mark-packet chain=forward new-packet-mark=Customers_upload packet-mark=no-mark passthrough=no src-address-list=Customers

Maybe you should try marking the connection first and then marking the packets.

Look at the wiki for examples.
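
A rough sketch of what the suggestion above could look like for this setup, in case it helps. The idea is that the address-list lookup is done only until a connection gets its mark, and later packets are matched on the connection mark instead. Note that `Customers_conn` and `ether1-wan` are made-up names for illustration (the original export shows neither a connection mark nor the uplink interface name), so adjust them to your config. Whether this actually lowers CPU load on a given router would need to be measured.

```
# Sketch only. "Customers_conn" (connection mark) and "ether1-wan" (uplink
# interface) are hypothetical names, not taken from the original config.
/ip firewall mangle
# Step 1: mark the connection. The address-list is consulted only for
# packets whose connection has no mark yet.
add action=mark-connection chain=forward connection-mark=no-mark dst-address-list=Customers new-connection-mark=Customers_conn passthrough=yes
add action=mark-connection chain=forward connection-mark=no-mark new-connection-mark=Customers_conn passthrough=yes src-address-list=Customers
# Step 2: mark packets of marked connections, using the uplink interface
# to tell download from upload.
add action=mark-packet chain=forward connection-mark=Customers_conn in-interface=ether1-wan new-packet-mark=Customers_download passthrough=no
add action=mark-packet chain=forward connection-mark=Customers_conn new-packet-mark=Customers_upload out-interface=ether1-wan passthrough=no
```

The packet marks keep the same names, so the existing simple queues would not need to change. Traffic between two customers that never crosses the uplink gets a connection mark but no packet mark in this sketch.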

Yeah, marking the connection first and then the packet is the way to go.
BTW, how much traffic are you pushing?

Total bandwidth is about 600 Mbps, but the queue is “limit-at=0/150M max-limit=0/500M”.
Marking connections gave no performance benefit, so I removed it to have fewer rules.