CCR queue tree processor usage 100%

Hi guys,

We have a CCR 1036 acting as one of our main ACs. It currently terminates around 850 PPPoE connections.

We use RADIUS to authenticate each connection.
We give each connection an address list.
We mark each connection in the firewall.
We mark each packet in the firewall.
We then use PCQ queue trees to allocate bandwidth.

We have many packages for home and business alike.

Until recently everything was going fine, but the CCR has begun to max out on CPU usage during the evening.

The traffic load hits around 400 Mbps and the CPU hits 100%.

Then the router starts to drop all the PPPoE connections and we get a lot of support calls.

Has anyone any ideas what could be causing this?

Thanks.


Does anyone have any insight into this?

Thanks.


It's hard to diagnose this without the topology and the CCR's configuration; please post them.

Check the following items when the load increases:

With Tools > Profile, check the CPU usage per process to see if queuing is the culprit behind the high CPU usage.

With System > Resources > CPU, check the load distribution across the CPU cores.

With System > Resources, check the available RAM.

With IP > Firewall > Connections, check whether the number of connections increases. Filter on established connections and look for any difference between normal operation and the moments of high CPU usage.

Make sure you don’t count the “IDLE” process as usage :) I've been getting lots of emails about this lately.

IDLE means FREE (not used)

Hi Guys,

Please see a screenshot of Winbox and a copy of an Observium graph over the course of a day.

I think we are simply pushing queue trees too hard… I am going to deploy 2 more CCRs this week to distribute the load on the queue trees.

As I understand it, queue trees cannot use multiple cores, which is a major limitation. I know of no way around it at this time.

My link to the screen shot did not work in the previous post and I couldn’t find the “Edit Post” button so I am posting it again here.

Thanks

https://onedrive.live.com/redir?resid=CBA562CDD838050C!151154&authkey=!AOVGvrEHbfTv9vI&v=3&ithint=photo%2cjpg

One queue structure is limited to one CPU core. You have 2 queue structures (the main parent queues in global), so out of all your cores the queues can use only 2. As soon as those two become a bottleneck, traffic is delayed and all the other cores are locked (fully busy) waiting on traffic.

Bottom line: your queue implementation is far from optimal for your hardware. On x86, where each individual core is powerful, this would work with no problems, but on a CCR you need to adjust.

suggestions:

  1. Move away from parent=global to parent=, which should allow you to have more parent-level queues (more cores used, so a bottleneck is less likely).

If that doesn’t solve the problem:

  2. Consider changing your queueing strategy. On a CCR the best setup is a few thousand simple queues on the same level, perhaps with a limit per client IP.
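
To illustrate the reasoning behind the first suggestion, here is a toy Python model, not RouterOS code. It assumes each top-level queue structure is handled by a single core, as described above. The function name, the structure names, and the load figures (in units of one core's capacity) are all made up for the example.

```python
def saturated_structures(load_by_structure, core_capacity=1.0):
    """Return the top-level queue structures that need more than one core.

    Assumption: each top-level structure is processed by exactly one core,
    so a structure becomes a bottleneck once its load exceeds that core.
    """
    return sorted(
        name for name, load in load_by_structure.items()
        if load > core_capacity
    )


# Hypothetical loads, expressed as fractions of one core's capacity.
two_global_parents = {"global-download": 1.4, "global-upload": 0.6}
four_interface_parents = {
    "ether1-download": 0.7, "ether2-download": 0.7,
    "ether1-upload": 0.3, "ether2-upload": 0.3,
}

print(saturated_structures(two_global_parents))      # ['global-download']
print(saturated_structures(four_interface_parents))  # []
```

The total work is the same in both cases; spreading it over more parent-level structures only lowers the peak load that any single core has to carry.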

With simple queues, how do you distribute bandwidth evenly?

For example.

If you have a 10Mbps connection with 20 customers all on 2Mbps service.

When 5 customers are on, simple queue will work just fine.

When customers 6 to 20 come online and start downloading, do simple queues not share the bandwidth evenly between customers? Or am I wrong?
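
To put numbers on the example above, here is a minimal Python sketch of what an even (max-min fair) split would look like. It only illustrates the arithmetic. It is an assumption about what "evenly" should mean here, not a description of how RouterOS simple queues or PCQ actually schedule traffic, and `max_min_fair` is a made-up helper name.

```python
def max_min_fair(capacity, demands):
    """Split `capacity` between flows so that no flow gets more than it
    asks for and unused share is handed to the flows that still want more."""
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Serve flows from the smallest demand to the largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    left = len(order)
    for i in order:
        share = remaining / left              # equal split of what is left
        alloc[i] = min(float(demands[i]), share)
        remaining -= alloc[i]
        left -= 1
    return alloc


# 10 Mbps link, every customer on a 2 Mbps package.
print(max_min_fair(10, [2] * 5))       # 5 active customers: 2.0 each
print(max_min_fair(10, [2] * 20)[0])   # 20 active customers: 0.5 each
```

With 5 customers the link is exactly full at the package speed; with 20 customers the demand is 40 Mbps, so an even split leaves each customer 0.5 Mbps.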

Also, getting a little more complex. With queue trees we have business customers prioritized highly, home customers lower.

Inside the queues, we have 10mbps customers prioritized higher than 2mbps customers.

With simple queues is it possible to prioritize business customers higher than home, and certain packages within the business or customer queues higher than others?

Finally.

With queue trees we see a clear structure, with each queue indented below its parent, so that it is easy to adjust and update as required. When I started testing with simple queues I could not see any way to get this. All the queues were just in a pile, which made it tough to tell what was a parent and what was a child.

Have I done something wrong here? Is it possible for simple queues to be laid out like queue trees?

Thanks.

Some time ago, when the CCR was released, that was the situation.

I think the situation is different today: in the screenshot we can see all 36 cores under full load, not only a few.

That’s the point. Each parent queue and all its child queues will only use 1 core.

When the core of any one parent queue maxes out, it doesn’t offload to or bring in a second core. Instead, all cores lock up while they wait for the first core to complete its task.

Is anyone from MikroTik able to confirm that this is an acknowledged issue and that a fix is being put into place?

Thanks guys.

ukzerosniper - You are absolutely correct. This is a 100% precise explanation of the issue.

We are working on a proper fix for situations like this.

At this point, please use queues which will not load the CPU to 100% for a longer period of time.

Hi @Strods,

Thanks for the acknowledgement.

Do you have any idea what sort of timescale the solution is planned to take?

Implementing more queues is not really an ideal solution, as it would make it hard to distribute the available bandwidth fairly among all the customers.

Thanks.

Do MikroTik have any idea what sort of timescale the solution is planned to take?

We moved to simple queues to quell this problem when the CCRs first came out.

Our heaviest NAS is a CCR1009 with 559 active sessions; at peak usage (450 Mbps) we see 27% CPU usage.

We use RADIUS with the Mikrotik-Rate-Limit attribute. Customers range from 5 Mbps to 10 Mbps, with a few commercial accounts up to 20 Mbps.
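
For anyone unfamiliar with that attribute: its value is a rate string of the form rx-rate/tx-rate, seen from the router's side, so rx is the customer's upload and tx is the customer's download. Below is a small Python sketch of building those strings for speeds in the range mentioned above. The helper name, the package names, and the symmetric upload/download speeds are assumptions for illustration, and the optional burst and priority fields are left out.

```python
def rate_limit_value(upload_mbps, download_mbps):
    """Build a Mikrotik-Rate-Limit value as "rx-rate/tx-rate".

    rx/tx are from the router's point of view: rx = customer upload,
    tx = customer download. Burst and priority fields are omitted.
    """
    return f"{upload_mbps}M/{download_mbps}M"


# Hypothetical packages: name -> (upload Mbps, download Mbps).
packages = {"home-5": (5, 5), "home-10": (10, 10), "business-20": (20, 20)}
for name, (up, down) in packages.items():
    print(name, rate_limit_value(up, down))
```

The RADIUS server would return the resulting string in the Mikrotik-Rate-Limit reply attribute for each user.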

I would suspect a CCR1036 with simple queues would easily handle what you are doing.

Same problem. I moved the parent from global to etherX and the CPU usage went down. But I need to control the traffic on more than one Ethernet port. Is there a solution?

Good afternoon,

Any update on this? I chose MikroTik because it is supposed to be good quality and cheap, but with this problem I am starting to doubt it again. Does this problem have a solution? I really need to use queue trees. Thanks in advance.