We have a CCR 1036 acting as one of our main ACs. It currently terminates around 850 PPPoE connections.
We use RADIUS to authenticate each connection.
We give each connection an address list.
We mark each connection in the firewall.
We mark each packet in the firewall.
We then use PCQ queue trees to allocate bandwidth.
We have many packages for home and business alike.
Until recently everything was going fine, but the CCR has begun to max out on CPU usage during the evening.
The traffic load hits around 400Mbps and the CPU hits 100%.
Then the router starts to drop all the PPPoE connections and we get a lot of support calls.
It's hard to diagnose the situation without the topology and the CCR configuration; please post them.
Check the following items when the load increases:
With Tools > Profile, check CPU usage to see whether queuing is the culprit behind the high CPU load.
With System > Resources > CPU, check the load distribution across the CPU cores.
With System > Resources, check the available RAM.
With IP > Firewall > Connections, check the number of connections to see whether it increases. Filter on established connections and compare normal operation with the moments of high CPU usage to look for a trend.
One queue structure is limited to one CPU core. You have 2 queue structures (the main parent queues in global), so out of all the cores your queues can use only 2. As soon as those two become a bottleneck, traffic is delayed and all the other cores are locked (fully busy) waiting on traffic.
Bottom line: your queue implementation is far from optimal for your hardware. On x86, where an individual core is powerful, this would work with no problems, but on a CCR you need to adjust.
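To put rough numbers on that, here is a small Python sketch of the rule as stated above (one queue structure can use only one core). The core count of 36 is the CCR1036's; the structure counts other than 2 are hypothetical, and this is only arithmetic, not a model of how RouterOS actually schedules work.

```python
def usable_cores(queue_structures, total_cores):
    """Cores that queueing can use if each top-level structure is tied to one core."""
    return min(queue_structures, total_cores)


TOTAL_CORES = 36  # CCR1036

# 2 is the setup described in this thread; 4 and 12 are hypothetical alternatives.
for structures in (2, 4, 12):
    cores = usable_cores(structures, TOTAL_CORES)
    share = cores / TOTAL_CORES
    print(f"{structures} top-level queue structures: "
          f"{cores} of {TOTAL_CORES} cores usable for queueing ({share:.0%})")
```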
Suggestions:
Move away from parent=global, to parent=, it should allow you to have more parent-level queues == more cores used, so a bottleneck is less likely.
If that doesn't solve the problem,
consider changing the queueing strategy - for a CCR the best setup is a few thousand simple queues on the same level, maybe with a limit per client IP.
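For the flat simple-queue layout, a rough Python sketch like the one below could generate one same-level queue per client IP to paste into a terminal. The subnet, the `client-` naming scheme and the 2M/2M limit are made up for illustration, and `simple_queue_commands` is a hypothetical helper; adapt all of them to your own addressing and packages before using any of it.

```python
import ipaddress


def simple_queue_commands(subnet, limit="2M/2M"):
    """Build one '/queue simple add' line per host address in the subnet.

    The limit is given as upload/download; name and target are derived
    from the client IP (a naming convention assumed for this example).
    """
    commands = []
    for ip in ipaddress.ip_network(subnet).hosts():
        commands.append(
            f"/queue simple add name=client-{ip} target={ip}/32 max-limit={limit}"
        )
    return commands


# Hypothetical client subnet; a /29 keeps the output short.
for line in simple_queue_commands("10.10.0.0/29"):
    print(line)
```

All queues come out on the same level with no parent, which is the layout suggested above.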
With simple queues, how do you distribute bandwidth evenly?
For example:
Say you have a 10Mbps connection with 20 customers, all on a 2Mbps service.
When 5 customers are on, simple queues will work just fine.
When customers 6 - 20 come online and start downloading, simple queues do not share bandwidth evenly between customers, do they? Or am I wrong?
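To make clear what "evenly" would mean in this example, here is the arithmetic as a small Python sketch. It computes an ideal even split (each customer capped at their own limit), which is the behaviour the question is asking for; it is not a description of how simple queues or PCQ are implemented in RouterOS, and `fair_share` is just a name picked for the example.

```python
def fair_share(capacity, demands):
    """Split capacity evenly between customers, never giving more than each demand.

    Capacity left over by customers who need less than an even share is
    redistributed among the rest (max-min fairness).
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Customers whose outstanding demand fits inside an even share.
        satisfied = [i for i in active if demands[i] - alloc[i] <= share]
        if not satisfied:
            # Everyone wants more than an even share: split what is left.
            for i in active:
                alloc[i] += share
            break
        for i in satisfied:
            remaining -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        active = [i for i in active if i not in satisfied]
    return alloc


# 10Mbps uplink, 2Mbps packages, as in the example above.
print(fair_share(10, [2] * 5))   # 5 customers online
print(fair_share(10, [2] * 20))  # all 20 customers online
```

With 5 customers everyone can get their full 2Mbps; with all 20 downloading, an even split works out to 0.5Mbps each.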
Also, getting a little more complex: with queue trees we have business customers prioritized highly and home customers lower.
Inside the queues, we have 10Mbps customers prioritized higher than 2Mbps customers.
With simple queues, is it possible to prioritize business customers higher than home, and certain packages within the business or customer queues higher than others?
Finally.
With queue trees we see a clear structure, with each queue indented below its parent so that it is easy to adjust and update as required. When I started testing with simple queues I could not see any way to get the same clarity. All the queues were just in a pile, which made it a bit tough to discern what was a parent and what was a child.
Have I done something wrong here? Is it possible for simple queues to be laid out like queue trees?
That’s the point. Each parent queue and all its child queues will only use 1 core.
When any one parent queue's core maxes out, it doesn't offload to or bring into action a second core; instead, all cores lock while they wait for the first core to complete its task.
Is anyone from MikroTik able to confirm that this is an acknowledged issue and that a solution is being put into place?
Any update on this? I chose MikroTik because it is supposed to offer quality at a low price, but with this problem I am starting to doubt it. Again, does this problem have a solution? I really need to use queue trees. Thanks in advance.