
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 9:34 pm

We have several CCRs and noticed small levels of packet loss (anywhere from 0.3% to upwards of 2%) whenever CPU utilization is above 5%.
One thing we noticed is that the only scenario in which the packet loss doesn't happen is when the CCR is using Fast Path for all its traffic. If Fast Path is disabled (firewall rules, QoS, etc.), the packet loss happens at practically any CPU level from about 5% upward, even with FastTrack applied to all connections.

Typically such levels of packet loss wouldn't be a huge deal, but when 3 or more CCRs make up the backbone of your network and packets crossing all of them see a combined 2%+ packet loss, that's a big problem for us.
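To put numbers on why the loss grows with every CCR in the path: if each hop drops packets independently, a packet only survives the path if it survives every hop, so losses compound. A minimal Python sketch, assuming independent per-hop loss (the 0.7% per-hop figure is illustrative, not a measurement from this thread):

```python
def end_to_end_loss(per_hop_loss):
    """Combined loss fraction for a path of hops.

    Assumes each hop drops packets independently, so the survival
    probability of the whole path is the product of per-hop survivals.
    """
    survive = 1.0
    for p in per_hop_loss:
        survive *= 1.0 - p
    return 1.0 - survive


# Three CCRs in the backbone, each dropping about 0.7% (illustrative values).
hops = [0.007, 0.007, 0.007]
print(f"{end_to_end_loss(hops):.2%}")  # 2.09%
```

For small per-hop losses the result is very close to the plain sum (2.1% here), which is why the loss appears to add up hop by hop.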

Does anyone know why this micro packet loss happens? Is there a way to even avoid it?

To give you some more info, we even took a CCR that had Fast Path enabled, 1-2% CPU and 0% packet loss, and simply adding the firewall rule
ip firewall filter add chain=forward action=fasttrack-connection
bumped the CPU to 9-11% and the packet loss to 0.4%.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 10:25 pm

Under System->Resources->CPU, check whether a specific CPU is hitting 100% at the time of the packet loss.

The biggest challenge with the CCRs is that they have a large number of relatively weak CPUs, so a single CPU can get overloaded well before the entire unit does. Usually this is because something is configured in a way that is not optimal for multi-threading, and there is often a different configuration that works and is friendlier to multi-threading.
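A quick way to apply this check is to look at each core individually instead of the overall figure. A minimal Python sketch, assuming you have copied per-core loads by hand from System->Resources->CPU into a list; the 16-core snapshot is hypothetical and the 90% default is an assumption borrowed from the "90% or higher" rule of thumb given later in this thread:

```python
def hot_cores(per_core_load, threshold=90.0):
    """Return (core index, load %) pairs at or above the threshold.

    The 90% default is an assumed rule of thumb, not a RouterOS setting.
    """
    return [(i, load) for i, load in enumerate(per_core_load) if load >= threshold]


# Hypothetical 16-core snapshot: one core saturated, the other 15 nearly idle.
loads = [100.0] + [2.0] * 15
average = sum(loads) / len(loads)

print(f"average load: {average:.1f}%")  # average load: 8.1%
print("hot cores:", hot_cores(loads))    # hot cores: [(0, 100.0)]
```

The average looks harmless while one core is pinned, which is exactly the situation described above.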
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 10:34 pm

Hello.

From checking the logs, this might indeed be the problem (the current RouterOS version doesn't support this, and I have yet to update it).


Also, I noticed that disabling IP Flow in this CCR (1016) reduced CPU usage from ~20% to ~15%, while on another CCR (1072) enabling it had no notable change in the CPU which was still at 1-2%, both with about the same amount of traffic.

Additionally, this CCR had no firewall rules or queues configured and had Fast Path enabled for all its traffic. The only configuration is VLANs, bridges, bonding, and OSPF routing.

Is there any way to prevent individual CPUs from reaching 100% usage with this kind of setup? Such poor multi-threading seems to me like a fault of either the Linux kernel used or RouterOS itself.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 10:40 pm

Railander wrote:
> Is there any way to prevent individual CPUs from reaching 100% usage?
Yes, certain ways of configuring things result in excessive load on a single core. The first thing to determine is which element of your config is the culprit: use the profiler tool in RouterOS to find out which process is responsible for the high load on one CPU. That's a good place to start.

Once you know what is putting a large load on one CPU you can look at how to reconfigure it to make it more multicore friendly.

For instance, one common way of setting up queueing is a queue tree with parent global. There are many scripts out there that do this, and it works great on a MikroTik home router. On a CCR it is terrible, because all queue trees with a common parent are processed on a single CPU core, so it is better to use other queueing configurations that accomplish the same task while distributing the work across the cores. Other processes can similarly bog down a single core; you just have to track down what is causing it.
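The effect can be illustrated with a toy model: if every queue tree under one parent lands on a single core, what matters is the combined cost per parent, not the total across the router. A minimal Python sketch under that assumption; the queue names and per-queue costs (in % of one core) are made up for illustration:

```python
from collections import defaultdict


def parent_loads(queues):
    """Sum estimated queue cost (% of one core) per parent.

    Toy model: all queues sharing a parent are assumed to run on one core.
    """
    totals = defaultdict(float)
    for _name, parent, cost in queues:
        totals[parent] += cost
    return dict(totals)


def saturated_parents(queues, core_capacity=100.0):
    """Parents whose combined cost would exceed a single core."""
    return sorted(p for p, load in parent_loads(queues).items() if load > core_capacity)


# Hypothetical queue trees: (name, parent, estimated cost in % of one core).
queues = [
    ("download", "global", 45.0),
    ("upload", "global", 35.0),
    ("guest", "global", 30.0),
]
print(parent_loads(queues))       # {'global': 110.0}
print(saturated_parents(queues))  # ['global']
```

On a 16- or 72-core CCR the other cores could sit idle while this one parent saturates its core, which matches the behaviour described above.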
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 11:00 pm

What volume of traffic are you pushing through these?
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Thu Aug 31, 2017 11:59 pm

mducharme wrote:
> What volume of traffic are you pushing through these?
5 Gbps at most, summed across all interfaces.
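Router CPU cost tends to follow packets per second more than bits per second, so it can help to convert the 5 Gbps figure. A rough Python sketch, assuming standard Ethernet framing where each frame also takes 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire; the frame sizes are illustrative since the actual traffic mix was not posted:

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def packets_per_second(bits_per_second, frame_bytes):
    """Packets/s for a line rate and Ethernet frame size (including FCS)."""
    return bits_per_second / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


total = 5e9  # 5 Gbps summed across all interfaces, as reported
for size in (64, 512, 1518):
    mpps = packets_per_second(total, size) / 1e6
    print(f"{size:>5} B frames: {mpps:.2f} Mpps")
```

The same 5 Gbps can mean anywhere from roughly 0.4 Mpps (full-size frames) to about 7.4 Mpps (minimum-size frames), which is a very different load on the CPUs.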
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:28 am

Can you paste your config with the hide sensitive option?

What did you learn from the profiler? You should be able to see when a CPU is maxed what process is maxing out that CPU.
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:31 am

mducharme wrote:
> Can you paste your config with the hide sensitive option?
>
> What did you learn from the profiler? You should be able to see when a CPU is maxed what process is maxing out that CPU.
I need to update RouterOS; the current version on that CCR doesn't support per-core profiling.
I'll do that later tonight and come back with more info.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:36 am

OK. Also, BTW, I think you misunderstand fastpath: fastpath is automatically active when it is enabled and you have no firewall rules.

Fasttrack is kind of a 'fastpath-lite' that lets you fastpath some traffic in situations where you need firewall rules and other such things. It is not as efficient as having everything fastpathed.

That is why adding the rule:

ip firewall filter add chain=forward action=fasttrack-connection

actually increases your CPU usage: by adding it you disable full-blown fastpath (as soon as you have any firewall rules, fastpath is disabled) and enable fasttrack instead.
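The rule of thumb in this post can be written as a tiny decision function. This is a simplified Python sketch of only what the post states, assuming fast path is allowed in IP settings; real RouterOS checks further conditions that are deliberately not modelled, and the function name is hypothetical:

```python
def forwarding_mode(firewall_actions):
    """Toy model of the fastpath/fasttrack behaviour described above.

    `firewall_actions` lists the action of every firewall rule,
    e.g. ["fasttrack-connection", "accept"]. Assumes fast path is allowed
    and ignores every other condition RouterOS may check.
    """
    if not firewall_actions:
        return "fastpath"   # no rules: eligible traffic uses full fast path
    if "fasttrack-connection" in firewall_actions:
        return "fasttrack"  # only fast-tracked connections get fast-path-like handling
    return "slow path"      # every packet goes through the full firewall


print(forwarding_mode([]))                        # fastpath
print(forwarding_mode(["fasttrack-connection"]))  # fasttrack
print(forwarding_mode(["accept", "drop"]))        # slow path
```

The single-rule test from the first post corresponds to the second call: adding one fasttrack rule moves the router out of full fastpath, consistent with the CPU rising from 1-2% to 9-11%.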
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:39 am

mducharme wrote:
> OK. Also, BTW, I think you misunderstand fastpath: fastpath is automatically active when it is enabled and you have no firewall rules.
>
> Fasttrack is kind of a 'fastpath-lite' that lets you fastpath some traffic in situations where you need firewall rules and other such things. It is not as efficient as having everything fastpathed.
>
> That is why adding the rule:
>
> ip firewall filter add chain=forward action=fasttrack-connection
>
> actually increases your CPU usage: by adding it you disable full-blown fastpath (as soon as you have any firewall rules, fastpath is disabled) and enable fasttrack instead.
Yes, I am fully aware of all that. That firewall rule was for testing purposes. I noticed the packet loss especially when the firewall was enabled (which means Fast Path is disabled), so to make sure my filtering rules weren't causing the issue, I experimented with that single rule, and even then I had exactly the same packet loss.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:55 am

OK. Also, it's a good idea to check for certain bad settings that can kill your performance. For instance, turning on "Use IP Firewall" in the bridge settings kills the CPU, so it's worth checking whether there is an issue like that.
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 12:59 am

mducharme wrote:
> OK. Also, it's a good idea to check for certain bad settings that can kill your performance. For instance, turning on "Use IP Firewall" in the bridge settings kills the CPU, so it's worth checking whether there is an issue like that.
Good one.
In my case it was already disabled in all routers.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 1:07 am

Do you have any masquerade rules? What is your OSPF topology like? Is there anything that would cause frequent LSAs, such as /32 routes for PPP tunnels?

Also, I am wondering about the bonding: fastpath apparently only works with bonded interfaces on receive, and even then only since RouterOS 6.30.
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 1:31 am

mducharme wrote:
> Do you have any masquerade rules? What is your OSPF topology like? Is there anything that would cause frequent LSAs, such as /32 routes for PPP tunnels?
>
> Also, I am wondering about the bonding: fastpath apparently only works with bonded interfaces on receive, and even then only since RouterOS 6.30.
Regarding NAT: as I said, I tried with the firewall completely empty except for that fasttrack rule, and the problem still occurred.

Our network has about 700 OSPF routes, and no routing protocols besides OSPF.
There are no flapping LSAs that I can see, and there are no PPP tunnels.

The RouterOS version on this one is 6.34.6. I'm not sure whether Fast Path is actually active on the bonding interfaces, since there is no indicator, but there is barely any traffic on them to begin with: 200 Mbps up+down at most.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 2:39 am

Do you have connection tracking turned on? How big is your connection tracking table? I found previously that port scanning created many connections to port 445, filling up the connection tracking table and causing CPU spikes. I'm not sure whether this router is exposed to the Internet and whether you are blocking those.
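One way to spot the scan pattern described here is to count destination ports across the tracked connections. A minimal Python sketch; the input format of (src_ip, dst_ip, dst_port) tuples is a hypothetical transcription of the connection tracking table, and the sample data is made up using documentation address ranges:

```python
from collections import Counter


def top_dst_ports(connections, n=3):
    """Return the n most common destination ports as (port, count) pairs.

    `connections` is a hypothetical list of (src_ip, dst_ip, dst_port) tuples.
    """
    return Counter(port for _src, _dst, port in connections).most_common(n)


# Made-up sample: a scan hitting port 445 from many sources dwarfs normal traffic.
sample = (
    [(f"198.51.100.{i}", "203.0.113.10", 445) for i in range(200)]
    + [("192.0.2.1", "203.0.113.20", 443)] * 5
    + [("192.0.2.2", "203.0.113.30", 53)] * 3
)
print(top_dst_ports(sample))  # [(445, 200), (443, 5), (53, 3)]
```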

Also, what about your ARP table? If you have an interface name instead of an IP as a next hop, ARP entries for remote hosts (e.g. on the Internet) accumulate in the ARP table on the local device; often this is just due to a configuration mistake.
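The difference between the two kinds of next hop can be shown with a toy count of ARP entries. A minimal Python sketch of the behaviour described in this post; it is a simplified model, not RouterOS internals, and the host names are hypothetical:

```python
def arp_entries_needed(remote_hosts, gateway_is_interface):
    """Toy count of ARP entries for traffic to off-link hosts.

    IP gateway: only the next hop's MAC has to be resolved (one entry).
    Interface as gateway: every remote host is treated as directly
    connected, so each distinct destination gets its own ARP entry.
    """
    if not remote_hosts:
        return 0
    if gateway_is_interface:
        return len(set(remote_hosts))
    return 1


# Hypothetical: traffic to 10,000 distinct Internet hosts.
hosts = [f"host-{i}" for i in range(10_000)]
print(arp_entries_needed(hosts, gateway_is_interface=False))  # 1
print(arp_entries_needed(hosts, gateway_is_interface=True))   # 10000
```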

What about the bridge table: how many MACs?
 
Railander
just joined
Topic Author
Posts: 24
Joined: Thu Jun 16, 2016 11:30 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 3:36 am

mducharme wrote:
> Do you have connection tracking turned on? How big is your connection tracking table? I found previously that port scanning created many connections to port 445, filling up the connection tracking table and causing CPU spikes. I'm not sure whether this router is exposed to the Internet and whether you are blocking those.
>
> Also, what about your ARP table? If you have an interface name instead of an IP as a next hop, ARP entries for remote hosts (e.g. on the Internet) accumulate in the ARP table on the local device; often this is just due to a configuration mistake.
>
> What about the bridge table: how many MACs?
Conntrack is set to auto, and since I currently have no filter/NAT/mangle/raw rules, it is off.
When it is on, there are probably anywhere from 150k to 500k connections.

All routes in the routing table have IPs as the gateway (next hop), except for connected routes.

By bridge table, do you mean Bridge > Hosts? Currently 464.
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Sep 01, 2017 6:47 am

OK, thanks. All of those sound fine. I've run out of ideas as to why this might be happening until there is profiler data and perhaps a config.
 
dgnevans
Member
Posts: 463
Joined: Fri Mar 08, 2013 11:24 am
Location: Zimbabwe

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Sun Sep 03, 2017 10:34 pm

Recently I had an issue where I needed to disable the IP route cache under IP > Settings. Routers would hang for no apparent reason, with multiple OSPF adjacency changes. The IP route cache, normally sitting between 90 and 400, would quickly climb, CPU usage would go high, and then the router would lock up. I have since turned off the IP route cache on all my routers: no problems since, and CPU usage is down as well.
 
Lupin
Member Candidate
Posts: 265
Joined: Mon Feb 16, 2009 10:22 pm
Location: Italy

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Sun Oct 08, 2017 10:35 pm

Railander wrote:
> Conntrack is set to auto, and since I currently have no filter/NAT/mangle/raw rules, it is off.
> When it is on, there are probably anywhere from 150k to 500k connections.
>
> All routes in the routing table have IPs as the gateway (next hop), except for connected routes.
>
> By bridge table, do you mean Bridge > Hosts? Currently 464.
Hi Railander, I see the same behaviour on my network.
On a CCR1016-12S-1S+ I have a small amount of packet loss, and on the next hop, another 1016, the packet loss adds up.
No problem on the CCR1072.

I tried recording the CPU usage with Camtasia, but I never saw a core reach 100%.
See the blue lines in the SmokePing graph (attachment pl.png).
Have you found a solution?
 
mducharme
Trainer
Posts: 868
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Sun Oct 08, 2017 10:47 pm

Lupin wrote:
> Have you found a solution?
Are you sure that the 1016s themselves are the cause of the loss, and not the intervening connectivity? In general, loss is associated either with a single core being maxed out or close to it (90% or higher), or with congestion.
 
Lupin
Member Candidate
Posts: 265
Joined: Mon Feb 16, 2009 10:22 pm
Location: Italy

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Sun Oct 08, 2017 11:13 pm

mducharme wrote:
> Lupin wrote:
> > Have you found a solution?
> Are you sure that the 1016s themselves are the cause of the loss, and not the intervening connectivity? In general, loss is associated either with a single core being maxed out or close to it (90% or higher), or with congestion.
See the video
https://youtu.be/Mrr6Ubh8hW4
 
plankanater
Member Candidate
Posts: 166
Joined: Wed Mar 14, 2012 3:56 am

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Mon Oct 09, 2017 3:57 am

What router model and which ports are you using? From your video it looked like a 1072. You are not using the Ethernet port, are you? If you are, see viewtopic.php?f=3&t=125361
 
Lupin
Member Candidate
Posts: 265
Joined: Mon Feb 16, 2009 10:22 pm
Location: Italy

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Mon Oct 09, 2017 8:48 am

plankanater wrote:
> What router model and which ports are you using? From your video it looked like a 1072. You are not using the Ethernet port, are you? If you are, see viewtopic.php?f=3&t=125361
On 1072 sfp-plus3 with 10Gbps Fiber
On 1016 sfpplus1 with 10Gbps Fiber
sfp module is:
https://mikrotik.com/product/Splus85DLC03D

I have no errors: FCS errors, alignment errors and every other error counter in the stats are all 0.

I don't understand why I'm losing packets :(
 
thobias
just joined
Posts: 22
Joined: Thu Nov 30, 2017 8:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Feb 16, 2018 2:48 pm

Hi,
We are also experiencing this with a couple of CCR1072s, with fastpath active on all interfaces and total CPU usage of around 30%, with no core going over 70% in the profiler.
We are seeing the same ping loss as the OP, or more. On another CCR1070 with less traffic and CPU usage under 5%, there is no ping loss.
 
pe1chl
Forum Guru
Posts: 5917
Joined: Mon Jun 08, 2015 12:09 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Fri Feb 16, 2018 3:16 pm

Again, you need to understand that it is a 72-core CPU and that the 5% load is expressed relative to all 72 cores being fully loaded, so one or two cores can easily be at 100% (when there is a single-threaded process) while the entire unit still shows a 5% load.
You need to investigate in more detail what the load is.
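The arithmetic behind this is simple if the overall figure is the average over all cores, which is the premise of this post (treated here as an assumption). A minimal Python sketch:

```python
def aggregate_for_busy_cores(busy_cores, cores):
    """Overall CPU % shown when `busy_cores` are at 100% and the rest are idle."""
    return 100.0 * busy_cores / cores


def cores_equivalent(aggregate_pct, cores):
    """How many fully loaded cores a given overall CPU % corresponds to."""
    return aggregate_pct / 100.0 * cores


print(f"{aggregate_for_busy_cores(1, 72):.2f}%")  # 1.39%
print(f"{cores_equivalent(5, 72):.1f} cores")     # 3.6 cores
```

On a 72-core unit a single saturated core shows up as under 1.4% overall, and 5% overall is enough room for more than three cores to be pinned at 100%.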
 
changeip
Forum Guru
Posts: 3803
Joined: Fri May 28, 2004 5:22 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Sat Feb 17, 2018 7:39 am

/ip settings
set icmp-rate-limit=0

then see if there is still packet loss...
 
thobias
just joined
Posts: 22
Joined: Thu Nov 30, 2017 8:45 pm

Re: CCR 0.3%+ packet loss whenever above 5% CPU

Mon Feb 19, 2018 10:24 am

changeip wrote:
> /ip settings
> set icmp-rate-limit=0
>
> then see if there is still packet loss...
No, unfortunately it's still the same after setting the rate limit to 0. Also, this would not explain why the loss adds up the more MikroTik devices I go through.
