sebastian
just joined
Topic Author
Posts: 6
Joined: Wed Aug 08, 2018 8:16 am

Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Mon Aug 13, 2018 10:32 am

Hello.
We have discovered an issue with latency peaks on our CCR1016-12S-1S+.

Initial remarks:
  • In our company we use Zabbix (v3.4.11) to monitor our devices. The Zabbix server is directly connected to the CCR1016-12S-1S+.
  • The CCR runs RouterOS 6.42.1 (Current).
  • The CCR has SFP modules manufactured by MikroTik.
There are some strange high latency peaks, which you can see in the screenshot from Zabbix attached as connected_data_ccr12S.png. Check attachment for better image quality.
connected_data_ccr12S-small.jpg
All three graphs cover the same period of time. If we compare the latency graph (top) with the CPU graph (middle) and the interface traffic graph (bottom), we can see that even when CPU usage and traffic are low, there are still high latency peaks. Those peaks are about 10 to 80 ms.
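The comparison above can be sketched in a few lines of Python: flag every sample where latency is high although CPU load and traffic are low. This is only a rough illustration. The `Sample` type, the `unexplained_peaks` helper, the thresholds and the sample values are all hypothetical placeholders, not data exported from our Zabbix.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    cpu_pct: float
    traffic_mbps: float


def unexplained_peaks(samples, latency_ms=10.0, cpu_pct=20.0, traffic_mbps=50.0):
    """Return samples whose latency is high although CPU and traffic are low.

    All three thresholds are assumptions chosen for illustration only.
    """
    return [
        s for s in samples
        if s.latency_ms >= latency_ms
        and s.cpu_pct < cpu_pct
        and s.traffic_mbps < traffic_mbps
    ]


# Synthetic placeholder values, not measurements from the router.
samples = [
    Sample(0.4, 5.0, 12.0),     # normal
    Sample(62.0, 6.0, 10.0),    # high latency, idle router
    Sample(35.0, 85.0, 400.0),  # high latency, but explained by load
    Sample(28.0, 4.0, 8.0),     # high latency, idle router
]
peaks = unexplained_peaks(samples)
print([p.latency_ms for p in peaks])
```

Only the peaks that load cannot explain are returned, which is the kind of peak we are describing here.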
To rule out our production CCR's configuration as the cause, we ran some tests with a simple configuration (25 IP Firewall filter rules and 25 Mangle rules). For the tests we used the MT traffic generator.
We also ran the same tests on a CCR1016-12G. We applied exactly the same configuration to a clean router (no defaults).
An image with the latency distribution from the traffic generator is attached as traffic-generator-distribution.png. It shows the latency distribution for both routers (CCR1016-12S-1S+ and CCR1016-12G) and for different packet sizes. Check the attachment for better image quality.
traffic-generator-distribution-small.jpg
We also checked the following, and the problem still persists:
  • We replaced the SFP and SFP RJ45 modules (S-31DLC20D, S-RJ01) in the CCR1016-12S-1S+ with DAC SFP cables.
  • We tried all types of interface queues: default-small, default, etc.
Our observations:
  • The latency peaks on the CCR1016-12S-1S+ always appear in the same latency ranges: 60ms+, 28ms+, etc.
  • The latency peaks don't occur on the CCR1016-12G, although the boards have the same architecture, CPUs and RAM.
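One way to check whether peaks really cluster in fixed ranges is to count samples per latency bucket. Below is a minimal Python sketch of that idea. The `latency_buckets` helper, the 4 ms bucket width and the sample values are assumptions made for illustration; this is not output from the traffic generator.

```python
from collections import Counter


def latency_buckets(latencies_ms, width_ms=4.0):
    """Count samples per bucket; each key is the bucket's lower edge in ms."""
    counts = Counter(int(v // width_ms) * width_ms for v in latencies_ms)
    return dict(sorted(counts.items()))


# Synthetic placeholder values, not measurements from the router.
latencies = [0.3, 0.4, 28.1, 28.9, 29.5, 60.2, 61.0, 0.5]
buckets = latency_buckets(latencies)
print(buckets)
```

If the peaks keep landing in the same few buckets across runs, that points to something systematic rather than random load.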
We tried replacing the suspected CCR1016-12S-1S+ with another CCR1016-12S-1S+ unit. The results are exactly the same, from which I conclude the issue is not specific to a particular unit.

Anyone else facing the same problem with CCR1016-12S-1S+?
 
djdrastic
Member
Posts: 367
Joined: Wed Aug 01, 2012 2:14 pm

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Tue Aug 14, 2018 12:19 am

I recently rolled out one site and noticed the same pattern. I initially thought it was a bonding issue, as we were running an A/P bond to the other side. We booted up the cold spare unit the customer had bought and were still seeing the same weird latency. Both ends were 2x CCR1016-12S-1S+ with MikroTik BiDi and SFP RJ45 modules on FW version 6.42.5. I tried all variations of Bridge HW Offloading ON/OFF as well as FastPath ON/OFF in ip/settings with corresponding firewall rules.

Good to see I wasn't going crazy, because I initially thought the customer was experiencing a DDoS, but traffic analysis showed less than a couple of Mbit/s of traffic passing through.
 
joegoldman
Forum Veteran
Posts: 767
Joined: Mon May 27, 2013 2:05 am

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Tue Aug 14, 2018 12:49 am

I had a somewhat _similar_ problem on my CCR1036s a while back. It presented a little bit differently, but ultimately it was just a high usage spike for a few seconds that then settled down.

What it ended up being on my side was my 'BGP Nail' routes, i.e. so I can advertise my /24s out to the world, I'd put a high-metric route in the route table so BGP sees an active route to push out to transit and peering. I had these as active routes to a loopback interface (a bridge interface with no members). I did this because it is similar to how I was taught in the Cisco world.

After having this issue for a while, I changed the routes to blackhole routes, as they still remain active in the route table, with a high metric so that smaller routes and any real /24 routes take precedence, and this stopped my issue completely. Basically, I'd say that as big netscans came through and hit my unused IPs, the rush of traffic into the bridge interface caused the router to have a spaz.

So if you use this router with a lot of public IP space and you are routing a lot of it into a real interface as a nail route, try blackholing them instead.

If you are unsure about what I'm explaining, then please do an /export hide-sensitive and paste the output into the forum so we can have a look and help determine the problem.
 
djdrastic
Member
Posts: 367
Joined: Wed Aug 01, 2012 2:14 pm

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Tue Aug 14, 2018 11:28 am

Thanks joe, I'll review the config again, though this site is very, very basic. Just a simple bond interface between 2 dark fibers and iBGP on either end distributing some private BGP prefixes.


Sebastian, just as a point of interest: something I've seen in my situation is that latency to the routerboard/loopback IP suffers this fate, but traffic going through the router does not get affected, or at least that's what I've seen in my testing. Not sure if you've seen the same thing.
 
Steveocee
Forum Guru
Posts: 1120
Joined: Tue Jul 21, 2015 10:09 pm
Location: UK

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Wed Aug 15, 2018 11:02 pm

An older issue with lower end models was that having the LCD screen active would cause latency spikes. Don’t know if you have LCD active but may be worth a try turning it off if it is on?
 
sebastian
just joined
Topic Author
Posts: 6
Joined: Wed Aug 08, 2018 8:16 am

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Thu Aug 16, 2018 3:15 pm

@Steveocee
"An older issue with lower end models was that having the LCD screen active would cause latency spikes. Don’t know if you have LCD active but may be worth a try turning it off if it is on?"
We have the LCD screen turned off on both CCRs.

@djdrastic
"Sebastian, just as a point of interest: something I've seen in my situation is that latency to the routerboard/loopback IP suffers this fate, but traffic going through the router does not get affected, or at least that's what I've seen in my testing. Not sure if you've seen the same thing."
The problem applies to traffic passing through the router. As far as I know, the latency distribution shows response times only for packets passing through the router.
The connection diagram for the traffic generator and the device under test (CCR) is below.
schemat_traffic-new.jpg
@joegoldman
"So if you use this router with a lot of public IP space and you are routing a lot of it into a real interface as a nail route, try blackholing them instead.
If you are unsure about what I'm explaining, then please do an /export hide-sensitive and paste the output into the forum so we can have a look and help determine the problem."
As I wrote in the first post: "To rule out our production CCR's configuration as the cause, we ran some tests with a simple configuration (25 IP Firewall filter rules and 25 Mangle rules). For the tests we used the MT traffic generator."
In the test configuration we used very simple mangle and filter rules, without VLANs or bridging, and there are only a few routing paths.
You can see the config in the attachment DUT_12S-config.txt.
 
sebastian
just joined
Topic Author
Posts: 6
Joined: Wed Aug 08, 2018 8:16 am

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Tue Dec 04, 2018 2:33 pm

Hello again.
I am back with new information about the CCR1016-12S-1S+. Our suspicions have been confirmed.
All tests were done by switching the unit under test in as the main production router.
The tests we have done:
  • At the start we ran some tests by switching between our two CCR1016-12S-1S+ units; the problem persists on both local CCRs.
  • Some time ago we brought in a third CCR1016-12S-1S+ from another location for testing, and the problem is present on this device as well.
  • We borrowed a brand new CCR1016-12S-1S+ from our supplier and used it as the main production router. The problem persists on this fourth device too.
Are all four CCR1016-12S-1S+ units we used somehow damaged, as MikroTik support suggested in our email exchange? In our opinion that is impossible.

Final test
The traffic generator tests we described in the first post show that the CCR1016-12G does not have the problem that the CCR1016-12S-1S+ has.
We decided to swap the CCR1016-12S-1S+ for a CCR1016-12G as the production router, but we had to use media converters for the fiber connections.
Just check the image from our Zabbix; it speaks for itself.
The only difference is that we use a CCR1016-12G instead of the CCR1016-12S-1S+.
12G-vs-12S — kopia.png

Is @Support planning to fix this issue in the CCR1016-12S-1S+ with a software patch?
 
Maggiore81
Trainer
Posts: 564
Joined: Sun Apr 15, 2012 12:10 pm
Location: Italy

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Fri Dec 07, 2018 3:38 pm

Did you contact support?
 
sebastian
just joined
Topic Author
Posts: 6
Joined: Wed Aug 08, 2018 8:16 am

Re: Random latency peaks: CCR1016-12S-1S+ hardware design issue suspected!

Mon Dec 10, 2018 11:46 am

Of course I did. Support doesn't believe this is a common problem. Our suspicions were confirmed after testing the brand new CCR1016-12S-1S+ and again when we swapped the CCR1016-12S-1S+ for the CCR1016-12G.
