millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Wed Jun 12, 2019 12:02 pm

This has just happened out of the blue. All data is transmitted to/from one of these routers via the SFPPlus1 port (connected with a Direct Attach Cable to a MikroTik CRS328).
I went to site and logged into the router via ethernet from a laptop before touching anything, and found the port had entirely stopped transmitting data. It was receiving OK but sat at 0 packets/s outbound.
It's not the cable, because it works when moved to port 2. We also have a second backup router in place with exactly the same configuration and physical setup; I moved the cable from it to the first router's port 1 and again it receives packets but doesn't transmit.

After rebooting the router, the port functioned fine for about 16 hours, then exactly the same thing happened (fortunately this time I had written a script to check for it and reboot the router).
But this is a huge problem: this is a production router with a lot of traffic going through it. Obviously we'll RMA the device, but I want to know if this has been reported before and whether there's any kind of known issue.
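
For readers wanting to build a similar check, here is a minimal sketch of the detection logic in Python. It is not the script mentioned above, which was not posted. It assumes you already sample the port's cumulative rx/tx packet counters at a fixed interval (for example over SNMP or the RouterOS API); the names `CounterSample` and `tx_stalled` and the `min_intervals` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CounterSample:
    """One reading of the port's cumulative packet counters (hypothetical type)."""
    rx_packets: int
    tx_packets: int


def tx_stalled(samples: Sequence[CounterSample], min_intervals: int = 3) -> bool:
    """Return True if, over each of the last `min_intervals` sampling
    intervals, the rx counter advanced while the tx counter did not move.

    Requiring rx to advance avoids flagging a link that is simply idle.
    A counter that went backwards (e.g. after a reboot) is treated as
    "not stalled" so the check starts over with fresh samples.
    """
    if len(samples) < min_intervals + 1:
        return False  # not enough history yet
    recent = samples[-(min_intervals + 1):]
    for prev, cur in zip(recent, recent[1:]):
        rx_delta = cur.rx_packets - prev.rx_packets
        tx_delta = cur.tx_packets - prev.tx_packets
        if rx_delta <= 0 or tx_delta != 0:
            return False
    return True
```

Requiring several consecutive flat intervals is a design choice to avoid rebooting a production router on a single quiet sample; the right threshold depends on how often the counters are polled.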

RouterOS version is 6.42.3 and has been stable for months.
The recent change prior to this happening was adding 2 more neighbors via OSPF (and only about another 100 routes).
The router runs BGP + OSPF + MPLS and has about 152,000 routes in the routing table, so it shouldn't be overwhelmed. It has 16 GB of memory (most of it free) and 800 MB+ of free disk space, so I doubt it's a resource issue. Even if it was, I wouldn't expect the SFP port to just entirely stop transmitting data; if anything I'd expect the opposite if there was a routing loop or something going on.
Logs show nothing.
 
glueck05
just joined
Posts: 5
Joined: Fri Jan 26, 2018 12:49 pm

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Wed Jun 12, 2019 2:14 pm

Hello,
I have also seen this issue often, on a CCR1036 on sfp-sfpplus2 and on a CCR1072 (also connected via DAC). The port shows Running but does not send any data. When this happens I disable and enable the port via netwatch and it works again. A reboot is not necessary (in my case).

RouterOS 6.42.12.

Regards
 
millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Thu Jun 13, 2019 9:54 am

Cycling the interface isn't a solution; for us it would still result in an extended outage, as this router handles PPPoE connections.

We have replaced one router with the new CCR1036 revision that has dual power supplies, and updated both to 6.44.3 including firmware.
Will report back if it continues to lock up. If it does, we'll be ripping the routers out and replacing them with CCR1016s, as we've had fewer issues with those.
 
millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Thu Jun 13, 2019 1:33 pm

Nope, new hardware revision and 6.44.3, still the same problem.

So it's very likely some bug in the hardware or underlying OS that produces no logs and gives us no information. I can't see how you could stop an SFP port from transmitting data through scripting or configuration, whatever you tried, aside from a bridge filter rule (and that wouldn't fix itself after a reboot).

The recent changes were enabling OSPF (which was stable initially) and enabling MPLS to those peers. That in itself shouldn't cause the issue, but it may trigger some bug. I've reverted all changes for now; the router should still reboot overnight if the problem is still there. Regardless, we'll be swapping them out for CCR1016s.
 
millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Mon Jun 17, 2019 3:53 pm

Replaced with brand new CCR1016s and the same problem happens!
This is caused by either OSPF or MPLS in combination with what's already running (eBGP, iBGP, PPPoE, IPSec). When OSPF + MPLS are disabled it's fine. When they are enabled, the network is perfectly stable and looks totally fine for a few hours: literally no visible issues at all, definitely no routing loops or anything weird happening. Then, after a random interval of a few hours, the interface totally locks up in the transmit direction.

It's not a routing problem. I mean it totally locks up; not even Layer 2 neighbor hello packets go out. Dead. Disabling/enabling the interface doesn't seem to work, I have to reboot the router.
The routers were all running 6.44.3 and had /system routerboard upgrade applied as well.

This is a big problem. Right now I'm thinking of throwing the CCRs in the bin and replacing them with Cisco. This has already cost way too much in downtime.
 
mducharme
Trainer
Posts: 875
Joined: Tue Jul 19, 2016 6:45 pm

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Tue Jun 18, 2019 2:04 am

We have this problem, but for us it happens every 30-90 days or so. It last happened 57 days ago. We have a ping watchdog to reboot the router when this happens. Disabling and re-enabling the interface might fix it too. Same CCR1036-8G-2S+, first generation. We have two CCRs connected to each other; one is a PPPoE concentrator, the other is not. The one that is not a PPPoE concentrator has no issues. Both run MPLS and OSPF.

We were soon going to be replacing the device with a CCR1072.
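
The two recovery options mentioned in this thread (cycling the interface, then rebooting) can be combined into an escalating watchdog. Below is a minimal sketch of that decision logic in Python, under stated assumptions: it is not the watchdog described in this post, the class and parameter names are hypothetical, and it only decides which action to take. Actually pinging the far end and carrying out the action on the router are left to the caller. Note that at least one poster in this thread found that cycling the interface did not help, so the first step may be pointless on some setups.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    CYCLE_INTERFACE = "cycle-interface"
    REBOOT = "reboot"


class EscalatingWatchdog:
    """Decide what to do after each ping check (hypothetical helper).

    After `fail_threshold` consecutive failed pings, ask for an interface
    disable/enable. The next `grace_checks` results are ignored so the link
    has time to come back. If pings then fail `fail_threshold` more times
    in a row, ask for a reboot.
    """

    def __init__(self, fail_threshold: int = 5, grace_checks: int = 3) -> None:
        self.fail_threshold = fail_threshold
        self.grace_checks = grace_checks
        self.failures = 0
        self.cycled = False
        self.grace_left = 0

    def observe(self, ping_ok: bool) -> Action:
        if ping_ok:
            # Link is healthy: forget any earlier failures or escalation.
            self.failures = 0
            self.cycled = False
            self.grace_left = 0
            return Action.NONE
        if self.grace_left > 0:
            # Still waiting for the interface to settle after a cycle.
            self.grace_left -= 1
            return Action.NONE
        self.failures += 1
        if self.failures < self.fail_threshold:
            return Action.NONE
        self.failures = 0
        if not self.cycled:
            self.cycled = True
            self.grace_left = self.grace_checks
            return Action.CYCLE_INTERFACE
        self.cycled = False
        return Action.REBOOT
```

Trying the cheaper action first only makes sense where a brief interface flap is less disruptive than a reboot; on a PPPoE concentrator both will drop sessions.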
 
millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Tue Jun 18, 2019 12:34 pm

mducharme wrote:
"We have this problem, but for us it happens every 30-90 days or so. It last happened 57 days ago. We have a ping watchdog to reboot the router when this happens. Disabling and re-enabling the interface might fix it too. Same CCR1036-8G-2S+, first generation. We have two CCR's connected to each other, one is PPPoE concentrator, the other not. The one that is not a PPPoE concentrator has no issues. Both run MPLS and OSPF.

We were soon going to be replacing the device with a CCR1072."

This was happening to us every 1-12 hours, which is extremely disruptive to the network.

We have had this combination of technologies in various forms as we've changed the network layout over time. 1.5 years ago we did have OSPF + MPLS + PPPoE + BGP running on CCR1036s at 3 different sites and it was working fine.
Now, if we try to do the same with everything running on the same router, the interface will lock up. The only differences are that previously BGP was only receiving default routes, whereas now we get much larger BGP routing tables, have more PPPoE connections and a larger OSPF network, and are using SFP+ instead of ethernet interfaces.

I don't know exactly what the problem is. If it's PPPoE in combination with everything else, then great, we can simply remove PPPoE from it; that's the easiest thing to do.
My plan is to remove PPPoE from that router anyway and bring it as close to every customer as possible, because that provides easier QoS, faster reconnection (as PPPoE won't drop) and shorter paths to the closest edge router when there's a routing failure. Hopefully that will fix the problem, but I'm not going to do anything for a while except plan. We already have some very angry customers who have had to put up with continued disconnections for days.

I may end up just getting rid of the MikroTik routers at key locations and using something else instead. We're starting to get too many issues with MikroTik. Good for distribution and customer equipment, not so good for the core.
 
millenium7
Member Candidate
Topic Author
Posts: 208
Joined: Wed Mar 16, 2016 6:12 am

Re: CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Mon Jun 24, 2019 8:16 am

I set up a lab using one of the existing routers, leaving its config exactly the same, and used other devices to simulate the switches and other routers.
I set up the BGP + OSPF + MPLS routers as well as I could, but obviously not as big as the actual network. I added 200 PPPoE sessions, with traffic generators across several routers just sending traffic all over the place. I set up a fake BGP router to inject global routing tables and blackhole all traffic to it, and simulated flapping PPPoE connections every few seconds. I also had a traffic generator send from the CCR to another CCR (in place of the CRS317 switch, as I don't have a spare lying around) and ran a total of 9.5 Gbit/s constantly through it, using the same DAC cable.

It has been stable for days. It's definitely not an issue with how things are configured.
Maybe it's very specific to having a CRS317 on the other end of the CCR. Maybe it only happens when there's a certain number of active routes in the OSPF or MPLS table, who knows. But I can't replicate the issue, so I can't find a workaround. That's a big problem, as I don't want to try this on a production network again unless I can be sure it's going to be stable.
