millenium7
Member
Topic Author
Posts: 313
Joined: Wed Mar 16, 2016 6:12 am

CCR1036-8G-2S+ - SFP+ port stops transmitting data?

Wed Jun 12, 2019 12:02 pm

This has just happened out of the blue. All data is transmitted to/from one of these routers via the SFPPlus1 port (connected with a Direct Attach Cable to a MikroTik CRS328).
I went to the site and, before touching anything, logged into the router from a laptop over Ethernet. The port had entirely stopped transmitting data: it was receiving OK but sat at 0 packets/s outbound.
It's not the cable, because moving it to port 2 works. We also have a second backup router with exactly the same configuration and physical setup; I moved the cable from it to the first router and, again, the port receives packets but doesn't transmit.

After rebooting the router, the port worked fine for about 16 hours, then exactly the same thing happened (fortunately, this time I had written a script to check for it and reboot the router).
But this is a huge problem: this is a production router with a lot of traffic going through it. Obviously we'll RMA the device, but I want to know if this has been reported before, or if there's any kind of known issue.
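The check script itself isn't shown here. As a rough, hypothetical sketch of the logic such a check could use: flag the port only when the RX packet counter keeps rising while the TX counter stays flat for several consecutive polls, which matches the "receives but sits at 0 packets/s outbound" symptom. How the counters are collected from the router and what action follows are left out; the `Sample` type and `min_polls` threshold are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of an interface's cumulative packet counters (hypothetical source)."""
    rx_packets: int
    tx_packets: int


def tx_stalled(samples: list[Sample], min_polls: int = 3) -> bool:
    """True if RX increased but TX did not change across the last `min_polls` intervals."""
    if len(samples) < min_polls + 1:
        return False  # not enough history to decide yet
    window = samples[-(min_polls + 1):]
    for prev, cur in zip(window, window[1:]):
        if cur.rx_packets <= prev.rx_packets:
            return False  # RX is idle too, so this is not the "RX ok, TX dead" pattern
        if cur.tx_packets != prev.tx_packets:
            return False  # something was transmitted, so TX is alive
    return True
```

Requiring RX growth avoids treating a genuinely idle link as stalled, and requiring several polls avoids reacting to a single quiet interval.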

RouterOS version is 6.42.3 and has been stable for months.
The recent change prior to this happening was adding 2 more neighbors via OSPF (and only about another 100 routes).
The router runs BGP + OSPF + MPLS and has about 152,000 routes in the routing table, so it shouldn't be overwhelmed. It has 16 GB of memory (most of it free) and 800 MB+ of free disk space, so I doubt it's a resource issue. Even if it were, I wouldn't expect the SFP port to just entirely stop transmitting data; if anything, it would be the opposite if there were a routing loop or something going on.
Logs show nothing.
 
glueck05
just joined
Posts: 12
Joined: Fri Jan 26, 2018 12:49 pm

Wed Jun 12, 2019 2:14 pm

Hello,
I have also seen this issue often, on a CCR1036 on sfp-sfplus2 and on a CCR1072 (also connected via DAC). The port shows Running but does not send any data. When this happens, I disable and re-enable the port via Netwatch and it works again. A reboot is not necessary (in my case).

RouterOS 6.42.12.

regards
 
millenium7

Thu Jun 13, 2019 9:54 am

Cycling the interface isn't a solution, and for us it would still result in an extended outage, as this router handles PPPoE connections.

We have replaced one router with the new CCR1036 revision that has dual power supplies, and updated both to 6.44.3, including firmware.
I will report back if it continues to lock up. If it does, we'll be ripping the routers out and replacing them with CCR1016s, as we've had fewer issues with those.
 
millenium7

Thu Jun 13, 2019 1:33 pm

Nope, new hardware revision and 6.44.3, still the same problem.

So it's very likely some bug in the hardware or underlying OS that produces no logs and gives us no information. I can't see how you could possibly stop an SFP port from transmitting data through any scripting or configuration, aside from a bridge filter rule (which wouldn't fix itself after a reboot).

The recent changes were enabling OSPF (which was stable initially) and enabling MPLS to those peers. That inherently shouldn't cause the issue, but it may trigger some bug. I've reverted all changes for now, and the router should still reboot overnight if the problem is still there. Regardless, we'll be swapping them out for 1016s.
 
millenium7

Mon Jun 17, 2019 3:53 pm

Replaced with brand new CCR1016s and the same problem happens!
This is caused by either OSPF or MPLS in combination with what's already running (eBGP, iBGP, PPPoE, IPsec). When OSPF+MPLS are disabled, it's fine. When they are enabled, the network is perfectly stable and looks totally fine for a few hours, with literally no visible issues at all and definitely no routing loops or anything weird happening. Then, after a random interval of a few hours, the interface totally locks up in the transmit direction.

It's not a routing problem; I mean it totally locks up. Not even Layer 2 neighbor hello packets go out; it's dead. Disabling/enabling the interface doesn't seem to work, so we have to reboot the router.
The routers were all running 6.44.3 and had /system routerboard upgrade applied as well.

This is a big problem. Right now I'm thinking of throwing the CCRs in the bin and replacing them with Cisco. This has already cost way too much in downtime.
 
mducharme
Trainer
Posts: 1020
Joined: Tue Jul 19, 2016 6:45 pm

Tue Jun 18, 2019 2:04 am

We have this problem, but for us it happens every 30-90 days or so; it last happened 57 days ago. We have a ping watchdog to reboot the router when this happens. Disabling and re-enabling the interface might fix it too. Same CCR1036-8G-2S+, first generation. We have two CCRs connected to each other; one is a PPPoE concentrator and the other is not. The one that is not a PPPoE concentrator has no issues. Both run MPLS and OSPF.

We were soon going to be replacing the device with a CCR1072.
 
millenium7

Tue Jun 18, 2019 12:34 pm

For us this was happening every 1-12 hours, which was extremely disruptive to the network.

We have run this combination of technologies in various forms as we've changed the network layout over time. 1.5 years ago we had OSPF + MPLS + PPPoE + BGP running on CCR1036s at 3 different sites and it was working fine.
Now, if we try to do the same with everything running on the same router, the interface locks up. The only differences are that previously BGP was only receiving default routes, whereas now we get much larger BGP routing tables, more PPPoE connections and a larger OSPF network, and we are using SFP+ instead of Ethernet interfaces.

I don't know exactly what the problem is. If it's PPPoE in combination with everything else, then great, we can simply remove PPPoE from that router; that's the easiest thing to do.
My plan is to remove PPPoE from that router anyway and move it as close to every customer as possible, because that gives easier QoS, faster reconnection (as PPPoE won't drop) and shorter paths to the closest edge router when there's a routing failure. Hopefully that will fix the problem, but I'm not going to do anything except plan for a while. We already have some very angry customers who have had to put up with repeated disconnections for days.

I may end up just getting rid of the MikroTik routers at key locations and using something else instead. We're starting to get too many issues with MikroTik: good for distribution and customer equipment, not so good for the core.
 
millenium7

Mon Jun 24, 2019 8:16 am

I set up a lab using one of the existing routers, leaving the config exactly the same, and used other devices to simulate switches and other routers.
I set up BGP+OSPF+MPLS routers as well as I could, though obviously not as big as the actual network. I added 200 PPPoE sessions with a traffic generator across several routers to send traffic all over the place, set up a fake BGP router to inject global routing tables and blackhole all traffic to it, and simulated PPPoE connections flapping every few seconds. I also had a traffic generator send from the CCR to another CCR (in place of the CRS317 switch, as I don't have a spare lying around), running a total of 9.5 Gbit/s constantly through it over the same DAC cable.

It has been stable for days. It's definitely not an issue with how things are configured.
Maybe it's very specific to having a CRS317 on the other end of the CCR. Maybe it only happens when there's a certain number of active routes in the OSPF or MPLS table; who knows. But I can't replicate the issue, so I can't find a workaround. That's a big problem, as I don't want to try this on a production network again unless I can be sure it's going to be stable.
 
WirelessDSL
newbie
Posts: 34
Joined: Thu Nov 24, 2011 12:43 pm
Location: Germany

Fri Mar 20, 2020 8:32 pm

Did anyone find a solution for this problem?

We experience the same issue.
2x CCR1072 connected to each other (v6.46.4, also with updated firmware):
- 3m MikroTik S+DA0003 -> traffic suddenly stops (after anywhere between 5 min and approx. 16 h)
- MikroTik SFP+ S+31DLC10D with LC patch cable -> traffic suddenly stops (after anywhere between 5 min and approx. 16 h)
- Ports tested with and without Autonegotiation

The routers are configured with OSPF+MPLS, with around 130 OSPF routes.

A third router, a CCR1016-12S-1S+ (v6.46.4, also with updated firmware), is connected to one of the CCR1072s with a DAC (S+DA0003), also runs OSPF+MPLS, and works without issues.

I can't reproduce it. The traffic just suddenly stops: no Layer 2/3 connectivity, no neighbour discovery possible, no log entries.

Sometimes disabling and re-enabling the interface works, but sometimes I have to reboot one device to get it working again.
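Since cycling the port only sometimes helps, a watchdog could escalate: try the cheap fix first and reboot only if the port is still stalled. The sketch below is a hypothetical policy function only; the action names, the `max_cycles` limit, and how actions are executed on the router are all assumptions, not anything from RouterOS.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    CYCLE_INTERFACE = "cycle-interface"  # disable + re-enable the affected port
    REBOOT = "reboot"                    # last resort: reboot the whole router


def next_action(stalled: bool, cycles_tried: int, max_cycles: int = 1) -> Action:
    """Escalation policy: cycle the interface first, reboot only after
    `max_cycles` cycling attempts have failed to clear the stall."""
    if not stalled:
        return Action.NONE
    if cycles_tried < max_cycles:
        return Action.CYCLE_INTERFACE
    return Action.REBOOT
```

The caller is expected to reset `cycles_tried` to 0 once the port recovers, so the next stall starts again with the cheaper fix.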

Any ideas?
MTCNA MTCRE MTCTCE MTCUME MTCWE MTCINE
 
millenium7

Sat Mar 21, 2020 12:50 am

I found no solution, and given the number of outages and customer issues this caused, I'll never be trying it again.
We've had to keep those core routers entirely OSPF- and MPLS-free. As PPPoE is still terminated on those routers, this means we lose automatic failover if a major site goes down, and we have to manually move VPLS tunnels to another main site that links to the core.

This is not a great solution at all, but the network has been stable enough that it hasn't been a major problem. We get alerts immediately when a BGP session to those core routers goes down, so it only takes at most 5 minutes to move all the tunnels over

The plan is to eventually move 90% of our customers over to a DHCP Option 82 based system. The hurdle has been route injection to allow /32 addresses to be assigned to customers anywhere in the network without having to do it manually (yuck). I recently managed to write a script to handle that.
For the rest of the customers, no more VPLS tunnels: their PPPoE will terminate on the closest router, not at a centralized location.
This way the core only needs eBGP to the internet and iBGP to major distribution sites. All customer traffic will be regular IP traffic, which solves the failover problem of VPLS tunnels.
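The route-injection script isn't posted, so here is only a rough, hypothetical sketch of the core idea: for each DHCP lease, derive a /32 host route pointing at the router that relayed the request, so the customer address can live anywhere in the network. The `Lease` type and where the relay address comes from (DHCP relay data, Option 82 sub-options, or the server's lease table) are assumptions; pushing the routes into OSPF/iBGP is not shown. Addresses in the example are documentation ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    address: str  # customer IP handed out by DHCP
    relay: str    # hypothetical: address of the router that relayed the request


def host_routes(leases):
    """Map each leased address to a /32 route via its relay router.
    A later lease for the same address overrides an earlier one (customer moved)."""
    routes = {}
    for lease in leases:
        prefix = ipaddress.ip_network(f"{lease.address}/32")
        routes[prefix] = ipaddress.ip_address(lease.relay)
    return routes
```

For example, leases for 203.0.113.10 via 10.0.0.1 and later via 10.0.0.3 leave a single 203.0.113.10/32 route via 10.0.0.3.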
 
WirelessDSL

Sat Mar 21, 2020 3:18 pm

Thanks for the reply.

I opened a ticket with MikroTik referencing this thread and attached supout files.
Maybe they'll take some time to look into it.

With 1G connections/routers we never saw this problem. I think it's related to the CCR1072, maybe a firmware issue. With the CCR1016 it hasn't happened so far.
But with no logs, there is no way to debug it.

Your network concept with DHCP Option 82 sounds great, but I love the flexibility of MPLS, and 99% of the time it is stable enough.
I don´t want to think about changing the network concept ;)
 
millenium7

Sun Mar 22, 2020 12:42 am

The biggest benefit of DHCP, both for us and for customers, is that they can take any router straight out of the box, plug it in and bam, immediately have internet access, as almost all routers are configured for DHCP by default. They can factory reset it and it still works just fine. Because MikroTik routers don't have very good WiFi coverage, this makes life a whole lot easier: we can just tell them to go buy one with big antennas from a store and plug it in, problem solved. So far every router I've tried has had no problem with /32 assignments, meaning no wasted IPs.

The next biggest benefit is that the connection does not need to be 'established'. A brief outage or a change of data path on a PPPoE connection can mean that data stops flowing; the circuit has to time out and then reconnect before it can flow again. MikroTik is the fastest at this, but even so it's often a few seconds, which is long enough to effectively kill a VoIP session, and most other vendors are painfully slow, upwards of 30 seconds. And when the PPPoE connection drops, it flushes connections, which can result in website timeouts etc. The net experience for the customer is a bit worse.
With DHCP/straight IP, on the other hand, traffic is handled on a per-packet basis, with no need to re-establish a circuit. If the data path changes, even multiple times, it doesn't matter, as the packets will still get to their destination. And the recovery time from a link failure is practically the same as the time it takes for the link to come back up, with no additional waiting.

The final benefit is traffic engineering. You cannot separate traffic inside a PPPoE tunnel; it all flows exactly the same way. We have quite a few 24 GHz and 60 GHz links that go down in the rain. They are backed up by 5 GHz and the current failover is only ~200 ms, but now I can separate VoIP so it always uses the 5 GHz link when available, and I can reserve bandwidth for VoIP.
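The per-class path choice described above can be sketched as a preference list per traffic class: VoIP prefers the rain-tolerant 5 GHz link, while other traffic prefers the high-capacity links and falls back to 5 GHz. This is a hypothetical illustration only; the link names, the ordering of the 24 GHz and 60 GHz links, and the idea of a simple lookup table are assumptions, not the actual router configuration.

```python
from typing import Dict, Optional

# Hypothetical link names, listed in order of preference per traffic class.
PREFERENCE = {
    "voip": ["5ghz", "60ghz", "24ghz"],     # VoIP sticks to 5 GHz whenever it is up
    "default": ["60ghz", "24ghz", "5ghz"],  # bulk traffic prefers high-capacity links
}


def pick_link(traffic_class: str, links_up: Dict[str, bool]) -> Optional[str]:
    """Return the first link in the class's preference order that is up, else None.
    Unknown classes use the default order."""
    order = PREFERENCE.get(traffic_class, PREFERENCE["default"])
    for link in order:
        if links_up.get(link, False):
            return link
    return None
```

On a real router this would map onto policy routing with routing marks rather than a function call; the sketch only shows the decision rule.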

The last two can mostly be achieved by just moving the PPPoE session as close to the customer as possible, but it's still not as good.
 
WirelessDSL

Sun Mar 29, 2020 11:34 pm

Update:

I changed the ports on the CCR1072s.

1st Router
Before: SFPPlus 1-3 (with OSPF+MPLS)
After: SFPPlus 6-8 (with OSPF+MPLS)

2nd Router
Before: SFPPlus 2 and 3 (with OSPF+MPLS)
After: SFPPlus 7 and 8 (with OSPF+MPLS)

It has now been working for two days without any issues. I'll have to wait a few days to see how it holds up.

I found another thread which pointed me to this possible solution.
viewtopic.php?t=102946

Now it seems there are some issues with OSPF+MPLS on ports 1-4. Maybe a hardware issue.
 
glueck05

Mon Apr 06, 2020 9:59 am

@WirelessDSL: Has it been working stably for the last week?

thanks,
glueck
 
WirelessDSL

Mon Apr 06, 2020 10:56 am

So far, everything is fine.

Hope for the best.
 
millenium7

Wed Apr 08, 2020 10:23 am

This happened AGAIN in our network at a different location, but to Ethernet ports this time. So this bug seemingly doesn't care whether it's Ethernet or SFP modules.
This happened on a CCR1009-7G-1C-1S+.

That site has had issues with VPLS tunnels randomly dropping off over the past couple of months. I very thoroughly combed through the MPLS labels on every router in the path, checked OSPF, and checked everything I could. There's absolutely no issue; traceroutes show it's using MPLS just fine, yet the tunnels just will not come up. Reboot the router and they work... for a short while, then they start to drop off one by one for absolutely no reason again.

A couple of months pass and we reach today, when ether1 and ether3 just completely stop transmitting data, exactly the same situation as in the original post. They receive fine and can see neighbors, but can't MAC ping or anything. TX bytes remain at 0 forever.
Those ports were not bridged and had nothing in common. After rebooting the router, the ports work perfectly fine for 1-10 minutes, then suddenly just stop transmitting.

I've narrowed the issue down to either MPLS or LDP. But since you can't viably use MPLS without LDP, I just have to abandon MPLS entirely.
This is a HUUUUUUUUUUUGE problem, MikroTik, holy hell. I understand not having some enterprise features, but when you have a bug like this that cripples the network on a supposedly supported feature, and show absolutely zero response to it, it's just not acceptable (yep, we submitted multiple supouts and did all the troubleshooting ourselves).

Our immediate solution is to completely disable MPLS on that router and use EoIP tunnels to all the sites, and between all sites that transit through that router.
Our next move is to rip MPLS out of the entire network, as it's been slowly causing weird behavior like this for seemingly no reason. All PPPoE sessions will be terminated as close to the customer as possible to remove the need for VPLS tunnels.
Yet we have issues with data throughput on RB3011s (I've already made a topic about it). So our longer-term solution may be to ditch MikroTik entirely in our distribution and core network and go elsewhere. It seems that at a certain scale things just break, and there are too many little problems with huge consequences. When this happened last time it cost us several customers; it's just not worth it.
 
WirelessDSL

Wed Apr 08, 2020 11:10 am

I got an answer from MikroTik about the issue with traffic stopping on interfaces.

"It does not look like a hardware-related issue. Seems some similar issues have been reproduced in our labs, when suddenly Tx traffic stopped on a physical interface and it is related to L2MTU handling on the device, we will try to improve this in further RouterOS versions, but at moment we cannot say any ETA.
In our tests, it seems like work around helps if you simply increase the maximum L2MTU on some interface (it can be even an unused interface) and then restore it to a default value. For example, try to enter these commands:
/interface ethernet set sfp-sfpplus8 l2mtu=10222
/interface ethernet set sfp-sfpplus8 l2mtu=1580
It will create a short link down on all interfaces and after this procedure this issue should not appear more.
If you reboot or upgrade your router, then you should follow the same procedure again until we include a improvements in further RouterOS versions.
Please share your feedback if this stops the interface hang.
"

Maybe this could solve the issue temporarily.

I'll try that if it happens again.
 
millenium7

Thu Apr 09, 2020 2:32 am

That's not a great fix.
But would simply increasing the L2MTU and not restoring it back down help? There is no harm in setting L2MTU to the maximum. In fact, I don't know why it isn't set to the maximum by default (that goes for every single device on the market). Nothing will ever send larger L2 frames unless specifically told to, i.e. you start stacking on lots of VLAN tags, MPLS/VPLS, PPPoE, increasing the L3 MTU, using IP packing etc. Even L2 protocols like ARP use small packets; they don't suddenly become super large and pose an issue. Since oversized L2 frames just get silently dropped, with no mechanism to signal that back, all the protocols are built around assuming a certain universally accepted L2 MTU size anyway.
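To put numbers on "frames only get bigger when you stack headers", here is a rough sketch that adds up the usual per-header overheads to estimate the L2MTU a given encapsulation stack needs. Assumptions: L2MTU excludes the outer Ethernet header, as I understand MikroTik's MTU documentation to define it; the header sizes are the common ones (VLAN tag 4, MPLS label 4, PPPoE plus PPP protocol field 8, inner Ethernet header 14, pseudowire control word 4); inner VLAN tags on VPLS-carried frames are not counted. The function and parameter names are hypothetical.

```python
VLAN_TAG = 4      # 802.1Q tag
MPLS_LABEL = 4    # one MPLS label stack entry
PPPOE = 8         # 6-byte PPPoE header + 2-byte PPP protocol field
ETH_HEADER = 14   # inner Ethernet header of a frame carried over VPLS
CONTROL_WORD = 4  # optional pseudowire control word


def required_l2mtu(l3_mtu=1500, vlan_tags=0, mpls_labels=0,
                   pppoe=False, vpls=False, control_word=False):
    """Estimate the L2MTU (bytes, outer Ethernet header excluded) for a stack."""
    size = l3_mtu + VLAN_TAG * vlan_tags + MPLS_LABEL * mpls_labels
    if pppoe:
        size += PPPOE
    if vpls:
        size += ETH_HEADER  # the whole customer Ethernet frame rides inside
        if control_word:
            size += CONTROL_WORD
    return size
```

For example, a VPLS pseudowire with two labels and a control word carrying 1500-byte customer packets comes out at 1526, which fits inside the 1580 default mentioned in MikroTik's reply. This doesn't answer whether leaving L2MTU at the maximum is safe with respect to the bug; it only shows how much headroom typical stacks actually use.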
 
emmabnt03
just joined
Posts: 1
Joined: Mon Aug 10, 2020 6:27 am

Thu Aug 13, 2020 3:23 am

I have the same problem. It happened after an update. When the CCR1036-8G-2S+ was installed (as a replacement for an RB1100), it had this fault from the beginning. It was solved by reinstalling with Netinstall, and then it worked for 4 months. I will try this interim solution and report back on whether it fails. It has given me too many headaches.
