CCR1009: ether1 through ether4 sporadically drop then instantly come back up - switch chip problem?

We deployed our first CCR1009 at a remote site, and it looks like a wonderful middle-ground unit for routing: somewhere between trying to software-bridge on a CRS125 and spending a lot on other CCR models that are overkill.

However, ever since we deployed this CCR1009, I’ve been noticing this every few hours:

apr/22 17:23:35 interface,info ether1 link down
apr/22 17:23:35 interface,info ether2 link down
apr/22 17:23:35 interface,info ether3 link down
apr/22 17:23:35 interface,info ether4 link down
apr/22 17:23:35 route,ospf,info OSPFv2 neighbor X.X.128.33: state change from Full to Down
apr/22 17:23:35 route,ospf,info OSPFv2 neighbor X.X.128.47: state change from Full to Down
apr/22 17:23:37 interface,info ether2 link up (speed 100M, full duplex)
apr/22 17:23:37 interface,info ether3 link up (speed 100M, full duplex)
apr/22 17:23:38 interface,info ether1 link up (speed 1G, full duplex)
apr/22 17:23:38 interface,info ether4 link up (speed 1G, full duplex)
apr/22 17:24:28 route,ospf,info Database Description packet has different master status flag
apr/22 17:24:28 route,ospf,info     new master flag=false
apr/22 17:24:28 route,ospf,info OSPFv2 neighbor X.X.128.33: state change from Full to 2-Way
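To check whether every event is the whole switch-chip group going down at once, rather than individual ports, one could group the "link down" lines by timestamp. This is a minimal Python sketch. It assumes the log was exported as plain text in the format above and that ether1-ether4 are the switch-chip ports. The names downs_by_timestamp and SWITCH_CHIP_PORTS are hypothetical and are not part of RouterOS.

```python
import re
from collections import defaultdict

# Matches RouterOS log lines such as
#   "apr/22 17:23:35 interface,info ether1 link down"
# The date prefix is optional because some exports show only the time.
LINK_RE = re.compile(
    r"^(?:(?P<date>\S+/\d+)\s+)?(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"interface,info\s+(?P<port>\S+)\s+link\s+(?P<state>up|down)"
)

# Assumption: on the CCR1009, ether1-ether4 are the switch-chip ports.
SWITCH_CHIP_PORTS = {"ether1", "ether2", "ether3", "ether4"}


def downs_by_timestamp(lines):
    """Group 'link down' events that share a timestamp: {stamp: set of ports}."""
    groups = defaultdict(set)
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m and m.group("state") == "down":
            # Join date and time, skipping the date if the line has none
            stamp = " ".join(filter(None, (m.group("date"), m.group("time"))))
            groups[stamp].add(m.group("port"))
    return dict(groups)


sample = [
    "apr/22 17:23:35 interface,info ether1 link down",
    "apr/22 17:23:35 interface,info ether2 link down",
    "apr/22 17:23:35 interface,info ether3 link down",
    "apr/22 17:23:35 interface,info ether4 link down",
    "apr/22 17:23:37 interface,info ether2 link up (speed 100M, full duplex)",
]
for stamp, ports in downs_by_timestamp(sample).items():
    # True when the whole assumed switch-chip group dropped in the same second
    print(stamp, sorted(ports), ports >= SWITCH_CHIP_PORTS)
```

If every timestamp shows the full group, that supports the switch-chip theory. A single port dropping on its own would point more towards cabling or the far end.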

It doesn’t happen at set times of day; it seems pretty sporadic. However, ether1 (X.X.128.33) and ether4 (X.X.128.47) are the unit’s two routes out, so I believe this is causing dropped packets.

It happens on ether1-ether4, which correspond to the switch chip ports, even though we’re not doing any switching on this unit at all. I’ve combed over the unit’s full export, and nothing is really different from the previous incarnation of this unit, a CRS125 (which had CPU issues).

When pinging from one hop away (out ether4), this is what I see when it happens. X.X.136.106 is behind ether2 on the device above, and X.X.65.17 is upstream, bridging between ether1 and ether4 on the device above via OSPF:

  SEQ HOST           SIZE TTL TIME  STATUS
  727 X.X.136.106      56  63 4ms
  728 X.X.136.106      56  63 1ms
  729 X.X.136.106                   timeout
  730 X.X.136.106                   timeout
  731 X.X.136.106                   timeout
  732 X.X.136.106                   timeout
  733 X.X.136.106      56  63 1ms
  734 X.X.128.162      84  64 0ms   redirect host
  735 X.X.136.106                   timeout
  736 X.X.65.17        84  64 134ms TTL exceeded
  737 X.X.65.17        84  64 134ms TTL exceeded
  738 X.X.65.17        84  64 118ms TTL exceeded
  739 X.X.65.17        84  64 130ms TTL exceeded
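A probe only really succeeded when X.X.136.106 itself answered. The "redirect host" and "TTL exceeded" rows are failures too. TTL exceeded from X.X.65.17 usually points to a transient routing loop while OSPF reconverges. The following rough Python sketch tallies the outcomes and the longest run of failed probes. It assumes the SEQ/HOST/SIZE/TTL/TIME/STATUS column layout shown above. The helper summarize_ping is hypothetical.

```python
from collections import Counter


def summarize_ping(lines, target):
    """Classify each probe row of RouterOS-style ping output.

    A probe counts as answered only if `target` itself replied with no
    status text; timeouts, redirects and TTL-exceeded errors all count
    as failures. Returns (Counter of outcomes, longest run of failures).
    """
    outcomes = Counter()
    run = longest = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        host = fields[1]
        # Timeout rows have no SIZE/TTL/TIME; otherwise status follows TIME
        status = "timeout" if fields[2] == "timeout" else " ".join(fields[5:])
        if host == target and not status:
            outcomes["reply"] += 1
            run = 0
        else:
            outcomes[status] += 1
            run += 1
            longest = max(longest, run)
    return outcomes, longest
```

Fed the 13 rows above with target X.X.136.106, it counts 3 replies, 5 timeouts, 1 redirect and 4 TTL exceeded. The longest run is 6 failed probes (seq 734-739).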

Thoughts? Why would ether1-ether4 just be dropping?

Last night I upgraded RouterOS from 6.27 to 6.28 and also made sure that the RouterBOARD firmware was up to date (v3.22).

Everything is up to date on this thing, yet ether1-ether4 keep randomly dropping for 2 seconds every few hours.

We’re most likely going to swap this unit out for the spare CCR1036 we keep on the shelf, but I really want to be able to trust this model. Its price point (plus the switch chip and SFP+ port) is perfect for places where a CRS125 doing software-bridge routing instead of plain switching doesn’t cut it. The CCR1036, however, is massive overkill: it never goes above 0% CPU at the 5-6 sites where we run it. It costs too much for the number of places we need to cover where a CRS125 won’t do, and the lack of SFP+ really hurts it.

Hi, I’m having the exact same problem. I have a CCR1009 on RouterOS 6.27, and ether1 to ether4 go down and come back up (within a few seconds) on an irregular basis.

Have you found any solution?

Hi, I’m having the same issue! Have you had any solution?

Have you all sent a supout.rif to support@mikrotik.com?

We had a similar problem with a CRS226 running v6.27. All 24 ports would flap for a few seconds (down/up/down/up/down/up) at random intervals of 8-36 hours. We tried v6.28, but other bugs caused even bigger problems, so we downgraded to v6.27. MikroTik suggested we upgrade to 6.29, so we’re now running v6.29. It’s only been 10 hours so far… but it seems OK?

We have two CCR1009 boxes, though, and they have been just fine on v6.27 for several weeks. No flapping at all.

Alas, yesterday our CRS226, running v6.29, flapped all its switch ports 5 times.

We noticed the same on the CRS125 as well as the CRS226 with 6.28.

After upgrading to 6.29 yesterday, this seems to have gone away (at least I hope so).

No joy with v6.29 for us on the CRS226. Trying v6.29.1 now…

Hey,

we are seeing the same or similar issues on a CCR1009-8G-1S-1S+ running 6.28.

routerboard: yes
model: CCR1009-8G-1S-1S+
serial-number: 4AB2047433A3
current-firmware: 3.13
upgrade-firmware: 3.22


08:44:54 interface,info ether4 link down 
08:44:54 interface,info ether1 link down 
08:44:54 interface,info ether2 link down 
08:44:54 interface,info ether3 link down 
08:44:54 pppoe,ppp,info AAISP: terminating... - disconnected 
08:44:54 pppoe,ppp,info AAISP: disconnected 
08:44:54 pppoe,ppp,info AAISP: initializing... 
08:44:54 pppoe,ppp,info AAISP: waiting for packets... 
08:44:54 pppoe,ppp,info AAISP: terminating... 
08:44:54 pppoe,ppp,info AAISP: disconnected 
08:44:54 pppoe,ppp,info AAISP: initializing... 
08:44:54 pppoe,ppp,info AAISP: waiting for packets... 
08:44:54 route,ospf,info OSPFv2 neighbor 134.0.17.140: state change from Full to Down 
08:44:54 route,ospf,info OSPFv3 neighbor 134.0.17.140: state change from Full to Down 
08:45:02 interface,info ether4 link up (speed 1G, full duplex) 
08:45:02 interface,info ether1 link up (speed 100M, full duplex) 
08:45:02 interface,info ether2 link up (speed 100M, full duplex) 
08:45:02 interface,info ether3 link up (speed 1G, full duplex)
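This outage lasts about 8 seconds (08:44:54 to 08:45:02), compared with 2-3 seconds in the original post's log. The following small Python sketch measures per-port outage length from link down/up pairs. It assumes time-only timestamps as in this export and assumes no outage spans midnight. The helper outage_seconds is hypothetical.

```python
import re
from datetime import datetime

# Matches the time, port and state in lines such as
#   "08:44:54 interface,info ether4 link down"
LINK_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+interface,info\s+"
    r"(?P<port>\S+)\s+link\s+(?P<state>up|down)"
)


def outage_seconds(lines):
    """Return {port: [outage durations in seconds]} from link down/up pairs.

    Assumes every outage starts and ends on the same day (no midnight crossing).
    """
    down_at = {}
    outages = {}
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # e.g. pppoe or ospf lines
        t = datetime.strptime(m.group("time"), "%H:%M:%S")
        port = m.group("port")
        if m.group("state") == "down":
            down_at[port] = t
        elif port in down_at:
            # Pair this 'link up' with the most recent 'link down' on the port
            gap = (t - down_at.pop(port)).total_seconds()
            outages.setdefault(port, []).append(gap)
    return outages
```

Fed the interface lines above, it reports 8.0 seconds for each of ether1-ether4.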

Anyone any joy with this problem?

James

Make sure the firmware is updated.

That’s not helping, pukkita. 6.27, 6.28, 6.29, 6.29.1, and 6.30rc22 all seem to have the same problem.

Those are RouterOS versions, not firmware versions. I mean the firmware version in /System > Routerboard.

Sorry, we update the firmware at the same time as updating the software.

All are on 3.22, which is the most up-to-date version for the CRS226.

I don’t know about the CRS226, but on CCR1009s (-1S-1S+ and -1S-1S+PC) with firmware 3.22, running versions 6.27 and 6.19, I’m not seeing any port flapping on the switch chip ports like the original poster.

A general preventive measure for routers behaving weirdly, especially due to previous bugs, is this: after the firmware/RouterOS has been updated, export the configuration to an .rsc file, reset to no defaults, then import the .rsc file back so that everything gets initialized correctly.

The only thing I notice with versions >= 6.27 on CCRs is SFP “micro cuts” with nothing showing in the logs. The port just stops passing traffic, and it doesn’t seem to be related to the SFP module brand or model. It’s almost unnoticeable unless you’re monitoring it. It doesn’t happen with 6.19.
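Because these micro cuts leave nothing in the log, one way to catch them is to sample an interface's byte counter at a fixed interval and look for runs where it stops increasing. This is a minimal Python sketch. It assumes you already collect the cumulative counter values somehow, and it leaves the polling method open. The names find_stalls and min_len are hypothetical. The check is only meaningful on a link that normally always carries some traffic.

```python
def find_stalls(samples, min_len=2):
    """Return (start_index, end_index) spans where a cumulative counter did
    not change for at least `min_len` consecutive sampling intervals.

    `samples` are cumulative byte counters taken at a fixed interval
    (hypothetical input). A decrease is treated as a counter wrap/reset,
    i.e. traffic, not as a stall.
    """
    stalls = []
    start = None  # index of the last sample before the counter stopped moving
    for i in range(1, len(samples)):
        if samples[i] == samples[i - 1]:
            if start is None:
                start = i - 1
        else:
            if start is not None and (i - 1) - start >= min_len:
                stalls.append((start, i - 1))
            start = None
    # Handle a stall that is still in progress at the end of the data
    last = len(samples) - 1
    if start is not None and last - start >= min_len:
        stalls.append((start, last))
    return stalls
```

For example, find_stalls([100, 200, 200, 200, 300]) returns [(1, 3)]: the counter sat still for two intervals.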

[quote=“pukkita”]A general preventive measure with routers behaving weird, specially due to previous bugs, is a reset to no defaults (exporting the configuration to a rsc file before), then reload the configuration to make sure all gets initialized correctly.
[/quote]

We have done that. We have also replaced it with a completely different, brand-new CRS226 with a different PSU. Same problem: after about 5 days, all ports flap (sometimes once, sometimes several times in a day). After approximately another 5 days, either all ports flap some more or the kernel crashes.

I haven’t used the CRS226 yet, but from your comments it doesn’t look stable enough to put into production.

No problems with CRS125 and CRS109 so far (6.27 and 3.22).

[quote]we are seeing the same / similar issues on CCR1009-8G-1S-1S+ running 6.28.

routerboard: yes
model: CCR1009-8G-1S-1S+
serial-number: 4AB2047433A3
current-firmware: 3.13
upgrade-firmware: 3.22
[/quote]

@Jamesrossell have you upgraded that CCR1009’s firmware?

Very few people seem to be reporting this problem. Also, it is strange that the person who started this thread has the same problem with the CCR1009. We also have two CCR1009s, but we are not using the switch features (all ports act as separate ports), and they have been fine. Well, one crashed last night at 23:59:60 because of a leap-second bug.

[quote=“pukkita”]No problems with CRS125 and CRS109 so far.
[/quote]

Our two CRS125s are OK. Our two CCR1009-1S+ are OK. Only this CRS226 is a problem. However, judging by the original poster of this thread, maybe the problem is not the CRS but the software. :frowning:

maznu, all CCRs and CRSs (and some RBs) had port-flapping problems at the beginning, just after they launched.

It seems history is repeating with the CRS226; sending a supout.rif and opening a ticket is the course of action to take.

As it happens with all versions, I’d do it with the 6.30rc, since they’re already focused on releasing that version.

[quote=“pukkita”]Seems history is repeating with CRS226, sending supout.rif and opening a ticket is the course of action to take.
[/quote]

We opened the ticket many weeks ago now. Each time it happens, we tell MikroTik and send a supout, and they say nothing. It happens again, they tell us to install the next version… it happens again… and here we are, now testing 6.30rc22… and other threads discussing the same problem suggest that it is not fixed in 6.30 yet. But maybe the supout files will help this time… maybe… I hope…!