CCR1072-1G-8S+ BGP Locking Up

I have 3 CCR1072-1G-8S+ located in 3 different cores all running BGP to providers and running BGP between them. Over the last month all 3 have locked up at different intervals. Traffic passes through the router but all BGP stops working.

Each one has to be logged into manually and rebooted. I was running 6.38.5 on one and 6.38.9 on another.

Upgraded to 6.40 and have not had them lock up yet but it has only been a few weeks.

Anyone else having issues with the CCR1072?

It is actually locking up both BGP and OSPF. Almost like it is dropping all forwarding.

Yes, that was happening to me as well. I upgraded to 6.40.3 and it really didn't help. I switched most of my OSPF peers to point-to-point and that seems to have helped some.

Mine look like total lockups. I am running 6.40.1 and it has not locked up yet. Since these are core routers, it's hard to justify rebooting them every 2 weeks for firmware.

Hi,

Do you see in the logs the message about the peering being dropped due to “hold timer expiration” (or something like that) by any chance?

As far as I know, BGP and OSPF run in the same routing process, so if it locks up, both go down. This happened to me on a CCR1036 (there are some posts around talking about it). It happened 3 times, then didn’t happen again. I have yet to find the root cause, but I suspect a flapping peer sent BGP nuts, since the routing process only uses a single CPU core for everything and the Tilera cores aren’t powerful enough.

So, it has been running smoothly for quite a while on 6.40.4. Then today I went in and removed a simple queue that had been put in place for a DDoS attack. As soon as I removed the queue I was kicked out of WinBox, all OSPF neighbors dropped, and the router did not renegotiate OSPF until it was rebooted.

Jan/02/2018 15:11:47 system,info,account user admin logged in from X.X.X.X via winbox
<------------This is when i removed the queue even though it is logged lower down.
Jan/02/2018 15:12:37 system,info,account user admin logged out from X.X.X.X via winbox
Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.125.133: state change from Full to Down
Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.113.6: state change from Full to Down
Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.125.138: state change from Full to Down
Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.31.253: state change from Full to Down
Jan/02/2018 15:12:54 system,info,account user admin logged in from X.X.X.X via winbox
Jan/02/2018 15:12:56 system,info,account user admin logged in from X.X.X.X via winbox
Jan/02/2018 15:13:14 system,info,account user admin logged in from X.X.X.X via dude
Jan/02/2018 15:13:15 system,info,account user admin logged out from X.X.X.X via dude
Jan/02/2018 15:13:15 route,ospf,info OSPFv2 neighbor X.X.113.5: state change from Full to Down
Jan/02/2018 15:13:37 system,info,account user brescoadmin logged out from X.X.X.X via winbox
Jan/02/2018 15:13:39 system,info,account user brescoadmin logged out from X.X.X.X via winbox
Jan/02/2018 15:13:58 system,info,account user admin logged in from X.X.X.X via dude
Jan/02/2018 15:14:27 route,ospf,info OSPFv2 neighbor X.X.125.129: state change from Full to Down
Jan/02/2018 15:14:27 route,ospf,info OSPFv2 neighbor X.X.113.5: state change from ExStart to Down
Jan/02/2018 15:14:30 system,info,account user admin logged out from X.X.X.X via dude
Jan/02/2018 15:14:37 system,info simple queue removed by brescoadmin
Jan/02/2018 15:14:37 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:37 route,ospf,info RouterId=X.X.125.129
Jan/02/2018 15:14:37 route,ospf,info source=X.X.31.94
Jan/02/2018 15:14:37 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:37 route,ospf,info RouterId=X.X.125.133
Jan/02/2018 15:14:37 route,ospf,info source=X.X.125.197
Jan/02/2018 15:14:37 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:37 route,ospf,info RouterId=X.X.113.5
Jan/02/2018 15:14:37 route,ospf,info source=X.X.125.193
Jan/02/2018 15:14:38 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:38 route,ospf,info RouterId=X.X.113.6
Jan/02/2018 15:14:38 route,ospf,info source=X.X.31.118
Jan/02/2018 15:14:38 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:38 route,ospf,info RouterId=X.X.113.5
Jan/02/2018 15:14:38 route,ospf,info source=X.X.125.193
Jan/02/2018 15:14:38 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:38 route,ospf,info RouterId=X.X.125.129
Jan/02/2018 15:14:38 route,ospf,info source=X.X.31.94
Jan/02/2018 15:14:39 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:39 route,ospf,info RouterId=X.X.113.6
Jan/02/2018 15:14:39 route,ospf,info source=X.X.31.118
Jan/02/2018 15:14:39 route,ospf,info Ignoring Link State Acknowledgment packet: wrong peer state
Jan/02/2018 15:14:39 route,ospf,info state=ExStart
Jan/02/2018 15:14:39 route,ospf,info Discarding packet: no neighbor with this source address
Jan/02/2018 15:14:39 route,ospf,info RouterId=X.X.125.133
Jan/02/2018 15:14:39 route,ospf,info source=X.X.125.197
Jan/02/2018 15:14:40 route,ospf,info Ignoring Link State Acknowledgment packet: wrong peer state
Jan/02/2018 15:14:40 route,ospf,info state=ExStart
Jan/02/2018 15:14:40 route,ospf,info Ignoring Link State Acknowledgment packet: wrong peer state
Jan/02/2018 15:14:40 route,ospf,info state=ExStart
Jan/02/2018 15:15:49 system,error,critical login failure for user admin from 6C:3B:6B:XX:XX:XX via mac-telnet
Jan/02/2018 15:16:06 system,error,critical login failure for user admin from 6C:3B:6B:XX:XX:XX via mac-telnet
Jan/02/2018 15:16:17 system,info,account user admin logged in from 6C:3B:6B:XX:XX:XX via mac-telnet
Jan/02/2018 15:16:33 system,info,account user admin logged out from X.X.X.X via dude
Jan/02/2018 15:16:33 system,info,account user admin logged out from X.X.X.X via dude
Jan/02/2018 15:16:33 system,info,account user admin logged out from X.X.X.X via dude
Jan/02/2018 15:16:33 system,info,account user admin logged out from 6C:3B:6B:XX:XX:XX via mac-telnet
Jan/02/2018 15:16:34 system,info,account user admin logged out from 6C:3B:6B:XX:XX:XX via mac-telnet
Jan/02/2018 15:16:34 system,info router rebooted
Jan/02/2018 15:16:36 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:36 route,bgp,info RemoteAddress=X.X.113.6
Jan/02/2018 15:16:36 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:36 route,bgp,info RemoteAddress=X.X.22.49
Jan/02/2018 15:16:36 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:36 route,bgp,info RemoteAddress=X.X.27.105
Jan/02/2018 15:16:36 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:36 route,bgp,info RemoteAddress=X.X.113.5
Jan/02/2018 15:16:37 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:37 route,bgp,info RemoteAddress=X.X.204.2
Jan/02/2018 15:16:37 route,bgp,info Failed to open TCP connection: Network is unreachable
Jan/02/2018 15:16:37 route,bgp,info RemoteAddress=X.X.204.1
Jan/02/2018 15:16:38 interface,info Interface - link up
A bunch of link up alerts
Jan/02/2018 15:16:55 bfd,error discarding BFD packet: unsupported version 0
Jan/02/2018 15:16:55 bfd,error source: X.X.22.49
Jan/02/2018 15:17:02 route,bgp,info Connection opened by remote host
Jan/02/2018 15:17:02 route,bgp,info RemoteAddress=X.X.27.105
Jan/02/2018 15:17:14 route,bgp,info Connection opened by remote host
Jan/02/2018 15:17:14 route,bgp,info RemoteAddress=X.X.22.49
Jan/02/2018 15:17:43 route,bgp,info TCP connection established
Jan/02/2018 15:17:43 route,bgp,info RemoteAddress=X.X.204.2
Jan/02/2018 15:17:44 route,bgp,info TCP connection established
Jan/02/2018 15:17:44 route,bgp,info RemoteAddress=X.X.204.1
Jan/02/2018 15:19:14 system,info,account user admin logged in from X.X.X.X via winbox
Jan/02/2018 15:19:48 route,bgp,info TCP connection established
Jan/02/2018 15:19:48 route,bgp,info RemoteAddress=X.X.X.X
Jan/02/2018 15:19:56 route,bgp,info Connection opened by remote host
Jan/02/2018 15:19:56 route,bgp,info RemoteAddress=X.X.X.X

As you can see, all OSPF crashed once the queue was removed.
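For anyone who wants to line up the timestamps in a log like the one above, here is a minimal Python sketch. It assumes the log has been exported to plain text in exactly the format shown (`Mon/DD/YYYY HH:MM:SS topics message`) and that the month abbreviations parse in an English locale. The function names (`parse_line`, `ospf_down_events`, `queue_removed_time`) are my own, not anything from RouterOS.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.125.133: state change from Full to Down
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (?P<topics>\S+) (?P<msg>.*)$"
)
STATE_RE = re.compile(
    r"OSPFv2 neighbor (?P<neighbor>\S+): state change from (?P<old>\w+) to (?P<new>\w+)"
)


def parse_line(line):
    """Return (timestamp, topics, message), or None for lines that are not log entries."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # Assumption: English month abbreviations (default C/English locale).
    ts = datetime.strptime(m.group("ts"), "%b/%d/%Y %H:%M:%S")
    return ts, m.group("topics"), m.group("msg")


def ospf_down_events(lines):
    """Collect (timestamp, neighbor, previous_state) for every OSPF neighbor that went Down."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, topics, msg = parsed
        if "ospf" not in topics.split(","):
            continue
        m = STATE_RE.search(msg)
        if m and m.group("new") == "Down":
            events.append((ts, m.group("neighbor"), m.group("old")))
    return events


def queue_removed_time(lines):
    """Return the timestamp of the first 'simple queue removed' entry, or None."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and "simple queue removed" in parsed[2]:
            return parsed[0]
    return None


if __name__ == "__main__":
    sample = [
        "Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.125.133: state change from Full to Down",
        "Jan/02/2018 15:12:44 route,ospf,info OSPFv2 neighbor X.X.113.6: state change from Full to Down",
        "Jan/02/2018 15:14:27 route,ospf,info OSPFv2 neighbor X.X.113.5: state change from ExStart to Down",
        "Jan/02/2018 15:14:37 system,info simple queue removed by brescoadmin",
    ]
    downs = ospf_down_events(sample)
    removed = queue_removed_time(sample)
    for ts, neighbor, old in downs:
        print(ts, neighbor, old, "-> Down")
    if downs and removed:
        print("first neighbor down to queue-removed entry:", removed - downs[0][0])
```

On the log above, the first Full to Down entry is at 15:12:44 and the "simple queue removed" entry is at 15:14:37, so the removal shows up in the log 113 seconds after the neighbors started dropping. That fits the note in the log that the removal is logged lower down than when it actually happened.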

It looks like removing queues causes traffic through the router to take a hit. I tested on a different router: when removing a 300M queue, traffic went from 300M down to about 20M, then back up to 350M.

Speculating here, but:
This doesn't sound so strange. Removing the queue will probably drop (void, null) all packets that are currently in the queue, resulting in lost packets for the affected flows, and you depend on higher-level protocols to recover.

Is the effect the same if you pause a queue and then delete it after a while?

Pausing the queue has the same effect. It looks like it hiccups all forwarding traffic for a brief moment.

Are you overclocking? Have you tried 6.41 yet?

Have you sent the support file to MikroTik support yet? Anything back?

We are on 6.37.5. It runs very stable on ccr8g2s+: BGP + OSPF, some filtering, no queues, full routing table.

We run BGP with about 4 peers at each router. It has been somewhat stable. The issue is I cannot afford the downtime to reboot for every new firmware that comes out. Plus, with the frequency of releases, I would not have a long enough uptime for the issue to repeat itself. Every support ticket we open gets answered with "update the firmware and test".