I have a deployment of Routerboard kit around Europe and have a problem with one CCR1036 which seems to lose its BGP service from time to time.
The router is in Frankfurt, Germany and has a full-table transit. It also has an iBGP peer with another 1036 in London via a pseudowire service provided by a third party.
Approx every 3-4 weeks the BGP service on the Frankfurt router just seems to stop. On the console, if I do ‘routing bgp export’ I get the comment lines and then the interface hangs (until ctrl-c):
[admin@FRARTR01] > routing bgp export
# sep/07/2014 12:25:36 by RouterOS 6.18
# software id = KYWQ-UYW1
#
If I look in the web interface, both BGP instances and BGP peers are blank.
The only way to recover (that I have found) is to reboot the entire router. Incidentally, this takes a long time - approx 2 minutes from issuing the reboot command on the console until the router stops pinging. When it reboots BGP is fully operational again.
It is almost as if the BGP daemon ‘crashes’ and the router cannot restart it?
The router was deployed in June with RouterOS v6.15. This issue occurred 3 times. I upgraded to v6.18 around 3 weeks ago and this failed with the same issue yesterday.
Has anyone else seen a similar issue with a service just stopping working? Is there any way to debug or find out why this is happening? I had hoped that a software change would have fixed it but having used two versions of the operating system, unless someone else can confirm the same bug, I am starting to think this is a hardware issue. Obviously that is a huge difficulty as the router is in Frankfurt and I am in London!
Can you make a supout.rif while you encounter the BGP problem?
It’s a long shot, but if you succeed in getting a supout, maybe Mikrotik support can find something out.
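For reference, a rough sketch of how that could be done from the console on RouterOS 6.x. The file name bgp-hang.rif is just an example, and the logging rule is an optional extra (an assumption on my part, not something support asked for) that would need to be in place before the next failure to capture anything useful:

```
# Generate a support output file while BGP is in the failed state
# (the file name is arbitrary; the file appears under Files)
/system sup-output name=bgp-hang.rif

# Optional: log BGP events to memory ahead of the next failure
/system logging add topics=bgp action=memory
```

If the console hangs on the first command the same way it does on the export, that is worth reporting to support too.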
I have 2x 1036s running 3 full v4 and v6 feeds.. one is at 47 days uptime on 6.17 right now… the other was up to about 60 days before I rebooted to test something (6.15).
Yes, this is what MT support asked me to do too. Next time it happens I will do that. I was really wondering whether anyone else had seen the same problem and found a solution for it.
That’s disturbing.
I have two CCR1036s in one of the centers and they are exchanging full BGP table with two upstream ISPs.
I’ve just checked the uptime - 7 weeks.
I’ve had some flapping BGP sessions during these 7 weeks, but it was always a L2 device in between that was breaking the connection.
What you describe is pretty similar to a problem I noticed on the CCR, it only happened once though. The problem was that almost all of the bridge interfaces were gone! A restart of the device solved it. But again, as you describe - in my case the bridge config was shown as empty.
There were no file operations possible and it wiped the admin password !!!
Thank god filtering kept the world from logging in to this OSPF router and killing our network.