bigcw
Member Candidate
Topic Author
Posts: 105
Joined: Mon Sep 08, 2014 2:38 pm

Loss of BGP function after 3-4 weeks

Mon Sep 08, 2014 3:00 pm

Hi everyone

I have a deployment of Routerboard kit around Europe and have a problem with one CCR1036 which seems to lose its BGP service from time to time.

The router is in Frankfurt, Germany and has a full-table transit. It also has an iBGP peer with another 1036 in London via a pseudowire service provided by a third party.

Approx every 3-4 weeks the BGP service on the Frankfurt router just seems to stop. On the console, if I run 'routing bgp export' I get the comment lines and then the interface hangs (until Ctrl-C):
[admin@FRARTR01] > routing bgp export
# sep/07/2014 12:25:36 by RouterOS 6.18
# software id = KYWQ-UYW1
#
If I look in the web interface, both BGP instances and BGP peers are blank:

[Screenshots: BGP instances list and BGP peers list, both empty]

The only way to recover (that I have found) is to reboot the entire router. Incidentally, this takes a long time - approx 2 minutes from issuing the reboot command on the console until the router stops pinging. When it reboots BGP is fully operational again:

[Screenshot: BGP sessions up again after the reboot]

It is almost as if the BGP daemon 'crashes' and the router cannot restart it?

The router was deployed in June with RouterOS v6.15, and the issue occurred 3 times on that version. I upgraded to v6.18 around 3 weeks ago, and it failed with the same issue yesterday.

Has anyone else seen a similar issue with a service just stopping working? Is there any way to debug or find out why this is happening? I had hoped that a software upgrade would fix it, but having seen it on two versions of the operating system, and unless someone else can confirm the same bug, I am starting to think this is a hardware issue. Obviously that is a huge difficulty as the router is in Frankfurt and I am in London!

Thanks for any pointers offered.

Chris
Ecom International Network - Operators of AS61337 with POPs in Europe and North America - www.ecomltd.co.uk
Colocker Data Centre - The data centre with a difference! - www.colocker.com
 
wulfgard
Frequent Visitor
Posts: 86
Joined: Wed Oct 17, 2012 1:06 pm
Location: France

Re: Loss of BGP function after 3-4 weeks

Tue Sep 09, 2014 11:35 am

Hello

I have no problems with BGP on the CCR1036, but you need to post more data about your setup and load, and audit your L2 pseudowire.

You are talking about a crash every 2 weeks, but in your picture we can see that the sessions have been up for 13 minutes!

regards
Thierry
System and Network Engineer
Mikrotik Trainer - MTCNA MTCRE
Official French Mikrotik Distributor
 
bigcw
Member Candidate
Topic Author
Posts: 105
Joined: Mon Sep 08, 2014 2:38 pm

Re: Loss of BGP function after 3-4 weeks

Tue Sep 09, 2014 12:53 pm

You are talking about a crash every 2 weeks, but in your picture we can see that the sessions have been up for 13 minutes!
Yes, I took the screenshot after I had rebooted the router and the sessions had loaded fully, which is why they have only been up for a few minutes.
 
wulfgard
Frequent Visitor
Posts: 86
Joined: Wed Oct 17, 2012 1:06 pm
Location: France

Re: Loss of BGP function after 3-4 weeks

Tue Sep 09, 2014 5:03 pm

Can you post your configuration, or send it to me off-list?
 
hedele
Member
Posts: 338
Joined: Tue Feb 24, 2009 11:23 pm

Re: Loss of BGP function after 3-4 weeks

Wed Sep 10, 2014 5:58 pm

Can you make a supout.rif while you encounter the BGP problem?
It's a long shot, but if you succeed in getting a supout, maybe Mikrotik support can find something out.
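To catch the failure while it is still happening, a small watchdog run from another machine could flag the moment an export stops returning. The following is only a sketch in Python, not anything MikroTik provides: it treats a command that does not finish within a timeout as hung. The SSH invocation in the comment, the host name and the 30-second timeout are assumptions for illustration.

```python
import subprocess
import sys


def command_hangs(cmd, timeout_s):
    """Return True if cmd has not finished within timeout_s seconds.

    subprocess.run() kills the child and raises TimeoutExpired when the
    timeout is reached, which matches the "hangs until Ctrl-C" symptom.
    A command that fails quickly (e.g. connection refused) returns False.
    """
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Hypothetical usage (user, host and timeout are assumptions):
#   export_cmd = ["ssh", "admin@FRARTR01", "/routing bgp export"]
#   if command_hangs(export_cmd, timeout_s=30):
#       print("export hung: generate a supout.rif before rebooting")

# Local stand-ins for a hung and a healthy command, no router needed:
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "pass"]
```

Run from cron or a loop every few minutes, this would only tell you that the export stopped answering; the supout still has to be generated on the router itself.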
 
roadracer96
Forum Veteran
Posts: 714
Joined: Tue Aug 25, 2009 12:01 am

Re: Loss of BGP function after 3-4 weeks

Wed Sep 10, 2014 9:33 pm

I have 2x 1036s running 3 full v4 and v6 feeds. One is at 47 days uptime on 6.17 right now; the other was up to about 60 days before I rebooted it to test something (6.15).

Plus some queuing and simple policy routing.
 
bigcw
Member Candidate
Topic Author
Posts: 105
Joined: Mon Sep 08, 2014 2:38 pm

Re: Loss of BGP function after 3-4 weeks

Thu Sep 11, 2014 5:35 pm

Can you make a supout.rif while you encounter the BGP problem?
It's a long shot, but if you succeed in getting a supout, maybe Mikrotik support can find something out.
Yes, this is what MT support asked me to do too. Next time it happens I will do that. I was really wondering whether anyone else had seen the same problem and found a solution for it.

Chris
 
lz1dsb
Member Candidate
Posts: 222
Joined: Wed Aug 07, 2013 11:48 am

Re: Loss of BGP function after 3-4 weeks

Mon Sep 15, 2014 6:42 pm

That's disturbing.
I have two CCR1036s in one of the centers, and they exchange full BGP tables with two upstream ISPs.
I've just checked the uptime - 7 weeks.
I've had some flapping BGP sessions during these 7 weeks, but it was always an L2 device in between that was breaking the connection.
What you describe is pretty similar to a problem I noticed on the CCR, it only happened once though. The problem was that almost all of the bridge interfaces were gone! A restart of the device solved it. But again, as you describe - in my case the bridge config was shown as empty.
 
Petzl
Member Candidate
Posts: 207
Joined: Sun Jun 30, 2013 12:14 pm

Re: Loss of BGP function after 3-4 weeks

Sat Sep 20, 2014 5:39 pm

Why do you use WebFig and not WinBox?

Just a question; I prefer WinBox over WebFig.
 
bigcw
Member Candidate
Topic Author
Posts: 105
Joined: Mon Sep 08, 2014 2:38 pm

Re: Loss of BGP function after 3-4 weeks

Wed Sep 24, 2014 6:58 pm

I usually use SSH. It was just easier to get the screenshots from webfig.
 
bigcw
Member Candidate
Topic Author
Posts: 105
Joined: Mon Sep 08, 2014 2:38 pm

Re: Loss of BGP function after 3-4 weeks

Mon Sep 29, 2014 1:19 pm

OK, this just happened again, 21 days after the last time. I have sent a supout to Mikrotik for analysis and will report back when they respond to me.

Chris
 
doush
Long time Member
Posts: 621
Joined: Thu Jun 04, 2009 3:11 pm

Re: Loss of BGP function after 3-4 weeks

Mon Sep 29, 2014 4:30 pm

We had the same thing. It is not about BGP. Most probably the latest configuration changes were somehow not saved and got lost.

We added a few routes and IP addresses to a CCR1036 recently, and a few days later they were all gone.

Could you please confirm that BGP was one of the last config changes made to this router?
Last edited by doush on Mon Sep 29, 2014 10:21 pm, edited 1 time in total.
 
ste
Forum Guru
Posts: 1803
Joined: Sun Feb 13, 2005 11:21 pm

Re: Loss of BGP function after 3-4 weeks

Mon Sep 29, 2014 5:25 pm

Have a look at the memory usage. Maybe it is a memory leak?
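One way to check is to note the free memory at intervals (for example the free-memory value from /system resource print) and see whether it trends steadily downward between reboots. The sketch below is Python and only illustrative: it fits a straight line to (time, free memory) samples by least squares and estimates when free memory would reach zero. The sample values are made up, and a real leak need not be linear.

```python
def leak_rate(samples):
    """Least-squares slope of free memory over time.

    samples is a list of (time_seconds, free_bytes) pairs. A clearly
    negative result means free memory is shrinking over time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def seconds_until_exhausted(samples):
    """Rough time until free memory hits zero, or None if not shrinking."""
    slope = leak_rate(samples)
    if slope >= 0:
        return None
    _, last_free = samples[-1]
    return last_free / -slope


# Made-up samples: free memory drops by 100 units every 10 seconds.
leaking = [(0, 1000), (10, 900), (20, 800)]
steady = [(0, 500), (10, 500)]
```

If the estimate lands anywhere near the 3-4 week interval between failures, that would point toward a leak; if free memory stays flat right up to the failure, it would point elsewhere.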
 
doush
Long time Member
Posts: 621
Joined: Thu Jun 04, 2009 3:11 pm

Re: Loss of BGP function after 3-4 weeks

Mon Sep 29, 2014 10:22 pm

I don't think it is a memory leak. I think it is something to do with the disk, because whenever something like this happened to us, the router could not do file operations.
 
FutileNetworks
newbie
Posts: 36
Joined: Tue Jan 15, 2013 9:14 pm

Re: Loss of BGP function after 3-4 weeks

Tue Sep 30, 2014 2:15 am

Had this happen to a CCR 1009 of mine yesterday also on 6.18, upgraded to 6.19 so we'll see what happens.
 
ste
Forum Guru
Posts: 1803
Joined: Sun Feb 13, 2005 11:21 pm

Re: Loss of BGP function after 3-4 weeks

Tue Sep 30, 2014 12:01 pm

I don't think it is a memory leak. I think it is something to do with the disk, because whenever something like this happened to us, the router could not do file operations.
I have seen this effect on one of our CCRs.

No file operations were possible and it wiped the admin password!
Thank god filtering kept the world from logging in to this OSPF router and killing our network.
 
doush
Long time Member
Posts: 621
Joined: Thu Jun 04, 2009 3:11 pm

Re: Loss of BGP function after 3-4 weeks

Wed Oct 01, 2014 3:10 pm

I don't think it is a memory leak. I think it is something to do with the disk, because whenever something like this happened to us, the router could not do file operations.
I have seen this effect on one of our CCRs.

No file operations were possible and it wiped the admin password!
Thank god filtering kept the world from logging in to this OSPF router and killing our network.
Yes, you are right. The admin password is also set to blank. It is one of the symptoms of this bug.