techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

OSPF and BGP Issues

Fri Jun 16, 2017 12:58 pm

Hi Sir/s,

I just would like to check if anyone has encountered the issue we're having now on our 1072s.

In the past 12 hours, an unusual behavior has already occurred three times: Internet services would suddenly stop working.

If we check the logs, we would just see OSPF and BGP going down.

We'd resolve it by rebooting one of the 1072s - both are connected to each other.

Could something be triggering this behavior? No changes whatsoever were made before the issue occurred.
The configuration has been in place for years now, and if this were a configuration issue, why
would a reboot resolve it rather than a configuration change?

Version is at 6.33.2 - CPU and Memory utilization is low so I guess that is not an issue.

Any suggestions what else I can check on?

Thanks in advance.
 
airbanduk
newbie
Posts: 45
Joined: Mon Jun 12, 2017 2:30 pm

Re: OSPF and BGP Issues

Fri Jun 16, 2017 1:09 pm

Have you tried a later firmware release? When was the last update and configuration change made?

I've been using 6.35 on the 1072 and they've been really stable. The only time I've seen OSPF play up without a config change is on wireless links if the signal degrades; it seems the remote router needs a reboot to reconnect for some reason.
 
techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

Re: OSPF and BGP Issues

Fri Jun 16, 2017 1:24 pm

Here's an example of what I see in the logs. Unfortunately, it just says OSPF and BGP went down.

12:15:25 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from Full to Init
12:16:00 system,info,account user noc logged in from 103.25.176.2 via winbox
12:16:13 interface,info <customer> link down
12:17:16 route,bgp,error HoldTimer expired
12:17:16 route,bgp,error RemoteAddress=45.64.80.146
12:17:37 route,bgp,error Received notification
12:17:37 route,bgp,error Hold timer expired, subcode=0
12:18:16 route,bgp,info Failed to open TCP connection: No route to host
12:18:16 route,bgp,info RemoteAddress=45.64.80.146
12:18:17 route,bgp,error Received notification
12:18:17 route,bgp,error Hold timer expired, subcode=0
12:18:24 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to 2-Way
12:18:28 route,bgp,info Failed to open TCP connection: Network is unreachable
12:18:28 route,bgp,info RemoteAddress=10.0.0.1
12:18:32 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to Init
12:18:36 route,bgp,info Failed to open TCP connection: No route to host
12:18:36 route,bgp,info RemoteAddress=45.64.80.146
12:18:56 route,bgp,info Failed to open TCP connection: No route to host
12:18:56 route,bgp,info RemoteAddress=45.64.80.146
12:19:12 route,bgp,info Connection opened by remote host
12:19:12 route,bgp,info RemoteAddress=10.0.0.1
12:19:16 route,bgp,info Failed to open TCP connection: No route to host
12:19:16 route,bgp,info RemoteAddress=45.64.80.146
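
For anyone who wants to pull the OSPF neighbor transitions out of an excerpt like the one above, here is a minimal Python sketch. It is not a MikroTik tool: the regular expression simply assumes the line layout shown in the excerpt, and the function names (`ospf_transitions`, `reached_full`) are made up for illustration.

```python
import re

# A few lines copied from the log excerpt above.
LOG = """\
12:15:25 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from Full to Init
12:17:16 route,bgp,error HoldTimer expired
12:17:16 route,bgp,error RemoteAddress=45.64.80.146
12:18:24 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to 2-Way
12:18:32 route,ospf,info OSPFv2 neighbor 10.0.0.1: state change from ExStart to Init
"""

# Assumed layout: "HH:MM:SS route,ospf,info OSPFv2 neighbor <ip>: state change from <old> to <new>"
OSPF_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) route,ospf,info OSPFv2 neighbor "
    r"(?P<neighbor>[\d.]+): state change from (?P<old>\S+) to (?P<new>\S+)$"
)


def ospf_transitions(text):
    """Return (time, neighbor, old_state, new_state) for each OSPF state-change line."""
    found = []
    for line in text.splitlines():
        # Tolerate surrounding whitespace and a stray leading quote from copy/paste.
        match = OSPF_RE.match(line.strip().lstrip('"'))
        if match:
            found.append((match["time"], match["neighbor"], match["old"], match["new"]))
    return found


def reached_full(transitions):
    """True if any transition in the list ends in the Full state."""
    return any(new == "Full" for _, _, _, new in transitions)


if __name__ == "__main__":
    transitions = ospf_transitions(LOG)
    for time, neighbor, old, new in transitions:
        print(f"{time} {neighbor}: {old} -> {new}")
    print("adjacency returned to Full:", reached_full(transitions))
```

On these sample lines the neighbor leaves Full and never gets back to Full within the excerpt. That only summarizes what the log already says; it does not explain why the adjacency dropped.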
 
techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

Re: OSPF and BGP Issues

Fri Jun 16, 2017 1:29 pm

sir,

Just 6 hours ago I upgraded to the latest bugfix version, 6.37.5, and so far the issue hasn't resurfaced. As for configuration changes: none. No changes were made in the hours and days before the incident occurred. There are no wireless configurations on the 1072s either; we're currently using them as edge routers since we're an ISP. Is it safe to assume that this is not a configuration issue?

thanks for your response.
 
airbanduk
newbie
Posts: 45
Joined: Mon Jun 12, 2017 2:30 pm

Re: OSPF and BGP Issues

Fri Jun 16, 2017 1:49 pm

Those errors are exactly what I see on CCR1009/1016 in the access network when the wireless links cause the neighbours to drop. On one side the neighbour comes up in 'Full' state, but the other cycles through the OSPF FSM in the way you've shown. I have to reboot the one that thinks it's Full to bring the neighbours back up correctly. As you don't have wireless links I can't say why it might be happening, but the symptoms seem identical.

Again, no config changes on our CCRs before this happens. If the wireless signals are tuned to a strong level, the problem disappears. That suggests to me the cause is a bad link, but the CCR must have a bug somewhere that stops OSPF from forming correctly again. I've tried using different OSPF link types (broadcast, nbma, ptp, ptmp) and none of them have solved the issue. I've resorted to a script to automatically reboot the router that thinks it's 'Full', but in the core/edge I don't see how you could do this.
 
techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

Re: OSPF and BGP Issues

Fri Jun 16, 2017 2:06 pm

sir,

in Cisco you have a "sh tech" command that we can actually analyze - does Mikrotik have any similar commands? I'm a newbie with Mikrotik and I was hoping I could check something out of the normal "log" files in Mikrotik that would somehow give me a clue as to what is causing or being a trigger to the sudden and random "down" of ospf and bgp?

thanks.
 
IPANetEngineer
Trainer
Posts: 1053
Joined: Fri Aug 10, 2012 6:46 am
Location: Jackson, MS, USA

Re: OSPF and BGP Issues

Fri Jun 16, 2017 3:07 pm

supout.rif is the equivalent of a show tech in the Cisco world. You can log into your account and view the contents as well as send it into MikroTik with a ticket.

https://wiki.mikrotik.com/wiki/Manual:S ... utput_File
Global - MikroTik Support & Consulting - English | Francais | Español | Portuguese +1 855-645-7684
https://iparchitechs.com/services/mikro ... l-support/ mikrotiksupport@iparchitechs.com
 
IPANetEngineer
Trainer
Posts: 1053
Joined: Fri Aug 10, 2012 6:46 am
Location: Jackson, MS, USA

Re: OSPF and BGP Issues

Fri Jun 16, 2017 3:10 pm

As far as RouterOS version goes, I advise all of my clients to run bugfix code as it is much more stable in production. One other practice that can contribute to OSPF/BGP instability is running a lot of mismatched versions on the routers. 6.37.5 bugfix has worked well for a lot of our clients that depend on BGP/OSPF.
 
techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

Re: OSPF and BGP Issues

Mon Jun 19, 2017 1:55 pm

Hi Sir, thanks for your response. So far I've upgraded both my 1072s to 6.37.5 and will try to upgrade the 2 x 1036s to the same bugfix version this weekend. I was able to generate a supout.rif file and open it in the supout viewer. My question is: when is the best time to generate the file? Right after the unusual behavior is encountered? The log files are deleted every time you reboot the router, and in my case a reboot is what resolves the issue, temporarily that is. I was hoping to have a means of finding out what triggers the behavior. Thanks again.
 
techmngr
just joined
Topic Author
Posts: 9
Joined: Wed Mar 08, 2017 9:38 am

Re: OSPF and BGP Issues

Mon Jun 19, 2017 2:00 pm

Thank you, sir, though we don't have any wireless features enabled on either 1072. The last thing I did was delete files from the HDD, since I noticed it had reached 80% utilization, which left only 20% free space. Could that be a factor? Could 80% disk utilization cause the router to hang or stop working? Every time it happens, the uptime doesn't reset, so technically the router is still up; only my BGP and OSPF neighbors break, and they recover after the reboot. :-(
 
Kevo
Frequent Visitor
Posts: 55
Joined: Wed Oct 12, 2011 1:38 am

Re: OSPF and BGP Issues

Mon Nov 20, 2017 11:56 am

We've seen this problem a couple of times now on our 1072. It looks like something happens with OSPF and then a little while after we get the hold timer error with BGP and the routing fails. After some minutes bgp will come back up. We've only run bugfix releases and this has happened before on 6.37.5 and now on 6.39.3.

Log shows

862 Nov/19/2017 20:42:59 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
863 Nov/19/2017 20:43:17 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
864 Nov/19/2017 20:43:52 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from Exchange to Down
865 Nov/19/2017 20:44:57 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
866 Nov/19/2017 20:45:53 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from Init to Down
867 Nov/19/2017 20:46:55 memory route, bgp, error HoldTimer expired
868 Nov/19/2017 20:46:55 memory route, bgp, error RemoteAddress=111.222.111.123
869 Nov/19/2017 20:47:26 memory route, bgp, info Connection opened by remote host
870 Nov/19/2017 20:47:26 memory route, bgp, info RemoteAddress=111.222.111.123


Is there any way to troubleshoot this when it happens again? I think it's been a few months since it happened last.
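
One thing that can be done offline with a saved log like this is to measure how long after the first OSPF drop the BGP hold timer expired. The following is a rough Python sketch under stated assumptions: the regular expression assumes the column layout shown in the log above (line number, date, time, buffer, topics, message), and the names `parse_stamp` and `ospf_to_bgp_gap_seconds` are hypothetical helpers, not part of RouterOS.

```python
import re
from datetime import datetime

# A few lines copied from the log above.
LOG = """\
862 Nov/19/2017 20:42:59 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from ExStart to Down
866 Nov/19/2017 20:45:53 memory route, ospf, info OSPFv2 neighbor 172.17.2.11: state change from Init to Down
867 Nov/19/2017 20:46:55 memory route, bgp, error HoldTimer expired
869 Nov/19/2017 20:47:26 memory route, bgp, info Connection opened by remote host
"""

# Month names are mapped by hand so parsing does not depend on the system locale.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed layout: "<n> <Mon/DD/YYYY HH:MM:SS> <buffer> <topic, topic, ...> <message>"
LINE_RE = re.compile(
    r"^\d+\s+(?P<stamp>\w{3}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<topics>(?:\w+, )*\w+)\s+(?P<msg>.*)$"
)


def parse_stamp(stamp):
    """Turn 'Nov/19/2017 20:42:59' into a datetime."""
    date_part, clock_part = stamp.split(" ")
    month, day, year = date_part.split("/")
    hour, minute, second = clock_part.split(":")
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))


def ospf_to_bgp_gap_seconds(text):
    """Seconds from the first OSPF 'to Down' line to the next BGP 'HoldTimer expired'."""
    first_ospf_down = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        when = parse_stamp(match["stamp"])
        topics = match["topics"].replace(" ", "").split(",")
        if "ospf" in topics and match["msg"].endswith("to Down"):
            if first_ospf_down is None:
                first_ospf_down = when
        elif "bgp" in topics and match["msg"] == "HoldTimer expired":
            if first_ospf_down is not None:
                return (when - first_ospf_down).total_seconds()
    return None  # no OSPF drop followed by a hold-timer expiry was found


if __name__ == "__main__":
    print(ospf_to_bgp_gap_seconds(LOG))
```

On the sample lines this gives 236 seconds between the first logged OSPF drop and the BGP hold timer expiring. It only quantifies the sequence already visible in the log; it says nothing about the root cause.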
