Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Random OSPF State Down

Mon Feb 18, 2019 6:13 am

We have a central core and several border routers that randomly go offline. When this happens, all OSPF adjacencies report down at the same time. Everything is connected over one large MPLS circuit, but we created a VLAN per site (currently 28 sites, with plans to add another 20) and set up /30 point-to-point links to segregate the sites and minimize broadcast traffic. We also have MPLS configured; we use BGP for all public addressing and OSPF only for management and loopback addressing. We do not see any interface errors or drops. Does anyone have an idea why this might happen?

Error log
feb/17 21:19:33 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Full to Init 
feb/17 21:19:34 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from ExStart to Down 
feb/17 21:19:48 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Exchange to Down 
feb/17 21:19:59 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from ExStart to Down 
feb/17 21:20:14 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Full to Down 
feb/17 21:20:38 route,bgp,info Failed to open TCP connection: Network is unreachable 
feb/17 21:20:38 route,bgp,info     RemoteAddress=10.200.0.1 
feb/17 21:20:49 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Full to Down 
feb/17 21:21:04 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Full to Init 
feb/17 21:21:05 route,ospf,info OSPFv2 neighbor 10.200.0.1: state change from Init to Down 
feb/17 21:21:26 route,bgp,info Connection opened by remote host 
feb/17 21:21:26 route,bgp,info     RemoteAddress=10.200.0.1 
Sanitized Configs
Core
/interface bridge
add fast-forward=no name=LoopBack
/interface ethernet
set [ find default-name=sfp-sfpplus3 ] comment="MPLS" l2mtu=2024 mtu=2000
/interface vlan
add comment="Office - MPLS" interface=sfp-sfpplus3 name="MPLS - vlan3001" vlan-id=3001
/routing bgp instance
set default as=300 router-id=10.200.0.1
/routing ospf instance
set [ find default=yes ] mpls-te-area=backbone mpls-te-router-id=LoopBack redistribute-other-ospf=as-type-1 router-id=10.200.0.1
/ip firewall connection tracking
set enabled=no
/ip address
add address=10.200.0.1 interface=LoopBack network=10.200.0.1
add address=10.0.0.33/30 interface="MPLS - vlan3001" network=10.0.0.32
/mpls interface
set [ find default=yes ] mpls-mtu=2020
/mpls ldp
set enabled=yes lsr-id=10.200.0.1 transport-address=10.200.0.1
/mpls ldp interface
add interface="MPLS - vlan3001"
/routing bgp peer
add default-originate=always name=Core1-Office remote-address=10.200.0.10 remote-as=300 ttl=default update-source=LoopBack use-bfd=yes
/routing ospf interface
add interface="MPLS - vlan3001" network-type=point-to-point priority=2 use-bfd=yes
/routing ospf network
add area=backbone network=10.200.0.1/32
add area=backbone network=10.0.0.32/30
Office Site
/interface bridge
add fast-forward=no name=LoopBack
/interface ethernet
set [ find default-name=combo1 ] l2mtu=2024 mtu=2000
/interface vlan
add comment="Core1 - MPLS" interface=combo1 name="combo1 - vlan3001" vlan-id=3001
/routing bgp instance
set default as=300 router-id=10.200.0.10
/routing ospf instance
set [ find default=yes ] router-id=10.200.0.10
/ip address
add address=10.0.0.34/30 interface="combo1 - vlan3001" network=10.0.0.32
add address=10.200.0.10 interface=LoopBack network=10.200.0.10
/mpls ldp
set enabled=yes lsr-id=10.200.0.10 transport-address=10.200.0.10
/mpls ldp interface
add interface="combo1 - vlan3001"
/routing bgp peer
add name=Core1-Office remote-address=10.200.0.1 remote-as=300 ttl=default update-source=LoopBack use-bfd=yes
/routing ospf interface
add interface="combo1 - vlan3001" network-type=point-to-point use-bfd=yes
/routing ospf network
add area=backbone network=10.200.0.10/32
add area=backbone network=10.0.0.32/30
 
Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Re: Random OSPF State Down

Mon Feb 18, 2019 6:45 am

I have enabled OSPF debug, waiting for it to go down again.
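
For anyone following along, a minimal sketch of how OSPF debug logging might be enabled on RouterOS v6. The `memory` action and the filter expression are assumptions, not necessarily the exact rule used here. Debug logging can be verbose on a busy router, so it is worth removing the rule once the event has been captured.

```
# Log OSPF messages at debug level to the in-memory log
# (assumption: the default "memory" logging action is present)
/system logging add topics=ospf,debug action=memory
# Later, review the collected OSPF entries
/log print where topics~"ospf"
```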
 
NanaK
just joined
Posts: 4
Joined: Mon Feb 18, 2019 8:53 am

Re: Random OSPF State Down

Mon Feb 18, 2019 11:32 am

I have experienced a similar problem and traced the symptoms to issues on the layer 2 circuit between my CE and PE devices. A packet capture on both the CE and the PE revealed that OSPF multicast traffic from the PE towards the CE was being dropped.

As a temporary fix, I changed the OSPF network type from point-to-point to NBMA (NBMA does not use multicast for its hello messages), and this fixed the issue.

I suggest you temporarily change the OSPF network type on both the CE and PE from point-to-point to NBMA for one of the affected sites, add the necessary NBMA neighbors on each side, and see whether that resolves the problem. If it does not, please do a packet capture on both the CE and PE devices and analyze the OSPF packets.
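
A minimal sketch of that change in RouterOS v6 syntax, assuming the interface names and /30 addressing from the sanitized configs earlier in the thread (core 10.0.0.33, office site 10.0.0.34). The neighbor priority values are illustrative assumptions, and the sniffer filter should be checked against the release in use.

```
# Core (10.0.0.33): switch the link to NBMA and define the site as a unicast neighbor
/routing ospf interface set [find interface="MPLS - vlan3001"] network-type=nbma
# priority here is the assumed priority of the neighbor; non-zero marks it DR-eligible
/routing ospf nbma-neighbor add address=10.0.0.34 priority=1

# Office site (10.0.0.34): mirror the change towards the core
/routing ospf interface set [find interface="combo1 - vlan3001"] network-type=nbma
/routing ospf nbma-neighbor add address=10.0.0.33 priority=1

# If the problem persists, capture OSPF packets on each end and compare
# (assumption: ip-protocol=ospf is accepted as a sniffer filter on this release)
/tool sniffer quick interface="MPLS - vlan3001" ip-protocol=ospf
```

The network type has to be changed on both ends of the link; a mismatch between point-to-point and NBMA can prevent the adjacency from working correctly.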
 
Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Re: Random OSPF State Down

Mon Feb 18, 2019 8:19 pm

Thanks, I will test that.
 
Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Re: Random OSPF State Down

Wed Feb 20, 2019 4:50 am

Set up NBMA on our link. Set the core to priority 1 and the site to priority 0. Also disabled BFD. Let's see how that works.
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: Random OSPF State Down

Wed Feb 20, 2019 11:51 am

What kind of router is this? CCR?
 
Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Re: Random OSPF State Down

Wed Feb 20, 2019 9:13 pm

Core: CCR1072
Client sites: CCR1009, RB4011, RB2011
 
Murmaider
Member Candidate
Posts: 126
Joined: Fri Oct 30, 2015 10:10 am

Re: Random OSPF State Down

Thu Feb 21, 2019 7:11 am

What is the output of the below on all your core and office site routers:
/routing ospf interface print
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: Random OSPF State Down

Thu Feb 21, 2019 10:27 am

The most likely cause is BFD; it may report the link as down on a CCR router even when the link is fine. I would suggest not using BFD on CCRs.
 
Wolfraider
Frequent Visitor
Topic Author
Posts: 88
Joined: Wed Jul 15, 2015 8:06 pm

Re: Random OSPF State Down

Fri Feb 22, 2019 6:11 pm

Had another outage last night that spiked the CPUs on the CCR1072 core to 100%. We disabled BFD on all OSPF and BGP links and hope that doesn't happen again. The one test link stayed connected but could not route because the CPUs were maxed out.
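
A rough sketch of disabling BFD across OSPF interfaces and BGP peers on RouterOS v6, using the `use-bfd` property that appears in the configs earlier in the thread. The `[find use-bfd=yes]` selectors are an assumption about scope: they touch every entry that currently has BFD enabled, so narrow them if only some links should change.

```
# Turn off BFD on every OSPF interface entry that currently uses it
/routing ospf interface set [find use-bfd=yes] use-bfd=no
# Turn off BFD on every BGP peer that currently uses it
/routing bgp peer set [find use-bfd=yes] use-bfd=no
# Confirm that no BFD sessions remain
/routing bfd neighbor print
```

The same change is needed on the remote routers; otherwise the far end keeps expecting BFD packets for the session.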
 
kiboi
just joined
Posts: 1
Joined: Wed Oct 16, 2019 9:26 pm

Re: Random OSPF State Down

Wed Dec 04, 2019 1:29 am

I have a similar issue, but it did not go away even after changing the OSPF network type from point-to-point to NBMA. I am running eBGP, iBGP, and OSPF.
My setup is:
CCR1036-8G-2S+ ------------------ CHR
       |
CRS317-1G-16S+
       |
       |-------------------------|-------------------------|
CCR1036-12G-4S          CCR1009-7G-1C-1S+          CCR1009-7G-1C-1S+

OSPF keeps dropping for all adjacent routers at the same time, and iBGP drops with it.
What could be the issue? I am running the ROS long-term release, v6.44.6. I have ruled out a switch failure, since a CHR is directly connected to the core router and it experiences the same problem.
Below are the errors I typically see:
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xx.6: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xx.17: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xx.18: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xx.19: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xxx.1: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xxx.1: state change from Full to Down
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xx.6: state change from ExStart to Init
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xxx.1: state change from ExStart to Init
22:53:08 route,ospf,info OSPFv2 neighbor 102.xx.xxx.1: state change from ExStart to Init
core
rout ospf int pr
Flags: X - disabled, I - inactive, D - dynamic, P - passive
 #    INTERFACE   COST PRIORITY NETWORK-TYPE    AUTHENTICATION AUTHENTICATION-KEY
 0    vlan51        10        1 nbma            none
 1    vlan50        10        1 point-to-point  none
 2    vlan52        10        1 point-to-point  none
 3    vlan49        20        1 point-to-point  none
 4    vlan1205      25        1 nbma            none
 5    vlan58-CHR    10        1 nbma            none
 6 DP loopback0     10        1 broadcast       none
 
sep
newbie
Posts: 25
Joined: Thu Nov 28, 2013 2:34 pm

Re: Random OSPF State Down

Thu Sep 22, 2022 1:13 am

Did you ever find a solution to this problem?
I suspect we are experiencing the same thing.
All OSPF and BGP sessions drop at the same time and return a few seconds later.

Even BGP sessions that do not depend on OSPF drop.
We are running 6.48.6 on CCR1072s and CCR1036s.
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: Random OSPF State Down

Thu Sep 22, 2022 11:05 am

This particular problem is solved in ROSv7 by running OSPF and BGP in separate processes.
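
On RouterOS v7 the separate routing tasks can be listed from the CLI. A one-line sketch, assuming a v7 release where this menu is available:

```
# List the routing processes and their resource usage (RouterOS v7 only)
/routing/stats/process/print
```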
 
nichky
Forum Guru
Posts: 1275
Joined: Tue Jun 23, 2015 2:35 pm

Re: Random OSPF State Down

Thu Sep 22, 2022 2:49 pm

@Wolfraider

This may be a silly question, but it is part of the troubleshooting.
Can you confirm that L1 is all good?
 
sep
newbie
Posts: 25
Joined: Thu Nov 28, 2013 2:34 pm

Re: Random OSPF State Down

Thu Sep 22, 2022 4:34 pm

Yes, L1 is good. Also, some of the BGP peers that drop are on a separate port and dedicated fiber, outside the OSPF L1 path.

The problem happens when a Palo Alto or FortiGate firewall that runs OSPF with the CCRs fails over between the active and the passive firewall device.
But FortiGate OSPF (and unrelated global BGP on a different L1) should not drop when the Palo Alto does a failover, or vice versa.
It simply seems like the whole routing engine on the CCR dies and restarts.

It is good that OSPF and BGP do not die together in ROS7.
 
mrz
MikroTik Support
Posts: 7042
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: Random OSPF State Down

Thu Sep 22, 2022 5:39 pm

If you want to stick with v6, then increasing the OSPF dead-interval and BGP hold-time may help. Also, make sure the timers are set to at least their default values.
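
As an illustration of that advice, a sketch in RouterOS v6 syntax using the interface and peer names from the configs earlier in the thread. The specific values are assumptions, chosen only to exceed the v6 defaults (hello-interval 10s, dead-interval 40s, hold-time 3m).

```
# Raise the OSPF dead interval on the link (apply the same values on the far end)
/routing ospf interface set [find interface="MPLS - vlan3001"] hello-interval=10s dead-interval=60s
# Raise the BGP hold time for the peer
/routing bgp peer set [find name="Core1-Office"] hold-time=4m
```

OSPF hello and dead intervals must match on both ends of a link, or the adjacency will not form. For BGP, the session uses the lower of the two peers' hold times, so the value should be raised on both sides.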
 
sep
newbie
Posts: 25
Joined: Thu Nov 28, 2013 2:34 pm

Re: Random OSPF State Down

Fri Sep 23, 2022 1:10 pm

No dead timers or hold timers are timing out. At the same instant the failover happens, all BGP and OSPF sessions go down.
Also, the interface list and peer list are empty for a fraction of a second.

We are testing v7 in a lab, so the plan is to move eventually.
