BGP not trying to reconnect more than once

Hello,

We’ve noticed on MikroTik routers that when a BGP peering dies, the router gets stuck in the OpenSent state forever.
To bring the peering back online, you have to manually disable and then re-enable the peer.

While it’s stuck in the OpenSent state, it sends no packets at all to the other peer, even if you leave it like this for days.
We have seen this with both older and the latest firmware on Cloud Core routers.

Normally no one would notice this problem, because if your peer tries to reconnect to you, the peering comes back up within a few seconds. However, there are situations in which the other peer is configured as passive.
Since the MT doesn’t try to reconnect and the other peer won’t try because of its configuration, these peerings stay down until you take manual action.

Initially I thought this was BFD-specific, but testing showed that it also occurs without BFD.

So my question is: do I need some specific setting to make the MT try to reconnect to the other peer, or is this a bug that hasn’t been observed before?

Thanks

We’re seeing the same thing, and I’ve posted about it before as well…

IMHO, a bug.

I have seen it happen as well. As you write, only when one side is passive.
Did you already try to report it to the support mail address?

I emailed support about this issue - feedback below:

Yes, it is a known problem, it tries multiple times except that with each try and failure interval between tries increase.
Currently solution for this problem when interval becomes too high is only disable/enable.
This will change in ROS v7.

From what I can see, the BGP peer does not automatically retry, even if I leave it for long periods (hours). Disable/enable immediately resolves the issue (not ideal, but it works; ultimately I need reconnect/recovery to be automated).

Yup. Precisely the same story here, and the same “v7” fix -shrugs-

ROS v7 better be truly awesome, because damn, there are a lot of serious issues depending on it.

I have the same issue on multiple devices/archs.

I’ve seen it too, but then again, the other side is passive. If the SYN packet gets lost, the process gets stuck: it doesn’t obey the SYN timeout, reset itself, and try again.

Is that issue still present in the latest ROS, or has it been fixed already?

As the latest RouterOS is still a v6.x and not v7, I’m afraid it still happens…
It is usually only a problem when the link is down for an extended time, not when simply rebooting or doing some reconfiguration.

Here is a script that I put in the scheduler to automatically restart BGP when it’s stuck.
I figured it might be helpful to someone until this bug is fixed.
To use it, change antifilter to your BGP peer name.

/routing bgp peer {
	# Look up the peer once and reuse its ID below
	:local Peer [:pick [find name="antifilter"] 0]
	:local PeerState [get $Peer state]
	:if ($PeerState = "opensent") do={
		disable $Peer;
		enable $Peer;
		:log warning "BGP peer restarted by script";
	}
}

Rewritten version of the previous script with no hardcoded peer name:

/routing bgp peer {
   :foreach peer in [find state="opensent"] do={
      :log warning "Restart stuck BGP Peer: $([get $peer name])"
      disable $peer
      enable $peer
   }
}

awesome. thanks.

Saved my life. I can get some sleep now.

/routing bgp peer {
   :foreach peer in [find state!="established" and disabled=no] do={
      :log warning "Restart stuck BGP Peer: $([get $peer name])"
      disable $peer
      :delay 100ms
      enable $peer
   }
}
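
For anyone wondering how to run one of the scripts above from the scheduler, here is a minimal sketch. It assumes you have already saved the script under /system script with the name bgp-watchdog; that name, the scheduler entry name, and the one-minute interval are only examples, not something from the posts above, so adjust them to your setup.

# Assumption: a script named "bgp-watchdog" already exists under /system script
# Run it every minute; pick an interval that suits your network
/system scheduler add name=bgp-watchdog-run interval=1m on-event=bgp-watchdog

# Check the current state of the peers by hand
/routing bgp peer print status

Keep in mind that with the last variant (state!="established"), a short interval will also restart peers that are still in the middle of a normal connection attempt, so a longer interval may be safer there.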