We’ve noticed on MikroTik (MT) routers that when a BGP peering dies, the router gets stuck in the OpenSent state forever.
To bring the peering back online, you need to manually disable and then re-enable the peer.
While it’s stuck in the OpenSent state, it will not send any packets at all to the other peer, even if you leave it like this for days.
This has been seen with both older and the latest firmware on Cloud Core routers.
Normally no one would see this problem, because if the other peer tries to reconnect to you, the peering is up and running again after a few seconds. However, there are situations in which the other peer is configured as passive.
Since the MT doesn’t try to reconnect and the other peer won’t try because of its configuration, these peerings stay down forever until you manually take action.
Initially I thought this was BFD-specific, but testing showed that it also occurs without BFD.
So my question is: do I need some specific setting to make the MT try to reconnect to the other peer, or is this a bug that hasn’t been observed before?
I emailed support about this issue - feedback below:
Yes, it is a known problem. It tries multiple times, but with each failed try the interval between tries increases.
Currently the only solution when the interval becomes too high is disable/enable.
This will change in ROS v7.
From what I can see, the BGP peer does not seem to retry automatically, even if I leave it for long periods (hours). Disable/enable immediately resolves the issue (not ideal, but it works - ultimately I need reconnect/recovery to be automated).
I’ve seen it too, but then again the other side is passive. If the SYN packet gets lost, the process is stuck: it doesn’t obey the SYN timeout, reset itself and try again.
As the latest RouterOS is still v6.x and not v7, I’m afraid it still happens…
It is usually only a problem when the link is down for an extended time, not when simply rebooting or doing some reconfiguration.
Here is a script that I put in the scheduler to automatically restart BGP when it’s stuck.
I figured it might be helpful to someone until this bug is fixed.
To use it you need to change antifilter to your BGP peer name.
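For anyone who wants a starting point, here is a minimal sketch of what such a watchdog might look like on RouterOS v6. It is not necessarily the script referred to above. It assumes the peer is named antifilter, that the peer's state property reads "established" when the session is up, and that a 5 second pause between disable and enable is enough. The script and scheduler name bgp-watchdog is made up for illustration.

```
# Hypothetical watchdog sketch for RouterOS v6 (assumptions noted above).
# Change "antifilter" to your own BGP peer name.
:local peerName "antifilter"

# Read the current session state of the peer.
:local peerState [/routing bgp peer get [find name=$peerName] state]

# Restart the peer only when the session is not established.
:if ($peerState != "established") do={
    :log warning ("BGP peer " . $peerName . " is in state " . $peerState . ", restarting it")
    /routing bgp peer disable [find name=$peerName]
    # Assumed pause so the old session is fully torn down before re-enabling.
    :delay 5s
    /routing bgp peer enable [find name=$peerName]
}
```

If this is saved as a script called bgp-watchdog, a scheduler entry along the lines of `/system scheduler add name=bgp-watchdog interval=5m on-event=bgp-watchdog` should run it periodically. Pick an interval comfortably longer than the time the session normally needs to establish, otherwise the watchdog may restart a peer that is still in the middle of connecting.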