BGP Failover...

I would love to hear from MikroTik users that are successfully running BGP with two or more upstream providers and that have tested failover.

As I understand it, our MikroTiks are establishing BGP sessions with a Hold Time of 0 and a Keep Alive Time of 0. The upstream provider’s router treats this as an indefinite setting, so if the link fails the BGP session will not clear until someone clears it manually. Obviously this is a huge problem, as that upstream will continue to announce our routes when it shouldn’t. Bottom line: traffic will not fail over automatically.
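For anyone following along, here is a minimal sketch of how the BGP specification (RFC 4271) describes hold time negotiation: each side proposes a value in its OPEN message and both use the smaller one, and a result of 0 means the hold and keepalive timers are never started. The function name is made up for illustration and is not a RouterOS command. Whether V2.8 follows the spec exactly is an assumption.

```python
def negotiate_hold_time(local: int, remote: int) -> int:
    """Return the hold time (seconds) both peers end up using.

    Hypothetical helper, not part of any router OS. Per RFC 4271 a
    proposed hold time must be 0 or at least 3 seconds, and the
    session uses the smaller of the two proposals.
    """
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local, remote)


def session_can_time_out(local: int, remote: int) -> bool:
    """A negotiated hold time of 0 disables the hold timer entirely."""
    return negotiate_hold_time(local, remote) != 0
```

With one side proposing 0 and the other 90, the result is 0, so neither side would ever time the session out on its own. That would match the behaviour described above.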

Any ideas on how to set the Hold Time and Keep Alive Time in MikroTik?

I sure hope V2.9 has a BGP module like Zebra etc… What MikroTik is offering for BGP now is very limited and probably shouldn’t even be sold as a supported feature in V2.8!

Thanks,

Brad

It is not the hold time. I noticed the MikroTik waiting / connecting / trying to connect to the upstream router even though the link/router was down. Once we disabled the interface it showed up as “not connected” and announced to the other upstream. I agree with you that the BGP is limited. Somebody has to get in and disable an interface or remove the rule so that the announcements go to the other upstream.

I think it should be easy to write a script for this. If someone can come up with one, it would at least solve the failover issue.
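A rough sketch of the decision logic such a script could use, written in Python only to show the idea. It assumes some external probe (for example a ping to the upstream router) has already produced a list of reachable/unreachable results. The function name and the threshold of 3 are hypothetical, and the step that actually disables the interface or peer is left out because it depends on the router.

```python
from typing import Iterable, Optional


def first_failover_index(probe_results: Iterable[bool],
                         threshold: int = 3) -> Optional[int]:
    """Return the index of the probe at which failover should trigger.

    probe_results holds True for a reachable upstream, False otherwise.
    Failover triggers after `threshold` consecutive failures; a single
    successful probe resets the count. Returns None if it never triggers.
    """
    consecutive_failures = 0
    for index, reachable in enumerate(probe_results):
        if reachable:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= threshold:
                return index
    return None
```

Requiring several failures in a row keeps one lost ping from flapping the announcements.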

That works for connected links, but what about when your upstream provider has routing/connection problems? You then don’t have the luxury of checking for link status.

For that, your provider needs to be running BGP properly so that they have failover, and fewer issues, as well.

That is what I wanted to say.
The BGP feature DOES NOT work well at all. Even if the upstream has failed and the session shows “connecting”, the BGP failover does not work. It should, however, work as soon as the upstream flaps…

Does not happen.

This isn’t the problem we recently encountered. One of our upstream providers went down. The local interface between them and us never dropped. We believe that because the initial BGP session was set up with a Hold Time of “0” and a Keep Alive Time of “0”, this upstream never stopped announcing our routes even though their routers knew the circuit was offline.

This is a MikroTik problem, as every other BGP peer they have sets up a Hold Time of “90” and a Keep Alive Time of “30”. If MikroTik can initially set up the BGP session with proper Hold and Keep Alive times, the upstream will know to drop the session. As of now, a “0” setting means indefinite and requires a MANUAL clearing before they will stop announcing our routes. BTW, how can you clear a session without disabling/re-enabling the entire BGP module!?!

We are just hoping, with fingers crossed, that V2.9 will have BGP support. Many people are of the opinion that V2.8 does not support BGP even though MikroTik advertises and sells it as such. :frowning:

Best,

Brad

Although you’re waiting for 2.9, there may be a lot of bugs in it when it is released. It may not be good for production use immediately. I don’t know whether this is a possibility, but I guess we could run a machine with Zebra behind the MT and have the MT learn its routes from the Zebra box. The Zebra box can be configured to do ONLY BGP with your upstreams. Zebra is more configurable, right?

OpenBGPD on OpenBSD is also good (at least it was in our tests).
Quagga is also an option.

Well, the big question is: can we use Zebra / OpenBGPD (call it ZO) along with MT, where MT is the main router and is running BGP (learning the routes sent to it by ZO)?

ZO does BGP with the two upstream ISPs. This way we get fine-grained control over our BGP using ZO, which advertises to the ISPs and to our MT, and then MT does the routing accordingly?

Sure would be easier if MikroTik just included a BGP module that worked. As I understand V2.9 will introduce many BGP features not currently available with V2.8.

Is there anyone here with MikroTik that can comment on this?

BTW, I have yet to hear back from MikroTik support about the sup-out files they asked for, which I sent them regarding these BGP problems…

Best,

Brad

routing bgp
hold-timer - if nothing is received from peer for this amount of
time, then router considers peer dead and closes the connection.
keepalive-timer - interval between keepalive messages sent to
peer

By default these are ‘0’. Changing these values should solve your problems.
:sunglasses:
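To illustrate the hold-timer description above, a small sketch of the timer logic as the BGP specification describes it (not RouterOS internals, which I have not seen). The class and method names are hypothetical. The point is that the peer is declared dead when nothing has been heard for the hold time, whether or not the local interface is still up.

```python
class HoldTimer:
    """Tracks when a BGP peer was last heard from (times in seconds)."""

    def __init__(self, hold_time: int, started_at: float) -> None:
        self.hold_time = hold_time
        self.last_heard = started_at

    def message_received(self, now: float) -> None:
        # Any KEEPALIVE or UPDATE from the peer restarts the timer.
        self.last_heard = now

    def expired(self, now: float) -> bool:
        # A hold time of 0 means the timer is never started,
        # so the session can never expire on its own.
        if self.hold_time == 0:
            return False
        return now - self.last_heard >= self.hold_time
```

For example, with a hold time of 90 and the last message heard at t=30, the session is still alive at t=100 and is considered dead at t=120. With a hold time of 0 it never expires.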

What are the ideal settings for these (other than 0)?
Let’s say we want seamless failover: what would these settings be? (We don’t have multihop.)

no such command or directory (hold-timer)
no such command or directory (keepalive-timer)

Where is it ?

/routing bgp peer add hold-timer=20

Thanks for the reply, I will try that.
Are the hold time and keepalive time in seconds?

What would be ideal values for these?

Try to use “90” Hold Time and “30” Keep Alive Time.
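The 90/30 pair follows the usual rule of thumb from the BGP specification (RFC 4271), which suggests a keepalive interval of about one third of the hold time, with both values expressed in seconds in the protocol. A minimal sketch of that arithmetic, using a made-up function name; whether RouterOS takes its values in the same units is an assumption.

```python
def suggested_keepalive(hold_time: int) -> int:
    """Return a keepalive interval (seconds) for a given hold time.

    Hypothetical helper following the one-third guideline in RFC 4271.
    A hold time of 0 disables keepalives; values of 1 or 2 are invalid.
    """
    if hold_time == 0:
        return 0
    if hold_time < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    # Never go below one keepalive per second.
    return max(1, hold_time // 3)
```

A hold time of 90 gives a keepalive of 30, so the peer can miss two keepalives in a row before the session is dropped on the third.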

/routing bgp peer add hold-timer=90
ERROR: no such argument (hold-timer)
/routing bgp peer add hold-time=90
ERROR: no such argument (hold-time)

I am using 2.8.18

Upgrade

Hello Eugene,

Thanks for your input. I also heard back from Normunds who enlightened me on the hold/keep alive settings. We are planning on trying these settings this weekend. Fingers crossed!

Best,

Brad

Upgraded to 2.8.24 BUT

/routing bgp peer add hold-timer=90 sent my system to 100% CPU load and added a 0.0.0.0 peer which is not getting removed.

HELP !!!
CPU USAGE 100% constant. The BGP session keeps dropping with one of the providers! It worked perfectly over the last 4+ months; now with the upgrade we are screwed. Can someone come up with something?