We have been using MT 2.9.x and 2.8.x at AWMN (Athens Wireless Metropolitan Network - http://www.awmn.net), and we have noticed the following:
When manually adding a new network (/24) to the BGP routing table by issuing:
routing bgp network … etc
the new route would appear on all routers (around 300 in number).
But when we tried to remove that route from the injecting MT router, it would disappear immediately from some routers, disappear after a delay from others, and in some cases reappear in the routing tables of the remaining routers (other than the injector).
In some cases it would flap between appearing and disappearing.
I’m convinced this behavior is some sort of MT bug.
Even today (3 days after the first injection and removal of the initial route), these manually injected routes still persist in the routing tables of our network, even though the injector has removed them. Of course, BGP has been restarted numerous times since then. The routes are still being advertised by other routers inside our network and so keep arriving in the routing tables.
As I noted above, we are running various versions of MT inside our network (each router announces its own C-class), ranging from 2.8.x to 2.9.24 with the routing-test package (a minority). Personally, I use 2.9.6 on my router.
Therefore it’s not a matter of a single router’s MT version, since the announced route would most likely arrive at it from a router running a different MT version.
I am still on 2.9.6 in production at the data center. I evaluated BGP on 2.9.23 and thought I had a problem with routes not being removed from the RIB (with 100,000+ routes from another MT). I didn’t get a chance to complete the testing and determine whether it was something I did or something else. I will finish this later this weekend on the 2 new routers we received today and see what’s happening.
We have sent 2 or 3 supout files related to the problem.
We can also give remote access to some routers if necessary.
We play mostly with eBGP (every node has its own AS), so there’s not any exotic configuration.
The problem seems to be in the route removal (withdrawal) messages.
Something goes wrong in this process, and we end up with BGP neighbors holding invalid routes.
Consider that we have about 300 nodes in our network.
When a node fails, its route keeps circulating all over the network.
So it’s better for us to remove routing-test, or leave it on 3-4 nodes at most, to minimise the BGP impact, and wait for a fix.
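To illustrate the kind of failure described above, here is a toy path-vector model in Python. It is only a sketch under a stated assumption: one node (marked "lossy", a hypothetical label) silently fails to send withdrawals. It is not a claim about what the actual MT bug is, and it ignores timers, policies and everything else a real BGP implementation does.

```python
from collections import deque


def simulate(links, origin, lossy=frozenset()):
    """Inject a route at `origin`, then withdraw it.

    Returns the set of nodes that still hold the route afterwards.
    Nodes in `lossy` never send withdrawals (the assumed bug).
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    rib_in = {n: {} for n in neighbors}  # node -> {neighbor: AS path}
    best = dict.fromkeys(neighbors)      # node -> advertised path or None
    originated = {origin}
    queue = deque()                      # pending (sender, receiver, path)

    def reselect(node):
        if node in originated:
            new_best = (node,)
        else:
            # Loop detection: ignore paths that already contain this node.
            usable = [p for p in rib_in[node].values() if node not in p]
            shortest = min(usable, key=lambda p: (len(p), p), default=None)
            new_best = None if shortest is None else (node,) + shortest
        if new_best == best[node]:
            return
        best[node] = new_best
        if new_best is None and node in lossy:
            return  # assumed bug: the withdrawal is never sent
        for peer in sorted(neighbors[node]):
            queue.append((node, peer, new_best))

    def drain():
        while queue:
            sender, receiver, path = queue.popleft()
            if path is None:
                rib_in[receiver].pop(sender, None)
            else:
                rib_in[receiver][sender] = path
            reselect(receiver)

    reselect(origin)    # inject the route
    drain()
    originated.clear()  # remove it at the injector
    reselect(origin)
    drain()
    return {n for n, p in best.items() if p is not None}


if __name__ == "__main__":
    chain = [("A", "B"), ("B", "C"), ("C", "D")]
    print(sorted(simulate(chain, "A")))               # healthy network
    print(sorted(simulate(chain, "A", lossy={"B"})))  # B drops withdrawals
```

In the healthy run no node keeps the route. When B drops the withdrawal, C and D keep a route that the injector removed, which is the sort of stale entry we are seeing.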
The abnormal situation persists on most routers in our network, regardless of whether any one node has downgraded or uninstalled the routing-test package. So that is not a valid argument from a troubleshooting point of view.
You can still gather traces from other routers in our network running MT (some of them run MT 2.9.24 with routing-test).
This problem does not occur on any of our routers, so we need your help anyway. So please, if any of you have this problem, write a brief problem description in an email, attach a supout.rif from your BGP router, and make sure that you are running 2.9.24 with routing-test.
This has already been done. We will try to isolate 2.9.24 routing-test to just a few routers under our control and monitor the behavior with regard to the floating “zombie” prefixes. We will give you more feedback on this.
In the meantime, please take a look at the supout.rif that we have already sent you.
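For the monitoring part, something like the following Python sketch could help spot the “zombie” prefixes. It assumes the routing tables have already been collected from each router by some means (how is out of scope here) into plain lists of prefixes. The function name, router names and prefixes are all hypothetical, for illustration only.

```python
import ipaddress


def find_zombies(tables, withdrawn):
    """Report which routers still carry prefixes that should be gone.

    tables:    {router_name: iterable of prefix strings}
    withdrawn: iterable of prefix strings removed at the injector
    Returns {prefix: sorted list of routers still holding it}.
    """
    gone = {ipaddress.ip_network(p) for p in withdrawn}
    report = {}
    for router, prefixes in tables.items():
        for p in prefixes:
            net = ipaddress.ip_network(p)
            if net in gone:
                report.setdefault(str(net), set()).add(router)
    return {prefix: sorted(routers) for prefix, routers in report.items()}


if __name__ == "__main__":
    # Hypothetical snapshot of three routers' tables.
    tables = {
        "r1": ["10.0.1.0/24", "10.99.1.0/24"],
        "r2": ["10.0.2.0/24"],
        "r3": ["10.99.1.0/24"],
    }
    print(find_zombies(tables, ["10.99.1.0/24"]))
```

Running this against snapshots taken at intervals would show whether a removed prefix disappears, lingers, or comes back on particular routers.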