[Bug Workaround] BGP Route Reflector corrupting NLRI for EVPN Prefix Routes

I'm reporting this bug here on the off chance it helps someone else who is as confused as I was. I've opened a support ticket (SUP-215684), but have not received a response since the ticket was opened on May 4th.

It appears that when RouterOS is acting as a route reflector, it mangles NLRI for EVPN routes. RouterOS fails to parse the dst-address correctly, but does forward it correctly to route reflector clients. It also fails to parse the VNI attribute correctly, and forwards it to clients incorrectly in most cases.

I haven't quite figured out the root cause of the dst-address parsing bug, but I believe I've got a pretty good idea of what's going on with the VNI parsing: RouterOS appears to be reusing its parser for MPLS labels to parse the VNI in EVPN Type-5 routes. Both are 24-bit fields, but in MPLS, only the top 20 bits represent the label, whereas the bottom four are used for metadata (I'm not sure what exactly; I'm not an MPLS expert). EVPN, on the other hand, uses the full 24 bits for the VNI. So when RouterOS receives an EVPN VNI, it seems to parse it as follows:

VNI_out = ((VNI_in >> 4) << 4) | 1
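One way to see where that formula could come from: in a labeled-unicast NLRI (RFC 3107 / RFC 8277), the 3-octet label field holds a 20-bit label in its high-order bits and the bottom-of-stack (BoS) flag in its lowest bit. The sketch below is a hypothetical model, not RouterOS code: it assumes the reflector decodes the VNI as such a label and re-encodes it with only the label value and BoS = 1. The function names are made up for illustration.

```python
# Hypothetical model of the suspected bug (assumption, not RouterOS internals):
# the 24-bit EVPN VNI is decoded as an MPLS label field and re-encoded with
# only the 20-bit label value and the bottom-of-stack (BoS) bit set.

def decode_as_mpls_label(field: int) -> tuple[int, bool]:
    """Read a 24-bit field as a 20-bit label plus a BoS flag in the lowest bit."""
    if not 0 <= field < 1 << 24:
        raise ValueError("field must fit in 24 bits")
    label = field >> 4         # high-order 20 bits
    bos = bool(field & 0x1)    # lowest bit
    return label, bos


def encode_as_mpls_label(label: int, bos: bool = True) -> int:
    """Re-encode a 20-bit label; the three bits between label and BoS stay 0."""
    return (label << 4) | int(bos)


def reflect_vni(vni: int) -> int:
    """Suspected reflector behaviour: decode as a label, re-encode with BoS set."""
    label, _ = decode_as_mpls_label(vni)
    return encode_as_mpls_label(label, bos=True)


if __name__ == "__main__":
    for vni in (1, 16, 17, 100, 10001, 10002):
        print(vni, "->", reflect_vni(vni))
```

Under this model, VNI 100 comes back as 97 and VNI 10002 as 10001, while 1, 17 and 10001 pass through unchanged, which matches the formula above.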

This transformation forces the bottom four bits of the VNI to 0b0001. In other words, any VNI received by RouterOS is rounded down to the nearest multiple of 16 and then has 1 added (for example, 100 becomes 97, and 16 becomes 17). So VNIs like 1, 17, 33, 10001, etc., which already satisfy VNI mod 16 = 1, are reflected correctly (testing confirms this). Any other VNI is corrupted by the route reflector.
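If you're planning VNIs around this, a small check helps. The sketch below is a hypothetical helper (names are illustrative, not from any MikroTik tool): it tests whether a VNI is of the form n*16+1 and, if not, suggests the next VNI at or above it that would survive reflection, assuming the mangling behaves exactly as described above.

```python
# Hypothetical helper for the n*16+1 workaround; assumes the reflector only
# mangles the low four bits of the VNI, as described above.

MAX_VNI = (1 << 24) - 1  # VNIs are 24-bit values


def survives_reflection(vni: int) -> bool:
    """True if VNI mod 16 == 1, i.e. its low four bits are already 0b0001."""
    return vni % 16 == 1


def next_safe_vni(vni: int) -> int:
    """Smallest VNI >= vni of the form n*16 + 1."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must be a 24-bit value")
    # ceil((vni - 1) / 16); also yields n = 0 (VNI 1) when vni == 0
    n = (vni - 1 + 15) // 16
    candidate = n * 16 + 1
    if candidate > MAX_VNI:
        raise ValueError("no safe VNI at or above this value")
    return candidate


if __name__ == "__main__":
    planned = [10001, 10010, 20000]
    remap = {v: next_safe_vni(v) for v in planned if not survives_reflection(v)}
    print(remap)
```

For that example list, 10001 is left alone, while 10010 would be remapped to 10017 and 20000 to 20001.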

I had hoped that MikroTik would be able to resolve this quickly, but seeing as I have not received a reply, I'm posting it here in case anyone else is in need of the workaround. Note that I am aware that RouterOS does not implement L3 EVPN data-plane features, and this is 100% acceptable (my leaves are not MikroTik devices). However, I feel that it should at least faithfully work as a route reflector, and at an absolute minimum should drop unsupported routes, ideally with a warning, rather than silently corrupting them.

Nice find! That n*16+1 workaround seems to be working fine.

Glad it could help you out, and good to know others are experimenting with EVPN on MikroTik. I'll keep you posted if I hear anything back from support, but so far, still nothing since I sent this a week ago:

Hello,

Could you please let me know:

  1. If you have been able to reproduce this bug
  2. If the root cause I proposed is correct
  3. If the bug is in-scope for being fixed
  4. (Very approximately) how long before a fix is worked on, from a prioritization perspective? Am I looking at weeks, months, or years?

If this bug is not in-scope or likely to be fixed in the near future, I can plan to use external devices as route reflectors instead, although this does introduce additional points of failure. I would appreciate some communication so that I can plan what to do going forward.

BTW, did you manage to make BGP unnumbered work with EVPN?

I haven't tried BGP unnumbered; I'm doing iBGP between loopbacks with an OSPF underlay. I remember briefly looking into using either unnumbered OSPF or OSPFv3 with IPv6 link-locals to distribute IPv4 routes, but couldn't (quickly) get that working. I don't remember if that was an issue with RouterOS or FRR, though.

This will be fixed in v7.24beta