Problems with MPLS IPv4 VPN

Hi

I have a new CCR1016-12G which I would like to use as part of my MPLS network. I attached it to a Brocade XMR via VLAN 2006. On this interface I have enabled OSPF and LDP. I set up two BGP sessions to my VPNv4 route reflectors and configured a VRF on the CCR and on the two Cisco routers.
On the CCR I have an IPIP tunnel in this VRF, and on the Cisco I have a loopback interface in the VRF.

I can sometimes ping from a RouterBoard behind the CCR to the loopback on the Cisco. It usually works for 10-20s, and then it stops working. Sometimes it spontaneously starts working again for 10-20s. I can force it to work when I change anything in the routing table on the CCR, e.g. adding/removing or disabling/enabling a static route, or disabling/enabling one of the BGP sessions.
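For anyone trying to reproduce this, the BGP-session variant of the workaround amounts to something like the following on the CCR. It is only a sketch: `RR1` is a hypothetical peer name standing in for whichever VPNv4 route-reflector session you have configured.

```routeros
# RR1 is a hypothetical peer name; substitute your own route-reflector peer
/routing bgp peer disable [find name=RR1]
/routing bgp peer enable [find name=RR1]
# Confirm the session is established again before re-testing the ping
/routing bgp peer print status
```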

Does anyone have a similar setup or similar troubles? Any hints are highly welcome.
Attachment: Mikrotik-NetworkMap.jpg

Hi Crami

I have a similar setup.

Please email support@mikrotik.com with your problem and a supout.rif; they are actively working on problems with L3VPN at the moment.

Hi All,

I was about to make a post on this very issue.

I am running 6.0rc14 on RB1200 and RB750 platforms. (For some reason the 6.0 candidate release crashes regularly; 6.0rc14 less so.)

Having come from Cisco (15+ years) and Juniper (2+ years) to Mikrotik (6+ months), I would have thought that establishing a basic back-to-back MPLS VPN between two routers using loopbacks, with a couple of “LAN” computers simulating a station at each site, would be a no-brainer. After some time, however, I have come to consider this feature of RouterOS faulty!

I’ve run my config by a couple of MT engineers who also think this should be pretty straightforward, but their experience with these features is primarily in 5.x. I would be interested to hear your thoughts.

I have:

  • A P2P GRE tunnel between two Mikrotiks; let's call them PE1 and PE2
  • Each PE has its own Loopback0 IP (PE1 10.240.2.1, PE2 10.240.2.2)
  • OSPF (P2P) running on the GRE, with Loopback0 as passive
  • LDP running on the GRE, with Loopback0 as the source
  • VPNv4 BGP running between the loopbacks of PE1 and PE2 for VRF prefix distribution
  • A single LAN interface in VRF2256 on each PE
  • PE1’s in-VRF IP is 192.168.26.254/24 and PE2’s is 192.168.127.254/24

When I apply the config to a new router straight from copy & paste, everything comes up as expected. OSPF exchanges loopbacks, BGP comes up, label distribution is good and the MPLS forwarding table is good. The BGP prefix-to-label association is correct and the paths are working. The device on LAN1 (192.168.26.150) can ping the device on LAN2 (192.168.127.150) and the world is a happy place. A no-brainer!

However, after an arbitrary period of time the pings between the two computers stop. The tables on the Mikrotik are unchanged with respect to routes and labels, and the debug log shows nothing untoward at the moment the ping stops dead in the water. The longest I have had this working is 965 pings, but it almost always dies after 5-20 successful pings.

Like you, I find that changing anything to do with the route table causes the MPLS VPN to start working again, even though the change has no relationship to the traffic flowing through the router. For example, I add a Loopback2256 into VRF2256 on a PE. Adding a loopback to the VRF on the PE has no bearing on the existing labels and prefixes used by the LAN PCs pinging each other, and there is no visible change to any table, yet MPLS starts working again, briefly. When it stops, you can do something similar: anything that pokes the routing table appears to make it work again momentarily. It’s as if the MPLS table says it is doing what it should, but it isn’t.

See the attached screenshot (MPLS.jpg).
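For reference, the loopback “poke” described above looks roughly like this on PE2. This is a sketch: the interface name Loopback2256 comes from the post and the VRF is matched by the routing mark C2256 from the PE2 config, but the address 10.255.22.56/32 is a hypothetical placeholder.

```routeros
# Create an empty bridge to act as a loopback inside the VRF
/interface bridge add name=Loopback2256
# Hypothetical address; any unused /32 would do
/ip address add address=10.255.22.56/32 interface=Loopback2256
# Add the new interface to the VRF alongside the existing LAN interface
/ip route vrf set [find routing-mark=C2256] interfaces=ether1,Loopback2256
```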

Config for PE2:

/routing bgp instance
set default as=65501 router-id=10.240.2.2
/routing ospf instance
set [ find default=yes ] mpls-te-area=backbone mpls-te-router-id=Loopback0 router-id=10.240.2.2
/ip address
add address=192.168.127.254/24 interface=ether1 network=192.168.127.0
add address=10.240.2.2/32 interface=Loopback0 network=10.240.2.2
add address=192.168.5.81/24 interface=ether5 network=192.168.5.0
add address=10.200.2.6/30 interface=AGGR1-GRE network=10.200.2.4
/ip dhcp-server network
add address=192.168.127.0/24 dns-server=8.8.8.8 gateway=192.168.127.254 netmask=24 ntp-server=0.0.0.0 wins-server=0.0.0.0
/ip route vrf
add export-route-targets=65501:2256 import-route-targets=65501:2256 interfaces=ether1 route-distinguisher=65501:2256 routing-mark=C2256
/mpls
set propagate-ttl=no
/mpls ldp
set enabled=yes lsr-id=10.240.2.2 transport-address=10.240.2.2 use-explicit-null=yes
/mpls ldp interface
add interface=AGGR1-GRE transport-address=10.240.2.2
/routing bgp instance vrf
add redistribute-connected=yes routing-mark=C2256
/routing bgp peer
add address-families=vpnv4 name=PE2 remote-address=10.240.2.1 remote-as=65501 route-reflect=no ttl=default update-source=Loopback0
/routing ospf interface
add interface=Loopback0 network-type=broadcast passive=yes
/routing ospf network
add area=backbone network=10.200.0.0/16
add area=backbone network=10.240.0.0/16
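When the pings stop, it can help to capture the control-plane and forwarding state on both PEs at that moment, to compare against a working state and to send to support along with the supout.rif. A minimal set of v6 commands, using the names and addresses from the PE2 config and setup above, might be:

```routeros
# LDP session to PE1 over the GRE tunnel
/mpls ldp neighbor print
# Labels installed for the loopbacks and VPN prefixes
/mpls forwarding-table print
# VPNv4 session state and the prefixes/labels learned from PE1
/routing bgp peer print status
/routing bgp vpnv4-route print detail
# Routes installed in the VRF routing table
/ip route print detail where routing-mark=C2256
# Test from inside the VRF, sourced from PE2's LAN address
/ping 192.168.26.150 routing-table=C2256 src-address=192.168.127.254 count=5
```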

Hi Guys,

L3VPN is currently unusable on RouterOS, and has been since at least 5.0rc days.

There are multiple issues with L3VPN on RouterOS:

  1. BGP route withdrawals are received but not acted upon on L3VPN endpoints (resulting in stale routes)
  2. Loopback interfaces within a VRF are reachable from within other VRFs on the same router.
  3. Routing engine crashes on L3VPN endpoints
  4. Redistribution of “Other BGP” in a VRF does not work

Issues 1 and 2 have been there since the 5.0rc days, possibly since the mpls-test days… 3 and 4 appear to be more recent regressions.
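A quick way to check issue 2 on a PE is to ping a loopback address that lives inside one VRF from a different routing table; with working VRF isolation that ping should fail. The routing marks vrfA and vrfB and the loopback address 10.99.0.1 below are hypothetical placeholders.

```routeros
# 10.99.0.1 is a hypothetical loopback address configured inside vrfA
# Control test: from vrfA itself, replies are expected
/ping 10.99.0.1 routing-table=vrfA count=3
# From a different VRF: with correct isolation, this should time out
/ping 10.99.0.1 routing-table=vrfB count=3
```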

Please email support@mikrotik.com with Attn:Maris in the subject, include detailed descriptions of your problem and supout.rif from all routers involved.

Mikrotik are actively working on these issues at the moment, and the more information they get the quicker it will be fixed.

That is very disappointing !

I used the MUM slideshows and the Mikrotik Manual as my guide for doing this on RouterOS. Aside from the fact that the most recent version I can find covers 3.17 or thereabouts, it all seems to work, and it even cites “no failures here” with pings going back and forth. There are no caveats/gotchas or any suggestion that this is broken, which it obviously is :frowning:

I then read Greg Sowell’s blog entry “Why is Mikrotik pushing MPLS” and figured, well, if it worked in 3.17 surely it would work now. Bit the dust on that one, didn’t I, haha! :slight_smile:

Without sounding too stupid, does Mikrotik publish a list of known caveats for each release, like Cisco/Juniper do, so that at least when one sees a knob that should work but is fundamentally broken, this is identified in the release?

I read the changelogs, and they are great for showing what is resolved, but they don’t show what remains broken (despite the knobs being there, giving the appearance that the feature should work for more than 30 seconds). I like to read over the known caveats list before I go jumping in.

I spent a fair bit of time bashing on this before I reached the conclusion “this is just outright broken!”, so, as a newcomer who has endured a baptism of fire, hearing that it is a known issue and has been for a long time is a bit painful. Any suggestions on how to avoid a similar experience in the future? Is there a definitive reference I should be looking at when considering which features work and which don’t in a given release?

This appears to have been ongoing for some time; what kind of timeline does Mikrotik have for a working MPLS VPN, given they are pushing “Cloud Core Routers”? I’m used to Cisco, who would consider this a serious issue and would not push a new release out (citing a bunch of fixes) while such a fundamental aspect of the code remains accessible to the user, yet non-functional due to immaturity.

I’m not expecting Mikrotik to be Cisco, I’m just trying to reset my expectations based on where Mikrotik is at so I don’t end up bashing my head on something which I discover was always broken :frowning:

Completely agree. We want to deploy our first 100% Mikrotik L3VPN network using the current CCR as well as the new -2S+ model, but until these issues are resolved we cannot.

I have raised the issues I have discovered with support multiple times over the past few years and have previously been told “All problems will be fixed in the new routing”, but never got a timeline for when this would make it into RouterOS :frowning: While we have been able to successfully use L2VPN and static VPLS, L3VPN has been completely unusable due to the stale routes issue.

The ticket I logged last week is the first time they have actively engaged with me to work through to a resolution. So we are now heading towards the possibility that this bug will finally be fixed :slight_smile:

Please email them, as the only way we will get the problems resolved is to work with Mikrotik support through their official support mechanisms, which is currently email.

I sent an email to support@ this morning but have not heard anything yet. Let’s see what they come up with.

I sent an email to support about this last night too.
With v6.1, L3VPN stops working after a very short time: LDP is running, the MPLS forwarding table is OK, and the BGP routes are still there. It seems, at least in my tests, that the VRF is “leaking”.

My setup:
ether1 - MPLS interface
ether2 - VRF 2:20, IP 10.12.0.1/24

My test:
I had a PC connected to ether2 with IP 10.12.0.10 pinging 8.8.8.8, and when the issue occurred I sniffed traffic on ether1 (i.e. tool sniffer quick interface=ether1 ip-address=10.12.0.0/24). That should not return anything, since I’m running L3VPN, but it did.

My observation:
Basically the packets are IP routed instead of having a label pushed and being forwarded as MPLS packets.
An interesting thing is that I didn’t have a route matching 8.8.8.8 in the main routing table, so it can’t be using that.

Can anyone verify this and see if v6 is forwarding L3VPN-packets as IP instead of MPLS?

Hi Norpan,

Your issue is different from the issue that has been discussed here, but being a problem related to MPLS is still quite interesting.

Have you looked at the packets “outside” the Mikrotik, e.g. captured them with wireshark on a mirror port? That would be really interesting if it is indeed forwarding packets unlabeled.
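If a mirror port isn’t available, RouterOS can also write a capture to a file on the router, which can then be downloaded and opened in Wireshark, where labelled and unlabelled frames are easy to tell apart. Bear in mind that this captures on the router itself rather than on the wire, so it is not quite the same test as an external mirror port. A sketch, with `mpls-test` as a hypothetical file name; the sniffer property names have changed between RouterOS releases, so check them with `/tool sniffer print` on your version first.

```routeros
# Capture traffic on the MPLS-facing interface into a file
/tool sniffer set filter-interface=ether1 file-name=mpls-test
/tool sniffer start
# ... reproduce the failure, then stop the capture
/tool sniffer stop
# Download mpls-test from /file and open it in Wireshark
```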

Well, it may or may not be the same issue, but bearmeister’s description of his problem fits exactly with what I have discovered. A ‘poke’ in the routing table makes things work again, but only for a short while.
I have just now set up a lab with two routers in VirtualBox, and a basic L3VPN where the problem shows up almost instantly.

  • R2: ping routing-table=vrf1_1 src-address=192.168.2.1 192.168.1.1
  • R1: tool sniff quick interface=ether1 ip-address=192.168.2.0/24

And when the ping dies with timeout, I get hits on the capture.

R1

/interface bridge
add name=lo0
/ip address
add address=1.1.1.1/32 interface=lo0 network=1.1.1.1
add address=10.1.1.1/24 interface=ether1 network=10.1.1.0
add address=192.168.1.1/24 interface=ether2 network=192.168.1.0
/ip route vrf
add export-route-targets=1:1 import-route-targets=1:1 interfaces=ether2 \
    route-distinguisher=1:1 routing-mark=vrf1_1
/mpls interface
set [ find default=yes ] mpls-mtu=1500
/mpls ldp
set enabled=yes lsr-id=1.1.1.1 transport-address=1.1.1.1
/mpls ldp interface
add interface=ether1
/routing bgp instance vrf
add redistribute-connected=yes routing-mark=vrf1_1
/routing bgp peer
add address-families=vpnv4 name=R2 remote-address=2.2.2.2 remote-as=65530 \
    update-source=1.1.1.1
/routing ospf interface
add interface=lo0 passive=yes
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=1.1.1.1/32
/system identity
set name=R1

R2

/interface bridge
add name=lo0
/ip address
add address=2.2.2.2/32 interface=lo0 network=2.2.2.2
add address=10.1.1.2/24 interface=ether1 network=10.1.1.0
add address=192.168.2.1/24 interface=ether2 network=192.168.2.0
/ip route vrf
add export-route-targets=1:1 import-route-targets=1:1 interfaces=ether2 \
    route-distinguisher=1:1 routing-mark=vrf1_1
/mpls interface
set [ find default=yes ] mpls-mtu=1500
/mpls ldp
set enabled=yes lsr-id=2.2.2.2 transport-address=2.2.2.2
/mpls ldp interface
add interface=ether1
/routing bgp instance vrf
add redistribute-connected=yes routing-mark=vrf1_1
/routing bgp peer
add address-families=vpnv4 name=R1 remote-address=1.1.1.1 remote-as=65530 \
    update-source=2.2.2.2
/routing ospf interface
add interface=lo0 passive=yes
/routing ospf network
add area=backbone network=10.1.1.0/24
add area=backbone network=2.2.2.2/32
/system identity
set name=R2
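To pin down exactly when the VRF path dies (and to correlate it with the debug log), R2 can log the result of a VRF ping every few seconds. A minimal sketch of v6 scripting, assuming the lab addressing above; the script/scheduler name vrf-probe and the 10s interval are arbitrary, hypothetical choices. In a script, `/ping` with `count=` returns the number of replies received.

```routeros
# Runs on R2: ping R1's VRF address from inside vrf1_1 and log the outcome
/system script add name=vrf-probe source={
    :local replies [/ping 192.168.1.1 routing-table=vrf1_1 src-address=192.168.2.1 count=3]
    :if ($replies = 0) do={
        :log warning "vrf-probe: no replies from 192.168.1.1 in vrf1_1"
    } else={
        :log info "vrf-probe: $replies of 3 replies from 192.168.1.1 in vrf1_1"
    }
}
# Run the probe every 10 seconds
/system scheduler add name=vrf-probe interval=10s on-event=vrf-probe
```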

Hi Norpan,

Sorry I did not read your post completely. It does sound related.

Hopefully Mikrotik can fix these issues quickly :slight_smile:

@nz_monkey: That’s OK, I was not sure at first either, but the more I look at it and troubleshoot, the clearer it gets. At least that’s what I’m telling myself. :wink:
I just hope that MT support can acknowledge the problem and figure it out, and get it patched really soon…

I did another setup last night with two RB750s running the config I posted earlier, with the same result: almost instantly the ping over the VRF fails.
With some poking around with routing and/or LDP I can get it running again, sometimes for a few seconds, and a few times it felt stable. But after a reboot the problem comes back.
So now I have the same issue on Tile, x86 and MIPSBE, which tells me that it’s not architecture related. I have not yet tried any rc or beta releases to see whether it was introduced in a specific version, but 6.0 and 6.1 behave the same.

From my point of view, MPLS L3VPN (at the PE) is broken in v6, which means that CCRs can’t be used at all.

Or is there anyone running v6 on a PE-router and it’s stable? If so, which version are you running?

Testing with 6.0beta2 now, definitely more stable than 6.1. :open_mouth:

v6.0rc2 worked for over an hour; that has never happened for me with 6.0/6.1.
Unfortunately I managed to kill one of the routers during my upgrade/downgrade, so my testing is done for now.

Summary of today’s testing: versions up to 6.0rc13 are stable so far.
From v6.0rc14 onwards the issue occurs, and MT support has been able to reproduce it in their tests.

Hoping for a patch to come out soon.

Is it normal that MT does not reply to support emails? This was my first request; just wondering.

From what I can recall, they have at least answered within one working day. Haven’t you had a response at all?

If you still have the issues in your first post, can you try with 6.0rc13 to see if it works better?

If you don’t have it, here is a link to the torrent for all architectures:
http://www.mikrotik.com/download/routeros-ALL-6.0rc13.torrent

edit: it doesn’t have to be rc13, anything pre-6.0rc14 should work better. :slight_smile:
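For anyone who hasn’t downgraded a RouterBoard before, the v6 procedure is roughly: take a backup, upload the older packages, then ask RouterOS to downgrade to them. A sketch; the backup name pre-downgrade is a hypothetical choice, and since one router has already been lost during upgrade/downgrade testing in this thread, keep a way to recover (e.g. Netinstall) at hand.

```routeros
# Keep a backup of the current configuration first (hypothetical name)
/system backup save name=pre-downgrade
# Upload the 6.0rc13 .npk package(s) for your architecture (Winbox/FTP), then check they are there
/file print
# Downgrade to the uploaded packages; the router asks for confirmation and reboots
/system package downgrade
```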

Norpan, I think your issue is different from my primary issue. The issue I have is stale routes within a VRF, i.e. a withdrawal is received by the PE router but the route is never actually removed from the FIB. This bug has been in RouterOS since at least the 5.0rcs.

In my lab I have 6.1 as a route originator, and 5.12 as well as 6.0 and 6.1 as PE devices and the issue occurs on all 3 versions.
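To confirm the stale-route symptom on a PE, compare what BGP shows against what is installed in the VRF after the originating router withdraws a prefix. A sketch, with a hypothetical withdrawn prefix 192.0.2.0/24 and a hypothetical VRF routing mark vrf1:

```routeros
# After the withdrawal, the prefix should no longer appear in the VPNv4 table...
/routing bgp vpnv4-route print where dst-address=192.0.2.0/24
# ...nor in the VRF routing table; an entry that lingers here is the stale route
/ip route print detail where routing-mark=vrf1 dst-address=192.0.2.0/24
```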

I think it depends on:

  • How busy they are
  • The subject of the issue
  • How major the problem is
  • If it is related to the “new & cool” feature

For some RouterOS components, e.g. MPLS, it seems there are fewer people in the Mikrotik support team focusing on them; I have only ever received responses from Maris for anything to do with MPLS, whereas for most other components I have had interactions with a bunch of different Mikrotik staff.

I have generally found that if I log a ticket about a “new & cool” feature I will get a response quickly and any bugs get fixed quickly. If I log a ticket about something that is neglected and/or complex, e.g. MPLS or HWMP, the response is generally slower.