Playing with VRFs - what am I doing wrong?

Hello all,
I am doing some experiments with VRFs to implement an automatic failover between 2 ISPs and things are mostly working but I have some weird issues for which I would appreciate some help.

The context is as follows:

  • I have 2 ISPs
  • Both ISP-provided routers have the same IP 192.168.1.1
  • My LAN network is also 192.168.1.0/24

Since all of this is using the same subnet I have put each ISP router in a separate VRF:

  • The bridge IP address is 192.168.1.201
  • The router of the first ISP is connected to ether1, ether1 has the IP 192.168.1.202 and is connected to a dedicated VRF
  • The router of the second ISP is connected to ether2, ether2 has the IP 192.168.1.203 and is connected to another dedicated VRF

Then I have a netwatch script that changes the priority of my default route to go through one ISP or the other to implement the failover, but it’s not that part that I want to focus on for now.

Things are mostly working BUT I have the following issues:

  • I can only access the Internet from the LAN if the 2 IPs of the Mikrotik router in the 2 separate VRFs are assigned as /32 - that should not be the case. If I assign /24 addresses then I can ping google from the Mikrotik console but not from the LAN.
  • Checking for updates from the Mikrotik does not work, it cannot do the DNS resolution.



Below is my config - I removed the part relevant to netwatch because for now it’s not the issue:
# model = RB3011UiAS
/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN
/ip vrf
add comment=vrf_starlink interfaces=ether2 name=vrf_starlink
add comment=vrf_orange interfaces=ether1 name=vrf_orange
/interface bridge port
add bridge=bridge comment=defconf interface=ether3
add bridge=bridge comment=defconf interface=ether4
add bridge=bridge comment=defconf interface=ether5
add bridge=bridge comment=defconf interface=ether6
add bridge=bridge comment=defconf interface=ether7
add bridge=bridge comment=defconf interface=ether8
add bridge=bridge comment=defconf interface=ether9
add bridge=bridge comment=defconf interface=ether10
add bridge=bridge comment=defconf interface=sfp1
/interface list member
add comment=defconf interface=bridge list=LAN
add comment=defconf interface=ether1 list=WAN
add interface=ether2 list=WAN
/ip address
add address=192.168.1.201/24 comment=defconf interface=bridge network=192.168.1.0
add address=192.168.1.202 comment=ip_vrf_orange interface=ether1 network=192.168.1.1
add address=192.168.1.203 comment=ip_vrf_starlink interface=ether2 network=192.168.1.1
/ip dns
set allow-remote-requests=yes servers=9.9.9.9
/ip dns static
add address=192.168.1.201 comment=defconf name=router.lan
/ip firewall filter
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP" protocol=icmp
add action=accept chain=input comment="defconf: accept to local loopback (for CAPsMAN)" dst-address=127.0.0.1
add action=drop chain=input comment="defconf: drop all not coming from LAN" in-interface-list=!LAN
add action=accept chain=forward comment="defconf: accept in ipsec policy" ipsec-policy=in,ipsec
add action=accept chain=forward comment="defconf: accept out ipsec policy" ipsec-policy=out,ipsec
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related hw-offload=yes
add action=accept chain=forward comment="defconf: accept established,related, untracked" connection-state=established,related,untracked
add action=drop chain=forward comment="defconf: drop invalid" connection-state=invalid
add action=drop chain=forward comment="defconf: drop all from WAN not DSTNATed" connection-nat-state=!dstnat connection-state=new in-interface-list=WAN
/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN
/ip route
add disabled=no distance=2 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf_orange routing-table=main suppress-hw-offload=no
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf_starlink routing-table=main suppress-hw-offload=no
add disabled=no distance=1 dst-address=192.168.1.0/24 gateway=bridge routing-table=vrf_orange suppress-hw-offload=no
add disabled=no distance=1 dst-address=192.168.1.0/24 gateway=bridge routing-table=vrf_starlink suppress-hw-offload=no
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf_starlink routing-table=vrf_starlink suppress-hw-offload=no
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf_orange routing-table=vrf_orange suppress-hw-offload=no
/system identity
set name=rb3011

Thanks for your help.

Thinking about it, maybe the symptoms are indicative of the fact that the traffic from the LAN into the VRFs is not NATed.

Not sure what NAT rule I need to implement for that.

If you are using a VRF, the firewall interface matcher does not see the real interface; it always sees the VRF interface…

Would something like:
/interface list member
add interface=vrf_starlink list=WAN
add interface=vrf_orange list=WAN

work?

There have been some changes in some version of 7.14 that may allow this:
http://forum.mikrotik.com/t/vrf-routing-issue-on-7-14/174016/1

or:
/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN
add action=masquerade chain=srcnat comment="myconf: vrfmasquerade" ipsec-policy=out,none out-interface=vrf_starlink
add action=masquerade chain=srcnat comment="myconf: vrfmasquerade" ipsec-policy=out,none out-interface=vrf_orange
?

Probably yes but if you have multiple interfaces in the same vrf, you’ll be natting between those interfaces too. Maybe you should also use an IP matcher.

I tried that but it’s even worse:

[admin@rb3011] > /ping 8.8.8.8 vrf=vrf_orange 
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                                         
    0                                                              packet rejected                                                                                
    1                                                              packet rejected                                                                                
    2                                                              packet rejected                                                                                
    sent=3 received=0 packet-loss=100%

I can see the NAT is working:

Masq srcnat: in:bridge out:ether2, connection-state:new src-mac xx:xx:xx:xx:xx:xx, proto TCP (SYN), 192.168.1.104:56184->151.101.129.140:443, len 60

My assumption is that the return traffic is not routed using the correct routing table. It works when using /32 interface addresses because this creates a point-to-point link.

But don’t you already have these as “return routes”?:

add disabled=no distance=1 dst-address=192.168.1.0/24 gateway=bridge routing-table=vrf_orange suppress-hw-offload=no
add disabled=no distance=1 dst-address=192.168.1.0/24 gateway=bridge routing-table=vrf_starlink suppress-hw-offload=no

Indeed, and these routes are absolutely necessary, otherwise nothing works. This is also why I need to assign a /32 mask to the IP address in the VRF, otherwise I end up with conflicting routes: 192.168.1.0/24 is automatically added as a dynamic route, so I cannot “leak” another route to 192.168.1.0/24 in the main routing table.

I did some tests without the return routes, using some firewall mangle rules instead, and I got some promising results, although I could not manage to make it work yet.

Hello,
I am also trying to use VRFs, and I have a similar problem.
For testing in my lab I use 3 different subnets:
LAN = 192.168.88.0/24
WAN1 (VRF1) = 10.1.1.0/24
WAN2 (VRF2) = 192.168.89.0/24

I attach the rsc.
For now I have set up the DHCP server to hand out 8.8.8.8 and 1.1.1.1, because handing out 192.168.88.1 does not work.

All clients in the LAN work correctly, and NAT is working.
Netwatch works fine. Later I will set up a rule to disable the mangle rules or change the distance.

My only problem is that the router itself cannot resolve names…
Pinging an IP address works:

[admin@MikroTik] > ping 1.1.1.1                            
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                                     
    0 1.1.1.1                                    56  57 10ms895us 
    1 1.1.1.1                                    56  57 10ms422us 
    sent=2 received=2 packet-loss=0% min-rtt=10ms422us avg-rtt=10ms658us max-rtt=10ms895us

Specifying the VRF also works:

[admin@MikroTik] > ping 1.1.1.1 vrf=vrf1
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                                     
    0 1.1.1.1                                    56  57 10ms831us 
    1 1.1.1.1                                    56  57 9ms938us  
    sent=2 received=2 packet-loss=0% min-rtt=9ms938us avg-rtt=10ms384us max-rtt=10ms831us 
[admin@MikroTik] > ping 1.1.1.1 vrf=vrf2
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                                     
    0 1.1.1.1                                    56  54 41ms812us 
    1 1.1.1.1                                    56  54 56ms234us 
    sent=2 received=2 packet-loss=0% min-rtt=41ms812us avg-rtt=49ms23us max-rtt=56ms234us

But name resolution does not work:

[admin@MikroTik] > ping www.google.com
invalid value for argument address:
    invalid value of mac-address, mac address required
    invalid value for argument ipv6-address
    while resolving ip-address: could not get answer from dns server

Of course, I cannot update packages either…

I tried with and without the latest mangle rules (output)…

To repeat, the only problem is that DNS on the MikroTik itself does not work…
I also tried telnet from the MikroTik, but it does not connect…

Can you help me?
Thank you
export.rsc (6.52 KB)

I tried wrapping my head around that, and I cannot clearly make out how a routing decision is made in that setup.
A packet from “LAN” (src 192.168.1.x) goes out to “WAN” (either VRF “starlink” or “orange”)… what would the return path look like when everything is 192.168.1.0/24? I know VRFs are a means to resolve IP overlapping, but I still cannot see how the router is able to decide the return path from WAN back out of either VRF.

@aleab
The issue you are having is a known one.
There is no support for DNS in VRFs (yet; seemingly things are in the works).

I have a similar setup. In my case I “reversed” the VRF, putting it on the LAN side, so that the interfaces on the WAN side are in “main” and thus DNS works normally.

See this thread:
http://forum.mikrotik.com/t/attempting-to-evolve-from-cavemans-failover/170048/59
starting from here (plain vrf’s):
http://forum.mikrotik.com/t/attempting-to-evolve-from-cavemans-failover/170048/59
up to here:
http://forum.mikrotik.com/t/attempting-to-evolve-from-cavemans-failover/170048/59 (the “reversed” vrf configuration)

@spippan
Check the same links above. It does work, but we (at least myself) don’t really know exactly why. The key is having the static route(s) to the ISP modem(s) as /32 and return route(s) added to the VRF tables; in “main” the return route (LAN side) is added automatically (it shows up as DAC in /ip route print).
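
As a rough illustration of that recipe, here is a minimal sketch for one ISP. It reuses the names and addresses from the original poster's export (ether1, vrf_orange, bridge, 192.168.1.202), where the /32 towards the modem comes from the address entry rather than from a separate static route. It only restates the pieces described in this thread under those assumptions and is not a verified configuration:

```routeros
# Address in the VRF as a /32, pointing at the ISP modem (network=192.168.1.1)
/ip address
add address=192.168.1.202/32 interface=ether1 network=192.168.1.1
/ip route
# Return route towards the LAN, added to the VRF table
add dst-address=192.168.1.0/24 gateway=bridge routing-table=vrf_orange
# Default route in "main" whose gateway is resolved inside the VRF
add dst-address=0.0.0.0/0 gateway=192.168.1.1@vrf_orange routing-table=main
```

The same three entries would be repeated for the second ISP with its own interface, address and VRF name.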

OK, thank you.
I’m reading your link now.

big thank you!!!

thanks @jaclaz i’ll go through that … curious about that.

Maybe I’m missing something here… But what is the point of using VRFs for ISP failover? VRFs have nothing to do with “automatic failover”. Failover works without VRFs, so layering VRFs on top of failover mechanisms just makes the config even more complex.

all address spaces in this setup are in 192.168.1.x according to OP

The point is about having multiple ISP routers pre-set to the SAME IP address (usually 192.168.1.1) that you cannot modify (either because the routers themselves are not accessible or because you have some devices on the network with 192.168.1.1 set as gateway that as well cannot be changed or - to be changed - need to wait several hours or days for an intervention either remote or on site, that BTW may or may not be free).

In my particular case (and in my simplicity, caveman but attempting to evolve) I am now using an Ax Lite as a “transparent device”, i.e. it has 192.168.1.1 on the LAN side and connects to other devices (ISP routers) that also have 192.168.1.1. While it provides a failover feature, it can be bypassed at any time by simply taking the ethernet cable coming from the switch/network out of its LAN port and inserting it directly into a LAN port of the (chosen) router.

Only for the record, I have as a side-side project an alternative configuration with the three ports to the three ISP modems bridged with the network, with two of the three ports disabled/temporarily removed from bridge.
When the internet connection is not working, I can disable/remove the current “towards modem” port and enable/add another one. It needs a gratuitous ARP to update the MAC of the device with 192.168.1.1 in a timely manner. Manually it works, but I still have to find the time to better study the scripting syntax and produce a (even if half-@§§ed) working script to automate the failover.
Moreover, at the time I set up GNS3 on a spare PC, and now I am having issues installing GNS3 on the laptop I am using; until I find a way to do so, I won’t be able to make progress with this approach.

Sure, there are use cases for VRFs.

I was more just saying that having multiple identical subnets is allowed without VRFs. It does mean the default route 0.0.0.0/0 needs to be %-qualified, so gateway=192.168.1.1%etherX-toWAN-Y.

Failover happens by using check-gateway=ping (or more complex netwatch/recursive routing approaches) on the primary route with distance=1, while the backup route gets distance=2. The fact that they have the same subnet should not matter on the WAN side, but the interface-qualified % is needed (which should be added by the DHCP client automatically).
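
A minimal sketch of what that could look like, assuming ether1 and ether2 are the two WAN ports (the interface names are placeholders for this example) and that both ISP routers sit at 192.168.1.1. Note that check-gateway=ping only probes the gateway itself, so it would not notice an outage further upstream:

```routeros
/ip route
# Primary: preferred while the gateway answers pings
add dst-address=0.0.0.0/0 gateway=192.168.1.1%ether1 distance=1 check-gateway=ping
# Backup: takes over when the primary route becomes inactive
add dst-address=0.0.0.0/0 gateway=192.168.1.1%ether2 distance=2
```

With statically configured WAN addresses the %interface suffix has to be typed by hand as shown here.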

If the LAN side is also on 192.168.1.x/24? :confused:

This is where I always fail to get a proper answer.

I have always thought (and I may well be very wrong) that if the LAN side interface is 192.168.1.x/24 the other interface(s) can be either:

  1. in the same 192.168.1.x/24 and then the device needs to be setup as bridge/switch
    or
  2. in another range and then the device needs to be setup as router

If you have time/will can you post a more complete example?

While you can generally pick your own LAN-side subnet to NOT conflict (further), and avoid these esoteric RouterOS questions… let’s assume the LAN absolutely has to be 192.168.1.1 and the two WANs have to be 192.168.1.1… AFAIK that too should be fine without VRFs. Now where you MIGHT run into trouble is the firewall… basically any IP-based matcher (wherever in filter/mangle/nat/address-list) always needs to specify SOME interface-based match as well, since the IP alone is not unique.

In terms of example, I’m more a Layer3 purist so each subnet should be unique so everything is routable across a larger network. When that’s not possible, another approach is to use “netmap” action in NAT to essentially remap some 192.168.1.0/24 - but this is more useful if you have some routed multi-site L3 architecture already. i.e. so rest of the network sees some edge with 192.168.1.1 as something else like 10/192/172.a.b.x… Basically NAT’s “netmap” lets you “alias subnet” like 192.168.1.x to 192.168.101.x to make it unique. Similar other tricks with “netmap” are possible as alternative to VRFs.
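
To make the “alias subnet” idea concrete, here is a minimal sketch of a 1:1 netmap pair. It assumes the overlapping 192.168.1.0/24 sits behind this router and should appear to the rest of the network as 192.168.101.0/24. The interface name ether1-uplink is hypothetical, and the rules are untested in this particular setup:

```routeros
/ip firewall nat
# Outbound: hosts in 192.168.1.x appear as 192.168.101.x (host part preserved)
add chain=srcnat action=netmap src-address=192.168.1.0/24 to-addresses=192.168.101.0/24 out-interface=ether1-uplink
# Inbound: traffic addressed to 192.168.101.x is mapped back to 192.168.1.x
add chain=dstnat action=netmap dst-address=192.168.101.0/24 to-addresses=192.168.1.0/24 in-interface=ether1-uplink
```

The interface matchers keep the rules from touching traffic that never leaves through the uplink.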

Anyway more food for thought. Maybe VRFs are the right approach if really everything is 192.168.1.0/24, but I try hard to avoid that case before getting to VRFs.

Last point: the OP has a Starlink, so question 0 is kind of why not use bypass mode to avoid 192.168.1.0/24 and also avoid a double NAT…