RFC: Internet Tunnels as backups for RF Connected Network

joey · July 2, 2016, 3:47pm

Hi Gang,

Background:

I help maintain an RF-connected network built from various MikroTik devices, and I need some help with redundancy:

The network is used for emergency communications.
The network consists of just under 40 sites.
80% of the sites are on mountaintops; these are inaccessible in winter and have battery backups.
Of the mountain sites, about 50% have access to commercial internet.
Each site is connected to at least one other using an RF shot.
Each site has its own subnet (e.g. 10.30.20.1, 10.0.5.1, etc.).
Each router has its own static routing table. (E.g. on 10.30.20.1 (Mount Thorodin), a route says that 10.30.30.1 (Squaw Mountain) is reachable over ether3-squaw. That Ethernet port is hooked into an ubitik, which carries the actual RF traffic.)

The Problem:

The network is not yet RF-redundant (we’re volunteers, and the network is self-funded with no commercial or government funding). We routinely take lightning strikes or have some sort of temporary power outage that breaks a segment of the network (i.e. a site goes dark and the RF links to it go offline). Sometimes the site that goes down is the only way into or out of a geographical area.

Our current, problematic solution is to manually bring up a VPN connection to bypass the dark site and make all the necessary routing changes on both routers. (It can get more complex than this, unfortunately, but that’s a good summary.) The VPN workaround is 100% manual and takes us a few hours to get right. On our two main hubs, the static routing tables are getting to the point of being unmanageable: our main hub has 200 route entries, including the VPN workarounds.

What we’re thinking:

We’ve been looking into creating some sort of always-on or dial-on-demand VPN connections with a routing distance of 2 and using OSPF to sort it out. We attempted this previously. It worked on our test bed of 5 routers, but when we turned it on in production we ran into three problems: VPN flapping (one site has personal VPNs as well as site-to-site VPNs, and OSPF tried to use the personal VPNs), OSPF advertising public WAN IPs (we likely goofed up the config), and what we think was a firmware bug, which we believe is now fixed. We’re looking at attempting this again, with OSPF, before the summer is over. We’re also considering EoIP (with IPsec enabled) with RIP instead. EoIP would let us create static routes of distance 2 and possibly use RIP to detect when the distance-1 Ethernet connection is down.
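The distance-2 idea above is what is usually called a floating static route: the backup route sits in the table but only wins when the distance-1 route is unusable. Below is a minimal Python model of that selection rule, not RouterOS code; the 10.30.30.0/24 prefix (assumed from Squaw Mountain's 10.30.30.1 router) and the vpn-backup gateway name are hypothetical, and real RouterOS route selection has more inputs than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    dst: str
    gateway: str
    distance: int
    active: bool = True  # False models a gateway that check-gateway has marked unreachable


def select_route(routes: List[Route], dst: str) -> Optional[Route]:
    """Among active routes to dst, return the one with the lowest distance."""
    candidates = [r for r in routes if r.dst == dst and r.active]
    return min(candidates, key=lambda r: r.distance, default=None)


routes = [
    Route("10.30.30.0/24", "ether3-squaw", 1),  # RF shot to Squaw Mountain
    Route("10.30.30.0/24", "vpn-backup", 2),    # hypothetical always-on VPN backup
]
print(select_route(routes, "10.30.30.0/24").gateway)  # ether3-squaw
routes[0].active = False  # the RF link goes dark
print(select_route(routes, "10.30.30.0/24").gateway)  # vpn-backup
```

The model also shows the catch: the backup only takes over if something actually marks the primary route inactive, which is the job of check-gateway or a routing protocol.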

Where you come in:

Despite the size of our network, none of us are trained network engineers. We’re hoping some helpful folks here could offer up some comments and suggestions about how we might solve the problem of a site going dark in some (ideally mostly) automated fashion. We’re eager to solve this problem because a) the manual process is time consuming and b) the network is used for public service emergency traffic and we want it to be as robust as possible. Does anyone have any ideas on how we can make this system better?

Thanks for your time.

Joey

pe1chl · July 3, 2016, 10:48am

We’re doing that with BGP. I have skimmed over OSPF documentation and it was so complicated…
BGP is easy. Just configure the default BGP instance to “redistribute connected” and set up peers
on each node toward the same neighbors you now point static routes at, and everything works OK. When you have
redundancy in the network, link outages will automatically be routed around.
When you want to use VPN as well, make sure it is a “tunnel interface” type VPN, i.e. not a plain
IPsec tunnel. Then you can include those VPN links in the peering tables and they take part in
the scheme.
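As a toy model of the rerouting described above: when every site is its own AS and all attributes are left at defaults, BGP prefers the advertised path with the fewest AS hops, so losing a link means the next-shortest path takes over. The sketch uses plain breadth-first search over a link list to show that effect; it is not how BGP is implemented, and "SiteB" and "SiteC" are hypothetical site names.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def shortest_path(links: Iterable[Tuple[str, str]], src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path over undirected links (a stand-in for shortest AS path)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):  # sorted for deterministic tie-breaks
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


links = {("Thorodin", "Squaw"), ("Squaw", "SiteC"), ("Thorodin", "SiteB"), ("SiteB", "SiteC")}
print(shortest_path(links, "Thorodin", "SiteC"))
# Simulate SiteB's link to Thorodin going dark: traffic shifts to the Squaw path.
print(shortest_path(links - {("Thorodin", "SiteB")}, "Thorodin", "SiteC"))
```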
We are setting up such a network as well and we additionally configured radio links to have preference
over VPN links, but that is only because as radio amateurs we prefer radio. In your case it may be
different.

joey · July 4, 2016, 2:39am

Thanks. We gave this a try today. When the route is only over VPN (SSTP, L2TP, PPTP), the remote router receives the BGP table with distance 20, routes are created, and resolution goes out over the WAN (which we set to distance 30 to test) and not over the VPN. Any chance you could paste an example from one of your VPNs? It sounds like you are doing EXACTLY what we’re doing, so we would benefit greatly from your guidance.

pe1chl · July 4, 2016, 7:01am

Like all solutions it is pretty simple once you have found all pitfalls.
As mentioned, in the BGP instance configuration for “default”, set the redistribute-connected checkmark.
Set an AS number for the site, e.g. from the private AS number ranges 64512-65534 or 4200000000-4294967294.
Configure BGP Peer entries in each router for the routers it can reach directly (over WiFi or VPN), with the
AS number of the peer, the address of that router on the link, TTL 1.
Of course you need to delete your static routes once you have set that up.
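To make the numbering concrete, here is a small Python check for the private ranges quoted above (they match the private-use ranges in RFC 6996), plus a one-ASN-per-site scheme. The site_asn helper and its offset are hypothetical, for illustration only.

```python
def is_private_asn(asn: int) -> bool:
    """True if asn falls in a private-use range (RFC 6996)."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def site_asn(site_index: int) -> int:
    """Hypothetical numbering scheme: one private 2-byte ASN per site."""
    asn = 64512 + site_index
    if not is_private_asn(asn):
        raise ValueError(f"site index {site_index} is outside the 2-byte private range")
    return asn


print(site_asn(0), site_asn(39))  # 64512 64551
```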

When you want to distribute the default route via this mechanism, you can set the default-originate option to
“if installed”. However, when you are using VPN there should be a route to your VPN server which in normal
situations would be obtained from some internet connection and would be the default route. So in that case
it needs more work, either with a hardwired route for the VPN server or with policy routing.
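The hardwired-route option works because route lookups use the longest matching prefix: a /32 host route to the VPN server is more specific than a default route learned over BGP, so tunnel packets keep going out the internet connection. A minimal Python model of that lookup follows; the server address, the bgp-mesh gateway name and the table contents are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def lookup(table: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]


VPN_SERVER = "203.0.113.10"  # hypothetical public address of the VPN server
table = {
    "0.0.0.0/0": "bgp-mesh",           # default route learned over BGP (hypothetical name)
    "10.0.0.0/8": "bgp-mesh",
    VPN_SERVER + "/32": "ether1-WAN",  # pinned host route for the tunnel endpoint
}
print(lookup(table, VPN_SERVER))      # ether1-WAN
print(lookup(table, "198.51.100.7"))  # bgp-mesh
```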

When you want to use different preferences, don’t touch the distance values yourself but let BGP manage that.
This is possible by using “BGP community values”. Create routing filters and configure them for the peers.
You can have a single input filter like this:

/routing filter
add bgp-communities=1:50 chain=filter-in set-bgp-local-pref=50

Then all incoming routing info with community value 1:50 (just an arbitrary number) will get BGP local preference 50
which is lower than the default of 100. This means a route with this attribute will not be selected when another
route without it is available.
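The effect of that input filter can be modeled in a few lines of Python: routes tagged 1:50 (by the VPN server's output filter) drop to local preference 50, and the higher local preference wins. Only the local-preference step of BGP best-path selection is modeled, "community present" is an assumed reading of the bgp-communities match, and the next-hop addresses are hypothetical.

```python
from dataclasses import dataclass, field

DEFAULT_LOCAL_PREF = 100


@dataclass
class BgpRoute:
    prefix: str
    next_hop: str
    communities: set = field(default_factory=set)
    local_pref: int = DEFAULT_LOCAL_PREF


def filter_in(route: BgpRoute) -> BgpRoute:
    """Models: add bgp-communities=1:50 chain=filter-in set-bgp-local-pref=50"""
    if "1:50" in route.communities:
        route.local_pref = 50
    return route


def best(routes):
    """Pick the candidate with the highest local preference."""
    return max(routes, key=lambda r: r.local_pref)


radio = filter_in(BgpRoute("10.30.100.0/24", "10.99.0.1"))        # learned over a radio link
vpn = filter_in(BgpRoute("10.30.100.0/24", "10.99.1.1", {"1:50"}))  # tagged by the VPN server
print(best([radio, vpn]).next_hop)  # 10.99.0.1
print(best([vpn]).next_hop)         # 10.99.1.1, used only when radio is gone
```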

Also create an output filter on the VPN server (in our system we have a single central VPN server and all routers are
clients of that server) where this community value is set for the outgoing routes.

It is also a good idea to put some subnet filtering in the route filters so your BGP routing distributes only the subnets
you use in your network and not any internet connection addresses you use:

/routing filter
add action=accept chain=filter-out prefix=10.0.0.0/8 prefix-length=8-32
add action=accept chain=filter-out prefix=0.0.0.0/0
add action=discard chain=filter-out

You add these to a “filter-out” filter that you use as output filter on all the peers.
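A Python model of how that chain treats a few prefixes may help when adding more rules. It assumes rules are checked top to bottom, the first matching accept/discard ends the chain, and a rule without prefix-length matches only that exact prefix; those semantics are assumptions, not taken from RouterOS documentation.

```python
import ipaddress
from typing import List, Optional, Tuple

# (action, prefix, prefix-length range); prefix None = catch-all rule,
# length None = exact prefix match (assumed semantics when prefix-length is omitted)
Rule = Tuple[str, Optional[str], Optional[Tuple[int, int]]]

FILTER_OUT: List[Rule] = [
    ("accept", "10.0.0.0/8", (8, 32)),
    ("accept", "0.0.0.0/0", None),
    ("discard", None, None),
]


def evaluate(rules: List[Rule], prefix: str) -> Optional[str]:
    """Walk the chain top to bottom; the first matching rule decides."""
    net = ipaddress.ip_network(prefix)
    for action, rule_prefix, length in rules:
        if rule_prefix is None:
            return action
        rule_net = ipaddress.ip_network(rule_prefix)
        if length is None:
            matched = net == rule_net
        else:
            low, high = length
            matched = net.subnet_of(rule_net) and low <= net.prefixlen <= high
        if matched:
            return action
    return None  # no rule matched; unreachable here because of the catch-all


for p in ["10.30.20.0/24", "0.0.0.0/0", "203.0.113.0/24"]:
    print(p, evaluate(FILTER_OUT, p))
```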

joey · July 6, 2016, 1:01am

We were using default. Thanks for TTL 1. We missed that option.

> When you want to use different preferences, don’t touch the distance values yourself but let BGP manage that.
> This is possible by using “BGP community values”. [...]

Thanks. We are playing around with this but haven’t run into a situation where we need this. Yet.

> It is also a good idea to put some subnet filtering in the route filters so your BGP routing distributes only the subnets
> you use in your network and not any internet connection addresses you use. [...]

Yes, we already have these and a few others.

What is still happening, though, is VPN flapping on the remote side. It gets the BGP tables and sets a recursive route over the WAN instead of over the SSTP tunnel. IP > Routes shows that our static routes are still in effect at distance 1 and the BGP routes are at distance 20. Instead of saying 10.0.0.0/8 via sstp-tunnel, it says 10.30.20.1 recursive via ether1-WAN. We only get the routes when we have the ip address family turned on; l2vpn and vpnv4 are on but don’t do anything. I even sent the 0.0.0.0/0 route as distance 200, and it still prefers that over the tunnel at distance 1. So something else is probably saying “don’t use the tunnel”, but we can’t find it.

e.g.
remote router

/routing bgp instance
set default as=1030222 redistribute-connected=yes router-id=10.30.222.1
/routing bgp network
add network=10.0.0.0/8
/routing bgp peer
add address-families=ip,l2vpn,vpnv4 in-filter=thor-in multihop=yes name=Thorodin \
    out-filter=thor-out remote-address=10.30.20.1 remote-as=20 ttl=1 update-source=\
    bridge1

and filters

/routing filter
add action=discard chain=thor-in prefix=10.30.222.0/24
add action=discard chain=thor-in prefix=0.0.0.0/0
add bgp-communities=1:50 bgp-local-pref=50 chain=thor-in
add action=accept chain=thor-out prefix=10.30.222.0/24
add action=discard chain=thor-out

Main router

/routing bgp instance
set default as=20 out-filter=bgp-out-filter redistribute-connected=yes \
    router-id=10.30.20.1
/routing bgp network
add network=10.0.0.0/8
/routing bgp peer
add address-families=ip,l2vpn,vpnv4 multihop=yes name=NV0N remote-address=\
    10.30.222.1 remote-as=1030222 ttl=1 update-source="ether9-LAN MASTER"

and filters

/routing filter
add bgp-communities=1:50 bgp-local-pref=50 chain=bgp-out-filter

We do have BGP working router to router. We just can’t get it to transit over VPN. We’ve got to be doing something really stupid I think.

pe1chl · July 6, 2016, 8:17am

Try setting nexthop-choice=force-self on all the Peer definitions.
Then it should only create routes that are directly between the routers, not all those strange recursive routes.
But in any case, when you want to have default routes distributed over BGP and have a VPN over the internet that
is outside that network, you need to do something to get the VPN packets themselves properly routed.
So either set a fixed /32 route for the destination of your VPN (hopefully it is a fixed address), or use a separate
routing table for the 10.x network: specify it in the BGP instance using routing-table=, and use policy routing
(IP Route Rule) so that traffic from your own network uses that table and the VPN tunnel packets do not (select on source 10.0.0.0/8).
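The policy-routing option can be modeled as a two-step lookup: the source address picks the routing table, then the longest matching prefix in that table picks the gateway. This Python sketch only illustrates the idea; the table name "mesh", the gateway names and the addresses are hypothetical, and a real IP Route Rule has more match options.

```python
import ipaddress

# Hypothetical policy-routing setup: one rule and two routing tables.
RULES = [("10.0.0.0/8", "mesh")]  # traffic sourced from the 10.x network uses table "mesh"
TABLES = {
    "mesh": {"0.0.0.0/0": "bgp-peer", "10.0.0.0/8": "bgp-peer"},  # filled by BGP
    "main": {"0.0.0.0/0": "ether1-WAN"},                          # ordinary internet default
}


def pick_table(src: str) -> str:
    """Return the table named by the first rule whose source prefix matches."""
    addr = ipaddress.ip_address(src)
    for prefix, table in RULES:
        if addr in ipaddress.ip_network(prefix):
            return table
    return "main"


def route(src: str, dst: str) -> str:
    """Longest-prefix match inside the table chosen by the source address."""
    table = TABLES[pick_table(src)]
    addr = ipaddress.ip_address(dst)
    nets = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    best = max(nets, key=lambda net: net.prefixlen)  # both tables have a default, so nets is never empty
    return table[str(best)]


print(route("10.30.222.5", "10.30.20.1"))     # bgp-peer
print(route("198.51.100.2", "203.0.113.10"))  # ether1-WAN: tunnel packets stay on the WAN
```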

joey · July 7, 2016, 12:26am

That didn’t work, unfortunately. I have:

/ip route
add check-gateway=ping distance=1 dst-address=10.0.0.0/8 gateway=sstp-THORODIN
add check-gateway=ping distance=1 dst-address=172.16.0.0/16 gateway=sstp-THORODIN
add check-gateway=ping distance=1 dst-address=192.168.0.0/16 gateway=sstp-THORODIN

When I turn BGP on, I get lots of records that look like this in the route list, but when you open them in the dialog box they all say “recursive via ether1-WAN”.

 3 A S  10.0.0.0/8                         sstp-THORODIN             1
 5 ADb  10.20.2.0/24                       10.30.20.1               20
 6 ADb  10.20.3.0/24                       10.30.20.1               20
 7 ADb  10.30.20.0/24                      10.30.20.1               20
 8 ADb  10.30.100.0/24                     10.30.20.1               20

Unfortunately, the next hop is always ether1-WAN (distance 30) instead of the SSTP link.

pe1chl · July 7, 2016, 7:38am

I have no experience with this SSTP VPN. Does it create a separate interface with a /30 network?
I always use /30 networks for direct links (like GRE over IPsec VPN) and /29 for a radio link (to have IP
addresses for the WiFi equipment in the same subnet) and then the IPsec peer is defined with the
address from that small subnet. It works perfectly here, all the BGP routes are created with the
correct nexthop (the remote router IP on the /30 or /29).
Don’t put all connecting IP addresses in the same subnet (e.g. /24) because BGP will believe there is
full connectivity in that subnet.
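The subnet sizes mentioned above work out as follows: a /30 leaves two usable addresses (one per router on a point-to-point link), and a /29 leaves six (both routers plus the radios). This Python sketch carves per-link subnets out of a block using the standard ipaddress module; the 10.99.x blocks are hypothetical.

```python
import ipaddress
from typing import List, Tuple


def carve_links(block: str, new_prefix: int) -> List[Tuple[str, List[str]]]:
    """Split an address block into per-link subnets and list each one's usable hosts."""
    return [
        (str(net), [str(host) for host in net.hosts()])
        for net in ipaddress.ip_network(block).subnets(new_prefix=new_prefix)
    ]


p2p = carve_links("10.99.0.0/28", 30)    # hypothetical block for VPN links
radio = carve_links("10.99.1.0/28", 29)  # hypothetical block for radio links
print(p2p[0])            # ('10.99.0.0/30', ['10.99.0.1', '10.99.0.2'])
print(len(radio[0][1]))  # 6
```

Giving each link its own small subnet like this is what keeps BGP from assuming that every router in one shared /24 can reach every other directly.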

joey · July 18, 2016, 11:30pm

We weren’t able to get this working.

We switched back to OSPF, and the issue we have is that we need to apply weights (costs) to a VPN. If we add it to the OSPF interfaces, once the VPN goes down the interface shows up as unknown, and when it’s back up it’s auto-added by OSPF with the default weight (which is too low).
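Why the cost matters: OSPF picks the path with the lowest total interface cost, so a VPN link with a high cost only carries traffic when the RF path is gone, while a low cost lets it win even with RF up. This Dijkstra sketch in Python shows both cases; the site "SiteC" and all cost values are hypothetical, and it models only the path choice, not RouterOS's OSPF interface handling.

```python
import heapq
from typing import Iterable, List, Optional, Tuple


def best_path(links: Iterable[Tuple[str, str, int]], src: str, dst: str) -> Optional[Tuple[int, List[str]]]:
    """Dijkstra over undirected (site, site, cost) links; returns (total cost, path)."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + link_cost, nxt, path + [nxt]))
    return None


rf = [("Thorodin", "Squaw", 10), ("Squaw", "SiteC", 10)]  # hypothetical RF costs
vpn = [("Thorodin", "SiteC", 100)]                          # hypothetical high VPN cost
print(best_path(rf + vpn, "Thorodin", "SiteC"))      # (20, ['Thorodin', 'Squaw', 'SiteC'])
print(best_path(rf[:1] + vpn, "Thorodin", "SiteC"))  # (100, ['Thorodin', 'SiteC'])
```

With a low VPN cost such as 10, the same function returns the direct VPN path even while both RF links are up, which matches the problem described above.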

I played around with a few different interfaces, and EoIP seems to be a possibility, but you have to give it a different address. We run a 10.x network, and OSPF sets it as passive until we change the address to 172.16.*.

It’s been very frustrating for us to get any of this working. What works on Cisco doesn’t work on MikroTik.