Hi Gang,
Background:
I help maintain an RF-connected network built from various MikroTik devices, and I need some help with redundancy:
- The network is used for emergency communications.
- The network consists of just under 40 sites.
- 80% of the sites are on mountaintops, are inaccessible in winter, and have battery backup.
- Of the mountain sites, about 50% have access to commercial internet.
- Each site is connected to at least one other using an RF shot.
- Each site has its own subnet (e.g., 10.30.20.1, 10.0.5.1, etc.).
- Each router has its own static routing table. For example, on 10.30.20.1 (Mount Thorodin) a route says that 10.30.30.1 (Squaw Mountain) is reachable over ether3-squaw. That Ethernet port is plugged into an ubitik, which carries the actual RF traffic.
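To make the static setup concrete, here is a minimal Python sketch of how one router resolves a destination from its own table using longest-prefix match. It assumes each site uses a /24 (the post lists only router addresses), and the `ether2-example` entry and the `lookup` function are hypothetical. Because static routes are not shared between routers, every entry is maintained by hand on every router: with just under 40 sites and no summarization, a router that reaches every other site needs on the order of 39 entries before any VPN workaround is added.

```python
import ipaddress
from typing import Optional

# Hypothetical static table for the router at 10.30.20.1 (Mount Thorodin).
# The /24 masks and the second entry are assumptions for illustration;
# only the ether3-squaw route comes from the post.
STATIC_ROUTES = {
    ipaddress.ip_network("10.30.30.0/24"): "ether3-squaw",
    ipaddress.ip_network("10.0.5.0/24"): "ether2-example",  # hypothetical
}


def lookup(dst: str) -> Optional[str]:
    """Return the outgoing interface for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in STATIC_ROUTES if addr in net]
    if not matches:
        return None  # no matching route in this table
    best = max(matches, key=lambda net: net.prefixlen)
    return STATIC_ROUTES[best]


print(lookup("10.30.30.1"))  # ether3-squaw
print(lookup("10.99.0.1"))   # None
```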
The Problem:
The network is not yet RF-redundant (we’re volunteers, and the network is self-funded with no commercial or government funding). We routinely take lightning strikes or have temporary power outages that break a segment of the network: a site goes dark and the RF links to it go offline. Sometimes the site that goes down is the only way into or out of a geographical area. Our current, problematic solution is to manually bring up a VPN connection that bypasses the dark site and then make all the necessary routing changes on both routers. (Unfortunately it can get more complex than this, but that’s a good summary.) The VPN workaround is 100% manual and takes us a few hours to get right. On our two main hubs, the static routing tables are becoming unmanageable; our main hub has 200 route entries, VPN workarounds included.
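Before automating failover, it may help to know in advance which sites are the only way into or out of an area. Here is a minimal Python sketch that treats RF shots as edges of a graph, takes one site dark at a time, and runs a breadth-first search from a hub to list which sites get stranded. The topology, `SiteA`/`SiteB`, and the function names are hypothetical; only Thorodin and Squaw come from the post.

```python
from collections import defaultdict, deque

# Hypothetical topology: only Thorodin and Squaw come from the post.
LINKS = [
    ("Thorodin", "Squaw"),
    ("Thorodin", "SiteA"),
    ("Squaw", "SiteA"),
    ("Squaw", "SiteB"),  # SiteB's only RF shot
]


def cut_off_sites(links, hub, dark):
    """Sites the hub can no longer reach over RF when `dark` goes offline."""
    graph = defaultdict(set)
    sites = set()
    for a, b in links:
        sites.update((a, b))
        if dark not in (a, b):  # drop every RF shot touching the dark site
            graph[a].add(b)
            graph[b].add(a)
    reached = {hub}
    queue = deque([hub])
    while queue:  # breadth-first search from the hub
        for nbr in graph[queue.popleft()]:
            if nbr not in reached:
                reached.add(nbr)
                queue.append(nbr)
    return sorted(sites - reached - {dark})


def single_points_of_failure(links, hub):
    """Map each site whose outage isolates others to the sites it cuts off."""
    sites = {s for link in links for s in link}
    result = {}
    for site in sorted(sites - {hub}):
        lost = cut_off_sites(links, hub, site)
        if lost:
            result[site] = lost
    return result


print(single_points_of_failure(LINKS, "Thorodin"))  # {'Squaw': ['SiteB']}
```

Sites that appear as keys in the result are where a standing VPN or a second RF path would likely pay off first.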
What we’re thinking:
We’ve been looking into always-on or dial-on-demand VPN connections with a route distance of 2, using OSPF to sort out the paths. We tried this before. It worked on our test bed of 5 routers, but in production we ran into three problems: VPN flapping (one site has personal VPNs as well as site-to-site VPNs, and OSPF tried to use the personal VPNs), OSPF advertising public WAN IPs (we likely goofed up the config), and what we think was a firmware bug, which we believe is now fixed. We’re planning to try this again, with OSPF, before the summer is over. We’re also considering EoIP (with IPsec enabled) with RIP instead. EoIP would let us make static routes of distance 2, and RIP could possibly tell us when the distance-1 Ethernet connection is down.
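The distance-2 idea is what is usually called a floating static route: among the routes to a destination whose path is usable, the router installs the one with the lowest distance, so the backup only takes over when the primary disappears. Below is a minimal Python sketch of that selection rule. The `vpn-squaw` tunnel name, the `active_route` function, and the `path_up` map are hypothetical, and the /24 mask is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Route:
    dst: str       # destination prefix (the /24 mask is an assumption)
    via: str       # outgoing interface or tunnel name
    distance: int  # lower distance is preferred


# Primary RF path at distance 1, hypothetical VPN backup at distance 2.
ROUTES: List[Route] = [
    Route("10.30.30.0/24", "ether3-squaw", 1),
    Route("10.30.30.0/24", "vpn-squaw", 2),  # hypothetical tunnel name
]


def active_route(routes: List[Route], dst: str,
                 path_up: Dict[str, bool]) -> Optional[Route]:
    """Pick the lowest-distance route to dst whose path is currently usable."""
    usable = [r for r in routes if r.dst == dst and path_up.get(r.via, False)]
    return min(usable, key=lambda r: r.distance, default=None)


up = {"ether3-squaw": True, "vpn-squaw": True}
print(active_route(ROUTES, "10.30.30.0/24", up).via)  # ether3-squaw
up["ether3-squaw"] = False  # the Squaw RF link goes dark
print(active_route(ROUTES, "10.30.30.0/24", up).via)  # vpn-squaw
```

The hard part in practice is filling in `path_up` correctly. Because ether3-squaw connects to a local radio, the Ethernet port can stay up even when the RF shot beyond it is dead, so link state alone may never trigger the failover. Something has to probe the far end and mark the path down: a routing protocol's hellos (OSPF or RIP), or a gateway check on the static route such as RouterOS's `check-gateway` option.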
Where you come in:
Despite the size of our network, none of us are trained network engineers. We’re hoping some helpful folks here can offer comments and suggestions on how we might handle a site going dark in some (ideally mostly) automated fashion. We’re eager to solve this problem because a) the manual process is time-consuming and b) the network carries public-service emergency traffic and we want it to be as robust as possible. Does anyone have ideas on how we can make this system better?
Thanks for your time.
Joey