MultiWAN with RouterOS

pcunite · January 18, 2023, 6:01pm

Title:
MultiWAN with RouterOS

Welcome:
This article aims to bring clarity to the daunting and confusing task of routing multiple WAN connections from different ISPs in and out of your network. We will be using RouterOS version 7 to accomplish these techniques. Examples will include static IP WAN connections, DHCP assignments, PPPoE encapsulation, and using LTE as a backup or main internet connection. Each type has its own requirements for getting the most out of WAN availability. What happens when one goes down? Do you want to dedicate one ISP to a certain type of traffic? How does incoming traffic reach internal targets? These questions and more will be discussed.

MultiWAN is not SD-WAN:
Utilizing more than one internet provider is a way to bring more capacity and uptime to your network. If you have latency-sensitive traffic, it can be helpful to put it on another WAN connection. However, multiple WAN connections by themselves don’t necessarily create a seamless experience for end applications. An established TCP connection is tied to the addresses it started with, so when traffic moves to a WAN with a different public IP address, that connection breaks and must be re-established. An ISP failover may therefore result in temporary application disruption.

Advanced applications can cope if they are programmed to handle dropped connections, but streaming-style applications are very susceptible to changes underneath them. There is a way, using a layer of external servers and routers, to achieve that level of ISP float, where the applications are not aware of what is happening below. Illustrating this is not the purpose of this document, which instead focuses on a more rudimentary way to manage multiple internet providers when you only have one router to manage.

Why MultiWAN?
This sounds like a trick question. If a single provider had amazing latency, enormous bandwidth, and never experienced any downtime, why indeed would there be the need for another network? I suppose there would not be much of a need. But in the real world networks aren’t everything we wish they were. Sometimes internet providers can only bring latency so low, or they go down because of events taking place in the area.

Sometimes it is more cost effective to put an application on one ISP and the main bulk of your network on another. There is also the possibility that you have a physically different connection to a network resource and pathway you own and control. Still another scenario is when networks come and go based on where you might be. The router may be a moving device, and thus different wireless networks come and go. They don’t fail exactly, they just are not there for a while.

MultiWAN Overview:
There is a lot to cover, so make sure you read slowly. This article series will illustrate a scenario that has several moving parts. You will not implement everything, because your environment is different and custom to you. This means you will not be able to simply copy/paste the examples. You will need to add, remove, or customize what has been presented. To do that, you will need to understand the single, all-in-one router example.

The example shows a single router having many WAN connections. You will not have this exact arrangement. Do you have LTE? If not, do not implement that section of the example. A best effort has been made to create a single example that can easily shed various parts that are not needed. This article assumes a VLAN environment. So, consider that a requirement in your learning journey.

Disclaimer:
What follows is my best understanding of how to implement the stated goals in RouterOS v7 based on the generous feedback from many forum community members. I am a student and spend time in the forums to learn and give back when I can. I am not an expert nor am I even a forum guru as my forum title humorously states. There are far more qualified persons in attendance. My skill is taking what others have shared and then building a presentation around their thoughts and techniques. Thank you. Note that this article is new and has many mistakes. It will be updated many times until the community considers it the gold standard.

pcunite · January 18, 2023, 6:02pm

Example1:
We will lead into this topic using a generic example that covers the majority of situations. We have four WAN connections and need a method to determine when a primary connection is down. When primary is deemed down, the router should switch the environment over to another WAN network. Because we have multiple WAN connections, some traffic is configured to use a specific WAN. Static, DHCP, PPPoE, and LTE style WAN types are shown. Remove or rearrange assignments that don’t apply to you.

Are we down?
Finding out if a network is down is tricky. The absolute best way would be an application inside your network that makes connections to resources outside your network for every type of application you deem important. That would be a very special application. Lacking that, a low-cost, easy-to-use option that is included with RouterOS is a technique called recursive nexthop lookup (aka recursive routing). This just means that we validate the entire path to a remote host, instead of just the path to the gateway connected directly to our router. This way, your WAN’s connection to the internet, and thus beyond your ISP’s own network, can be verified.

A caveat, of course, is that the host you are checking must be outside your ISP’s network and must itself be up. Because a single host could be down, it is highly recommended to check two separate hosts. You could check as many as you feel are needed to warrant a decision. Here we show two.

Gateway Ping
Route verification is performed with ping checks. Every ten seconds a ping is sent to a remote host. If no reply arrives, another ping is attempted. Two failed pings in a row mark the route as invalid and unreachable. The check-gateway parameter helps us accomplish this, but not on its own. It is necessary to link two route entries with each other.
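
As a minimal sketch of the ping check by itself, the route below uses check-gateway=ping against the directly connected ISP gateway. The gateway 10.1.1.1 is the ISP address mentioned later in this thread; the distance and comment are assumptions for illustration and are not taken from Example1.rsc.

```routeros
/ip route
# check-gateway=ping probes the gateway named on this route.
# Pointed at the ISP gateway, it only proves that the first hop answers,
# not that the internet beyond the ISP is reachable.
add dst-address=0.0.0.0/0 gateway=10.1.1.1 check-gateway=ping distance=1 comment="ISP1 first-hop check (illustrative)"
```

Used this way the check stops at the ISP's edge, which is why the gateway needs to be a remote host reached through a second, linked route.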

Scope and Target Scope
To validate a path to a remote host, we use route entries that are connected to each other over a Scope and Target Scope arrangement. These two parameters are an unfortunate abstraction that we as network administrators must deal with. I’ll attempt an explanation. Think of scope as the area of your concern. How much of an area, how far, how deep, and how wide of an area do you care about? This is your scope size.

Target Scope would then be the next area you care about. Since we are only concerned with next hop paths, the way we tell RouterOS to use a particular route as a next hop to be validated is to set the Target Scope of the route doing the lookup at or above the Scope of the route it resolves through, effectively widening the area that route is allowed to search. It takes two command lines to show this awkward representation and linking.

Forum member anav has beautifully hacked this concept by always setting the default Scope to 10 and using the Target Scope parameter to change the relationship. Note that two linked route entries are enough for validation, as shown in the simplified diagram. But in our examples, you will note that we also add one more route for return traffic. This is because we want the ability to use the other ISP connections even when primary is up.
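
A hedged sketch of the two linked entries, written in the Scope 10 style described above. The ISP1 gateway 10.1.1.1 and the monitored host 1.1.1.1 come from later in this thread. The ISP2 gateway 10.2.2.1, the second host 9.9.9.9, the distances, and the comments other than ISP2_Monitor are assumptions for illustration, so compare against Example1.rsc before using any of it.

```routeros
/ip route
# Host routes: each monitored host is reachable only through one ISP gateway.
add dst-address=1.1.1.1/32 gateway=10.1.1.1 scope=10 target-scope=11 comment="ISP1_Monitor"
add dst-address=9.9.9.9/32 gateway=10.2.2.1 scope=10 target-scope=11 comment="ISP2_Monitor"

# Default routes: the next hop is the remote host, not the ISP gateway.
# target-scope=12 lets RouterOS resolve that next hop through the host
# routes above (scope=10), and check-gateway=ping then tests the whole path.
add dst-address=0.0.0.0/0 gateway=1.1.1.1 check-gateway=ping scope=10 target-scope=12 distance=1 comment="ISP1 primary"
add dst-address=0.0.0.0/0 gateway=9.9.9.9 check-gateway=ping scope=10 target-scope=12 distance=2 comment="ISP2 backup"
```

When pings to 1.1.1.1 stop being answered, the distance=1 route goes inactive and the distance=2 route takes over. The extra per-WAN route for return traffic mentioned above is not shown here.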

If we are up
When both networks are up, we show other networks being utilized instead of leaving them always idle. We also enable traffic to connect remotely into the network from any available WAN. Also shown is having certain traffic always leave out of a specific WAN.
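
One possible way to sketch this in RouterOS v7 is a dedicated routing table per WAN, connection marks so that replies leave through the WAN the request arrived on, and a routing rule that pins one subnet to a specific WAN. Every name and address here (to_ISP2, ISP2_conn, ether2, the LAN interface list, 192.168.30.0/24, 192.168.0.0/16, 10.2.2.1) is hypothetical and is not taken from Example1.rsc.

```routeros
/routing table
add name=to_ISP2 fib

/ip route
# Default route that only applies to traffic looked up in the to_ISP2 table.
add dst-address=0.0.0.0/0 gateway=10.2.2.1 routing-table=to_ISP2

/ip firewall mangle
# Remember which WAN a new inbound connection arrived on.
add chain=prerouting in-interface=ether2 connection-mark=no-mark action=mark-connection new-connection-mark=ISP2_conn passthrough=yes
# Replies from internal servers (port forwards) go back out the same WAN.
add chain=prerouting in-interface-list=LAN connection-mark=ISP2_conn action=mark-routing new-routing-mark=to_ISP2 passthrough=no
# Replies generated by the router itself also go back out the same WAN.
add chain=output connection-mark=ISP2_conn action=mark-routing new-routing-mark=to_ISP2 passthrough=no

/routing rule
# Keep local, inter-VLAN traffic in the main table.
add dst-address=192.168.0.0/16 action=lookup-only-in-table table=main
# Send everything else from this subnet out of ISP2.
add src-address=192.168.30.0/24 action=lookup table=to_ISP2
```

The order of the two routing rules matters: without the first one, traffic from the pinned subnet to other local subnets would also follow the to_ISP2 default route.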

DHCP WAN Type
If you have a DHCP connection to your ISP, you will note that we use a static route in the example. Attached to your DHCP client is a curious script provided by rextended. With a static route, the script becomes necessary if the ISP changes the IP address or gateway for any reason. The script keeps your DHCP client and manually added route in sync.

When the lease changes, the script fires and compares the client’s gateway value with any route that has a comment of ISP2_Monitor. The script reads like so: if this DHCP client has an IP address, search through all routes for a route in which the gateway does not match our own and which has a comment of “ISP2_Monitor”. If both conditions are true, change the gateway value.
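
The logic described above could be sketched as the following script body for the DHCP client’s script field. This is a reconstruction from the description, not a copy of rextended’s script, so the version in Example1.rsc may differ. The variables bound and gateway-address are the ones RouterOS provides to DHCP client scripts.

```routeros
# Runs whenever the DHCP client obtains or loses a lease.
# $bound is 1 when the client currently holds a lease.
:if ($bound = 1) do={
    # Find routes commented ISP2_Monitor whose gateway differs from the
    # gateway the ISP just handed out, and point them at the new gateway.
    /ip route set [find where comment="ISP2_Monitor" gateway!=$"gateway-address"] gateway=$"gateway-address"
}
```

If the gateway has not changed, find returns nothing and the route is left untouched.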

LTE WAN Type
If you have LTE availability in your area, the best way to utilize this service is with an LTE-enabled MikroTik router running ROS v7, which has better modem support and behavior. To us, these hockey pucks are radios and that’s about all. To that end, enable the passthrough interface feature in the APN configuration. However, if the LTE hardware only has one ethernet interface (or you only want to use one interface), you’ll lose the ability to manage the LTE unit. This is not a problem, however, with a simple VLAN between the two devices. Our example shows this arrangement. See the LTE Router Example linked below.
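
A rough sketch of the LTE unit’s side of this arrangement, assuming a single ethernet port. The interface names (lte1, ether1), the APN string, the VLAN ID 99, and the management address are all placeholders, and LTE_Router.rsc may lay this out differently.

```routeros
/interface lte apn
# Hand the carrier-assigned address through to whatever sits on ether1.
add name=passthrough-apn apn=internet passthrough-interface=ether1 passthrough-mac=auto

/interface lte
set [find name=lte1] apn-profiles=passthrough-apn

/interface vlan
# Management VLAN on the same port, so the LTE unit stays reachable.
add name=vlan99-mgmt interface=ether1 vlan-id=99

/ip address
add address=192.168.99.2/24 interface=vlan99-mgmt
```

The main router would need a matching VLAN 99 interface on the port facing the LTE unit, with an address in the same management subnet.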

MultiWAN Router Example1
Example1.rsc (10.7 KB)
LTE Router Example
LTE_Router.rsc (2.08 KB)

anav · January 18, 2023, 8:26pm

Excellent pcunite…if only my posts were so well put together as your approach LOL.

pcunite · January 18, 2023, 9:36pm

@anav,

That means a lot. Your help on the forums is felt and you’ve helped me personally. You know more than me! These long form articles take a long time to produce. So, don’t feel bad about that. We all have our own strengths. This is a way for me to give back.

Give me a few weeks to get this article in order and then we can hammer on it. I’m writing, producing content as I go. Then I’ll publish out the syntax.

holvoetn · January 18, 2023, 9:41pm

If it is the same quality as your vlan series, it will be a great contribution !
Subscribed to get the updates.

Amm0 · January 19, 2023, 4:34am

Good idea! Some significant number of posts on the forum involve some kinda “MultiWAN situation”. And @anav was going to run out of letters in his “new user” post.

I like the approach starting with the 4 WAN types you have. Think focus on “recursive lookup” (vs “netwatch techniques”) for upstream failure detection seems a good call… I use RRs (with PBR) today, but with all the new netwatch detection mechanisms in 7.7, I can see a well-designed script doing better than RR – but as generic/“general purpose” approach might be tough to explain. Now I guess, depending on when and what form BFD comes, that may offer a 3rd failure detection option – but that’s still not in V7 to even consider.

Anyway, I’ll follow with interest. Glad someone is taking this on – MT’s docs have long lacked any kind of respectable “user guide”, and offer very limited cookbook/recipe style docs to even hint at some canonical network architectures/designs.

rextended · January 19, 2023, 10:00am

@pcunite

Bravo.

pcunite · January 19, 2023, 12:53pm

I agree. I hope to see a new netwatch routine make it into this series. In the opening graphic, the text bubble states: “recursive lookup or netwatch techniques”. I wanted to lead with RR because it seems more popular on the forums. So, I speak to that. It is a mess to make sense of in the route menu, however. I would personally prefer a script, but that is going pretty deep into the woods. Actually, I would prefer an application in RouterOS to identify and process a failed network. I do hope to engage yourself and rextended to create something easy to follow as an alternative to RR. I’ll have some ideas to offer when we get to that section. BFD will take the forum by storm because sockets are the best way to identify and process downed networks.

holvoetn · January 19, 2023, 1:06pm

Is PCC also going to be part of the scope ?

pcunite · January 20, 2023, 7:35pm

A perfectly cooked beef tenderloin served with buttered mashed potatoes, lemon roasted asparagus, and ranch covered salad spears is not enough for you? You want … dessert too?

anav · January 20, 2023, 11:31pm

Pssst he’s Belgian, has a France complex, don’t mention desserts!!!

holvoetn · January 21, 2023, 8:15am

1 i don’t like dessert
2 isn’t pcc suitable for multi wan ?

Amm0 · January 21, 2023, 1:07pm

I’m sure there’ll be time for a debate about load balancing, e.g. PCC vs ECMP vs … But in reality you almost always want some failure detection, regardless of whether it is a failover or load balancing case.

Perhaps a nod to the “check-gateway=ping” that’s required in routes, before explaining the scopes and recursive routing? That’s actually what triggers/blocks the route recursion.

anav · February 2, 2023, 4:45pm

pcunite I noticed on ex1, if I am not mistaken, you are using 1.1.1.1 for an ISP address?? This is not an ideal choice as that is the IP address for Cloudflare DNS services and I happen to use this as a host to check my recursive routes… very confusing when I saw that next to ISP… ??? Please change…

pcunite · February 2, 2023, 5:05pm

Oops, yes I had that incorrect in the diagram. Thank you. The rsc file was correct, however. DNS Host1 is 1.1.1.1, it is what the example is pinging. The ISP gateway is 10.1.1.1 and the recursive route’s gateway is naturally 1.1.1.1.

anav · February 2, 2023, 5:10pm

Can you explain how your script works (the ip dhcp script in the first example)?
a. what each command is invoking, but in English and not script language; in other words, write the script in words,
b. what does it do functionally
c. why is it needed.

pcunite · February 3, 2023, 7:58pm

Okay, Example1 is ready for intensive criticism. What could be more clear? Are there any errors?

A note about Example1:
This is a recursive routing example. It is supposed to stand on its own as a deliverable. It answers the question: “How do I wire up multiple WANs and make sending/receiving, custom routes in/out, and fail-over just work?” It is simple and will get people going with the basics. However, let’s see the kind of questions we get.

For Example 2, maybe we have a scripting version (Netwatch) instead of recursive? Example3 would be WAN Load Balancing per holvoetn request?