Multiwan + QOS (Voip etc) on hEX

I wonder if anyone could help point me in the right direction to get started on a configuration?

Our office is out in the countryside with relatively poor broadband (10Mbps down, 1Mbps up).
As we work with a lot of data we need to run multiple DSL lines to get the bandwidth we need. We currently have two lines from different providers with plans for a third.
To make matters more complex, we also have no mobile phone service, so we need to use Femtocells/boost boxes for mobile coverage, and we use VOIP phones.

I recently purchased a Mikrotik hEX router to see if it might give us better QOS than our current TPlink router (R470t+), so that we can have good quality voip as well as as much bandwidth as possible. (If it works we’ll buy a bigger router :wink:)

There are two phases to the plan, the first is simply to replicate what we have, but with better QOS control, and the second phase is to add a third DSL line.

We have about 15 computers in the office doing everything you might expect in a busy small data science company (from SSH to large files transfers overnight with lots of voip and web in between)
There are 6 IP phones on the same LAN and using the same IP subnet but with their own range of addresses.

I would like to share the bandwidth between two lines as evenly as possible.
I would like several queues for traffic, with the highest priority to VOIP traffic from dedicated devices (voip phones and boost boxes… these have fixed IP addresses within a defined “voip” range on our LAN). The lowest priority would be file transfers, including knowing when a desktop is downloading something huge via http (or windows updates).
Ideally we would have control over bandwidth limits and priority for at least two or three other queues… so maybe a total of 5?

Where should I begin to understand both QOS configuration and multiwan load balancing?

(One of the wan links is via a routed connection to the ISP’s router (BT) and the other is via a PPPoE connection (HG612 modem). Does that complicate things? I prefer the PPPoE solution as it gives the router its proper (fixed) IP address.)

  1. once that is working I would like to add a third DSL line.
    However, if the mikrotik will support it, I would like the VOIP traffic and one or two selected PCs to use that line most of the time, but in the event of a failure on WAN1 or WAN2 this third line would also become a failover line.
    Is this even possible?

Also on the wishlist:
VPN server support to allow us to connect to the office from remote locations (anyone tested the hEX throughput?)
Line failure notification? (we use PRTG internally but the router will mask a line failure as it switches over traffic automatically)

ok.. not much luck with that one :slight_smile:
could anyone point me to the most relevant manual pages or examples of someone accomplishing something similar?

I would advise you to play around with it first separately.. QoS with redundant links can quickly become very complicated, and won’t function properly if it is set up wrong.

First, learn to set up QoS with only one uplink; then, without QoS, learn to load balance links (PCC). Then, find a way of merging them together.

It will be very complicated to try learning both for the first time and setting them up in an integrated fashion. I would say it is nearly impossible for someone new to MikroTik to successfully set up both together simultaneously as a first time setup. QoS becomes several orders of magnitude more difficult to set up when you are load balancing links.

There are many good tutorials for QoS and PCC on their own, I don’t know of any for both together.

as mducharme says, mixing PCC and complex QoS on the same device is hard to accomplish.

I have an idea to overcome this:

Use the hEX to do PCC load balancing, firewall, NAT and failover, and put one hAP lite (very cheap) on each wan connection to do the complex QoS. I have tested the hAP lite and it can move around 30 to 40 Mbps of traffic with a heavy QoS config (163 mangle rules, 26 queues in a queue tree).

On the QoS topic, keep one key principle in mind: you can only apply proper QoS when you are the one queuing the packets. The moment another device outside your control does the queuing, your QoS is no longer effective. That is why you have to leave 15-30% of the available bandwidth on your internet connection(s) unused for QoS to work properly.

good luck, i think mikrotik routeros is powerful enough to do excellent traffic control

Many thanks for the feedback and apologies for the slow response. (I was not getting notifications of posts)

Having experimented with Mikrotik software for a couple of hours it certainly seems very powerful and indeed complex.
I’ll definitely try one thing at a time per your advice @mducharme

I’m surprised this is not a more common requirement though. I would have thought that priority queuing over at least a pair of load balanced lines would be something that many people/companies would want to do, especially when some very important work takes very little bandwidth, but video streaming, OS updates etc. take huge bandwidth and are lower priority.

(I was also hoping that the queuing logic might be somewhat separated from the routing one… i.e I could define sets of logical queues that could feed a sort of virtual WAN link that in reality represents the real ones)

@chechito I was trying to keep it all in one box for admin reasons (and because there is also another box on each line acting as a modem/bridge/router as well), so having 7 boxes starts to be a bit messy and adds to the admin complexity, and I have to hand this off to other team members when it is operating. But I am certainly going to look at the option, as it’s probably cheaper to use hardware than to spend the time on complex software :slight_smile: Thanks for the suggestion.

I’ll keep looking at the various tutorials and see what comes up… but if anyone has experience with what they feel the “best” tutorials are I would appreciate a link.
(and if i get it working I will post the links that helped)

This is a common requirement, and is in many cases easier - but not with Internet connections. Load balanced lines and load balanced Internet connections are two very different things.

Consider this diagram:

    ISP
     |
Main Router (Company)
  |             |
  |             |
Branch Office (Company)

There, a single ISP provides service to the company, and the load balanced / redundant private circuits between the main router and the branch office are inside the company’s network and under the company’s IP space. This means that everybody can talk to each other. The load balancing solution is fairly simple, since it can be accomplished just with route costs, and QoS can be done by setting up interface-attached queue trees on the interfaces on either side.

It is much more complicated in a situation where you have two ISP links (assuming you are not peering with them with BGP):

ISP-A  ISP-B
   |     |
   |     |
Main Router

Because then, the two links that you need to load balance with are NOT under the control of the company, and have different address spaces given by the ISP (or perhaps different ISPs). Most networks have routing rules that, if a reply packet comes from the wrong address or different route, it should be dropped. That means that you end up needing to create extra rules to make sure that when the router sends a reply to a packet sent by ISP-A, that the ISP-A interface is used to send the reply and not the ISP-B interface, and that keeps a session on the same interface rather than sending some packets from one interface and some from the other. That’s where the complexity comes in. Good documents on PCC setup take care of this part, but don’t address the QoS aspect, which becomes much more complicated (as you will see) when you have PCC.
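To give a rough idea of what those extra rules look like, here is a minimal sketch in RouterOS v6 syntax, modelled on the usual two-WAN PCC examples. Everything specific in it is a placeholder I made up for illustration: the interface names (`ether1` for the routed link, `pppoe-out1` for the PPPoE link, `bridge` for the LAN), the mark names, and the gateway address `192.0.2.1`. Treat it as a starting point to compare against a proper PCC tutorial, not a finished config.

```routeros
/ip firewall mangle
# Connections that arrive from the Internet on a WAN are tagged with that WAN,
# so the replies leave through the same link.
add chain=prerouting in-interface=ether1 connection-mark=no-mark \
    action=mark-connection new-connection-mark=ispa-conn passthrough=yes
add chain=prerouting in-interface=pppoe-out1 connection-mark=no-mark \
    action=mark-connection new-connection-mark=ispb-conn passthrough=yes
# New connections from the LAN are split between the two WANs by PCC.
add chain=prerouting in-interface=bridge connection-mark=no-mark \
    dst-address-type=!local per-connection-classifier=both-addresses:2/0 \
    action=mark-connection new-connection-mark=ispa-conn passthrough=yes
add chain=prerouting in-interface=bridge connection-mark=no-mark \
    dst-address-type=!local per-connection-classifier=both-addresses:2/1 \
    action=mark-connection new-connection-mark=ispb-conn passthrough=yes
# Every later packet of a connection follows the WAN chosen for it.
add chain=prerouting in-interface=bridge connection-mark=ispa-conn \
    action=mark-routing new-routing-mark=to-ispa passthrough=no
add chain=prerouting in-interface=bridge connection-mark=ispb-conn \
    action=mark-routing new-routing-mark=to-ispb passthrough=no
# Replies generated by the router itself also stay on the right WAN.
add chain=output connection-mark=ispa-conn \
    action=mark-routing new-routing-mark=to-ispa
add chain=output connection-mark=ispb-conn \
    action=mark-routing new-routing-mark=to-ispb

/ip route
# One default route per routing mark (placeholder gateway for the routed link).
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=to-ispa check-gateway=ping
add dst-address=0.0.0.0/0 gateway=pppoe-out1 routing-mark=to-ispb
```

A working setup also needs a masquerade rule per WAN interface and ordinary unmarked default routes, which I have left out here to keep the sketch short.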

If you get your own IP space from ARIN or whatever authority it falls under, and you get two redundant BGP peer connections, most of this complexity vanishes. However, that kind of connection is not generally available for the same price point as a regular DIA, cable, or PPPoE internet service.

It is separate, but you actually have to pull it together. I’ll explain how you would probably do the QoS part, once you get the PCC working.

Queueing upload is fairly easy to do - the trick is that MikroTik needs to be told how fast your upload connection is. You can make a few queue trees connected to the upload interfaces to the two ISPs, and those queue trees would need a max-limit of a speed just below what that ISP provides you with. It is very easy to handle just upload.

There are two problems with queueing download. Normally you would want to queue download on the other side of the slow link by having a router there as well, and queuing it as upload (that is what you would do in the case of redundant links), but here that is not a choice. So, the two problems are:

  1. You can only queue download after it has gone across the congested link. This is imperfect by definition, it is not guaranteed. What this means is, if you have a 10Mbps download, and you set your download queuing to 9.5 or something (you have to limit to just below your actual download), the rate of packets coming in the connection can still exceed 10Mbps because your queue is not taking effect until after. The TCP windowing mechanism will ensure that TCP streams slow down to your 9.5Mbps queueing; however UDP is another matter. Certain UDP streams like Torrent traffic fire packets at the user like an Uzi. This overwhelms the congested link at the ISP end, before it even hits your router and before your router has a chance to drop it. Unlike your careful QoS setup, the ISP would drop all packets equally, and drop voice packets, torrent packets, whatever is necessary in order to rate limit you to your purchased 10Mbps. Still, at least queueing download on your side is somewhat effective with TCP streams and other normal traffic, and so it is worth doing for that reason.
  2. You have to queue download to just below the maximum rates for the internet connection it comes in on - meaning, you need two sets of queues, one for each download link, and match packets coming in on that link with the appropriate queues.

While with upload queuing, you can use interface-attached queue trees, for download you would need to use a global-attached queue tree and mark the packets appropriately not only based on their content but also based on what connection they came in on so that they will be associated with the correct queue.

It is much simpler on upload. On upload you can stamp any packet (with a ‘packet mark’), by creating mangle rules based on criteria, for instance you might use the following marks:

voice-up
importantdata-up
no-mark  <-- default, used for normal data

Then, interface-based queue trees will take care of assigning the traffic shaping and QoS allocations to the two ISP connections. You would create two queue trees.

ISP A tree (limit to just below ISP A upload rate) (parent = WAN interface that connects to ISP, not global)
       L voice (match packet mark voice-up)
       L importantdata  (match packet mark importantdata-up)
       L normal (match packet mark no-mark)
ISP B tree (limit to just below ISP B upload rate) (parent = WAN interface that connects to ISP, not global)
       L voice (match packet mark voice-up)
       L importantdata  (match packet mark importantdata-up)
       L normal (match packet mark no-mark)
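As a sketch of how the upload side could be written in RouterOS v6 syntax: the interface names (`ether1`, `pppoe-out1`, `bridge`), the phone address range, the SSH rule standing in for "important data", and the 900k / 300k figures are all assumptions for illustration (900k being "just below" a 1Mbps upload). Note that queue names have to be unique, so the children of the two trees need different names even though they match the same packet marks.

```routeros
/ip firewall mangle
# Stamp packets coming from the LAN by type; the same marks serve both WANs.
# Anything left unmarked falls into the no-mark queue.
add chain=forward in-interface=bridge \
    src-address=192.168.88.200-192.168.88.220 \
    action=mark-packet new-packet-mark=voice-up passthrough=no
add chain=forward in-interface=bridge protocol=tcp dst-port=22 \
    action=mark-packet new-packet-mark=importantdata-up passthrough=no

/queue tree
# ISP A tree, attached to the WAN interface itself (not global).
add name=ispa-up parent=ether1 max-limit=900k
add name=ispa-up-voice parent=ispa-up packet-mark=voice-up \
    priority=1 limit-at=300k max-limit=900k
add name=ispa-up-important parent=ispa-up packet-mark=importantdata-up \
    priority=3 limit-at=300k max-limit=900k
add name=ispa-up-normal parent=ispa-up packet-mark=no-mark \
    priority=8 limit-at=300k max-limit=900k
# ISP B tree, attached to the PPPoE interface.
add name=ispb-up parent=pppoe-out1 max-limit=900k
add name=ispb-up-voice parent=ispb-up packet-mark=voice-up \
    priority=1 limit-at=300k max-limit=900k
add name=ispb-up-important parent=ispb-up packet-mark=importantdata-up \
    priority=3 limit-at=300k max-limit=900k
add name=ispb-up-normal parent=ispb-up packet-mark=no-mark \
    priority=8 limit-at=300k max-limit=900k
```

Because each tree is attached to its own WAN interface, a packet only ever lands in the tree of the link it is actually leaving through, which is why one set of marks is enough for upload.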

Download is possible, but much more difficult. You will need to have a series of packet marks for instance:

ispa-voice-down
ispa-importantdata-down
ispa-normaldata-down  (here you can't use no-mark!)
ispb-voice-down
ispb-importantdata-down
ispb-normaldata-down  (here you can't use no-mark!)

Then those will need to be attached to a single queue tree with parent=global, with the following structure:

Download Queueing - parent (global attached)
   L  ISP A tree (limit to just below ISP A download rate) (child of 'download queueing')
             L voice (match packet mark ispa-voice-down)
             L important (match packet mark ispa-importantdata-down)
             L normal  (match packet mark ispa-normaldata-down)
   L  ISP B tree (limit to just below ISP B download rate) (child of 'download queueing')
             L voice   (match packet mark ispb-voice-down)
             L important (match packet mark ispb-importantdata-down)
             L normal (match packet mark ispb-normaldata-down)
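In RouterOS v6 syntax that structure might look roughly like the following. The 9500k limits are an assumption (just below a 10Mbps line), and the queue names and priorities are only illustrative; the packet marks are the ones listed above.

```routeros
/queue tree
# Single top-level parent attached to global.
add name=download parent=global
# ISP A subtree, limited to just below ISP A's download rate.
add name=ispa-down parent=download max-limit=9500k
add name=ispa-down-voice parent=ispa-down packet-mark=ispa-voice-down \
    priority=1 max-limit=9500k
add name=ispa-down-important parent=ispa-down \
    packet-mark=ispa-importantdata-down priority=3 max-limit=9500k
add name=ispa-down-normal parent=ispa-down \
    packet-mark=ispa-normaldata-down priority=8 max-limit=9500k
# ISP B subtree, limited to just below ISP B's download rate.
add name=ispb-down parent=download max-limit=9500k
add name=ispb-down-voice parent=ispb-down packet-mark=ispb-voice-down \
    priority=1 max-limit=9500k
add name=ispb-down-important parent=ispb-down \
    packet-mark=ispb-importantdata-down priority=3 max-limit=9500k
add name=ispb-down-normal parent=ispb-down \
    packet-mark=ispb-normaldata-down priority=8 max-limit=9500k
```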

Creating the queues is fairly easy, the difficult part is going to be constructing the mangle rules in such a way that identifies not only what type of data the packet is (voice, important data, normal data, whatever other categories you want), but also identifies what ISP interface the packet arrived through. If you accidentally mark a voice packet from ISP B as ‘ispa-voice-down’ then the whole thing falls apart because the MikroTik starts thinking the voice packets from ISP-B are counting towards the download rate of your ISP-A tree and not the download rate of the ISP-B tree. If some download packets aren’t marked at all (ex. if you forget a rule, or there is a logic error in your rule design) then the queue tree can think it has free capacity when it actually doesn’t, and again the QoS falls apart. You also have to make sure that you don’t mark an upload packet with a mark you are using for download (ex. marking a voice packet going to ispa as ispa-voice-down instead of voice-up). In that case, it would not be counted towards the upload queue and would incorrectly be counted on the download queue, again causing QoS to not work properly.
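A hedged sketch of what such download mangle rules could look like (RouterOS v6 syntax): the interface names `ether1` and `pppoe-out1`, the phone range, and the SSH rule standing in for "important data" are assumptions. I mark in the forward chain here because by that point the destination address has already been translated back to the LAN address, so the phone range can be matched directly.

```routeros
/ip firewall mangle
# ISP A: classify by content, but only for packets that arrived on ether1.
add chain=forward in-interface=ether1 \
    dst-address=192.168.88.200-192.168.88.220 \
    action=mark-packet new-packet-mark=ispa-voice-down passthrough=no
add chain=forward in-interface=ether1 protocol=tcp src-port=22 \
    action=mark-packet new-packet-mark=ispa-importantdata-down passthrough=no
# Catch-all so that no download packet from ISP A stays unmarked.
add chain=forward in-interface=ether1 \
    action=mark-packet new-packet-mark=ispa-normaldata-down passthrough=no
# ISP B: the same three rules, keyed on the PPPoE interface.
add chain=forward in-interface=pppoe-out1 \
    dst-address=192.168.88.200-192.168.88.220 \
    action=mark-packet new-packet-mark=ispb-voice-down passthrough=no
add chain=forward in-interface=pppoe-out1 protocol=tcp src-port=22 \
    action=mark-packet new-packet-mark=ispb-importantdata-down passthrough=no
add chain=forward in-interface=pppoe-out1 \
    action=mark-packet new-packet-mark=ispb-normaldata-down passthrough=no
```

Keying every rule on `in-interface` is what ties a mark to the ISP the packet came through, and the catch-all at the end of each group is there to avoid the "unmarked download" failure described above. Traffic addressed to the router itself never reaches the forward chain, so this sketch does not count it.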

As to how to get there, I would recommend first trying out QoS on its own, as I said. There are two scripts that I might suggest you try out, since one would be my recommended approach for upload and the other would be my recommended approach for download:

My own script, showing interface-attached HTB (queue tree), which is the easiest and best way for you to handle upload traffic for the two connections: http://forum.mikrotik.com/t/fasttrack-friendly-qos-script/102401/1
Greg Sowell’s script, which uses global-attached HTB (queue tree), which is the only way you could handle download traffic from two providers: http://gregsowell.com/?p=4665

Greg Sowell’s script does some other things differently - he doesn’t use DSCP at all or packet priority (IP precedence) and directly marks packets based on the criteria. With mine, you would apply a DSCP tag to the packet and then the priority and the packet mark would be derived from that. In your case, you can use either approach, as long as you are consistent. If your packets are passing through more than one router, the DSCP approach might make more sense; if that is your only router, marking the packets based on the criteria might be more efficient.

Keep in mind that in your case you will need to disable fasttrack entirely to do QoS with two connections. That means disabling the fasttrack-connection rule in IP->Firewall.
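From the terminal, something along these lines should do the same thing, assuming the fasttrack rule is the one from the default configuration in the filter table:

```routeros
/ip firewall filter
# Show any fasttrack rules first, to see what is about to be disabled.
print where action=fasttrack-connection
# Disable (rather than remove) them, so they are easy to restore later.
disable [find action=fasttrack-connection]
```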

WOW… many thanks for taking the time to write such a comprehensive explanation.
I understand the challenges now.
I’ll go through it all and see how it translates to our own situation … and how we might be able to simplify it in the short term to get something running

(I was thinking about just keeping the third dsl line for voice for now and routing all traffic from voip devices down that route until we properly crack the config for more complex QOS/Queueing .. all the voip phones and mobile boost boxes are in a contiguous ip range… and we could even put them on a separate subnet if that helped).. not sure if this helps or hinders in terms of routeros config though

The extra DSL for voice is probably a safer option, when taking into account the possibility of UDP streams like torrents congesting the circuit at the ISP side before it gets to your router. At least if the line is only for voice, you know that shouldn’t happen.
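If you go that way, a rough sketch of sending the voip range down the third line could look like this (RouterOS v6 syntax). The phone range, the `bridge` LAN interface, the mark name, and both gateway addresses are placeholders for illustration only.

```routeros
/ip firewall mangle
# Send everything from the voip range out via the third line.
# This rule must sit above any PCC rules so the phones skip load balancing.
add chain=prerouting in-interface=bridge \
    src-address=192.168.88.200-192.168.88.220 dst-address-type=!local \
    action=mark-routing new-routing-mark=to-ispc passthrough=no

/ip route
# Preferred path for voice: the third line (placeholder gateway).
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-mark=to-ispc \
    distance=1 check-gateway=ping
# Fallback if the third line's gateway stops answering pings.
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=to-ispc distance=2
```

The third line would also need its own masquerade rule, same as the other two.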

In general it is better to set up a voice VLAN, so a different subnet. This is mostly useful for your switches to prioritize the voice traffic in the event of layer 2 congestion. To do this, you can use ‘set priority’ action in mangle on Mikrotik, and if the traffic has a VLAN tag it should automatically copy this priority value to the tag’s CoS value, which can be understood by most managed switches as voice traffic as long as they are configured properly. If you anticipate congestion on your switch<–>router interconnect, you can either use two interconnects or do queueing there.
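A minimal sketch of the ‘set priority’ part, assuming a voice VLAN with ID 20 on a LAN port I have called `ether2` (both made up), and using priority 5, which is the value commonly used for voice:

```routeros
/interface vlan
# Voice VLAN on the LAN-facing port (placeholder ID and interface).
add name=vlan-voice vlan-id=20 interface=ether2

/ip firewall mangle
# Give packets leaving towards the phones priority 5; on a VLAN interface
# this value is carried in the tag for the switches to act on.
add chain=postrouting out-interface=vlan-voice \
    action=set-priority new-priority=5 passthrough=yes
```

The switches still have to be configured to trust and act on that value.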