Multi-WAN Load Balancing Starlink issue

Hello Sindy,

I am pleased to say that our VPN connection is working again and I have verified that I can access all local devices in Unalakleet remotely. I am interested in the housekeeping script, even though 5 minutes isn’t a huge deal since it’s just for remote access. That’s one problem solved.

Now I just need to get our Sonar Billing instance re-established, as it controls the DHCP server and customer account packages. Sonar is integrated with the MikroTik using the secure API on its default port, 8729. Connections from Sonar come from the static source IP 52.158.209.86, which per the documentation is standard across all Sonar instances in the USA. On the Sonar side, under the networking settings, the only settings needed for the MikroTik are the WAN IP, username, password and port number, as shown in the attached screenshot. This was all working before, just like the VPN, so I just need some guidance on what to change on the MikroTik. The difference is that this is a connection coming into the MikroTik.
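
For reference, the RouterOS side of this integration is the api-ssl service. A minimal sketch for checking it and, optionally, restricting it to the Sonar source address quoted above; it assumes the service already has a working certificate (it was working before), and the address restriction is an optional hardening step, not something Sonar requires:

```routeros
# Show the API services and their ports (api-ssl defaults to 8729)
/ip service print where name~"api"
# Optional: accept API-SSL connections only from Sonar's static source address
/ip service set api-ssl address=52.158.209.86/32
```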

Currently I have no filter rules that would block incoming connections; the firewall is wide open until I get this working, and then I will work on the security.

I assume I will need a separate routing table and address list, so I have already created those. I would like to use the same interfaces 9 and 10 to keep things simple, if possible.

Thank You!

Jaysen

A big fat NO for this. The firewall is the first thing to deal with when you connect something directly to the internet, always, no exceptions. The filth from the net squats in incredibly fast. Since the VPN works, you can allow connections to the router itself only via the VPN (and via the LAN as a backup) and block everything else except what you know needs to be open.
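
A minimal sketch of such an input-chain policy. The interface names l2tp-out1 (VPN) and bridge-lan (LAN) are hypothetical placeholders for the real ones; the Sonar rule uses the source address and port from the question above. When working remotely, enabling safe mode (Ctrl+X in the terminal) before adding the final drop rule is a sensible precaution:

```routeros
/ip firewall filter
# Accept replies to connections the router itself initiated (DHCP, VPN client, ...)
add chain=input connection-state=established,related,untracked action=accept comment="accept established/related"
add chain=input connection-state=invalid action=drop comment="drop invalid"
add chain=input protocol=icmp action=accept comment="accept ICMP"
# Management access only via the VPN and the LAN (hypothetical interface names)
add chain=input in-interface=l2tp-out1 action=accept comment="accept from VPN"
add chain=input in-interface=bridge-lan action=accept comment="accept from LAN"
# Sonar billing: API-SSL only from its static source address
add chain=input protocol=tcp dst-port=8729 src-address=52.158.209.86 action=accept comment="accept Sonar API-SSL"
# Everything else addressed to the router itself is dropped
add chain=input action=drop comment="drop all else"
```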


Great.


The router always answers incoming connections from the same address at which they arrived, but you need to make sure it also sends the responses via the correct interface, because it does not automatically choose a routing table according to the source address. We need to let it use a routing table that contains a single route through the correct interface, or at least has such a “correct” route as the most preferred one. If the WAN addresses were static, you could use routing rules for that, but that’s not possible here (unless we made the lease script update them as well), so we need mangle rules again.

The generic way to do this is connection marking: we save the in-interface of the initial packet of the incoming connection into the context of that connection maintained by the connection tracking module, and use it to assign a routing mark to the response packets. So add a rule to the top of chain prerouting of mangle, and keep the rest of the rules in that chain disabled for now so that none of them overwrites the connection mark assigned by this one.

/ip firewall mangle print chain=prerouting where !dynamic
/ip/firewall/mangle/add chain=prerouting place-before=0 in-interface=sfp28-10-wan10 connection-state=new action=mark-connection new-connection-mark=use-wan10

Then, add a rule to chain output of mangle that will assign an appropriate routing mark based on this connection mark:

/ip/firewall/mangle add chain=output connection-mark=use-wan10 action=mark-routing new-routing-mark=for-l2tp passthrough=no

Now the router should respond via wan10 to any connections that arrive at the IP address assigned to wan10.
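
To check that the marking works, the connection table can be filtered by the new mark. If the same treatment is wanted for the other interface mentioned earlier (interface 9), the pair of rules can be mirrored. In the sketch below, the interface name sfp28-9-wan9, the connection mark use-wan9 and the routing table via-wan9 are hypothetical, and that table still needs a default route via wan9’s gateway:

```routeros
# List connections currently tagged by the prerouting rule (e.g. Sonar's API-SSL session)
/ip firewall connection print where connection-mark=use-wan10
# Hypothetical mirror for wan9: create the routing table (RouterOS 7) if it does not exist yet
/routing table add name=via-wan9 fib
# Print first so that place-before=0 refers to the current top of chain prerouting
/ip firewall mangle print chain=prerouting where !dynamic
/ip firewall mangle add chain=prerouting place-before=0 in-interface=sfp28-9-wan9 connection-state=new action=mark-connection new-connection-mark=use-wan9
/ip firewall mangle add chain=output connection-mark=use-wan9 action=mark-routing new-routing-mark=via-wan9 passthrough=no
```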

It is just a quick solution because I am almost asleep; tomorrow I’ll give you some additional information regarding possible redundancy scenarios.

Thanks again for the help today, and I do agree with the points you made regarding the firewall. Now that the VPN is working again, I will do that. I am going to work on fixing the Sonar connection and then set up the firewall like I have at all my Oregon sites. Have a good rest. I’ll look for any responses here tomorrow.

A couple of questions regarding Sonar:

  • can it use an FQDN of the router it manages rather than an IPv4 number?
  • does it require a continuous connection, or is it not an issue if it loses contact for minutes?

Hello Sindy,

Sonar is working again as of last night. Unfortunately, it does not have the capability to use an FQDN, and I’m not sure whether they will add that in future updates. I would like that option, though. I don’t think a few minutes would be too big of an issue, but it’s preferred that it be continuous, as it controls the DHCP server leases and plan speeds. When it’s offline, I have to do everything manually on the MikroTik when adding or removing customers.

I would look at BigLeaf.

Let them bond and distribute across the feeds back to their VPS, then on to the internet.

Thank you for the suggestion. I will look into them. In the meantime, I hope to get the rest of these routes working today and the rest of the terminals into my NMS monitor. I do appreciate everyone’s patience and assistance with all of these issues, but progress is being made and the client is happy. I have approximately 14+ more villages in Alaska coming up, so this is quite a learning experience. Unalakleet is the first 2.5 GHz broadband installation in the state of Alaska (so I’ve been told), so it is going to be the example that we showcase to the rest of the villages. This is huge, and people’s lives are going to be positively impacted by all this work. It will enable access to distance learning opportunities, access to telemedicine and, of course, the occasional Call of Duty match. :smiley:

gotsprings, looks like you’re trying to throw MikroTik WAN solutions under the bus LOL. Here I am trying to figure out optimal failover WAN approaches and it turns out I just need to use BigLeaf…
Please send $$$$

The reason for the question was that the router could update a DNS record in your company DNS if the latter has an API for that, or it could use the Mikrotik “ip cloud” service or some 3rd-party dynamic DNS service with a simple enough API. The Mikrotik service has limitations: a single IP address per device, and a fixed hostname generated from the serial number of the device. So to stay safe, if you use the FQDN to reach that device from many other devices, or if a VPN connection established towards that FQDN is the only way those other devices can be reached, you still need another DNS with a CNAME pointing to that fixed serial-number-based FQDN (if the device is ever replaced, only the CNAME needs updating). I have also seen the service be down for days in the past, which apparently wasn’t Mikrotik’s fault, but for some it was a really tough time.

The use case here would be that if the chosen terminal failed, the router would switch to a backup one and update the dynamic DNS with the backup terminal’s public address, so Sonar could reconnect.
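
For the Mikrotik service mentioned above, a minimal sketch of enabling it and reading back the assigned name (the name has the form <serial>.sn.mynetname.net; a CNAME in your own DNS would point at it). The address it publishes is the one from which the router’s update reaches MikroTik’s server, so it follows whichever WAN the router’s own traffic leaves through:

```routeros
# Enable MikroTik's built-in dynamic DNS (IP Cloud)
/ip cloud set ddns-enabled=yes
# Push an update immediately instead of waiting for the next periodic one
/ip cloud force-update
# Shows dns-name and the public-address the service currently knows
/ip cloud print
```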


Well, my question was whether it is a provisioning tool (which your explanation seems to confirm) or whether it directly controls the traffic (e.g. cutting off a client if they run out of their quota; I have no clue what your business offer is up there). A provisioning tool only has to work when you actively use it, so you can manually change the address if the currently configured one changes or dies; a traffic policing tool needs a constant connection.

What you can do to make VPN traffic switch to a backup WAN far sooner than the VPN client would detect an outage and re-establish the connection via the backup WAN is to establish two VPN tunnels, one strictly using the preferred WAN and the other strictly using the backup one, and let one of these tunnels be a backup for the other. This approach doesn’t suffer from the issue, mentioned earlier, of src-nated connections surviving an outage of the uplink until the lease is lost, but it requires a compatible setup on the VPN server side.
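
One way to express “one tunnel is a backup for the other” on the router is a pair of routes to the remote networks with different distances: while the preferred tunnel is up, its route wins; when that tunnel interface goes down, its route becomes inactive and the other one takes over. The tunnel names l2tp-wan10 and l2tp-wan9 and the remote prefix 192.0.2.0/24 are hypothetical; pinning each tunnel to its own WAN has to be done separately, and the server side needs the mirror-image arrangement:

```routeros
# Preferred path to the remote networks: the tunnel bound to wan10
/ip route add dst-address=192.0.2.0/24 gateway=l2tp-wan10 distance=1 comment="via tunnel over wan10"
# Backup path: the tunnel bound to wan9, used only while the route above is inactive
/ip route add dst-address=192.0.2.0/24 gateway=l2tp-wan9 distance=2 comment="via tunnel over wan9"
```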

With such a setup, you can also set up port forwarding for Sonar on one of your machines in a datacenter with a static public address to the CCR in Unalakleet via this pair of VPN tunnels. But it is still only a protection against failure of a terminal, it cannot handle an absence of a functional satellite within reach or a failure of the gateway machine.

Totally unrelated: you mentioned that people keep getting security warnings. Some web sites like to handle requests within the same application session by different servers at their end, but check whether all the requests come from the same public address on the client side, and if not, they either reject them or at least issue security warnings. To avoid this, the hash in the per-connection-classifier matcher must be calculated solely from the src-address. This means that all outgoing connections of a given LAN address will always be mapped to the same WAN address (unless that WAN fails, of course), so the traffic will not be distributed as evenly as when you hash both addresses and ports, but it may be bearable (and also controllable to some extent: you can manually change the addresses of the clients that generate the most traffic to distribute them evenly over the WANs).
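
A minimal sketch of PCC rules that hash on the source address only, assuming two WANs for brevity (with N WANs, the classifiers become src-address:N/0 through src-address:N/N-1). The names bridge-lan, pcc-wan9, pcc-wan10, to-wan9 and to-wan10 are hypothetical. The connection-mark=no-mark condition keeps these rules from overwriting the marks assigned to incoming WAN connections:

```routeros
/ip firewall mangle
# Classify new LAN->internet connections by client address only, into 2 buckets
add chain=prerouting in-interface=bridge-lan dst-address-type=!local connection-mark=no-mark connection-state=new per-connection-classifier=src-address:2/0 action=mark-connection new-connection-mark=pcc-wan9
add chain=prerouting in-interface=bridge-lan dst-address-type=!local connection-mark=no-mark connection-state=new per-connection-classifier=src-address:2/1 action=mark-connection new-connection-mark=pcc-wan10
# Translate the connection mark into a routing mark for every packet of the connection
add chain=prerouting in-interface=bridge-lan connection-mark=pcc-wan9 action=mark-routing new-routing-mark=to-wan9 passthrough=no
add chain=prerouting in-interface=bridge-lan connection-mark=pcc-wan10 action=mark-routing new-routing-mark=to-wan10 passthrough=no
```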

I can assist, just bear in mind the time shift. But your last config posted should work if you enable the mangle rules and the routes and redo the QoS-related mangle rules in chain forward so that they would not use connection marks, because they overwrite those assigned in prerouting.

To do so, take all the match conditions of the mangle forward rule that assigns the connection mark, add them to the mangle forward rule that currently translates that connection mark to a packet mark, remove the match on connection mark from the latter rule, and remove the former rule completely. Do this 4 times and that’s it.
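
To illustrate with one hypothetical pair of rules (the match condition src-address=10.20.30.0/24 and the mark name qos-voip are placeholders for whatever the actual rules contain), the merged rule and the removal of the old pair could look like this:

```routeros
# Before (old pair, shown as comments):
#   chain=forward src-address=10.20.30.0/24 action=mark-connection new-connection-mark=qos-voip
#   chain=forward connection-mark=qos-voip action=mark-packet new-packet-mark=qos-voip passthrough=no
/ip firewall mangle
# After: one rule matching the original conditions directly, setting only the packet mark
add chain=forward src-address=10.20.30.0/24 action=mark-packet new-packet-mark=qos-voip passthrough=no comment="qos-voip (merged)"
# Remove the rule that overwrote the connection mark, then the old translation rule
remove [find where chain=forward new-connection-mark=qos-voip]
remove [find where chain=forward connection-mark=qos-voip]
```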

If no route is active in a routing table indicated by the routing mark, the system uses routing table main instead (unless a routing rule explicitly prohibits that). So the existing configuration does contain a backup for the case that a single Starlink terminal stops working.
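
To see what a given routing table currently contains, and, only as a counter-example, what a rule that forbids the fallback to main would look like (it would remove exactly the backup behaviour described above):

```routeros
# Routes in the table used for wan10-marked traffic; if none is active, lookup falls back to main
/ip route print where routing-table=for-l2tp
# Counter-example, not to be applied: marked traffic would fail instead of falling back to main
# /routing rule add routing-mark=for-l2tp action=lookup-only-in-table table=for-l2tp
```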

As I wrote earlier, it is better to replace both-addresses by src-address in the per-connection-classifier.

Hi Sindy,

It might be best to start fresh with the mangle rules. The ones I had originally created were based on another poster’s advice, which is different. While I appreciate their help, things are beginning to work with what you and I have done up to this point. Since they are all disabled, it might be easier for me to follow what you are saying if I start fresh. I’ve been pulled in different directions the last few days with multiple clients, so my train of thought on this is a little off. I am thinking we get rid of all the disabled ones in the screenshot, or at least the ones that are not needed, and then make corrections to the ones I do need based on what you are saying. It’ll make more sense to me that way. I appreciate your understanding, as this is new to me.

Thank You

Speedify has a lab that ran that at one point.

But BigLeaf seems a bit more business-oriented.

Let’s go that way if you like, but I’d prefer a more interactive communication channel than the forum. This kind of “share the wisdom” site is great for describing typical setups and principles so that others can follow them, but there are so many topics dealing with load distribution & backup here that I can’t see any point in documenting the process here for the 500th time, so only the waste of time remains. So please consider following this post.

FWIW the PCC youtube vid made by MT is quite good.
https://www.youtube.com/watch?v=nlb7XAv57tw

Used it again to clean up an AC3 LTE setup for PCC sharing across VDSL and LTE.
Only THIS time I disabled the subtitles, which all of a sudden made me see a couple of important things I had missed the previous time I watched that video.
And now it works stable and reliably :laughing:

Just saying …

Hello Sindy,

I did more work on it last night. I updated and activated all the mangle rules and added a few more. I also added all the masquerade rules for each interface. So far things appear to be working great. All interfaces are responding to ping and SNMP queries, so I am now monitoring them in our NMS. Sonar and VPN connections are still working, and no complaints so far from the village that the internet isn’t working. I’m out of the office most of the morning but plan to work more on it later today.
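
As a side note, per-interface masquerade rules can also be replaced by a single rule matching an interface list. A sketch, assuming a list named WAN (hypothetical here; skip the first line if such a list already exists) that contains every Starlink-facing interface:

```routeros
/interface list add name=WAN
/interface list member add list=WAN interface=sfp28-10-wan10
# ...one member entry per Starlink-facing interface...
/ip firewall nat add chain=srcnat out-interface-list=WAN action=masquerade comment="masquerade on all WANs"
```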

I did set up some basic firewall rules using the section on that in the Wiki, but it still needs a little more work, which I also plan to complete today. I’ll post an updated config later tonight and you can make any suggestions on how to improve things.

A question I have about that lease script: I assume it is meant to be placed inside the script section of each DHCP client? It makes sense to me that it would be appropriate to put it there, but I figured I would ask.

I do have a remote Windows box at the tower as a backup way into the network in case the VPN fails for any reason. I am going to play around with some DDNS clients, then work with Sonar to see if they have a back-end way of making things work with an FQDN.

Thanks again for all the help

Awesome, I’ll take a look at it. Thanks for sharing

Actually, I just viewed this video and I think it has some serious flaws LOL, and no, I am not just saying that to contradict holvoe, as much fun as that is. :wink:

To be precise, it is meant to be placed just once into the /system script section, with its name placed in the script item of each DHCP client. You could put the complete script into the script item of each DHCP client (that item is interpreted contextually: if the content is a single word, it is interpreted as a script name, otherwise as script code), but that would be a maintenance nightmare.

Referring to the script this way still allows you to create a modified instance for some special purpose, or just for testing new features, and let just one of the DHCP clients use it.
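
In RouterOS terms, a sketch assuming the lease script has already been stored in /system script under the hypothetical name lease-handler (its body is the script discussed earlier and is not reproduced here), with a modified copy named lease-handler-test for trying out changes on one client:

```routeros
# Confirm the stored script names
/system script print
# Point every DHCP client at the shared script by name
/ip dhcp-client set [find] script=lease-handler
# Let just one client use the modified test copy
/ip dhcp-client set [find where interface=sfp28-10-wan10] script=lease-handler-test
```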

Ok Anav.
Eagerly awaiting your version then …
Let’s see if it is better.