Dual WAN failover help needed

I’ve been working with MikroTik for over 10 years. I’m not a network engineer by any means, but I’ve learned a lot over that time.

I’m trying to set up dual-WAN failover on RouterOS 6.x. I have tried many approaches, from the simple to the complex, and in every case the routes do go up and down properly.

The issue I am having is when failover occurs I can no longer access the network from the outside. I think somehow the routes are holding on to old connections, but I’m not sure.

I have read almost every post on the subject, and tried everything except for mangle type setups involving firewall rules. I suspect this is the only way to get it done though.

I’m trying to keep things as simple as possible with as little impact on the network as I can. If there is a way to accomplish it without mangling and firewall rules I would love to learn. I have tried every method as explained in all the posts but it just does not work.

I’ve spent the past week reading and experimenting 8-10 hours per day on the subject. I’m reaching out here because I know there are a lot of people who know much more than I do.

Thanks for the help.

Could you expand a bit on the symptoms you’re seeing? Specifically: when a failover occurs, you should not expect to reach the network from the outside through the connection that has gone down (for as long as it is down). That would be completely “normal” and expected.

Did you mean that when the failover occurs, the devices on your own network temporarily lose their connections and are unable to reconnect for a certain period of time?
Or are you saying that, for as long as the failover connection is in use, you are unable to connect in from the outside as you normally would, even when using the failover IP address?

Setting up dual WAN from scratch can be tricky. Most of us have spent a lot of time on the subject, so your frustration is definitely understandable! There are a few “variations” in how I’ve seen it implemented, but the underlying concepts are the same, and a few essential pieces of configuration must be in place for it to work correctly and with minimal interruption. Some scripting may also be necessary, for instance to flush the firewall connections properly when the failover takes place.

As always, the best thing you can do right now is post your config (/export). Otherwise it’s not going to be easy for anyone to see what’s wrong with your setup, offer suggestions, or point out any obvious mistakes.

this is what happens:

  • isp1 goes down
  • failover occurs to isp2
  • internal network is able to access the internet
  • VPN, port forwards, etc. are NOT accessible from the outside for the duration of the failover

I would like to be able to have a seamless transition so that no matter what route is active, failover or main, the network operates exactly the same as it would without a dual wan setup.

I have a section on this thread - see item I. that describes the very basic setup… https://forum.mikrotik.com/viewtopic.php?t=182373

Just using that to get a sense of where you are at on your IP route journey!


A network diagram is always helpful but most important is to look at the config
/export hide-sensitive file=anynameyouwish

Just ensure you replace any public IPs or gateway IPs with xx.xxx. etc…


Seamless is a relative term. Clearly, if one has started a session via one WAN IP, that session will normally be lost if the WAN goes down…
If you want truly seamless, then you need some form of HA, which means two routers running hot, one live and one ready to take over, and I’m not even sure MT’s firmware has got there yet in terms of truly seamless.

As the old saying goes, “been there… done that”.

I’ve read everything you wrote on the subject, and it’s great by the way. I’ve learned so much.

I have a feeling that I cannot accomplish what I want unless I use mangling and firewall rules.

I just don’t understand exactly what the problem is.

The routes always update and change to the correct distance, etc, but something gets stuck.

I have noticed in the firewall connections when everything is working the dst address is equal to the gateway.

When it is not working, the dst address is stuck on the previous gateway.

I have tried every method I know to get it to switch but no luck.

To add to what anav has said. In a nutshell, here’s what I’d look to accomplish if I were you:

  • Each WAN connection needs its own routing table, with a default route in that table pointing at that WAN’s gateway
  • You should have a policy-based routing firewall configuration - so mark connection for all packets (incoming, outgoing, forward) on each WAN interface, and then add rules that mark those connections for routing to the tables you created in the previous step
  • Regardless of how you check for connectivity and switch between the WANs, you should wipe the firewall connections carrying the connection mark of the failed WAN interface as soon as the switchover occurs, so that affected VPNs etc. will not simply “time out” but will immediately and actively try to re-establish a connection over the backup interface. If this is executed and timed properly, tunnels such as WireGuard will notice barely any downtime, because the handshake over the backup connection will take place almost instantly. The same goes for users on your network playing games or browsing the web. They may be disconnected from their sessions, but the delay in reconnecting should be eliminated because the firewall is no longer holding on to that connection
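
The three points above could look roughly like this on RouterOS 6.x. This is only a sketch under assumptions: the interface names (ether1-WAN1, ether2-WAN2, bridge-LAN), the mark names (WAN1_conn, WAN2_conn, to_WAN1, to_WAN2) and the gateway addresses (192.0.2.1, 198.51.100.1) are placeholders and not taken from anyone’s config in this thread. RouterOS 7 uses routing-table instead of routing-mark on routes.

```routeros
/ip firewall mangle
# 1. Remember which WAN an unmarked connection was first seen on.
add chain=prerouting in-interface=ether1-WAN1 connection-mark=no-mark \
    action=mark-connection new-connection-mark=WAN1_conn passthrough=yes
add chain=prerouting in-interface=ether2-WAN2 connection-mark=no-mark \
    action=mark-connection new-connection-mark=WAN2_conn passthrough=yes
# 2. Packets from LAN hosts (e.g. port-forward replies) follow the mark.
add chain=prerouting in-interface=bridge-LAN connection-mark=WAN1_conn \
    action=mark-routing new-routing-mark=to_WAN1 passthrough=no
add chain=prerouting in-interface=bridge-LAN connection-mark=WAN2_conn \
    action=mark-routing new-routing-mark=to_WAN2 passthrough=no
# 3. Packets sent by the router itself (VPN, management) follow the mark too.
add chain=output connection-mark=WAN1_conn \
    action=mark-routing new-routing-mark=to_WAN1 passthrough=no
add chain=output connection-mark=WAN2_conn \
    action=mark-routing new-routing-mark=to_WAN2 passthrough=no

/ip route
# One default route per routing table (placeholder gateways).
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=to_WAN1
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-mark=to_WAN2

# Run from whatever detects the failure of WAN1:
# drop only the tracked connections that belong to the failed WAN.
/ip firewall connection remove [find connection-mark=WAN1_conn]
```

Removing connections by mark rather than wiping the whole table leaves sessions on the surviving WAN untouched.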

I don’t think you can accomplish what you’re trying to do unless you bring in firewall rules but I’ll let the qualified experts take over from here lol ;D

is there a way to simulate this to give it a try before I get into the firewall marks?

I thought if I wiped all firewall connections it should accomplish the same thing?

I’ve tried all the methods by extended and anav and others and none of them worked.

Is there a method that does work?

Okay, it sounds familiar, but everyone’s situation is unique in some respect.

I had a situation with two providers. With cable there are no issues: it always gets a new IP, and the manual IP route I have set always gets populated with the new ISP gateway IP.
However, my fibre op ISP doesn’t play well.
Whenever the IP address changed, whether because the ISP changed it (dynamic but not frequent), I pulled the plug, or there was a power outage, etc., IT REFUSED to put the new gateway IP in the manual route I had created… Hence I had to do this via script.

So if you can isolate whether one particular ISP connection is the problem, I would bet that when you check STATUS under IP DHCP Client after the ISP comes back online, you will see a new IP address and a new gateway IP, but when you look at your IP route there is no connectivity because the gateway IP in the route is still set to the old one!!!

This is just one possibility of course, and I am guessing, but that is what was happening to me.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

So the way I tackled it, with much help from others, certainly beyond my scope LOL…
Was to say yes to ADD default route in IP DHCP client and then add a script right in the settings…
So I went to advanced settings and added the below to the open box area.
I cannot remember why but I put in a default route distance of 255, probably so it was never actually chosen for routing purposes but served the purpose of adding the updated gateway IP to the current IP route so the much lower distanced routes would take precedence.
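
From the terminal, the DHCP client settings described above might look like the lines below. The interface name ether1-fibre is a placeholder I made up, and the distance of 255 is simply the value mentioned above; the script itself goes in the client’s script field.

```routeros
# "ether1-fibre" is a hypothetical interface name.
/ip dhcp-client set [find interface=ether1-fibre] \
    add-default-route=yes default-route-distance=255

# After a renewal, compare the gateway shown here with the one in /ip route.
/ip dhcp-client print detail where interface=ether1-fibre
```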

As I read it, the script stores the DHCP client’s interface in iface and, if the lease is bound, reads the current gateway IP from that DHCP client and then writes that value into any route whose comment matches the keyword and whose gateway is out of date… Someone else can likely state it far more clearly…

:if ($bound=1) do={
    # $bound and $interface are supplied by the DHCP client to its script
    :local iface $interface
    :local gw [/ip dhcp-client get [find interface=$iface] gateway]
    # update only the routes whose gateway differs from the current one
    /ip route set [find comment="PrimaryRecursive" gateway!=$gw] gateway=$gw
    /ip route set [find comment="SecondaryRecursive" gateway!=$gw] gateway=$gw
    /tool e-mail send to="myemail@address.ca" subject=([/system identity get name]) body=" This is your new gateway IP: $gw"
    :local sub3 ([/system clock get time])
    # keep-result=no stops fetch from saving the reply as a file
    /tool fetch url="https://api.telegram.org/bot…/sendMessage?chat_id=-9999999&text=At+$sub3+BellFibre+Changed+WANIP" keep-result=no
    :log info "Telegram notification sent VlanBell IP Changed"
}

Which you could shorten by removing the comms stuff at the bottom too:

:if ($bound=1) do={
    :local iface $interface
    :local gw [/ip dhcp-client get [find interface=$iface] gateway]
    /ip route set [find comment="PrimaryRecursive" gateway!=$gw] gateway=$gw
    /ip route set [find comment="SecondaryRecursive" gateway!=$gw] gateway=$gw
}

I do this by ensuring the keyword is included in my route comment but appears nowhere else in the router config (it must be unique).

/ip route

add comment=PrimaryRecursive distance=3 dst-address=1.0.0.1/32 gateway=ISP1-gatewayIP scope=11
add comment=SecondaryRecursive distance=4 dst-address=9.9.9.9/32 gateway=ISP1-gatewayIP scope=11
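
For context, host routes like the two above are normally paired with default routes that use the probe addresses as their gateway. The sketch below is my assumption of how that could look on RouterOS 6.x and is not taken from the config above: the distances, the target-scope value (set to 11 so the default routes can resolve through the scope=11 host routes) and the 198.51.100.1 backup gateway are all placeholders.

```routeros
/ip route
# Default routes resolved recursively through the probe host routes.
# check-gateway=ping deactivates a route when its probe address stops answering.
add dst-address=0.0.0.0/0 gateway=1.0.0.1 distance=3 \
    check-gateway=ping target-scope=11
add dst-address=0.0.0.0/0 gateway=9.9.9.9 distance=4 \
    check-gateway=ping target-scope=11
# Backup default route via the second ISP (placeholder gateway).
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10
```

With this layout only the host routes carry the real ISP gateway, which is why the DHCP script only has to rewrite those.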

To put it a different way: before you go down the rabbit hole of mangling, I am simply asking you to determine whether WAN connectivity is or isn’t the real issue.
If it’s determined that WAN connectivity going up and down and getting new IPs and new gateway IPs IS NOT the issue, then it would certainly seem that looking at more complex handling of the traffic would be appropriate.

However as I stated before,
/export hide-sensitive file=anynameyouwish

There is no point in anyone here continuing to guess if you don’t provide more information.
I would include what your users should or shouldn’t be able to do, including any specific WAN usage expectations…

I don’t want to go down the rabbit hole!

I have tried what you suggested, but until now I didn’t realize exactly why.

I am going to test it out more, and then I will get back to you.

Thank you for taking the time to help! It is very much appreciated.

I can confirm that the correct routes are populating on the interfaces before and after failover.

What is the best way to check the traffic flow once I failover? I’d like to see if I can track down why I cannot connect at that point.

And, when I failover and cannot connect, how can I reset the router so I can connect on the failover route? I’ve tried cleaning the firewall connections, disabling and enabling the DHCP interface and physical port but nothing works. Are there commands I can use to try to reset the connection?
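
A few terminal commands that may help answer both questions on RouterOS 6.x. This is a sketch with placeholders: ether2-WAN2 stands for the backup WAN interface and 8080 for one of the forwarded ports, neither taken from a real config.

```routeros
# 1. Which default route is active right now?
/ip route print detail where dst-address=0.0.0.0/0

# 2. Do packets from the outside reach the backup WAN at all?
#    Leave this running while someone connects from outside.
/tool sniffer quick interface=ether2-WAN2 port=8080

# 3. Are the port-forward rules being hit? Watch the packet counters.
/ip firewall nat print stats where chain=dstnat

# 4. What does connection tracking hold for that port?
/ip firewall connection print detail where dst-address~":8080"

# 5. Remove only those tracked connections.
/ip firewall connection remove [find dst-address~":8080"]
```

If step 2 shows nothing arriving, the problem is upstream of the router (for example the outside client is still using the old public IP) and no amount of clearing connections on the router will change that.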

All of this info will help come to a solution in the end. Even if I have to use firewall rules, at least I will know exactly why.

I have narrowed it down to what seems like this:

  • when the failover occurs, the previous ISP address still shows up in the firewall connections. You can see this in the dst-nat entries under ip firewall connections.
  • after about an hour or so, the connections switch to the proper ones and everything is accessible again from the outside.

I have tried so many things to flush the dns, turn on/off firewall tracking, removing firewall connections, and nothing helps.

There has to be a way to make the router think it’s starting fresh to clear everything out- it happens automatically after about an hour, but how can I make it happen right away?

Thanks.

I am confused, what does the ISP firewall address have to do with firewall RULES??
Not sure where they are showing UP??? I don’t have them in my firewall rules???

In other words, please post latest complete config
/export hide-sensitive file=anynameyouwish
but hide public IPs wherever they may be, and keep it consistent, so that if an ISP is indicated by ISP1 somewhere, its address is called ISP1IP in all locations, etc…

After countless hours of research I’ve come to the conclusion that the only reliable way for me to have dual WAN failover with access from the outside is to use recursive routing with firewall mangle rules.

That being said, I see there are many scripts out there.

Does anyone have one that they use that is tried and true reliable?

I appreciate the input.

Thanks.

I have access to my router from the outside without any mangling and I have dual WAN failover. :)

There are many options!!!
You have free dyndns URLs that will keep the IP updated on clients that are trying to reach the main router.
You have IP Cloud on the router itself which will update to the working reachable public IP address.
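
Enabling IP Cloud from the terminal is short. The lines below are a sketch for recent RouterOS 6.x releases; on older 6.x versions the setting may be named differently, so treat the property name as an assumption to check on your version.

```routeros
# Turn on the built-in DDNS service.
/ip cloud set ddns-enabled=yes

# Shows the assigned dns-name and the public address currently registered.
/ip cloud print

# Ask for an immediate update, e.g. right after a failover.
/ip cloud force-update
```

Outside clients then connect to the dns-name shown by print instead of a fixed public IP.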

If using WireGuard you could always go cheap and simply create two WireGuard interfaces with two different listening ports.
If you cannot ping public IP1, then use the WireGuard interface associated with WAN2 on your client device when you want to access the router.
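
A rough sketch of the two-interface idea follows. WireGuard only exists on RouterOS 7, so this does not apply to a router still on 6.x. The interface names and ports are made up for illustration, peers and addresses are left out, and the accept rule has to sit above any drop rule in the input chain.

```routeros
/interface wireguard
# One interface per WAN, told apart by listen port (hypothetical values).
add name=wg-wan1 listen-port=13231
add name=wg-wan2 listen-port=13232

/ip firewall filter
# Let handshakes for both interfaces reach the router.
add chain=input protocol=udp dst-port=13231,13232 action=accept \
    comment="WireGuard on both WANs"
```

On the client you would then keep two tunnel profiles, one per public IP and port, and pick whichever one answers.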

anav if you have it working without mangle and firewall rules, would you mind sharing your script?

I have tried everything and cannot get access to the outside upon failover or failback.

The routes just hang on to the first thing they connect to no matter what. No clean out has worked.

Maybe you have something I don’t.

Well, for starters, on my CLIENT MT ROUTER for example, or in my iPhone WireGuard settings,
the ENDPOINT is not an IP address, it’s the IP Cloud address of the MAIN router (server).

So I dont really care what ISP is up, I get connected.
The input chain listening port is agnostic to what ISP is being used for the connection.
The router, if failover works, keeps track of which route is available in the main table.

(To be fair, I actually WireGuard into an RB450Gx4 that is behind my main router, as the main router is still on version 6, but the principle is the same and I do use the MAIN router’s IP Cloud address.)

What do you do???

My setup is very simple. I have a cable ISP and a Verizon 5G ISP. Both are wired incoming connections.

I have 2 routes set up with cable as 1 and Verizon as 2.

When cable goes down, Verizon kicks in.

At this point I can not access from the outside, even though internally everything works fine.

When I look in the firewall settings, all connections are still going through the cable IP address, even though the Verizon address is active.

Apparently there is no way to release what needs to be released to make it work. After an hour or so it will work, and I’m guessing that something times out to make it work.

If I could find the magic that makes it time out, I would be ok, but I have tried everything and nothing makes it work from the outside.

If I didn’t need outside access it would be fine, but I do.

So, if anyone does have a working script that will mangle and firewall to make it work I would love to see it and give it a try.

Mangling won’t help if you have issues understanding and utilizing your ISP connections in the first place.
Nothing you said makes sense regarding them not working for failover.

Further, no one is going to be able to help without seeing why the above is happening, which means doing what you have avoided so far in order to really get help: provide your config
/export hide-sensitive file=anynameyouwish.

Since you’re in LA…
https://www.youtube.com/watch?v=hZM_x-P_AVk
… or we may need to send W. Smith over to knock some sense into you.

Or I am no scooby doo
https://www.youtube.com/watch?v=pT20g6lTZ-k

The issue I am having is not with failover. The failover works great. It’s the fact that I have ports forwarded for cctv and others that DO NOT WORK after the failover occurs. When you look in the firewall connections, it still shows the old route and connections are still going through the old route until they time out which takes forever.

Anav, can you absolutely confirm that you have a working failover that allows you to access your forwarded ports from the outside when the failovers occur without waiting forever for them to time out? If you do, I am asking how are you getting this to happen?

Maybe I’m not being clear or explaining myself correctly. I am not a network engineer so I may be saying the wrong thing.