WAN Failover and/or recursive routing issue

I’m trying to implement a temporary automatic WAN failover on a CCR2004, following the official help page “Failover (WAN Backup)”
https://help.mikrotik.com/docs/pages/viewpage.action?pageId=26476608.

I can’t reproduce the exact steps of the tutorial because:

  • my secondary WAN router hands out addresses via its internal DHCP server, but I disabled “add default route” in the ROS DHCP client for that interface;
  • I currently use a large set of connection-marking mangle rules for queues.

NAT, raw and filter rules in the firewall seem OK, because if I do a “manual failover” (enabling the correct secondary default route in /ip/route and disabling the primary), I can access the Internet from my LAN.

But when I leave the routing decision to ROS, I can’t even ping Internet hosts from the router.

What I did is summarized in the following code.

/ip/firewall/mangle
add action=mark-routing chain=output comment="Mark routing ISP1" new-routing-mark=ISP1 out-interface-list=\
    "WAN ISP1" passthrough=no
add action=mark-routing chain=output comment="Mark routing ISP2" new-routing-mark=ISP2 out-interface-list=\
    "WAN ISP2" passthrough=no



/ip route
add disabled=yes distance=1 dst-address=0.0.0.0/0 gateway=172.30.32.126 pref-src="" routing-table=main \
    scope=30 suppress-hw-offload=no target-scope=10 vrf-interface=vlan-wan-isp-l2
add disabled=yes distance=1 dst-address=0.0.0.0/0 gateway=192.168.0.1 pref-src="" routing-table=main \
    scope=30 suppress-hw-offload=no target-scope=10 vrf-interface=vlan-wan-isp-l2
    
add disabled=no distance=1 dst-address=8.8.8.8/32 gateway=172.30.32.126 pref-src="" routing-table=main \
    scope=10 suppress-hw-offload=yes target-scope=10
add disabled=no distance=1 dst-address=8.8.4.4/32 gateway=192.168.0.1 pref-src="" routing-table=main \
    scope=10 suppress-hw-offload=yes target-scope=10

add check-gateway=ping disabled=no distance=1 dst-address=0.0.0.0/0 gateway=8.8.8.8 pref-src="" \
    routing-table=ISP1 scope=30 suppress-hw-offload=yes target-scope=11
add check-gateway=ping disabled=no distance=2 dst-address=0.0.0.0/0 gateway=8.8.4.4 pref-src="" \
    routing-table=ISP1 scope=30 suppress-hw-offload=yes target-scope=11

add check-gateway=ping disabled=no distance=1 dst-address=0.0.0.0/0 gateway=8.8.4.4 pref-src="" \
    routing-table=ISP2 scope=30 suppress-hw-offload=yes target-scope=11
add check-gateway=ping disabled=no distance=2 dst-address=0.0.0.0/0 gateway=8.8.8.8 pref-src="" \
    routing-table=ISP2 scope=30 suppress-hw-offload=yes target-scope=11

Any help?

Understanding the requirements is key.

a. identify user(s)/device(s) or groups of users/devices on the network
b. identify what traffic needs they have including the admin.

Why do you need queues.
What is the purpose of failover → Primary to Secondary?
Do all users go to Primary or is there a mix.
Do any external users come into play ( port forwarding to servers, or VPN into router )?

You are a few steps ahead.

Everything was OK with ISP1 only. ISP1 has now been failing for a few days, so I just added an ISP2. I can do a “manual failover” to ISP2 as described.

But if I try to set up the automatic failover/restore as per https://help.mikrotik.com/docs/pages/viewpage.action?pageId=26476608 nothing works. From what I understand it may be a routing issue (since firewall NAT/raw/filter is fine with manual failover).

Today’s requirement is simply the automatic failover/restore.

Simple restore

/ip route
add distance=5 check-gateway=ping dst-address=0.0.0.0/0 gateway=ISP1_gateway_IP routing-table=main
add distance=10 dst-address=0.0.0.0/0 gateway=ISP2_gateway_IP routing-table=main

This doesn’t address the requirements of the help page I mentioned.

I don't give a rat's ass about a help page. The help page is not in your brain, and the help page has nothing to do with your requirements. Make love to the help page for all I care.
I asked for requirements and you have not been cooperative.
If you don't want help then I will move along.

I don’t understand your slang, but it seems unkind to me.

I was asking for help about “WAN Failover and/or recursive routing” and the steps of the official help page I mentioned at the very beginning of my post.

I don’t write well in English, so I preferred to mention the official “Failover (WAN Backup)” help page: it’s exactly what I want to set up, and my understanding was that this method is the best and recommended way to accomplish WAN failover.

I think that official help page was derived from this other, older discussion: http://forum.mikrotik.com/t/advanced-routing-failover-without-scripting/136599/1 From that one:

what if your modem is up, and telephone line is down? Or one of your ISP has a problem inside it, so traceroute shows only a few hops - and then stops…

This is why your proposal doesn’t work.

One way to handle this case without scripting is apparently called “recursive routing”, but I’m unable to explain how it is supposed to work any better than that page or that discussion does.

If I could, I would probably be able to fix my setup by myself.

We want the same thing, your routing to work.
Follow the instructions for routes to ensure that the basic routing approach works, then we can add recursive.
It's called walking before running.

Also I need the requirements, as requested to help ensure the config design fits needs/expectation…
Google translate it…
a. identify user(s)/device(s) or groups of users/devices on the network ( both inside the network going out or from outside the internet coming in )
b. identify what traffic needs they have including the admin. (where they need to be able to go)

Sorry for the late reply.

The basic WAN failover approach didn’t work for us: the outage involved an upstream provider of our ISP1, so our ISP1 router stayed reachable the whole time and the primary route was never invalidated by the ping check. That’s the reason I searched for a different approach and found recursive routing.

About our requirements: we basically have two VLANs, office and guests. With the firewall filter I already allowed office to use ISP1 and fail over to ISP2 (LTE), while the guests can use only ISP1. For now this is enough.

The most important thing for me now is to implement this recursive routing, i.e. automatic failover, and to understand why it doesn’t work as expected.

/ip route
add distance=2 check-gateway=ping dst-address=0.0.0.0/0 gateway=1.0.0.1 routing-table=main scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=ISP1_gateway_IP routing-table=main scope=10 target-scope=11
add distance=5 dst-address=0.0.0.0/0 gateway=ISP2_gateway_IP routing-table=main

It is as simple as it gets. You can even check two outside DNS addresses before flipping to WAN2.

/ip route
add distance=2 check-gateway=ping dst-address=0.0.0.0/0 gateway=1.0.0.1 routing-table=main scope=10 target-scope=12
add distance=2 dst-address=1.0.0.1/32 gateway=ISP1_gateway_IP routing-table=main scope=10 target-scope=11
add distance=4 check-gateway=ping dst-address=0.0.0.0/0 gateway=9.9.9.9 routing-table=main scope=10 target-scope=12
add distance=4 dst-address=9.9.9.9/32 gateway=ISP1_gateway_IP routing-table=main scope=10 target-scope=11
add distance=10 dst-address=0.0.0.0/0 gateway=ISP2_gateway_IP routing-table=main

The provider installed the LTE antenna in the past few days; I’ll test this routing configuration in the coming days.

I tested your configuration and it works as expected. Thanks.

I was and I’m still curious about the approach of

Thanks to the topic, I think I understand the matter a bit better, but I had to read many posts. I understood that both links address failover and load balancing, which is interesting because it’s my next step (if the secondary LTE WAN has decent bandwidth).

So I’m now testing this approach. Following the help page I was unsuccessful and, as a beginner, I don’t understand some choices on this page, for example:

  • given that for load balancing all traffic should be marked with ISP1 or ISP2, why are the mangle rules in the output chain? Marking in both the forward and output chains seems more correct;
  • is there a need to mark the connection and then all packets from that connection? Is marking only the packets not enough?
  • why do the mangle rules in the output chain mark with ISP1 the packets that are leaving through interface 1? If packets already have an out-interface, how can a routing mark affect the routing and steer them to another out-interface?
  • scope and target-scope in the routes seem bizarre.
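
For the last point, here is a minimal sketch of how scope and target-scope are meant to interact in a recursive route, as far as the documentation describes it. The gateway addresses 10.111.0.1 and 10.112.0.1 are placeholders (assumptions), not values from this setup.

```
/ip route
# Host routes to the probe addresses. scope=10 makes them usable for
# resolving the gateway of another route (a static route defaults to scope=30).
add dst-address=8.8.8.8/32 gateway=10.111.0.1 scope=10
add dst-address=8.8.4.4/32 gateway=10.112.0.1 scope=10
# Default routes whose gateway is not directly connected. target-scope=11
# lets the next-hop lookup use routes with scope up to 11, so the host
# routes above qualify. check-gateway=ping then probes 8.8.8.8 / 8.8.4.4
# through the matching ISP.
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 check-gateway=ping
add dst-address=0.0.0.0/0 gateway=8.8.4.4 distance=2 target-scope=11 check-gateway=ping
```

Each default route stays active only while its probe address answers pings through the matching ISP, which is what plain check-gateway on the directly connected gateway cannot detect.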

I then tried to follow the forum topic. I understand it better. In the mangle rules I mark all forwarded and output packets with ISP1, with a few exceptions marked ISP2. But I still have some issues. Nothing works unless I also add default routes for both ISPs in the main routing table. Not sure if this affects the results.

The need for default routes in the main routing table is not mentioned in the topic. But I read in the help about policy routing: For a user-created table to be able to resolve the destination, the main routing table should be able to resolve the destination too.

Maybe this behavior changed since the topic was written? Is it correct to add two default routes in the main table for both ISPs with different distances?

Think about it.
You either have default routes from PPPoE or the IP DHCP client because one checked the box for default routes.
OR
You have to create them manually.
++++++++++++++++++++++++++++
If one was adding a private IP address for the WAN IP directly, again you would then have to add a corresponding default route.

For initial traffic to and from the ISP and for any services to establish handshakes, the router needs its primary routes.

We don’t have a DHCP client or PPPoE.
WAN interfaces have fixed IP addresses from the private range of the ISP L2.

EDIT: please ignore following routes, see next post.

I think I made some progress. There may have been some minor mistakes in the previous routes, but there was a bigger mistake in the mangle rules.

Now forwarding works as expected and default routes in the main table are no longer required for correct load balancing and failover when forwarding a packet.

What still doesn’t work at all without a default route in the main table is traffic originating from the router itself. Ping from the terminal to any Internet host gives “no route to host”.

I tried to add routing marks to packets (or connection marks to connections plus routing marks to packets) in many combinations in the mangle output chain, but it seems packets from the router are never marked and so never routed.

To start simple, what is the correct mangle rule to apply my “ISP1 routing mark” to every packet so that everything goes to the ISP1 routing table?

5 X  ;;; Routing Mark for ISP1 (load balancing default)
      chain=output action=mark-routing new-routing-mark=ISP1 passthrough=yes src-address-type=local 
      dst-address-list=!rfc6890_not_global_ipv4 log=yes log-prefix="output-mark-ISP1"

A network diagram would be helpful to understand what devices are involved etc.
The config, to see the routing and mangling parts and firewall rules in context…
Also, what is the problem with manually creating a standard route for the WANs… this will not get in the way of anything??

If for example the problem is the gateway is the same, simply do this
/ip route
add dst-address=0.0.0.0/0 gateway=145.26.22.1%ether1-wan1 routing-table=main
add dst-address=0.0.0.0/0 gateway=145.26.22.1%ether2-wan2 routing-table=main

Where:
/interface ethernet
set [ find default-name=ether1 ] name=ether1-wan1
set [ find default-name=ether2 ] name=ether2-wan2

The forwarding works with the routes above; the output doesn’t.

The router has two WAN interfaces, each with a fixed IP address and a fixed gateway in the L2 domains of the providers. This is the only thing I know.
The mangle output chain now consists of only the above rule.

Currently, to make things “just work”, I had to add a default route in the main table for the primary WAN. But if the primary WAN link fails, traffic from the LAN goes to ISP2 while traffic from the router itself doesn’t.

Should I replicate the recursive routes I did in the ISP1 and ISP2 tables also in the main table? Maybe. But it would be nice if we can avoid this.

As a side note, from what I know, routing marks in mangle are permitted in prerouting and output and should work. I would like to understand why they don’t.

From the documentation about the output chain (https://help.mikrotik.com/docs/display/ROS/Packet+Flow+in+RouterOS):

Routing Adjustment - this is a workaround that allows to set up policy routing in mangle chain output (routing-mark)

Typically to ensure WAN traffic goes out proper WAN you need:

  1. Sourcenat Masquerade for that interface. Normally addressed by the default rule.
    add chain=srcnat action=masquerade out-interface-list=WAN

  2. Mangle Rules: To ensure incoming on WANX goes out WANX

/ip firewall mangle
add chain=prerouting action=mark-connection connection-mark=no-mark \
    in-interface=WANX new-connection-mark=incomingISPX passthrough=yes
add chain=output action=mark-routing connection-mark=incomingISPX \
    new-routing-mark=useWANX passthrough=no

/routing table add fib name=useWANX
/ip route
add dst-address=0.0.0.0/0 gateway=ISPX routing-table=useWANX

OR.

  1. Routing Rules to ensure Traffic Originating on WANX goes out WANX

/routing table add fib name=useWANX

/routing rule
add src-address=own.ip.of.wanX action=lookup-only-in-table table=useWANX

Note: Assuming the IP of wanX is static fixed, this is easy to do.

I think NAT is working well, because without failover routes I can send requests from the LAN and from the router itself to the Internet via both ISP1 and ISP2.

Outgoing connections going to ISP1 are src-natted with our public IP address, while those going to ISP2 are masqueraded (but I think they could also be src-natted to the fixed 4.4.4.1 address of the ISP2 interface given by the provider).
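
For illustration, the src-nat variant for ISP2 might look like the sketch below. It assumes "WAN ISP2" is the same interface list used in the mangle rules of the first post and that 4.4.4.1 really is the fixed address on that interface.

```
/ip firewall nat
# Assumption: "WAN ISP2" is the interface list holding the ISP2 uplink.
# src-nat to the fixed address instead of masquerade.
add chain=srcnat action=src-nat to-addresses=4.4.4.1 out-interface-list="WAN ISP2"
```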

The idea of ensuring that incoming WANX goes out WANX is interesting but not critical at the moment, because we don’t have any relevant incoming traffic for now and because ISP2 gives us a non-public IP address at the moment, so incoming traffic arrives only via ISP1.

The real puzzle now is how to send most of the traffic from the router itself to the Internet via the ISP1 routing table and a few exceptions via the ISP2 routing table. As in the previous images, ISP1 uses WAN1 if available and WAN2 as secondary choice; ISP2 does the opposite. They are working for forwarding but not for output.

I think I found an explanation here: http://forum.mikrotik.com/t/is-mangle-output-chain-broken/135569/5

If my explanation is correct, I can’t understand why the need for a route in the main table is not mentioned in the specific topics about WAN failover by recursive routes, not even in the official MikroTik guide.
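
If that explanation is right, a locally generated packet gets its first routing decision in the main table before the mangle output chain can apply a routing mark, so main must hold some route that covers the destination. Here is a minimal, untested sketch, assuming the 8.8.8.8 and 8.8.4.4 host routes from the first post are still in place in main and that repeating the recursive pair there is acceptable:

```
/ip route
# Recursive default routes in main, so router-originated packets pass the
# first lookup. They resolve through the scope=10 host routes to the probes.
add dst-address=0.0.0.0/0 gateway=8.8.8.8 distance=1 target-scope=11 \
    check-gateway=ping routing-table=main
add dst-address=0.0.0.0/0 gateway=8.8.4.4 distance=2 target-scope=11 \
    check-gateway=ping routing-table=main
```

With a route like this in main, the packet leaves the first lookup with an out-interface, and the routing mark set in the output chain can then move it to the ISP1 or ISP2 table in the routing adjustment step.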