Failover PPPoE + static with dynamic gateways

Nikvir · July 25, 2024, 1:45pm

Hello,

I have a problem with failover between two WAN links:
WAN1 link - master - PPPoE (static public IP, dynamic gateway)
WAN2 link - slave - static IP, static gateway

I set it up as follows:
WAN1 - distance=1
WAN2 - distance=2

For example:
WAN1 - IP 1.1.1.10/30, gateway 1.1.1.1, 1.1.1.2 or 1.1.1.3
WAN2 - IP 2.2.2.2/30, gateway 2.2.2.1
LAN - 3.3.3.1/24

The switching itself works, but the problem is connecting to the router from the outside via both the WAN1 and WAN2 addresses at the same time.
If WAN1 is up, I cannot connect to the router via the WAN2 address, and vice versa.

I have already tried many guides, including recursive routing and querying a DNS server through a specific gateway. I also tried packet marking, still with no success.

The biggest problem I noticed is that my main service provider (WAN1) gives me internet access via a PPPoE connection. Through it I get a fixed public IP address, but the gateway is dynamic: it changes among three addresses, i.e. the provider's three PPPoE servers.

In the pppoe-client configuration I must set add-default-route=yes with the distance I want. That’s the only thing I can change.
If I do not set add-default-route=yes and instead enter the gateway in /ip route manually, the link does not work.
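For reference, a minimal RouterOS 6 sketch of the setting described above. It assumes the PPPoE client is named pppoe-out1, as in the test configuration further down; the dynamically installed default route then follows whichever gateway the ISP's PPPoE server hands out:

```routeros
/interface pppoe-client
# let the client install the default route itself, via whichever PPPoE
# server answered, with distance 1 (preferred over a WAN2 route with distance 2)
set [find name=pppoe-out1] add-default-route=yes default-route-distance=1
```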

I also added the rule below, and WAN1 did not work either:

/routing filter add chain=dynamic-in distance=10 set-distance=1 set-routing-mark=wan1

RouterOS version 6. Currently, only the distance field is set in the configuration. I have removed the firewall rules.

Do you have an idea what I’m doing wrong?

sindy · July 25, 2024, 3:43pm

With ROS 6 and PPPoE with a dynamically changing IP address, making the failover work requires scripting, at least to update static routes with the current gateway address (unless the PPPoE uplink is the backup one). To make the router respond from the address to which the connection request has arrived, you need policy routing: routing tables and routing rules as a minimum, and in more complex cases also connection marks.
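A minimal sketch of the "routing tables and routing rules" part on RouterOS 6. It assumes the example addresses from the first post (1.1.1.10 on WAN1, 2.2.2.2 with gateway 2.2.2.1 on WAN2) and reuses the routing-mark names from the test configuration further down. It only covers replies sent by the router itself, not dst-nat to LAN servers, so treat it as a starting point rather than a tested recipe:

```routeros
/ip route
# one extra routing table per WAN; a routing-mark creates the table implicitly.
# The PPPoE table uses the interface name, so the changing peer IP does not matter.
add dst-address=0.0.0.0/0 gateway=pppoe-out1 routing-mark=to_ether1-ISP1-PPPOE
add dst-address=0.0.0.0/0 gateway=2.2.2.1 routing-mark=to_ether2-ISP2-STATIC
/ip route rule
# packets the router sends from a given WAN address use that WAN's table
add src-address=1.1.1.10/32 action=lookup table=to_ether1-ISP1-PPPOE
add src-address=2.2.2.2/32 action=lookup table=to_ether2-ISP2-STATIC
```

As far as I understand the v6 behaviour, action=lookup still falls back to the main table if the WAN-specific route is inactive, whereas action=lookup-only-in-table would leave such packets unrouted.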

As for having removed the firewall rules: that is a big red no on a device that is directly connected to the internet with a public IP on it. The filth from the net is incredibly quick to squat in if you open the door this wide.

As for what you are doing wrong: on top of disabling the firewall as explained above, no one can say without seeing the export of your configuration. Run /export hide-sensitive file=somenicefilename, download somenicefilename.rsc, obfuscate all public addresses, usernames, and other sensitive information, and post the redacted file between [code] and [/code] tags.

Nikvir · July 25, 2024, 5:12pm

Thank you for your interest in my post. The main link is PPPoE. The IP address is always the same; only the gateway is variable.

When I wrote that I removed all firewall rules, I meant the packet marking and routing rules for both WANs used for failover.
Tomorrow I will try to build a clean configuration with the data I provided at the beginning of the post and send it. Maybe I’m making some trivial mistake that will immediately catch your eye.

Nikvir · July 26, 2024, 10:07am

This is my test configuration on a virtual machine. It exists only so that you can diagnose the problem. I followed the guide at https://khmernetworkings.blogspot.com/p/mikrotik-fail.html

From what I see in the routing table, 0.0.0.0/0 gw pppoe-out1 is reachable, but there is nothing in the Routing Mark column (it should be to_ether1-ISP1-PPPOE).
The 10.0.10.0/24 addresses are only used to connect to the test VM.

/interface ethernet
set [ find default-name=ether1 ] comment=WAN1-MASTER name=ether1-ISP1-PPPOE
set [ find default-name=ether2 ] advertise=\
    10M-half,10M-full,100M-half,100M-full,1000M-half,1000M-full comment=\
    WAN2-SLAVE name=ether2-ISP2-STATIC
set [ find default-name=ether3 ] comment=LAN name=ether3-LAN
/interface pppoe-client
add comment=WAN1-MASTER disabled=no interface=ether1-ISP1-PPPOE name=\
    pppoe-out1 password=test user=test
/interface list
add name=WAN

/interface list member
add interface=ether1-ISP1-PPPOE list=WAN
add interface=ether2-ISP2-STATIC list=WAN
/ip address
add address=192.168.2.253/24 disabled=yes interface=ether2-ISP2-STATIC \
    network=192.168.2.0
add address=2.2.2.2/30 interface=ether2-ISP2-STATIC network=2.2.2.0
add address=10.200.10.1/24 interface=ether3-LAN network=10.200.10.0
add address=10.0.10.180/24 interface=ether2-ISP2-STATIC network=10.0.10.0
/ip cloud
set update-time=no
/ip dns
set servers=8.8.8.8
/ip firewall mangle
add action=mark-connection chain=prerouting connection-mark=no-mark \
    in-interface=pppoe-out1 new-connection-mark=ether1-ISP1-PPPOE_conn \
    passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark \
    in-interface=ether2-ISP2-STATIC new-connection-mark=\
    ether2-ISP2-STATIC_conn
add action=mark-routing chain=prerouting connection-mark=\
    ether1-ISP1-PPPOE_conn in-interface=ether3-LAN new-routing-mark=\
    to_ether1-ISP1-PPPOE
add action=mark-routing chain=prerouting connection-mark=\
    ether2-ISP2-STATIC_conn in-interface=ether3-LAN new-routing-mark=\
    to_ether2-ISP2-STATIC
add action=mark-routing chain=output connection-mark=ether1-ISP1-PPPOE_conn \
    new-routing-mark=to_ether1-ISP1-PPPOE
add action=mark-routing chain=output connection-mark=ether2-ISP2-STATIC_conn \
    new-routing-mark=to_ether2-ISP2-STATIC
/ip firewall nat
add action=masquerade chain=srcnat out-interface=pppoe-out1
add action=masquerade chain=srcnat out-interface=ether2-ISP2-STATIC
/ip ipsec policy
set 0 dst-address=0.0.0.0/0 src-address=0.0.0.0/0
/ip route
add check-gateway=ping distance=2 gateway=2.2.2.1 routing-mark=\
    to_ether2-ISP2-STATIC
add check-gateway=ping distance=1 gateway=pppoe-out1
add check-gateway=ping distance=2 gateway=2.2.2.1
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www disabled=yes
set ssh disabled=yes
set api disabled=yes
set api-ssl disabled=yes
/ip ssh
set allow-none-crypto=yes forwarding-enabled=remote
/routing filter
add chain=dynamic-in set-check-gateway=ping set-routing-mark=\
    to_ether1-ISP1-PPPOE

There was a mistake in this test setup: I had entered the wrong distance for ISP2. I have corrected it, but I can’t change the image in the post.

@sindy, maybe you could point me to a proven guide that I can test.
Please note that I cannot set the gateway for PPPoE manually.

sindy · July 26, 2024, 12:43pm

OK, so a bit of theory and a “menu” to choose from.

A “failover” as such is one thing; the trigger of a failover is another. Making the router itself send a response via the WAN interface through which the request arrived is yet another thing, and making the router forward a response from a server in the LAN via the WAN interface through which the triggering request arrived (not part of your OP, but a common requirement) is yet another.

So first: for a plain failover by priority, where a loss of connection to the primary ISP is the trigger, you do not need any additional routing tables; setting a higher distance on the route via the backup WAN than on the route via the primary WAN is sufficient. However, if the connection to the ISP stays up but the ISP has connectivity problems further upstream, your router will keep using the primary WAN and lose internet access.
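A sketch of that plain priority-based failover, using the interface name as the PPPoE gateway and the static WAN2 gateway from the first post. It assumes these static routes are used instead of the dynamic route from add-default-route=yes, not on top of it:

```routeros
/ip route
# primary: the lowest distance wins while pppoe-out1 is up
add dst-address=0.0.0.0/0 gateway=pppoe-out1 distance=1
# backup: becomes active only when the route above goes inactive
add dst-address=0.0.0.0/0 gateway=2.2.2.1 distance=2
```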

Anything beyond the above requires additional routing tables, unless you accept a script that disables and re-enables the default route via the primary WAN depending on whether that WAN has internet connectivity.

To monitor the internet connectivity of a WAN, one typically pings an address on the internet that can be expected to be always available. For that, you can use a netwatch item, a periodically scheduled handmade script, or, if you choose the script-free approach based on a creative misuse of recursive next-hop search, a check-gateway=ping setting on a default route.
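A sketch of the netwatch variant on RouterOS 6. The probe target 8.8.4.4 and the comment WAN1-default are arbitrary choices for illustration. The host route pins the probe to WAN1, and the blackhole route behind it keeps the probe from being answered via WAN2 once pppoe-out1 is down; without it, netwatch would see the host as up again through the backup link and the routes would flap. Pick a probe target that nothing else on the network depends on:

```routeros
/ip route
# default routes; the WAN1 one is located by its comment in the netwatch scripts
add dst-address=0.0.0.0/0 gateway=pppoe-out1 distance=1 comment=WAN1-default
add dst-address=0.0.0.0/0 gateway=2.2.2.1 distance=2
# force the probe through WAN1; blackhole it while pppoe-out1 is down
add dst-address=8.8.4.4/32 gateway=pppoe-out1
add dst-address=8.8.4.4/32 type=blackhole distance=20
/tool netwatch
add host=8.8.4.4 interval=10s \
    down-script="/ip route disable [find comment=\"WAN1-default\"]" \
    up-script="/ip route enable [find comment=\"WAN1-default\"]"
```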

The first two options do not require multiple routing tables, as explained above; the third one does, and so does the ability to respond via a WAN interface chosen by criteria other than just the presence of internet connectivity on the first one.

The only case where you need the gateway of the default route via PPPoE to be an IP address is when you use the recursive next-hop search on RouterOS 6; in all other cases you can set the name of the interface as the gateway and everything works as needed, so it doesn’t matter that the gateway IP address indicated by the ISP keeps changing.
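For the recursive case on RouterOS 6, a sketch could look like the following. It assumes add-default-route=no on pppoe-out1, uses 8.8.4.4 as the monitored host and wan1-host as the route comment (both arbitrary), and starts with 1.1.1.1 from the first post as the peer address. The second part is meant as the body of the On Up script of the PPP profile used by pppoe-out1; it assumes the remote-address variable that RouterOS passes to PPP on-up scripts, and repoints the host route at the new peer after every reconnect:

```routeros
/ip route
# host route to the monitored address via the current PPPoE peer;
# scope=10 lets the recursive route below resolve its gateway through it
add dst-address=8.8.4.4/32 gateway=1.1.1.1 scope=10 comment=wan1-host
# recursive default route: goes inactive when 8.8.4.4 stops answering pings
add dst-address=0.0.0.0/0 gateway=8.8.4.4 target-scope=10 check-gateway=ping distance=1
# backup via the static WAN2 gateway
add dst-address=0.0.0.0/0 gateway=2.2.2.1 distance=2

# --- body of the PPP profile "On Up" script (not a CLI command) ---
# point the host route at the peer address of the new PPPoE session
:local gw $"remote-address"
/ip route set [find comment="wan1-host"] gateway=$gw
```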

As for responding via the proper WAN, the solution is simpler if it is enough that the router itself does that. If you plan on having some servers in the LAN with dst-nat rules and want them to be accessible via the secondary WAN even while the primary one is working, you may want to implement the more complicated solution straight away: it covers both cases, so you don’t need to implement the simpler one separately for the router’s own responses.

So knowing the menu, which possibilities do you actually want to implement?

anav · July 26, 2024, 12:47pm

Reader’s Digest version.

Please state the actual requirements! (without discussing the config)

Identify all the users (external, internal, admin, groups, devices, etc.).
Using a use-case approach, detail all the traffic they need to be able to generate (originate, reply, etc.).

Provide a network diagram showing the WAN and LAN details, traffic flows, ports, etc.