I know this question has been asked before, but I have done some testing and cannot find a forum thread that applies to and works for our specific scenario. This should be fairly straightforward, but I am not the best at scripting or mangle rules, so I am hoping the community can help! This is also a two-part question!
We have a customer that has a Mikrotik as their gateway router. They have Comcast as their primary ISP and it is set up in a fairly standard way (static public IP, quad-zero default route, etc.). They have recently added a secondary internet connection with fiber via Century Link. The Comcast line connects to ether1 and the Century Link line connects to ether13. There is a quad-zero route for both WAN connections and both are verified. If we fail over to the fiber line, all traffic fails over as it should. Previously, they had a separate router with a T1 line, so the fiber is a bit of an upgrade. The key part here is that the T1 connected to a different router than the primary line, while the fiber connects to the same router. This is also the corporate location for our largest and most important customer, so we really do not want to risk downtime if it is avoidable.
Question 1
The problem is that the two WAN interfaces are not pingable at the same time: we can only ping the WAN interface whose default route is active at that moment. I understand why this is happening but am not sure how to fix it. I have read some forum posts about setting up mangle rules that mark the inbound traffic and send the replies back out the same WAN they arrived on, but I had issues getting it to work and was hoping I could get some help with the configuration. Or is there a better way to do this?
Question 2
Historically, if the Comcast line went down we were able to log into the T1 router, get to the Comcast router from there, and fail over the LAN. Since we are decommissioning the T1 router, we are going to lose this ability. As an example scenario: if the Comcast internet connection fails a couple of hops out, the default route on the gateway Mikrotik will not fail over, even though the internet is actually down. We had a previous customer with a similar scenario, and our old network admin set up a script that would ping something public, like 8.8.8.8, and fail over to the other default route if it was not available. The script often had issues with flapping the routes back and forth, though, and never worked correctly. What is the best way to handle this with Tiks? I guess if we can get the fiber pingable from question 1, it would allow us to get in and change the administrative distance if this occurred, but that is not very automated.
I am more concerned with the first question currently, but it would be nice to have a clean solution for both of these. Thoughts? I can provide configs or whatever is needed.
One option is to set the default route through an Internet host like 8.8.8.8 using recursive routing, so that the ROS “ping gateway” feature (check-gateway=ping) controls the failover.
I have had issues with this, and wouldn’t be surprised if your ISP, or something further upstream, filters traffic that uses such an IP as a gateway.
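For reference, a minimal sketch of what that recursive setup usually looks like, assuming RouterOS v6 syntax. The gateway addresses 192.0.2.1 (main WAN) and 198.51.100.1 (secondary WAN) are placeholders and not values from this thread; on RouterOS v7 the scope and target-scope values may need adjusting.

```
/ip route
# Host route: pin the monitored address to the main WAN next hop (placeholder).
add dst-address=8.8.8.8/32 gateway=192.0.2.1 scope=10
# Primary default route resolves recursively through 8.8.8.8;
# check-gateway=ping deactivates it when 8.8.8.8 stops answering.
add dst-address=0.0.0.0/0 gateway=8.8.8.8 check-gateway=ping distance=1
# Backup default route via the secondary WAN (placeholder).
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=2
```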
There’s another approach: using netwatch + simple up/down scripts to control the failover, while monitoring an Internet address, like 8.8.8.8 or 8.8.4.4.
Label the default route on the main WAN with a comment, like “DEFAULT”.
Create a secondary default route via the secondary WAN, with a higher distance than the DEFAULT route.
Create a static route for e.g. 8.8.4.4 via the main WAN, so that it can be used to monitor the main WAN.
As there’s a specific static route to it, the router will always try to reach 8.8.4.4 through the main WAN.
If 8.8.4.4 is not reachable, the netwatch down script disables the route labelled “DEFAULT”. As the secondary WAN route is the next one by distance, it becomes active in the routing table, and traffic starts flowing through it.
While this is happening, netwatch keeps trying to reach 8.8.4.4 through its specific route via the main WAN; if it comes back, the netwatch up script re-enables the main WAN default route.
Note: ensure customers don’t use 8.8.4.4 as DNS, as it will fail when the main WAN is down (or use another reliable anycast host for monitoring).
Best practice anyhow is to set up the DNS cache on the Mikrotik and make sure (via DHCP, PPPoE) that the router’s IP is what gets handed to clients as their DNS server.
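The steps above could be sketched roughly as follows, assuming RouterOS v6 syntax. The gateway addresses 192.0.2.1 (main WAN) and 198.51.100.1 (secondary WAN), the probe route comment, and the interval and timeout values are placeholders chosen for illustration, not settings from this thread.

```
/ip route
# Main default route, labelled so the scripts can find it.
add dst-address=0.0.0.0/0 gateway=192.0.2.1 distance=1 comment="DEFAULT"
# Secondary default route, higher distance, used only when DEFAULT is disabled.
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=2
# Host route that forces the probe address out through the main WAN.
add dst-address=8.8.4.4/32 gateway=192.0.2.1 comment="netwatch probe via main WAN"

/tool netwatch
# Disable DEFAULT when the probe fails, re-enable it when the probe recovers.
add host=8.8.4.4 interval=10s timeout=2s \
    down-script="/ip route disable [find comment=\"DEFAULT\"]" \
    up-script="/ip route enable [find comment=\"DEFAULT\"]"
```

A longer interval makes the failover slower to react but less sensitive to brief packet loss, so it is worth tuning if flapping is a concern.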
Thank you! I will look into what they use for DNS or find an alternate to point the route to. I like this take as it is much easier and less complicated than some of the others I have seen.
Do you have any insight into how to make both WANs accessible? The problem I assume I am running into is that when traffic comes in on the new fiber, the reply goes back out the Comcast line, so the connection never completes.
For example, I can ping the public IP of the Comcast line but not the fiber. If I fail over to the fiber line, I can then ping the fiber but not the Comcast.
I have seen some people get both WAN IP addresses accessible with mangle rules marking traffic bound for the WAN, but I have not had much luck with this. Thoughts?
Sure, best resource to get a good understanding and fix that: have a look at Tomas Kirnak’s Load Balance / Mangle Deep Dive presentation.
Thank you Pukkita! I was able to get this to successfully route traffic in WAN1 back out WAN1 and traffic in WAN2 back out WAN2, based on the great video you linked to. The gentleman actually did a great job at explaining most of the config and I was able to follow along. Mangle rules are one of my weak areas and I absolutely felt like that video helped!
add action=mark-connection chain=forward connection-mark=no-mark comment="Forward Traffic in Comcast to LAN mark WAN1->LANs" \
    in-interface=ether1 new-connection-mark=WAN1->LANs
add action=mark-routing chain=prerouting comment="Connections in LAN marked WAN1->LANs mark route ISP1_Route" connection-mark=WAN1->LANs \
    in-interface=ether6 new-routing-mark=ISP1_Route
add action=mark-connection chain=input connection-mark=no-mark comment="Input Traffic Century Link Mark Connection WAN2->ROS" in-interface=ether13 \
    new-connection-mark=WAN2->ROS
add action=mark-connection chain=forward connection-mark=no-mark comment="Forward Traffic in Century Link Fiber to LAN mark WAN2->LANs" \
    in-interface=ether13 new-connection-mark=WAN2->LANs
add action=mark-routing chain=prerouting comment="Connections in LAN marked WAN2->LANs mark route ISP2_Route" connection-mark=WAN2->LANs \
    in-interface=ether6 new-routing-mark=ISP2_Route
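For the router's own replies (for example, answers to pings sent to each WAN address), connections marked in the input chain also need a matching mark-routing rule in the output chain, and each routing mark needs its own default route. If those pieces are not already present elsewhere in the config, a sketch might look like this, assuming RouterOS v6 syntax. The WAN1->ROS mark is a hypothetical name chosen by analogy with WAN2->ROS, and the gateways 192.0.2.1 and 198.51.100.1 are placeholders.

```
/ip firewall mangle
# Mark connections that arrive at the router itself on the Comcast interface.
add action=mark-connection chain=input connection-mark=no-mark \
    in-interface=ether1 new-connection-mark=WAN1->ROS
# Send the router's replies out through the WAN the connection came in on.
add action=mark-routing chain=output connection-mark=WAN1->ROS \
    new-routing-mark=ISP1_Route
add action=mark-routing chain=output connection-mark=WAN2->ROS \
    new-routing-mark=ISP2_Route

/ip route
# One default route per routing mark (placeholder gateways).
add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=ISP1_Route
add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-mark=ISP2_Route
```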
I have a follow up question though. In the video posted he also added this configuration:
Can you give an example of where the mangle rule below would apply that the others don’t? It looks like it is telling all traffic on the connected networks to bypass all other mangle rules but I am having a hard time wrapping my head around why or an example of why that would be needed.
If you create a Netwatch monitor that reaches out to 8.8.4.4 and, when it can’t reach that address, disables the main route until it is reachable again, won’t that IP become instantly reachable when it fails over to the backup default route, causing it to fail back over to the failed line?
No, because there’s a more specific static route to it via the main WAN.
More specific routes (dst-address=8.8.4.4/32) always prevail over less specific ones (dst-address=0.0.0.0/0 in the case of default routes).
Even if there were another route more specific than the default but less specific than the one we used (e.g. dst-address=8.8.4.0/24), traffic towards 8.8.4.4 would still use the most specific (/32) route.
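As a small illustration of that longest-prefix rule, here is a hypothetical routing table in RouterOS v6 syntax. The gateways 192.0.2.1 (main WAN) and 198.51.100.1 (secondary WAN) are placeholders, and the /24 route exists only to show the ordering.

```
/ip route
# /0: backup default route, matches every destination.
add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=2
# /24: hypothetical broader route that also covers 8.8.4.4.
add dst-address=8.8.4.0/24 gateway=198.51.100.1
# /32: the most specific match for 8.8.4.4, so the probe uses the main WAN.
add dst-address=8.8.4.4/32 gateway=192.0.2.1
```

This holds while the /32 route is active. If its gateway becomes unreachable and the route goes inactive, lookups for 8.8.4.4 fall through to the next matching route.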
Yes, I had a momentary lapse of consciousness. I posted a second message directly after my previous questions stating that I saw how you addressed this in your original message and to disregard. Sorry!
I have a problem.
WAN1, WAN2, and WAN3 are links from the same ISP, continuously load balanced together as one connection.
WAN4 is a link from my other ISP.
When the internet on the first three WANs goes down, traffic does not fail over to the fourth (backup) link, because the gateways are still reachable.
The RouterBOARD is an RB750Gr3.