Mangle "Mark Connection" Troubleshooting

Hi guys, wondering if someone can give me any pointers on this…

I have a working PBR config based on address lists and Mangle: If the source address is from the list “Use-WAN1”, mark it with the appropriate connection mark. And vice-versa for WAN2.

However, packets from addresses that are in neither my “Use-WAN1” nor my “Use-WAN2” address list are being marked with the WAN1 connection mark regardless. At first I assumed this was because the WAN1 marking rule was higher in the mangle order, so traffic was hitting it first, but even if I swap the positions of rules 1 and 2, I get the same WAN1 connection mark… Any ideas why packets that do not match either rule are being marked anyway?

Here is a print of my mangle rules:

0 ;;; Allow connected networks to exit Mangle chain so we don’t load balance to our connected networks
chain=prerouting action=accept dst-address-list=Local Networks in-interface-list=VLAN log=no log-prefix=""

1 ;;; Sort the traffic into WAN1 stream
chain=prerouting action=mark-connection new-connection-mark=WAN1 passthrough=yes dst-address-type=!local src-address-list=Use-WAN1 connection-mark=no-mark in-interface-list=VLAN log=yes log-prefix=""

2 ;;; Sort the traffic into WAN2 stream
chain=prerouting action=mark-connection new-connection-mark=WAN2 passthrough=yes dst-address-type=!local src-address-list=Use-WAN2 connection-mark=no-mark in-interface-list=VLAN log=yes log-prefix=""

3 ;;; Add routing mark WAN1 to the packets based on the connection mark
chain=prerouting action=mark-routing new-routing-mark=Route-WAN1 passthrough=yes connection-mark=WAN1 in-interface-list=VLAN log=no log-prefix=""

4 ;;; Add routing mark WAN2 to the packets based on the connection mark
chain=prerouting action=mark-routing new-routing-mark=Route-WAN2 passthrough=yes connection-mark=WAN2 in-interface-list=VLAN log=no log-prefix=""

5 ;;; Ensure traffic from the router itself returns through the proper interface WAN1
chain=output action=mark-routing new-routing-mark=Route-WAN1 passthrough=yes connection-mark=WAN1 log=no log-prefix=""

6 ;;; Ensure traffic from the router itself returns through the proper interface WAN2
chain=output action=mark-routing new-routing-mark=Route-WAN2 passthrough=yes connection-mark=WAN2 log=no log-prefix=""

7 ;;; Identify which WAN interface the traffic came in and mark the connections appropriately WAN1
chain=prerouting action=mark-connection new-connection-mark=WAN1 passthrough=yes connection-mark=no-mark in-interface=WAN1 log=no log-prefix=""

8 ;;; Identify which WAN interface the traffic came in and mark the connections appropriately WAN2
chain=prerouting action=mark-connection new-connection-mark=WAN2 passthrough=yes connection-mark=no-mark in-interface=WAN2 log=no log-prefix=""

9 ;;; Mark “management” traffic from the router on WAN1
chain=output action=mark-routing new-routing-mark=Route-WAN1 passthrough=yes out-interface=WAN1 log=no log-prefix=""

10 ;;; Mark “management” traffic from the router on WAN2
chain=output action=mark-routing new-routing-mark=Route-WAN2 passthrough=yes out-interface=WAN2 log=no log-prefix=""

Rule #7.

A new connection from the LAN that doesn’t get any mark goes out via the default route, which according to the described behaviour leads to WAN1. Then the first response packet comes back, and if you examine the conditions, both in-interface=WAN1 and connection-mark=no-mark match, so the connection gets the WAN1 mark.
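If the rule still needs to mark connections that are initiated from the WAN side, one common approach is to match only new connections. A reply packet belongs to a connection that is already established, so it would no longer match. This is a minimal sketch, assuming the two "came in on WAN" rules are the only prerouting mark-connection rules that match on in-interface=WAN1 and in-interface=WAN2:

```
/ip firewall mangle
# Assumption: these find expressions select only rules 7 and 8 from the print above.
# Run the find with "print where ..." first to confirm what it matches.
set [find chain=prerouting action=mark-connection in-interface=WAN1] connection-state=new
set [find chain=prerouting action=mark-connection in-interface=WAN2] connection-state=new
```

With that in place, unmarked LAN connections should stay unmarked when their replies return, while connections opened from outside still get the mark of the WAN they arrived on.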

Ah, you’re right. Thanks Sob. If I disable that mangle rule I don’t get any connection mark. It was confusing to me because rule 1’s counters were incrementing, but now you’ve said it, it makes sense to me why that was.

Now I’m scratching my head about how WAN1 is selected as the default route seemingly 100% of the time… Both WANs have a static entry with the same distance:

ip route print detail
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
0 A S dst-address=0.0.0.0/0 gateway=WAN1 gateway-status=WAN1 reachable distance=1 scope=30 target-scope=10 routing-mark=Route-WAN1

1 A S dst-address=0.0.0.0/0 gateway=WAN2 gateway-status=WAN2 reachable distance=1 scope=30 target-scope=10 routing-mark=Route-WAN2

2 A S dst-address=0.0.0.0/0 gateway=WAN1 gateway-status=WAN1 reachable distance=1 scope=30 target-scope=10

3 S dst-address=0.0.0.0/0 gateway=WAN2 gateway-status=WAN2 reachable distance=1 scope=30 target-scope=10

4 DS dst-address=0.0.0.0/0 gateway=192.168.58.1 gateway-status=192.168.58.1 reachable via WAN2 distance=1 scope=30 target-scope=10 vrf-interface=ether8

Thanks again!

Default route via WAN2 is not active … and my guess is that the reason is exactly that … both routes are identical apart from the gateway (same destination, same distance).

Ok, thanks for the pointer. That led me to the Wiki, where I read that only one route to a given destination in the same routing table can be active.

If I disable or disconnect the WAN1 interface, WAN2 becomes active and behaves as expected.

I read in the Wiki: “If there is more than one candidate route with the same distance, selection of active route is arbitrary”.

Yet as soon as I re-connect WAN1, it takes over as the active route. I’m trying to understand how the decision is made. It doesn’t seem arbitrary; WAN1 seems to be chosen every time. What am I missing (probably a lot :laughing: )?

My guess: selection is arbitrary in the sense that MT don’t document the selection algorithm and are thus free to change it at any time.
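If the aim is a predictable choice rather than relying on undocumented behaviour, the usual way is to give the routes different distances, so the lower distance wins while its gateway is reachable. This is a rough sketch in the same RouterOS v6 syntax as the print above, assuming item 3 is still the unmarked static default route via WAN2:

```
/ip route
print
# Assumption: item 3 is the static 0.0.0.0/0 route via WAN2 without a routing-mark.
# Item numbers are only valid right after a print, so check before running this.
set 3 distance=2
```

This would make WAN1 the primary and WAN2 the failover in the main table. Note that the dynamic route (item 4) also has distance=1 and cannot be edited directly; its distance has to be changed in whatever created it.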

To split/distribute traffic between the two interfaces, use PCC (per connection classifier).
It is simple to use and there are many examples on the wiki/forum.
First mangle the traffic you want to send out specifically via WAN1 or WAN2,
and then the remaining traffic goes into PCC to be split over WAN1/WAN2, for example.
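As an illustration of that idea, here is a hedged sketch of two PCC rules for the config in the first post. The matchers are copied from rules 1 and 2 there, minus the address lists; the 2-way split on both-addresses and the comments are assumptions, not something taken from the thread:

```
/ip firewall mangle
# Hypothetical rules: they only see connections that rules 1 and 2 left unmarked.
# "add" appends at the end, so move them to sit after rule 2 and before the
# mark-routing rules (3 and 4).
add chain=prerouting action=mark-connection new-connection-mark=WAN1 passthrough=yes \
    connection-mark=no-mark dst-address-type=!local in-interface-list=VLAN \
    per-connection-classifier=both-addresses:2/0 comment="PCC remainder to WAN1"
add chain=prerouting action=mark-connection new-connection-mark=WAN2 passthrough=yes \
    connection-mark=no-mark dst-address-type=!local in-interface-list=VLAN \
    per-connection-classifier=both-addresses:2/1 comment="PCC remainder to WAN2"
```

Because the existing mark-routing rules key on the WAN1/WAN2 connection marks, they should pick up these connections without further changes.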

"If there is more than one route with the same distance, selection is done in random (except for BGP "

It also mentions the route in the FIB, and the FIB is in the Linux kernel, so I assume that to understand why ether 1 takes precedence one needs to look at the Linux docs.

https://wiki.mikrotik.com/wiki/Manual:Route_Selection_Algorithm_in_RouterOS

Thank you all for the insight!

WeWiNet made a good point that I could put the remaining connections through PCC after my policy route. I think that’s a great idea.

In the end, because I have accounted for all possible routed traffic to the Internet I am satisfied this time :sunglasses:

To keep some flexibility, use LISTs in MANGLE rules rather than interfaces or addresses directly.
Then you can easily manage specific interfaces or clients through the Interface List or Address List, just by enabling or disabling the list entries.

Example:

/interface list member
add comment="LTE Only: IOT Bridge" disabled=yes interface=Bridge_IOT list=LTE_ONLY_as_WAN
add comment="LTE + DSL per boost: " disabled=yes interface=Bridge_IOT list=Boost_Load_bal_bridges

In the above example:

  • By default, traffic from the IOT bridge goes over DSL (WAN1).
  • If I want IOT to go over the “fast” LTE (WAN2) only, I just enable the “LTE_ONLY_as_WAN” list entry for that bridge.
  • If I want IOT to load balance over DSL and LTE (with PCC), I enable the “Boost_Load_bal_bridges” entry for that bridge.

If I want to get back to plain vanilla DSL, I just disable those two list entries…
You can also manage those lists via port knocking and/or the scheduler (enable “boost” for 2 hours and then revert back).
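For anyone wanting to try this, the pattern might look roughly like the following. The list and bridge names come from the example above; the mangle rule itself and the WAN2 connection mark are assumptions for illustration, since the actual rules were not posted:

```
/interface list
add name=LTE_ONLY_as_WAN

/ip firewall mangle
# Hypothetical rule: anything arriving on a member of the list gets the LTE mark.
add chain=prerouting action=mark-connection new-connection-mark=WAN2 passthrough=yes \
    connection-mark=no-mark dst-address-type=!local in-interface-list=LTE_ONLY_as_WAN \
    comment="LTE only for list members"

/interface list member
# Switch the IOT bridge to LTE only ...
enable [find list=LTE_ONLY_as_WAN interface=Bridge_IOT]
# ... and back to the default path.
disable [find list=LTE_ONLY_as_WAN interface=Bridge_IOT]
```

The mangle rule never changes; only the list membership is toggled, which is also what a scheduler or port-knock script would do.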

In my case all this is quite helpful as I have very limited WAN speeds at my disposal…