Need some explanation regarding PCC load balancing mangle rules

So in this: https://help.mikrotik.com/docs/display/ROS/Firewall+Marking#FirewallMarking-DetailedSectionOverview.2

We can see a solid and stable method to implement PCC. But I have some doubts.

1. The first rules come with this statement: “With policy routing, it is possible to force all traffic to the specific gateway…”. We already put dst-address-type=!local on the connection-marking rules, so why do we still need the rules below?

/ip firewall mangle
add chain=prerouting dst-address=10.111.0.0/30 action=accept in-interface=ether3
add chain=prerouting dst-address=10.112.0.0/30 action=accept in-interface=ether3

1.1. Let’s say we do need the above rules for some reason. How would we deploy them with dynamic IP addresses, given that they sit in chain=prerouting, where the outgoing interface cannot be matched?

2. We mark the connections destined for the web in the prerouting chain, which is understandable.
But why do we also need to mark routing in the output chain, as in the following, when we have already applied routing marks to those marked connections in the prerouting chain?

/ip firewall mangle
add chain=prerouting connection-mark=ISP1_conn in-interface=ether3 action=mark-routing new-routing-mark=to_ISP1
add chain=prerouting connection-mark=ISP2_conn in-interface=ether3 action=mark-routing new-routing-mark=to_ISP2
add chain=output connection-mark=ISP1_conn action=mark-routing new-routing-mark=to_ISP1
add chain=output connection-mark=ISP2_conn action=mark-routing new-routing-mark=to_ISP2

  1. “local” means any address assigned to the router itself; it does not cover anything else. So if you don’t want to break routing between other subnets, you have to deal with them too.

1.1) You can update rules from dhcp lease script.

  2. Prerouting is for traffic from other devices. Output is for traffic from the router itself. In this case it’s there so the router can respond to connections from the internet, sending responses out the same WAN the request came in on.

  1. Yes, I use an address list to cover all RFC6890 subnets, so all private subnets are completely excluded. I still don’t understand why the following is required. I did not implement it, as I’m on PPPoE clients, and nothing has broken so far:
/ip firewall mangle
add chain=prerouting dst-address=10.111.0.0/30 action=accept in-interface=ether3
add chain=prerouting dst-address=10.112.0.0/30 action=accept in-interface=ether3

1.1 Both uplinks in my case are PPPoE clients with dynamic IPs, how would we handle that?

  2. So to be clear, the output chain routing marks handle the connections that were marked in prerouting (in-interface=WAN)?

In case you want to see my PCC config, here it is, it’s been working smoothly for months now, but I’m still not clear on 1 and 1.1

/ip firewall mangle
###Accept LAN traffic###
add action=accept chain=prerouting dst-address-list=not_in_internet in-interface-list=LAN

###Redirect incoming WAN traffic to their corresponding WAN interface###
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out1 new-connection-mark=ISP1_conn passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out2 new-connection-mark=ISP2_conn passthrough=yes

###Mark traffic###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet in-interface=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet in-interface=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses:2/1

###Sending routing marks to their destined ISPs###
add action=mark-routing chain=prerouting connection-mark=ISP1_conn dst-address-list=!not_in_internet in-interface=LAN new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=ISP2_conn dst-address-list=!not_in_internet in-interface=LAN new-routing-mark=to_ISP2 passthrough=no
add action=mark-routing chain=output connection-mark=ISP1_conn dst-address-list=!not_in_internet new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=output connection-mark=ISP2_conn dst-address-list=!not_in_internet new-routing-mark=to_ISP2 passthrough=no

  1. It’s to allow devices in LAN to access anything in those subnets: ping the ISP’s gateway, access the modem configuration if you’re connected behind one, etc. If you don’t need any of that, you can live without these rules.

1.1) PPPoE has an equivalent of the lease script in the PPP profile (the on-up and on-down scripts).

  2. Yes. You can test it: if you have public addresses for both WANs, disable these output rules and try to ping both addresses from outside. Only one will respond, because all response packets will be routed to one ISP (the one with the active default route).
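A rough sketch of that test, assuming the only mark-routing rules in the output chain are the two ISP rules from the config above (check with print first if you have others):

```
/ip firewall mangle
# Temporarily disable the output-chain routing marks
disable [find chain=output action=mark-routing]
# ... now ping both public WAN addresses from a host outside your network ...
# Re-enable the rules afterwards
enable [find chain=output action=mark-routing]
```

With the rules disabled, replies generated by the router follow the main routing table regardless of which WAN the request arrived on.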

  1. I’m still not clear about this. I can ping both ISPs’ local gateways with the config I posted. Am I doing something weird? As for modem access, I bridged the ONTs; each has a different private subnet and I can access them, since they are excluded in the mangle rules as mentioned before.

1.1 I’m no good at scripting. Do you know of a script that can dynamically update the “dst-address” in mangle, or even the “gateway” in IP>Route?

  2. Okay I get it now.

  1. I don’t know what exactly you have, but it’s also possible that the ping is taking a slightly longer path. Check what traceroute shows.

Let’s say the gateway for ISP1 has a public address and you try to ping it from a device in LAN. If you mark this outgoing ping with the ISP2 mark and don’t exclude it in any way, your router will send the packet to ISP2, even though the target address is directly reachable on the ISP1 link. ISP2 has no idea that your router is also connected to ISP1, so it will send the packet to the internet, and it will travel to the target address, which is on ISP1’s router. When that router gets it, it will respond, but it also has no idea that the source address (which belongs to ISP2) is on your router, which is directly connected to it, so the response will travel back the same way.

1.1) If you want to just update some rules, you can use this simple approach:

http://forum.mikrotik.com/t/dual-dynamic-isp-wan-dual-lan-setup/132893/5

Give a rule some unique comment, then use it to find the rule and change it.
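For the PPPoE case, a minimal sketch of that approach might look like the following. The comment name ISP1_gw_accept is made up for illustration, and the sketch assumes the PPP profile used by pppoe-out1 passes the peer (ISP gateway) address to its on-up script as remote-address; verify that on your RouterOS version before relying on it.

```
# One-time setup: give the accept rule a unique comment (hypothetical name).
# The dst-address here is only a placeholder that the script overwrites.
/ip firewall mangle
add chain=prerouting action=accept in-interface=ether3 dst-address=10.111.0.1 comment="ISP1_gw_accept"

# Script body for the on-up field of the PPP profile used by pppoe-out1.
# Assumption: $"remote-address" holds the ISP-side address of the PPP link.
/ip firewall mangle set [find comment="ISP1_gw_accept"] dst-address=$"remote-address"
```

A second rule and profile with their own comment would be needed for the other uplink.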

When I run a traceroute, it goes either entirely via ISP1 or entirely via ISP2, depending on how PCC classifies the connection. So do I still need those rules in 1?

I meant traceroute to ISP’s gateway, which should always have only two hops, first your router and then ISP’s gateway right behind it. But even if you have it wrong, you still have 50% chance that it will work correctly, because it will get mark for right ISP.

I’m sure you can live without these rules. Worst case, you wouldn’t be able to access ISP’s gateway from some devices in LAN, but you can’t do much with it anyway (except ping), and it doesn’t influence other traffic forwarded via same gateway, so it won’t break internet access.

Ah, I get it now. Yeah, both my ISPs use the 10.0.0.0/8 subnet for their local gateways, so it’s even more tricky, but it does not break anything. Even port forwarding works, so I think all is good.

Out of curiosity, I tried this and it works. Any idea how? It’s basically PCC and Nth in a single mangle rule:

add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet in-interface=bridge new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0 nth=2,1

If your “not_in_internet” list contains 10.0.0.0/8, then it’s already solved by that. If you ping ISP’s gateway 10.x.x.x from LAN, it won’t get marked and router will use main routing table.
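The actual list was not posted, so the following is only an illustrative subset of what an RFC6890-style “not_in_internet” list might contain; the comments are mine:

```
/ip firewall address-list
add list=not_in_internet address=10.0.0.0/8 comment="private (covers both ISP gateways here)"
add list=not_in_internet address=172.16.0.0/12 comment="private"
add list=not_in_internet address=192.168.0.0/16 comment="private"
add list=not_in_internet address=100.64.0.0/10 comment="shared address space"
add list=not_in_internet address=169.254.0.0/16 comment="link local"
```

Any destination in this list fails the dst-address-list=!not_in_internet condition on the PCC rules, so it stays unmarked and is routed by the main table.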

About the combination of PCC and Nth: there’s probably no reason why it wouldn’t work, but whether it does anything useful is a different question. Not all combinations of conditions make sense, but if they are not clearly invalid, the router accepts them. What exactly this will do, I honestly don’t know. It won’t visibly break anything, because any connections that aren’t matched and marked by this rule or another will simply go out unmarked and use the primary ISP.

Yeah, 10.0.0.0/8 is in the list. Thanks for clearing this up. Thread solved.

Yeah, I guess only MikroTik would know the actual internals of such behaviour.

Although I can ping the “gateway” IP of both ISPs as expected, I can’t ping the actual public IP that either ISP assigns to its PPPoE client. When I disable PCC/load balancing completely, I can ping.

I tried adding dst-address=!publicIP and it didn’t do anything to help fix it. Should I bother trying to allow my LAN to be able to ping the public IP on the PPPoE clients?

It depends where those public addresses are. If they are directly on your router, they are already excluded from marking, provided you kept the PCC rules with dst-address-type=!local from the example.

If it’s NAT 1:1 and they are in fact elsewhere, you’d need to exclude them too, and additionally add routes to them, to use the best path, because your router wouldn’t really know which ISP has which address. But ping should still work, even if it went through internet.

Why didn’t I think of that? Yeah I added dst-address-type=!local and it fixed the problem. I was under the impression that dst-address-type=!local means only the local subnets, totally forgot about public IP addresses that are assigned to the interfaces on the router at a local level. Problem solved.
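For reference, the PCC marking rules from the config posted earlier would then look roughly like this; it is the same pair of rules, with dst-address-type=!local as the only addition:

```
/ip firewall mangle
# Skip destinations that are addresses on the router itself (including the PPPoE public IPs)
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local in-interface=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local in-interface=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses:2/1
```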

Side note: I haven’t seen any flawless PCC/Nth load balancing guide out there, but I think my current setup plus the latest edit is probably ready to be used as a guide, as it completely excludes non-public traffic.

No, this “local” means only addresses on the router itself; it has nothing to do with subnets.

I don’t think there’s one perfect config, there may be some as good starting point, but different people need different things. Most important is to understand what it does, why and how. MikroTik’s example tries to explain that and it’s not bad, but you can always try to do it better.

Personally I usually use “lazy” approach, don’t bother with excluding stuff and I get desired behaviour using routing rules like:

/ip route rule
add action=lookup-only-in-table dst-address=<local subnet> table=main

It basically overrides routing marks, so I can be sure that traffic to that subnet will always use the main routing table, no matter what I do with it (marking, dstnat, anything). It may feel less clean to mark something and then override it, but it’s pleasantly foolproof.

Ahaha, the “lazy” approach will result in unnecessary CPU usage by the mangle rules though, so it’s definitely not optimal. But yeah, !local and !not_in_internet (RFC6890) sufficiently cover all the bases.

Well, that’s a question. I can skip some conditions in mangle rules, so that may lower CPU usage a bit (or may not, it depends on the order in which conditions are evaluated). Routing rules will undoubtedly add some processing, but routing should be the most optimized part of system, so it shouldn’t be much. It would be interesting to measure the difference, but unless you have massive bandwidth and router hitting its limits, it probably doesn’t matter too much.

I agree, yeah.

I found a strange bug/problem

If I use the following, I cannot reach the router from trace routes, it’s completely invisible and shows up as 100% loss from LAN. I can ping however.

###Sending routing marks to their destined ISPs###
add action=mark-routing chain=prerouting connection-mark=ISP1_conn in-interface=LAN new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=ISP2_conn in-interface=LAN new-routing-mark=to_ISP2 passthrough=no
add action=mark-routing chain=output connection-mark=ISP1_conn new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=output connection-mark=ISP2_conn new-routing-mark=to_ISP2 passthrough=no

Now if I add dst-address-list=!not_in_internet to the output chain routing-mark rules (whose connection-mark, as you explained, comes from the prerouting marks where in-interface=WAN), the traceroute works correctly, but 1-2 packets are lost on hop 1, i.e. the router, every single time.
(I have tried doubling the following with !local, didn’t help with the packet loss)

###Sending routing marks to their destined ISPs###
add action=mark-routing chain=output connection-mark=ISP1_conn dst-address-list=!not_in_internet new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=output connection-mark=ISP2_conn dst-address-list=!not_in_internet new-routing-mark=to_ISP2 passthrough=no

So I tried something different which is not used or shown in any guide or example. I did the following, and there’s now zero packet loss on traceroutes and everything is reachable as expected.

###Sending routing marks to their destined ISPs###
add action=mark-routing chain=output connection-mark=ISP1_conn out-interface=ISP1 new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=output connection-mark=ISP2_conn out-interface=ISP2 new-routing-mark=to_ISP2 passthrough=no

I’m looking at the packet flow in RouterOS and I’m at a 100% loss (get the pun?) here, like why do we use prerouting chain for incoming WAN traffic in the following instead of input chain?

###Redirect incoming WAN traffic to their corresponding WAN interface###
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out1 new-connection-mark=ISP1_conn passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out2 new-connection-mark=ISP2_conn passthrough=yes

And above all what is causing this bug/problem in the first place which seems fixed the moment I add WAN interfaces to the output routing mark mangle rules? As per my understanding LAN-to-LAN traffic is 100% excluded from the mangle rules using !local and !not_in_internet in all the “mark connection” rules except the prerouting chain rules for WAN interfaces.

I’m just trying to understand the theory behind this problem, it seems crucial in order to understand networking deeper. Thanks for your help again.

EDIT: There is still packet loss at the router in the traceroutes even with the above changes; 2-3 packets are lost each time, randomly.

I don’t remember this exactly and I don’t have time to play with it now, but I think that ICMP packets for exceeded TTL may inherit either the connection mark or the routing mark from the original packet. The original does have one, because it’s an outgoing packet to some external address, so the PCC rules applied to it. You can try some experiments yourself; logging rules in output that check for marks should do it.
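A minimal sketch of such logging rules, assuming the mark names from the config above; the log prefixes are made up. ICMP type 11 code 0 is “TTL exceeded in transit”, which is what the router sends back as hop 1 of a traceroute.

```
/ip firewall mangle
# Log router-generated "TTL exceeded" replies that carry one of the PCC connection marks.
# Move these above the output mark-routing rules; with passthrough=no there,
# matching packets would never reach rules placed below them.
add chain=output action=log protocol=icmp icmp-options=11:0 connection-mark=ISP1_conn log-prefix="ttlexc-ISP1"
add chain=output action=log protocol=icmp icmp-options=11:0 connection-mark=ISP2_conn log-prefix="ttlexc-ISP2"
```

If these rules produce log entries during a traceroute from LAN, the replies are indeed inheriting the mark and would be candidates for being routed out a WAN instead of back to the LAN.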

Marking incoming connections from internet is in prerouting, because you may want it not only for router itself (where input would be enough), but also for forwarded ports.

I think you may be right about ICMP packets but why would they inherit the marks when the ICMP DST is to the router itself which is the first hop in the traceroutes? And is there a work-around for this?
I’ll look into the experiment though.
Edit: In the logs, I only see ICMP originating from the public IPs destined towards public IPs as expected, no LAN IPs are in it as per the exclusions. So I’m really confused about the packet loss now. Of course if I remove !local and !addresslist then I can see LAN IPs.