cgallery
newbie
Topic Author
Posts: 35
Joined: Tue Apr 24, 2018 5:25 am

"Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 2:57 pm

I got the "RouterOS by Example" book (Kindle Edition) by Stephen Disher.

Reading through it, I came to the discussion of the mangle feature, where he talks about how CPU-intensive packet mangling can be if we have to look at every packet.

So he suggests an "Optimal Mangle" method, a two-step process where we: (1) Identify connections that are carrying the packets we want to mangle. (2) Mark the packets.

The thing that doesn't make sense is that it seems he is still examining every packet, and now he has processed the matching packets twice.

The first rule might specify a chain of prerouting, a protocol of tcp, and a dst port of 80. Packets matching this will have an action of "mark connection" with a mark we specify like "WebBrowsingConnections."

A subsequent rule then looks in the prerouting chain for the connection mark we set in our previous rule ("WebBrowsingConnections") and then uses the action "mark packet" as "WebBrowsingPackets."

I cannot see how that is faster.

I can see how it might lead to easier/better organization, but we're still looking at every packet in order to perform that first rule that does the "mark connection?"

Am I thinking incorrectly on this?
 
pe1chl
Forum Guru
Posts: 5808
Joined: Mon Jun 08, 2015 12:09 pm

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 4:06 pm

When you have only a single rule like that, there probably is not much difference.
However, when you have a more complicated setup you would have to apply all matches to all packets and mark them accordingly.
The optimization would be to look at packets and mark the corresponding connections in prerouting; then in postrouting, or wherever else
you need the mark, you only look for connection marks and apply the corresponding packet mark.
In the prerouting you can have a first rule that says "when connection already marked, stop here".
So, your potentially long list of matches is applied only to the first packet of each connection.
The assumption is that checking for a connection mark is quicker than matching all kinds of protocol fields.
And possibly also that you will have fewer distinct marks than you have distinct filters.
(e.g. both udp port 53 and tcp port 53 marked as "DNS traffic")
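
To make the idea above concrete, here is a rough Python sketch of the mechanism, not of RouterOS itself. The dictionary stands in for the conntrack table, and the class, the classifier list, the "other" fallback mark and the addresses are all made up for illustration. The long match list only runs when a connection has no mark yet; every later packet, in either direction, is a single lookup.

```python
from typing import Dict, NamedTuple, Tuple


class Packet(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


# Hypothetical match list: several protocol/port matches share one mark.
CLASSIFIERS = [
    ("udp", 53, "DNS traffic"),
    ("tcp", 53, "DNS traffic"),
    ("tcp", 80, "Web traffic"),
]


class Mangle:
    def __init__(self) -> None:
        # Stand-in for the conntrack table: connection key -> connection mark.
        self.conntrack: Dict[Tuple, str] = {}
        # How many times the full match list had to be walked.
        self.full_classifications = 0

    @staticmethod
    def connection_key(pkt: Packet) -> Tuple:
        # Sort the endpoints so a reply maps to the same connection as the request.
        endpoints = sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)])
        return (pkt.proto, endpoints[0], endpoints[1])

    def connection_mark(self, pkt: Packet) -> str:
        key = self.connection_key(pkt)
        mark = self.conntrack.get(key)
        if mark is not None:
            return mark  # "when connection already marked, stop here"
        # First packet of the connection: walk the whole match list once.
        self.full_classifications += 1
        mark = "other"  # assumed catch-all so unmatched connections are not re-checked
        for proto, port, candidate in CLASSIFIERS:
            if pkt.proto == proto and pkt.dport == port:
                mark = candidate
                break
        self.conntrack[key] = mark
        return mark


mangle = Mangle()
query = Packet("udp", "192.168.88.10", 40000, "192.168.88.1", 53)
reply = Packet("udp", "192.168.88.1", 53, "192.168.88.10", 40000)
marks = [mangle.connection_mark(p) for p in (query, reply, query, reply)]
print(marks, mangle.full_classifications)
```

Four packets go through, but the match list is walked only once. Note that the reply has destination port 40000 and would never match a port 53 rule on its own; it inherits the mark from the connection.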
 
cgallery
newbie
Topic Author
Posts: 35
Joined: Tue Apr 24, 2018 5:25 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 4:23 pm

Interesting, I can see how it would help with clarity and debugging but still don’t see how you avoid examining every packet and marking the packets that match.

Hmmm.
 
R1CH
Forum Veteran
Posts: 889
Joined: Sun Oct 01, 2006 11:44 pm

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 4:31 pm

You can't avoid examining every packet. The benefit is that you can shortcut the mark-packet rule evaluation by ordering the rules by traffic volume. E.g., if you only care about HTTP traffic, you mark port 80 connections as http and mark everything else as other. Then, when it comes to packet marking, you put the "other" rule first in the list, since that will be the majority of your traffic, so only one rule has to be evaluated for most packets.
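
A small Python sketch of why the rule order matters, assuming evaluation stops at the first matching mark-packet rule. The 90/10 traffic split and the function name are invented for the example, not measured.

```python
def rules_checked_per_100_packets(rule_order, packets_per_100):
    """Count rule checks for 100 packets when evaluation stops at the first match.

    rule_order: connection marks in the order their mark-packet rules appear.
    packets_per_100: how many of every 100 packets carry each connection mark.
    """
    total = 0
    for position, mark in enumerate(rule_order, start=1):
        # A packet matched by the rule at this position was checked `position` times.
        total += position * packets_per_100[mark]
    return total


# Assumed traffic mix: most packets belong to "other" connections.
traffic = {"other": 90, "http": 10}

other_first = rules_checked_per_100_packets(["other", "http"], traffic)
http_first = rules_checked_per_100_packets(["http", "other"], traffic)
print(other_first, http_first)  # 110 190
```

With only two rules the gap is small, but it grows with the number of packet-mark rules sitting below the high-volume one.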
 
pe1chl
Forum Guru
Posts: 5808
Joined: Mon Jun 08, 2015 12:09 pm

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 4:32 pm

Instead of "examining every packet to see what protocol and what port and what this and what that" which potentially can take some time, you do that match only once for every connection, apply a connection mark, and then for the majority of packets you only need to check what connection mark it has and mark the packet accordingly (or set other properties like priority or DSCP or routing mark).
The idea is that checking for connection mark is faster than checking all those other fields, especially when there are fewer different connection marks than there are match cases.
This also depends on your use case. When you apply marks like "HTTP connection" there probably is little to gain, but when you instead have marks like "low priority", "high priority", "limited bandwidth" etc, it is more likely there are several matches of protocol/port that result in the same traffic class marking.
Last edited by pe1chl on Wed May 02, 2018 4:35 pm, edited 1 time in total.
 
chechito
Forum Guru
Posts: 1736
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 4:33 pm

When you mark a connection, you do it using more "expensive" matchers like protocol, port number, content, or even layer7, which requires inspecting and comparing the packet content.

The savings come from not using those "expensive" functions for every packet of an already identified connection.

The objective is to avoid unnecessary processing, that is, to send only the necessary traffic through those expensive matchers.

Once you have detected and marked a connection, it is more efficient to use a simple connection-mark match to identify the traffic and then mark every packet, so you can take actions on them like queuing or policy-based routing.

I will try to explain it in a simplified way.

If you design your mangle so that all traffic goes across all mangle rules, your CPU usage will be higher. For example, if you have 15,000 packets per second of traffic and 100 mangle rules, that is 1,500,000 comparisons per second.

But if, of those 15,000 packets per second, the packets of already marked connections (around 14,000) get matched at the top of mangle, for example within the top 10 mangle rules, then only 1,000 packets per second go across all 100 mangle rules, which is 100,000 comparisons, and 14,000 packets per second go across only 10 mangle rules, which is 140,000 comparisons (of the simplest kind).

As a result you go from 1,500,000 comparisons per second to 240,000 comparisons per second, and 140,000 of them are the simplest and lightest for the CPU to do. You can easily get about a 6x reduction in CPU consumption on mangle, for example going from 50% to 3-6%, which is very significant.
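
The arithmetic in this post can be checked with a few lines of Python. The figures are the ones from the example above; the variable names are just for illustration, and this counts comparisons only, without modelling how much cheaper a connection-mark check is than a protocol or layer7 match.

```python
PACKETS_PER_SECOND = 15_000
TOTAL_RULES = 100
MARKED_PACKETS_PER_SECOND = 14_000  # packets of already marked connections
TOP_RULES = 10                      # rules those packets pass before matching

# Every packet walks every rule.
naive = PACKETS_PER_SECOND * TOTAL_RULES

# Only packets of not-yet-marked connections walk the full list.
unmarked = PACKETS_PER_SECOND - MARKED_PACKETS_PER_SECOND
full_walk = unmarked * TOTAL_RULES
short_walk = MARKED_PACKETS_PER_SECOND * TOP_RULES
optimized = full_walk + short_walk

ratio = naive / optimized
print(naive, full_walk, short_walk, optimized, ratio)
# 1500000 100000 140000 240000 6.25
```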
 
cgallery
newbie
Topic Author
Posts: 35
Joined: Tue Apr 24, 2018 5:25 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 5:02 pm

But I don't get the impression that is what the author is saying.

It seems like the author is saying that I'm gaining something by marking a packet with "mark connection," as if that means I don't have to do the exact same thing to the next packet meeting the same criteria.

I get that I should prioritize my Mangle rules such that I match as much as possible as early as possible, to avoid having to make comparisons that result in no action (I think that is what you're saying?). That once I have matched and performed an action, I (typically) stop processing more rules.

ANYWAY, I might just need to stare at this longer to absorb it. But as a contract programmer that works in several languages, this seems counter-intuitive to me, like handling every element in an array twice when once would have done. And again, that might improve clarity and ease debugging, but it won't make it faster. Also again, though, I'm acknowledging I could be looking at this wrong. It sure wouldn't be the first time.
 
yottabit
Member Candidate
Posts: 160
Joined: Thu Feb 21, 2013 5:56 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 5:08 pm

Comment #6 is spot on. I use this methodology, what I call 2-step QoS, on a fairly involved tree queue for all egress traffic. I only have to deep inspect the first packet of a new connection, and then rely on the fast conntrack table to apply packet marks for enforcement.

Using this method I can even get 100M+ throughput on the WAN side of a hAP ac with its single-core MIPS CPU.

I have now moved my routing onto the ARM quad-core hAP ac2 for additional throughput (faster CPU, and not using the wireless, which consumes considerable CPU time), and now under full egress load of 160M the CPU runs around 20-30%.

There's enough extra CPU time available now that I run The Dude for fun.


 
cgallery
newbie
Topic Author
Posts: 35
Joined: Tue Apr 24, 2018 5:25 am

Re:

Wed May 02, 2018 5:31 pm

So you're saying that, once a packet is inspected and marked as a connection, the router automatically marks all such packets as belonging to that connection?

Does anyone have a Mangle table that doesn't disclose anything private, that they could post so I can better understand what is going on?
 
Joe1vm
just joined
Posts: 22
Joined: Sat Apr 06, 2013 4:07 pm

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 8:54 pm

You may find this video worth watching: https://www.youtube.com/watch?v=mnX6Im8GlJw
I am just a simple home user, but it helped me a lot to understand mangle rules and how to use packet marking for traffic prioritization. Then, if you are still interested, I can share some simple mangle rules for VoIP VLAN, guest VLAN and LAN with prioritization and bandwidth limitations, as implemented on my home router.
Last edited by Joe1vm on Wed May 02, 2018 9:09 pm, edited 1 time in total.
 
yottabit
Member Candidate
Posts: 160
Joined: Thu Feb 21, 2013 5:56 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 9:04 pm

Here's my setup, only slightly out-of-date. I have thorough comments added as a teaching aid. This should help.

https://docs.google.com/document/d/1G6o ... p=drivesdk

Edit: link permissions fixed.



Last edited by yottabit on Wed May 02, 2018 9:04 pm, edited 1 time in total.
 
anav
Forum Guru
Posts: 2967
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 9:06 pm

Excellent practical information on mangling. Much appreciated as I would tend to abuse this functionality if I didn't know any better.
 
pe1chl
Forum Guru
Posts: 5808
Joined: Mon Jun 08, 2015 12:09 pm

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Wed May 02, 2018 10:57 pm

For easiest QoS in the entire network I use mangling at the entry point of the traffic to set a connection mark depending on protocol/port in prerouting for locally received traffic,
then use the connection mark to set the DSCP value (advantage is that it works for traffic in both directions automatically) in postrouting, then
use DSCP high 3 bits to set the priority (this includes both DSCP set locally via the above mechanism and also DSCP values in packets received from other routers)
and finally set the packet marks from priority and use those to select the 8 queues. The last step with packet marks is only required because RouterOS
cannot directly use the packet priority as queue priority (native Linux can do this!).
Another advantage of using DSCP is that most Wireless equipment will recognize this field for QoS as well (WMM).
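
The "DSCP high 3 bits" step is simple bit arithmetic, sketched here in Python. DSCP is a 6-bit field, so shifting right by 3 leaves a priority from 0 to 7. The function names and the "prio-N" packet mark names are hypothetical and only show how eight priorities could map onto eight packet marks for eight queues.

```python
def priority_from_dscp(dscp: int) -> int:
    """Return the top 3 bits of a 6-bit DSCP value as a priority in 0..7."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field, expected 0..63")
    return dscp >> 3


def packet_mark_for(dscp: int) -> str:
    # Hypothetical naming scheme: one packet mark per priority.
    return f"prio-{priority_from_dscp(dscp)}"


# Well-known DSCP code points: best effort (0), AF41 (34), EF (46), CS6 (48).
for name, dscp in [("BE", 0), ("AF41", 34), ("EF", 46), ("CS6", 48)]:
    print(name, dscp, priority_from_dscp(dscp), packet_mark_for(dscp))
```

EF (46) lands in priority 5 and CS6 (48) in priority 6, whether the DSCP value was set locally from a connection mark or arrived already set from another router.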
 
chechito
Forum Guru
Posts: 1736
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Thu May 03, 2018 12:04 am


Sorry, I'm not a programmer, but now that we are in tune on how the mangle rule set works from top to bottom, let's imagine this hypothetical simplified mangle for HTTP traffic, for explanation:

rule1 connection-mark=HTTP set new-packet-mark=HTTP
rule2 
rule3
rule4
rule5
..
..
rule50 protocol=tcp dst-port=80 new-connection-mark=HTTP
rule51 protocol=tcp dst-port=81 new-connection-mark=HTTP
rule52 protocol=tcp dst-port=8080 new-connection-mark=HTTP
rule53 protocol=tcp dst-port=8081 new-connection-mark=HTTP
In that case only the first packet of an HTTP connection has to be compared against the 50 or so rules in mangle.

The following packets of the connection get matched at the first rule, saving the CPU resources required to make those 50 comparisons for every packet.
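
For the programmers in the thread, the hypothetical rule list above can be modelled in Python to count how many rules a packet is compared against. This is a simplified first-match model and not how RouterOS is implemented: rules 2 to 49 are treated as fillers that never match, passthrough is ignored, and all names are made up for the sketch.

```python
# Rule 1 checks the connection mark, rules 2-49 are unrelated fillers,
# rules 50-53 are the port matches that set the HTTP connection mark.
RULES = (
    [("connection-mark", "HTTP")]
    + [("filler", None)] * 48
    + [("dst-port", 80), ("dst-port", 81), ("dst-port", 8080), ("dst-port", 8081)]
)


def rules_compared(connection_mark, dst_port):
    """Return how many rules are checked before the first one matches."""
    for number, (kind, value) in enumerate(RULES, start=1):
        if kind == "connection-mark" and connection_mark == value:
            return number
        if kind == "dst-port" and dst_port == value:
            return number
    return len(RULES)  # nothing matched, the whole list was walked


first_packet = rules_compared(connection_mark=None, dst_port=80)
later_packet = rules_compared(connection_mark="HTTP", dst_port=80)
print(first_packet, later_packet)  # 50 1
```

The first packet of a port 80 connection is compared against 50 rules; once the connection carries the HTTP mark, every later packet stops at rule 1.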
 
yottabit
Member Candidate
Posts: 160
Joined: Thu Feb 21, 2013 5:56 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Thu May 03, 2018 2:39 am

For those that aren't understanding why this method is easier on the CPU, it's because tracking a connection, and then looking it up in the conntrack table is far faster, and far less CPU-intensive, than deep-inspecting values in the individual packets.


 
chechito
Forum Guru
Posts: 1736
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re:

Thu May 03, 2018 2:40 am

Good point!
 
cgallery
newbie
Topic Author
Posts: 35
Joined: Tue Apr 24, 2018 5:25 am

Re: "Optimal Mangle" from "RouterOS by Example" performance?

Thu May 03, 2018 7:09 am

I think I (FINALLY) get it.

When I mark the connection, I'm not manipulating a packet but rather "labelling" a connection (with conntrack?) in a way that I can subsequently easily test.

You guys are so patient.
