QoS Tree VoIP problem

@sindy and @pcunite, thank you very much for your help.
In conclusion… the problem is the hardware.
A wrong purchase on my part.
I saw the RB4011iGS+5HacQ2HnD-IN and I like it very much.
Yes, I need at least 8 Ethernet ports.
The SFP port is only for a future upgrade.
Wireless, of course.
I use an APC UPS with a USB connection, but… never mind…
Is it certain that with this hardware I will be OK with QoS?
My connection is 50/5, but it may be upgraded to 200/20 or higher.
Are these requirements adequate?

Thanks again, everyone…

You don't have to spend all that money on a 4011; just keep the CRS109 as a "switch only" and add a MikroTik hEX for the "routing".

I believe it's a good time for an upgrade…
I want to try the RB4011iGS+5HacQ2HnD-IN.
I suppose it is more powerful than the hEX (in CPU terms).
Isn't it?

I haven’t got the 4011 so I can only extrapolate.

A few months ago I measured the throughput of a hAP ac² with PPPoE on the WAN. As QoS is essentially incompatible with fasttracking (you could fasttrack only the VoIP packets, but there are so few of them that doing so wouldn't save you much CPU power), the measurement with bi-directional traffic and no fasttracking on the client side is the most relevant one for your case (without fasttracking, every single packet has to pass through all stages of the firewall). The summary value of 1180 Mbit/s in total for both directions is the maximum you could achieve with a hAP ac², as I had no packet-marking rules in place during the test. The 4011 uses the same CPU architecture and number of cores as the hAP ac² but almost double the clock rate, so you should get around 2300 Mbit/s total throughput with full-MTU-sized packets. Throughput in Mbit/s drops further when short packets are handled, as smaller packets carry fewer payload bytes while costing the same CPU effort for routing, firewalling etc.
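The extrapolation above, written out as a rough calculation. It assumes that routing throughput scales linearly with CPU clock and takes "almost double" as a clock ratio of about 1.95; both are simplifying assumptions, not measurements of the 4011:

```latex
T_{4011} \approx T_{\mathrm{hAP\,ac^2}} \cdot \frac{f_{4011}}{f_{\mathrm{hAP\,ac^2}}}
         \approx 1180\ \mathrm{Mbit/s} \cdot 1.95
         \approx 2300\ \mathrm{Mbit/s}
```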

In real life, there is very roughly one small packet per two large ones for one-way file transfers (a small TCP ACK is usually sent for every second data packet received, and the data packets are as large as the path permits). VoIP calls use only small to mid-size packets, except for the call control (signalling), which is an exchange of a few packets at the beginning of a call; but the share of VoIP in the total traffic will likely be negligible unless you provide the VoIP service for the whole neighbourhood.
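To illustrate how that packet mix lowers the Mbit/s figure, here is a back-of-the-envelope sketch. It assumes the CPU cost per packet is independent of packet size, large packets of 1500 bytes, small ACK packets of 64 bytes, and exactly one small packet per two large ones; real traffic will deviate from all of these:

```latex
P \approx \frac{2300\ \mathrm{Mbit/s}}{1500 \cdot 8\ \mathrm{bit}} \approx 191\,700\ \mathrm{packets/s},
\qquad
\bar{s} = \frac{2 \cdot 1500 + 64}{3}\ \mathrm{B} \approx 1021\ \mathrm{B}

T_{\mathrm{mixed}} \approx P \cdot \bar{s} \cdot 8
  = 2300\ \mathrm{Mbit/s} \cdot \frac{1021}{1500}
  \approx 1570\ \mathrm{Mbit/s}
```

Under these assumptions, saturating 1000/1000 in both directions at once (2000 Mbit/s total) would exceed the estimate, while a download-dominated pattern would stay well below it.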

So all in all, 200/20 Mbit/s will surely be fine with the 4011 even with QoS and PPPoE handling; 1000/1000 may or may not be, depending on the traffic pattern. Another performance drop would occur if you deployed bridging to provide advanced L2 functionality on the LAN ports (VLAN filtering, MSTP, …).

The official test results are much more optimistic on large packets but they don’t take into account the PPPoE processing.

The 4011 is also more powerful than hEX, although here the comparison would be more complex as the CPU architecture is different.

Thank you very much for your advice.
Something else…
Is it a good idea to send the VoIP, ACK and SYN traffic through fasttrack?
That way… would it get better priority than the rest of the QoS traffic?

Fasttracking means that most packets of a fasttracked connection are only processed by the connection tracking stage of the firewall and are not processed by the various rule chains and queues. ACK-only packets belong to a connection, and fasttracking by its very principle handles either all packets of a connection or none, so within the same connection you cannot selectively fasttrack only the ACK-only packets and process the large packets the regular way, even if that made any sense (and it actually doesn't, see below).

Also, as even fasttracked packets still need the CPU to get moving, fasttracking a few packets out of the total (the VoIP flow volume is negligible as compared to your torrent traffic) will not significantly lower the CPU load, so even if you fasttrack VoIP packets and keep queueing the rest, there will be losses in your scenario.
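In practice this means that when queues are used for QoS, the fasttrack rule is normally switched off entirely rather than applied to a selection of packets. A minimal sketch, assuming the stock default configuration, which contains a filter rule with action=fasttrack-connection:

```routeros
/ip firewall filter
# Fasttrack decides per connection (all of its packets or none), and most
# packets of a fasttracked connection skip mangle and the queues.
# Disabling the fasttrack rule(s) keeps every packet subject to QoS.
disable [find action=fasttrack-connection]
# Check the result: the rule(s) listed here should now show as disabled
print where action=fasttrack-connection
```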

I will have the RB4011iGS on Wednesday, maybe.
I did some tests with a full download and a VoIP call,
and I have a spreadsheet with the results from Wireshark.
When I have the RB, I'll post a summary of all the results for comparison.

Anyway…
I changed the rules:
now I mark connections first and then packets.
I haven't seen any difference in CPU utilization compared to marking packets only.
I have seen a difference between simple queues and queue trees:
with simple queues the CPU peaks at ~70-80%, while with trees it reaches 90-100%.
I still mark only upload traffic in postrouting.
Any suggestions for the mangle rules?

The goal is to make every packet pass through as few firewall rules as necessary, and to keep the match conditions of those few rules as simple as possible. To achieve this, it is best to translate complex match conditions into connection-marks when handling the first packet of each connection, so that you don't need to re-evaluate all those conditions for every single packet. Then, for packets which already have a connection-mark (because the connection tracker has found them to belong to a marked connection), translating the connection-mark into a packet-mark should be the first thing done. So what you get is something like:

/ip firewall mangle
#first of all, send the packets belonging to already marked connections to the chain translating connection marks into packet marks
add action=jump chain=postrouting connection-mark=!no-mark jump-target=connmark2pktmark

#now stop handling these packets in the postrouting chain as we've already done everything we wanted about them
add action=accept chain=postrouting connection-mark=!no-mark

#only initial packets of not-yet-marked connections can get to this point; let's set the connection marks representing priorities according to various match conditions
add action=mark-connection chain=postrouting new-connection-mark=P1 [condition1.1 [condition1.2 [condition1.3 [...]]]] passthrough=yes
add action=mark-connection chain=postrouting new-connection-mark=P2 [condition2.1 [condition2.2 [condition2.3 [...]]]] passthrough=yes
...
add action=mark-connection chain=postrouting new-connection-mark=P8 [condition8.1 [condition8.2 [condition8.3 [...]]]] passthrough=yes
#it is important to assign some connection-mark to ALL connections - if you don't, subsequent packets of non-marked connections
# would come here again thanks to the first two rules

#and now, let these initial packets which have just got their connection mark assigned also get their appropriate packet mark, otherwise
# they'd get away with none and would not be put to the proper queue
add action=jump chain=postrouting jump-target=connmark2pktmark

#this is THE END of the postrouting chain in mangle

#this is the chain translating connection-marks into packet-marks; all rules have passthrough=no because there is no point
#in checking the other ones once we've done the translation

#first, accept the download packets as you say we aren't queueing them at all
add action=accept chain=connmark2pktmark out-interface-list=LAN passthrough=no

#next, do the translations, starting NOT from the highest priority ones but from those responsible for most traffic, because to reduce the CPU load,
# the more packets in a class, the fewer rules you want them checked against
add action=mark-packet chain=connmark2pktmark connection-mark=P8 new-packet-mark=P8 passthrough=no
...
add action=mark-packet chain=connmark2pktmark connection-mark=P1 new-packet-mark=P1 passthrough=no

This is the rough skeleton. Real life may be more complex, and I don't claim this will surely save your CRS109 from the trash, because the optimisation won't make a dramatic difference, but you may give it a try. I'd recommend starting with just two priorities: VoIP and the rest. If doing so saves the VoIP calls, it makes sense to spend effort on fine-graining the other priorities.
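The skeleton reduced to the recommended two priorities might look like the sketch below. The VoIP device address 192.168.88.10 and the mark names VOIP and OTHER are hypothetical placeholders, and the interface list LAN is assumed to exist as in the default configuration; adjust the classification conditions to your own VoIP setup.

```routeros
/ip firewall mangle
# Already-marked connections: translate connection-mark into packet-mark
add action=jump chain=postrouting connection-mark=!no-mark jump-target=connmark2pktmark
add action=accept chain=postrouting connection-mark=!no-mark
# Only initial packets of unmarked connections get here.
# Hypothetical VoIP device at 192.168.88.10, matched in both directions
add action=mark-connection chain=postrouting connection-mark=no-mark new-connection-mark=VOIP src-address=192.168.88.10 passthrough=yes
add action=mark-connection chain=postrouting connection-mark=no-mark new-connection-mark=VOIP dst-address=192.168.88.10 passthrough=yes
# Catch-all, so that every connection ends up with some connection-mark
add action=mark-connection chain=postrouting connection-mark=no-mark new-connection-mark=OTHER passthrough=yes
add action=jump chain=postrouting jump-target=connmark2pktmark
# Translation chain: download is not queued, busiest class is checked first
add action=accept chain=connmark2pktmark out-interface-list=LAN
add action=mark-packet chain=connmark2pktmark connection-mark=OTHER new-packet-mark=OTHER passthrough=no
add action=mark-packet chain=connmark2pktmark connection-mark=VOIP new-packet-mark=VOIP passthrough=no
```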

@sindy
Thank you for your advice.
I have some questions:

  1. In the connection marks, why do I need passthrough=yes?
    This way, once a connection is marked as C1, it is still checked against all the other rules.
  2. Why do I need a jump to the packet marking at the end of the connection marks?
    When the connection marking is finished, doesn't processing continue to the packet marking anyway?
  3. In the packet marks you say to accept the download packets.
    Why?

I had these mangle rules:

# maybe the next one is not needed

...

Because after assigning a connection-mark, you need to continue handling the packet in order to translate the connection-mark into a packet-mark for this packet as well; otherwise it would not get a packet-mark at all, and only the subsequent packets of the connection would get one.


No. You cannot jump to another rule in the same chain or call a portion of the same chain, so the translation rules must live in a separate chain, "connmark2pktmark", which can then be called from both places in chain "postrouting" where I need it. Of course, you could repeat the same rules once more at the end of chain "postrouting" instead of calling the other chain, but that single jump is not too costly in CPU terms and is only done for the initial packets of connections, so it is better to have each translation rule only once.


Because you wrote somewhere that you don't prioritize them. If you do, modify that accordingly; ideally, create another translation chain for the other direction to minimize the number of rules traversed by each packet.
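A possible shape of such a per-direction split, as a sketch only: the WAN interface name pppoe-out1 and the interface list LAN are assumptions, the chain names ul-pktmark and dl-pktmark and the UL-/DL- packet-mark names are hypothetical, and only two of the skeleton's classes (P8 and P1) are shown. The final jump for freshly marked connections in the skeleton would have to be split the same way.

```routeros
/ip firewall mangle
# Already-marked packets go to the translation chain of their own direction,
# so each packet only traverses the rules relevant to that direction
add action=jump chain=postrouting connection-mark=!no-mark out-interface=pppoe-out1 jump-target=ul-pktmark
add action=jump chain=postrouting connection-mark=!no-mark out-interface-list=LAN jump-target=dl-pktmark
# Upload translation, busiest class first
add action=mark-packet chain=ul-pktmark connection-mark=P8 new-packet-mark=UL-P8 passthrough=no
add action=mark-packet chain=ul-pktmark connection-mark=P1 new-packet-mark=UL-P1 passthrough=no
# Download translation, only needed if download is prioritized too
add action=mark-packet chain=dl-pktmark connection-mark=P8 new-packet-mark=DL-P8 passthrough=no
add action=mark-packet chain=dl-pktmark connection-mark=P1 new-packet-mark=DL-P1 passthrough=no
```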

Other than this, prioritization of download traffic has to be different from prioritization of upload traffic. There is no point in slowing down UDP or TCP ACK packets in the download (WAN->LAN) direction, because they have already used the bandwidth of the WAN link, and by slowing them down you do not affect further packets in the same direction. By slowing down TCP packets with a payload in the download direction (no matter whether they also carry an ACK or not; only their size matters here), you delay the recipient's ACKs for them, and thus indirectly slow down the sending rate at the sender's end.

In the upload direction, you can slow down anything; of course, if you slow down some UDP streams so much that their queue fills up and starts dropping packets, it will affect the service. Just take care not to slow down ACKs twice: once indirectly, by slowing down TCP with payload in the download direction, and a second time directly in the upload direction. It would also be a waste of resources to discriminate between TCP packets with and without payload in the download direction and then again in the upload direction, so it is better to slow down just the upload and save the size comparison in both directions.
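An upload-only queue tree matching this advice could look like the sketch below. It assumes the WAN interface pppoe-out1, the hypothetical packet marks VOIP and OTHER from the two-priority case, and the 5 Mbit/s uplink mentioned earlier in the thread; 4500k and 256k are illustrative values. Setting max-limit a little below the line rate is meant to make the queue build up in the router rather than in the modem.

```routeros
/queue tree
# Parent on the WAN interface: only upload (LAN->WAN) traffic passes here.
# max-limit slightly below the assumed 5 Mbit/s uplink
add name=UL parent=pppoe-out1 max-limit=4500k
# VoIP: highest priority with a small guaranteed rate (illustrative value)
add name=UL-VOIP parent=UL packet-mark=VOIP priority=1 limit-at=256k max-limit=4500k
# Everything else: lowest priority
add name=UL-OTHER parent=UL packet-mark=OTHER priority=8 max-limit=4500k
```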

@sindy

Now my configuration is:

/ip firewall mangle

But I can't see any traffic on the accept rule for connection-mark=!no-mark.
Why?
Any comments on the new configuration?

Because your chain=pktmark works well and does not leave any packets unhandled :) That action=accept rule was there just to catch packets for which the translation of connection-mark to packet-mark did not happen in chain=pktmark for some reason, and to keep them from going any further.

As for comments on the new configuration: yes, there is an important one related to the above. As you have added out-interface=pppoe to the first two rules compared to my suggestion, all packets in the download direction pass through all the connection-marking rules and then through all the chain=pktmark rules, because neither of the first two rules matches them (they have a connection-mark assigned but don't have out-interface=pppoe). So if you don't want to enqueue download packets, remove the out-interface=pppoe from the accept rule with connection-mark=!no-mark, so that it accepts all download packets belonging to already marked connections and keeps them from passing through all the connection-marking rules (which generates unnecessary CPU load). All download packets (except the initial ones of connections initiated from the WAN side, which probably don't exist) will then be handled by just two rules: the first one, which won't match them, and the second one, which will accept them. In my initial suggestion, the first rule in chain=connmark2pktmark was responsible for this, but it made the upload packets pass through that one extra rule, so the way described just above is more efficient.

Other than that, it seems fine to me. So if after implementing the change above you still experience RTP packet loss, there is no more optimisation I could suggest.

But if I remove the out-interface from the first and second rules, the packet-mark counters also count the DL traffic.
Before, I caught only upload traffic in the connection and packet marks.
Even though all DL traffic passes through all the connection marks, my last rule jumps to the packet marks with an out-interface condition.
So the DL traffic doesn't reach the packet marks.
Is it a good idea to add one more jump for the connection marks?
Like this:

add action=jump chain=postrouting comment=Packets connection-mark=!no-mark \

And change the chain of all the connection-marking rules from postrouting to conmark.

Don’t remove out-interface=pppoe from the first rule, remove it only from the second one.
First rule will send only connection-marked UL packets to the mark translating chain (and they will never return from there as they get packet-marked with passthrough=no there), and it will ignore all DL packets so they won’t be sent to the mark translating chain.
Only all the DL packets and the UL packets with no connection-mark will reach the second rule, and that rule will match on all of the DL packets belonging to already connection-marked connections and will thus accept them, so only packets in either direction which have no connection-mark will get past this second rule (and eventually get a connection-mark assigned followed by getting assigned a packet-mark, but there are so few of them that it doesn’t matter).
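The two rules described above would look roughly like this. The chain name pktmark and the interface pppoe-out1 are the ones used in this thread; the rest of the chain is left as it is.

```routeros
/ip firewall mangle
# Rule 1: upload packets of already-marked connections go to the translation
# chain; they get their packet-mark there with passthrough=no and never return
add action=jump chain=postrouting connection-mark=!no-mark out-interface=pppoe-out1 jump-target=pktmark
# Rule 2: no out-interface condition, so download packets of already-marked
# connections are accepted here and skip all the connection-marking rules
add action=accept chain=postrouting connection-mark=!no-mark
```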


Sorry, can you re-word this? I’m lost a bit in what is “before” and which “last rule” you have in mind. But as it is the last one, it means to me that there are some rules before it, through which the DL packets had to pass. And even though none of those rules before matched those packets, the evaluation may take the same amount of CPU time when it doesn’t match as when it matches - although I suppose that the mismatch of the whole rule is declared as soon as the first mismatch of one of the conditions is encountered, the first condition failing to match may be the last one evaluated.

OK…
I fixed it.
Now I have:

/ip firewall mangle

And now all the counters are correct.

If I change the accept rule to:

add action=jump chain=postrouting comment=Packets connection-mark=no-mark \
    jump-target=conmark out-interface=pppoe-out1

Is this OK?

Well, there is little difference between using a separate chain=conmark, jumping to it from chain=postrouting for packets which need connection-marking, and doing the connection-marking within chain=postrouting itself while bypassing that part of it for packets which don't require connection-marking. So sorry if I misled you; I didn't go deep into your configuration in post #29, because I understood from what you wrote there that it was an excerpt from the one you had published earlier, and those had no dedicated chains for connection marking and packet marking.

Now I have this configuration (following your suggestions):

/ip firewall mangle
add action=jump chain=postrouting comment=MrkConJmp2PckMark connection-mark=\

I believe it's OK now.

If there are no errors, let's look at the queues.
I have these rules (my VDSL is 50/5):

/queue simple

To me it seems fine; just double-check that you have the connection-mark=no-mark condition in all action=mark-connection rules except the first one, to avoid accidentally rewriting an already assigned connection-mark. You obviously realized this need on your own, but as you haven't shown all these rules (for easier readability, good!), you have to double-check yourself.
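A sketch of that pattern: the classes P1, P2 and P8 follow the skeleton, while the match conditions (UDP port 5060, TCP port 443) are hypothetical examples only.

```routeros
/ip firewall mangle
# First classification rule: only unmarked connections get this far anyway
add action=mark-connection chain=postrouting new-connection-mark=P1 protocol=udp dst-port=5060 passthrough=yes
# Every later rule only touches connections that are still unmarked,
# so it cannot overwrite the P1 mark assigned just above
add action=mark-connection chain=postrouting connection-mark=no-mark new-connection-mark=P2 protocol=tcp dst-port=443 passthrough=yes
# Catch-all: every remaining connection ends up with some mark
add action=mark-connection chain=postrouting connection-mark=no-mark new-connection-mark=P8 passthrough=yes
```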

What are the practical results?

In the connection marks I hadn't checked for no-mark, except in the last rule, which catches all packets.
This way some connections could have their mark changed.
I believe all the connection-marking rules should have connection-mark=no-mark, like the last rule.
I had left it out because there was no such condition in your suggestion.
I've added them:

add action=jump chain=postrouting comment=MrkConJmp2PckMark connection-mark=\

I didn't have any problems without connection-mark=no-mark (double marking / mark replacement) because the rules are very tight.
But adding the no-mark condition is more correct.
I've added no-mark to the first rule too.
It's unnecessary, but I don't believe it adds any CPU load.

It's amazing!!!
With the new QoS configuration and the RB4011 I have <5% CPU load at full download.
At the same time I have a mean VoIP jitter of 0.21 ms (on the CRS with no download I had 0.22 ms), a maximum of 0.35 ms (on the CRS with no download I had 0.54 ms), and 0 packet loss.

Many many thanks to all!!!
Especially @sindy for QoS and @pcunite for suggesting the RB4011.

@sindy
Do you know of any book on MikroTik QoS analysis?