New firmware update slows internet speeds

It's been done plenty of times already ... even by MikroTik themselves. Just take a look at all those "firewall reinforcement" recipes which add lots of rules to explicitly drop traffic that would get dropped anyway if the rules were written with the "allow necessary, drop everything else" concept in mind. And which add some (unnecessary) mangle rules that make disabling fasttrack a requirement.

Unintentionally or accidentally doesn't count: in a good competition for bad coding, participants must be determined to produce the best worst possible code or, in this case, firewall configuration.

I think this bears some further repetition.

Generally a "forward" firewall ruleset starts with:

  1. (optional) deny invalid
  2. (optional) fasttrack-connection
  3. accept established/related

Only "new" connections make it further. ("Untracked" connections don't really matter for us right now.) This means that for basically all purposes all firewall rulesets are 1-3 rules long. Whatever drop/allow rules come after these only get executed for new connections and don't have much impact on normal performance.

Additionally, NAT rules are only executed once when the connection is added to the conntrack table (i.e. once per new connection). This means that the number of NAT rules is, again, not of much consequence.

@pe1chl points out that, all this being true, we must still pay for the evaluation of each rule if/when it is executed. This is no doubt true.

I think that many people over-estimate the cpu usage associated with executing rules.

To support this, and to have some data on record for better understanding these things, I did a little experiment.

I measured the cpu usage of a hap ax2 that I had on hand while pumping 100 Mbps of 512 byte packets through it. The traffic is fully synthetic.

I made sure that all packets hit the same core. Then, I added firewall rules of the form add chain=forward action=drop src-address=127.0.0.100 with different addresses for each one.

The CPU load for these tests was observed as reported by the profile tool. Alas, averaging was done simply by eyeballing. The percentages only apply to the single loaded core.

FW rules   CPU0%   fw%
3          50       7
10         53      10
25         58      14
50         62      21
100        72      32
fasttrack  19      ~0

Note: just for giggles, I also measured the same with the connection notracked. As expected, the results mirror the normal (tracked) case. Notrack is useful to avoid creating unnecessary conntrack entries, but for normal traffic there is no benefit (at least in cpu terms.)

So... my conclusions:

  • Unsurprisingly, the firewall has a "default cost" unrelated to the number of rules.
  • Yes, in fact you do have to pay for executing more rules, but as long as one sticks to a sensible number, performance doesn't collapse. Also, the increase is non-linear.
  • Fasttrack accelerates things way more than simply "having no rules" or "having a single rule".
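To put a rough number on the "non-linear" remark, the CPU cost per extra rule can be computed between consecutive rows of the table above. This is only a sketch: the values are the eyeballed averages from the table, so small differences between rows are within noise, and the helper name `marginal_costs` is made up for illustration.

```python
# Measurements copied from the table above: (number of rules, CPU0 %, fw %).
measurements = [(3, 50, 7), (10, 53, 10), (25, 58, 14), (50, 62, 21), (100, 72, 32)]


def marginal_costs(points):
    """Return CPU percentage points added per extra rule between consecutive rows."""
    costs = []
    for (rules_a, cpu_a, _), (rules_b, cpu_b, _) in zip(points, points[1:]):
        costs.append(((rules_a, rules_b), (cpu_b - cpu_a) / (rules_b - rules_a)))
    return costs


for (low, high), cost in marginal_costs(measurements):
    print(f"{low:>3} -> {high:<3} rules: {cost:.2f} CPU points per extra rule")
```

The first few added rules appear somewhat more expensive per rule than the later ones, but given how the averages were taken, this should be read as a tendency rather than a precise result.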

Small nitpick, but the "deny invalid" rule doesn't have to be the first one, and is perfectly fine when it's right below the "accept established/related" rule, because both rules check and trust the connection-state property of the packet, and its values are mutually exclusive. Placing "invalid" below reduces the number of rules to check by one for most of the packets.
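The effect of the rule order can be illustrated with a toy model of first-match evaluation. This is a simplified sketch, not how RouterOS is implemented: rules here match only on connection state, unmatched packets fall through to an implicit accept, and the traffic mix is a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    states: frozenset  # connection states this rule matches
    action: str


def evaluate(chain, state):
    """Walk the chain top to bottom; return (action, number of rules checked)."""
    for checked, rule in enumerate(chain, start=1):
        if state in rule.states:
            return rule.action, checked
    return "accept", len(chain)  # implicit accept when no rule matches


accept_est = Rule("accept established/related", frozenset({"established", "related"}), "accept")
drop_invalid = Rule("drop invalid", frozenset({"invalid"}), "drop")

invalid_first = [drop_invalid, accept_est]
established_first = [accept_est, drop_invalid]

# Hypothetical packet mix: almost everything belongs to an existing connection.
traffic = {"established": 990, "new": 9, "invalid": 1}


def total_checks(chain, mix):
    """Total rule checks needed to classify every packet in the mix."""
    return sum(evaluate(chain, state)[1] * count for state, count in mix.items())


print("invalid first:    ", total_checks(invalid_first, traffic))
print("established first:", total_checks(established_first, traffic))
```

With this assumed mix, putting "accept established/related" first roughly halves the number of rule checks, because the dominant class of packets is matched by the very first rule.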

And of course, what we wrote here about "most of the time only one rule needs to be checked" when fasttrack is not used only applies if the user doesn't abuse the RAW table. Unfortunately, plenty of "guides" on the web and YouTube put tons of drop rules in the RAW table for some hypothetical DoS and hacking situation that never actually happens. Those rules are mostly drop rules, and where there are accept rules, they usually don't match the return traffic from WAN. This means that for most of the normal traffic (normal download-heavy web usage), the packets have to be checked against every one of them, from top to bottom (because in those configurations there is usually no RAW rule that can match those packets early), only to be finally accepted when none of the rules matches.

Here is an example of such a RAW table, from the topic "Speed issues with specific services, probably due to firewall misconfig" (RouterOS / Beginner Basics, MikroTik community forum). The issue of the OP of that topic is due to the ISP, and his config has fasttrack enabled anyway. But without fasttrack, the 46 RAW rules in his firewall are not optimal. For a typical TCP response packet from WAN, for example, most of those rules, with the exception of the few ICMP rules, have to be checked.


I’m the OP from the topic you mentioned. Actually glad to be an example of what not to do, as it can help someone not waste as much time as I did.

But I’m curious about the RAW table. I thought that those guides focus on RAW filtering because it’s processed faster and it can tolerate a little more complexity?

Side-note, I fully agree that my rules were over-designed and default rules after reset are pretty close in the end result while being significantly simpler. But, to be fair to me, I did deliberately put a lot of accept rules on top of RAW table exactly to reduce amount of packets that have to go through the whole table.

@lurker888

Would this be a valid graphic representation of your experimental data?

EDIT: Removed wrong graph.

Normally, you only put a ton of drop rules in the RAW table if you expect to reject the majority of the traffic you receive, for example when you are under a heavy DoS/DDoS attack. You want to drop the packets from the attack as soon as possible, without letting them reach further into the firewall and cause the creation of connection entries.

But normally you don't get attacked all the time. Most of the traffic arriving at the router will be legitimate packets that the router receives and processes (chain input) or transfers further (chain forward). Only a small portion will be dropped/rejected. Also, most of the packets will be from existing connections and have connection-state=established. You can look at the counters next to the rules yourself. If you don't have fasttrack, then the accept rule with established,related,untracked will be the one catching the most traffic, and if you have fasttrack enabled, then that rule plus the dummy rules at the top of the tables will be the ones counting the most traffic. You'll probably see that those rules have caught 1000× more packets than all your drop rules. If you are not the target of an active, massive attack, then accepted packets clearly outweigh rejected ones.

Now, if you don't have fasttrack active, and have 40+ rules in the RAW table, then every single one of the legitimate packets (which are probably > 99.9% of the packets) will have to be checked against all those RAW rules. And because none matches, the packets will have to go to the next stage with connection tracking anyway, and most of them will be handled by the "accept established,related,untracked" rule. That's a lot of processing.

If you don't expect to be under massive attacks, then you can still have the same drop rules, but in the normal filter table, placed under the "accept established,related,untracked" rule. The checks against the 40+ rules will then only be performed for the first packet of a connection. If a connection is legitimate and is not dropped by them, then normally all the following packets of that connection won't need to be checked against those rules anymore, and can be accepted immediately by the "accept established,related,untracked" rule at the top. Let's say you have a TCP connection that transfers 1 MB of data with a full MTU of 1500: the 40+ rules only need to be checked against one packet of that connection, while the 700+ other packets only have to be matched against one single rule.
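The 1 MB example can be worked through as back-of-the-envelope arithmetic. The sketch below makes several simplifying assumptions that are not in the post: 1 MB is taken as 1 MiB, each packet carries 40 bytes of IPv4 and TCP headers without options, only the data packets in one direction are counted (no handshake, no ACKs), and exactly 40 extra rules are placed either in RAW or below the accept rule in filter.

```python
import math

MTU = 1500                      # bytes per IP packet
HEADER_BYTES = 40               # assumption: IPv4 (20) + TCP (20), no options
TRANSFER_BYTES = 1024 * 1024    # assumption: "1 MB" taken as 1 MiB
EXTRA_RULES = 40                # the "40+" drop rules from the example

packets = math.ceil(TRANSFER_BYTES / (MTU - HEADER_BYTES))


def checks_rules_in_raw(packet_count, extra_rules):
    """Every packet walks all RAW rules, then hits the accept rule in filter."""
    return packet_count * (extra_rules + 1)


def checks_rules_in_filter(packet_count, extra_rules):
    """Only the first packet misses the accept rule and walks the extra rules."""
    first_packet = 1 + extra_rules
    remaining_packets = (packet_count - 1) * 1
    return first_packet + remaining_packets


print("data packets:            ", packets)
print("rule checks, RAW rules:  ", checks_rules_in_raw(packets, EXTRA_RULES))
print("rule checks, filter rules:", checks_rules_in_filter(packets, EXTRA_RULES))
```

Under these assumptions the same rule set costs several dozen times more rule checks when it sits in RAW than when it sits below the accept rule in the filter table.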

The RAW table has another use: allowing certain kinds of packets through without creating entries in the connection tracking table (because that saves resources). In that case you would have action=notrack rules matching certain very specific conditions, and later in the filter table the packets would normally be accepted very soon by the "accept established,related,untracked" rule at the top. Obviously, this is only suitable if the packets don't need the facilities provided by connection tracking.


The only reason to use the RAW table is that when some packet that matches it comes in, it can be dropped without further consequence. When you put the same rule in the filter table, at that time a “new” connection has already been created in the connection tracking table. So for some DDoS scenarios where e.g. an attacker sprays a lot of packets from many different source addresses, the connection tracking table could grow very large and consume resources (memory and CPU).

This is usually not a realistic situation to combat, because at that time the input line of the router probably is saturated anyway, and dropping traffic will do nothing to enable valid traffic (that has already been dropped at the other end) anymore. There basically is not much you can do on your own router against DDoS, it has to be done further upstream (at the ISP or some DDoS washing service).

Of course that does not stop creative newbies from posting videos on YouTube describing this config.


@jaclaz For the non-fasttrack cases, sure. Fasttrack however should not be graphed like this (I just lumped it in with the other data in the table.) It can either be left out entirely, or visualized as a horizontal line. It was only meant to provide a baseline. (It's horizontal because once fasttrack is established, the cpu load doesn't depend on the number of rules anymore, because the firewall is simply not consulted.)

@pe1chl While I still have the config available, I did some additional measurements to show when the raw table should be used.

For this test the same traffic was submitted to the router for forwarding, but instead of allowing all of it, all of it was dropped. The same measurement as before was done with a single firewall rule dropping the traffic - in one case this rule is in the filter forward chain, in the other it is in raw prerouting.

scenario                CPU0%   fw%  nw%  rt%
1st rule drop fwd       53      15   19    6
1st rule drop raw       25       3   10    0

(fw - firewall, nw - networking, rt - routing)

Comparing the first case to the previous table we actually see increased fw load. This means that it's actually more work for the firewall to drop packets than to forward them. This is easily explained because the conntrack lookup still has to be done, but after that, instead of using the present entry, a new one has to be created and then destroyed, for each packet.

In the raw case, as expected, firewall load is lower. However, significant savings are also realized by terminating the packet flow early: other networking steps and routing become unnecessary.

In all, if one is trying to protect against (d)dos, extensive scanning, etc. it's beneficial to drop such traffic in raw. Otherwise (from a purely filtering perspective) raw is best left alone.

My ultimate suggestion would be to have a single raw rule that matches the src-address against a blacklist. If we detect someone sending unwanted traffic, their address can be added to the list with an appropriate timeout. Ultimately, however, I think that worrying about ddos attacks is rarely worth the effort.

Yep, I had this doubt.
So when fasttrack is enabled, roughly CPU usage is 19-20% independently (or almost independently) from the number of firewall rules.
Maybe we can get rid of the fw% which is anyway "parallel" to CPU usage and just have CPU usage %?

Like?:
EDIT: removed graph, see below

There is always a difference depending on the type of traffic that you send, and depending on the actual use case of the router.

Assuming the use case is a router for typical home usage, the normally expected traffic is “reply traffic to outgoing connections”. That is handled very efficiently by FastTrack, and also quite efficiently by “accept established/related”. Further firewall rules are usually not an issue, there will be a rule “block all not from LAN” or 2 rules “allow all from LAN” and “block all”.

The CPU load may be higher when you fire lots of traffic not matching connection tracking to this device, but that is not very interesting because that is not the normal situation for a typical user. It may be that they are the victim of a DDoS (apparently popular in gaming circles) but that isn’t solved with whatever kind of rules and it is not what is normally happening when downloading some files at ISP maximum rate.

@jaclaz Yep. I would label the diagram as "FW rules before accept established/related". Otherwise it will be misquoted/misunderstood. Apart from that it's absolutely fine.

@pe1chl I always have a bit of difficulty with understanding the people who worry about (d)dos. For several reasons, actually. First of all, if you size your router to be able to handle the whole link traffic, then you have nothing to worry about. If, on the other hand, your link is flooded, there's nothing you can do about it in the firewall. The only situation I can imagine this being useful at all is when someone sends small sized packets your way, enough to overwhelm your firewall but not enough to saturate your link... yes, probably only in this fantasy situation it would make sense to include the raw rule.

We must come up with a better title or change some values, that one is confusing (or at least it confuses me).
You did not start with an empty firewall (i.e. 0 rules) you started with the 3 (I could add if needed a 0 rules datapoint that should have 0% CPU):

  1. (optional) deny invalid
  2. (optional) fasttrack-connection
  3. accept established/related

actually, if the fast-track connection one is a separate affair, from 2 rules, or 1 before the established/related, or if also the invalid is optional, from 1 rule (the established/related) and so 0 rules before it.

or am I misunderstanding? :astonished_face:

Yes, that is also what I have written before. But people write “recipes to filter DDoS”, put them on forums or even more often in Youtube videos, people are gonna copy it, and get in trouble.

It is like the “whenever someone sends me a SYN packet to port 22 or 23, I put him in a blocklist in the raw table”. Fine, but that brings absolutely nothing (because you do not have those ports open anyway) and people send those packets with spoofed source address like 8.8.8.8 causing unwanted side effects.

Still, those recipes are available and copied.

:slight_smile: Now it's my turn to be a little confused. The "before accept established/related" is meant to emphasize that we are only counting rules that are executed for all packets. Basically, everything before the usual "accept established" rule, and the ones after it (that are only executed once per connection) don't matter as much in terms of performance. That is: any optimization that one might otherwise be inclined to do should stop at that point. This test was specifically for rules that are actually executed.

If your point was whether we should count the "accept" rule or not: it makes no difference.

This is a bit of a weird semantic category for a few reasons. First of all, does the implicit accept (that is used in my test) count as a rule or not? More importantly, if no rules are specified (and in the absence of NAT, etc.), the router automatically reverts to fastpath, which is its own can of worms.

By the way, this is why I didn't include the 0 rules measurement: it only raises further questions about what exactly that setup means.

In a sense the whole point of this benchmark was that the firewall has a baseline performance penalty that we must pay even in the absence of rules, and therefore it doesn't materially matter if we have 0, 3 or 5 rules. And in comparison to all these scenarios fasttrack is way faster.

The result for 0 rules with conntrack active is:

FW rules   CPU0%   fw%
0          49       6

(conntrack active)

So CPU doesn't go to 0, and neither does fw load.

The 50% CPU usage with 0 rules seems logical to me (and, given the highly sophisticated :wink: way you recorded and averaged the results, within the expected approximation). I could "round it down" to 49% just to avoid a horizontal line between 0 and 3, but I would rather leave out the 0 datapoint and make the 3 (actually 1 or 2) the real 0, with an explanation.

A graphic (as I see it) should be something that visually represents something and it must be immediately or nearly immediately understandable.

We all agree that firewall rules AFTER the:

  1. accept established/related

will have a much lesser impact.

BUT (IMHO) to make the graphic meaningful, we need to find a way to make explicit how the measurements were made.

What if we specify that #1 (deny invalid) is NOT optional, that #2 (fasttrack) is optional, and that #3 (accept established/related) is actually the LAST rule, so that the number of rules represents the n rules inserted between #2 and the last one (n+2)?

I.e. a scheme like:

1 . deny invalid
2. (optional) fasttrack-connection
...
<n inserted rules>
...
n+2. accept established/related
...
<other rules that will have a much lower impact as they are after the n+2 one>

So the title "FW rules before accept established/related" would become more understandable

"deny invalid" (fewer matches) should be placed below "accept established/related" (much more frequent) like I wrote above. It's also the order in the defconf firewall:

The defconf rules place the rules that make IPsec traffic skip fasttrack in front of "accept established/related" in the IPv4 forward chain; they can be disabled if there is no need for IPsec.

Ah, OK, then it will be even easier, if I am allowed to count starting from 0.

0 . (optional) fasttrack-connection
...
. <n inserted rules>
...
n . accept established/related
n+1 . (optional) deny invalid
...
<other rules that will have a much lower impact as they are after the n/n+1 ones>
...

New attempt:
(replaced with updated version, hopefully "final"):

@jaclaz Looks nice. Just a few tiny remarks :slight_smile:

I would position the fasttrack rule as "sticking" to the accept established/related. So after the "+n" and before "n". It's optional, so I'm not sure whether to number it or not... Or maybe it's enough and more correct to say that the red (no-fasttrack) diagram is for the case where it is not there, and the green (fasttrack) line is for the case when it is.

The trend line for the non-fasttrack case is fine. 0.25% / rule is about the appropriate number.
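As a sanity check on the per-rule figure, an ordinary least-squares line can be fitted through the non-fasttrack CPU0% values from the first table. This is only a sketch on five eyeballed data points; the function name `linear_fit` is made up for illustration, and the 0-rules measurement is left out.

```python
# (number of rules, CPU0 %) for the non-fasttrack rows of the first table.
cpu_by_rules = [(3, 50), (10, 53), (25, 58), (50, 62), (100, 72)]


def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(cpu_by_rules)
print(f"about {slope:.2f} CPU points per rule, baseline about {intercept:.1f}%")
```

The fitted slope comes out at roughly 0.22 percentage points per rule, in the same ballpark as the 0.25% / rule read off the trend line.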

The trend line for the fasttrack case however is not correct. The fasttrack cpu representation should be a flat line. The number of rules has no influence whatsoever, because for these flows the firewall is not consulted at all.