CHR nat masquerade performance

Hello, unfortunately the speed of masquerade nat is extremely low, my internet is 1 gigabit and the speed received in chr and x86 is 100 megabits, only src-nat to src-nat has the correct speed and gives the full 1 gigabit. What should I do? What is the problem? In fact, load balancing is canceled with this masquerade situation.

2025-05-13 14:22:54 by RouterOS 7.18.2

/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no
/ip address
add address=192.168.1.91/24 interface=ether1 network=192.168.1.0
/ip dns
set servers=8.8.8.8
/ip firewall nat
add action=masquerade chain=srcnat
/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.90 routing-table=main scope=30 suppress-hw-offload=no target-scope=10

/ip firewall nat
add action=masquerade chain=srcnat

Usually a masquerade in src-nat is configured for a given interface or interface-list, i.e. either:

/ip firewall nat
add action=masquerade chain=srcnat out-interface=ether1

or:

/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN

and in some recent RouterOS v7 versions this has also been added to the rule:

ipsec-policy=out,none

I cannot say whether this is connected to the issue you are reporting, and of course there may be other issues in the rest of the configuration.
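Put together, the two suggestions above could look like the sketch below. It assumes that ether1 is the uplink and that an interface list named WAN does not exist yet; both are assumptions for illustration, not something taken from the poster's router.

```
# Hypothetical list name "WAN"; ether1 is assumed to be the uplink
/interface list
add name=WAN
/interface list member
add interface=ether1 list=WAN
# Masquerade only traffic leaving through WAN and skip IPsec-protected packets
/ip firewall nat
add action=masquerade chain=srcnat out-interface-list=WAN ipsec-policy=out,none
```

Using a list rather than a single interface name means a second uplink can be added later without touching the NAT rule.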

It has no effect. The important question is: has anyone been able to transfer at speeds higher than 100 Mbit/s on CHR and x86 with masquerade enabled?

It depends on the actual hardware involved, but I suspect the configuration more.
To allow some of the more expert members to comment on the issue, you should post your complete configuration, following these instructions:
http://forum.mikrotik.com/t/forum-rules/173010/1

/ip firewall nat
add action=masquerade chain=srcnat

That rule should specify its output interface. Otherwise you are NATing traffic on every interface of the router, which is incorrect NAT; hence the awful NAT performance.

The output interface makes no difference.

My full system configuration.
System specs:
CPU: i9-12900K
Mainboard: Giga Z690
RAM: 64 GB
Internet connectivity: 1 gigabit 5G
MikroTik x86 installed directly on the system as the router
Also tested in VirtualBox and VMware.
All results are the same!


2025-05-17 08:31:50 by RouterOS 7.18.2

/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no name=Lan
set [ find default-name=ether1 ] disable-running-check=no name=Wan
/ip settings
set ipv4-multipath-hash-policy=l4
/ip address
add address=192.168.56.1/24 interface=Lan network=192.168.56.0
add address=192.168.1.66/24 interface=Wan network=192.168.1.0
/ip firewall nat
add action=masquerade chain=srcnat out-interface=Wan src-address=192.168.56.0/24
/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.2 routing-table=main scope=30 suppress-hw-offload=no target-scope=10


Unfortunately, MikroTik’s goal is to sell by limiting the chr and x86 versions in the nat section and pushing people to buy expensive routers.

Are you sure of this:

/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no name=Lan
set [ find default-name=ether1 ] disable-running-check=no name=Wan

It seems like ether1 is BOTH LAN and WAN?

I have an i5-9500 system successfully doing several Gbit/s of NAT, so this is a nonsense claim.

As far as I can tell, under-the-covers “masquerade” is exactly the same as “srcnat”, except it automatically determines the src address to use by whatever is chosen as pref-src of the outgoing interface. There should be no performance difference between the two.
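To make the comparison concrete, the two forms might look like this on the poster's router. The interface name Wan and the address 192.168.1.66 come from the config posted earlier in the thread; everything else is a sketch, and only one of the two rules would be active at a time. Note that although the thread writes "action=srcnat" informally, the action is spelled src-nat in RouterOS (srcnat is the chain name).

```
/ip firewall nat
# Variant 1: the router picks the source address of the outgoing interface itself
add action=masquerade chain=srcnat out-interface=Wan
# Variant 2: the same translation with the address stated explicitly
add action=src-nat chain=srcnat out-interface=Wan to-addresses=192.168.1.66
```

On a link with a static address, the second form avoids any dependence on the interface address being looked up, which makes it a convenient control when testing whether masquerade itself is the bottleneck.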

I have not tested srcnat vs. masquerade on x86, and on my x86 example from above, I’m not using masquerade. But I have compared both of them on some of the cheapest Routerboard models…old RB951G models that were released back in 2013, so over 12 years ago. And the original retail price was USD$80…hardly expensive. Single-core MIPS processor running at 600MHz. There is no performance difference between srcnat and masquerade on these very-not-expensive models. And even without fasttrack (which does not work on x86 anyway), I can get close to 300Mbit/s with either srcnat or masquerade (both give exact same performance as each other). This is still 3x what you are getting.

It’s possible there is a software bug of some kind that you are running into. I will try to put together a “masquerade” test on x86 myself in the next few days, and see if I can reproduce.

One thing I notice in your latest screenshot is that the “L2MTU” of ether1 and ether2 are different. What this tells me is that these are two different ethernet chipsets (or, if this is a picture of a virtualized CHR, which it appears it might be based on MAC address starting with OUI of 00:0C:29 which is VMware, then you might have chosen different “adapter type” to emulate for each interface). Perhaps there is a bug in the underlying RouterOS driver for one NIC or the other…perhaps try changing them to both be the same, and switch around which adapter type to try (make them both VMXNET3, or make them both E1000, etc. …see if one somehow performs better than the other).
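A quick way to compare the two interfaces from the RouterOS side is sketched below. The interface name ether1 is an assumption, and on a virtualized CHR the monitor output may be limited by what the emulated adapter reports; the adapter type itself is changed in the hypervisor, not in RouterOS.

```
# List all interfaces with their MTU and L2MTU values
/interface print detail
# Show link status and negotiated rate for one port
/interface ethernet monitor ether1 once
```

A port that has negotiated 100 Mbit/s instead of 1 Gbit/s would show up here and would cap throughput regardless of the NAT rules.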

Okay, I just finished my tests. I did these on a HP t730 thin client PC that I installed a 2nd PCIe gigabit NIC into (Broadcom BCM5719 multiport one, as it was the only one I had handy at the moment). t730 has a 10-year-old quad-core AMD CPU in it, RX-427BB, which is a 35W part that I think is roughly equivalent to comparable 5th or 6th gen Intel. Way way way slower than your i9 12th gen. I hooked up “WAN” to one port of the BCM5719 PCIe card, and “LAN” to the built-in Realtek RTL8168 ethernet port of the thin client. Internet/WAN side of the connection was PPPoE @ 1440 MTU, so, arguably more overhead involved than your test with DHCP on your WAN.

Performance with ‘action=masquerade’ was perfectly fine, as shown by this Ookla Speedtest result taken with a Windows 10 client on the LAN side of the router, using the Ookla Windows app:

I tried it with RouterOS 6.x first and got these results. Then I upgraded to RouterOS 7.x and still had the same results. Under RouterOS 6, the highest that the CPU load got was 25% during the download part of the test. RouterOS 7 CPU utilization was (to my surprise) even lower, running just under 20% at the highest point.
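For anyone who wants to take the same kind of CPU readings during a speed test, the commands below are one possible approach. This is a sketch: the 10-second duration is an arbitrary choice, and the commands should be run while traffic is flowing through the router.

```
# Per-core CPU load, to spot a single saturated core
/system resource cpu print
# Break CPU usage down by subsystem for 10 seconds
/tool profile duration=10s
# Number of connections currently tracked by the firewall
/ip firewall connection print count-only
```

If one core sits at 100% while the others are idle, the limit is more likely to be a single-queue NIC or driver than NAT as such.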

So in summary, I have no idea what is causing your problem. But it isn’t some general issue with ‘masquerade’ performance on x86. There is something specifically wrong with your environment somehow, or some difference in your testing methodology, that I am unaware of and so cannot attempt to replicate.

Here was the running config:


/interface ethernet set [ find default-name=ether1 ] disable-running-check=no name=ether1-WAN
/interface ethernet set [ find default-name=ether5 ] disable-running-check=no name=ether5-LAN
/interface pppoe-client add add-default-route=yes disabled=no interface=ether1-WAN name=pppoe-out1 password=password use-peer-dns=yes user=username
/ip pool add name=pool1 ranges=192.168.88.2-192.168.88.10
/ip dhcp-server add address-pool=pool1 disabled=no interface=ether5-LAN name=server1
/ip address add address=192.168.88.1/24 interface=ether5-LAN network=192.168.88.0
/ip dhcp-server network add address=192.168.88.0/24 gateway=192.168.88.1
/ip firewall nat add action=masquerade chain=srcnat out-interface=pppoe-out1

And here are machine specs (base clock of CPU is actually 2.7GHz, but appears to be ramped up at the moment I took this snapshot):


[admin@MikroTik] > /system/resource/print 
                   uptime: 2m14s                     
                  version: 7.18.2 (stable)           
               build-time: 2025-03-11 11:59:04       
              free-memory: 3244.1MiB                 
             total-memory: 3552.0MiB                 
                      cpu: AMD                       
                cpu-count: 4                         
            cpu-frequency: 3192MHz                   
                 cpu-load: 0%                        
           free-hdd-space: 436.8MiB                  
          total-hdd-space: 465.3MiB                  
  write-sect-since-reboot: 962                       
         write-sect-total: 962                       
        architecture-name: x86_64                    
               board-name: x86 HP HP t730 Thin Client
                 platform: MikroTik

In the second case you mentioned, yes, the test I performed was on VMware, and the network card was the default E1000 adapter. Yes, there’s no difference in speed between srcnat and masquerade, but when I reduce the random value in these NAT rules below 50, the speed increases properly. What does this indicate?

Unfortunately, it seems that masquerade is incorrectly inspecting packets, and when a SYN-ACK packet transitions to an established state, it gets stuck in some kind of loop for inspection, which causes a significant drop in performance and speed, and leads to high CPU usage.

This issue is clearly noticeable on older MikroTik devices like the hEX RB750Gr3. In a typical setup using masquerade, the speed caps at around 250–270 Mbps, but if raw packets are sent and received (bypassing masquerade), the speed can reach gigabit levels.

Unfortunately, I don’t have access to an AMD processor, so I’m not aware of how the speed performs on those. The systems I currently have are an i5-4590, an i9-12900, and a Pentium 945. On the two weaker CPUs, the performance is extremely poor, fluctuating between 100 to 300 Mbps at best.

I’ve tested MikroTik installations on bare metal, as well as on VMware, VirtualBox, and QEMU. On the i9, even with the x86 version installed directly on bare metal, the issue still exists. In virtualized environments, it performs slightly better and barely manages to reach 1 Gbps.

If I send raw packets directly, I can easily achieve high gigabit speeds — but in doing so, I lose access to many features like load balancers.

masquerade default vs. masquerade random (with random values up to 50, speeds reach gigabit)
i9-12900K
[Screenshots: 1.jpg, 2.jpg]

I wasn’t suggesting that AMD performance may be better or worse than Intel in this application. The only point I was trying to get across is that this is a very old and wimpy x86 CPU. (And also trying to put it in the proper context by comparing it to roughly equivalent Intel parts.) And even though it is not very powerful, it still managed to forward a full 1Gbit/s worth of masquerade traffic without breaking a sweat.


Now I am confused, because before this response, you never mentioned anything about these other two CPUs you were testing on; you only told us about your i9-12900 CPU, and in your original post, you specifically said “the speed received in chr and x86 is 100 megabits”. Now you are saying that in some scenarios, you can “barely” reach 1 Gbps. These are contradictory statements. Either you’re getting 100 megs, or you’re getting “close” to 1Gbps. So which is it?


In your very first post, you LITERALLY said, “the speed of masquerade nat is extremely low, […] is 100 megabits, only src-nat to src-nat has the correct speed and gives the full 1 gigabit”. So again, I am confused, because in your first post, you seemed to be comparing NAT performance of “action=masquerade” to “action=srcnat”. By “src-nat to src-nat”, did you mean something other than changing your NAT rule from “action=masquerade” to “action=srcnat to-addresses=<WAN_IP>”??? I don’t know how else to read “src-nat”. Unless you misspoke and meant that you disabled NAT entirely, and so were not changing src-address at all? If so, that’s not called “src-nat”. That would be called “NO NAT”. :slight_smile:


It indicates you are not being completely honest about your configuration, and what exactly you are trying to do.

In your original post, you mentioned NOTHING about using “random” matcher on your NAT rules, or even that you had multiple NAT rules. You included a configuration example that was extremely basic, and which implied that all you were doing was a single NAT rule, action=masquerade, for one internet connection @ 1Gbit/s. Now you’re talking about multiple NAT rules, using matchers you did not include in your example config, and possibly multiple internet connections for all we know, since you keep hinting at “load balancing” without actually explicitly saying what you mean by that.
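For readers unfamiliar with it, a NAT rule using the random matcher might look like the sketch below. The poster's actual rules were never shown, so the value 50 and the interface name Wan are assumptions for illustration only.

```
/ip firewall nat
# Hypothetical rule: matches with roughly 50% probability each time it is evaluated
add action=masquerade chain=srcnat out-interface=Wan random=50
```

As far as I understand, NAT rules are evaluated only for the first packet of a new connection, so a rule like this would leave roughly half of all new connections without masquerade. A speed change after adjusting the random value may therefore reflect how many connections are being NATed at all, rather than how fast masquerade runs.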

As I’m getting fond of saying, nobody here is a mind reader. We can only interact with the things you say out loud, not the things you only think in your head. If you want help either to figure out where the performance bottleneck is, or even to validate that there is some RouterOS bug causing a bottleneck where there shouldn’t be one & you aren’t crazy after all, you need to explain your entire situation in detail (along with a complete config that reproduces the problem you are describing), and not leave us guessing about random crap that you may or may not be doing.

The config I posted for my test rig was the entire config of the router (minus auth credentials). If you are doing something more complicated than that, well, I didn’t test for that since you never explained that you were doing it.


Let’s first be unambiguous on definitions. “action=masquerade” is nothing more than “action=srcnat” except that it automatically decides what to change source address to on the outgoing packet, rather than require you to manually specify it with the “to-addresses=” parameter. I think whenever you say “masquerade”, what you really just mean is “NAT” generally.

Yes, NAT has an impact on performance. This is a given, and also just common sense. It has more computational overhead than just forwarding packets without touching them at all does & without having to track the various active connections flowing through the router. And yes, on a device like RB750Gr3, doing almost any form of NAT will reduce performance down to about 200-300 megs. However, if you use FastTrack, then you CAN get 1Gbit/s of forwarding performance with NAT even on small, cheap RouterBOARDs like the 750Gr3.
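On a RouterBOARD that supports it, FastTrack is typically enabled with a pair of forward-chain rules along the lines of the sketch below. This is the commonly documented pattern rather than anything taken from the poster's config, and it assumes there are no other filter rules that need to see every packet of established connections.

```
/ip firewall filter
# Send established/related connections through the fast path
add action=fasttrack-connection chain=forward connection-state=established,related
# Accept the packets of those connections that still take the regular path
add action=accept chain=forward connection-state=established,related
```

Fasttracked packets bypass mangle and queues, so this pattern can conflict with routing-mark-based load balancing unless the fasttrack rule is restricted accordingly.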

Unfortunately, FastTrack is NOT supported on x86, and though this has been hotly debated in other threads elsewhere on this forum, the explanation given by MikroTik at least in the past is that FastTrack can only be properly supported by RouterOS on specific interfaces that use specific drivers. And x86 ethernet hardware does not appear anywhere on the supported hardware list for FastPath/FastTrack.

However, something like RB750Gr3 has a VERY wimpy CPU/SoC, and NEEDS FastTrack to be able to do NAT at 1Gbit/s. But as I believe I have successfully demonstrated already, even a relatively wimpy x86 CPU is still many times more powerful than the CPU in most sub-$100 RouterBOARDs, and if you have a sufficiently fast x86 CPU, you don’t need FastTrack to be able to forward 1Gbit/s of traffic with NAT. The CPU is more than powerful enough to do it without FastTrack.

What we need to know at this point are the missing puzzle-pieces to your config that have gone unspoken to this point, if you actually want any help to solve your mystery.

@NathanA
I admire your patience.

@NathanA

Now I am confused, because before this response, you never mentioned anything about these other two CPUs you were testing on; you only told us about your i9-12900 CPU, and in your original post, you specifically said “the speed received in chr and x86 is 100 megabits”. Now you are saying that in some scenarios, you can “barely” reach 1 Gbps. These are contradictory statements. Either you’re getting 100 megs, or you’re getting “close” to 1Gbps. So which is it?

I have 3 processors available to me: 1) i5-4590, 2) hEX RB750Gr3, and 3) i9-12900KS. In masquerade NAT and src-nat mode, they all deliver low speeds. Only when I don’t use masquerade or src-nat and send raw or mangle packets can all three devices achieve gigabit speed. But with PCC load balancers and VPN usage on our MikroTik devices, packets are subject to confusion and many interruptions.


In your very first post, you LITERALLY said, “the speed of masquerade nat is extremely low, […] is 100 megabits, only src-nat to src-nat has the correct speed and gives the full 1 gigabit”. So again, I am confused, because in your first post, you seemed to be comparing NAT performance of “action=masquerade” to “action=srcnat”. By “src-nat to src-nat”, did you mean something other than changing your NAT rule from “action=masquerade” to “action=srcnat to-addresses=<WAN_IP>”??? I don’t know how else to read “src-nat”. Unless you misspoke and meant that you disabled NAT entirely, and so were not changing src-address at all? If so, that’s not called “src-nat”. That would be called “NO NAT”. :slight_smile:

At first, I thought the speed was better with action=srcnat, but then I realized my configuration was incorrect. I fixed it and achieved the speed with masquerade.


It indicates you are not being completely honest about your configuration, and what exactly you are trying to do.

In your original post, you mentioned NOTHING about using “random” matcher on your NAT rules, or even that you had multiple NAT rules. You included a configuration example that was extremely basic, and which implied that all you were doing was a single NAT rule, action=masquerade, for one internet connection @ 1Gbit/s. Now you’re talking about multiple NAT rules, using matchers you did not include in your example config, and possibly multiple internet connections for all we know, since you keep hinting at “load balancing” without actually explicitly saying what you mean by that.

As I’m getting fond of saying, nobody here is a mind reader. We can only interact with the things you say out loud, not the things you only think in your head. If you want help either to figure out where the performance bottleneck is, or even to validate that there is some RouterOS bug causing a bottleneck where there shouldn’t be one & you aren’t crazy after all, you need to explain your entire situation in detail (along with a complete config that reproduces the problem you are describing), and not leave us guessing about random crap that you may or may not be doing.

The config I posted for my test rig was the entire config of the router (minus auth credentials). If you are doing something more complicated than that, well, I didn’t test for that since you never explained that you were doing it.

In order not to complicate the matter and get into other topics that I need, I focused only on the main issue, which is a single NAT rule with action=masquerade. I am providing you with the configuration that I have in mind.


Let’s first be unambiguous on definitions. “action=masquerade” is nothing more than “action=srcnat” except that it automatically decides what to change source address to on the outgoing packet, rather than require you to manually specify it with the “to-addresses=” parameter. I think whenever you say “masquerade”, what you really just mean is “NAT” generally.

Yes, NAT has an impact on performance. This is a given, and also just common sense. It has more computational overhead than just forwarding packets without touching them at all does & without having to track the various active connections flowing through the router. And yes, on a device like RB750Gr3, doing almost any form of NAT will reduce performance down to about 200-300 megs. However, if you use FastTrack, then you CAN get 1Gbit/s of forwarding performance with NAT even on small, cheap RouterBOARDs like the 750Gr3.

Unfortunately, FastTrack is NOT supported on x86, and though this has been hotly debated in other threads elsewhere on this forum, the explanation given by MikroTik at least in the past is that FastTrack can only be properly supported by RouterOS on specific interfaces that use specific drivers. And x86 ethernet hardware does not appear anywhere on the supported hardware list for FastPath/FastTrack.

However, something like RB750Gr3 has a VERY wimpy CPU/SoC, and NEEDS FastTrack to be able to do NAT at 1Gbit/s. But as I believe I have successfully demonstrated already, even a relatively wimpy x86 CPU is still many times more powerful than the CPU in most sub-$100 RouterBOARDs, and if you have a sufficiently fast x86 CPU, you don’t need FastTrack to be able to forward 1Gbit/s of traffic with NAT. The CPU is more than powerful enough to do it without FastTrack.

What we need to know at this point are the missing puzzle-pieces to your config that have gone unspoken to this point, if you actually want any help to solve your mystery.

That’s true, but what is happening now with masquerade or srcnat—or whatever other term you use—is that enabling it causes a performance drop even on the most powerful x86 systems. Even the RB750 can achieve gigabit speeds without FastTrack and NAT

@NathanA

My priority is to be able to run PCC load balancing without needing srcnat or masquerade. Right now, with this configuration, gigabit speeds are achieved without CPU overload, but the main issue is with the handshake of HTTPS connections and similar VPN cases. This problem is resolved with NAT masquerade, but then the CPU gets close to overheating at speeds around 100 Mbps.

2025-05-19 13:08:21 by RouterOS 7.18.2

/interface ethernet
set [ find default-name=ether1 ] disable-running-check=no
/routing table
add disabled=no fib name=Wan1
add disabled=no fib name=Wan2
/ip settings
set ipv4-multipath-hash-policy=l4
/ip address
add address=192.168.1.100/24 interface=ether1 network=192.168.1.0
/ip firewall mangle
add action=mark-routing chain=prerouting new-routing-mark=Wan1 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0 src-address=192.168.1.200
add action=mark-routing chain=prerouting new-routing-mark=Wan2 passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1 src-address=192.168.1.200
/ip firewall nat
add action=masquerade chain=srcnat protocol=udp src-address=192.168.1.200
add action=masquerade chain=srcnat dst-port=!80,443,8080 protocol=tcp
/ip route
add disabled=no dst-address=0.0.0.0/0 gateway=192.168.1.1 routing-table=main suppress-hw-offload=no
add disabled=no dst-address=0.0.0.0/0 gateway=192.168.1.1 routing-table=Wan1 suppress-hw-offload=no
add disabled=no dst-address=0.0.0.0/0 gateway=192.168.1.2 routing-table=Wan2 suppress-hw-offload=no
/system note
set show-at-login=no
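For comparison, PCC is more commonly written so that the classifier runs once per connection and the routing mark is then derived from the connection mark. The sketch below reuses the routing tables Wan1 and Wan2 and the source address from the export above; the connection-mark names Wan1-conn and Wan2-conn are hypothetical. It is offered as a reference pattern under those assumptions, not as a confirmed fix for the problem described.

```
/ip firewall mangle
# Classify each new connection once and remember the result as a connection mark
add action=mark-connection chain=prerouting connection-mark=no-mark new-connection-mark=Wan1-conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0 src-address=192.168.1.200
add action=mark-connection chain=prerouting connection-mark=no-mark new-connection-mark=Wan2-conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1 src-address=192.168.1.200
# Route every packet of a marked connection through the matching table
add action=mark-routing chain=prerouting connection-mark=Wan1-conn new-routing-mark=Wan1 passthrough=no src-address=192.168.1.200
add action=mark-routing chain=prerouting connection-mark=Wan2-conn new-routing-mark=Wan2 passthrough=no src-address=192.168.1.200
```

Because connection marks depend on connection tracking, this pattern carries the same tracking overhead that NAT does, so it is mainly useful for keeping each flow pinned to one uplink rather than for reducing CPU load.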



“In order not to complicate the matter and get into other topics that I need, I focused only on the main issue, which is a single NAT rule with action=masquerade. I am providing you with the configuration that I have in mind.”

You never mentioned that you fixed your original problem, or that it was due to a misconfiguration you found. You also made an assumption in that first post about where your main issue was located, so you felt you could leave out important details. When I ran your same experiment on an x86 system that matched your example config with “a single NAT rule with action=masquerade”, I did not have performance problems. Therefore I was left to conclude that we are still missing important details (which appears to be correct).


what is happening now with masquerade or srcnat—or whatever other term you use—is that enabling it causes a performance drop even on the most powerful x86 systems.

As already established and explained, NAT will always cause some measurable “performance drop”, because doing NAT is not “free”, computationally. This performance drop happens on ALL platforms, not just x86. It is intrinsic to connection-tracking-based NAT. The only question therefore is whether you have budgeted enough CPU horsepower to do whatever you need to do. (Or whether you are possibly approaching the problem the wrong way, if it seems impossible to obtain a CPU that is fast enough.)


Even the RB750 can achieve gigabit speeds without FastTrack and NAT

It is not clear what your meaning is here. If you mean, “the RB750Gr3 can achieve gigabit speeds if you are not doing any NAT”, then of course that’s true. But if you mean, “the RB750Gr3 can achieve gigabit speeds when doing NAT, even if you don’t use FastTrack”, we already discussed how that isn’t true! You yourself already observed that doing regular source masquerade on such a device causes performance to plummet from gigabit down to “250-270 Mbps”. That shows that on a device like that, doing NAT causes it to lose 75% of forwarding performance!! That just goes to demonstrate how much impact to forwarding performance NAT actually has!

On RB750Gr3, the ONLY way to get better performance when doing NAT is to pair NAT together with FastTrack. Since FastTrack is not an option on x86, again, you just need to budget CPU power appropriate to your particular application.


I will have to sit down and digest this as well as think up an appropriate reproduction of your application in my lab, when I have some more time to do so. Let’s just be clear, though, that this is moving the goal posts considerably, from where they started with the very first post(s) in this thread.