How does L3HW actually work?

Hello MikroTik,
I would like to discuss layer3-offloading (again); I hope some of you are reading this.

I am trying to understand the actual functionality. The only reference I can find in L3HW Feature Support is:

This works only for directly connected networks. Since HW does not know how to send ARP requests,
CPU sends an ARP request and waits for a reply to find out a DST MAC address on the first received packet of the connection that matches a DST IP address.
After DST MAC is determined, HW entry is added and all further packets will be processed by the switch chip.

I understand it like this. For all CRS300 and CCR2000:

  1. A new packet arrives, goes through the CPU, and passes through everything described in Manual:Packet Flow. If there is a direct route to a neighbouring network (VLAN) and no restrictions or special handling such as NAT, Queuing, etc. are configured, a route is entered into the memory of the switch chip.
  2. further packets of the same connection now go directly via the switch chip
  3. inactive connections are removed from the memory of the switch chip.


This means that if several VLANs are tagged via one port, then they will talk to each other via the switch chip, as far as this is permitted, and no longer via the CPU.
So even tagged traffic is offloaded to the switch chip, but if I read "CRS317 l3hw + firewall question - MikroTik" correctly, this is only possible in the latest v7.2rc version. The stable version can only handle untagged traffic.


For the DX8000 and DX4000 series, as well as the CCR2000, there are two additional points:

  1. NAT, e.g. Masquerade
     1. The NAT rule(s) are learned as described in point 1 above.
     2. The translation of the IPs for NAT’d connections is entered into the switch chip.
     3. Further packets of the same connection now go directly through the switch chip, and I can hide the individual IPs of one VLAN when communicating with another VLAN.
  2. FASTTRACK, for connections that do not stay within the same bridge and, once they are ESTABLISHED, are not handled by e.g. Queuing. The best example here is a LAN2WAN connection.
     1. FASTTRACK connections are learned analogously to general rule 1 above.
     2. Source and destination interfaces of the directly connected networks are entered into the switch chip.
     3. Further packets of the same connection now go directly via the switch chip.

There are two types of Hardware Routing (L3HW): Full Hardware Routing and Firewall-Compatible Hardware Routing. Full L3HW, in turn, differs between routing via an explicit nexthop gateway(-s) and routing to a connected L2 network (a.k.a. Connected Routes). As a result, there are three different cases, so let’s describe each of them.


1. Full Hardware Routing via Nexthop Gateway(-s)
Full Hardware Routing is set by enabling l3-hw-offloading both on the switch and switch ports:

/interface/ethernet/switch set 0 l3-hw-offloading=yes
/interface/ethernet/switch port set [find] l3-hw-offloading=yes

Now, let’s define a routing via a direct gateway. For instance, let’s route the 10.0/8 network via 192.168.1.1:

/ip/route add dst-address=10.0.0.0/8 gateway=192.168.1.1

And then check the “H” flag to make sure that the route got offloaded:

/ip/route print 
Flags: D - DYNAMIC; A - ACTIVE; c, s, y - COPY; H - HW-OFFLOADED; + - ECMP
Columns: DST-ADDRESS, GATEWAY, DISTANCE
#       DST-ADDRESS      GATEWAY        DISTANCE
0  AsH  10.0.0.0/8       192.168.1.1           1

Offloading the above rule tells the switch chip to route all packets with destination IP in subnet 10.0/8 to 192.168.1.1. None of the packets to the 10.0/8 will ever enter the CPU. Hence, none of the IP Firewall rules will ever trigger. That’s why Full L3HW is incompatible with IP Firewall. On the other hand, Full L3HW is the fastest one.

However, you may offload some traffic control via switch ACL rules:

/in/eth/sw/rule

So if a limited stateless firewall is enough for the given task, you may continue with Full L3HW + ACL Rules, reaching near wire-speed performance.


2. HW-Offloading Connected Routes
That is also part of Full L3HW, but this time we specify an interface as a gateway instead of a nexthop IP. RouterOS creates Connected Routes automatically (dynamically) when assigning an IP address to an interface. In the next example, we set IP addresses to ether1, vlan32, and vlan40 interfaces, which resulted in connected routes creation in the respective networks.

/ip address
add address=192.168.1.17/24 interface=ether1 network=192.168.1.0
add address=192.168.32.17/24 interface=vlan32 network=192.168.32.0
add address=192.168.40.17/24 interface=vlan40 network=192.168.40.0

/ip/route print 
Flags: D - DYNAMIC; A - ACTIVE; c, s, y - COPY; H - HW-OFFLOADED; + - ECMP
Columns: DST-ADDRESS, GATEWAY, DISTANCE
#       DST-ADDRESS      GATEWAY        DISTANCE
  DAcH  192.168.1.0/24   ether1                0
  DAcH  192.168.32.0/24  vlan32                0
  DAcH  192.168.40.0/24  vlan40                0

To understand how Connected Routes can be offloaded, we need to know how Connected Routes are processed. Let’s imagine an Inter-VLAN routing case where a host 192.168.32.5 (vlan32) wants to connect to 192.168.40.10 (vlan40). The former sends a packet to our router (192.168.32.17). The router checks the routing table and identifies that the destination is somewhere in vlan40. So it broadcasts an ARP request “Who has 192.168.40.10?” to all bridge ports that belong to vlan40 (“/in/br/vlan print where vlan-ids=40”). Let’s say the router gets the reply from sfp-sfpplus5 (which belongs to vlan40) and learns its L2 (MAC) address. Now, the router knows the physical interface and MAC address of the destination 192.168.40.10, caching the data in its Forwarding Database (FDB) for reuse in subsequent packet routing to the same IP address.

The answer “How Connected Routes can be offloaded?” is “They cannot”. The switch chip cannot send ARP requests and resolve new hosts inside the subnet. RouterOS (CPU) does that instead. Connected Routes are redirected to CPU by default, but the resolved hosts are offloaded to the hardware as /32 routes. For instance, “dst-address=192.168.40.10/32 gateway=192.168.40.10%sfp-sfpplus5 VLAN-ID=40”. That’s why there is a remark in the documentation telling:

H-flag does not indicate that route is actually HW offloaded, it indicates only that route can be selected to be HW offloaded.

A connected route having the “H” flag means that the hosts within the subnet can be hw-offloaded. The subnet itself stays on the CPU. However, once both source and destination hosts are offloaded (as /32 routes), Full Hardware Routing is established between them, offering near wire-speed performance.
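To make the "subnet stays on the CPU, resolved hosts go to hardware" idea more tangible, here is a minimal Python sketch. It is only a toy model of the behaviour described above, not how RouterOS or the switch chip is implemented; the class name, the `neighbours` table, and the MAC address are made up for illustration.

```python
import ipaddress


class ConnectedRouteModel:
    """Toy model: the subnet stays on the CPU, resolved hosts become /32 HW entries."""

    def __init__(self, subnet, neighbours):
        self.subnet = ipaddress.ip_network(subnet)
        # Hypothetical ARP answers: IP -> (MAC, switch port).
        self.neighbours = neighbours
        # Offloaded /32 host entries: IP -> (MAC, switch port).
        self.hw_hosts = {}

    def forward(self, dst):
        """Return which path handles a packet to dst: 'hw', 'cpu' or 'drop'."""
        if ipaddress.ip_address(dst) not in self.subnet:
            raise ValueError("destination is outside the connected subnet")
        if dst in self.hw_hosts:
            return "hw"  # a /32 entry already exists in hardware
        if dst not in self.neighbours:
            return "drop"  # nobody answered the ARP request
        # First packet: the CPU resolves the host and installs a /32 entry.
        self.hw_hosts[dst] = self.neighbours[dst]
        return "cpu"


vlan40 = ConnectedRouteModel(
    "192.168.40.0/24",
    {"192.168.40.10": ("AA:BB:CC:00:00:10", "sfp-sfpplus5")},  # made-up MAC
)
print(vlan40.forward("192.168.40.10"))  # first packet  -> cpu
print(vlan40.forward("192.168.40.10"))  # later packets -> hw
```

The first packet to a host takes the CPU path, and every later packet to the same host is answered from the (modelled) hardware table.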

Since L3HW depends on L2HW, it utilizes the full capability of the hardware L2 processing, including VLAN tagging/untagging. In other words, Full Hardware Routing provides Inter-VLAN routing between tagged and/or untagged ports out of the box. And it works in RouterOS v7.1 too. The changes in v7.2 are related to FastTrack only, which we’ll discuss in the next chapter.


3. FastTrack Connection HW Offloading
Full HW Routing is incompatible with the stateful IP Firewall. So the tradeoff must be made between speed and security. Even NAT, which requires connection tracking – a feature of stateful L4 Firewall – is incompatible with Full HW Routing. Fortunately, some device models support FastTrack Offloading, allowing to achieve near wire-speed routing performance for a limited amount of connections while keeping the Firewall running. In other words, FastTrack Offloading provides Hardware-accelerated L4 Stateful Firewall.

Firewall-Compatible HW Routing is set by disabling l3-hw-offloading on the switch ports, where Firewall rules must be applied:

# Enable full hardware routing on LAN ports
:foreach i in=[/interface/list/member/find where list=LAN] do={
    /interface/ethernet/switch/port set [/interface/list/member/get $i interface] l3-hw-offloading=yes
}
 
# Disable full hardware routing on WAN or IoT ports
:foreach i in=[/interface/list/member/find where list=WAN or list=IoT] do={
    /interface/ethernet/switch/port set [/interface/list/member/get $i interface] l3-hw-offloading=no
}
 
# Activate Layer 3 Hardware Offloading on the switch chip
/interface/ethernet/switch/set 0 l3-hw-offloading=yes
  1. Packets are processed by the CPU/Firewall by default.
  2. Established connections can be fast-tracked - a faster (shorter) processing path, resulting in higher speeds.
  3. FastTrack connections can be offloaded to the switch chip (if the latter supports that), offering near wire-speed performance.
  4. If a FastTrack connection requires network address translation, the NAT rule gets offloaded too.

Before v7.2, RouterOS didn’t support FastPath on vlan-filtered bridges, meaning no FastTrack for Inter-VLAN routing and no HW offloading. The feature has been introduced in RouterOS v7.2rc2, allowing Firewall-Compatible Inter-VLAN Routing.


Difference Between Connected Routes and FastTrack Connection Offloading
While it may seem that those two follow a similar pattern (redirect to CPU first, then offload to HW), there are fundamental differences.

  • When a connected host gets offloaded (/32 route), all traffic to it gets routed by the hardware, bypassing the firewall. For instance, once your computer got connected to a server, the server’s IP gets offloaded to the hardware, and any device on the network can access it, including a random IoT device from an insecure VLAN.
  • FastTrack Offloading applies to the offloaded connections only, and only until the respective connections are closed. If you’re connected to the server, nobody else can connect to it without going through the Firewall first.
  • Amount of connected hosts that can be offloaded greatly exceeds the number of hardware FastTrack connections. At the moment of writing, the highest amount of HW FastTrack connection is 4.5k while some devices can offload up to 128k hosts (e.g. CRS317).
  • When possible, try establishing Full HW Routing between trusted networks (e.g., between admin and server VLANs), leaving Firewall for crossing the unsecured zone (or NAT).
  • Fine-tune FastTrack connections that can be HW-offloaded via firewall filter rules by setting hw-offload=yes|no. For instance, there is no need to HW-offload low-bandwidth connections, such as a smart power socket or a fridge controller.
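The first two points can be illustrated with a small Python sketch. It assumes, purely for illustration, that a hardware host entry matches on the destination IP alone, while a FastTrack entry matches one specific connection, modelled here as a 5-tuple. The function names, ports, and the IoT address are hypothetical.

```python
def hits_host_entry(hw_hosts, packet):
    """A /32 host entry matches on destination IP only, whoever the sender is."""
    return packet["dst"] in hw_hosts


def hits_fasttrack_entry(hw_connections, packet):
    """A FastTrack entry matches one specific connection (assumed: a 5-tuple)."""
    key = (packet["proto"], packet["src"], packet["sport"],
           packet["dst"], packet["dport"])
    return key in hw_connections


# Your computer talking to the server, and a random IoT device trying the same.
admin = {"proto": "tcp", "src": "192.168.32.5", "sport": 50000,
         "dst": "192.168.40.10", "dport": 443}
iot = {"proto": "tcp", "src": "192.168.50.7", "sport": 50001,
       "dst": "192.168.40.10", "dport": 443}

hw_hosts = {"192.168.40.10"}
hw_connections = {("tcp", "192.168.32.5", 50000, "192.168.40.10", 443)}

# Host offloading: both senders are routed in hardware, bypassing the firewall.
print(hits_host_entry(hw_hosts, admin), hits_host_entry(hw_hosts, iot))
# FastTrack offloading: only the established connection is handled in hardware.
print(hits_fasttrack_entry(hw_connections, admin),
      hits_fasttrack_entry(hw_connections, iot))
```

In this model the IoT packet matches the host entry but not the FastTrack entry, so with FastTrack offloading it would still have to pass the firewall first.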

Thanks a lot for the comprehensive answer. I have to find a quiet moment tonight, read all the details, and make sure that I grasp it :slight_smile:

Thank you, @raimondsp. I wish all the documentation was written this way.

+1
thank you!

@raimondsp thanks for the great effort. Before trying to describe your answers in my own words, could you tell me where the L3 information is stored in the switch chip?
Not knowing that makes it difficult for me to imagine how the processes on the switch chip happen.

According to the https://help.mikrotik.com/docs/display/ROS/Switch+Chip+Features#SwitchChipFeatures-Features there is only the

  1. Host table
  2. VLAN table
  3. Rule table

if I exclude inter-VLAN routing there is only the Host and Rule table left.
Host table stores only MAC and corresponding port, so only the Rule table remains.

If I compare the size of the Rule table with the possible IP4 routes there is quite a gap e.g. CRS326-24S+2Q+, ACL rules:170, IP4 Routes 16K - 30K.

I’m afraid that I cannot reveal technical details of switch chips due to NDA. Switch chips have their internal memory: RAM and TCAM. In some cases, memory is shared between multiple processing units (e.g., FastTrack connections share TCAM with ACL rules), while most of the units have their exclusive memory regions (e.g., Routing Table).

L3HW Device Support

My feeling is that the linked document (briefly) describes the basic switch chips used in low-end devices and those don’t support L3HW offload. But doesn’t describe the upper-end switch chips, which can offload L3. Or am I wrong @raimondsp?

So let’s just say there are other tables in the TCAM besides the ones described in Switch Chip Features - RouterOS - MikroTik Documentation. These additional tables store the information given in the tables in L3 Hardware Offloading - RouterOS - MikroTik Documentation.

Could someone be so kind as to explain the difference between

IPv4 Route Prefixes

and

IPv4 Routes

?
I have thought and read until my head was sore, but I cannot come up with a proper way to explain the difference and its impact on how to configure the rules.

The Switch Chip Features document was written before the L3HW implementation, so it does not contain L3-related tables. The latter are specified on the L3HW Device Support page.

Marvell switch chips use classified proprietary algorithms for routing, which we cannot reveal without violating NDA. What is important for MikroTik users is that we have implemented an abstraction layer to keep all chip-specific functions under the hood while presenting the common routing UI: no matter if it is a software routing, DX3000, or DX8000 L3HW - the end-users have zero configuration overhead in terms of routing.

In the case of the DX3000/DX2000 switch chip series, it is quite simple: one RouterOS route entry (/ip/route/) maps to one HW IPv4 route prefix entry. Connected hosts (/32 routes) also occupy the same table. As long as the total number of routes (“/ip/route print count-only”) + connected host count (“/interface/bridge/host print count-only”) does not exceed 13312 (13k), everything gets offloaded. Once that number is exceeded, routes with shorter prefixes stay on the CPU.
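As a rough illustration of that rule, the following Python sketch splits a route list into "offloaded" and "stays on the CPU". It assumes that connected hosts (being /32 entries) take their slots first and that the remaining slots are filled from the longest prefix to the shortest; the real selection logic is not documented here, and the function name is made up.

```python
import ipaddress

HW_CAPACITY = 13312  # the 13k figure quoted above


def split_offload(prefixes, host_count, capacity=HW_CAPACITY):
    """Return (offloaded, on_cpu) prefix lists.

    Assumption: hosts occupy slots first, then routes are admitted
    from the longest prefix to the shortest until the table is full.
    """
    free = max(capacity - host_count, 0)
    ordered = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda net: net.prefixlen,
        reverse=True,
    )
    offloaded = [str(net) for net in ordered[:free]]
    on_cpu = [str(net) for net in ordered[free:]]
    return offloaded, on_cpu


# Tiny table with room for 3 entries, 1 of them taken by a connected host.
hw, cpu = split_offload(
    ["0.0.0.0/0", "10.0.0.0/8", "10.1.1.0/24"], host_count=1, capacity=3
)
print(hw)   # ['10.1.1.0/24', '10.0.0.0/8']
print(cpu)  # ['0.0.0.0/0']
```

With the table shrunk to three slots, the default route is the one left on the CPU because it has the shortest prefix.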

The DX8000/DX4000 have entirely different routing tables. The new routing model is more hardware-friendly but causes headaches for human beings who want to understand it. Instead of prefixes, we have to offload route indexes (once again, in a hardware-friendly but human-painful format). The entire IPv4 address range must be indexed, i.e., 0.0.0.0 - 255.255.255.255. Adding new route entries causes an index rebuild, increasing hardware memory usage by 0-5 entries depending on the complexity of the routing table. That’s why no exact numbers are given. For example, some routing tables containing 240K entries can be fully offloaded to CRS317 HW, while others with 160K entries barely fit. And you will never know until you try. Set up a BGP feed and look at the RouterOS log: if the “Route HW table FULL” warning message does not appear, everything got offloaded. If you see the warning, either configure routing filters to suppress hw-offload or buy another MikroTik device and offload half of the table to it. Yes, you can stack multiple CRS3xx/CCR2x16 devices to split the HW routing table - that way, you can do L3HW processing on the full BGP table.

So I also understood the limitation table wrong. For example, the CRS317 can hold up to 240k routes and can route packets in hardware for all routes that are stored in the routing table of the switch chip. There is no connection limit because there is no connection tracking? The limit for FastTrack connections is independent of the routing table limit, right? The 4.5k FastTrack connection limit applies if I offload connections with a firewall rule that has hw-offload=yes ticked. Or are all FastTrack connections automatically offloaded?

For inter-VLAN routing (aka HW-Offloading Connected Routes): for example, I have two access switches connected to a CRS317. Each access switch is a separate L2 domain with a /24 subnet. Now, if the CRS317 routes between those two networks, two /32 routes are created in the routing table of the switch chip. If a second client in network A starts to communicate with the same host in network B, only one new /32 route is added to the routing table for the return path. So in the worst case, I end up with 508 /32 routes for two /24 networks.
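The 508 figure is simply the number of usable host addresses in both subnets added together (2 × 254). A quick Python check, with a made-up helper name, counting every usable address as a potential /32 entry (an upper bound, since the router's own address is counted too):

```python
import ipaddress


def worst_case_host_routes(subnets):
    """Upper bound on /32 host entries: every usable address in every subnet."""
    return sum(
        sum(1 for _ in ipaddress.ip_network(subnet).hosts())
        for subnet in subnets
    )


print(worst_case_host_routes(["192.168.32.0/24", "192.168.40.0/24"]))  # 508
```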

This is how L3 switching works and what makes it so efficient: packets are forwarded directly on L2, but based on L3 destination IP addresses. Instead of looking up the egress port in the host table based on the L2 MAC, it is looked up based on the L3 destination IP. Just as there is one table entry per L2 MAC, there is also one table entry per L3 IP. In reality, it is a bit more complicated, but the principle applies.

Connection tracking is a firewall feature, not a router feature.

Personally I would rather see better MPLS-HW support. Just plain flat non-VRF L3 routing is a bit limiting and the hardware can never hold full BGP feeds anyways.

I love your answer, you are so 100% right, you made my day :laughing:


back to my question :slight_smile: :

  1. DX3000/DX2000 stores as:

     1. Manually added route

/ip/route add dst-address=10.1.1.0/24 gateway=192.168.1.1

        This adds 254 routes; the first route is 10.1.1.0 → 192.168.1.1 and the last route is 10.1.1.254 → 192.168.1.1.

     2. Learned host IP due to an ARP request

"dst-address=192.168.40.10/32 gateway=192.168.40.10%sfp-sfpplus5 VLAN-ID=40"

        This adds exactly one single route (a known host).

     Both only happen if the source and destination ports are configured to allow L3HW.

  2. DX8000/DX4000 stores as:

Could you explain what you mean by route indexes?
The only thing I can conclude from your description is that you may refer to IPv4 classes?
That would mean in the case

/ip/route add dst-address=10.1.1.0/24 gateway=192.168.1.1

it is a class A address range, so the IP range is 1.0.0.0 to 127.0.0.0, which yields 16’777’214 IPs; each of them has an index, e.g. 1 to 16’777’214, and only the used indexes get offloaded, but the switch chip knows how to translate from an index to an actual route?

Everything is right.


FastTrack connection offloading requires a filter rule with hw-offload=yes. The latter is enabled by default unless you explicitly disable it. FastTrack connections happen only if the packets traverse the CPU/Firewall, i.e., l3-hw-offloading=no on the ingress or egress switch port.

/ip/firewall/filter add action=fasttrack-connection chain=forward connection-state=established,related hw-offload=yes

You can fine-tune which connections get offloaded by adding multiple rules. The next example offloads only TCP connections, leaving UDP packets to the CPU.

/ip firewall filter
add action=fasttrack-connection chain=forward connection-state=established,related hw-offload=no protocol=udp
add action=fasttrack-connection chain=forward connection-state=established,related hw-offload=yes
add action=accept chain=forward connection-state=established,related

^ do not repeat that at home, or your kid will complain about missing shots in Counter-Strike due to latency /s
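The reason the rule order matters in the example above is first-match evaluation: the UDP rule is checked before the catch-all rule. A minimal Python sketch of that logic follows. It models only the protocol matcher and the hw-offload flag, assumes the connection is already established/related, and uses made-up names.

```python
# The two fasttrack-connection rules from the example, in order.
# protocol=None means "no protocol matcher", i.e. the rule matches everything.
RULES = [
    {"action": "fasttrack-connection", "protocol": "udp", "hw_offload": False},
    {"action": "fasttrack-connection", "protocol": None, "hw_offload": True},
]


def fasttrack_decision(protocol, rules=RULES):
    """Return the hw_offload flag of the first matching rule, or None."""
    for rule in rules:
        if rule["protocol"] in (None, protocol):
            return rule["hw_offload"]
    return None  # no fasttrack rule matched


print(fasttrack_decision("udp"))  # False: fast-tracked, but kept on the CPU
print(fasttrack_decision("tcp"))  # True: fast-tracked and eligible for HW offload
```

Swapping the two rules in this model would make the catch-all rule match first, so UDP connections would become eligible for hardware offloading as well.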


In the original post, I forgot to mention that DX4000/DX8000 switch chips store /32 routes (host L3 entries) in Forwarding Database (FDB) rather than in the routing table. For instance, CRS317 can store 160K-240K routes + up to 64K L3 host entries.


We have plans for implementing MPLS-HW support, at least for the DX8000 series. Not in near future, though. Currently, we are working on v7.2 stabilization and IPv6 L3HW.


NO. It adds just one route: 10.1.1.0/24 —> 192.168.1.1. I think you’re confusing it with a connected route, e.g.:

/ip/route add dst-address=10.1.1.0/24 gateway=ether1

.


Imagine that our routing table has only one entry - default gateway:

/ip/route add gateway=10.0.0.1

. The entire IPv4 range is covered, so the index is:

0.0.0.0 - 255.255.255.255 => 10.0.0.1

Then we add “10.4.0.0/16 => 10.0.0.2” entry. The index changes to:

1. 0.0.0.0 - 10.3.255.255 => 10.0.0.1
2. 10.4.0.0 - 10.4.255.255 => 10.0.0.2 
3. 10.5.0.0 - 255.255.255.255 => 10.0.0.1

In the above case, adding one route entry increased the number of index entries by two. Now, let’s add “10.6.0.0/15 => 10.0.0.2”:

1. 0.0.0.0 - 10.3.255.255 => 10.0.0.1
2. 10.4.0.0 - 10.4.255.255 => 10.0.0.2 
3. 10.5.0.0 - 10.5.255.255 => 10.0.0.1
4. 10.6.0.0 - 10.7.255.255 => 10.0.0.2
5. 10.8.0.0 - 255.255.255.255 => 10.0.0.1

Again, +1 route entry caused +2 index entries. However, adding “10.8.0.0/16 => 10.0.0.3” creates only one additional index entry:

1. 0.0.0.0 - 10.3.255.255 => 10.0.0.1
2. 10.4.0.0 - 10.4.255.255 => 10.0.0.2 
3. 10.5.0.0 - 10.5.255.255 => 10.0.0.1
4. 10.6.0.0 - 10.7.255.255 => 10.0.0.2 
5. 10.8.0.0 - 10.8.255.255 => 10.0.0.3
6. 10.9.0.0 - 255.255.255.255 => 10.0.0.1

Adding “10.5.0.0/16 => 10.0.0.2” leads to an interesting result - instead of adding index entries, it reduces them by merging adjacent ranges:

1. 0.0.0.0 - 10.3.255.255 => 10.0.0.1
2. 10.4.0.0 - 10.7.255.255 => 10.0.0.2 
3. 10.8.0.0 - 10.8.255.255 => 10.0.0.3
4. 10.9.0.0 - 255.255.255.255 => 10.0.0.1

I showed you only the tip of the iceberg. The hardware memory layout is way more complex, leading to even more variations. The “IPv4 Ranges” figures specified in the table are the common worst and best cases. I bet one can come up with a crazy setup that overflows the table with half of the specified number. On the other hand, with only two nexthops, no ECMP, and no recursive routes, it might be possible to squeeze the entire BGP table into CRS317 HW memory.
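For those who want to play with the simplified range model from the walkthrough above, here is a short Python sketch. It only reproduces the illustrative "address range => nexthop" lists shown in this post (longest prefix wins, adjacent ranges with the same nexthop are merged); it says nothing about the real hardware memory layout, and the function name is made up.

```python
import ipaddress


def build_index(routes):
    """routes: {prefix: nexthop}. Return merged (first_ip, last_ip, nexthop) ranges."""
    nets = {ipaddress.ip_network(p): nh for p, nh in routes.items()}

    # Each prefix start, and the address right after each prefix end,
    # is a potential range boundary.
    starts = {0}
    for net in nets:
        starts.add(int(net.network_address))
        after_end = int(net.broadcast_address) + 1
        if after_end < 2**32:
            starts.add(after_end)
    starts = sorted(starts)

    index = []
    for i, first in enumerate(starts):
        last = (starts[i + 1] if i + 1 < len(starts) else 2**32) - 1
        # Longest-prefix match for this elementary range.
        addr = ipaddress.ip_address(first)
        matches = [net for net in nets if addr in net]
        nexthop = nets[max(matches, key=lambda n: n.prefixlen)] if matches else None
        if index and index[-1][2] == nexthop:
            index[-1] = (index[-1][0], last, nexthop)  # merge with previous range
        else:
            index.append((first, last, nexthop))

    return [
        (str(ipaddress.ip_address(a)), str(ipaddress.ip_address(b)), nh)
        for a, b, nh in index
    ]


routes = {}
for prefix, nexthop in [
    ("0.0.0.0/0", "10.0.0.1"),
    ("10.4.0.0/16", "10.0.0.2"),
    ("10.6.0.0/15", "10.0.0.2"),
    ("10.8.0.0/16", "10.0.0.3"),
    ("10.5.0.0/16", "10.0.0.2"),
]:
    routes[prefix] = nexthop
    print(f"+ {prefix}: {len(build_index(routes))} index entries")
```

Running it adds the routes in the same order as above and prints the index size after each step: 1, 3, 5, 6, and finally 4 once the ranges merge.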

@raimondsp thanks a lot for your patience and for sharing all the details :smiley:

yes, I did, day was long…
I have to be more detailed:

  1. DX3000/DX2000 stores as:

     1. Manually added route, host to gateway

/ip/route add dst-address=10.1.1.17/32 gateway=192.168.1.1

        This adds a single route for the host 10.1.1.17/32 → 192.168.1.
        That would happen per host that asks for a route, so in a /24 you can end up with 254 routes pointing to e.g. 192.168.1

     2. Manually added route, subnet to gateway

/ip/route add dst-address=10.1.1.0/24 gateway=192.168.1.1

        This adds a single route for the entire subnet 10.1.1.0/24 → 192.168.1.1

     3. Connected Routes: the CPU assumes there are more hosts in the same subnet behind the interface with the configured IP

/ip address add address=192.168.1.1/24 interface=ether1 network=192.168.1.0
/ip/route add dst-address=10.1.1.17/24 gateway=ether1

        This adds a single route for the subnet 10.1.1.17/24 (or host 10.1.1.17/32) → ether1 (192.168.1.1/24)

     4. Learned host IP due to an ARP request

"dst-address=192.168.40.10/32 gateway=192.168.40.10%sfp-sfpplus5 VLAN-ID=40"

        This adds exactly one single route (a known host).

     Both only happen if the source and destination ports are configured to allow L3HW.

  2. DX8000/DX4000 stores as:
There is most likely a better wording, but in my words, you cannot have overlapping routes like
0.0.0.0 → 192.168.1.1
10.10.1.1 → 192.168.1.1
which would be applied in order from most specific to least specific (well explained in https://unix.stackexchange.com/questions/188584/which-order-is-the-route-table-analyzed-in)
There can be only a single entry per pair, either subnet to gateway/interface or host to gateway/interface, so a table is created as excellently illustrated by you.

Another thing, as we are talking about VLAN-to-VLAN routing:
The CPU becomes a tagged port, allowing several VLANs (routed ones and Management) to enter the CPU.

What happens to the VLAN ID? What functionality replaces the source ID with the destination ID, so that the packet won’t be dropped by the ingress filter + VLAN filter = yes?

@raimondsp

Thank you for that extremely detailed explanation !

Oh? How so? How would I hypothetically configure enough CRS3xx/CCR2x16 devices to have a full table among them?

I’m also interested to know how this might work. The new CCR2216 can handle up to 120k routes in hardware. How can I do full table (850k v4 and 150k v6) with 2x 120k?