CRS3xx L3HW offloading MTU problem

I finally found some time to play with the new beta and L3HW offloading in particular.
I configured two networks, one attached to my client and the other pointing into my regular infrastructure, added a route and enabled L3HW offloading. A few unexpected reboots later (yes, I know, console only ;), things were up and running. So far so good.

Problems started when I tried to test the routing with iperf. With the default config of my client (the one I use in my regular infra) I didn’t get more than 600 Mbit/s over the 10G links. After a round of debugging I figured out that even though the interfaces are configured with an MTU of 9184, the largest packet that got routed was 1472 bytes; everything bigger is dropped silently.
After reducing the MTU on the sending side I was able to push 3.3 Gbit/s at around 2% CPU load. Not bad, but not quite the expected wire speed (;
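A side note on the 1472 figure: assuming it is the ping payload size, it is exactly what fits into a standard 1500-byte IP MTU once the 20-byte IPv4 header and the 8-byte ICMP header are subtracted. That would suggest the hardware path is effectively capped at 1500 regardless of the configured 9184. A small Python sketch of the arithmetic (the function name is made up for illustration):

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header (a UDP header is also 8 bytes)


def max_ping_payload(ip_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    # Compare the default MTU with the jumbo MTU configured on the SFP+ ports.
    for mtu in (1500, 9184):
        print(f"MTU {mtu}: max ping payload {max_ping_payload(mtu)} bytes")
```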

I am running ROS 7.1beta2 on a CRS309 with the following config:

[admin@MikroTik] > /export hide-sensitive 
# oct/20/2020 18:23:51 by RouterOS 7.1beta2
# model = CRS309-1G-8S+
/interface bridge
add admin-mac=48:8F:7B:43:25:69 auto-mac=no comment=defconf name=bridge
/interface ethernet
set [ find default-name=ether1 ] l2mtu=1592
set [ find default-name=sfp-sfpplus1 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus2 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus3 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus4 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus5 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus6 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus7 ] l2mtu=9574 mtu=9184
set [ find default-name=sfp-sfpplus8 ] l2mtu=9574 mtu=9184
/interface ethernet switch
set 0 l3hw=yes
/interface lte apn
set [ find default=yes ] ip-type=ipv4
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip hotspot profile
set [ find default=yes ] html-directory=flash/hotspot
/ip vrf
add list=all name=main
/interface bridge port
add bridge=bridge comment=defconf interface=sfp-sfpplus3
add bridge=bridge comment=defconf interface=sfp-sfpplus4
add bridge=bridge comment=defconf interface=sfp-sfpplus5
add bridge=bridge comment=defconf interface=sfp-sfpplus6
add bridge=bridge comment=defconf interface=sfp-sfpplus7
add bridge=bridge comment=defconf interface=sfp-sfpplus8
/ip neighbor discovery-settings
set discover-interface-list=*2000003
/ip address
add address=192.168.88.1/24 interface=sfp-sfpplus2 network=192.168.88.0
add address=192.168.89.2/24 interface=sfp-sfpplus1 network=192.168.89.0
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.89.1
/system package update
set channel=development
/system routerboard settings
set boot-os=router-os

I see several problems here:
First, the configured MTU is not applied to HW offloading (or maybe I missed a magic config option).
Second, frames are being dropped silently! Neither side of the communication knows that it’s sending packets that are way too big.
Last, on my regular infrastructure, even with an MTU of 1450 I’m able to push 9.6 Gbit/s, so something is going wrong (slow) here.

Any suggestions are welcome.

Hi there,

Currently, L3HW supports only the default MTU of 1500. Adding hardware support for jumbo frames on layer 3 is on our to-do list and will be addressed in the future. For now, please stick with the default settings: mtu=1500 l2mtu=1592. With the default MTU settings, the CRS309 should be able to provide near wire-speed hardware routing.

Thanks for the feedback!

Hi,

So I tried the same setup with the default MTU:

/interface ethernet
set [ find default-name=ether1 ] l2mtu=1592
set [ find default-name=sfp-sfpplus1 ] l2mtu=1592
set [ find default-name=sfp-sfpplus2 ] l2mtu=1592
set [ find default-name=sfp-sfpplus3 ] l2mtu=1592
set [ find default-name=sfp-sfpplus4 ] l2mtu=1592
set [ find default-name=sfp-sfpplus5 ] l2mtu=1592
set [ find default-name=sfp-sfpplus6 ] l2mtu=1592
set [ find default-name=sfp-sfpplus7 ] l2mtu=1592
set [ find default-name=sfp-sfpplus8 ] l2mtu=1592

But I still cannot get more than 3.3 Gbit/s of throughput. I tried a couple of things, including configuration variations and rebooting to get a clean re-init of the HW routing. Does anybody have this working at wire speed and can post a config? Any other suggestions on how to optimize / debug the behavior?

From another topic: http://forum.mikrotik.com/t/crs-3xx-l3-asic-performance-testing/143735/1

It seems that typical hardware has trouble generating enough PPS to fill the link.

As I said in my first post, on my regular infrastructure I am able to push ~10 Gbit/s with the same endpoints and an MTU of 1450. It’s not as if I were trying to saturate the link with 40-byte packets.
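To put rough numbers on that, here is a back-of-the-envelope calculation of the packet rate needed to fill a 10 Gbit/s link. It assumes standard Ethernet framing overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap, no VLAN tag) and the 64-byte minimum frame size; the helper name is hypothetical.

```python
ETH_HEADER = 14   # bytes, destination MAC + source MAC + EtherType
ETH_FCS = 4       # bytes, frame check sequence
PREAMBLE = 8      # bytes, preamble + start-of-frame delimiter
IFG = 12          # bytes, inter-frame gap
MIN_FRAME = 64    # bytes, minimum Ethernet frame (header + payload + FCS)


def pps_at_line_rate(ip_packet_bytes: int, link_bps: float = 10e9) -> float:
    """Packets per second needed to saturate the link with packets of this size."""
    # Short packets are padded up to the minimum Ethernet frame size.
    frame = max(ip_packet_bytes + ETH_HEADER + ETH_FCS, MIN_FRAME)
    wire_bytes = frame + PREAMBLE + IFG
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    for size in (1450, 40):
        print(f"{size}-byte packets: {pps_at_line_rate(size) / 1e6:.2f} Mpps")
```

With 1450-byte packets that comes out to roughly 0.84 Mpps, against roughly 14.9 Mpps for 40-byte packets, so a 1450-byte test is far from the small-packet worst case.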

I double-checked your config, and it looks like something is wrong with your neighbor discovery setting:

/ip neighbor discovery-settings
set discover-interface-list=*2000003

Try the default setting:

/ip/neighbor/discovery-settings 
set discover-interface-list=!dynamic

Yes, I have seen that as well and corrected it earlier.
So you are saying there is basically nothing wrong with that config? I’ll then put the device in an isolated environment, not connected to the existing infrastructure, and do some more debugging. I’ll come back if I figure something out.
Thanks for the help.

The rest of the config looks fine.

We did a quick bandwidth test of hardware routing on CRS309 and got ~9 Gbps at least. The packet receiving device was throttling at 100% CPU usage, so we cannot tell the precise number yet. Next week we will do the full test in the lab. Anyway, I see no reason why CRS309 shouldn’t achieve near 10 Gbps speed of hardware routing.

Are you sure that the test environment on your side is capable of handling 10Gbps traffic?

This is an iperf on my regular infrastructure:

$ iperf3 -P10 -p 12345 -c 172.17.20.10
...
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   762 MBytes   639 Mbits/sec                  sender
[  4]   0.00-10.00  sec   762 MBytes   639 Mbits/sec                  receiver
[  6]   0.00-10.00  sec   758 MBytes   636 Mbits/sec                  sender
[  6]   0.00-10.00  sec   758 MBytes   636 Mbits/sec                  receiver
[  8]   0.00-10.00  sec   788 MBytes   661 Mbits/sec                  sender
[  8]   0.00-10.00  sec   788 MBytes   661 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  sender
[ 10]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  receiver
[ 12]   0.00-10.00  sec   728 MBytes   611 Mbits/sec                  sender
[ 12]   0.00-10.00  sec   728 MBytes   611 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  sender
[ 14]   0.00-10.00  sec  1.34 GBytes  1.16 Gbits/sec                  receiver
[ 16]   0.00-10.00  sec  1.30 GBytes  1.12 Gbits/sec                  sender
[ 16]   0.00-10.00  sec  1.30 GBytes  1.11 Gbits/sec                  receiver
[ 18]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  sender
[ 18]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  receiver
[ 20]   0.00-10.00  sec  1.39 GBytes  1.20 Gbits/sec                  sender
[ 20]   0.00-10.00  sec  1.39 GBytes  1.20 Gbits/sec                  receiver
[ 22]   0.00-10.00  sec  1.39 GBytes  1.20 Gbits/sec                  sender
[ 22]   0.00-10.00  sec  1.39 GBytes  1.20 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  11.1 GBytes  9.52 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  11.1 GBytes  9.52 Gbits/sec                  receiver

$ tracert 172.17.20.10

Tracing route to 172.17.20.10
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  172.17.30.1
  2    <1 ms    <1 ms    <1 ms  172.17.20.10

And this is the same iperf between the same endpoints, with the CRS309 added as an additional hop in between:

$ iperf3 -P10 -p 12345 -c 172.17.20.10
...
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[  4]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  receiver
[  6]   0.00-10.00  sec   385 MBytes   323 Mbits/sec                  sender
[  6]   0.00-10.00  sec   385 MBytes   323 Mbits/sec                  receiver
[  8]   0.00-10.00  sec   390 MBytes   327 Mbits/sec                  sender
[  8]   0.00-10.00  sec   390 MBytes   327 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec   393 MBytes   330 Mbits/sec                  sender
[ 10]   0.00-10.00  sec   393 MBytes   330 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 12]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 14]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   434 MBytes   364 Mbits/sec                  sender
[ 16]   0.00-10.00  sec   434 MBytes   364 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   424 MBytes   356 Mbits/sec                  sender
[ 18]   0.00-10.00  sec   424 MBytes   356 Mbits/sec                  receiver
[ 20]   0.00-10.00  sec   378 MBytes   317 Mbits/sec                  sender
[ 20]   0.00-10.00  sec   377 MBytes   317 Mbits/sec                  receiver
[ 22]   0.00-10.00  sec   312 MBytes   261 Mbits/sec                  sender
[ 22]   0.00-10.00  sec   312 MBytes   261 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  3.54 GBytes  3.04 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  3.54 GBytes  3.04 Gbits/sec                  receiver

$ tracert 172.17.20.10

Tracing route to 172.17.20.10
over a maximum of 30 hops:

  1     *        *        *     Request timed out.
  2    <1 ms    <1 ms    <1 ms  192.168.89.1
  3    <1 ms    <1 ms    <1 ms  172.17.20.10

Again, the config of the CRS309:

[admin@MikroTik] > /export 
# oct/21/2020 21:40:03 by RouterOS 7.1beta2
# model = CRS309-1G-8S+
/interface bridge
add admin-mac=48:8F:7B:43:25:69 auto-mac=no comment=defconf name=bridge
/interface ethernet
set [ find default-name=ether1 ] l2mtu=1592
set [ find default-name=sfp-sfpplus1 ] l2mtu=1592
set [ find default-name=sfp-sfpplus2 ] l2mtu=1592
set [ find default-name=sfp-sfpplus3 ] l2mtu=1592
set [ find default-name=sfp-sfpplus4 ] l2mtu=1592
set [ find default-name=sfp-sfpplus5 ] l2mtu=1592
set [ find default-name=sfp-sfpplus6 ] l2mtu=1592
set [ find default-name=sfp-sfpplus7 ] l2mtu=1592
set [ find default-name=sfp-sfpplus8 ] l2mtu=1592
/interface ethernet switch
set 0 l3hw=yes
/interface lte apn
set [ find default=yes ] ip-type=ipv4
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip hotspot profile
set [ find default=yes ] html-directory=flash/hotspot
/ip vrf
add list=all name=main
/interface bridge port
add bridge=bridge comment=defconf interface=sfp-sfpplus3
add bridge=bridge comment=defconf interface=sfp-sfpplus4
add bridge=bridge comment=defconf interface=sfp-sfpplus5
add bridge=bridge comment=defconf interface=sfp-sfpplus6
add bridge=bridge comment=defconf interface=sfp-sfpplus7
add bridge=bridge comment=defconf interface=sfp-sfpplus8
/ip neighbor discovery-settings
set discover-interface-list=all
/ip address
add address=192.168.88.1/24 comment=defconf interface=sfp-sfpplus2 network=192.168.88.0
add address=192.168.89.2/24 interface=sfp-sfpplus1 network=192.168.89.0
/ip route
add dst-address=0.0.0.0/0 gateway=192.168.89.1
/system package update
set channel=development
/system routerboard settings
set boot-os=router-os

I have no problems reaching ~10 Gbit/s with my endpoints on my regular infrastructure; throughput only drops when I add the CRS309.

Please send a supout.rif file from the device to support@mikrotik.com, and we will take a more detailed look at why there is a 3.3 Gbps limitation.

Do nexthop IP addresses 172.17.30.1 and 192.168.89.1 belong to the same device? Looking at your traceroutes, in the first case packets go via 172.17.30.1 while in the second (CRS309) packets get routed via 192.168.89.1.

If those two IPs are on the same device, then please also make sure there are no firewall or QoS rules that treat the 172.17.30.0 and 192.168.89.0 subnets differently.

Hi,

@skylark: I’ve sent in a supout.rif.

@raimondsp: Yes, 172.17.30.1 and 192.168.89.1 are the same device. Both interfaces are configured the same in terms of firewall rules, no QoS happens on the device.

Another question that popped up: when L3HW offloading is enabled, the CRS309 does not show up in the traceroute, as it’s not sending ICMP messages. Is there a feature (planned) to enable correct ICMP behaviour?

Thanks again for your help.

Unfortunately, the hardware (switch chip) is incapable of sending ICMP replies. In order to get one, the packet needs to be sent to the CPU for processing. While getting ICMP replies is useful during the infrastructure test phase, I don’t think packets should be sent to CPU in the production environment, as it might open a hole for potential DDoS attacks.

Hi,
For IPv4 this is certainly true. For IPv6 (in the future), at least ICMPv6 ‘packet too big’ will be needed. For IPv4, ‘packet too big’ would be nice as well, as the switch chip surely doesn’t do packet fragmentation, or does it?

Hi there,


  • You’re right - the switch chip doesn’t support IP fragmentation.
  • ICMP “Packet Too Big” will be implemented together with variable MTU support.
  • IPv6 is on the roadmap as well.

Thanks for the feedback!

Regarding your case, may I ask you to repeat the test, but this time monitor the connection list?

During the test, please run the command:

/ip/firewall/connection/print interval=1

and make sure the traffic generated by iperf does NOT appear in the list. When routing is fully performed by the hardware, the packets do not enter CPU at all, and, therefore, do not appear in the connection list. The connections still may appear in the list, but the rate should be 0.

The connection list can be viewed from Winbox as well.

Thanks in advance!

Hi,
I re-ran the test, and during the test the connection list was empty as expected, but throughput was even worse this time, around 2.2 Gbit/s.
Then, just to be sure, I net-installed beta2 again to get a completely fresh setup, reapplied the config and, for whatever reason, I can now push up to 9 Gbit/s through the CRS309! AMAZING!
It’s a bit odd that the reinstall fixed things (I remembered the discover-interface-list quirk, so I thought I’d give it a try), but I’m not complaining (;
Thanks for all the help. I hope for a stable release soon so I can use this great feature in production.

Hi,

I’m glad to hear that the issue has been resolved. We will analyze your supout.rif anyway to identify the possible issues.

Once again, thanks for the feedback, and I hope the CRS309 will serve you well.

I wasn’t sure whether all that means the L3HW offloading would include a VLAN interface on top …
… now I am … tested with a CRS312 … and indeed: fun!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

interface ethernet switch set 0 l3hw=yes

/interface bridge
add name=bridge1 vlan-filtering=no

/interface bridge port
add bridge=bridge1 interface=combo3 pvid=200
add bridge=bridge1 interface=combo4 pvid=300

/interface bridge vlan
add bridge=bridge1 tagged=bridge1 untagged=combo3 vlan-ids=200
add bridge=bridge1 tagged=bridge1 untagged=combo4 vlan-ids=300

/interface vlan
add interface=bridge1 name=vlan200 vlan-id=200
add interface=bridge1 name=vlan300 vlan-id=300

/ip address
add address=10.1.1.10/24 interface=vlan200
add address=1.1.1.10/24 interface=vlan300

/interface bridge set bridge1 vlan-filtering=yes

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

root@40deb:~# ip netns exec vsp2 iperf -s -V
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 MByte (default)
------------------------------------------------------------
[  4] local ::ffff:1.1.1.2 port 5001 connected with ::ffff:10.1.1.1 port 46818
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  11.0 GBytes  9.40 Gbits/sec
###
root@40deb:~# ip netns exec vsp1 iperf -c 1.1.1.2 -w 64m -V
------------------------------------------------------------
Client connecting to 1.1.1.2, TCP port 5001
TCP window size:  128 MByte (WARNING: requested 64.0 MByte)
------------------------------------------------------------
[  3] local 10.1.1.1 port 46818 connected with 1.1.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.0 GBytes  9.47 Gbits/sec
root@40deb:~#
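For reference, these figures are about what a single 10 Gbit/s link can carry as TCP payload. A rough sketch, assuming a 1500-byte MTU on the test hosts, IPv4, untagged frames between host and switch, and either no TCP options or the 12-byte timestamp option (the function name is made up for illustration):

```python
def tcp_goodput_bps(mtu: int = 1500, tcp_options: int = 0, link_bps: float = 10e9) -> float:
    """Upper bound on TCP payload throughput for full-size segments."""
    # Payload per packet: MTU minus IPv4 header (20), TCP header (20) and TCP options.
    payload = mtu - 20 - 20 - tcp_options
    # Bytes on the wire: MTU plus Ethernet header (14), FCS (4), preamble (8), inter-frame gap (12).
    wire = mtu + 14 + 4 + 8 + 12
    return link_bps * payload / wire


if __name__ == "__main__":
    print(f"no TCP options:  {tcp_goodput_bps() / 1e9:.2f} Gbit/s")
    print(f"with timestamps: {tcp_goodput_bps(tcp_options=12) / 1e9:.2f} Gbit/s")
```

Under these assumptions the ceiling is roughly 9.4 to 9.5 Gbit/s, which is in the same ballpark as the iperf results above.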

Not working when I put VRRP interfaces alongside the VLAN interfaces and use these as gateway addresses … maybe not yet supported … or am I missing something in the VRRP config or somewhere else?!
… still wire speed when I use the VLAN interface IPs as gateways …

[admin@crs312] /interface/vrrp> export verbose 
.
# model = CRS312-4C+8XG
# serial number = xxxxxxxxxxxx
/interface vrrp
add arp=enabled arp-timeout=auto authentication=none disabled=no group-master="" interface=vlan200 interval=1s mtu=1500 name=\
    vrrp200 on-backup="" on-master="" password="" preemption-mode=no priority=100 remote-address=0.0.0.0 \
    sync-connection-tracking=yes v3-protocol=ipv4 version=3 vrid=20
add arp=enabled arp-timeout=auto authentication=none disabled=no group-master="" interface=vlan300 interval=1s mtu=1500 name=\
    vrrp300 on-backup="" on-master="" password="" preemption-mode=no priority=100 remote-address=0.0.0.0 \
    sync-connection-tracking=yes v3-protocol=ipv4 version=3 vrid=30
[admin@crs312] /interface/vrrp> 
.
######
.
root@40deb:~# ip netns exec vsp2 iperf -s -V
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 MByte (default)
------------------------------------------------------------
[  4] local ::ffff:10.1.1.2 port 5001 connected with ::ffff:1.1.1.1 port 34780
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-11.7 sec   296 MBytes   213 Mbits/sec
###
root@40deb:~# ip netns exec vsp1 iperf -c 10.1.1.2 
------------------------------------------------------------
Client connecting to 10.1.1.2, TCP port 5001
TCP window size: 64.0 MByte (default)
------------------------------------------------------------
[  3] local 1.1.1.1 port 34780 connected with 10.1.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   296 MBytes   247 Mbits/sec
root@40deb:~#