psybernoid
just joined
Topic Author
Posts: 12
Joined: Sat Oct 22, 2011 11:32 am

RB4011 VLAN Routing Performance

Mon Mar 29, 2021 12:13 pm

Hi.

I've got an RB4011 connected to an Aruba 6300M by means of an SFP+ DAC negotiating at 10Gbps.

Over the weekend, I decided to put some 2.5Gbps NICs into my servers in order to make use of the extra speed the ports on the 6300M provide. However, I noticed that when performing an iperf3 test across VLANs, performance isn't great:
[  5] local 10.100.0.81 port 51304 connected to 172.16.11.106 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  65.3 MBytes   548 Mbits/sec    8    355 KBytes
[  5]   1.00-2.00   sec  63.5 MBytes   533 Mbits/sec   12    383 KBytes
[  5]   2.00-3.00   sec  64.5 MBytes   541 Mbits/sec    5    294 KBytes
[  5]   3.00-4.00   sec  63.5 MBytes   533 Mbits/sec    6    331 KBytes
[  5]   4.00-5.00   sec  64.4 MBytes   541 Mbits/sec    2    363 KBytes
[  5]   5.00-6.00   sec  64.4 MBytes   540 Mbits/sec    9    276 KBytes
[  5]   6.00-7.00   sec  65.4 MBytes   549 Mbits/sec    4    320 KBytes
[  5]   7.00-8.00   sec  65.2 MBytes   547 Mbits/sec    1    355 KBytes
[  5]   8.00-9.00   sec  65.3 MBytes   548 Mbits/sec   32    269 KBytes
[  5]   9.00-10.00  sec  64.5 MBytes   541 Mbits/sec    5    307 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   646 MBytes   542 Mbits/sec   84             sender
[  5]   0.00-10.00  sec   645 MBytes   541 Mbits/sec                  receiver
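For reference, output like the above comes from a plain single-stream iperf3 run, roughly as below; the server address is taken from the log above, and the explicit duration flag is an assumption (10 seconds is also the default):

iperf3 -c 172.16.11.106 -t 10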
However, when the same iperf3 server is moved to the same VLAN as the client:

Connecting to host 10.100.0.171, port 5201
[  5] local 10.100.0.81 port 45380 connected to 10.100.0.171 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   283 MBytes  2.37 Gbits/sec    0    752 KBytes
[  5]   8.00-9.00   sec   280 MBytes  2.35 Gbits/sec    0   1.13 MBytes
[  5]   9.00-10.00  sec   280 MBytes  2.35 Gbits/sec    0   1.13 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.73 GBytes  2.35 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.73 GBytes  2.35 Gbits/sec                  receiver
Thinking this might be a bridging issue, I moved all VLANs off the bridge and put them directly onto the sfpplus1 interface, then removed the sfpplus1 interface from the bridge. I then get these results for inter-VLAN traffic:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  93.8 MBytes   787 Mbits/sec   17    294 KBytes
[  5]   1.00-2.00   sec  92.4 MBytes   775 Mbits/sec    5    396 KBytes
[  5]   2.00-3.00   sec  92.8 MBytes   778 Mbits/sec   21    373 KBytes
[  5]   3.00-4.00   sec  92.2 MBytes   773 Mbits/sec   24    362 KBytes
[  5]   4.00-5.00   sec  92.5 MBytes   776 Mbits/sec   79    348 KBytes
[  5]   5.00-6.00   sec  93.6 MBytes   785 Mbits/sec   21    337 KBytes
[  5]   6.00-7.00   sec  94.0 MBytes   788 Mbits/sec   26    329 KBytes
[  5]   7.00-8.00   sec  93.4 MBytes   783 Mbits/sec   18    318 KBytes
[  5]   8.00-9.00   sec  92.7 MBytes   778 Mbits/sec   18    296 KBytes
[  5]   9.00-10.00  sec  94.2 MBytes   790 Mbits/sec   10    318 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   932 MBytes   781 Mbits/sec  239             sender
[  5]   0.00-10.00  sec   930 MBytes   780 Mbits/sec                  receiver
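For context, the non-bridged layout described above would look roughly like this in RouterOS; the VLAN IDs, names, and gateway addresses here are illustrative assumptions (the RB4011's SFP+ port is named sfp-sfpplus1 by default):

# Tag VLANs directly on the physical SFP+ interface instead of a bridge:
/interface vlan add name=vlan100 vlan-id=100 interface=sfp-sfpplus1
/interface vlan add name=vlan11 vlan-id=11 interface=sfp-sfpplus1
# Give each VLAN interface its gateway address:
/ip address add address=10.100.0.1/24 interface=vlan100
/ip address add address=172.16.11.1/24 interface=vlan11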
If I then turn off connection tracking, I get:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   116 MBytes   976 Mbits/sec   29    362 KBytes
[  5]   1.00-2.00   sec   120 MBytes  1.01 Gbits/sec    5    305 KBytes
[  5]   2.00-3.00   sec   120 MBytes  1.01 Gbits/sec   22    331 KBytes
[  5]   3.00-4.00   sec   119 MBytes   999 Mbits/sec    4    342 KBytes
[  5]   4.00-5.00   sec   119 MBytes   998 Mbits/sec   35    385 KBytes
[  5]   5.00-6.00   sec   119 MBytes   998 Mbits/sec   25    417 KBytes
[  5]   6.00-7.00   sec   118 MBytes   994 Mbits/sec    9    434 KBytes
[  5]   7.00-8.00   sec   119 MBytes   998 Mbits/sec    6    313 KBytes
[  5]   8.00-9.00   sec   118 MBytes   993 Mbits/sec    5    344 KBytes
[  5]   9.00-10.00  sec   118 MBytes   990 Mbits/sec   11    383 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.16 GBytes   996 Mbits/sec  151             sender
[  5]   0.00-10.00  sec  1.16 GBytes   994 Mbits/sec                  receiver
That's better, but then I get no WAN traffic.
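For reference, connection tracking is toggled on the RouterOS console as below. Note that turning it off also disables NAT, which would explain the missing WAN traffic if the PPPoE uplink is masqueraded (an assumption about this particular config):

# Disable connection tracking (also disables NAT and connection-state firewall rules):
/ip firewall connection tracking set enabled=no
# Restore the default behaviour:
/ip firewall connection tracking set enabled=auto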

I'm pretty sure this is a config issue on my part; I'm just at a loss as to where to look.

I should also note that only two ports are connected on the RB4011: ether1 for the ISP (PPPoE) and the sfpplus1 port. Everything else connects to the switch.
 
mkx
Forum Guru
Posts: 11445
Joined: Thu Mar 03, 2016 10:23 pm

Re: RB4011 VLAN Routing Performance

Mon Mar 29, 2021 4:40 pm

What does /tool profile cpu=all show during an ongoing iperf test? I wouldn't be surprised if only a single CPU core gets loaded. How does running multiple parallel streams (iperf3 -P 8 ...) affect overall throughput?

The thing is that when routing, ROS will use a single CPU core for all packets belonging to the same connection (to reduce the chance of out-of-order delivery), but on multi-core devices this often limits single-connection throughput. With multiple concurrent connections the overall throughput increases, but a single connection is usually much slower than the theoretical maximum. With an RB4011 you should be able to get aggregate transfer speeds exceeding 2 Gbps.
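A minimal sketch of those two checks, assuming the client targets the server address from the earlier output (flags beyond -P are defaults/assumptions):

# On the RB4011 console, watch per-core load while the test runs:
/tool profile cpu=all

# From the iperf3 client, run 8 parallel TCP streams instead of one:
iperf3 -c 172.16.11.106 -P 8 -t 10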
 
psybernoid
just joined
Topic Author
Posts: 12
Joined: Sat Oct 22, 2011 11:32 am

Re: RB4011 VLAN Routing Performance

Mon Mar 29, 2021 5:10 pm

You're quite correct. When running those tests, I had one CPU core hitting 100%. When running iperf3 with the -P 4 parameter, I get pretty close to the 2.5Gbps target, and total CPU usage goes to about 75%.

Your explanation does make sense though, so thanks for that. In practice, I doubt very much that I'll notice, as the 2.5Gbps NICs are mostly on the same subnet.

As a test, I just did an inter-VLAN iperf test with 8 threads and copied a 5GB ISO to another machine, again on a different subnet, and I saw the sfpplus interface reaching ~3Gbps. So I guess everything's working as intended. It's a shame that a single CPU thread limits it in such a way though.
 
mkx
Forum Guru
Posts: 11445
Joined: Thu Mar 03, 2016 10:23 pm

Re: RB4011 VLAN Routing Performance

Mon Mar 29, 2021 6:43 pm

It's a shame that a single CPU thread limits it in such a way though.

ARM-based routers (RB4011, CCR2004) are actually quite good; their single-core performance is not bad. Imagine your disappointment if you used a CCR1072 instead ... on paper it's got tons of oomph, but in your case it'd be much worse than the RB4011. It would be a completely different story for the core or edge router of a mid-sized ISP, though.
 
jbl42
Member Candidate
Posts: 214
Joined: Sun Jun 21, 2020 12:58 pm

Re: RB4011 VLAN Routing Performance

Tue Mar 30, 2021 3:00 pm

It's a shame that a single CPU thread limits it in such a way though.
This is in the nature of TCP: TCP guarantees applications running on top of TCP sockets that all bytes are received in exactly the same order as they were sent, even if the packets transporting those bytes get reordered in transit. So if a TCP receiver gets incoming packets in the wrong order, it has to keep buffering further incoming traffic while waiting for the "bypassed" packets, so that it can reconstruct the original byte stream. To keep the RX buffer from overflowing, the receiver slows down the sender (using TCP flow control) until all packets required to reconstruct the original byte stream have been received.

If different packets belonging to the same connection are processed/forwarded by different CPU cores, there is no way to prevent a less loaded core from forwarding its packets faster. This causes packet reordering, which slows down TCP connections or even causes packet loss. It would increase the number of packets forwarded per second, but decrease the effective TCP payload bandwidth and harm connection reliability.

There are use cases for routers with a small number of powerful cores (like the RB4011/RB1100) and also for routers with a high number of less powerful cores (like the CCR10xx). If the requirement is a small number of connections with high individual throughput (file downloads, backups to remote sites), fewer CPU cores with more individual power are better. If it is about having many parallel connections with relatively low individual bandwidth (VoIP, many users browsing the net, etc.), many cores with less power per core work better.

If you want routing at full aggregate wire speed independent of the number of parallel connections, you need gear with HW-accelerated routing in ASICs. Such gear is available from the usual suspects (Cisco, Juniper, etc.) but adds at least one zero, if not two, to the price tag compared to an RB4011.
