TCP retransmissions & low performance while bridging

While playing with my setup I was able to isolate a weird (to me at least) issue. I’m running the tests on hEX (Gr3). When I try to bridge two ports with VLANs on them like so:

# model = RouterBOARD 750G r3

/interface ethernet
set [ find default-name=ether1 ] name=ether1-trunk
set [ find default-name=ether2 ] name=ether2-intranet
/interface bridge
add name=bridge-trunk protocol-mode=none
add name=bridge30-intranet protocol-mode=none
/interface vlan
add interface=ether1-trunk name=ether1-v30 vlan-id=30
/interface bridge port
add bridge=bridge30-intranet interface=ether1-v30 trusted=yes
add bridge=bridge30-intranet interface=ether2-intranet
add bridge=bridge-trunk interface=ether1-trunk
add bridge=bridge-trunk interface=ether3 trusted=yes

(during each of the tests only a single bridge was active)

I get decent performance with iperf3, but still see retransmissions (ALWAYS in the RX direction; client connected to ether2):

$ iperf3 --bidir -c SERVER

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  7][TX-C]   0.00-10.00  sec   924 MBytes   775 Mbits/sec                  sender
[  7][TX-C]   0.00-10.00  sec   923 MBytes   774 Mbits/sec                  receiver
[  9][RX-C]   0.00-10.00  sec  1.05 GBytes   905 Mbits/sec  475             sender
[  9][RX-C]   0.00-10.00  sec  1.05 GBytes   902 Mbits/sec                  receiver

While using hEX to split trunk to an access port like so:

The issue is even bigger, with performance being abysmal… but ONLY in the RX direction (client connected to ether3 with vlan30 configured on the client):

$ iperf3 --bidir -c SERVER

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  7][TX-C]   0.00-10.00  sec   991 MBytes   831 Mbits/sec                  sender
[  7][TX-C]   0.00-10.00  sec   990 MBytes   831 Mbits/sec                  receiver
[  9][RX-C]   0.00-10.00  sec   933 MBytes   783 Mbits/sec  612             sender
[  9][RX-C]   0.00-10.00  sec   930 MBytes   780 Mbits/sec                  receiver

When not running bidirectional tests, the speeds are… acceptable for what this device is:

[ ID] Interval           Transfer     Bitrate
[  7]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  sender
[  7]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  receiver

Is this related to the odd block diagram of the hEX, with its two separate 1 Gb/s links?

I prefer (and I thought it was recommended) to use a single bridge with filters, see also this great tutorial:
http://forum.mikrotik.com/t/using-routeros-to-vlan-your-network/126489/1

Not sure if it is completely related, but it is at least worth a try.

Re-check the actual MTU and L2 MTU on all Ethernet interfaces, bridges, and VLANs. Make sure they are all the same.
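
For what it's worth, a quick way to compare those values from the RouterOS terminal could look like the sketch below. It only prints settings and changes nothing. The exact columns shown may differ between RouterOS versions, so treat the comments as my assumption about a typical 6.x/7.x install.

```
# Overview of every interface; the summary normally includes
# the ACTUAL-MTU and L2MTU columns side by side
/interface print

# Per-type detail, to see the configured mtu next to l2mtu
/interface ethernet print detail
/interface bridge print detail
/interface vlan print detail
```

If one interface stands out (for example a VLAN interface or bridge with a smaller actual MTU than the ports under it), that is the place to look first.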

Neat, I had actually never used VLAN filtering on a bridge. I would say the results are comparable, but the CPU usage is lower:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  7][TX-C]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec                  sender
[  7][TX-C]   0.00-10.00  sec  1000 MBytes   838 Mbits/sec                  receiver
[  9][RX-C]   0.00-10.00  sec   717 MBytes   602 Mbits/sec  430             sender
[  9][RX-C]   0.00-10.00  sec   715 MBytes   600 Mbits/sec                  receiver

However, I wonder what actual advantages bridge filtering with 3 VLANs has over 3 bridges with VLANs on the interfaces?


That was the first thing I checked. The reality is that, even though the CPU is not peaking, the routing uses a single thread and the hEX CPU is, well... very low powered (and was probably picked originally for its HW AES for IPSec & co.). This is more of an exercise (because let's be real, such a task should be done using a switch chip) which gave me a "huh?" while looking at the data.

From a philosophical point of view, when the unit is configured with vlan-filtering on a single bridge, it acts like a smart switch, while with 3 “dumb” bridges it looks like a stack of dumb switches with many wires in between. Meaning: the setup is much simpler to read (and understand), and thus less prone to errors and easier to troubleshoot.

Technically: when using multiple untagged bridges, the device has to strip the VLAN tag off every single packet on ingress and add it back on egress, while with a VLAN-aware bridge this doesn’t happen. Plus, some devices (CRS3xx) can actually HW offload the single-bridge config, while they can’t do it for the “spaghetti” of multiple bridges.
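
To make the single-bridge idea concrete, here is a minimal sketch of how the two bridges from the first post might collapse into one VLAN-aware bridge. The interface names and VLAN 30 come from that post; the bridge name bridge-vlan is made up for illustration, and I am assuming ether2-intranet is the untagged access port while ether1-trunk and ether3 carry VLAN 30 tagged. Adapt it before use, this is not a drop-in config.

```
/interface bridge
# Create the bridge with filtering still off, so a half-finished
# VLAN table cannot cut off access while configuring
add name=bridge-vlan protocol-mode=none vlan-filtering=no

/interface bridge port
# Trunk side: frames arrive already tagged with VLAN 30
add bridge=bridge-vlan interface=ether1-trunk
# Tagged client port (the client has vlan30 configured itself)
add bridge=bridge-vlan interface=ether3
# Access port: untagged frames are placed into VLAN 30 via pvid
add bridge=bridge-vlan interface=ether2-intranet pvid=30

/interface bridge vlan
# One table entry describes which ports carry VLAN 30 and how
add bridge=bridge-vlan vlan-ids=30 tagged=ether1-trunk,ether3 untagged=ether2-intranet

/interface bridge
# Turn filtering on last, once ports and the VLAN table are in place
set [ find name=bridge-vlan ] vlan-filtering=yes
```

The whole VLAN layout then lives in /interface bridge vlan instead of being spread over several bridges and VLAN interfaces, which is the "smart switch" view described above.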

When you have so few retransmissions (430 retransmissions, assuming roughly 1500-byte segments, is about 645 KB, which is tiny next to the roughly 1 GB transferred), it does not indicate a setup problem like MTU. It just means you have a slight bottleneck somewhere and the transmitting side is overfeeding the link.
Retransmissions in TCP are actually part of how the sender controls its pace of transmission. There is no cause for alarm.