Community discussions

 
arclength
just joined
Topic Author
Posts: 4
Joined: Mon Dec 28, 2020 8:13 pm

Approximately 5s delay in TCP connections when using a static route via an address on bridge

Mon Dec 28, 2020 8:59 pm

My home router is a hEX RB750Gr3 running RouterOS v6.48. The LAN is 192.168.20.0/24. I'm running WireGuard on a Linux server (eth0: 192.168.20.10, wg0: 192.168.21.1) for "roadwarrior" access to my home network as well as linking up with off-site backup hosts. Clients access this Linux server from the internet through a port forward on the RB750Gr3.

If it's helpful, here's a simplified diagram of my network.
network-diagram.png
The WireGuard network is 192.168.21.0/24. Since I need to access WireGuard hosts from the LAN, I've dispensed with the usual masquerade ifup/ifdown iptables rules on the Linux host and added a static route on the RB750Gr3 for 192.168.21.0/24 via 192.168.20.10. This setup mostly works, but connections from 192.168.20.0/24 to 192.168.21.0/24 take about 5 seconds to start moving data after the initial TCP handshake. This problem does not arise with traffic to and from the internet.

Consider the following example:
$ time curl http://192.168.21.1 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0     91      0  0:00:06  0:00:06 --:--:--   175

real	0m6.742s
user	0m0.020s
sys	0m0.038s
$ sudo ip route add 192.168.21.0/24 via 192.168.20.10
$ time curl http://192.168.21.1 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0   199k      0 --:--:-- --:--:-- --:--:--  298k

real	0m0.046s
user	0m0.022s
sys	0m0.021s
The WireGuard server is also running an HTTP server for internal use. To take WireGuard out of the picture, we're sending an HTTP GET request to the WireGuard server's wg0 interface (192.168.21.1) rather than something like the offsite backup server (192.168.21.10). We identify the routing on the RB750Gr3 as the culprit by adding a static route on our testing host which bypasses the RB750Gr3. When we do this, the delay goes away.

Here's a screenshot of a representative packet capture. The TCP handshake goes fine, but things go downhill after the testing host sends the HTTP GET.
pcap-problem.png
A lightly redacted copy of my configuration follows:
# dec/28/2020 12:47:30 by RouterOS 6.48
# software id = REDACTED
#
# model = RouterBOARD 750G r3
# serial number = REDACTED
/ip pool
add name=dhcp ranges=192.168.20.10-192.168.20.254
/ip dhcp-server
add address-pool=dhcp disabled=no interface=bridge name=defconf
/ip address
add address=192.168.20.1/24 comment=defconf interface=ether2 network=192.168.20.0
/ip dhcp-client
add comment=defconf disabled=no interface=ether1
/ip dhcp-server lease
====REDACTED====
/ip dhcp-server network
add address=192.168.20.0/24 comment=defconf gateway=192.168.20.1 netmask=24
/ip dns
set allow-remote-requests=yes
/ip dns static
add address=192.168.20.1 comment=defconf name=router.lan
/ip neighbor discovery-settings
set discover-interface-list=LAN
/ip firewall filter
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP" protocol=icmp
add action=accept chain=input comment="defconf: accept to local loopback (for CAPsMAN)" dst-address=127.0.0.1
add action=drop chain=input comment="defconf: drop all not coming from LAN" in-interface-list=!LAN
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related
add action=accept chain=forward comment="defconf: accept established,related, untracked" connection-state=established,related,untracked
add action=drop chain=forward comment="defconf: drop invalid" connection-state=invalid
add action=drop chain=forward comment="defconf: drop all from WAN not DSTNATed" connection-nat-state=!dstnat connection-state=new in-interface-list=WAN
/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN
add action=dst-nat chain=dstnat comment=wireguard dst-port=51820 protocol=udp to-addresses=192.168.20.10 to-ports=51820
/ip route
add distance=1 dst-address=192.168.21.0/24 gateway=192.168.20.10 pref-src=192.168.20.1
/ip service
====REDACTED====
Last edited by arclength on Thu Dec 31, 2020 6:16 am, edited 1 time in total.
 
erkexzcx
Member Candidate
Posts: 263
Joined: Mon Oct 07, 2019 11:42 pm

Re: Approximately 5s delay in TCP connections when using a static route

Wed Dec 30, 2020 9:33 pm

It seems the target destination (of your static route) is part of an existing bridge. I once had a similar issue, and it was fixed when I enabled the bridge firewall:

/interface bridge settings set use-ip-firewall=yes

It just fixed it for me. Maybe someone has better ways to fix this kind of issue.
 
arclength
just joined
Topic Author
Posts: 4
Joined: Mon Dec 28, 2020 8:13 pm

Re: Approximately 5s delay in TCP connections when using a static route

Wed Dec 30, 2020 10:45 pm

Thanks for that!

Yes, the target destination is part of an existing bridge.

Enabling the IP firewall for the bridge does resolve the issue (progress!) in that the latency in traffic between 192.168.20.0/24 and 192.168.21.0/24 goes away. Unfortunately, there are the following side effects:
  • I can consistently get iperf3 results of 920-940 Mbps across hosts on 192.168.20.0/24. When I turn on the IP firewall for the bridge and the traffic traverses the hEX's switch, speeds go down to 440 Mbps. If I take the hEX's onboard switch out of the picture by connecting a separate switch to a bridge port and connecting the rest of the network to that switch, speeds go back to what they should be. If anything, they get more consistent, with all results being 941±1 Mbps.
  • I've got a residential gigabit service. Speedtest.net results to a particular server consistently go from about 800 Mbps up/850 Mbps down to 800 Mbps up/440 Mbps down when I enable the IP firewall on the bridge. Replacing the hEX's role as a switch as previously described seems to improve upload speed by 50 Mbps.
Other approaches are welcome, but I think the IP firewall needs to be enabled on the bridge for this to work. I wonder why it works at all with it disabled. But hey, I understand just a bit more about how RouterOS does things now.

As I understand it, I've got the following options:
  1. Unassign a port from the hEX's bridge and do a dedicated run from the server to the hEX on that port on a new point-to-point network. I worry that this will produce the same kinds of slowdowns on the WAN link because it's still an interface that the IP firewall has to run on, exacerbated by the fact that the WAN port and the point-to-point link will share a link to the CPU.
  2. VLAN nonsense to accomplish the same as #1.
  3. Get a more powerful router than the hEX that can handle running the IP firewall on another interface. Was already on my radar as I'd like to get IPv6 running, but the hEX's performance has been so poor when I've tried it in the past that it's disabled for now.
Or maybe there is another way to accomplish this without enabling ip firewall on the bridge?
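
For reference, option 1 might look roughly like the following on the router. This is only a sketch under stated assumptions: ether5 as the dedicated port and 192.168.22.0/30 as the point-to-point subnet are hypothetical choices, not part of the configuration above. The last two lines rely on the existing 192.168.21.0/24 route and the NAT rule commented "wireguard".

```
# Sketch only. Hypothetical choices: ether5 as the dedicated port,
# 192.168.22.0/30 as the link subnet (.1 = hEX, .2 = Linux server).
/interface bridge port remove [find interface=ether5]
/ip address add address=192.168.22.1/30 interface=ether5
/interface list member add list=LAN interface=ether5
# Point the existing WireGuard route and port forward at the new address.
/ip route set [find dst-address=192.168.21.0/24] gateway=192.168.22.2
/ip firewall nat set [find comment=wireguard] to-addresses=192.168.22.2
```

The Linux server would then need 192.168.22.2/30 on its side of the link and a route back to 192.168.20.0/24 via 192.168.22.1, so that replies pass through the router instead of going directly to LAN hosts.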
 
mkx
Forum Guru
Posts: 11445
Joined: Thu Mar 03, 2016 10:23 pm

Re: Approximately 5s delay in TCP connections when using a static route  [SOLVED]

Thu Dec 31, 2020 12:37 am

I'm pretty sure you're a victim of a "routing triangle": when a host such as 192.168.20.9/24 initiates a connection towards 192.168.21.0/24, it sends the packet to its default gateway (192.168.20.1). That MT takes a note in its connection tracking state and forwards the packet to the next-hop router (the WG concentrator at 192.168.20.10). Then the packet proceeds to the destination. The destination replies, and the reply arrives at the WG concentrator, which notices that the destination address is in a directly connected subnet and delivers it directly. The reply packet thus bypasses the main router, whose connection tracking machine can't update the connection state properly. The next packet sent from the 192.168.20.0/24 host is then outside the perceived connection state and is dropped as invalid.
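
One way to check this diagnosis on the router is to log the packets that connection tracking marks invalid just before they are dropped. This is a minimal sketch, assuming the default firewall from the first post (in particular the forward rule commented "defconf: drop invalid"); the `triangle-invalid` prefix is an arbitrary label chosen for illustration.

```
# Sketch only: log LAN -> WireGuard packets that connection tracking
# considers invalid, right before the defconf rule drops them.
# "triangle-invalid" is a hypothetical log prefix.
/ip firewall filter
add chain=forward action=log connection-state=invalid \
    src-address=192.168.20.0/24 dst-address=192.168.21.0/24 \
    log-prefix="triangle-invalid" \
    place-before=[find chain=forward comment="defconf: drop invalid"]
```

If the explanation holds, repeating the curl test should produce matching entries in `/log print`. The rule is only diagnostic and can be removed afterwards.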

The solution is to disable connection tracking for connections between the two subnets (the firewall filter rule which accepts untracked packets will then trigger). Or introduce a new subnet, used solely for the connection between the main router and the WG concentrator, which means the WG concentrator will have to pass replies to the main router, making the main router's connection tracking machine happy. Or stop dropping invalid packets. Or something else which will bypass the invalid connection state.
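
A minimal sketch of the first option, assuming the subnets from the first post. The raw table runs before connection tracking, so `action=notrack` there leaves matching packets untracked:

```
# Sketch only: skip connection tracking in both directions between
# the LAN and the WireGuard subnet (addresses from the first post).
/ip firewall raw
add chain=prerouting action=notrack \
    src-address=192.168.20.0/24 dst-address=192.168.21.0/24
add chain=prerouting action=notrack \
    src-address=192.168.21.0/24 dst-address=192.168.20.0/24
```

Untracked packets are then accepted by the existing "accept established,related,untracked" forward rule. They are also not fasttracked or NATed, which should not matter for traffic between these two internal subnets.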
 
arclength
just joined
Topic Author
Posts: 4
Joined: Mon Dec 28, 2020 8:13 pm

Re: Approximately 5s delay in TCP connections when using a static route

Thu Dec 31, 2020 2:16 am

Thank you, mkx!

I added rule 3, so my forward chain now looks like
Flags: X - disabled, I - invalid, D - dynamic 
 0  D ;;; special dummy rule to show fasttrack counters
      chain=forward action=passthrough 

 1    ;;; defconf: fasttrack
      chain=forward action=fasttrack-connection connection-state=established,related 

 2    ;;; defconf: accept established,related, untracked
      chain=forward action=accept connection-state=established,related,untracked log=no log-prefix="" 

 3    chain=forward action=accept connection-state=invalid,new src-address=192.168.20.0/24 dst-address=192.168.21.0/24 in-interface=bridge log=no 
      log-prefix="" 

 4    ;;; defconf: drop invalid
      chain=forward action=drop connection-state=invalid 

 5    ;;; defconf: drop all from WAN not DSTNATed
      chain=forward action=drop connection-state=new connection-nat-state=!dstnat in-interface-list=WAN 
With this in place, I can set use-ip-firewall=no for my bridge, I don't see the huge latency problems I saw before, and the throughput problems I've seen with use-ip-firewall=yes are gone. I'm also using the hEX solely as a bridge because I was going to put a switch in after the hEX anyway (to add some 10G links). I think my problem would be solved if I went back to using the built-in hEX switch, but I'm not going to bother testing.
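
For anyone who wants to reproduce rule 3 from the terminal, a command along these lines should place it ahead of the "drop invalid" rule. This is a sketch that assumes the default forward chain shown above; the comment text is a hypothetical label, not something from the original configuration.

```
# Sketch only: accept new/invalid LAN -> WireGuard packets before
# the defconf "drop invalid" rule in the forward chain.
/ip firewall filter
add chain=forward action=accept connection-state=invalid,new \
    src-address=192.168.20.0/24 dst-address=192.168.21.0/24 \
    in-interface=bridge comment="allow LAN to WireGuard (routing triangle)" \
    place-before=[find chain=forward comment="defconf: drop invalid"]
```

Restricting the match to the two subnets and the bridge interface keeps invalid packets from anywhere else subject to the normal drop rule.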
 
erkexzcx
Member Candidate
Posts: 263
Joined: Mon Oct 07, 2019 11:42 pm

Re: Approximately 5s delay in TCP connections when using a static route

Thu Dec 31, 2020 2:31 am

That's something I learnt too. :)
 
shalak
newbie
Posts: 41
Joined: Sat Aug 24, 2019 11:47 am

Re: Approximately 5s delay in TCP connections when using a static route via an address on bridge

Mon Jan 29, 2024 5:36 pm

I have a very similar problem:
- main MT at 10.0.0.1
- main LAN at 10.0.0.0/24
- my machine at 10.0.0.101
- roadwarrior at 10.0.0.210
- roadwarrior LAN at 10.100.100.0/24

Until I added the `chain=forward action=accept connection-state=invalid,new` rule between those two, I was experiencing consistent 5s delays, with lots of retransmission errors detected by Wireshark.

Now it works without delay, however I'm still noticing some issues in pcaps (in this example ssh from 10.0.0.101 to 10.100.100.100):
Image

I'm wondering if this is something I should worry about?
 
baragoon
Member Candidate
Posts: 298
Joined: Thu Jan 05, 2017 10:38 am
Location: Kyiv, UA

Re: Approximately 5s delay in TCP connections when using a static route via an address on bridge

Mon Jan 29, 2024 5:42 pm

if this is something I should worry about?
about necroposting... open a new thread
 
shalak
newbie
Posts: 41
Joined: Sat Aug 24, 2019 11:47 am

Re: Approximately 5s delay in TCP connections when using a static route via an address on bridge

Mon Jan 29, 2024 6:24 pm

about necroposting... open a new thread
The topic is still valid, the solution presented here as well - I didn't see any reason not to reply, especially since the topic is not locked. I don't want to create unnecessary posts and introduce clutter. Unless that's the forum rule, if so - my apologies, I was not aware of it.
 
mkx
Forum Guru
Posts: 11445
Joined: Thu Mar 03, 2016 10:23 pm

Re: Approximately 5s delay in TCP connections when using a static route via an address on bridge

Mon Jan 29, 2024 6:40 pm

... I'm still noticing some issues in pcaps

You may want to analyze these particular packets in depth. It seems that there was some out-of-order delivery. At the same time, the capture mentions "reassembled PDUs" with a size larger than 1500 bytes. This might be due to WireGuard's overhead, which might mean that the sending WireGuard peer had to fragment the packet to fit the underlying MTU. Wireshark sometimes freaks out for no real reason when it comes to fragmented packets. Even if there was actual out-of-order delivery (which is not very common), this is not a problem for TCP: the TCP stack guarantees in-order delivery to higher layers. The only big problem is reduced speed, since out-of-order delivery may trigger retransmissions and TCP window shrinkage.

You'd have to analyze further to determine if out-of-order delivery is due to MT firewall or due to other issues in the network.
