srcnat stops working after running for a little while when SSTP reconnects

I need a little help with this:

This is the configuration for a MikroTik router that runs at a client's location. It connects via SSTP (needed because of the client's firewall) and has static routes (and NAT) for some subnets through the SSTP connection.

On more than one client router, srcnat stops working after the router has been running for a while, apparently when SSTP drops and reconnects. If I reboot, it starts working again. If I set fast-forward=yes on the bridge, it also starts working again, though I am not sure for how long. I am still testing that one.

Here’s what two mangle rules that are set to log show:

Log entries from Broken State:
prerouting: in:Lan-Bridge out:(unknown 0), connection-state:new src-mac 54:b2:03:83:d7:78, proto UDP, 172.16.96.200:10259->10.20.0.100:10259, len 7231
postrouting: in:Lan-Bridge out:sstp-out1, connection-state:new src-mac 54:b2:03:83:d7:78, proto UDP, 172.16.96.200:10259->10.20.0.100:10259, len 7231

Log entries from Working State:
prerouting: in:Lan-Bridge out:(unknown 0), connection-state:new,snat src-mac 48:21:0b:25:fc:a1, proto UDP, 172.16.96.200:10259->10.20.0.100:10259, NAT (172.16.96.200:10259->10.20.138.20:10259)->10.20.0.100:10259, len 6897
postrouting: in:Lan-Bridge out:sstp-out1, connection-state:new,snat src-mac 48:21:0b:25:fc:a1, proto UDP, 172.16.96.200:10259->10.20.0.100:10259, NAT (172.16.96.200:10259->10.20.138.20:10259)->10.20.0.100:10259, len 6897
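
For anyone who wants to reproduce the comparison above: the logging rules themselves are not part of the export, so the following is only a sketch of what such rules might look like. The destination address and UDP port are taken from the log lines; the comments and match criteria are assumptions. The thing to compare is whether `snat` appears in connection-state and whether a `NAT (...)` translation is printed, as in the working state.

```
# Hypothetical debug rules; the real rules used for the logs above are not shown.
# 10.20.0.100 and UDP port 10259 come from the log entries.
/ip firewall mangle
add action=log chain=prerouting comment="DEBUG: log UDP toward VPN host" \
    dst-address=10.20.0.100 dst-port=10259 protocol=udp
add action=log chain=postrouting comment="DEBUG: log UDP toward VPN host" \
    dst-address=10.20.0.100 dst-port=10259 protocol=udp
```

These rules only log and do not change the packets, so they can be disabled again once enough data has been collected.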

I thought it might be an issue with connection tracking, but there are very few entries when it is in the failed state. For example, there were only 16 connections the last time I looked at a failed router.
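
A minimal sketch of how the connection tracking table could be inspected while a router is in the failed state. The address filter uses the destination from the log entries. The last command is only an optional test, not a confirmed fix: it assumes that removing the tracked entry for this flow might show whether a stale entry is involved.

```
# Count all tracked connections.
/ip firewall connection print count-only

# Show the tracked entries for the flow seen in the logs.
/ip firewall connection print detail where dst-address~"10.20.0.100"

# Optional test (assumption: a stale entry could be involved):
# remove the entries for this flow and see whether srcnat resumes without a reboot.
/ip firewall connection remove [find where dst-address~"10.20.0.100"]
```

Note that the export sets udp-timeout=10s, so an entry for a UDP flow that keeps sending traffic would not be expected to expire on its own.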

I enabled fast-forward on the bridge, but I am not sure that is the fix. The systems I did it on started working again, but I am not certain about the workaround. I also do not like not knowing exactly what the problem is.
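
For reference, the workaround described above amounts to something like the following. The bridge name is taken from the export. Whether this actually fixes the problem or only resets some state as a side effect is unknown.

```
# Show the current bridge settings, including fast-forward.
/interface bridge print detail where name=Lan-Bridge

# Enable fast-forward on the bridge (the workaround being tested).
/interface bridge set [find name=Lan-Bridge] fast-forward=yes
```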

Any ideas?

Here’s the relevant part of the configuration on the client routers (some entries have been redacted):

# 2024-08-19 09:06:01 by RouterOS 7.15.2
# software id = [redacted]
#
# model = RB450Gx4
# serial number = [redacted]

/interface bridge
add arp=proxy-arp comment="Bridge containing ether3, ether4, and ether5." fast-forward=no name=Lan-Bridge \
    port-cost-mode=short protocol-mode=stp
    
### NOTE: fast-forward=yes makes srcnat start working again.  Unknown for how long.

/interface ethernet
set [ find default-name=ether1 ] comment=\
    "Internet Connection (Req DHCP by default)"
set [ find default-name=ether2 ] comment=\
    "Secondary Internal LAN (if applicable)"
set [ find default-name=ether3 ] comment="Lan-Bridge Member"
set [ find default-name=ether4 ] comment="Lan-Bridge Member"
set [ find default-name=ether5 ] comment="Lan-Bridge Member" poe-out=off

/ip pool
add name=lan-dhcp ranges=172.16.96.10-172.16.96.99

/ppp profile
add change-tcp-mss=yes dns-server=Y.Y.Y.1 name=sstp-vpn remote-address=\
    Y.Y.Y.1 use-compression=no use-encryption=yes use-mpls=no use-upnp=no
set *FFFFFFFE use-compression=no use-mpls=no use-upnp=no

/interface sstp-client
add authentication=chap,mschap1,mschap2 connect-to=X.X.X.X disabled=no \
    http-proxy=0.0.0.0 name=sstp-out1 pfs=yes profile=sstp-vpn proxy-port=0 \
    tls-version=only-1.2 user=USERNAME \
    verify-server-address-from-certificate=no

/interface bridge port
add bridge=Lan-Bridge ingress-filtering=no interface=ether3 \
    internal-path-cost=10 path-cost=10 trusted=yes
add bridge=Lan-Bridge ingress-filtering=no interface=ether4 \
    internal-path-cost=10 path-cost=10 trusted=yes
add bridge=Lan-Bridge ingress-filtering=no interface=ether5 \
    internal-path-cost=10 path-cost=10 trusted=yes

/ip firewall connection tracking
set udp-timeout=10s

/ip settings
set max-neighbor-entries=8192

/interface list member
add interface=Lan-Bridge list=INT-LAN
add interface=ether1 list=WAN
add interface=sstp-out1 list=VPN
add interface=ether2 list=Client-Lan
add interface=ether1 list=Client-Lan

/interface sstp-server server
set certificate=*5

/ip address
add address=172.16.96.1/24 interface=Lan-Bridge network=172.16.96.0

/ip dhcp-client
add default-route-distance=4 interface=ether1 use-peer-dns=no use-peer-ntp=no

/ip dhcp-server
add address-pool=lan-dhcp comment=\
    "Internal Bridged Ports ether3, ether4 and ether5." interface=\
    Lan-Bridge lease-time=10m name="Internal LAN"

/ip dhcp-server network
add address=172.16.96.0/24 dns-server=172.16.96.1 gateway=172.16.96.1

/ip firewall mangle
add action=change-mss chain=forward comment=\
    "Change MSS to handle cloud provider System." in-interface=ether1 new-mss=1200 \
    passthrough=yes protocol=tcp tcp-flags=syn
add action=change-mss chain=forward comment=\
    "Change MSS to handle cloud provider System." new-mss=1200 out-interface=ether1 \
    passthrough=yes protocol=tcp tcp-flags=syn

/ip firewall nat
add action=masquerade chain=srcnat comment=\
    "VPN: masquerade traffic to the vpn server" out-interface=sstp-out1
add action=masquerade chain=srcnat comment=\
    "WAN- masquerade traffic to the wan interfaces." out-interface-list=WAN
add action=masquerade chain=srcnat comment=\
    "CLIENT-LAN - masquerade traffic to second client network. " \
    out-interface=ether2
add action=dst-nat [redacted - this, and the entries below are for accessing internal systems.]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]
add action=dst-nat [redacted]

/ip route
add disabled=no distance=2 dst-address=A.B.C.0/23 gateway=sstp-out1 \
    routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add disabled=no distance=2 dst-address=A.B.D.0/24 gateway=sstp-out1
add disabled=no dst-address=A.B.E.0/23 gateway=sstp-out1
add disabled=no distance=2 dst-address=A.B.F.0/24 gateway=sstp-out1 \
    pref-src=0.0.0.0 routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
add disabled=no distance=2 dst-address=A.B.G.0/24 gateway=sstp-out1 \
    pref-src="" routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
add disabled=no distance=1 dst-address=A.B.H.0/24 gateway=sstp-out1 \
    routing-table=main scope=30 suppress-hw-offload=no target-scope=10

Here’s a little more info about what is happening. It looks like the SSTP connection drops, and when it reconnects the router is no longer able to perform srcnat on UDP.

Applicable Logs & info from three client routers that failed at the same time:


From System with ip ending in 71:

Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: terminating… - connection timeout
Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: disconnected
Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: initializing…
Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: connecting…
Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: authenticated
Sep/16/2024 20:53:19 sstp,ppp,info sstp-out1: connected


From System ending in 49:

Sep/16/2024 20:54:08 sstp,ppp,info sstp-out1: terminating… - connection timeout
Sep/16/2024 20:54:08 sstp,ppp,info sstp-out1: disconnected
Sep/16/2024 20:54:08 sstp,ppp,info sstp-out1: initializing…
Sep/16/2024 20:54:08 sstp,ppp,info sstp-out1: connecting…
Sep/16/2024 20:54:09 sstp,ppp,info sstp-out1: authenticated
Sep/16/2024 20:54:09 sstp,ppp,info sstp-out1: connected


From System ending in 91:

Sep/16/2024 20:53:16 sstp,ppp,info sstp-out1: terminating… - connection timeout
Sep/16/2024 20:53:16 sstp,ppp,info sstp-out1: disconnected
Sep/16/2024 20:53:16 sstp,ppp,info sstp-out1: initializing…
Sep/16/2024 20:53:16 sstp,ppp,info sstp-out1: connecting…
Sep/16/2024 20:53:17 sstp,ppp,info sstp-out1: authenticated
Sep/16/2024 20:53:17 sstp,ppp,info sstp-out1: connected

Uptimes:

System 71:
uptime: 3d20h23m14s
CLOCK
time: 10:09:56
date: sep/17/2024

System 49:
uptime: 3d23h51m35s
CLOCK:
time: 10:15:31
date: 2024-09-17

System 91:
uptime: 3d20h41m48s
CLOCK
time: 10:17:46
date: 2024-09-17
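
The information above can be collected on each router with commands along these lines. This is a sketch and assumes the interface name sstp-out1 from the export.

```
# Router uptime and current clock, to line up against the log timestamps.
/system resource print
/system clock print

# SSTP-related log entries (disconnects and reconnects).
/log print where topics~"sstp"

# Current state of the SSTP client, including how long the tunnel has been up.
/interface sstp-client monitor sstp-out1 once
```

Comparing the tunnel uptime with the router uptime shows whether the tunnel has reconnected since the last reboot.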

In the snippet, nothing jumps out.

One easy thing you can do is lower the keepalive on the SSTP connection. The default is 1m before it reconnects; a lower timeout like 15s might be good (perhaps you’re not waiting long enough before rebooting, since today it would be at least 1 minute :wink:).
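
A sketch of that change, assuming the client interface is named sstp-out1 as in the export. The value is in seconds.

```
# Lower the SSTP keepalive from the default of 60 seconds to 15 seconds.
/interface sstp-client set [find name=sstp-out1] keepalive-timeout=15

# Confirm the new value.
/interface sstp-client print detail where name=sstp-out1
```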

But often some subtle firewall interaction is involved, which is harder to spot. Are you using a “fasttrack-connection” rule in /ip/firewall/filter? You can try disabling it to see if it helps (then add it back with an accept rule for the SSTP traffic placed before it).

Also, make sure you are not using “dial on demand” (the default is off, which is what you’d want).
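
Both checks can be done from the terminal. This is only a sketch, again assuming the interface name sstp-out1 from the export.

```
# List any fasttrack rules in the filter table (no output means there are none).
/ip firewall filter print where action=fasttrack-connection

# Show the SSTP client settings; look for dial-on-demand in the output.
/interface sstp-client print detail where name=sstp-out1

# If it turns out to be enabled, it can be switched off like this:
# /interface sstp-client set [find name=sstp-out1] dial-on-demand=no
```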

I haven’t worked with the keepalive for SSTP, but I can try it and see if it makes any difference. It’s a shot in the dark, but may be worth a test.

As for fasttrack, I did not have it enabled on the bridge when the systems were having issues, but I did recently enable it. It’s hard to run tests because it takes a few days before problems happen. I am not using fasttrack-connection in any firewall filter rule right now.

I am not using dial-on-demand.

It’s a weird problem. The firewall rules I have so far are just to accept some traffic, but they don’t block yet and don’t change anything. I am still gathering data on exactly what needs to go through the connection, then I am planning to update the rules again.

Thanks for the help so far!