LTE / L2TP/IPSEC tunnel unstable

Got a new device with LTE, a Sierra MC7700. It works fine; however, when I try to set up an L2TP/IPsec client, it connects to the server (I have no control over the server side).

Then I try to ping, but it shows me ‘not reachable’, and a few seconds later the tunnel collapses and tries to reconnect.

Here are the specs of the system:

[admin@MikroTik] > export
# may/05/2018 19:39:43 by RouterOS 6.42.1
# software id = K0KV-DMB1
#
# model = 953GS-5HnT
# serial number = 49C504E2567E
/interface wireless
set [ find default-name=wlan1 ] band=5ghz-a/n channel-width=20/40mhz-Ce \
    distance=indoors frequency=auto mode=ap-bridge rx-chains=0,1,2 ssid=\
    MikroTik-61AE09 tx-chains=0,1,2 wireless-protocol=802.11
/interface bridge
add admin-mac=******* auto-mac=no comment=BRIDGE name=bridge
/interface ethernet
set [ find default-name=sfp1 ] disabled=yes
set [ find default-name=sfp2 ] disabled=yes
/interface l2tp-client
add add-default-route=yes allow-fast-path=yes connect-to=******** \
    dial-on-demand=yes disabled=no ipsec-secret=****** keepalive-timeout=\
    10 max-mru=1410 max-mtu=1410 name=LiquidVPN password=******* use-ipsec=\
    yes user=********
/interface list
add comment=defconf name=WAN
add comment=defconf name=LAN
/interface lte apn
add apn=fast.t-mobile.com name="T-Mobile US LITE"
/interface lte
set [ find ] apn-profiles="T-Mobile US LITE" mac-address=************ \
    name=LTE
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=default-dhcp ranges=192.168.88.10-192.168.88.254
/ip dhcp-server
add address-pool=default-dhcp disabled=no interface=bridge name=defconf
/port
set 1 name=usb2
/interface ppp-client
add apn="T-Mobile US LITE" dial-on-demand=no name=ppp-out1 port=usb2
/interface bridge port
add bridge=bridge comment=defconf interface=ether2
add bridge=bridge comment=defconf interface=ether3
/ip neighbor discovery-settings
set discover-interface-list=LAN
/interface list member
add comment=defconf interface=bridge list=LAN
add comment=defconf interface=ether1 list=WAN
/ip address
add address=192.168.88.1/24 comment=defconf interface=bridge network=\
    192.168.88.0
/ip dhcp-server network
add address=192.168.88.0/24 comment=defconf dns-server=192.168.88.1 gateway=\
    192.168.88.1
/ip dns
set allow-remote-requests=yes servers=208.67.222.222,8.8.8.8
/ip dns static
add address=192.168.88.1 name=router.lan
/ip firewall nat
add action=masquerade chain=srcnat disabled=yes out-interface=LTE
add action=masquerade chain=srcnat out-interface=LiquidVPN
/ip route
add distance=2 gateway=LTE
add distance=1 dst-address=100.204.125.0/24 gateway=LTE
/system routerboard settings
set silent-boot=no
/tool mac-server
set allowed-interface-list=LAN
/tool mac-server mac-winbox
set allowed-interface-list=LAN
[admin@MikroTik] > ip route print
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 ADS  0.0.0.0/0                          LiquidVPN                 1
 1   S  0.0.0.0/0                          LTE                       2
 2 ADC  10.11.11.0/32      23.227.197.119  LiquidVPN                 0
 3  DS  23.2************                   23.2************          0
 4   S  23.2************                   100.2**********           1
 5 ADC  100.2**********    100.2********** LTE                       0
 6 A S  100.2**********                    LTE                       1
 7 ADC  192.168.88.0/24    192.168.88.1    bridge                    0

Any ideas why that might be happening?

When obfuscating the public IP addresses in the configuration exports and prints, it is essential to do it in such a way that you don’t also obfuscate the information about the relationship between them.

So e.g. if your public address is 123.45.67.89/25 and the gateway is 123.45.67.1, take care to replace 123.45.67 with something like my.prefix.a systematically in the whole text you publish and keep the last byte and the /25 unchanged. Ctrl-H (find&replace) is your friend, it also helps you not to omit some occurrences of the prefix.

The reason why I write this is that I suspect that you haven’t realized one point. When you send a packet to the L2TP client interface, the L2TP encapsulates it into a brand new, locally originated packet and sends the new one to the IP address of the VPN server. At this stage it is not important which route matches this new packet (provided that at least one does), because the IPsec policy created by the L2TP/IPsec client steals the packet and lets the IPsec SA encrypt and encapsulate it again into yet another packet (plain ESP or ESP over UDP). And now comes the point: this packet is routed using the normal routing rules, but normal routing rules say that the default route is the one via L2TP client interface, and no route more specific for the IP address of the VPN server as destination is available. So the double encapsulated packet is sent to the L2TP tunnel again, double encapsulated, sent to the L2TP tunnel again…

Now depending on whether you configure a domain name or an IP address as connect-to of the /interface l2tp-client, a different approach needs to be chosen to fix this. If it is an IP address, simply

/ip route add dst-address=addr.of.vpn.server gateway=LTE

and you should be good.

If the connect-to is a domain name, it means that the actual IP address may change for each new connection, and you never know whether the new one will come from the same subnet or a completely different one, so you don’t know how to configure the exception route from the default one.

If this is the case, there are several solutions, none of them perfect. So come back if it turns out you need one of them.

Sindy, I’m coming back to thank you again for breaking down the logic behind it. Unfortunately, in the last few days I was heavily medicated due to spring allergies, so I wasn’t thinking clearly. Your observation along with the suggestion worked 100%. Now, I would create another thread for the question, but they are closely related, so I hope you can extend your knowledge further. The issue is as follows:

My ultimate goal for this router is to operate in a very tight environment, so only selected IPs to come out and in via L2TP/IPSEC tunnel, along with a set of DNS IPs for pings helping with failover.

Now having input, output and forward and given what you’ve mentioned above, most of the exceptions should be made for ‘forward’ chain and for ‘input’ allow only l2tp/ipsec ports and protocols and the rest to be blocked.

To make sure all traffic goes through L2TP/IPsec I set:

/ip firewall filter
add action=drop chain=forward comment=VPN_Only in-interface=LAN out-interface=LTE

Made another exception on ‘input’ chain for Winbox IP.

However, when I tried reproducing the above on the ‘output’ chain, everything seemed to crash and I got locked out of Winbox (despite making the exception on the ‘output’ chain for the Winbox IP).

My question is, is there a better way to do what I’ve described above? And why does the ‘output’ chain react differently?

Well, here I cannot give a clear answer because the question is not fully clear - what exactly means “I tried reproducing the above on ‘output’ chain”?

So at this stage, just generic hints:

  • the roles of the chains in the firewall are clearly visible from this diagram. No in-interface exists for packets handled by chain=output, which means that a rule checking in-interface value in that chain would never match, so it must look different if it could cut you from the world.
  • it may be a matter of personal opinion, but if I compose a firewall, I prefer the approach “drop everything except what I know must be allowed” to “allow everything except what I know must be dropped”, as I’m not Sir Pratchett to imagine even the unimaginable. The rationale behind is that if you by mistake drop something that should not be dropped, your authorized users will quickly let you know; while if you by mistake allow something that should not be allowed, your non-authorized users will never let you know.
    In Mikrotik case in particular, this means that the last rule in the chain must be a “drop” one without any conditions, because unlike with iptables, you cannot change the default behaviour of a predefined chain to “drop”, it is always “accept”.
  • stateful packet inspection is your best friend. It allows you to concentrate on the initial packet of each connection, and handle the rest using the generic “allow established or related” rule, provided the very last rule in each chain is the “drop” as stated above.
  • an exception from the above is the chain=output of /ip firewall filter. My opinion is that this chain is of little use. The only reason to prevent your own device from actively establishing connections somewhere else by putting restrictions to the output chain of the filter could be to make it impossible for a malware running on the device itself to connect somewhere, but as in our case the malware can easily wipe the firewall completely once it gets in, why bother.

A supercharged introduction to the firewall is here.

A couple of points specific for L2TP/IPsec:

  • if both peers run public IP addresses directly on themselves, the session negotiation of the IPsec part runs between their UDP ports 500 and the data transport runs using ESP, which is a layer inside IP at the same hierarchical level as UDP or TCP, and unlike them, it has no notion of ports so it cannot be NATed. If at least one of the peers is behind a NAT, even a 1:1 one or with static port forwarding, the session negotiation on ports 500 detects that and moves both the continuation of itself and the data transport to UDP ports 4500. The ESP is then encapsulated into UDP so it can be NATed.
  • the L2TP uses the IPsec connection to cipher communication between UDP ports 1701 on the same two addresses of the peers.
  • if the IPsec tunnel does not establish for some reason, the L2TP may still work without encryption unless you take some counter-measures.

So a box acting as an L2TP/IPsec client can simply be configured not to accept any incoming connection at all via the WAN interface, because it initiates all the above connections from its own side. The magic “accept related, established” in chain=input of /ip firewall filter is all you need to let L2TP/IPsec client be permitted on an otherwise closed firewall.
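
As a minimal sketch of that client-side idea, assuming the WAN interface is the one named LTE as in the export above and that nothing else needs to reach the router from the WAN side:

/ip firewall filter
# accept replies to connections the router itself has initiated (IKE, ESP, L2TP)
add action=accept chain=input connection-state=established,related
# assumption: everything else arriving via LTE is unwanted
add action=drop chain=input in-interface=LTE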

When configuring an L2TP/IPsec server, you must permit new incoming connections to UDP port 500, but also new incoming connections to UDP port 4500 and new incoming ESP connections, because it is not possible for the firewall to “spy” the need to treat one of these as connection-state=related from the session negotiation as it is already ciphered, nor is there any out-of-band information channel between the IPsec stack and the firewall (at least in RouterOS case). You must also permit new incoming connections to UDP port 1701, but only those matching ipsec-policy=in,ipsec, because otherwise L2TP connections without IPsec, i.e. with much weaker authentication and without any encryption, could be accepted. The rationale behind is that IPsec works very differently from other VPN protocols, so the packet decrypted from the received ESP one still appears to the firewall as coming in via the same physical interface as the ESP packet did. The ipsec-policy property of the packet allows you to distinguish between a packet which came in directly (ipsec-policy=in,none) and a packet which has been locally decrypted and decapsulated (ipsec-policy=in,ipsec).
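
The server-side rules described above could be sketched roughly as follows. The interface name WAN is a placeholder, and the position of these rules relative to the rest of a ruleset is left out, so treat this as an illustration rather than a complete firewall:

/ip firewall filter
# IKE negotiation (500) and its NAT traversal continuation (4500)
add action=accept chain=input protocol=udp dst-port=500,4500 in-interface=WAN
# plain ESP for the case where no NAT is involved
add action=accept chain=input protocol=ipsec-esp in-interface=WAN
# L2TP is accepted only if the packet has been decrypted by IPsec
add action=accept chain=input protocol=udp dst-port=1701 ipsec-policy=in,ipsec in-interface=WAN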

See, Sindy, that is exactly it for the ‘output’ chain. I want a client of my router to be able to initiate connections only to selected IPs; the rest (applications, Windows or Linux updates etc.) should be unable to go online. Therefore, if I just use the ‘input’ chain, it will not stop a machine on the inner network from initiating connections. However, your suggestion on ‘accept established, related’ worked well, as I was able to create a ‘forward’ and ‘output’ chain ‘whitelist’, put the ‘accept established, related’ at the end, and have the rest just dropped. So it seems like a great suggestion, thank you.

BTW, in the past I used to monitor the internet connection via Netwatch, as part of a basic failover on some router systems:

/ip firewall filter
add action=drop chain=output comment=Ping_LTE dst-address=8.8.8.8 out-interface=LTE
/tool netwatch
add down-script="/interface lte set [find name=LTE] disabled=yes;\r\
    \n/interface disable VPN;\r\
    \n/ip firewall connection remove [/ip firewall connection find protocol udp and t\
    cp];\r\
    \n/delay 1;\r\
    \n/interface lte set [find name=LTE] disabled=no;\r\
    \n/delay 25;\r\
    \n/interface enable VPN;" host=8.8.8.8 interval=45s

Something that is simple and kind of effective. But this time I need to make other topological updates, one being the ability to check whether the L2TP/IPsec tunnel is up on its own. I tried redirecting the Netwatch pings to the VPN interface, but it was not very effective; as I remember, it is ICMP that goes out from the Mikrotik core itself, and I was not able to redirect it. Does that mean that creating a separate script is the only solution, or is there a way to redirect Netwatch pings through the VPN interface?

You didn’t get me (or the diagram in the documentation). chain=output has nothing to do with clients on Mikrotik’s LAN side, it only processes packets sent by the Mikrotik itself.

To restrict clients, use chain=forward, and permit connection-state=new packets only in the LAN → VPN (or LAN → WAN where appropriate) direction, and only to destinations you want to allow the clients to reach. And in chain=input, you may want to prevent the clients from accessing Mikrotik’s management (ssh, winbox etc.). You may actually want to permit access to management of the Mikrotik itself only via the VPN if there is a risk that someone could come and connect to an unused port.
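
A minimal sketch of that forward-chain idea, using the interface names bridge and LiquidVPN from the export above. The list name permitted-destinations and the address 192.0.2.10 are placeholders invented for illustration:

/ip firewall address-list
# placeholder address, replace with the real destinations
add list=permitted-destinations address=192.0.2.10
/ip firewall filter
add action=accept chain=forward connection-state=established,related
# new connections only from LAN to VPN and only to listed destinations
add action=accept chain=forward connection-state=new in-interface=bridge out-interface=LiquidVPN dst-address-list=permitted-destinations
# everything else is dropped
add action=drop chain=forward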


Two points here.

First, the netwatch privileges have been decreased in the last releases. Some people seem to be really unhappy about that because they cannot use it for configuration adjustments any more, so you may find yourself affected too.

Second, Netwatch pings should be just like any other pings originated by Mikrotik itself (e.g. those checking availability of recursive gateways), so they normally use the default routing table (“main”). One way to change that would be to mark them with some routing-mark in chain=output of /ip firewall mangle and create a default route with the same routing-mark and the gateway you want them to use. So packets without any routing mark would be routed according to the default routing table, and packets with routing mark would be routed according to the routing table named the same.
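
The routing-mark variant might look roughly like this in RouterOS 6 syntax (the version shown in the export). The mark name via-vpn is made up for the example, and 8.8.4.4 stands in for whichever address is to be pinged through the tunnel:

/ip firewall mangle
# mark locally originated pings to one monitored address (8.8.4.4 is only an example)
add action=mark-routing chain=output protocol=icmp dst-address=8.8.4.4 new-routing-mark=via-vpn passthrough=no
/ip route
# default route used only by packets carrying the via-vpn mark
add dst-address=0.0.0.0/0 gateway=LiquidVPN routing-mark=via-vpn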

But I would probably be lazy and instead of exercises with route marking, I would simply set individual routes for one pair of DNS server addresses (such as 8.8.4.4 and 208.67.220.220) via the VPN gateway, and individual routes to another pair (such as 8.8.8.8 and 208.67.222.222) via the WAN gateway, all four in the default routing table, and set firewall rules

/ip firewall address-list
add list=netwatch-vpn-targets address=8.8.4.4
add list=netwatch-vpn-targets address=208.67.220.220
add list=netwatch-wan-targets address=8.8.8.8
add list=netwatch-wan-targets address=208.67.222.222

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-wan-targets
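
The four individual routes mentioned before these rules could be added along the following lines, assuming the interface names LiquidVPN and LTE from the export are used as gateways:

/ip route
# monitored via the VPN
add dst-address=8.8.4.4/32 gateway=LiquidVPN
add dst-address=208.67.220.220/32 gateway=LiquidVPN
# monitored via the WAN (LTE)
add dst-address=8.8.8.8/32 gateway=LTE
add dst-address=208.67.222.222/32 gateway=LTE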

I would run four netwatch processes, one per each monitored address, and if one of them would get to state down, its on-down script would check the status of both address-lists and depending on it it would take measures. So a temporary loss of one monitored address would not trigger any action at all, and loss of both addresses monitored via VPN would only cause a restart of the VPN if responses from at least one address monitored via WAN would still be present.

And if eventually the netwatch privileges would be too low to let it disable L2TP or power-cycle the LTE, I would schedule a periodical spawn of a script which would check the two address-lists instead of setting the on-down scripts in netwatch.

Just a remark, whatever you set the address-list-timeout to, the script can still see the address on the list 5 seconds after the timeout has expired (indicating remaining time of 0s all the time).

Sindy, you are correct. I didn’t get it to the extent you’ve explained in the last post, so thank you. That post helped me resolve certain issues I’d been struggling with earlier today. This is a simplified prototype of the firewall I built:

Question #1:

/ip firewall filter
1. add action=drop chain=forward comment=KillSwitchVPN_LTE in-interface=BRIDGE out-interface=LTE
2. add action=drop chain=forward comment=KillSwitchVPN_MGT out-interface=LiquidVPN src-address=192.XXX.XXX.5
3. add action=drop chain=forward comment=KillSwitchVPN_LTE out-interface=LTE src-address=192.XXX.XXX.5
4. add action=drop chain=input comment="Drop Incoming ICMP on LTE" disabled=yes dst-address=100.100.100.100 in-interface=LTE protocol=icmp
5. add action=drop chain=forward comment="Drop packets from BRIDGE that do not have BRIDGE IP" in-interface=BRIDGE log=yes log-prefix=LAN_!BRIDGE src-address=\
    !192.XXX.XXX.75-192.XXX.XXX.77
Rule 1 makes sure no traffic leaves if the tunnel is down; rules 2 and 3 make sure MGT has no access to the internet; rules 4 and 5 are self-explanatory.
add action=accept chain=forward comment="Word License#1" dst-address=72.XXX.XXX.X2 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Word License#2" dst-address=19.XXX.XXX.X4 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#1" dst-address=20.XXX.XXX.2 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#2" dst-address=20.XXX.XXX.X2 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#3" dst-address=20.XXX.XXX.X3 dst-port=443 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#4" dst-address=20.XXX.XXX.X9 dst-port=443 log=yes protocol=tcp
add action=accept chain=input comment="Allow MGT" disabled=yes dst-address=192.XXX.XXX.5 dst-port=***** log=yes protocol=tcp
add action=accept chain=forward comment=DNS dst-port=53 log=yes protocol=tcp
add action=accept chain=input comment="L2TP - IKE v1" dst-port=500 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment="L2TP - IKE v1" dst-port=4500 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment="L2TP - Traffic" dst-port=1701 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment=IPSEC in-interface=LTE log=yes protocol=ipsec-esp
add action=accept chain=input comment="Accept Established, Related Connections" connection-state=established,related
add action=accept chain=forward comment="Accept Established, Related Connections" connection-state=established,related
add action=drop chain=input comment="Drop all INPUT chain LTE" src-address=!192.XXX.XXX.5 in-interface=LTE
add action=drop chain=input comment="Drop all INPUT chain VPN" in-interface=VPN
add action=drop chain=forward comment="Drop all FORWARD chain LTE" in-interface=LTE
add action=drop chain=forward comment="Drop all FORWARD chain VPN" in-interface=VPN
add action=drop chain=input comment="Drop Incoming ICMP on VPN" dst-address=100.100.100.100 in-interface=VPN protocol=icmp
add action=drop chain=input protocol=icmp
add action=drop chain=forward protocol=icmp

In short: specific IPs need to be reachable from the Mik client through the tunnel to the internet, DNS and L2TP are accepted, and, as mentioned earlier, there is the ‘magic’ “Accept Established, Related”.
Now, unfortunately, the Mik client can still initiate connections from a browser etc. I understand you mentioned that the ‘output’ chain concerns the Mik itself, but without it the client can still initiate new connections. It seems that I’m using the ‘output’ chain ineffectively: when I add a new rule chain=output action=drop in-interface=LTE, it doesn’t change much, but when I change the interface to VPN, everything locks down, the tunnel collapses etc. So how can I stop outgoing connections and limit them to only the ones listed?

Question #2:
You brought up a good point about quad pings. The thing is that I was thinking of having that setup on the failover machine; nevertheless, I used Netwatch because I’m somewhere at an intermediate level, not advanced or expert and definitely not a guru. Script writing isn’t easy for me because I don’t have enough time to sit down and get a hold of it, or to write enough to feel comfortable with it, so Netwatch was my savior. I like the idea of route marking and I think this can work, but

I would run four netwatch processes, one per each monitored address, and if one of them would get to state down, its on-down script would check the status of both address-lists and depending on it it would take measures. So a temporary loss of one monitored address would not trigger any action at all, and loss of both addresses monitored via VPN would only cause a restart of the VPN if responses from at least one address monitored via WAN would still be present.

And if eventually the netwatch privileges would be too low to let it disable L2TP or power-cycle the LTE, I would schedule a periodical spawn of a script which would check the two address-lists instead of setting the on-down scripts in netwatch.

Where would I even start if I were to follow your suggestion? Is there a better way than writing a script? I know for you it might be seconds, but for someone like me it’s days if not weeks. Perhaps you know of any examples available online that I can go through and then try to modify? Don’t lose patience with me yet :)

Just a quick update on a few points before the real one hopefully later today.

WRT Question lot #1:

  • your output rule with in-interface=vpn breaking everything is a mystery to me. I’ve tried to enter it and got a mid-finger straight away:
[me@MyTik] > ip firewall filter add chain=output in-interface=vlan200 action=log
failure: incoming interface matching not possible in output and postrouting chains

So I don’t get how you could even enter it.

  • please post the complete output of /ip firewall export after replacing the public addresses by some meaningful names. The order of rules matters and it is not clear from the two separate lists you’ve posted.
  • do you plan the machine in question to be an L2TP/IPsec server? If yes, you haven’t followed my recommendation regarding the rule for UDP port 1701. If not, you don’t need to care about IPsec-related ports at all as no restrictions in chain=output and “accept established” in chain=input will do the job.

Regarding the /tool netwatch, I’ve found myself that it is not possible to use the on-down and on-up scripts if you want to use redundancy of monitored points (or ignore short-time outages on lossy links with no alternative, which is the case in one of my deployments). So in addition to fixing the copy-paste mistake I’ve made in my previous post (updating the address-list on echo requests, icmp-options=8, instead of echo responses, icmp-options=0): once you go that way, you have to use the /tool netwatch only as a controlled generator of periodic ping requests, and the evaluation of the results has to be a job of the periodically spawned script. So give me the actions you want to take when only both pings via L2TP fail and when also both pings via LTE fail, and I’ll put something together.

So you’ve made me replace the plain netwatch monitoring on my site where a wireless “backbone” is so overbooked and the air is so noisy that netwatch pings sometimes fail to get responded although the monitored devices themselves are doing fine.

As a reward, here is the script modified for your needs as much as I could, you only have to fill the appropriate corrective actions and the timeouts necessary to give them some time to complete before re-attempting them.

# create the variables if they do not exist yet, be optimistic and set them to green status
if ([/system script environment print count-only where name=lteAlive]=0) do={global lteAlive 1}
if ([/system script environment print count-only where name=vpnAlive]=0) do={global vpnAlive 1}

# compare the status of address-list monitor-lte to the previous one and take action if it became empty
if ([/ip firewall address-list print count-only where list=monitor-lte]=0) do={\
  if ([/system script environment get [find name=lteAlive] value]>0) do={
    global lteAlive 0;
    ... place here the action necessary to kickstart LTE ...;
    /ip firewall address-list add list=monitor-lte address=4.3.2.1 address-list-timeout=... place here the necessary recovery time, such as 2m30s ...
  }
} else={
  if ([/system script environment get [find name=lteAlive] value]=0) do={
    global lteAlive 1
  }
}

# compare the status of address-list monitor-vpn to the previous one and take action if it became empty
if ([/ip firewall address-list print count-only where list=monitor-vpn]=0) do={\
  if ([/system script environment get [find name=vpnAlive] value]>0) do={
    global vpnAlive 0;
    ... place here the action necessary to kickstart L2TP ...;
    /ip firewall address-list add list=monitor-vpn address=4.3.2.1 address-list-timeout=... place here the necessary recovery time, such as 2m30s ...
  }
} else={
  if ([/system script environment get [find name=vpnAlive] value]=0) do={
    global vpnAlive 1
  }
}

You have to paste that script, with your adjustments, as source parameter of a newly created /system script item.

The mangle rules suggested earlier must add some address (source or destination, the address value itself isn’t important) to the corresponding address-list (monitor-vpn or monitor-lte) each time they see an icmp echo response from one of the monitored addresses. The address-list-timeout value choice is up to you, but it must be proportionally longer than the pinging interval configured in /tool netwatch. No up-script and down-script values are necessary for any of the four /tool netwatch items.

You have to schedule periodic execution of the script with start-time=startup and interval proportional to the times above.

So e.g. if you make the netwatch ping each destination every 15 seconds, then the address-list-timeout in the mangle rules may be set to 65s to tolerate loss of up to three ping responses from each of the two monitored addresses, and the periodicity of script execution may be set to 15s again to react relatively promptly once the failure will have been confirmed.
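
With those example numbers, the pieces around the script could be set up roughly as follows. The script name link-monitor and the scheduler name run-link-monitor are hypothetical, and the monitored addresses are the ones suggested earlier in the thread:

/tool netwatch
# no up-script or down-script, netwatch only generates the pings
add host=8.8.4.4 interval=15s
add host=208.67.220.220 interval=15s
add host=8.8.8.8 interval=15s
add host=208.67.222.222 interval=15s
/system scheduler
# assumes the script above has been saved under the hypothetical name link-monitor
add name=run-link-monitor start-time=startup interval=15s on-event="/system script run link-monitor"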

RE:Question#1:

Regarding ‘output’

/ip firewall filter
add action=drop chain=forward comment=KillSwitchVPN_LTE in-interface=BRIDGE out-interface=LTE
add action=drop chain=forward comment=KillSwitchVPN_MGT out-interface=LiquidVPN src-address=192.XXX.XXX.5
add action=drop chain=forward comment=KillSwitchVPN_LTE out-interface=LTE src-address=192.XXX.XXX.5
add action=drop chain=input comment="Drop Incoming ICMP on LTE" disabled=yes dst-address=100.100.100.100 in-interface=LTE protocol=icmp
add action=drop chain=forward comment="Drop packets from BRIDGE that do not have BRIDGE IP" in-interface=BRIDGE log=yes log-prefix=LAN_!BRIDGE src-address=\
    !192.XXX.XXX.75-192.XXX.XXX.77
add action=accept chain=forward comment="Word License#1_Server" dst-address=172.XXX.XXX.12 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Word License#2_Server" dst-address=191.XXX.XXX.24 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#1_Server" dst-address=202.XXX.XXX.2 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#2_Server" dst-address=203.XXX.XXX.12 dst-port=80 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#3_Server" dst-address=204.XXX.XXX.13 dst-port=443 log=yes protocol=tcp
add action=accept chain=forward comment="Demo#4_Server" dst-address=205.XXX.XXX.19 dst-port=443 log=yes protocol=tcp
add action=accept chain=input comment="Allow MGT" disabled=yes dst-address=192.XXX.XXX.5 dst-port=***** log=yes protocol=tcp
add action=accept chain=forward comment=DNS dst-port=53 log=yes protocol=tcp
add action=accept chain=input comment="L2TP - IKE v1" dst-port=500 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment="L2TP - IKE v1" dst-port=4500 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment="L2TP - Traffic" dst-port=1701 in-interface=LTE log=yes protocol=udp
add action=accept chain=input comment=IPSEC in-interface=LTE log=yes protocol=ipsec-esp
add action=accept chain=input comment="Accept Established, Related Connections" connection-state=established,related
add action=accept chain=forward comment="Accept Established, Related Connections" connection-state=established,related
add action=drop chain=input comment="Drop all INPUT chain LTE" src-address=!192.XXX.XXX.5 in-interface=LTE
add action=drop chain=input comment="Drop all INPUT chain VPN" in-interface=VPN
add action=drop chain=forward comment="Drop all FORWARD chain LTE" in-interface=LTE
add action=drop chain=forward comment="Drop all FORWARD chain VPN" in-interface=VPN
add action=drop chain=output comment="Drop all OUTPUT chain LTE" in-interface=LTE disabled=yes	
add action=drop chain=output comment="Drop all OUTPUT chain VPN" in-interface=VPN disabled=yes
add action=drop chain=input comment="Drop Incoming ICMP on VPN" dst-address=100.100.100.100 in-interface=VPN protocol=icmp
add action=drop chain=input protocol=icmp
add action=drop chain=forward protocol=icmp

LTE=WAN Interface

VPN=L2TP-client interface

  1. I’m on the L2TP/IPSEC client side and have no control over the server I’m connecting to
  2. all “output” chain rules are disabled because both of them block traffic to the VPN tunnel, which in the end collapses the connection
  3. Set the rules for port 500, 4500, 1701, before setting up “accept related, established”
  4. My only goal is to allow the “Word Server & Demo Server” IPs to be reached by the Mik client machine through the L2TP/IPsec tunnel. The rest should be blocked from both sides, nothing else in or out. I managed to do it on the way in, but on the way out I’m having trouble, as with the rules above the machine can still go out and use a browser etc.

P.S. Regarding #2, will respond shortly, let me put it together and try running it.

  1. it is very useful to keep rules belonging to the same chain in one block, because it makes them much easier to read.
  2. what software version are you running and how exactly did you enter those surrealistic output rules? I can only imagine that WinBox or WebFig allow them to be added although they are incorrect, and because it is impossible to match in-interface in output chain, that match condition is ignored and thus the rules match any packet, hence cutting you off the box
  3. blocking icmp completely is a Bad Idea. Ping is just one application of icmp, the majority is network control and if you block that part, you break things like path MTU discovery which relies on icmp. Normally, these flavors of icmp are handled by the magic “accept established,related” rule because the network control icmp messages are considered related to the TCP and/or UDP connections, but you drop icmp already before this rule.
    Blocking icmp on an interface is also a bit incompatible with using netwatch over that interface.
    You can match icmp requests selectively using protocol=icmp icmp-options=8
  4. neither of your (input, forward) chains ends with a “drop the rest” rule.
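
As an illustration of the selective icmp match from point 3, a rule like the following would drop only incoming echo requests on the LTE interface and leave the other icmp types alone. It is a sketch, and where it sits among the other input rules is left open:

/ip firewall filter
# drops only echo requests (ICMP type 8, code 0) arriving via LTE
add action=drop chain=input protocol=icmp icmp-options=8:0 in-interface=LTE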

So a firewall implementing your requirements looks as follows:

/ip firewall address-list
add list=permitted-http-destinations address=172.XXX.XXX.12 comment="Word License#1_Server"
add list=permitted-http-destinations address=191.XXX.XXX.24 comment="Word License#2_Server"
add list=permitted-http-destinations address=202.XXX.XXX.2 comment="Demo#1_Server"
add list=permitted-http-destinations address=203.XXX.XXX.12 comment="Demo#2_Server"

add list=permitted-https-destinations address=204.XXX.XXX.13 comment="Demo#1_Server"
add list=permitted-https-destinations address=205.XXX.XXX.19 comment="Demo#2_Server"

add list=permitted-local-sources address=192.XXX.XXX.75
add list=permitted-local-sources address=192.XXX.XXX.76
add list=permitted-local-sources address=192.XXX.XXX.77

add list=management-access address=ip.of.management.pc

/ip firewall filter
### chain INPUT - communication with Mikrotik itself ###

add action=accept chain=input connection-state=established,related
# the above rule handles the bulk of all packets towards Mikrotik itself, so it should be the topmost one in each chain so that those packets wouldn't need to be matched by other rules

# permit DNS and NTP requests from clients
add action=accept chain=input protocol=udp in-interface=BRIDGE dst-port=53,123
# permit DNS requests from clients over TCP
add action=accept chain=input protocol=tcp in-interface=BRIDGE dst-port=53
#if you don't want the clients to use Mikrotik as DNS server and/or NTP server, you have to permit access to DNS and/or NTP servers in forward chain instead
# dhcp from clients does not need to be permitted as it is handled before the firewall

add action=accept chain=input protocol=tcp in-interface=BRIDGE src-address-list=management-access dst-port=22,80,443,8291
# the above rule permits local management access to the machine, keep only the ports you actually use on the list

# ... place here eventual rules logging various attempts to establish connections to the Mikrotik itself...

add action=drop chain=input disabled=yes
# this rule prevents any other new connections to the Mikrotik itself than those permitted by the previous rules to be established
# from anywhere; only enable it after checking that the "accept management connections" rule just above counts packets, 
# and nevertheless switch on the safe mode before enabling it

### chain FORWARD - communication passing through Mikrotik ###
add action=accept chain=forward connection-state=established,related
# same as in chain INPUT, the above rule should be the topmost one in this chain

add action=accept chain=forward protocol=tcp dst-port=80 in-interface=BRIDGE out-interface=l2tp-out1 dst-address-list=permitted-http-destinations src-address-list=permitted-local-sources
add action=accept chain=forward protocol=tcp dst-port=443 in-interface=BRIDGE out-interface=l2tp-out1 dst-address-list=permitted-https-destinations src-address-list=permitted-local-sources
# the two rules above implement your requirement "allow "Word Server & Demo Server" IPs to be connected by the Mik client machine through l2tp/ipsec tunnel"

# ... place here eventual rules logging what tries to establish new forwarded connections either way...

add action=drop chain=forward
# the above rule drops anything attempting to pass #through# the Mikrotik that didn't match the rules above
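
Before enabling the final drop rule in chain input, the counters can be checked from the terminal. This is only a sketch: it assumes the disabled drop rule is the only disabled drop rule in chain input, so verify what the find expression matches on your box first.

```
/ip firewall filter
# show packet and byte counters for chain input; the management accept rule should be counting
print stats where chain=input
# once it does, and with safe mode switched on (Ctrl+X in the terminal), enable the final drop rule
enable [find chain=input action=drop disabled=yes]
```

If the management session is cut off while safe mode is active, the change is rolled back, which is why it is worth switching it on first.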

RE: Question#2

I’m glad that you found value in helping me through this process. I took yesterday’s mangle rule and today’s. There was one small issue: the system would not accept ‘address-list-timeout’ and required ‘timeout’ instead, so here is the setup that I added:

/ip firewall address-list
add list=netwatch-vpn-targets address=8.8.4.4
add list=netwatch-vpn-targets address=208.67.220.220
add list=netwatch-wan-targets address=8.8.8.8
add list=netwatch-wan-targets address=208.67.222.222

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-wan-targets

/system script
add name=operation_lte_vpn owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source="# create the variables if they \
    do not exist yet, be optimistic and set them to green status\r\
    \nif ([/system script environment print count-only where name=lteAlive]=0) do={global lteAlive 1}\r\
    \nif ([/system script environment print count-only where name=vpnAlive]=0) do={global vpnAlive 1}\r\
    \n# compare the status of address-list monitor-lte to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-lte]=0) do={\\\r\
    \n  if ([/system script environment get [find name=lteAlive] value]>0) do={\r\
    \n    global lteAlive 0;\r\
    \n    /interface lte set [find name=LTE] disable=yes\r\
    \n\t/interface lte set [find name=LTE] disable=no \r\
    \n\t/ip firewall address-list add list=monitor-lte address=8.8.8.8 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=lteAlive] value]=0) do={\r\
    \n    global lteAlive 1\r\
    \n  }\r\
    \n}\r\
    \n# compare the status of address-list monitor-vpn to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-vpn]=0) do={\\\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]>0) do={\r\
    \n    global vpnAlive 0;\r\
    \n        /interface l2tp-client set [find name=VPN] disabled=yes\r\
    \n\t\t/interface l2tp-client set [find name=VPN] disabled=no\r\
    \n        /ip firewall address-list add list=monitor-vpn address=8.8.4.4 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]=0) do={\r\
    \n    global vpnAlive 1\r\
    \n  }\r\
    \n}"
    
/system scheduler
add interval=1m5s name=operation_lte_vpn on-event="/system script run operation_lte_vpn" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup

I’m still trying to better understand the logic of the script and certain lines, as this is not a ‘vanilla’ approach given your level of expertise. However, I’m highly interested in understanding it and learning, rather than just doing plug-n-play with work someone else put in.

  • So ‘[/system script environment print count-only where name=lteAlive]=0’ checks whether the LTE interface is enabled or disabled. If it is disabled (or 1), the script goes to ‘[/system script environment get [find name=lteAlive] value]>0’, which confirms this and sets lteAlive to 0 (disabled), then runs the following commands to re-enable the interface. Then ‘/ip firewall address-list add list=monitor-lte address=8.8.8.8 timeout=1m5s’ gives it another try to see whether it is enabled, and if it is, the script goes to the lteAlive 1 state. Am I correct?

  • Also, in the above I’m using just 1 IP for each; I’m not sure how the logic would work with 2 IPs for each interface.

  • Lastly, I noticed that when the L2TP client is down, the pings go out via LTE and get through, so the status shows as up. The script seems to work only for the LTE side: netwatch falls back from the l2tp-client interface to LTE, shows ‘up’ status, and the script carries on as if nothing happened. Not sure if that’s how it is supposed to be.

lteAlive and vpnAlive are status variables created for the sake of detecting a change of interface status. Suppose the variable contains 1 (the interface was up at the last check) and the address-list tracking the icmp responses coming via this interface is empty during the current run of the script, indicating that the interface is actually down. Then the last known status differs from the status detected now, which means the interface went down between the previous and current runs. So the corrective action needs to be taken, and the variable updated to remember the current state for comparison in the next run of the script, so that you can notice the change from down back to up.

The variables could actually be omitted from your script completely. In my application, I need to send a single e-mail when the destination goes down and a single e-mail when the destination goes up, and I don’t need to initiate any corrective actions. In your case, you need to restart the interface as soon as you detect that it is down, and you even want to pretend that it is up (that’s why I add the bogus address to the address-list) so that the restart process can complete successfully without being re-triggered over and over again. So the bogus address on the address-list substitutes for the variable in your case.
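
A stripped-down version of the LTE half without the status variable could look roughly like the sketch below. It assumes the script checks the same list that the mangle rule fills (netwatch-wan-responses here) and that the interface is named LTE. The 2s delay between disabling and enabling is a guess of mine, not something tested on this hardware.

```
# sketch only: restart LTE when no icmp responses were recorded recently
:if ([:len [/ip firewall address-list find list=netwatch-wan-responses]] = 0) do={
    /interface lte set [find name=LTE] disabled=yes
    # assumed pause so that the modem notices the change
    :delay 2s
    /interface lte set [find name=LTE] disabled=no
    # bogus entry keeps the list non-empty while the interface comes back up
    /ip firewall address-list add list=netwatch-wan-responses address=8.8.8.8 timeout=1m5s
}
```

The VPN half would follow the same pattern with its own list and /interface l2tp-client.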


The logic is such that the more addresses you ping, the lower the probability that you conclude that the interface is down while actually only the monitored address is down. Even Google DNS can sometimes be down.


This must be because you haven’t added the individual route(s) to the address(es) used to monitor the state of the L2TP tunnel via the gateway address provided by the L2TP server.
A route with a more specific (longer prefix) dst-address always wins over one with a less specific (shorter prefix) dst-address, regardless of the distance value (distance only determines the mutual priority of routes with identical dst-address values). So if you create an individual route to some destination (like the address used to monitor the status of the interface) and that route’s gateway goes down, you cannot ping that destination even though a default route’s gateway is accessible.

However, I’m not sure whether the route remains active if its gateway property is set to an interface name and the interface is down, but that should not be your case.
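
To make this concrete, a sketch of such individual routes follows. It assumes the interfaces are named VPN and LTE, that an interface name is usable as gateway on both, and that the 8.8.4.4 / 208.67.220.220 pair monitors the tunnel while 8.8.8.8 / 208.67.222.222 monitors the WAN.

```
/ip route
# tunnel monitoring targets are reachable only through the L2TP interface
add dst-address=8.8.4.4/32 gateway=VPN comment="monitor VPN 1"
add dst-address=208.67.220.220/32 gateway=VPN comment="monitor VPN 2"
# WAN monitoring targets always go out through LTE, even while the tunnel provides the default route
add dst-address=8.8.8.8/32 gateway=LTE comment="monitor WAN 1"
add dst-address=208.67.222.222/32 gateway=LTE comment="monitor WAN 2"
```

Because a /32 is more specific than the default route, pings to 8.8.4.4 should not fall back to LTE when the tunnel is down, provided the /32 route stays in force in that situation (the caveat mentioned above).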

So I keep on testing the firewall + the failover. So far the firewall seems to do what it’s supposed to. I will keep working on it during the night and tomorrow to make further adaptations to devices etc., so I truly appreciate you setting my logic and approach straight. I also added your suggestion regarding the route for monitor-vpn. However, I noticed something odd: with the script I lose the LTE connection almost every few minutes, but without the script it seems to be OK. So I wonder, do you believe that because the script is looking for certain changes, it might inadvertently collide with something related to the LTE connection itself? Very odd observation and behavior…

/ip firewall address-list
add list=netwatch-vpn-targets address=8.8.4.4
add list=netwatch-vpn-targets address=208.67.220.220
add list=netwatch-wan-targets address=8.8.8.8
add list=netwatch-wan-targets address=208.67.222.222

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-wan-targets

/ip route
add check-gateway=ping comment=VPN_Ping_Check_1 distance=1 dst-address=8.8.4.4/32 gateway=VPN
add check-gateway=ping comment=VPN_Ping_Check_2 distance=1 dst-address=208.67.220.220 gateway=VPN

/system script
add name=operation_lte_vpn owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source="# create the variables if they \
    do not exist yet, be optimistic and set them to green status\r\
    \nif ([/system script environment print count-only where name=lteAlive]=0) do={global lteAlive 1}\r\
    \nif ([/system script environment print count-only where name=vpnAlive]=0) do={global vpnAlive 1}\r\
    \n# compare the status of address-list monitor-lte to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-lte]=0) do={\\\r\
    \n  if ([/system script environment get [find name=lteAlive] value]>0) do={\r\
    \n    global lteAlive 0;\r\
    \n    /interface lte set [find name=LTE] disable=yes\r\
    \n\t/interface lte set [find name=LTE] disable=no \r\
    \n\t/ip firewall address-list add list=monitor-lte address=8.8.8.8 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=lteAlive] value]=0) do={\r\
    \n    global lteAlive 1\r\
    \n  }\r\
    \n}\r\
    \n# compare the status of address-list monitor-vpn to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-vpn]=0) do={\\\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]>0) do={\r\
    \n    global vpnAlive 0;\r\
    \n        /interface l2tp-client set [find name=VPN] disabled=yes\r\
    \n\t\t/interface l2tp-client set [find name=VPN] disabled=no\r\
    \n        /ip firewall address-list add list=monitor-vpn address=8.8.4.4 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]=0) do={\r\
    \n    global vpnAlive 1\r\
    \n  }\r\
    \n}"

/system scheduler
add interval=1m5s name=operation_lte_vpn on-event="/system script run operation_lte_vpn" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup

Don’t add the addresses of the monitored DNS servers to the /ip firewall address-list manually (or, better to say, remove that part from the configuration), they must be added there dynamically by the mangle rules only, otherwise the whole concept cannot work. I hope the dynamically added ones set the timeout to the ones added statically, but I’m not sure how that works together.

And show me the /tool netwatch export. The mangle rules add the addresses to the lists for 5 seconds only; have you set netwatch to shorter intervals? The trick is that the address-list-timeout must be several times longer than the netwatch ping interval.
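
As a sketch of the timing relationship, with illustrative values rather than recommendations: netwatch pings every 5s and the list entry lives for 20s, so a few lost replies do not empty the list. Matching the source address directly in the mangle rule is my simplification to keep the fragment self-contained.

```
/tool netwatch
# ping the tunnel monitoring target every 5 seconds
add host=8.8.4.4 interval=5s
/ip firewall mangle
# each echo reply (icmp type 0) from the target refreshes the list entry for 20 seconds
add chain=prerouting protocol=icmp icmp-options=0:0-255 src-address=8.8.4.4 action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=20s
```

With these numbers, the list only becomes empty after roughly four consecutive pings have gone unanswered.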

Haven’t added any addresses to DNS-address-list.. and

/ip firewall address-list manually (or, better to say, remove that part from the configuration

can you be more specific?

Ok… so I made the change to Mangle=15s, Netwatch=5s and Script=1m5s.

It did resolve the disconnection issue… but now it doesn’t restore either the LTE or the VPN interface when it is down. Have I done too much?

/ip firewall address-list
add list=netwatch-vpn-targets address=8.8.4.4
add list=netwatch-vpn-targets address=208.67.220.220
add list=netwatch-wan-targets address=8.8.8.8
add list=netwatch-wan-targets address=208.67.222.222

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=15s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=15s chain=netwatch-monitors src-address-list=netwatch-wan-targets

/tool netwatch export
add host=8.8.8.8 interval=5s
add host=8.8.4.4 interval=5s
add host=208.67.222.222 interval=5s
add host=208.67.220.220 interval=5s

/ip route
add check-gateway=ping comment=VPN_Ping_Check#1 distance=1 dst-address=8.8.4.4/32 gateway=VPN
add check-gateway=ping comment=VPN_Ping_Check#2 distance=1 dst-address=208.67.220.220 gateway=VPN

/system script
add name=operation_lte_vpn owner=admin policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source="# create the variables if they \
    do not exist yet, be optimistic and set them to green status\r\
    \nif ([/system script environment print count-only where name=lteAlive]=0) do={global lteAlive 1}\r\
    \nif ([/system script environment print count-only where name=vpnAlive]=0) do={global vpnAlive 1}\r\
    \n# compare the status of address-list monitor-lte to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-lte]=0) do={\\\r\
    \n  if ([/system script environment get [find name=lteAlive] value]>0) do={\r\
    \n    global lteAlive 0;\r\
    \n    /interface lte set [find name=LTE] disable=yes\r\
    \n\t/interface lte set [find name=LTE] disable=no \r\
    \n\t/ip firewall address-list add list=monitor-lte address=8.8.8.8 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=lteAlive] value]=0) do={\r\
    \n    global lteAlive 1\r\
    \n  }\r\
    \n}\r\
    \n# compare the status of address-list monitor-vpn to the previous one and take action if it became empty\r\
    \nif ([/ip firewall address-list print count-only where list=monitor-vpn]=0) do={\\\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]>0) do={\r\
    \n    global vpnAlive 0;\r\
    \n        /interface l2tp-client set [find name=VPN] disabled=yes\r\
    \n\t\t/interface l2tp-client set [find name=VPN] disabled=no\r\
    \n        /ip firewall address-list add list=monitor-vpn address=8.8.4.4 timeout=1m5s\r\
    \n  }\r\
    \n} else={\r\
    \n  if ([/system script environment get [find name=vpnAlive] value]=0) do={\r\
    \n    global vpnAlive 1\r\
    \n  }\r\
    \n}"

/system scheduler
add interval=1m5s name=operation_lte_vpn on-event="/system script run operation_lte_vpn" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup

Remove this part of configuration:

/ip firewall address-list
add list=netwatch-vpn-targets address=8.8.4.4
add list=netwatch-vpn-targets address=208.67.220.220
add list=netwatch-wan-targets address=8.8.8.8
add list=netwatch-wan-targets address=208.67.222.222



Now it’s me who doesn’t understand what you mean. “Disconnection issue solved, interface down issue not solved” doesn’t make sense to me. Bear in mind that English is not my native language, so please describe the two issues, which sound the same to me, in more detail.


/ip route
add check-gateway=ping comment=VPN_Ping_Check#1 distance=1 dst-address=8.8.4.4/32 gateway=VPN
add check-gateway=ping comment=VPN_Ping_Check#2 distance=1 dst-address=208.67.220.220 gateway=VPN

Is VPN an interface name or an IP address?

Where are the routes for 8.8.8.8/32 and 208.67.222.222? Have you read the post in the “DHCP server stopped” topic?

Ok… removed the addresses from ‘address-list’.

The initial issue with the script was that I kept getting disconnected from the internet every few minutes. Then you suggested checking the times for the mangle rule:

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=5s chain=netwatch-monitors src-address-list=netwatch-wan-targets

#changed to

/ip firewall mangle
add action=jump chain=prerouting icmp-options=0 jump-target=netwatch-monitors protocol=icmp
add action=add-dst-to-address-list address-list=netwatch-vpn-responses address-list-timeout=15s chain=netwatch-monitors src-address-list=netwatch-vpn-targets
add action=add-dst-to-address-list address-list=netwatch-wan-responses address-list-timeout=15s chain=netwatch-monitors src-address-list=netwatch-wan-targets

then Netwatch

/tool netwatch export
add host=8.8.8.8 interval=15s
add host=8.8.4.4 interval=15s
add host=208.67.222.222 interval=15s
add host=208.67.220.220 interval=15s

#changed to

/tool netwatch export
add host=8.8.8.8 interval=5s
add host=8.8.4.4 interval=5s
add host=208.67.222.222 interval=5s
add host=208.67.220.220 interval=5s

Now, after the changes, I have a stable internet connection BUT the script is not working. When I tested it by manually disabling the ‘LTE’ interface, it didn’t see the changed state and the interface stayed disabled. The same happens when I manually disable the VPN interface… it stays disabled until I manually enable it again.

/ip route
add check-gateway=ping comment=VPN_Ping_Check#1 distance=1 dst-address=8.8.4.4/32 gateway=VPN
add check-gateway=ping comment=VPN_Ping_Check#2 distance=1 dst-address=208.67.220.220 gateway=VPN

VPN interface, not IP

I thought you only told me to set up routes for the VPN, since the other two go out via the LTE IP directly if the VPN is down. Basically, when the VPN is online, all 4 respond via the IP of the L2TP server. But when the L2TP tunnel is down, they all still respond via the LTE gateway IP, so all 4 IPs are always either UP or DOWN together.

P.S. What is your native language, if you don’t mind me asking? English is my second language too.

That’s different to my understanding on how it should work. Please show me the output of /ip route print when VPN is up and when it is down. When obfuscating addresses, replace them with meaningful strings.



Czech. Some other languages supported too :)