SIP Packets dropped unless Torch running

Hi all,
Router: hEX, RouterOS 6.48.1. Single WAN, simple masquerade NAT rule.

Bit of a strange one here - in a recent update something strange has started happening with SIP packets. Nothing was changed in the configuration, and it’s been working fine for 2 years, hence my thinking that this has happened with an update.

All our IP phones keep simultaneously unregistering on the LAN (SIP server is Azure cloud based).

However (this is the odd bit)… if you start running Torch looking for packets on SIP ports… it all starts working again. Close Torch, and within a matter of minutes all phones are unregistered again.

For a “read only” tool, I’m a bit lost as to how this is influencing behaviour of the firewall, and making it start behaving how it should be (and how it has been for the last two years!).


Any thoughts/ideas gratefully received!



Andy

Without seeing your config, it is just a guessing game.
Torch disables a couple of things while it is running, e.g. FastTrack. So if you have mangle rules for the phones and FastTrack enabled, disable FastTrack, restart the router and test.
If that does not solve the problem, post your config between code brackets.

@CZFan - thanks for that. I tried disabling Fasttrack but it’s made no difference.

Please find config below:

# mar/10/2021 11:50:13 by RouterOS 6.48.1
# software id = DWNV-LDG5
#
# model = RouterBOARD 750G r3
# serial number = 6F3807E26367
/interface bridge
add admin-mac=64:D1:54:75:A3:34 arp=proxy-arp auto-mac=no comment=\
    "created from master port" name=bridge1 protocol-mode=none
/interface ethernet
set [ find default-name=ether1 ] speed=100Mbps
set [ find default-name=ether2 ] name=ether2-master speed=100Mbps
set [ find default-name=ether3 ] speed=100Mbps
set [ find default-name=ether4 ] speed=100Mbps
set [ find default-name=ether5 ] speed=100Mbps
/interface pppoe-client
add add-default-route=yes disabled=no interface=ether1 keepalive-timeout=\
    disabled name=PPPoE_WightWireless user=fwaxxxxxx@gointernet.co.uk
/interface l2tp-server
add name=l2tp-in1 user=andy.squibb
/interface list
add exclude=dynamic name=discover
add name=mactel
add name=mac-winbox
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=SUKL_Mikrotik
/ip ipsec profile
set [ find default=yes ] dh-group=modp1024 enc-algorithm=aes-256 \
    hash-algorithm=sha256
/ip ipsec proposal
set [ find default=yes ] auth-algorithms=sha256,sha1 enc-algorithms=\
    aes-256-cbc,aes-128-cbc
/ip pool
add name=dhcp ranges=192.168.2.101-192.168.2.239
add name=dhcpVPN ranges=192.168.2.240-192.168.2.250
/ip dhcp-server
add address-pool=dhcp authoritative=after-2sec-delay disabled=no interface=\
    bridge1 lease-time=1w3d name=defconf
/ppp profile
add bridge=bridge1 local-address=192.168.2.1 name=PPP_VPN remote-address=\
    dhcpVPN
/user group
set full policy="local,telnet,ssh,ftp,reboot,read,write,policy,test,winbox,passw\
    ord,web,sniff,sensitive,api,romon,dude,tikapp"
/interface bridge port
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=ether2-master
/ip neighbor discovery-settings
set discover-interface-list=discover
/interface l2tp-server server
set default-profile=PPP_VPN enabled=yes use-ipsec=yes
/interface list member
add interface=bridge1 list=discover
add interface=ether3 list=discover
add interface=ether4 list=discover
add interface=ether5 list=discover
add list=discover
add interface=bridge1 list=mactel
add interface=bridge1 list=mac-winbox
/ip address
add address=192.168.2.1/24 comment=defconf interface=bridge1 network=\
    192.168.2.0
/ip dhcp-client
add comment=defconf disabled=no interface=ether1
/ip dhcp-server lease
add address=192.168.2.99 client-id=1:0:d0:2d:f6:ef:4 comment=Alarm mac-address=\
    00:D0:2D:F6:EF:04 server=defconf
add address=192.168.2.66 client-id=1:b8:ca:3a:ba:1c:d mac-address=\
    B8:CA:3A:BA:1C:0D server=defconf
/ip dhcp-server network
add address=192.168.2.0/24 comment=defconf gateway=192.168.2.1 netmask=24
/ip dns
set allow-remote-requests=yes servers=8.8.8.8,8.8.4.4
/ip dns static
add address=192.168.2.1 name=router
/ip firewall filter
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" \
    connection-state=established,related disabled=yes
add action=accept chain=forward comment="defconf: accept established,related" \
    connection-state=established,related
add action=drop chain=forward comment="defconf: drop invalid" connection-state=\
    invalid
add action=drop chain=input comment="DROP DNS" dst-port=53 in-interface=ether1 \
    protocol=tcp
add action=accept chain=input dst-port=5060 in-interface=PPPoE_WightWireless \
    protocol=tcp src-address=51.140.191.159
add action=accept chain=input dst-port=5060 in-interface=PPPoE_WightWireless \
    protocol=udp src-address=51.140.191.159
add action=drop chain=forward comment=\
    "defconf:  drop all from WAN not DSTNATed" connection-nat-state=!dstnat \
    connection-state=new in-interface=ether1
add action=accept chain=input in-interface=ether1 protocol=ipsec-esp
add action=drop chain=input comment="DROP DNS" dst-port=53 in-interface=\
    PPPoE_WightWireless protocol=udp
add action=drop chain=input protocol=icmp
add action=accept chain=input connection-state=established
add action=accept chain=input connection-state=related
add action=drop chain=input dst-address=193.31.66.165 dst-port=80 in-interface=\
    PPPoE_WightWireless protocol=tcp
add action=drop chain=input in-interface=ether1 protocol=tcp src-port=8080,80
add action=drop chain=input disabled=yes in-interface=ether1 log=yes
/ip firewall nat
add action=masquerade chain=srcnat out-interface=PPPoE_WightWireless
add action=dst-nat chain=dstnat dst-port=10001 in-interface=PPPoE_WightWireless \
    protocol=tcp to-addresses=192.168.2.99 to-ports=10001
add action=dst-nat chain=dstnat dst-port=10001 in-interface=PPPoE_WightWireless \
    protocol=udp to-addresses=192.168.2.99
add action=dst-nat chain=dstnat disabled=yes dst-port=5060,5061 in-interface=\
    PPPoE_WightWireless protocol=tcp to-addresses=192.168.2.21
add action=dst-nat chain=dstnat disabled=yes dst-port=5060,5061 in-interface=\
    PPPoE_WightWireless protocol=udp to-addresses=192.168.2.21
add action=dst-nat chain=dstnat disabled=yes dst-port=5090 in-interface=\
    PPPoE_WightWireless protocol=tcp to-addresses=192.168.2.21
add action=dst-nat chain=dstnat disabled=yes dst-port=5090 in-interface=\
    PPPoE_WightWireless protocol=udp to-addresses=192.168.2.21
add action=dst-nat chain=dstnat disabled=yes dst-port=9000-9049 in-interface=\
    PPPoE_WightWireless protocol=udp to-addresses=192.168.2.21
/ip firewall service-port
set sip disabled=yes
/ip service
set telnet disabled=yes
set ftp disabled=yes
set ssh disabled=yes
set api disabled=yes
set api-ssl disabled=yes
/ip ssh
set allow-none-crypto=yes forwarding-enabled=remote
/ip upnp
set enabled=yes
/ppp secret
add name=andy.squibb profile=PPP_VPN
/system clock
set time-zone-name=Europe/London
/system identity
set name=SUKL_Mikrotik
/system resource irq rps
set ether1 disabled=no
set ether3 disabled=no
set ether4 disabled=no
set ether5 disabled=no
set ether2-master disabled=no
/system routerboard settings
set auto-upgrade=yes
/system scheduler
add interval=1d name=3amReboot on-event="/system reboot" policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon \
    start-date=nov/01/2018 start-time=03:00:00
add disabled=yes interval=5m name=WAN_checkalive on-event=\
    "/system script run checkWAN" policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon \
    start-date=jan/01/1970 start-time=00:00:00
/system script
add dont-require-permissions=no name="Restart ether1" owner=admin policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=":gl\
    obal runningRestartEther1\
    \n:if ([:len \\\$runningRestartEther1] = 0 || \$runningRestartEther1 = 0) do\
    \n={\
    \n    /interface {\
    \n        :set runningRestartEther1 1\
    \n        :local o [find name=\\\"ether1\\\"]\
    \n        set \\\$o disabled=yes\
    \n       :delay 5\
    \n       set \\\$o disabled=no\
    \n       :delay 2\
    \n       :set runningRestartEther1 0\
    \n   }\
    \n}"
add dont-require-permissions=no name=checkWAN owner=admin policy=\
    ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=":if\
    \_([/ping 8.8.8.8 interface=PPPoE_WightWireless count=3] = 0) do={;\
    \n:log info \"restarting WAN (ether1) interface\";\
    \n[/interface disable ether1];\
    \n:delay 5;\
    \n[/interface enable ether1];\
    \n}"
/tool graphing interface
add interface=PPPoE_WightWireless
/tool mac-server
set allowed-interface-list=mactel
/tool mac-server mac-winbox
set allowed-interface-list=mac-winbox
/tool netwatch
add disabled=yes down-script=\
    "/system scheduler set [find name=\\\"Restart ether1\\\"] disabled=yes" \
    host=8.8.8.8 up-script=\
    "/system scheduler set [find name=\\\"Restart ether1\\\"] disabled=no"
/tool sniffer
set streaming-server=0.0.0.0:sip

Did you restart the router after disabling FastTrack? If not, the fasttracked connections in the connection tracking table stay active until they time out, and connections that keep carrying traffic can stay active indefinitely.
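
For reference, a minimal sketch of that step on the RouterOS 6.x command line, assuming the only FastTrack rule is the defconf one in the forward chain, as in the config above. Flushing the connection table is suggested here as an alternative to a reboot and was not tested in this thread; it briefly interrupts all traffic through the router.

```
# Disable every fasttrack rule in the filter table
# (assumption: FastTrack is not needed for anything else while testing)
/ip firewall filter disable [find action=fasttrack-connection]
# Drop the existing connection tracking entries so nothing stays fasttracked
# (assumption: a short interruption of all connections is acceptable)
/ip firewall connection remove [find]
# Check that the rule now shows the X (disabled) flag
/ip firewall filter print where action=fasttrack-connection
```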

Your firewall accepts established and related packets, so if the phone initiates the connection outbound, the replies should be allowed back in.

With the current config you posted, there are 2 lines that are not correct, see below:
add action=accept chain=input dst-port=5060 in-interface=PPPoE_WightWireless protocol=tcp src-address=51.140.191.159
add action=accept chain=input dst-port=5060 in-interface=PPPoE_WightWireless protocol=udp src-address=51.140.191.159

These can cause incoming SIP packets to be accepted by the router itself, but the router is not going to do anything with them as it is not a SIP device. Remove these and test; the accept established/related rule in the forward chain should handle the phone traffic.
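
A minimal sketch of removing those two rules from the terminal, assuming they are the only accept rules in the input chain that match port 5060. Check the print output before removing anything; the exact `find` filter is an illustration, not something posted in this thread.

```
# List the candidate rules first
/ip firewall filter print where chain=input dst-port=5060
# Remove them once the output shows only the two SIP accept rules
/ip firewall filter remove [find chain=input dst-port=5060 action=accept]
```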

I also read somewhere on the forum that someone had to disable the MNDP protocol for their phones to register after some recent update. I am not sure whether the phones / PBX mentioned in that post were all on the same LAN or not.

Those rules were partially my fault - I’d created them while fault finding at the start of the week.

I had rebooted after disabling fast track, but I’ve just tried again after removing those two rules to no avail.

I’ve had MNDP disabled on all interfaces for the last 15 minutes and it does seem to have done the trick… fingers crossed, and I’ll see if it’s still working over the next few hours!


Thanks @CZFan!

So yep - definitely was the MNDP that was causing the issue. Fasttrack is enabled and it’s absolutely rock solid again, so it’s all boiled down to the SIP packets falling foul of MNDP.
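
For anyone hitting the same symptom, a minimal sketch of the workaround on RouterOS 6.48, assuming neighbor discovery is not needed on any interface. `none` is a built-in interface list. Note that this setting turns off neighbor discovery on the router as a whole (MNDP along with CDP and LLDP), not MNDP alone.

```
# Stop sending and receiving neighbor discovery packets on every interface
/ip neighbor discovery-settings set discover-interface-list=none
# Confirm the setting
/ip neighbor discovery-settings print
```

To undo it, set discover-interface-list back to the previous list (`discover` in the config above).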

I had a search of the forums but couldn’t find the post you mentioned - was going to see whether it had been escalated as a bug.

Thanks for your help!


Andy

It has been reported, and Mikrotik has already identified the root cause and is working on a fix; the fix may even already be present in 6.49.beta-something.

If I get it correctly, it is actually because some SIP phones auto-learn the MAC address of the gateway IP from the received MNDP packets rather than using only ARP responses to fill their ARP table, and the frames carrying the MNDP packets are sent from the MAC address of the physical interface even if that interface is a member port of a bridge. And when a frame arrives addressed to the MAC address of a physical interface rather than that of the bridge, RouterOS ignores it, because the bridge (as in “virtual switch”) can see no reason to forward it to the bridge port (as in “the virtual Ethernet port of the router connected to the virtual Ethernet port of the virtual switch”).
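
One way to check this on an affected network, as a rough sketch: print the MAC address of the bridge and that of the physical port a phone is plugged into, then compare them with the gateway MAC the phone has learned for 192.168.2.1. The names bridge1 and ether3 come from the config above; ether3 being the phone's port is an assumption.

```
# MAC address of the bridge, which carries the 192.168.2.1 address
:put [/interface bridge get [find name=bridge1] mac-address]
# MAC address of the physical member port (assumed here to be the phone's port)
:put [/interface ethernet get [find default-name=ether3] mac-address]
```

If the phone holds the second address for its gateway rather than the first, that matches the behaviour described above.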

My guess is that the reason why it works when torch is running is that activation of torch bypasses the standard L2 forwarding on the bridge, so the IP stack gets the frames from the phone even though they come with a wrong destination MAC address.

@networquk, pleasure, glad I could be of some assistance

@sindy, you are a blessing to the Mikrotik community, thank you and also thanks for the explanation, makes more sense to me now

https://forum.mikrotik.com/viewtopic.php?f=21&t=171035&p=840920&hilit=Disable+mndp#p840552

My guess is that the reason why it works when torch is running is that activation of torch bypasses the standard L2 forwarding on the bridge, so the IP stack gets the frames from the phone even though they come with a wrong destination MAC address.

If torch is activated on an interface, all ingress packets on that interface are forwarded to the CPU, regardless of MAC addresses, VLANs, IP addresses, bridge rules, fastpath etc.
Torch is software running on the CPU, so this is required for torch to be able to see all incoming packets on an interface.


For a “read only” tool, I’m a bit lost as to how this is influencing behaviour of the firewall, and making it start behaving how it should be (and how it has been for the last two years!).

Torch is read-only in the sense that it does not modify or produce any network packets. But it changes which packets are visible to the CPU, which can change behaviour because software sees packets it normally would not. In the current case, this “fixes” reception of incoming SIP packets sent to the wrong MAC address, as described by @sindy.

So what you are saying is that I was wrong when mentioning the bridge, because it is actually the switch chip hardware that doesn’t forward those frames to the CPU port, not the bridge software running on the CPU that doesn’t forward them to the L3 interface. I.e. setting hw=no on the ports through which the phones are connected should do the same thing as running the torch does: let those frames be delivered to the CPU port.

I.e. setting hw=no on the ports through which the phones are connected should do the same thing as running the torch does: let those frames be delivered to the CPU port.

This will not help. The switch logic is the same, HW or virtual in SW. There is no reason for it to forward such a packet to the CPU port.
Except torch tells it to do so.

What probably would work is to create a switch host rule forwarding packets addressed to the physical ethernet port MAC to the switch CPU port. But this most likely has unwanted side effects on other functions and might impact security. Disabling discovery seems much easier and safer.

The root cause is that discovery makes the MAC addresses of physical bridge ports visible to the network. But physical bridge ports inherently cannot have a MAC address. This is fixed in the current beta.

Still, I’m not convinced by the idea of SIP phones snooping MACs out of network equipment discovery packets instead of using proper ARP. Probably it’s some phone autoconfig thing going wrong.