Router "blocks" some SIP INVITES but not all - misconfiguration or bug?

Hello, I’m using an RB4011iGS+ which shows some strange behavior for incoming SIP calls. Probably it’s a misconfiguration on my end but I cannot find it.

Problem: Some incoming SIP calls are always blocked, but only when calling a number of VoIP provider D. According to a specialist at provider D, the SIP INVITE is blocked at my end. No firewall rules are in place that are based on Layer 7 analysis or that affect the VoIP devices. Moreover, adding forward “accept all” rules for the incoming and outgoing traffic of the VoIP devices at the beginning of the rules list did not help. Replacing the RB4011 with a simple layer 2 switch solves the problem.

Setup

Modem-Router (DHCP Server 192.168.1.1) --> RB4011 v6.47.1 (Port1, DHCP Client 192.168.1.2)
  route 192.168.88.0/24                           Ports 2-10 (bridge1) -> VoIP device 1 (Yealink), VoIP device 2, some computer ...

Details: I’m using three different VoIP providers (S, P, and D). All accounts are registered on my Yealink T48S (and other devices). Outgoing calls always work. If mobile phone 1 (provider M1) tries to call my number D1 (VoIP provider D), my Yealink does not ring; nothing happens, and after two minutes or so the mobile phone stops trying. If mobile phone 2 (provider M2) calls the same number D1, it does ring and the call can be established.
Surprisingly, if mobile phone 1 (provider M1) tries to call my number S1 (VoIP provider S) or number P1 (VoIP provider P), the Yealink rings and calls can be established. I’m no SIP expert, but I assume that provider M1 always sends the same SIP INVITE format no matter which provider is called. If the RB4011 is blocking provider M1, it should be doing so even if the content of the SIP INVITE differs (same target device IP but different provider account).

Why do I think the RB4011 causes the problem? Tests indicate that the problem is independent of the specific device that registers the VoIP accounts. Additionally, when I replace the RB4011 with a layer 2 switch, all calls come through as expected. Although I have no firewall rule in place that blocks or redirects SIP traffic and I’m not using Layer 7 parsing at all, I cannot rule out a misconfiguration.

Maybe someone has already come across such behavior and can give me a hint. I’m not sure what config information is needed to understand my configuration and the problem, so I have provided an excerpt that I thought might be relevant, see below. If it’s insufficient, please tell me what is needed.

Thanks for your help.


config excerpt
SIP helper is currently deactivated, problem exists if activated no matter if direct-media is checked or not:

 /ip firewall service-port print
Flags: X - disabled, I - invalid 
 #   NAME                                                    PORTS
 4 XI sip                                                    5060 5061
/ip dhcp-server print 
Flags: D - dynamic, X - disabled, I - invalid 
 #    NAME                             INTERFACE                           RELAY           ADDRESS-POOL                           LEASE-TIME ADD-ARP
 0    dhcp1                            bridge1                                             pool88                                 30m       
 
 /ip address print 
Flags: X - disabled, I - invalid, D - dynamic 
 #   ADDRESS            NETWORK         INTERFACE                                                                                                   
 0   192.168.88.1/24    192.168.88.0           bridge1

When you replace the 4011 by an L2 switch, you remove one layer of NAT and routing - the Yealink gets its IP address from the modem-router (192.168.1.x) rather than from the 4011.

The INVITEs from the exchange only reach the Yealink because a pinhole in the NAT (or both NATs) has been created by the REGISTER message sent by the phone towards the exchange. The pinhole has a lifetime, which is 3 minutes with the RouterOS default settings; with the SIP helper enabled, the timeout is 60 minutes no matter what the actual lifetime of the registration is. So first of all, you should run /ip firewall connection print interval=1s where dst-address~"ip.of.D's.SIP.server" and see whether it shows any connection and whether its lifetime decreases and gets reset to 3m every now and then. As you say it wasn’t working when the SIP helper was active, I don’t think the pinhole expires because keepalives are sent too rarely, but start from this check anyway.
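To make the pinhole check above concrete, here is a minimal Python sketch of what the timeout column should do. It is only a model, not RouterOS behaviour: the function names are made up for illustration, and it assumes the phone sends a packet towards the exchange at perfectly regular intervals and that every such packet resets the entry to the full timeout.

```python
def pinhole_stays_open(refresh_interval_s: float, nat_timeout_s: float) -> bool:
    """True if outgoing traffic always refreshes the NAT entry before it expires."""
    return refresh_interval_s < nat_timeout_s


def remaining_lifetime(t_s: float, refresh_interval_s: float, nat_timeout_s: float) -> float:
    """Seconds left on the NAT entry at time t_s.

    Assumes the phone sent a packet at t = 0, interval, 2 * interval, ...
    A result of 0.0 means the pinhole is closed at that moment, so an
    incoming INVITE would not be forwarded until the phone sends again.
    """
    since_last_packet = t_s % refresh_interval_s
    return max(0.0, nat_timeout_s - since_last_packet)


if __name__ == "__main__":
    # Hypothetical numbers: 3-minute UDP timeout, phone refreshing every 60 s vs. every 300 s.
    for interval in (60, 300):
        print(interval, pinhole_stays_open(interval, 180), remaining_lifetime(200, interval, 180))
```

If the timeout shown by the connection print keeps counting down and jumps back to 3m before reaching zero, you are in the first case; if it reaches zero and the entry disappears for a while, you are in the second.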

Most phones don’t like it when the request-URI contains an IP address other than their own. There may be a SIP ALG in the modem-router that overwrites some fields of the REGISTER and/or INVITE, or the exchange of D may handle the Contact in the registration in a slightly different way than those of S and P, so the INVITE may contain a wrong IP in the request-URI.

So the next step would be to sniff the traffic on the 4011 into a file - first configure /tool sniffer on the 4011 to sniff into a file, then run /tool sniffer quick ip-address=ip.of.D's.SIP.server and make the call, then stop the sniffer (press q), download the file and use Wireshark to check whether the INVITE has arrived at all, and what IP addresses are in it. You should see each packet three times: once from WAN with local-side address 192.168.1.2, and twice with the 192.168.88.x address of the Yealink as the local-side one - once from the LAN bridge and once from the physical port of the bridge. So if all three “copies” of the INVITE are there, the Yealink doesn’t like the contents of the INVITE and silently ignores it; if you can see it only once, it’s from the WAN interface and the firewall doesn’t let it in; if you can’t see it at all, the modem-router blocks it, or D’s gear has sent it somewhere else or hasn’t sent it at all.
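The interpretation of the capture described above boils down to a small decision table. A sketch in Python, where the function name and the wording of the results are just for illustration; a count of two is not covered by the description, so it is reported as inconclusive rather than guessed at:

```python
def diagnose_invite(copies_seen: int) -> str:
    """Map the number of copies of one INVITE seen in the 4011 capture to a likely cause."""
    if copies_seen < 0:
        raise ValueError("copies_seen must not be negative")
    if copies_seen == 0:
        return "INVITE never reached the 4011: modem-router or provider side"
    if copies_seen == 1:
        return "INVITE arrived on WAN only: the 4011 did not forward it"
    if copies_seen >= 3:
        return "INVITE was forwarded to the LAN: the phone ignores it"
    return "inconclusive: check which interfaces the copies were seen on"


if __name__ == "__main__":
    for n in range(4):
        print(n, "->", diagnose_invite(n))
```

In Wireshark, the interface a copy was captured on is easiest to infer from the local-side address (192.168.1.2 versus 192.168.88.x), as explained above.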

Dear sindy, thank you for your explanation and the hints to track the error. I really appreciate it! I did what you suggested to track down the reason for the blocked/dropped INVITEs. I don’t understand why the RB4011 seems to drop some INVITEs. Maybe I’m overlooking something.


Well, I thought that my configuration between RB4011 and modem-router does not use NAT because of the static route. Just to clarify, I’ve no NAT from RB4011 to the modem-router.
As you mentioned the SIP ALG of the modem-router, it is deactivated. I assume that it does not alter the SIP packets in any regard.


If I’m not mistaken, the only NAT is done by the modem-router (LAN to WAN). Anyway, if this was true, shouldn’t all incoming calls have the same problem?


The SIP registrations are reset to 3m every now and then. When calling D1 with M1 no connection is added. When calling D1 from M2 another connection shows up (~10s timeout).


To be honest, I don’t understand the rationale behind that. I’m using a Yealink, a Gigaset, and even tried an Asterisk server. How likely is it that all of them “don’t like” something that causes the problem? More importantly, I don’t understand how the phones could “dislike” something in the INVITE when placed behind the RB4011 but “like” the same INVITE header when not behind the RB4011.


I thought so too. But calls from M2 to D1 should be blocked too, right? The specialist of provider D was not clear about whether the servers of provider D are rewriting calls from M2 but not from M1.


Done it. Started sniffing, connected VoIP device and waited for registration. Called D1 from M2 and then called D1 from M1 and after that called D1 from M2 again.
I saw the following behavior:

3* REGISTER
2* 401 Unauthorized
3* REGISTER
2* 200 OK
3* SUBSCRIBE
2* 489 Bad event
// now M2 calls D1
2* INVITE sip:D1-phone-number@my-public-ip:port SIP/2.0, To: <sip:D1@provider-D.tld>, From: <sip:mobile-number@provider-M2.tld>
3* 100 Trying
3* Message: Binding Request
2* Message: Binding Response
6* UDP 5160 -> 5060 packets
3* 180 Ringing
// I did not answer
2* Request: CANCEL sip:D1-phone-number@my-public-ip:port
3* 200 OK
3* 487 Request Cancelled
2* Request: ACK sip:D1-phone-number@my-public-ip:port
// end of successful calling/ringing.

// Called D1 from M1, nothing registered but 3 UDP packets. No idea if these are related to the call
3* UDP 5160 -> 5060
// few seconds later I called D1 with M2 again
2* INVITE sip:D1-phone-number@my-public-ip:port
... same as above.



When calling from M1 to D1, no packets were shown. I could rule out that the modem-router blocks them because I added one VoIP device behind the RB4011 and one directly at the modem-router. The latter rang.
As far as I understand it: INVITEs do get routed successfully from provider D to and through my modem-router. The VoIP device placed “before” the RB4011 gets all INVITEs no matter which provider is called and who is calling. The VoIP device behind the RB4011 only gets some INVITEs (M2 to D1 and all to providers S and P).
I had no time to swap the two VoIP devices to test this setup. But I don’t think it will make any difference.

Is there a mechanism/option other than /ip firewall that could be configured to drop packets?

Thanks for your help.
Best!

It took some more time, but now I had the opportunity to set up an hAP lite v6.47.1 with default configuration (NAT/masquerading active). I just changed the standard network .88 to .99 so that I can run both MikroTik devices on the same L2 switch. The route for 192.168.99.0/24 is enabled in the modem-router. All calls are coming through.
But I did some testing and could reliably switch between working and non-working config.
First, I disabled all firewall rules → all calls still coming through.
Second, and with deactivated firewall rules, I deactivated NAT/masquerading. Just in case I rebooted both the hAP lite and the VoIP device.

/ip firewall nat print
Flags: X - disabled, I - invalid, D - dynamic 
 0 X  ;;; defconf: masquerade
      chain=srcnat action=masquerade out-interface-list=WAN log=no 
      log-prefix="" ipsec-policy=out,none

After that D1 could be called with M2 but not with M1. Not having IP masquerading is causing the problem. Enabling NAT/masquerading and rebooting the VoIP device will get me a working configuration again.

Maybe I’m just overlooking a basic concept here and this is exactly how it should work? If I’m not mistaken this is double NAT. Why does double NAT work better than single NAT plus routing? I’m so confused.

However, does anyone have an idea how to change my RB4011 config to get it working? Hopefully not using double NAT. Would be fine with me if I could just NAT/masquerade the ether-ports for the VoIP devices and leave the rest alone. Not sure if this is possible.

Thanks for your patience and advice.
Best

SIP over NAT is a mess. I put my phone on a routable (public) IP address and all my issues were gone.
Before, I had similar problems as you had. It worked fine for one provider, it refused to work for another.
Never went down to the bits to analyze what is really going on. Of course I realize that not everyone has
IP addresses to spare… so it likely does not help you at all. Of course you can consider using IPv6.

And that’s the point. When configuring a firewall, the best approach is to drop everything except what you are sure you want to let through. When posting your configuration so that others could find an error, the best approach is to post everything except what could disclose your identity or allow an attack on your device - i.e. public IP addresses, usernames, and passwords/secrets. But replacing public IPs by just x.x.x.x may hide the internal logic of the configuration. See my automatic signature for further hints.

I had no idea whether you have kept the default masquerade rules in place on the 4011, so I guessed, and I chose the wrong variant :)


Look at the ports after the addresses. I suppose the connection with 10s timeout is an RTP one, carrying the alert tone as audio.


Quite a lot. SIP looks simple at second glance, but at the third one there are a lot of strict requirements.


That’s a puzzle to me too, and the only idea I have is the ALG on the router-modem which may handle differently addresses from its LAN subnet (192.168.1.x) and addresses from other subnets (in your case, 192.168.88.x). You say that you’ve deactivated it, but I’ve seen plenty of devices where you deactivate it and it interferes anyway. And combined with the behaviour of the D’s exchange, it may result in a problem.


Did you sniff on the 4011 the way I’ve described, or on the Yealink itself? If on the 4011, the conclusion that the INVITE has reached the 4011 is wrong, as it was not in the capture.

As for the UDP packets, some SIP providers send UDP packets using the same ports as if they carried SIP messages, but with content that is ignored by SIP stacks. Hence the firewalls (which don’t care about the payload of the UDP) treat them as legitimate traffic of the pinhole and don’t close it, whereas the SIP stack at worst complains to the log but does nothing harmful.
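To illustrate the difference between such filler packets and real SIP messages, here is a rough Python sketch. It relies only on the start-line format from RFC 3261 (a request line ends with the SIP version, a status line begins with it); the function name is made up, and a real SIP stack validates far more than this.

```python
def looks_like_sip(payload: bytes) -> bool:
    """Rough check whether a UDP payload starts with a SIP request or status line."""
    first_line = payload.split(b"\r\n", 1)[0]
    is_response = first_line.startswith(b"SIP/2.0 ")  # e.g. "SIP/2.0 200 OK"
    is_request = first_line.endswith(b" SIP/2.0")     # e.g. "INVITE sip:... SIP/2.0"
    return is_response or is_request


if __name__ == "__main__":
    samples = [
        b"INVITE sip:alice@example.com SIP/2.0\r\nVia: SIP/2.0/UDP example.com\r\n\r\n",
        b"SIP/2.0 180 Ringing\r\n\r\n",
        b"\r\n\r\n",          # a bare CRLF keepalive
        b"\x00\x00\x00\x00",  # arbitrary filler bytes
    ]
    for sample in samples:
        print(looks_like_sip(sample), sample[:30])
```

A firewall never makes this distinction, which is exactly why the filler packets keep the pinhole alive; Wireshark does, which is why they show up as plain UDP in the capture.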



There is one more idea which just dawned on me after reading that the From header contains the mobile operator’s domain name (at least in case of M2): packet fragmentation and reassembly. No matter how funny it may seem, the number of characters of the IP address of the phone may be different when it gets the address from the modem-router and when it gets it from the 4011, and the IP address of the phone appears several times in the message. So if the domain name in the From is longer in the call from M1 than in the call from M2, and the IP address of the phone is longer too, the size of the INVITE from M1 may exceed the MTU of the link. Then either the packet is simply lost when it is sent because it doesn’t fit within the MTU (there is no path MTU discovery for UDP), or it gets properly fragmented but the reassembly fails on the modem-router, and hence the packet is not forwarded to the 4011.

To confirm this, you’d have to change the LAN side IP address, IP pool, and IP DHCP server network at the 4011 to 192.168.8.0/24, and reduce the pool to 192.168.8.2-192.168.8.9, to make sure that the IP addresses won’t be longer than those provided by the modem-router. And then try the call from M1 again.
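The arithmetic behind this test can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: the example addresses, the number of times the phone's address appears in the INVITE (5), and an MTU of 1500 bytes (a PPPoE link typically has less). The 20-byte IPv4 header without options and the 8-byte UDP header are the standard sizes.

```python
def size_delta(old_ip: str, new_ip: str, occurrences: int) -> int:
    """Bytes a SIP message grows by when every occurrence of old_ip becomes new_ip."""
    return occurrences * (len(new_ip) - len(old_ip))


def fits_in_one_packet(sip_bytes: int, mtu: int = 1500,
                       ip_header: int = 20, udp_header: int = 8) -> bool:
    """True if a SIP message of sip_bytes fits into one unfragmented IPv4/UDP packet."""
    return sip_bytes + ip_header + udp_header <= mtu


if __name__ == "__main__":
    # Address handed out by the modem-router vs. addresses from the 4011 (hypothetical values).
    print(size_delta("192.168.1.2", "192.168.88.10", 5))  # longer address, message grows
    print(size_delta("192.168.1.2", "192.168.8.2", 5))    # same length, no growth
    print(fits_in_one_packet(1472), fits_in_one_packet(1473))
```

So a message that just fits with an address from 192.168.1.x can tip over the limit with one from 192.168.88.x, while an address from 192.168.8.2-192.168.8.9 can never be longer than one from the modem-router.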


This might be explained as above if the ALG in the modem-router is doing something about the addresses.

Thank you both for your replies.

@pe1chl
That’s an interesting idea. I’ll check if I can use IPv6 but this will take some time.

@sindy
I triple-checked the modem-router and SIP ALG is deactivated. I talked to the technical support of the vendor. They made clear that when SIP ALG is deactivated, the modem-router does not alter the packets in any way. However, they suggested putting the modem-router in modem-only mode to remove it from the equation. I will try that, but it will take some time until I have the opportunity to test it.

With regard to sniffing, I did exactly what you suggested. Sniffed at the RB4011, saved to a file, and analyzed it on a computer. But let me clarify what I meant by “INVITE reached RB4011”. From the fact that the INVITE gets to the VoIP device when it is connected to the L2 switch behind the modem-router, I came to the conclusion that the INVITE is not blocked by the modem-router. Moreover, I thought that packets are only shown after the firewall rules have been applied. On second thought, that does not make sense, sorry.

To test your MTU theory, I changed the IP network to 192.168.9.0 and assigned 192.168.9.9 to the VoIP device. Unfortunately, the INVITE did not get to the hAP lite (I sniffed the packets).

Thank you again for your thoughts. I’ll post an update after further tests.
Best

First, let me thank you for your support.

Due to time constraints, I chose to add the following firewall rule on my RB4011 as a workaround for my problem:

/ip firewall nat
add action=masquerade chain=srcnat comment="workaround for VoIP problem" ipsec-policy=out,none out-interface-list=WAN

The L2 switch is removed and all VoIP devices are connected to the RB4011. Not sure why I thought this might not be compatible with my setup, but I was wrong.

If I find some time in the near future to test IPv6 and modem-only mode, I’ll post an update.

Best

Do you still have your modem in bridge mode (modem only)?

Thanks for your reply.

The modem-router was not and is not in bridge/modem-only mode. It’s in router mode, i.e. handling PPPoE. If I find more time, it’s one of the two things I want to try.

Best

Hello, just a small update.
I wanted to test different settings and, therefore, I disabled the IP masquerading on the RB4011. No double NAT anymore just static routes.
I restarted my VoIP phone and, surprisingly, everything was working. Any local line could be reached by any mobile phone.
Maybe provider D changed something on their end, or the blocking effect needs some time to kick in? No idea. But in case I can find out what was going on, I’ll post again.
Best.