I recently discovered some odd behavior with Grandstream VoIP phones.
Sometimes incoming calls do not work properly and the phones do not ring. The phones are connected to the German provider 1&1, whose servers show some odd behavior I already discovered: when registering at sip.1und1.de, the hostname can resolve to two different IP addresses, and when an INVITE then comes from 1&1, it can originate from either of those two addresses. Which address is picked seems to be random/untraceable. I'm not completely sure whether my SIP problems come from this issue or from some router misconfiguration of mine. Maybe somebody can give me some advice. If my configuration seems right, is there an option to configure the firewall so that the Grandstream phone sees only one of those two addresses?
My router is an RB3011 running ROS 7.15.3.
Here are the necessary excerpts of my firewall.
I'm not sure whether endpoint-independent-nat is the right thing to avoid source port randomization in this place.
Additionally, I should mention that the SIP ALG is completely disabled, the phones are configured with the different non-standard source ports 5066 and 5067, and the NAT traversal option is set to "keep-alive".
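A minimal sketch of the kind of srcnat rule I mean (the subnet and interface-list name are placeholders, not my real export):

```
# Sketch only - 10.42.0.0/24 and the WAN list name are assumed here.
# For UDP, endpoint-independent-nat keeps the original source port,
# so no explicit port-randomization option needs to be touched.
/ip firewall nat
add chain=srcnat action=endpoint-independent-nat protocol=udp \
    src-address=10.42.0.0/24 out-interface-list=WAN
```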
The phones always show the registered state and are able to place outgoing calls all the time. Only some incoming calls are lost.
It is not a big deal that a SIP provider gives you a choice of multiple IP addresses to register to, but normal SIP providers understand how a typical firewall and NAT work and send incoming calls to registered users from the same public IP and port to which said users have chosen to register. But maybe at 1&1, 1 und 1 gibt 3 ("1 and 1 makes 3")? Unless your Grandstream can register to two registrars simultaneously (most phones I came across support two registrars only in primary/secondary mode, i.e. they only register to the backup one if they fail to register with the primary one), there may be issues.
The action=src-nat or action=masquerade in Mikrotik only changes the source port if it cannot be avoided (for example, two phones use the same local port, 5066, and register to the same SIP server; if the REGISTER packets leaving the WAN kept the source port 5066 unchanged for both phones, the connection tracking would not know to which phone to forward the response, as all the other fields of the response - source address, source port, destination address - would be the same too), or if you have explicitly asked for it by specifying some to-ports range as one of the output parameters of the NAT rule.
So you do not need action=endpoint-independent-nat to avoid randomisation of the source port, but it should handle the issue of the SIP provider sending the calls randomly from different SBCs even without any dst-nat rules, as it in a way creates their equivalents dynamically. When you send a packet from 10.42.0.5:5066 to any UDP socket in the internet and randomise-ports is set to no, a packet from any UDP socket in the internet that arrives to port 5066 on the public address of the router will get forwarded to 10.42.0.5:5066. The difference from a manual dst-nat rule is that such a packet will get dropped if it comes before the phone has sent something to the internet, whereas a manually set dst-nat rule would create a new pinhole in such a case.
I have never tested whether there is a conflict if the "dynamic dst-nat rule" created by endpoint-independent-nat coexists with a manually configured dst-nat rule matching on the same destination address and port. So I would suggest you disable the two dst-nat rules, remove the connections created by the phones' REGISTER requests from the list, force the phones to re-register, and then run /tool/sniffer/quick port=5067 while testing incoming calls. The goal would be to see how far the INVITE from the "wrong" SBC socket gets. If you can see it arrive at the WAN but not proceed to the phone, the firewall does not work as expected (maybe the filter needs some adjustment to accept packets whose destination addresses have changed thanks to the tracked connections created by endpoint-independent-nat); if it does make it to the phone, it may be the phone that only accepts incoming traffic from a particular source, which may or may not be configurable.
Just in case, what does /ip/firewall/filter print where chain=forward show when the phones are registered (don't forget to obfuscate public addresses, if any)?
Thanks for the clarification about endpoint-independent-nat. I wasn't absolutely sure what it actually does; I just thought I would need it because it was the only NAT action that offered to disable source port randomization.
So here's my output when the phones are registered, using the old setup from my post above:
(I've cut out some disabled rules and hotspot-related stuff.)
2 ;;; DDOS Protection
chain=forward action=jump jump-target=detect-ddos connection-state=new
3 ;;; Accept established, related, untracked
chain=forward action=accept connection-state=established,related,untracked log=no log-prefix=""
5 ;;; Allow forwarding to internet from LAN-NET list
chain=forward action=accept in-interface-list=LAN-NET out-interface-list=WAN
6 ;;; Jump to ICMP filters
chain=forward action=jump jump-target=icmp protocol=icmp
7 ;;; Guests may only access the WAN
chain=forward action=drop in-interface-list=GUEST out-interface-list=!WAN log=no log-prefix=""
8 ;;; Allow DSTNAT to Server from WAN
chain=forward action=accept connection-nat-state=dstnat dst-address=10.60.0.3 in-interface-list=WAN log=no log-prefix=""
12 ;;; Allow to Server from LAN
chain=forward action=accept dst-address=10.0.0.2 in-interface-list=LAN-NET log=no log-prefix=""
19 ;;; Allow CoIoT requests to Homeassistant
chain=forward action=accept protocol=udp dst-address=10.0.0.2 in-interface=vlan-SMARTHOME dst-port=5683
20 ;;; Allow admin addresses to access each network
chain=forward action=accept src-address-list=admin_addresses
21 ;;; Server > Smarthome
chain=forward action=accept src-address=10.0.0.2 out-interface=vlan-SMARTHOME log=no log-prefix=""
22 ;;; Smarthome > Server
chain=forward action=accept dst-address=10.0.0.2 in-interface=vlan-SMARTHOME log=no log-prefix=""
27 ;;; Drop invalid
chain=forward action=drop connection-state=invalid
28 ;;; Drop tries to reach non public addresses from LAN
chain=forward action=drop dst-address-list=not_in_internet in-interface-list=LAN out-interface-list=!LAN log=yes log-prefix="!public_from_LAN"
29 ;;; Drop all from WAN not DSTNATed
chain=forward action=drop connection-state=new connection-nat-state=!dstnat in-interface-list=WAN log=yes log-prefix="!NAT"
30 ;;; Drop incoming from internet which is not public IP
chain=forward action=drop dst-address-list=!not_in_internet in-interface-list=WAN log=yes log-prefix="!public"
31 ;;; Drop packets from LAN that do not have locally used IP
chain=forward action=drop src-address-list=!allowed_to_router in-interface-list=LAN log=yes log-prefix="LAN_!LAN"
32 ;;; Drop bad forward IPs
chain=forward action=drop src-address-list=no_forward_ipv4
33 ;;; Drop bad forward IPs
chain=forward action=drop dst-address-list=no_forward_ipv4
34 ;;; Drop everything not forwarded
chain=forward action=drop log=no log-prefix="NOFWD"
I will soon start another debugging session where I disable the rules as suggested and check whether, or which, problems persist with the phones. That's a great starting point for packet capturing to determine whether the problem originates from the firewall or from the phone.
What might be difficult is capturing the case when the source IP of incoming calls spuriously changes. As this does not happen very often, it might take some time.
OK, so no dynamic rules there. I wasn't sure how exactly Mikrotik has implemented the part of the RFC that mentions the filtering requirements related to endpoint-independent NAT, and as no rules have been added dynamically to filter (which would be kind of similar to dst-nat rules being added dynamically as a result of a UPnP request from a LAN-side device), I remain as unsure as I was before - maybe connection tracking sets the connection-state of an incoming packet towards a socket allocated by an endpoint-independent NAT rule to established (or maybe related?) even if said packet came from a previously unseen source, maybe it doesn't. So to reduce the number of possible issues, would you mind adding a permissive rule that accepts packets towards UDP ports 5066 and 5067 at the private addresses of your phones, after (below) the "accept established, related, untracked" one? I mean, let's go step by step and come back to this detail once the rest is clear.
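Something like this, i.e. placed right below that accept rule (your phones' private addresses are assumed here, adjust them to your actual ones):

```
# Sketch - the phone addresses are assumed; put these rules right below
# the "Accept established, related, untracked" rule in chain=forward.
/ip firewall filter
add chain=forward action=accept protocol=udp dst-address=10.42.0.11 \
    dst-port=5066 comment="test: SIP to phone 1"
add chain=forward action=accept protocol=udp dst-address=10.42.0.12 \
    dst-port=5067 comment="test: SIP to phone 2"
```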
Other than that, your action=drop rules that do not log anything are in fact redundant, as the last one drops everything that wasn’t dropped (or accepted) by any of the previous ones.
As for the randomness of the occurrence of the issue, if I had to handle that, I would set the sniffer to match on UDP ports 5066 or 5067 and write to a file, and let it run until the issue happens. It can run even if you log off from the router, and unless the phones serve a busy reception desk, most of the SIP traffic consists of registration refreshes, so even a few tens of megabytes of file size are sufficient for days of SIP traffic. And the 3011 has a USB port, so unless you use it for connectivity, you can let the sniffer save the file to a flash drive and set the limit to hundreds of megabytes.
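A sketch of such a long-running capture (the file name and size are just examples; note that file-limit is given in KiB, and the small default memory-limit is fine when writing to disk):

```
# Sketch - "usb1" must match the actual name of your USB flash drive.
/tool sniffer
set filter-port=5066,5067 file-name=usb1/sip-longterm.pcap file-limit=200000
/tool sniffer start
```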
Just a few seconds after adding those rules, their packet counters incremented to 1. Maybe (hopefully) those two missing rules were the whole problem? We will see.
I implemented those redundant drop rules just for diagnostic purposes, to have counters for some special cases of denied traffic.
Well, I have no hands-on experience with SIP, but shouldn't it be sufficient to configure a STUN server and configure your custom SIP service ports under /ip/firewall/service-port? No need for all this firewall filter hell.
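I mean something along these lines (just a sketch; note that this re-enables the SIP conntrack helper, which is currently deliberately disabled):

```
# Sketch - tells the SIP helper to also watch the custom ports.
/ip firewall service-port
set sip ports=5060,5066,5067
```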
memory-limit and file-limit are two different things, and the 3011 does not have that much memory; please fix that ASAP.
Also, the default value of filter-operator-between-entries is "or", so first try "and", and if that prevents the sniffer from seeing anything at all, set filter-ip-protocol to "".
And while at it, the correct suffix so that Windows knows to open the file with Wireshark would be .pcap.
Ah, I forgot that there are also the counters, not just the log.
So I quickly fixed the issue with file-limit and memory-limit; I had kind of mixed those up.
Using the "and" operator does not work. I changed it back to "or", and using the filter as mentioned above, it actually works and filters out the right SIP packets.
So now let's see whether the problems recur.
And another short question to you, since it seems like you're experienced with VoIP: would you suggest leaving the NAT traversal method of the Grandstreams on the keep-alive mechanism, or rather switching to STUN?
The whole purpose of STUN is to determine the behavior of the NAT behind which the phone is located, so that the phone can put the correct public address and port into the SIP headers and the SDP of the packets it sends. But STUN cannot determine the behavior of firewalls, so when the incoming call arrives from the address of the other SBC, the firewall may still drop it even though the destination address and port are "correct". So as an executive summary, STUN will not be useful (leaving aside that it can never be 100 % successful, because the Mikrotik firewall may change the source port if it needs to).
Maybe you could also talk to 1 und 1 and ask them whether that behavior is intentional or caused by a misconfiguration or by a failure of the “proper” SBC?
I think the issue with this big/main provider is that they won't offer such support for "home" users like me. Normally they deliver their own preconfigured boxes and do not support custom setups with third-party routers and DSL modems. I have often read in other web forums about similar problems with those two addresses.
I will still collect some data and check whether this "random" change occurs and causes faulty behaviour. The interesting thing is that since I added those two forwarding rules to the filter table, I have had no "stuck" calls. Either this is just luck, or those rules catch the cases where the connection state after endpoint-independent-nat is recognized as new and would not be caught by the established,related,untracked rule. I'm not quite sure about that. I will check whether the counters of those rules increment. And to not mess something up, I will stay with keep-alive and not switch over to STUN.
Edit:
What I also noticed now is that the firewall displays in the connection list a connection to both of those addresses.
A connection (as in “one” - it’s indistinguishable in German) or two connections? It is well possible that each phone has chosen a different address from the DNS response. So if you have four connections in total, FON Oben to (from) both SBC addresses and FON Unten to (from) both SBC addresses, it is OK. If you can see just two in total, it may still be fine - if the additional connections have been created by incoming INVITEs thanks to the added filter rules, they should be established with the address of the SBC as src-address and the public IP of your Tik ports 5066 and 5067 as dst-address. But unless the Grandstream engineers are really creative, the keepalives from a Grandstream phone will only refresh the connection towards the SBC it has registered to, so a connection supposedly created by an incoming INVITE from a “wrong” SBC will time out after some 3 minutes unless the SBC sends its own keepalives (many do). So sniffing for some time still makes sense.
Sorry for the confusion. I meant it displayed two connections for one phone and only one connection for the other phone.
And indeed, I think the Grandstream phones send keepalives only to the one address they registered to.
So just a summary of what I think happens:
Case 1
The INVITE comes from the same address the phone registered to. Here the connection tracking of the endpoint-independent NAT kicks in and the INVITE goes to the phone. Everything works well; the phone rings.
Case 2
The INVITE comes from the secondary address. The phone does not keep the connection to this address alive, and the router's automatic connection tracking does not match. Here the static dst-nat rule comes in, which catches this case. Normally (and now) the phones ring. But without your hint I had totally overlooked that I would need some forwarding rules towards the phones for this to work!
Now those forwarding rules would e.g. yield the following log message:
FON forward: in:pppoe-out1 out:vlan-FON, packet-mark:VOIP connection-mark:VOIP connection-state:new,dnat proto UDP, 212.227.124.130:5060->10.42.0.11:5066, NAT 212.227.124.130:5060->(##.###.###.##:5066->10.42.0.11:5066), prio 5->0, len 1658
Prior to this missing rule, those packets would have been dropped by my final deny-any rules.
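Put together, the static pair that now handles case 2 looks roughly like this for one phone (a sketch reconstructed from the log above, not my literal export; the second phone gets the same pair with port 5067 and its own address):

```
# Sketch for the phone at 10.42.0.11 (port 5066).
/ip firewall nat
add chain=dstnat in-interface=pppoe-out1 protocol=udp dst-port=5066 \
    action=dst-nat to-addresses=10.42.0.11
/ip firewall filter
add chain=forward action=accept protocol=udp dst-address=10.42.0.11 \
    dst-port=5066 connection-nat-state=dstnat log=yes log-prefix="FON"
```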
My further thoughts
What would be the best-practice approach to work around all this? Might it be an idea to disable endpoint-independent NAT from the FON network to the WAN, to completely avoid the automatic connection tracking and force the static rules? In my opinion this might work around the issue with the two changing IPs from 1&1.
In which way does endpoint-independent-nat interfere with my static dst-nat rules? Which takes precedence? I think the dynamic rules do…
Unfortunately I will soon be on holiday and cannot investigate this further. I will probably keep recording packets over that time and see what happens.
But in general, many thanks to @sindy for your help and the hint about the missing forwarding rules.
Sounds funny to me to complain about being on vacation
There is no static priority between manually configured dst-nat rules and src-nat ones (including endpoint-independent-nat). Any incoming packet is checked against a list of tracked connections (unless an action=notrack rule in raw prohibits that); if it matches an existing connection, it gets processed accordingly, if it doesn’t, it is processed as an initial packet of a new connection, which includes matching it against the dstnat and srcnat chains of firewall table nat. NAT handling of mid-connection packets is determined by the outcome of handling of the initial packet.
As 1 und 1 uses UDP for SIP signalling and the firewall rules only look at the UDP layer (the SIP helper is disabled, but it would work the same even if it wasn't), the incoming requests from the "proper" SBC are treated the same as "responses" to the REGISTER that created the connection. So if you switch the phone off for long enough that its registration expires, no incoming calls to that phone will be possible any more, and 3 minutes after the expiration of the last REGISTER you can be sure that there are no tracked connections related to that phone any more. If you then switch the phone on again, it will register, and that way it will create a tracked connection whose dst-address will be the one of the randomly chosen SBC. Keepalives will keep this connection alive, and a corresponding dst-nat rule will never get hit, as explained above.
If the action of the rule in srcnat is the usual src-nat or masquerade one, an incoming INVITE from the “other” SBC will, however, be treated as an initiation of a new connection, so it may match a manually created dst-nat rule if a corresponding one exists.
I have to test how exactly the endpoint-independent-nat works before commenting on the case when there is no manually configured dst-nat rule but the action of the rule in srcnat is the endpoint-independent-nat one, so I won’t speculate here until then.
I can advise the following:
1. From your UCM, go to Maintenance, take a capture after reproducing the issue, and send it to these helpful guys: https://helpdesk.grandstream.com/
After that, you can proceed to the next step and check your router configuration.
The Grandstream UCM has many helpful troubleshooting tools that you can use before checking the router.
Nice idea, but I would rule out the problems being in the Grandstream phones. I used an Auerswald VoIP appliance before, which suffered from exactly the same issues.
Have you already investigated this?
I can currently state that the VoIP phones are now working fine. There have been no issues since I added the forwarding rules you suggested. But I still cannot completely reconstruct why it is working now. Maybe those rules catch the case when the static dst-nat rules kick in, in some special edge case where the registration timed out and the Grandstreams hadn't punched the ports open?
So I've finally got to this. It was easier to test it on 7.14.3, so there is a narrow chance that the behavior in newer releases is different, but the outcome is that if a device on LAN address i.i.i.i initiates a UDP connection from port IIII to public address a.a.a.a port AAAA and action=endpoint-independent-nat is used in chain srcnat (rather than plain action=src-nat), so that the packets towards a.a.a.a:AAAA leave the WAN with source address e.e.e.e and source port IIII, the connection tracking attributes packets arriving from another public address (b.b.b.b) port AAAA to e.e.e.e:IIII with connection-state=new, connection-nat-state=none, nor does any actual dst-nat operation happen. In other words, I could not see any actual difference in the outcome of those two srcnat actions.
I.e. the reason why incoming INVITEs to your phones succeed regardless of which SBC they come from is indeed the presence of those filter rules that accept dst-nated packets towards UDP ports 5066 and 5067.
Another thing I stumbled upon and am not sure about is the srcnat and dstnat rules for EINAT. Do we need both? And what does the documentation mean by "The following rule enables filtering:"?
Maybe the missing dstnat EINAT rule is the problem?
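For reference, the pair from the documentation as far as I can read it (quoted from memory and adapted to my interface list, so treat it as a sketch):

```
# srcnat part - creates the endpoint-independent mapping for UDP.
/ip firewall nat
add chain=srcnat action=endpoint-independent-nat out-interface-list=WAN protocol=udp
# dstnat part - the one the docs say "enables filtering".
add chain=dstnat action=endpoint-independent-nat in-interface-list=WAN protocol=udp
```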
And as for my approach:
Maybe I should disable this EINAT stuff and use normal dst-nat rules combined with forward filter rules for the phones? This configuration might be the most predictable. The reason I probably forgot the forwarding rule is that I have a rule that allows dst-nat to my server, but it is only for that specific host and would never match the VoIP traffic.
Maybe 2 more specific rules would then be better for permanent operation in my case.
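Something along these lines, perhaps (the second phone's address is assumed):

```
# Sketch - one narrowly matched dst-nat rule per phone.
/ip firewall nat
add chain=dstnat in-interface=pppoe-out1 protocol=udp dst-port=5066 \
    action=dst-nat to-addresses=10.42.0.11 comment="SIP phone 1"
add chain=dstnat in-interface=pppoe-out1 protocol=udp dst-port=5067 \
    action=dst-nat to-addresses=10.42.0.12 comment="SIP phone 2"
```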
Indeed. It did not come to my mind that filtering might be controlled by a rule in the dstnat chain, although now that you've pushed my nose into it, I can imagine the mechanism behind it. OK, I will run another test.
The matching on connection-nat-state=dstnat in filter is a kind of "keyboard saver", in the sense that since you already have to use all the match conditions checking the source addresses/address lists, source and destination ports, and in-interfaces/interface lists in the action=dst-nat rules in nat anyway, there is no need to re-type the same sets of conditions in filter. The information that a given packet matched one of the narrowly targeted dst-nat rules is expressed by the presence of the dstnat attribute, so it is safe to use a single rule in filter that matches on that attribute alone to accept all those various connection-establishing packets.
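In rule form, the whole keyboard saver is just:

```
# One filter rule accepts the initial packet of any connection that
# already matched one of the narrowly targeted dst-nat rules in nat.
/ip firewall filter
add chain=forward action=accept connection-state=new connection-nat-state=dstnat
```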
Speculation until another round of testing confirms or denies it: similarly, the action=endpoint-independent-nat rule in dstnat probably compares the destination address and port of a received packet with the reply-dst-address of all tracked connections whose initial packet got handled by an action=endpoint-independent-nat in srcnat (and there is a note regarding that in the connection tracking context data), and if it finds a matching one, it sets the established (or maybe related?) attribute for that packet and applies the "un-src-nat" treatment to it.
The whole beauty of the endpoint-independent-nat treatment makes sense when you have tens or even hundreds of phones in your network and maintenance of individual action=dst-nat rules for them would be a nightmare. For your two phones in total, you can choose which approach is easier to grasp for you.
I’ll come back again once I test the speculation above.