In particular, the symptoms you describe (outgoing call ringing at the called party, but nothing happening when the called party accepts it) suggest that the answer message (SIP 200 OK) either did not make it to your router from the mobile operator's exchange, or the router failed to forward it to the phone. So that's where you can start troubleshooting.
This is the result. I just fine-tuned it to include the "via WAN" traffic.
So open a command line window to the router, make it as wide as your screen allows, and run
/tool sniffer quick ip-address=ip.of.mobile.exchange port=4500
While there is no call, you should see some keepalive packets every 20 seconds or so. Leave it running for, say, two minutes, then stop the sniffer (Ctrl-C), run /tool sniffer packet print, copy the output as text, and post it here (if you have a public IP on the router's WAN, replace it systematically with my.pub.lic.ip or similar before posting).
You should see each packet multiple times - in via wlanX, then in via bridge, then out via ether1 (WAN), and then the response in reverse order, if your RB4011 is more or less in the factory default configuration.
Once we get past this, we can debug the actual issue.
> Incoming calls work normally for this phone in this home network.

OK. This shows that the phone has a local address of 192.168.1.8, is connected via a WiFi AP attached to ether10, that ether10 is a member port of the bridge "local", and that ether1 is the WAN.
So now you can run the sniffer again, this time not restricting it to a particular interface, try the outgoing call, and try to answer the call on the called phone. Since the traffic is encrypted, you cannot tell the SIP registration updates, the call control messages, and the RTP media packets apart from one another, or from the IPsec keepalive traffic, by anything other than size. But all the SIP and RTP packets should be larger than the 122 bytes of the IPsec DPDs you can see in the "idle time" capture above. That means that when printing the captured packets, you can use /tool/sniffer/packet/print where size>122 to get rid of the "background noise" and see only the SIP & RTP.
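Put together, one capture round might look like this (a sketch; ip.of.mobile.exchange is a placeholder as above):

/tool sniffer quick ip-address=ip.of.mobile.exchange
# place the call, try to answer it on the called phone, then stop with Ctrl-C
/tool sniffer packet print where size>122
# prints only packets larger than the 122-byte IPsec DPD keepalives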
What we are looking for is whether the 200 OK arrives at ether1 (WAN) and whether the router delivers it all the way to ether10. That's basically all we can find out - if it does, the only other possible wrongdoing of the router would be that it malforms the contents of the packet while delivering it; to find that out, it would be necessary to sniff into a file, open the file in Wireshark, and compare the payloads of the two packets - the Ethernet and IP headers will differ due to NAT and the different MAC addresses.

So the pattern we are looking for is the following:
- as you press "call", there should be a single large message from the phone to the exchange (the INVITE), responded by a multiple smaller ones from the exchange (100 Trying, 180 Ringing, 183 Media Change - cannot say whether all of them will be there, but at least one should)
- many same-size packets (RTP ones) should follow, carrying the alerting tone, unless the exchange asks the phone to generate the tone - both cases are possible
- once you answer the called phone, one packet larger than the RTP ones should be seen from the exchange to the phone (200 OK), "responded" with another "larger-than-RTP" one from the phone (ACK). Following the 200 OK, the RTP should start flowing in both directions.
If the router did nothing wrong, it must be the exchange, the phone, or the wireless AP.
What can interfere with the message exchange outlined above is the periodic SIP registration, which has its own timing, independent of the call establishment messages. So it is better to do the complete procedure (two separate sniffs) for two calls and compare the results. You can also filter out the RTP by size to have only the SIP packets printed, or you can sniff into files and open them in Wireshark for better filtering and graphing possibilities.
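If you go the file route, a minimal sketch (the file name is arbitrary; filter-ip-address is the sniffer's address filter parameter in recent RouterOS versions):

/tool sniffer set file-name=vowifi-call.pcap filter-ip-address=ip.of.mobile.exchange
/tool sniffer start
# place and answer the test call, then:
/tool sniffer stop
# download vowifi-call.pcap from Files and open it in Wireshark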
As you specifically mention outgoing calls, do I read it right that incoming calls work normally for this phone and this home network? What about outgoing calls in another wireless network?
> On a bridge, the default mtu setting is auto (these days, and it makes sense in general); on Ethernet ports, it seems to be set to 1500 in the default configuration, so it seems the OP has changed it manually.

I checked my router config and it has an explicit MTU of 1500 set on the bridge. But it could well be that I set that myself at some time.
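To check where an explicit MTU is set, something like this should do (a sketch; a bridge with no explicit value prints mtu=auto in recent RouterOS):

/interface bridge print detail where name=bridge
# shows mtu (auto or an explicit value) alongside actual-mtu and l2mtu
/interface ethernet print detail
# Ethernet ports default to mtu=1500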
> It also is "interesting" that RouterOS performs reassembly for the received packets, and that this even works across IP packets that form a single UDP packet.
> Usually, re-assembly in routers is only done at the IP level, not at the UDP level. Maybe this happens only on IPsec tunnels?
> It seems a bit superfluous, and as shown in this example it could even be dangerous.

I'm afraid that reassembly and eventual re-fragmentation are inevitable where connection tracking is used, i.e. any NAT handling means reassembly; otherwise the router would not know where to forward the second and later fragments, which do not contain port numbers. And as the router just forwards the IPsec packets, it doesn't care that they are IPsec ones.
> Finally, it is strange that the VoWiFi service sends UDP packets that are too large to fit in a 1500-byte Ethernet packet. Usually those packets are much smaller, to reduce latency.

The packets that need to be small to limit latency are those carrying the media (audio), i.e. RTP. What was not passing through was the packet informing the calling party that the call has been answered and which codecs, out of the offered ones, have been accepted by the called party. What I suspect (I didn't read the recommendations) is that VoLTE/VoWiFi may be using SIP over TCP, and that several SIP messages may get piggybacked into a single TCP packet, which would explain its size making full use of the MTU. Or maybe the codec list was huge in the INVITE (which is missing in the capture), so the proportionally huge codec list in the response made the 200 OK exceed the phone's MTU even though it was sent over UDP - when a payload packet is encapsulated into an IPsec transport one, the additional headers and authenticity/replay protection bits are added, and only the resulting packet is fragmented depending on the MTU of the outgoing interface.
> I'm afraid that reassembly and eventual re-fragmentation are inevitable where connection tracking is used, i.e. any NAT handling means reassembly; otherwise the router would not know where to forward the second and later fragments, which do not contain port numbers.

Actually that isn't the case. The IP fragments all contain the same ID number, so subsequent fragments could be forwarded to the same place as the first fragment with the same ID.
> And here, the refragmentation was not the reason why it didn't work - the actual cause was an MTU setting incompatible with the receiving device's capability.

Well, of course that always is the case, but I could understand it when someone thinks: "well, I can try to make jumbo frames work on my network, let's set all local Ethernet MTUs to 9000 and the internet-facing MTU to 1500. That should work, because traffic between local hosts can pass at a 9000-byte MTU and traffic to the internet will be fragmented. Traffic from the internet will be limited to 1500 by the internet interface. TCP traffic to internal systems will work because of the MSS parameter."
Then, when they have a device with a 1500-byte MTU, it will fail only in this very special case.
Maybe it should be possible to specify a reassembly-MTU separately.
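For illustration, the hypothetical jumbo-frame setup sketched above could be configured roughly like this (interface names are placeholders; the ports' l2mtu must also be large enough for mtu=9000 to be accepted):

/interface bridge set bridge mtu=9000
/interface ethernet set ether2 mtu=9000
# the internet-facing port stays at the usual 1500
/interface ethernet set ether1 mtu=1500
# clamp the TCP MSS so TCP sessions towards the internet fit the WAN MTU
/ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn \
    action=change-mss new-mss=clamp-to-pmtu out-interface=ether1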
> 7.1.1 firmware seems to have solved the problems with VoWiFi I experienced. I updated yesterday and am still testing it, but for these two days it has worked well.

Unfortunately, I was too optimistic. On the third day it stopped working. The situation is as it was before: on port 4500 (UDP) there is only one-way traffic, because the firewall cannot establish a bidirectional connection. Sometimes (rarely) it can, but mostly it cannot, so traffic is not routed back to the source. I think it's related to the NAT and firewall implementation. I cannot test it on the 6.x branch, unfortunately.
> A stupid question - the phone keeps sending its IPsec traffic to port 4500 of the IP address of the mobile exchange. Assuming you haven't intentionally told the src-nat/masquerade rule to use only a single specific port at your WAN IP address for this connection: if you run /tool/sniffer/quick ip-address=ip.of.the.exchange port=4500, can you see the phone->exchange packets also at the WAN interface, or only at the LAN one? If at both, can you see the responses from the exchange arrive at the WAN interface?

As far as I remember, I saw packets in both directions. When I reported this issue to MikroTik support, I mentioned that I have two separate connections in the firewall instead of one "linked" connection. The root of this issue is still unknown to me. I continue experimenting, but every try takes days. Right now it is working, after changing the router config and rebooting the phone (it stopped connecting to Wi-Fi, LOL - maybe an iOS 15 bug? a RouterOS 7.x bug? I don't know), and I'll see what happens over a few more days of it coming and going.
> I still have to research if that IPv6 address is from the actual 4G internet or if it is a hidden network only used for VoLTE.

Why is this important? The phone uses a distinct APN for IMS than for "other data", which implies that two distinct addresses are assigned; but the fact that the phone keeps using that address when sending packets via WiFi has the same effect - even if you let those packets through, the responses won't reach the phone unless the mobile network is still reachable.
> I can hardly imagine how a forwarded UDP connection on a NATing router could create two independent unidirectional connections, unless...

Yes, it's a very unusual situation. Sorry for not sharing details publicly, but I created one more ticket with support, #[SUP-72355], specifically related to this situation. It's 100% reproducible now. The last time they replied, they did not see anything unusual and do not know why these connections are separate. Personally I think it's a bug in the firewall. I would like to be wrong; then it could be easily corrected. I'd better wait for a response from tech support than keep guessing.
> When you had double entries, was one of them with an untranslated port number 4500 and the other one with a different reply port number?

Look:
> So connection #4 should have been src-nated (unless there is an issue in chain srcnat of nat, which is unlikely as it works for some time after a reboot) but it is not, and connection #3 should not even have been accepted unless connections to UDP port 4500 are permitted in chain input of filter.

Incoming UDP 4500 is permitted in the input chain, as I have an IKEv2 server enabled. I also tried disabling it and testing, and the result was unsatisfactory. I have no idea what to do with this situation.
> And there is no restriction of allowed to-ports in the src-nat/masquerade rule that normally creates the correct, bidirectional & src-nated connection?

Sure, no - a default plain masquerade rule.
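To see whether the two entries really differ in their reply addresses, the connection table can be inspected along these lines (a sketch; the regex match on the dst-address string is an assumption about the console's address:port field format):

/ip firewall connection print detail where protocol=udp dst-address~":4500"
# a correctly src-nated entry shows the WAN address in reply-dst-address;
# a broken one repeats the LAN address there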
> Maybe the newer version of ipfilter in RouterOS 7 behaves differently in the same situation, but…

I think it's some kind of weird bug they have in the ROS7 firmware. There were bugs with bridge filtering, which they fixed (maybe not all, but the most noticeable ones), but obviously there are a lot of not-so-common cases that will pollute their support tickets until some 7.x minor version. I'd love to just use RouterOS 6, but the Chateau device does not allow me to do it.
Unbelievable... )
> The exact phrase was "Unfortunately, we cannot spot any errors which can cause such behavior."

OK, this sounds much different from "the behaviour is normal" as I understood it before.
> But I've got another idea - would it be possible that if the WAN goes down for some reason, the packets towards the exchange take some other route than via the WAN gateway? That would explain why the new connection is not src-nated.

No, it does not go down. Indeed, I have an interface list as the masquerade out-interface-list, PPPoE + LTE. PPPoE is the main route, LTE is second. But: I tried changing it to solely PPPoE (as support recommended me to do) - the situation did not change. So no, not a routing issue, at least not at the configuration level.
> Still... if you shut down WiFi on the phone, wait for the existing weird connections to die off (3+ minutes), then run /tool sniffer quick ip-address=ip.of.the.exchange and switch WiFi on the phone back on, what does the sniffer output show?

Wi-Fi calling was turned off on the phone for at least one day (so not 5 minutes, but a whole day). Now I have turned it on again, and what I see is: src-address: LAN IP -> dst-address: MTS IP, and src-address: MTS IP -> dst-address: Provider IP. It does not establish NAT for the reply packets.
> What I see: src-address: LAN IP -> dst-address: MTS IP, and src-address: MTS IP -> dst-address: Provider IP.

On what interfaces? wlanX->bridge->pppoe?
> On what interfaces? wlanX->bridge->pppoe?

Here is the dump. 10.x - LAN, x.77 - MTS VoWiFi IP, x.61 - my provider.
[admin@Router] > /tool sniffer quick ip-address=x.77
Columns: INTERFACE, TIME, NUM, DIR, SRC-MAC, DST-MAC, SRC-ADDRESS, DST-ADDRESS, PROTOCOL, SIZE, CPU
INTERFACE TIME NUM DIR SRC-MAC DST-MAC SRC-ADDRESS DST-ADDRESS PROTOCOL SIZE CPU
wlan5 12.726 10 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
bridge 12.726 11 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
pppoeX 12.726 12 -> 10.x:4500 x.77:4500 ip:udp 112 3
wlan5 17.731 13 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
bridge 17.731 14 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
pppoeX 17.731 15 -> 10.x:4500 x.77:4500 ip:udp 112 3
wlan5 22.738 16 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
bridge 22.738 17 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 126 3
pppoeX 22.738 18 -> 10.x:4500 x.77:4500 ip:udp 112 3
pppoeX 23.766 19 <- x.77:4500 x.61:4500 ip:udp 29 3
wlan5 25.255 20 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 43 3
bridge 25.255 21 <- MA:C1:00:00:00:00 MA:C2:00:00:00:00 10.x:4500 x.77:4500 ip:udp 43 3
pppoeX 25.255 22 -> 10.x:4500 x.77:4500 ip:udp 29 3
pppoeX 28.039 23 <- x.77:4500 x.61:4500 ip:udp 112 3
pppoeX 29.055 24 <- x.77:4500 x.61:4500 ip:udp 112 3
pppoeX 30.1 25 <- x.77:4500 x.61:4500 ip:udp 112 3
pppoeX 31.126 26 <- x.77:4500 x.61:4500 ip:udp 112 3
pppoeX 32.176 27 <- x.77:4500 x.61:4500 ip:udp 112 3
pppoeX 33.201 28 <- x.77:4500 x.61:4500 ip:udp 112 3
> VoWiFi (VoWLAN) uses IPsec.

Yes it does, but there is normally no need for any special treatment of IPsec traversing a NATing router. When IPsec detects NAT, it automatically sends keepalive packets every 20 seconds in order to keep the pinhole open even if no actual traffic is being sent, plus usually also Dead Peer Detection packets every minute, whereas the default UDP pinhole lifetime in RouterOS is 180 s. So what @memelchenkov suffers from is some malfunction of the regular operation of the NAT, which is intermittent on top of that.
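The timeouts involved can be checked directly (a sketch; in current RouterOS, udp-stream-timeout is the 180 s pinhole lifetime mentioned above, applied once two-way traffic has been seen):

/ip firewall connection tracking print
# udp-timeout applies until the first reply is seen, udp-stream-timeout afterwards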
> /interface bridge settings set use-ip-firewall=yes?

This.
/ip firewall raw
add action=notrack chain=prerouting in-interface=!bridge
> If you mean the below rule, then this rule just blocks everything.
>
> /ip firewall raw
> add action=notrack chain=prerouting in-interface=!bridge

Sure, the idea was that instead of positive matching on in-bridge-port(-list), you have to use negative matching on the IP interface (in-interface=!bridge), because the positive one will match during both passes (bridging and routing) whereas the negative one will match only during the bridging phase, where in-interface is (hopefully) not the bridge yet. But you have to add some src-address(-list) condition to restrict the effect of the rule to traffic initiated from your local LAN subnets. If that doesn't help, I cannot see any other way to distinguish the routing phase from the bridging phase.
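The completed rule would then look roughly like this (192.168.1.0/24 stands in for the actual local subnet):

/ip firewall raw
add action=notrack chain=prerouting in-interface=!bridge src-address=192.168.1.0/24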
> ...but you have to add some src-address(-list) condition to restrict the effect of the rule to traffic initiated from your local LAN subnets.

It seems it does not work that way. When adding a src-address limited to the LAN address space (or even just a single IP address), the counter of this rule is always zero.