sms failing over wifi calling (vowifi)

Morning all,

I’m struggling to trace the fault for this scenario and would appreciate some guidance. I’ve found some references on this forum, but have not stumbled upon the fix (yet).

SMS messages (iphone to android, i.e. not imessage) used to work, and if I had to guess it broke around the v6 to v7 upgrade (some months ago). After a while I eventually realised that the intermittent issue was a wifi-calling message issue. If we turn off wifi, or wifi-calling, the message sends ok. Weirdly, receiving messages is fine; sending will fail, and will typically keep failing until you turn off wifi-calling. And when that single send fails, you may get lucky and be able to send a second message and it goes through ok. Weird.

What I have tried
1/ factory reset the hAP ac^2 and tried the bare minimum of customisation
  • this generally improved things, as I think my paranoid firewall rules were blocking the “unrelated source” connections from one of the *.3gppnetwork.org addresses for my carrier
2/ confirmed that I can see udp500 nat sessions being established to the carrier
3/ put a blanket ipsec-esp, udp500/4500 allow rule at the top of the firewall stack
4/ moved my dns from opendns back to my isp’s
5/ removed my dns nat rule redirecting non-standard dns servers to my home dns server
6/ echoed any firewall drops to the log
7/ taken wireshark traces and filtered against esp
8/ deleted every firewall/nat rule (except for nat masquerade)
and lastly
9/ a different non-mikrotik wifi (and sms’s flow perfectly)

For (6) I’m not seeing any esp drops from unknown sources
For (7) I’m not seeing any dns/esp failures in the trace for the iphone
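For (6), the logging was just a catch-all drop pair at the bottom of the stack, something along these lines (a sketch from memory — the prefixes are whatever I typed at the time):

 /ip firewall filter add action=drop chain=input log=yes log-prefix="input drop"
 /ip firewall filter add action=drop chain=forward log=yes log-prefix="fwd drop"

then watched the log while sending messages.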

What I’m finding is my experience is better, but remains inconsistent. I can send a bunch of messages ok, go away for 5 minutes, try again and it fails. The only pattern I can find is the iphone seems to time out (the duration is inconsistent) and for that message it cannot re-establish its esp tunnel (I’m guessing that’s the fault). On the phone UI, when you retry sending you will see one of two scenarios:
a/ after about 5sec a “progress bar” appears at the top, slows down at 90%, and at about the 10sec mark it then fails
b/ the retry fails nearly immediately

It feels like the iphone is being blocked from re-establishing an esp session, or an inbound esp session is being blocked by the firewall. But I’m struggling to trace the fault and find the smoking-gun piece of evidence that leads me to it.

I’m hoping someone says “you overlooked xyz”, and like magic it works. (yes I’m optimistic).

ta, Jeff

The default firewall takes the approach of using “ipsec-policy=” to match IPSec traffic. Do you see lines like these in your config?

/ip firewall filter add action=accept chain=forward comment="defconf: accept in ipsec policy" ipsec-policy=in,ipsec
/ip firewall filter add action=accept chain=forward comment="defconf: accept out ipsec policy" ipsec-policy=out,ipsec
/ip firewall nat add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN

I’m not sure the “blanket accept” is needed… the above rules should already cover a device on the LAN that needs IPSec upstream: the default “accept from LAN” filter rule allows the initial IPSec connection out, and the connection is then tracked so the return traffic is allowed back in.
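One way to check whether those defconf policy rules are actually doing anything is to watch their counters while a message is being sent, e.g. something like:

 /ip firewall filter print stats where comment~"ipsec policy"

If the VoWIFI traffic is being matched by them, the packet/byte counters should climb during a send.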

Do you have any other IPSec stuff on the router itself, e.g. is there a L2TP, IKEv2 etc VPN configured on the router?

Might want to post your actual firewall/config.

Morning, thanks for replying.

conf as below, which is now default (aside from my bit of redirection for esp, dns, and iot). I was hesitant to pollute the board with my conf until it was required.

You are correct about the blanket 500/4500 forward not being required. I don't recall ever seeing the counter advance past zero.

Yep, I have the exact three lines you referenced. I'll research those when I get back from my Sunday AM cycle with a bud.

[admin@homenet] /ip/firewall> export

# 2023-07-30 05:19:21 by RouterOS 7.10.2
# software id = R1PC-N1ZL
# model = RBD52G-5HacD2HnD-TC

/ip firewall layer7-protocol
add name=iot_dns_suffix regexp=.tuyaeu.com
add name=wyze_dns_suffix regexp="(.nist.gov|.google.com|clock.fmt.he.net|.wyzecam.com|.amazonaws.com|.iotcplatform.com)"

/ip firewall address-list
add address=208.67.222.222 list=knowndns comment="OpenDNS"
add address=208.67.220.220 list=knowndns comment="OpenDNS"
add address=192.168.88.1 list=knowndns comment="home"
add address=192.168.89.1 list=knowndns comment="home"
add address=203.0.178.191 list=knowndns comment="ISP"
add address=203.215.29.191 list=knowndns comment="ISP"
add address=192.168.88.0/24 list=hometnet comment="normal net"
add address=192.168.89.0/24 list=hometnet comment="iot"

/ip firewall filter
add action=accept chain=input comment="allow wifi calling" dst-address=192.168.88.0/24 dst-port=500,4500 protocol=udp
add action=accept chain=input dst-address=192.168.88.0/24 protocol=ipsec-esp
add action=drop chain=forward comment="drop iot to homenet" dst-address=192.168.88.0/24 log=yes log-prefix="drop iot to homenet" src-address=192.168.89.0/24
add action=drop chain=input comment="iot block access to dns != tuyaeu.com" dst-port=53 layer7-protocol=!iot_dns_suffix log=yes log-prefix="china iot DNS drop" protocol=udp src-address=192.168.89.0/25
add action=drop chain=input dst-port=53 layer7-protocol=!iot_dns_suffix log=yes log-prefix="china iot DNS drop" protocol=tcp src-address=192.168.89.0/25
add action=drop chain=forward comment="drop WAN dns" dst-address-list=!knowndns dst-port=53 log=yes log-prefix="drop dns WAN udp request" protocol=udp
add action=drop chain=forward dst-address-list=!knowndns dst-port=53 log=yes log-prefix="drop dns WAN tcp" protocol=tcp
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP" protocol=icmp
add action=accept chain=input comment="defconf: accept to local loopback (for CAPsMAN)" dst-address=127.0.0.1
add action=drop chain=input comment="defconf: drop all not coming from LAN" in-interface-list=!LAN
add action=accept chain=forward comment="defconf: accept in ipsec policy" ipsec-policy=in,ipsec
add action=accept chain=forward comment="defconf: accept out ipsec policy" ipsec-policy=out,ipsec

add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related hw-offload=yes
add action=accept chain=forward comment="defconf: accept established,related, untracked" connection-state=established,related,untracked
add action=drop chain=forward comment="defconf: drop invalid" connection-state=invalid
add action=drop chain=forward comment="defconf: drop all from WAN not DSTNATed" connection-nat-state=!dstnat connection-state=new in-interface-list=WAN

/ip firewall nat
add action=masquerade chain=srcnat comment="defconf: masquerade" ipsec-policy=out,none out-interface-list=WAN
add action=dst-nat chain=dstnat comment="re-route internet DNS to local DNS" dst-port=53 protocol=tcp src-address-list=hometnet to-addresses=192.168.88.1 to-ports=53
add action=dst-nat chain=dstnat dst-port=53 protocol=udp src-address-list=hometnet to-addresses=192.168.88.1 to-ports=53

Sorry I didn’t respond to that bit. Nope, I don’t have any IPSec stuff configured.

Pretty much it’s a stock home setup. The only variation is I have my iot wireless stuff (smart LEDs, vacuum, etc) on a second/restricted subnet.

The IPSec traffic from a phone with VoWIFI goes through the firewall as “forwarded”, so the accept on 500,4500 in chain=input is doing nothing (unless you had some IPSec VPN enabled on the Mikrotik itself).

You might try disabling the fasttrack firewall rule to see if that fixes the problem. The interaction between fasttrack, IPSec, and hardware offload gets complex. If disabling it does fix your VoWIFI issues, then you might be able to add ipsec-policy=!ipsec to the fasttrack rule in /ip/firewall/filter so it can stay enabled (but this part I’m not so sure about, which is why I suggest just disabling it first to test whether VoWIFI is fixed).
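To be concrete about the test (untested sketch — the ipsec-policy= matcher actually takes a direction,policy pair, so "out,none" is the closest real syntax to the "!ipsec" idea):

 /ip firewall filter disable [find comment="defconf: fasttrack"]

and then, if that helps, re-enable with the out-of-policy match so ordinary traffic still gets fasttracked:

 /ip firewall filter set [find comment="defconf: fasttrack"] ipsec-policy=out,none
 /ip firewall filter enable [find comment="defconf: fasttrack"]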

thx Amm0 for the replies.

Disabling fasttrack didn’t seem to change the behaviour.
A work colleague suggested I look at the medium side for buffers/broadcasts, so booted everything other than my PC and iPhone off the network. No change there either.

The symptoms make it feel like the device is locked onto a stale/incorrect endpoint. When I retry a failed SMS I can see a couple of packets of accepted ESP traffic. While watching the NAT connections and filter logging, I noticed connections to 101.x.x.x addresses NOT in this list (bizarre!):
https://www.nslookup.io/domains/epdg.epc.mnc001.mcc505.pub.3gppnetwork.org/dns-records/#opendns

Taking pot luck, I flipped to static dns entries (using the authoritative dns list from nslookup.io) and cycled the device to purge its dns cache (flight mode on/off).

  • partial success. I had a much more reliable SMS experience. I could not find a pattern to indicate that particular endpoints were at fault, but overall it seemed tied to the number of endpoints I’m connecting to.
  • mostly success. If I keep my static DNS list to only 2 or 3 of the public entries, I’m unable to reproduce the fault. Occasionally I can see the device “start to time out” with a progress slider, but it rarely leads to a failure. When it does, an immediate retry works.
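For reference, the static entries are just plain A records under /ip dns static, one per endpoint, along these lines (the addresses below are placeholders, not my carrier’s real ones):

 /ip dns static add name=epdg.epc.mnc001.mcc505.pub.3gppnetwork.org address=203.0.113.10
 /ip dns static add name=epdg.epc.mnc001.mcc505.pub.3gppnetwork.org address=203.0.113.11

RouterOS will answer with all records for the one name, so trimming the list to 2-3 endpoints is just a matter of how many of these entries exist.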

I’m not convinced it’s an issue with the carrier endpoints themselves; it seems more to do with the number of random esp endpoints my device attempts to connect to and how the MK device handles those sessions.

What I’m seeing on the NAT side is the device initially establishes a session to all endpoints, but communicates over a single one (presumably whichever carrier endpoint negotiated first). The other sessions will time out and drop. If I kill that remaining session, the device will re-establish it on the next message, typically to the same address. This sort of correlates with the broken scenario: if it’s talking to a stale/busy endpoint, it doesn’t recycle back through the public a-record list and attempt to re-establish with an active/idle carrier endpoint.
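(For anyone following along: I was watching and killing the sessions from the connection table, roughly like this — the ~ match is against the address:port text, so it’s a blunt instrument:)

 /ip firewall connection print where dst-address~":4500"
 /ip firewall connection remove [find where dst-address~":4500"]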

Sorry if this isn’t more precise. I’m in the “knowledgeable, mostly capable, but dangerous” camp of techs. When I have time I’ll try to extract more clues from the traffic (and, in general, complain about iphones not exposing any debug logging, so I’m left guessing about what it’s unhappy with).

I’m sorry, I’m not an expert on IPSec and the firewall. And I’ve never run into trouble with ePDGs/VoWIFI, so I’ve never had reason to study it.

In reality it’s just an IKEv2 client on the LAN, and that should just work; an SA should be set up automatically. The DNS part isn’t complex, AFAIK it’s just an A-record lookup based on the MCC/MNC. Now, the Mikrotik DNS has had bugs in past v7 versions, but AFAIK those have been fixed. So I’m not sure DNS has much to do with your issues.

Perhaps another firewall entry might help, since the return traffic might hit the input chain - but I really haven’t studied the packet flow here…

 /ip firewall filter add chain=input ipsec-policy=in,ipsec action=accept

Hopefully someone else sees your post… since I think it’s more that IKEv2 outbound isn’t working generically, with VoWIFI just happening to be the only use case that needs it.