2.4G wifi randomly stops (solved)

Hi,

I'm wondering if anyone has experienced issues like this with 2.4G wifi. In summary: after an initial reset and clean start with the attached config, 2.4G wifi works. However, after some time (or perhaps after some other, non-obvious event), the 2.4G wifi stops working. 5G and ethernet are unaffected.

It also appears to be an L3 issue, and it seems to have started after an upgrade to v7.19.4. I haven't tried downgrading or the test branch yet.

I've spent a few days trying to debug this, so I hope someone has an idea.

Setup

  • Mikrotik Chateau series, v7.19.4
  • 4 bridges, with one ethernet and 2 wifi interfaces bound to each (see config).
  • ether1 is WAN

What’s working

  • After a reset, everything. Then after a time, the 2.4G wifi stops working.

What’s not working

  • 2.4G wifi, after a time.

Debugging and fault finding

Starting from a fresh restart with the attached config, everything works. Once the 2.4G wifi (wifi2 interface) stops working, this is what I've done. I ran the tests from a separate machine running an up-to-date OSX.

After each test below, I rebooted the router, and switched the client wifi off and on again.

  1. Confirm no issue with the 5G network
% ping -n -W 5000 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=5.734 ms
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=4.023 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=5.114 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=3.196 ms
64 bytes from 192.168.0.1: icmp_seq=4 ttl=64 time=4.778 ms
64 bytes from 192.168.0.1: icmp_seq=5 ttl=64 time=6.309 ms

All okay so far.

  1. Disable the 5G networks, forcing all devices to the 2.4G network
/interface/wifi/set wifi1 disabled=yes

This will disable all secondary interfaces as well. The client now reports a 5Mbps TX rate on the 2.4G network.

% ping -n -i 6 -W 5000 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=221.404 ms
<snip>
52 packets transmitted, 2 packets received, 96.2% packet loss
round-trip min/avg/max/stddev = 221.404/2402.196/4582.988/2180.792 ms

While not a complete outage, it is very slow.
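The client-reported TX rate can also be cross-checked from the router side. A minimal sketch, assuming the RouterOS 7 wifi package (as on this Chateau) and that the affected clients are associated with wifi2:

```routeros
# List the clients currently associated with the 2.4GHz primary interface (wifi2)
/interface/wifi/registration-table/print where interface=wifi2
# Same list with full details, including the negotiated tx/rx rates and signal
/interface/wifi/registration-table/print detail where interface=wifi2
```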

  1. Removed all interfaces from bridges, except for wifi2 (2.4G wifi primary interface)

No change in the ping statistics. However, while watching with tcpdump, I noticed that L2 traffic was seemingly okay, so I tried an ARP ping.

arping -c 10 192.168.0.1
ARPING 192.168.0.1
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=0 time=54.049 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=1 time=6.130 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=2 time=134.100 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=3 time=60.035 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=4 time=50.867 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=5 time=93.288 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=6 time=7.524 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=7 time=6.767 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=8 time=70.833 msec
42 bytes from f6:1e:57:6e:cd:78 (192.168.0.1): index=9 time=46.205 msec

--- 192.168.0.1 statistics ---
10 packets transmitted, 10 packets received,   0% unanswered (0 extra)
rtt min/avg/max/std-dev = 6.130/52.980/134.100/38.758 ms

A bit slower than expected, but layer 2 traffic seems to be fine, suggesting a problem between the wifi interface and the bridge... maybe.
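The same L2/L3 split can also be examined from the router's side. A sketch, assuming the test client is attached to wifi2; which bridge host and ARP entries you expect to see depends on the bridge layout at the time of the test:

```routeros
# L2: has the bridge learned the client's MAC address on the wifi2 port?
/interface/bridge/host/print where on-interface=wifi2
# L2/L3 boundary: has the router resolved the client's IP address to that MAC?
/ip/arp/print
# L3: do the client's ICMP echo requests actually reach the router? (Ctrl-C to stop)
/tool/sniffer/quick interface=wifi2 ip-protocol=icmp
```

If the MAC is learned and ARP resolves but no ICMP shows up in the sniffer, the loss is happening between the radio and the IP stack rather than at L2.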

  1. Removed all virtual wifi interfaces on the 5G radio
/interface/wifi/remove [ /interface/wifi/find where master-interface=wifi1 ]

From here on, I've only put a summary of the results; I hope that suffices.

  • L2 ping okay
  • L3 ping not okay
  1. Disabled all virtual interfaces
/interface/wifi/set [ /interface/wifi/find where master-interface=wifi2 ] disabled=yes
  • L2 ping okay
  • L3 ping not okay
  1. Removed all virtual wifi interfaces for 2.4G interfaces
/interface/wifi/remove [ /interface/wifi/find where master-interface=wifi2 ]
  • L2 ping okay
  • L3 ping not okay
  1. Reset the 5G network
/interface/wifi/reset wifi1
  • L2 ping okay
  • L3 ping not okay
  1. Removed all /interface/wifi configs except for those affecting wifi2
  • L2 ping okay
  • L3 ping not okay
  1. Removed the wifi1 datapath, channel settings, security.ft settings
  • L2 ping okay
  • L3 ping not okay
  1. Hard power off
  • L2 ping okay
  • L3 ping not okay
  1. Removed country settings from wifi1
  • L2 ping okay
  • L3 ping not okay
  1. Removed all bridges except bridge 0
  • L2 ping okay
  • L3 ping not okay

At this point, as you'll see from my config, I had effectively locked myself out of the router. So I did a full reset and set it up again from the config file.

  1. Full reset, then quickly turned off wifi1 (the 5G network) to force all clients to 2.4G
/system reset-configuration no-defaults=yes keep-users=no skip-backup=yes run-after-reset=myconfig.rsc
...
/interface/set wifi1 disabled=yes
  • L2 ping okay
  • L3 ping okay

So it works with the default config, and the client shows a 130Mbps TX rate. However, it's only a matter of time until this happens again.
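Because the failure only appears "after a time", it can help to timestamp exactly when it starts. A minimal sketch using RouterOS Netwatch, assuming a client that only ever associates on 2.4GHz holds the address 192.168.0.50 (a hypothetical placeholder; substitute a real one):

```routeros
# Hypothetical probe target: 192.168.0.50 stands in for a 2.4GHz-only client
/tool/netwatch
add host=192.168.0.50 interval=30s comment="2.4G client probe" \
    down-script=":log warning \"2.4G probe: 192.168.0.50 unreachable\"" \
    up-script=":log info \"2.4G probe: 192.168.0.50 reachable again\""
# Review the recorded transitions later
/log/print where message~"2.4G probe"
```

Netwatch only pings from the router, so it records when L3 reachability to that client drops, not why; the value is in correlating those timestamps with other events around the router.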

current.rsc (24.8 KB)

1- please remove serial number from that export...
2- Why 4 bridges? You can do it all with 1, unless I missed something really odd in your config. Use VLANs.
3- What if you reset the device to the factory default config and LEAVE it there? What happens then with 2.4GHz? That would rule out HW issues (but I would be very surprised).

Could you change this:

/interface wifi channel
add band=2ghz-ax name=ch2 skip-dfs-channels=disabled

to this:

/interface wifi channel
add band=2ghz-ax disabled=no frequency=2412,2432,2452,2472 name=ch2 reselect-interval=6h..12h width=20mhz

and give it another try?

With these settings your 2.4GHz radio bandwidth will be limited to 20MHz, frequency will be picked from a list of non overlapping frequencies and the radio will perform a scan every 6 to 12 hours (and adjust frequency to best setting).
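The "non overlapping" claim can be checked with a little arithmetic. Assuming each channel nominally occupies its full 20 MHz width centred on the listed frequency (real spectral masks leak a little into the neighbours, so this is an idealisation):

```latex
\begin{aligned}
\text{band}(f_c) &= \left[f_c - \tfrac{W}{2},\; f_c + \tfrac{W}{2}\right], \qquad W = 20\ \text{MHz} \\
\text{band}(2412) &= [2402,\ 2422] \\
\text{band}(2432) &= [2422,\ 2442] \\
\text{band}(2452) &= [2442,\ 2462] \\
\text{band}(2472) &= [2462,\ 2482] \\
n &= \frac{f_c - 2407}{5} \in \{1,\ 5,\ 9,\ 13\}
\end{aligned}
```

Adjacent bands touch at their edges but do not overlap, and the four choices correspond to 2.4GHz channels 1, 5, 9 and 13. Channel 13 is permitted in Japan (the country used in the configuration later in this thread) but not in every regulatory domain.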

Hi, thanks @holvoetn, @erlinden for your quick answers. I think it's resolved, let me give an update of the changes.

  1. Removed the serial number, thanks.
  2. 4 bridges were to separate the networks. I converted this to a single bridge with vlans, and that seems to have resolved it.
  3. Didn't try leaving the default configuration there as I didn't need to.
  4. I added the reselection into the configuration as well; 2.4GHz is pretty noisy here, with a few other wifi routers in the area all trying to find clean air.
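After converting to a single VLAN-filtering bridge, the resulting VLAN membership can be reviewed directly. A sketch; with `vlan-filtering=yes`, the dynamic (D-flagged) entries should be the ones RouterOS derives from each port's `pvid` rather than from explicit `/interface/bridge/vlan` entries:

```routeros
# Show the bridge VLAN table: which ports are tagged/untagged members of each VLAN ID
/interface/bridge/vlan/print
```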

Regarding the root cause, I did manage to find a correlation with an Android streaming box that is sitting on the network. Every time I turned this on, it worked for a bit, then the router got into a bad state. Not going to mention the provider, but it creates lots of proprietary vpn tunnels to different locations, and this traffic seems to have upset the router.

A couple of take-aways for anyone finding this.

  1. These devices really don't seem to like having multiple bridges, probably due to the hardware offloading that happens on the bridge/ethernet interfaces.
  2. The documentation that details setting up bridges with VLANs is quite hard to read, mainly because there are so many different hardware variants, each with its own configuration method, and the documents focus more on switches. You'll just have to try entering the commands and see whether you get an "unsupported" error. To be honest, the reason I originally used separate bridges is that the configuration was just a lot simpler.
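If hardware offloading is suspected, the bridge port list shows which ports are actually offloaded. A sketch; ether2 is only an example port, and disabling offload is a diagnostic step that moves that port's bridged traffic onto the CPU. Wifi ports are not switch-chip ports, so they should not carry the offload flag in the first place:

```routeros
# Ports flagged "H" in the output are hardware-offloaded to the switch chip
/interface/bridge/port/print
# Diagnostic only: disable offloading on one example port (ether2) and retest
/interface/bridge/port/set [find interface=ether2] hw=no
# Revert afterwards
/interface/bridge/port/set [find interface=ether2] hw=yes
```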

For reference, I've put the resolved config with comments here. This works with a Chateau PRO ax.

### GENERAL ###
# Globals
/
:global TZ "Asia/Tokyo"
#
### INTERFACE LISTS ###
/interface list member
remove [ find ]
/interface list
remove [ find builtin=no ]
add name=WAN     comment=wan
add name=WORK    comment=work
add name=GENERAL comment=general
add name=GUEST   comment=guest
add name=DEVICES comment=devices
add name=LOCAL   comment=local    include=WORK,GENERAL,GUEST,DEVICES
#
### BRIDGE & VLAN CREATION ###
# Done early to allow for tagging on VLANs. Also assign addresses early to allow access.
# vlan ids: work: 100 (vlan0) ; general: 110 (vlan1); guest: 120 (vlan2); devices: 130 (vlan3); management: 200 (switch-cpu)
# Teardown
/ip/address
remove [ find ]
/ipv6/address
remove [ find where !dynamic ]
/interface/vlan
remove [ find ]
/interface/bridge/port
remove [ find ]
/interface/bridge
remove [ find ]
# Bridge creation
/interface/bridge
add name=bridge0 admin-mac=F6:1E:57:XX:XX:78 auto-mac=no vlan-filtering=yes
# VLAN creation
/interface/vlan
add name=vlan0 interface=bridge0 vlan-id=100 comment="work"
add name=vlan1 interface=bridge0 vlan-id=110 comment="general"
add name=vlan2 interface=bridge0 vlan-id=120 comment="guest"
add name=vlan3 interface=bridge0 vlan-id=130 comment="devices"
add name=vlanm interface=bridge0 vlan-id=200 comment="management"
# Add members to the interface lists
/interface/list/member
add list=WAN     interface=ether1
add list=WORK    interface=vlan0
add list=GENERAL interface=vlan1
add list=GUEST   interface=vlan2
add list=DEVICES interface=vlan3
# VLAN addresses
/ip address
add address=192.168.0.1/24 interface=vlan0 network=192.168.0.0
add address=192.168.1.1/24 interface=vlan1 network=192.168.1.0
add address=192.168.2.1/24 interface=vlan2 network=192.168.2.0
add address=192.168.3.1/24 interface=vlan3 network=192.168.3.0
#
### ETHERNET/SWITCH ###
# Add ports. All access ports, tag all with the VLAN id. VLAN interface will be the termination
/interface/bridge/port
add bridge=bridge0 interface=ether2 frame-types=admit-only-untagged-and-priority-tagged pvid=100
add bridge=bridge0 interface=ether3 frame-types=admit-only-untagged-and-priority-tagged pvid=110
add bridge=bridge0 interface=ether4 frame-types=admit-only-untagged-and-priority-tagged pvid=120
add bridge=bridge0 interface=ether5 frame-types=admit-only-untagged-and-priority-tagged pvid=130
# Set the switch ports to be secured
/interface/ethernet/switch/port
set ether2      default-vlan-id=100
set ether3      default-vlan-id=110
set ether4      default-vlan-id=120
set ether5      default-vlan-id=130
set switch1-cpu default-vlan-id=200
# No need to add the ports to a switch, since there is only one switch (i.e. not supported)
#
### WIFI ###
#
# Teardown
/interface/bridge/port
remove [ find where interface~"wifi.*" ]
/interface wifi
remove [ find where master=no ]
reset [ find where master=yes ]
/interface wifi channel
remove [ find ]
/interface wifi configuration
remove [ find ]
/interface wifi security
remove [ find ]
/interface wifi datapath
remove [ find ]
/interface wifi steering
remove [ find ]
/interface wifi steering neighbor-group
reset [ find ]
#
# Wifi security profiles
/interface wifi security
add name=wsecprof0 authentication-types=wpa3-psk          wps=disable ft=yes ft-over-ds=yes passphrase="xxxxxxxxxx"
add name=wsecprof1 authentication-types=wpa3-psk,wpa2-psk wps=disable ft=yes ft-over-ds=yes passphrase="xxxxxxxxxx"
add name=wsecprof2 authentication-types=wpa3-psk,wpa2-psk wps=disable ft=yes ft-over-ds=yes passphrase="xxxxxxxxxx"
add name=wsecprof3 authentication-types=wpa2-psk,wpa3-psk wps=disable ft=yes ft-over-ds=yes passphrase="xxxxxxxxxx"
# 
# Wifi data paths (basically forwarding traffic to the bridge/vlan)
/interface wifi datapath
add name=wdpprof0 bridge=bridge0 vlan-id=100 client-isolation=no
add name=wdpprof1 bridge=bridge0 vlan-id=110 client-isolation=no
add name=wdpprof2 bridge=bridge0 vlan-id=120 client-isolation=yes
add name=wdpprof3 bridge=bridge0 vlan-id=130 client-isolation=yes
#
# Wifi steering
/interface wifi steering
add name=wstprof0 neighbor-group=wstprof0 2g-probe-delay=yes
add name=wstprof1 neighbor-group=wstprof1 2g-probe-delay=yes
add name=wstprof2 neighbor-group=wstprof2 2g-probe-delay=yes
add name=wstprof3 neighbor-group=wstprof3 2g-probe-delay=yes
#
# WiFi configuration profiles
/interface wifi configuration
add name=wconfprof0 security=wsecprof0 steering=wstprof0 mode=ap hide-ssid=yes ssid=mywifi-w country=Japan max-clients=50
add name=wconfprof1 security=wsecprof1 steering=wstprof1 mode=ap hide-ssid=no  ssid=mywifi   country=Japan max-clients=50
add name=wconfprof2 security=wsecprof2 steering=wstprof2 mode=ap hide-ssid=no  ssid=mywifi-g country=Japan max-clients=50
add name=wconfprof3 security=wsecprof3 steering=wstprof3 mode=ap hide-ssid=no  ssid=mywifi-d country=Japan max-clients=50
#
# Wifi channel configuration
/interface wifi channel
add name=ch2 band=2ghz-ax skip-dfs-channels=disabled reselect-interval=6h..12h
# This is the intersection of HK and JP regulatory zones to support home devices. Once all on JP, can use wider channels
add name=ch5 band=5ghz-ax skip-dfs-channels=10min-cac frequency=5180-5320:20,5500-5700:20 secondary-frequency=5210,5290,5530,5610,5690
#
# Wifi setup
/interface wifi
set [ find default-name=wifi1 ] channel=ch5 configuration=wconfprof0 disabled=no
set [ find default-name=wifi2 ] channel=ch2 configuration=wconfprof0 disabled=no
add configuration=wconfprof1 disabled=no mac-address=F6:1E:57:XX:XX:7E master-interface=wifi1 name=wifi3
add configuration=wconfprof1 disabled=no mac-address=F6:1E:57:XX:XX:7F master-interface=wifi2 name=wifi4
add configuration=wconfprof2 disabled=no mac-address=F6:1E:57:XX:XX:80 master-interface=wifi1 name=wifi5
add configuration=wconfprof2 disabled=no mac-address=F6:1E:57:XX:XX:81 master-interface=wifi2 name=wifi6
add configuration=wconfprof3 disabled=no mac-address=F6:1E:57:XX:XX:82 master-interface=wifi1 name=wifi7
add configuration=wconfprof3 disabled=no mac-address=F6:1E:57:XX:XX:83 master-interface=wifi2 name=wifi8
#
# Add to bridge/VLAN ports
/interface/bridge/port
add bridge=bridge0 interface=wifi1 frame-types=admit-only-untagged-and-priority-tagged pvid=100
add bridge=bridge0 interface=wifi2 frame-types=admit-only-untagged-and-priority-tagged pvid=100
add bridge=bridge0 interface=wifi3 frame-types=admit-only-untagged-and-priority-tagged pvid=110
add bridge=bridge0 interface=wifi4 frame-types=admit-only-untagged-and-priority-tagged pvid=110
add bridge=bridge0 interface=wifi5 frame-types=admit-only-untagged-and-priority-tagged pvid=120
add bridge=bridge0 interface=wifi6 frame-types=admit-only-untagged-and-priority-tagged pvid=120
add bridge=bridge0 interface=wifi7 frame-types=admit-only-untagged-and-priority-tagged pvid=130
add bridge=bridge0 interface=wifi8 frame-types=admit-only-untagged-and-priority-tagged pvid=130
#
### DHCP/ND ###
#
# Allow neighbour discovery on all. Note that this is subject to filtering later.
/ip neighbor discovery-settings
set discover-interface-list=LOCAL
# DHCP server address pools
/ip pool
remove [ find ]
add name=dhcpspool0 ranges=192.168.0.50-192.168.0.254
add name=dhcpspool1 ranges=192.168.1.50-192.168.1.254
add name=dhcpspool2 ranges=192.168.2.50-192.168.2.254
add name=dhcpspool3 ranges=192.168.3.50-192.168.3.254
# DHCP server assignment
/ip dhcp-server
add address-pool=dhcpspool0 disabled=no interface=vlan0 name=dhcps0
add address-pool=dhcpspool1 disabled=no interface=vlan1 name=dhcps1
add address-pool=dhcpspool2 disabled=no interface=vlan2 name=dhcps2
add address-pool=dhcpspool3 disabled=no interface=vlan3 name=dhcps3
# DHCP options
/ip dhcp-server option
remove [ find ]
add name=dhcpopt-101 code=101 value="s'$TZ'"
# DHCP option sets
/ip dhcp-server option sets
remove [ find ]
add name=dhcpos0 options=dhcpopt-101
add name=dhcpos1 options=dhcpopt-101
add name=dhcpos2 options=dhcpopt-101
add name=dhcpos3 options=dhcpopt-101
# DHCP server network config
/ip dhcp-server network
remove [ find ]
add address=192.168.0.0/24 dns-server=192.168.0.1 gateway=192.168.0.1 netmask=24 ntp-server=192.168.0.1 dhcp-option-set=dhcpos0 domain="local"
add address=192.168.1.0/24 dns-server=192.168.1.1 gateway=192.168.1.1 netmask=24 ntp-server=192.168.1.1 dhcp-option-set=dhcpos1 domain="local"
add address=192.168.2.0/24 dns-server=192.168.2.1 gateway=192.168.2.1 netmask=24 ntp-server=192.168.2.1 dhcp-option-set=dhcpos2 domain="local"
add address=192.168.3.0/24 dns-server=192.168.3.1 gateway=192.168.3.1 netmask=24 ntp-server=192.168.3.1 dhcp-option-set=dhcpos3 domain="local"
# DHCP client for WAN
/ip dhcp-client
remove [ find ]
add interface=ether1 use-peer-dns=no use-peer-ntp=no comment="WAN DHCP client"
/ipv6 settings
set accept-router-advertisements=yes forward=yes
# DHCP/ND IPv6 pool. dhcpspoolg is dynamic and created by the dhcpv6 client from the wan side. dhcpspoolN is a ULA range (for containers and site services)
# For our network, all clients are assigned from the same range. Firewall rules are controlled by the source/destination interface and port. This avoids having to
# maintain continuity of address ranges through the setup. However, subnets are numbered with a ULA Global ID 0; the Subnet ID follows the IPv4 setting.
# Only enabling vlan0 at the moment since without a global address, lots of things break
/ipv6 pool
remove [ find dynamic=no ]
add name="dhcpspool0" prefix=fxxx:3530:6af8:0::/64 prefix-length=64
add name="dhcpspool1" prefix=fxxx:3530:6af8:1::/64 prefix-length=64
add name="dhcpspool2" prefix=fxxx:3530:6af8:2::/64 prefix-length=64
add name="dhcpspool3" prefix=fxxx:3530:6af8:3::/64 prefix-length=64
/ipv6 dhcp-client option
remove [ find ]
/ipv6 dhcp-client
remove [ find ]
add interface=ether1 use-peer-dns=no request=prefix,address allow-reconfigure=yes pool-name=dhcpspoolg pool-prefix-length=64 prefix-hint=::/61 use-interface-duid=yes script="" comment="WAN DHCP client"
# Neighbour discovery. Remove all, disable default
/ipv6 nd
remove [ find where !default ]
set [ find where default ] disabled=yes
add interface=ether1 ra-interval=3m20s-10m ra-delay=3s ra-lifetime=30m ra-preference=medium advertise-mac-address=yes advertise-dns=no managed-address-configuration=no other-configuration=no
:foreach ifid in=[ /interface/vlan/find where name~"vlan[0-9]+" ] do={
  :local ifname [/interface/vlan/get $ifid name]
  # Need to wait until the link-scope address is ready. This can take a few seconds
  :local counter 0
  :while ( ($counter < 10) && ([/ipv6/address/print count-only as-value where interface="$ifname" and dynamic and address in fe80::/10 ] < 1) ) do={
    :log info "Waiting for $ifname link-local address"
    :delay delay-time=1s
    :set counter ($counter + 1)
  }
  :local linklocal [/ipv6/address/get [/ipv6/address/find where interface="$ifname" and dynamic and address in fe80::/10 ] address]
  :local linklocaladdress [:pick $linklocal 0 [:find $linklocal "/"]]
  add interface=$ifname ra-interval=3m20s-10m ra-delay=3s ra-lifetime=30m ra-preference=medium advertise-mac-address=yes advertise-dns=yes managed-address-configuration=no other-configuration=no dns=$linklocaladdress
}
# Configure the local prefixes
# Very hacky, assumes only one digit for the index. Disabled ULA for now; old Android seems to have an issue with it
/ipv6 nd prefix
remove [ find where !dynamic ]
:foreach ifid in=[ /interface/vlan/find where name~"vlan[0-9]+" ] do={
  :local ifname [/interface/vlan/get $ifid name]
  :local ifnumber [:pick $ifname ([:len $ifname]-1) [:len $ifname] ]
  :local ifprefix [ /ipv6/pool/get [ /ipv6/pool/find where name="dhcpspool$ifnumber" ] prefix ]
  #add autonomous=yes interface=$ifname on-link=yes prefix=$ifprefix
}
# Add a single network interface, which will correct the routing table. Do this until the IPv6 prefix is fixed.
/ipv6/address
add interface=vlan0 from-pool=dhcpspoolg advertise=yes eui-64=yes
#
### DNS ###
# Configure DNS
/ip dns
set allow-remote-requests=yes cache-size=20480
set mdns-repeat-ifaces=[ :local bs [:toarray ""]; :foreach b in=[/interface/vlan/print as-value proplist=name where name~"vlan[0-9]+"] do={:set bs ($bs,($b->"name"))}; $bs ]
set servers=2606:4700:4700::1111,2001:4860:4860::8888,2001:4860:4860::8844,1.1.1.1,8.8.8.8,8.8.4.4
#
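The prefix loop above takes only the last character of each interface name, hence its "only one digit" caveat. Since every matching name starts with the four-character prefix `vlan`, picking from index 4 to the end of the string would also handle multi-digit indexes such as vlan12. A sketch, assuming that naming convention holds (the anchored regex enforces it); the loop body only logs the pool name it would look up:

```routeros
# Anchored match so that only names of the form vlan<digits> are selected
:foreach ifid in=[ /interface/vlan/find where name~"^vlan[0-9]+\$" ] do={
  :local ifname [/interface/vlan/get $ifid name]
  # Everything after the 4-character "vlan" prefix, e.g. "vlan12" -> "12"
  :local ifnumber [:pick $ifname 4 [:len $ifname]]
  :log info "$ifname uses pool dhcpspool$ifnumber"
}
```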

Thanks again for your help!

For anyone finding this and having a similarly weird issue, I finally found the root cause. I had a USB hub plugged in and mounted parallel to the left antenna (viewed from the front), about 5 cm away. Whenever I updated or tested, I pulled the router out and things worked; then, a short time after I put it back, they did not. I also noticed that wifi authentication was not working: any device that had access was authenticating on 5GHz and then transitioning to 2.4GHz.

When I relocated the hub, things worked, and they have been stable ever since. Still, I have no idea why L2 traffic was working and L3 traffic was not. I assume some RF voodoo.