Strange disconnect with delayed reconnect, "no beacons"

I have two MT v2.9.11 boxes. One is an AP with an omni antenna and the other is acting as a station with a directional grid. They are about 0.2 miles apart and the signal level is about -55. The station has a CM9 card; the AP has an Atheros 5354. Both run 802.11a with no Nstreme.

Every couple of hours the station disconnects from the AP and won’t rejoin until 20-60 minutes later. Disabling and re-enabling the interface on either end does not fix the problem.

When this occurs I see the following in the logs:

At the AP

wireless,info wlan2: disconnected 00:0B:6B:37:A2:A3, inactivity

At the station

00:10:18 wireless,info wlan1: lost connection to 00:02:6F:21:EC:80, got disassoc: no activity (4)
00:10:20 wireless,info wlan1: connected to 00:02:6F:21:EC:80 on 5765, SSID JackalNet2
00:33:18 wireless,info wlan1: lost connection to 00:02:6F:21:EC:80, no beacons
00:33:23 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:33:36 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:33:51 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:34:15 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:34:27 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:34:51 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:35:03 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:35:33 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
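
For anyone collecting more of these logs, here is a minimal sketch of how the outage lengths could be pulled out of a saved station log. It assumes the log was exported as plain text in exactly the format shown above; the `outages` function and the `LOG` sample are names made up for illustration and are not part of RouterOS.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the station log above (shortened).
LOG = """\
00:10:18 wireless,info wlan1: lost connection to 00:02:6F:21:EC:80, got disassoc: no activity (4)
00:10:20 wireless,info wlan1: connected to 00:02:6F:21:EC:80 on 5765, SSID JackalNet2
00:33:18 wireless,info wlan1: lost connection to 00:02:6F:21:EC:80, no beacons
00:33:23 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
00:33:36 wireless,info wlan1: failed to connect to 00:02:6F:21:EC:80, join timeout
"""

# Timestamp, interface name, and the three event types seen in the log.
LINE = re.compile(
    r"^(\d\d:\d\d:\d\d) wireless,info (\S+): "
    r"(lost connection|connected|failed to connect)"
)


def outages(text):
    """Return a list of (lost_at, duration_or_None, failed_attempts).

    duration is None when the log ends before the link comes back.
    """
    result = []
    lost_at = None
    failures = 0
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        event = m.group(3)
        if event == "lost connection":
            if lost_at is None:
                lost_at, failures = stamp, 0
        elif event == "failed to connect":
            if lost_at is not None:
                failures += 1
        elif lost_at is not None:  # event == "connected"
            duration = stamp - lost_at
            if duration < timedelta(0):
                # The log has no date, so assume the outage crossed midnight.
                duration += timedelta(days=1)
            result.append((lost_at.time(), duration, failures))
            lost_at = None
    if lost_at is not None:
        result.append((lost_at.time(), None, failures))
    return result


for lost, duration, failed in outages(LOG):
    state = "still down at end of log" if duration is None else f"down for {duration}"
    print(f"{lost}: {state}, {failed} failed join attempts")
```

Running this over a full day of logs would show whether the long outages always start with "no beacons" and how many join timeouts occur before the link returns.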

Here is the AP config

set wlan2 name="wlan2" mtu=1500 mac-address=00:02:6F:21:EC:80 arp=enabled disable-running-check=no
radio-name="00026F21EC80" mode=ap-bridge ssid="JackalNet2" area="" frequency-mode=manual-txpower
country=no_country_set antenna-gain=0 frequency=5765 band=5ghz scan-list=default
rate-set=configured supported-rates-b=1Mbps,2Mbps,5.5Mbps,11Mbps
supported-rates-a/g=6Mbps,9Mbps,12Mbps,18Mbps,24Mbps,36Mbps,48Mbps basic-rates-b=1Mbps
basic-rates-a/g=6Mbps max-station-count=2007 ack-timeout=dynamic tx-power-mode=default
noise-floor-threshold=default periodic-calibration=disabled burst-time=disabled dfs-mode=none
antenna-mode=ant-a wds-mode=disabled wds-default-bridge=none wds-ignore-ssid=no
update-stats-interval=disabled default-authentication=yes default-forwarding=yes
default-ap-tx-limit=0 default-client-tx-limit=0 hide-ssid=no security-profile=default
disconnect-timeout=10s on-fail-retry-time=100ms preamble-mode=both compression=no
allow-sharedkey=no comment="" disabled=no

And the station config

set wlan1 name="wlan1" mtu=1500 mac-address=00:0B:6B:37:A2:A3 arp=enabled
disable-running-check=no radio-name="000B6B37A2A3" mode=station ssid="JackalNet2" area=""
frequency-mode=manual-txpower country=no_country_set antenna-gain=0 frequency=5765 band=5ghz
scan-list=default rate-set=default supported-rates-b=1Mbps
supported-rates-a/g=6Mbps,9Mbps,12Mbps,18Mbps,24Mbps,36Mbps,48Mbps basic-rates-b=1Mbps
basic-rates-a/g=6Mbps max-station-count=2007 ack-timeout=dynamic tx-power-mode=default
noise-floor-threshold=default periodic-calibration=disabled burst-time=disabled dfs-mode=none
antenna-mode=ant-a wds-mode=disabled wds-default-bridge=none wds-ignore-ssid=no
update-stats-interval=disabled default-authentication=yes default-forwarding=yes
default-ap-tx-limit=0 default-client-tx-limit=0 hide-ssid=no security-profile=default
disconnect-timeout=3s on-fail-retry-time=100ms preamble-mode=both compression=no
allow-sharedkey=no comment="" disabled=no

Some additional information from tonight’s troubleshooting:

Noise floor on both ends is always -95 or better.

So here’s the strangeness:

“A” is the AP with an omni, “B” is the CPE with the 23 dB grid.

(1) I can never get it to associate on freq 5745, even though both antennas and cards should be fine with that.

(2) At the times when the connection drops, the CPE can still see the AP on a scan at a very good signal level (-55) but will not associate.

(3) I disabled and re-enabled the cards on both ends and also rebooted both. No luck.

(4) At the time when the CPE would not associate with the AP, I could switch roles (make B the AP, etc…) and A would associate fine to B

2-3 times a day the connection will just drop for 20-40 minutes, then come back on its own. No amount of rebooting or fiddling seems to bring it back before it is ready.

Given that the noise floor is very low and that the connection doesn’t work at all on 5745 (though it should), is it likely that one of the radio cards is bad?

I have seen this when noise is bad. The fact that you can’t connect on 5745 could also be due to noise. Does a channel change make much difference?

We had the same issue three times: excellent signal, low noise, it worked for days or even weeks with no problem, and then “no connectivity”. I put that in quotes because it was still possible to connect from about 20 meters away from the AP. At 1 km (the remote client’s location) it was impossible to connect to this AP, even though the signal was very good (-50 or so). It seems to me as if the card somehow sets its ACK timeout to the very minimum (as for “indoors”), disregarding the settings in software, so it is possible to see the radio signal but the connection times out (the other party does not acknowledge the connection).

Ah, almost forgot: after this failure happened, we were almost unable to connect on 5 GHz even at short distances (1 meter); only 2.4 GHz 802.11b gave a reasonably stable indoor connection.
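
To put rough numbers on the ACK-timeout idea, here is a small sketch of the round-trip propagation delay at the distances mentioned in this thread. It only computes the time the radio signal spends in the air. The timeout value the card actually applies, and how much processing time it allows on top, are unknown to me, so treat this as an illustration of scale rather than a diagnosis. The function name `round_trip_us` is made up for the example.

```python
# Speed of light in vacuum, metres per second (radio in air is very close to this).
SPEED_OF_LIGHT_M_S = 299_792_458.0


def round_trip_us(distance_m):
    """Time in microseconds for a frame to travel out and its ACK to travel back."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S * 1e6


# Distances taken from the posts in this thread.
links = [
    ("1 m (bench test)", 1.0),
    ("20 m (still connects)", 20.0),
    ("0.2 mi (original poster's link)", 0.2 * 1609.344),
    ("1 km (remote client)", 1000.0),
]

for label, metres in links:
    print(f"{label:35s} {round_trip_us(metres):6.3f} us round trip")
```

The 1 km link adds several microseconds of round-trip delay while the 20 m link adds a small fraction of one, so a timeout stuck at a short-range value could in principle let nearby clients through while a distant one never sees its ACK arrive in time.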

In all 3 cases, replacing the CM9 card helped.

Most likely there is a production defect, because all of these faulty parts were purchased recently. A less likely explanation is a software bug: all the per-MAC-address settings are stored in RouterOS, so when you unplug a NIC from the router and later plug it back in, it shows the old SSID and so on. Perhaps RouterOS somehow overrides the user’s ACK setting and, even worse, the override becomes permanently tied to the card’s MAC address. So you may also try /system reset and re-enter all the settings manually. If that does not help, then my point is: your card is dead. Replace it.