Atheros client re-association problem revisited

I have been following the previous closed post viewtopic.php?f=7&t=15647&hilit=sr2+problem but feel it is necessary to reopen this subject.

In summary, this is based on my experience using Atheros APs and Atheros clients on an all-Mikrotik network running NSTREME. If an AP has fewer than 10 clients on it, there are no problems. Once you hit about 15 clients, the problem begins. On powering up the AP, clients associate in groups of five or so, all registering with -95 signal strength. A client at the base of the tower or very close to it will usually associate immediately with normal signal strength. From that point forward, about every 5 seconds we see clients associate and then drop off. This repeats in groups of five or six. Sometimes two-thirds of the clients will associate and then all drop off. Unless you take some action, they never stop associating and reassociating.

Some things that appear to fix it:

  1. Change the antenna to TxA-RxB. Everyone associates with normal signal strength but uploads are slow and client Tx CCQ is bad. Not really a fix.

  2. Change from 802.11B to 802.11B-G. Sometimes this solves the problem.

  3. Rapidly enable and disable the interface. Probably the best solution in my experience. I even wrote a script to do this every 2 seconds. It works sometimes.

This is what I have observed so far. Here are my hardware evolutions:

Original
AP - Senao NMP8602
Client - Compex WLM54G
Problem first seen when AP hit 17 clients

First Change
AP - Replaced with new, unused Senao NMP8602
Client - Compex WLM54G
Problem continued

Second Change
AP - Replaced with SR2 and new 532A RB and replaced sector antenna, coax, poe
Client - Compex WLM54G
Problem continued

Third Change, different sector, different clients, different tower, same problem
AP - Replaced with XR2 and new 532A RB
Client - Compex WLM54G
Problem continued

I thoroughly read the previous post about this problem, viewtopic.php?f=7&t=15647&hilit=sr2+problem. Having experienced and fought this problem myself, with the fear of losing tons of customers, I believe most of the people who posted there have the same problem, based on the similar symptoms.

This is an urgent situation, more urgent than version 3 Beta (unless that fixes it), more urgent than updates to the Dude, etc. If I could run 2511 cards and fix all my woes I would, but Prism won’t do NSTREME and I am fully invested in MT and NSTREME, so it is too late to turn back.

Help MT, please help us solve this asap. I can provide data and assistance from the field if needed.

If you post to this topic, please post factual information and not opinions about anyone, any company, etc. Resolution is more important than expressing emotion.

Nothing to do with emotion.

  1. You were using Compex cards on your clients. Have you tried using a Compex card on the AP side?

  2. I have no issues using Atheros (All Ubiquiti based) radios as long as I use all SR radios.

  3. The problem resolves if you use the old Senao NMP-2511 cards.

All Compex is not something I have tried, but that’s a good idea. 2511s are not an option because they are Prism-based and won’t do nstreme, which I run exclusively. If it weren’t for nstreme I would gladly drop the 2511s in, because they definitely don’t have this problem.

I’ve had no issues with Compex cards but I run them exclusively in B mode only.

What is the maximum number of users on one AP? Are you running nstreme?

What are your NSTREME settings?

I had a similar problem.. changed from best-fit to dynamic-size and things are working well now. I have polling also enabled.

That has worked for me too, sometimes, other times no.

For myself I have experienced odd issues like this in the past. My observations were:

NSTREME performs worse if the noise is higher in the area, especially in PtMP environments.
Almost all of my odd issues like yours have been a result of outside influence, strong background noise, etc.

Something I have noticed is that the noise floor reading in ROS could be quite good at -99 or -105, yet a spectrum analyzer shows strong non-802.11b/g signals that caused enormous problems when I tried to operate near their frequency. So just because the noise floor is still reasonable, it doesn’t mean there isn’t something else interfering.

A few questions I would have for you:

  1. Does the problem persist without NSTREME?
  2. Have you tried different channels?
  3. If you are using all Mikrotik and Atheros, have you tried 5MHz or 10MHz channel sizes?
  4. Have you checked the area for other strong 2.4GHz signals?
  5. Do you have anyone on the AP that would be considered a borderline connection (Tx/Rx of <-83)?
  6. Do you have the AP rates locked down or is it running at default settings?

Cheers

Have you tried to run Nstreme without polling? Disable polling on the AP and check the results.

I am not using NSTREME, but I am using 10MHz channels, and that has helped us out a lot, as a lot of 802.11 traffic gets ignored because I am listening on 10MHz channels instead of 20MHz.

Also, if you look at my post history you will see that I had tons of problems for a while. Most of the problem was clients dropping every few minutes and reassociating. My solution was rather unexpected. I discontinued using The Dude and my problems went away. My AP is now happily serving 30 clients on 10MHz channels, and each one is set to an AP limit of 2.5M down/390k up. Working great. Now I need to add a few more APs.

I too have been having this re-association problem. When the AP is rebooted, or power is cycled, many clients will not re-associate. The AP is an 8602+ (on a 175ft tower) with clients using approximately 30% Tranzeos, 30% CB3s, and 40% MikroTik w/R52 cards. The problem began recently when we added a few additional users, putting us somewhere around 20 on that AP, but the problem affects all users, not just the recent additions. Two other towers with similar setups do not show the same problem, but they are only serving 10 or fewer each.

Notes: The AP is running a virtual SSID in addition to the primary one. The only thing I’ve noticed that helps at all is to shut down The Dude before the reboot and not open it back up until after clients have associated, but this has not worked every time.

Please help, we are losing customers over this.

I have heard this comment about the Dude several times. I wonder what the Dude has to do with it. Is it possibly the SNMP polling the Dude does of the AP? The pinging? Anyone have a clue about this?

Today I experimented. I changed frequencies, and to get the clients to re-associate I changed from 2.4-B to 2.4-B/G. Then, after enabling and disabling the interface, they all came back on. Unfortunately going back to B was not as easy, but eventually, by enabling and disabling, I got them all back on. Disabling the Dude had no effect. I contacted tech support and they suggested increasing the CPU speed, saying that NSTREME takes an increased amount of resources. I checked the CPU while the clients were dropping on and off and it never got over 15%, so I don’t think that is the issue. They also suggested Ver 3 Beta, but I am concerned about running it in a production environment. I have also heard of a lot of upgrade problems going from 2.9.

Uh Oh,

I had this problem on non-nstreme APs and eventually gave up on Atheros. The problem makes the AP useless. My purpose was to deliver higher speeds through 802.11G. The only way I was able to do this was to use an external radio and just use the Mikrotik to manage other functions. I wonder if using one of the other router-oses like Antcore would work.

I thought this problem had been resolved or did not exist with nstreme. I’ve been rolling out 5G APs on my towers, but have a max of 14 clients on any given AP. Looks like I’m heading for trouble again. Gag. Anyone know of a Prism-based 802.11a card?

Yeah, I see what you are saying. Unfortunately I think we are pretty close to starting to look for another vendor due to these issues too. It’s been revisited over and over again on these forums, but support keeps blaming this on CPE/Noise. One common thing that I have read in a number of different posts is that if they use an AP from a different vendor, even Linksys, this problem does not occur. So I ask: how can the problem be CPE/Noise if someone can replace the AP with equipment from any other vendor, even SOHO crap, and the problem does not show up?

I’m currently working on a way to test this myself, but I have not been able to do so at this time - so that’s based purely on what I’ve read in posts on this forum.

This problem happens when you change any setting on the wireless card and click apply with more than 10 or so clients. I have done many tests to see if there is a setting that can help with this and found none, but I will try to explain what I have found so that someone can fix it.

First off, from what I have found, the problem is with a mass of CPEs trying to reconnect at the same time. I tested this by moving about 30 customers off an AP (Mikrotik with Prism card) to another AP (Mikrotik with Atheros card), one at a time. Prism cards seem to be the fix for us, since we don’t use nstreme, just a normal AP handing out DHCP, but because I want to use long-range cards I still put them up to test. I moved all 30 clients one at a time, and all 30 were registered with great signal, no noise, and working great.

I changed the radio name (not the SSID) by one character and clicked apply. The 30 clients dropped off, 5 came back, and the log showed the familiar “sent deauth” messages. I disabled and re-enabled the wireless card over and over again until they all came back up. Then I clicked hide SSID and was back to 5 clients. I unhid the SSID: still 5. After playing with it, I found that you can change any setting and hit apply and either drop the clients or get them to come back up. For example, one time I changed the AP back and forth between 2 channels and they eventually came back up.

I put all the clients back on the original Prism AP and started over again. All 30 moved over fine, one at a time. Then I put them back on the original AP, disabled the Prism AP, and changed the SSID of the second AP to that of the Prism AP. I got 5 clients and “sent deauth” messages in the log. So it looks like Atheros plus Mikrotik gets bugged out when 10 or more clients try to connect at the same time (and we changed to every different Atheros card we had: SR2, Compex, etc.).

Here is my fix for this problem. It is not a true fix, but if the tower drops clients, it is already fixing itself before you know there is a problem.

I made a script called “watcher”:

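# "watcher": bounce wlan1 whenever fewer than 10 clients are registered.
# The single-pass :for loop only wraps the monitor call.
# Note: newer RouterOS versions may require quoting the hyphenated
# variable as $"registered-clients".
:for i from=1 to=1 do={
    /interface wireless monitor wlan1 interval=1 do={
        :if ($registered-clients < 10) do={
            /interface wireless disable wlan1;
            /interface wireless enable wlan1;
        }
    }
}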


Then I made a scheduler entry set to start at startup, with an interval of 00:00:07 and the event set to watcher.
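
For anyone who prefers the terminal, the scheduler entry described above would look roughly like the line below. The entry name watcher-schedule is made up for illustration, and exact parameter spelling may differ between RouterOS versions, so treat it as a sketch and check it against your own release.

```
# Hypothetical entry name; runs the "watcher" script every 7 seconds from boot
/system scheduler add name=watcher-schedule start-time=startup interval=00:00:07 on-event=watcher
```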


I put that on the tower and changed a setting. The clients dropped to 5, but then it fixed itself after about 5 minutes.

hope this helps.

Good data points, thank you kbyrd.

Here’s a brief summary of what I believe to be true based on our experience and these posts:

  1. This is an Atheros AP to Atheros client problem.
  2. The problem is not limited to NSTREME, as kbyrd is not using NSTREME.
  3. The problem is not limited to MT AP to MT client.
  4. The problem seems to be related to any MT AP with more than 10 clients.
  5. Rebooting the clients or the AP does not fix the problem once the AP starts this ASSOC/DIS-ASSOC cycle.
  6. Any change to the AP that causes clients to dis-associate will trigger this problem.
  7. There is no correlation between noise and the problem; we see it on towers near town and out in the woods.
  8. This does not appear to be a CPU resource problem as the CPU never goes over 15% during the assoc/dis-assoc cycles.

An idea for a workaround would be a script on the clients that creates a random delay, so that they do not all try to associate at the same time. From what kbyrd has seen, if you manually bring the clients back one at a time, the problem doesn’t occur.
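
A client-side version of that idea might look roughly like the sketch below. It is untested and rests on several assumptions: RouterOS v3-or-later scripting syntax, a client interface named wlan1, and a made-up per-client value staggerDelay that you would set by hand to a different number on each CPE. It uses a fixed per-client offset instead of a true random delay, since no random-number facility is assumed here, and it is meant to be run periodically from the client's scheduler.

```
# Untested sketch; staggerDelay is a hypothetical per-client offset.
# Set a different value on each CPE (for example 0s, 5s, 10s, ...).
:local staggerDelay 10s
# In station mode the registration table lists the AP,
# so a count of 0 means this client is not associated.
:if ([/interface wireless registration-table print count-only] = 0) do={
    /interface wireless disable wlan1
    :delay $staggerDelay
    /interface wireless enable wlan1
}
```

Whether staggering actually avoids the assoc/dis-assoc cycle has not been confirmed in this thread. The only supporting observation so far is that moving clients over one at a time worked.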

I have had better results re-associating the radios by setting DFS Mode to “radar detect” in the wireless options. Scanning the interface also worked, although with less success. Of course it is an Atheros problem, because all my cards are CM9. I have seen this issue in this forum for a long time.

Before, I had a wrap-star-os AP without any problem, and the card was an Atheros CM9, so I am seriously thinking of deploying some wrap-star-os APs in the busiest places.

Hey, I won’t change MT for my long P2P links, nor for my Mikrotik core router.

I tested this about 18 months ago, and at that time there were only CB3s at customer sites in my system, which I don’t think use an Atheros chipset. The only Atheros part was the SR2 or the Senao equivalent (I forget the part number) at the AP. Both failed.

BTW, I had a previous post removed where I was suggesting a method of testing and mentioned a competitor’s name explicitly. Perhaps they will let it stay if I just say: if you are currently experiencing this problem and could swap just the CPU with something else, it would nail down for certain whether it is an Atheros chipset problem or an MT problem. Jose’s post seems to indicate it is an MT problem, but a clear-cut test would be nice.