Hidden node issue?

Hi guys, I’ve had a couple of threads related to this issue I’ve been having, but this might be a different issue altogether if it’s true.

I submitted a support request to MikroTik support about my RB433AH/XR2 radio lockups.

Here is what I submitted:

Hi there, I have another issue I have been unable to resolve through the
regular means. I have 3 RB433AH access points with around 50 subs on
them at 2 separate tower sites. Two have XR2 radios, the other has an
SR2.

Every day one or more of these APs has a “lockup”. What I mean by
that is the WLAN interface stops passing traffic until I disable the
WLAN interface and re-enable it. I’m using PPPoE to authenticate most of
my users; the other users connect via DHCP.

In the logs, when the WLAN interface is not passing traffic, I see
all the DHCP users attempt DHCP and then time out, over and over. Same
thing for PPPoE users: I see pppoe waiting for call, terminating,
disconnected, over and over.

When I torch the WLAN interface I see nothing at all.

I have no idea why this happens and have not found any other mention of
a similar issue anywhere.

The RB433AH units are either on 24 V PoE injectors or 24 V AC adaptors.


So support suggested that I enable the 1 Mbps basic rate and set my arp-timeout to dynamic instead of static.

Instead of completely locking up, I saw several times overnight where the AP would seem to lock up just like before, but right itself after an hour or two. So in other words, more differently the same.

Here is the latest response I got:

Hello,

the wireless logs doesn’t show any disconnects. The only thing that could be is
the AP has too many clients and sometimes the clients could send at the same time
and could experience the hidden node issue (collisions on the AP). Maybe you could
try to split this AP into 2 wireless cards so the AP would not be so loaded?

Anyone here have experience with the hidden node issue? It should be noted that we are running RTS/CTS on all our CPEs (fragmentation threshold 1024, RTS/CTS threshold 200).

Sounds like the 5.0rc1 nv2 wireless bug.
Are you running v5.0rc1 and nv2 wireless mode?
If not then what version are you running?

I’m running all 4.10 packages. This bug has followed me through all the v4.x packages I’ve tried.

Here is the latest from support:

“in the support output file there was one time when the router tried to reset the
card and failed and maybe that caused a small freeze for a while. There is nothing
what we could do as it is the hardware - the card didn’t respond. Maybe it is
overheating.”

I doubt it’s overheating. These units are in an outside cabinet and it’s a typical Canadian fall right now.

This is the config on my WLAN interface:

Flags: X - disabled, R - running
 0  R name="wlan1" mtu=1500 mac-address=00:15:6D:10:31:2D arp=enabled
      interface-type=Atheros AR5213 mode=ap-bridge ssid="5555" frequency=2462
      band=2.4ghz-b scan-list=default antenna-mode=ant-b wds-mode=disabled
      wds-default-bridge=none wds-ignore-ssid=no default-authentication=yes
      default-forwarding=no default-ap-tx-limit=0 default-client-tx-limit=0
      hide-ssid=yes security-profile=default compression=no

I have seen some strange things with hidden nodes and APs.

Here are some things to try…

On your APs:

  • disable csma
  • preamble long
  • On Fail Retry Time 900
  • Disconnect Timeout 00:00:12
  • Adaptive Noise Immunity none
  • Hw Retries 15
  • Hw Protection Mode cts-to-self
  • Hw Protection Threshold: 0
  • if you are running mixed b/g then turn off 36 & 48 & 54 Mbps
  • Ack Timeout dynamic
  • Periodic Calibration enabled
  • Calibration Interval 00:10:00

Make sure you do not have overlapping channels on your nearby APs.

If you can, try turning on RTS/CTS on all of your clients, with a hw-protection-threshold of 32 on all of your client radios.

Enable client rate limiting (on your client radios) so that nobody can upload enough to flood the APs.

This should help some…

Tom Jones - WISP up in North Idaho

edit - note - I have some APs with over 100 clients and I have found this works best.
edit - another note - with this many clients, it may sometimes take a few minutes after an AP reset to get all the clients associated again. Until then you will see the client registration count jumping up and down for a minute or so. Just disconnect your Winbox and come back in a few minutes.

Hi, thanks for the input. I will try implementing your suggestions tomorrow and see what happens. A couple of questions though:

  • Disable CSMA: where do I go to disable that? I don’t see it in the WLAN settings.

  • Also, I don’t have “hw-protection-threshold” on any of my CPEs, but the RTS/CTS threshold is set to 200. Is that the same thing with a different name? I also have a fragmentation threshold of 1024.


Notes:

Preamble is already long.

I have client rate limiting set via RADIUS attributes; most clients connect via PPPoE. The others are set via DHCP and a basic queue rule. I also have a TCP connection limit of 80 per IP set.

It’s an 802.11b-only network, usually forced to 11 Mbps only. Right now it is set to allow 11 Mbps or 1 Mbps rates, but I will be setting it back to forced 11 Mbps tomorrow.

Oh, and channels are not overlapping for sure. But in some areas our clients are getting signal from other WISPs on the same channel when I do a site survey from their CPE.

Sucks, but we can’t really do anything about it till we get our 3.65 Moto stuff going.

CSMA is in Wireless Tables – interface – wlan – nstream – Disable CSMA

hw-protection-threshold is sometimes called the RTS or CTS threshold on different devices. I normally use a client RTS/CTS threshold of 32 on ALL clients that talk to an AP where the AP is busy and has lots of clients. This forces each client to send an RTS to the AP. When the AP sends a CTS back to the requesting client, all other clients stop trying to talk to the AP, which gives the AP a clean, clear channel to listen to that single client.
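To make the difference between a threshold of 200 and a threshold of 32 concrete, here is a minimal Python sketch. It assumes the usual 802.11 rule that a frame longer than the RTS threshold is preceded by an RTS; the frame sizes are made-up examples, not measurements from anyone's network, and `uses_rts` is a hypothetical helper name.

```python
def uses_rts(frame_bytes: int, rts_threshold: int) -> bool:
    """Assumed rule: a frame longer than the threshold is sent with RTS/CTS."""
    return frame_bytes > rts_threshold

# Hypothetical frame sizes in bytes (illustrative only).
example_frames = {"tcp_ack": 68, "small_voip": 188, "full_fragment": 1024}

for threshold in (200, 32):
    protected = [name for name, size in example_frames.items()
                 if uses_rts(size, threshold)]
    print(f"threshold={threshold}: RTS/CTS used for {protected}")
```

With a threshold of 200 only the large frame is protected in this example, while with 32 essentially every data frame is, which is the point of setting it that low on a busy AP.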

Note - on the AP, CTS-to-self lets the AP tell all clients not to send because the AP is about to send to a single client.

CTS-to-self on the AP and RTS/CTS on all clients help eliminate (or reduce) the hidden node problem, where the channel is messed up by client traffic stomping on other client traffic both to and from the AP.

CTS to self on the AP and RTS/CTS on all clients does slow down a wireless network because of the extra wireless packets sent and received - but sometimes on busy congested wireless networks this is the only way to resolve the hidden node problem.
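A back-of-the-envelope Python sketch of that slowdown, for anyone who wants numbers. It assumes 802.11b with long preamble (192 µs of preamble and PLCP header per frame), a 10 µs SIFS, a 20-byte RTS, a 14-byte CTS, and control frames sent at the 1 Mbps basic rate. It ignores ACKs, DIFS, backoff and retries, so treat the result as a rough lower bound, not a measurement. All function names are hypothetical.

```python
PLCP_LONG_US = 192.0      # 802.11b long preamble + PLCP header
SIFS_US = 10.0            # 802.11b short interframe space
RTS_BYTES = 20
CTS_BYTES = 14
MAC_OVERHEAD_BYTES = 28   # assumed: 24-byte MAC header + 4-byte FCS

def airtime_us(frame_bytes: float, rate_mbps: float) -> float:
    """Time on air for one frame: preamble plus bits divided by rate."""
    return PLCP_LONG_US + frame_bytes * 8 / rate_mbps

def rts_cts_overhead_us(control_rate_mbps: float = 1.0) -> float:
    """Extra airtime added in front of a data frame by an RTS/CTS exchange."""
    return (airtime_us(RTS_BYTES, control_rate_mbps) + SIFS_US
            + airtime_us(CTS_BYTES, control_rate_mbps) + SIFS_US)

def overhead_ratio(payload_bytes: int, data_rate_mbps: float) -> float:
    """RTS/CTS airtime relative to the airtime of the data frame itself."""
    data_us = airtime_us(payload_bytes + MAC_OVERHEAD_BYTES, data_rate_mbps)
    return rts_cts_overhead_us() / data_us

for payload in (64, 512, 1024):
    print(f"{payload} B at 11 Mbps: RTS/CTS adds "
          f"{overhead_ratio(payload, 11.0):.0%} of the data frame airtime")
```

Under these assumptions the handshake costs a fixed amount of airtime per frame, so the relative penalty is worst for small frames.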

OK, I have set up your suggestions on one of those problem towers and I’ll watch and see how she behaves. Will let you know what I see.

So far so good, but I will have to see how it behaves overnight. Question: should I have polling enabled or disabled?

The radio lived out the night without behaving like it did before on dynamic-arp.

Now to watch and see how it behaves over the next couple of days. Looking good though.

I do not use nstream or polling on my 2.4 APs. This is because most of my wireless clients are not mikrotik devices.

Also - I do use watchdogs in my APs and client devices.

My MikroTik APs have the watchdog configured to ping an IP address in my office/NOC. If and when an AP loses communication with my office, the AP will reboot.

100 percent of my client devices also use a watchdog that pings my NOC. If a client device loses its connection, it will automatically reboot and reconnect to the AP, which then lets the client device perform its watchdog ping check to my office/NOC again.

My wireless network is much more reliable because I use a watchdog IP ping check in all back-haul point-to-point links, a watchdog in all my APs and watchdogs on all client devices.

What is rather neat is that I can automatically reboot hundreds and hundreds of microwave radios based on which IP address I turn off here in my office/NOC that my watchdog pings are checking. Also, if I have 2 APs servicing a client zone and one AP is not working correctly, clients on that AP will keep rebooting until they establish a connection to the remaining working AP covering the same client zone.
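The watchdog idea above boils down to a very small decision rule. Here is a minimal Python sketch of it, assuming the device reboots only when several pings in a row to the watch address fail. This is an illustration of the logic, not how RouterOS or any CPE firmware actually implements its watchdog; the function name, the attempt count and the 192.0.2.1 documentation address are all placeholders.

```python
from typing import Callable

def watchdog_should_reboot(ping: Callable[[str], bool],
                           address: str,
                           attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` pings to `address` fails."""
    for _ in range(attempts):
        if ping(address):
            return False  # one reply is enough to consider the link alive
    return True

# Stand-in ping functions for demonstration; a real watchdog would send ICMP.
def noc_reachable(address: str) -> bool:
    return True

def noc_unreachable(address: str) -> bool:
    return False

print(watchdog_should_reboot(noc_reachable, "192.0.2.1"))
print(watchdog_should_reboot(noc_unreachable, "192.0.2.1"))
```

Requiring several consecutive failures is a design choice to avoid rebooting on a single lost ping, which matters on a congested link.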

So the AP hasn’t required a reboot since the changes were made, but I am seeing some interesting behavior. This morning it looks like all my associated wireless clients re-associated with the AP, except for 2 users: one has been in session for 4 days, the other for 2.
Now, that said, I haven’t had any complaints for that tower at all, which is abnormal.


I disabled polling this morning since none of our CPEs are MikroTik.

If the non-mikrotik client CPEs have a watchdog feature - try it out.

I wish. EnGenius CPEs, mostly 2611Ps but some 1650s, plus some older Realtek-based ones (3220EXTs).

So the issue we had this morning was a very large area power outage. It affected almost all the CPEs in that area, but not the AP.

I didn’t have any problems on the other two radios this weekend on the old settings, but all the seasonal people are leaving, so we have half the usage we would have during the summer.

I have changed those other two APs to the new settings. My stress testing will probably have to wait till next season though.

Alright, so things are much improved but some stuff is still getting cranky. Last night and this morning basically the same thing happened, but this time, to see what would happen, I enabled the 1 Mbps basic data rate while the WLAN interface was unable to pass any meaningful traffic. Immediately all the associated radios dropped to that 1 Mbps rate and stayed there.


Can this all be caused by one client’s CPE going haywire? Shouldn’t the CTS-to-self setting prevent that?

The small AP that never has a problem did this this morning too: the 411/XR2.

It was on static ARP and forced 11 Mbps, basically all my old settings. This is the first time I’ve seen it do this since it was deployed over a month ago.

Right after I restarted the WLAN card, I noticed I had one user going full tilt on his 1 Mbps/1 Mbps limited package.

So I figured I’d better apply the new settings to this one too.

Try this on all your Mikrotik APs. I use these settings on my APs where they are loaded with over 100 clients - all clients are 3 to 20 miles away. I have driven my RB433AH up to over 150 clients and the network was still talking.

I am assuming you are running 2.4 GHz APs.

Band B/G

Data Rates: configured
Supported Rates B: 1, 2, 5.5, 11
Supported Rates A/G: 6, 18, 24 (uncheck 9, 12, 36, 48, 54)
Basic Rates B: 1 (uncheck 2, 5.5, 11)
Basic Rates A/G: 6 (uncheck 9, 12, 18, 24, 36, 48, 54)

Advanced:
ACK: dynamic
Periodic Calibration: enabled
Calibration Interval: 00:10:00
Hw Retries: 15
Hw Protection Mode: cts-to-self
Preamble Mode: long
Disconnect Timeout: 00:00:12
On Fail Retry Time: 900

Nstream:
Enable Nstream - not checked
Enable Polling - not checked
Disable CSMA - checked

This should help your AP - it may take around 20 minutes for everything to become stable for best throughput.

NOTE: On your client radios, if you manage 100 percent of your client radios, try the settings below:
Client radio set to use: RTS/CTS
Client RTS/CTS threshold: 32
(Note: I suggest you use bandwidth rate limiting on 100 percent of your client radios. It only takes a single client saturating the AP to screw up the entire wireless network.)
(I would suggest a starting bandwidth rate limit in each client, or at least in your bandwidth-hogging clients, of 256k to 512k up and 256k to 1 meg down. You can always come back and fine-tune your QoS rules after your network is stable.)


Tom Jones - a WISP up in North Idaho

I upgraded to 4.13 on the suggestion of MikroTik support for the new wireless driver. It didn’t seem to make a difference. All my subs are rate limited via PPPoE or by a simple queue rule; it’s a condition of them being connected to the AP. All my radios are forced into B-only mode anyway, so I’ll try your B rates. As I recall, enabling the 1 Mbps basic and supported rates didn’t really affect it last time.