I have an interesting little issue with CAPsMAN. Let’s say it works, but it does not work…
Area:
Heavy concentration of mobile 2.4GHz APs, so 2.4GHz is not really favoured. Very few AC-capable APs, thus the 5.8GHz spectrum is fairly open. The main boardroom is 20x6 meters and can host around 25 people at the same time.
Kit:
RB3011 as edge and CAPsMAN controller
PowerBox (RB960PGS-PB) as in-ceiling splitter to four APs
Four APs (RBcAPGi-5acD2nD), roughly 20m apart
Base Config:
Three SSIDs per freq band (both on 2.4 and 5.8GHz)
Full roaming on all APs required
Only WPA2-PSK with aes-ccm encryption
The wireless key for each SSID is a complex password containing all kinds of characters
RB3011 to firewall between the SSIDs as they are on separate VLANs and need to get out to the internet
What I tried:
Normal CAPsMAN setup (keep it stupid, simple)
Added WPA PSK to the current WPA2-PSK setup
Added tkip encryption to the already setup aes ccm encryption
Set a Group-key Update value of 30mins
Created a datapath on a bridge per SSID; each bridge participates in its own VLAN. Local forwarding and client-to-client forwarding explicitly disabled
Four channels created, one on 2.4GHz and the rest on 5.8GHz. Initially the TX power on 2.4GHz was set very low to push devices to connect on 5.8GHz instead. The band on 2.4GHz was set to g/n only, with the extension channel explicitly disabled. On the 5.8GHz channels the band was set to n/ac, with TX power fair for indoor use and the eC extension channel.
**During my troubleshooting I now enabled all bands on 2.4 (b/g/n) and 5.8 (a/n/ac)
Six configurations created, one pair per SSID. This allows the same SSID to be transmitted on both 2.4GHz and 5.8GHz.
**Every CAPs config has a name, mode set to AP, SSID, distance set to dynamic, hw retries set to 15, hw protection mode set to rts/cts, disconnect timeout set to 15sec, keepalive frames enabled, country set to south africa, max station count set to 30 (this is 30 devices per AP, right?), multicast helper set to full (I found that some devices like this more), and HT TX/RX chains 0 and 1 enabled with an HT guard interval of any.
**Channel config as described in the channel item above (one channel on 2.4GHz, three on 5.8GHz)
**No data rates set; some devices (mostly the 2.4GHz-only ones) do not like it if any data rate is configured
**Datapath set as above
**and Security as well.
CAPs Provisioning is not used
Static CAP interfaces were created, one per SSID, AP and frequency. One SSID is configured on the main interface using the same MAC address as the designated AP with two slave devices for the remaining SSIDs. Unique MAC IDs were created per SSID. ****This is exactly how the provisioning service will configure it anyway.
Upgraded all devices to the latest ROS 3.17 (6.42.4 - hehe) with the latest firmware
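For readers who want to compare against their own setup, the list above corresponds roughly to the following CAPsMAN objects on the controller. This is only a sketch under assumptions: every object name (ch-2g, ch-5g-1, sec-pas, dp-pas, cfg-pas-2g), the bridge name, the passphrase and the TX power values are placeholders, only one SSID and one channel per band are shown, and the bands are the "all bands" values mentioned during troubleshooting.

```
# Sketch only; all names, the passphrase and the tx-power values are placeholders.
/caps-man channel
add name=ch-2g band=2ghz-b/g/n frequency=2437 control-channel-width=20mhz extension-channel=disabled tx-power=10
add name=ch-5g-1 band=5ghz-a/n/ac frequency=5580 control-channel-width=20mhz extension-channel=eC tx-power=20

/caps-man security
add name=sec-pas authentication-types=wpa2-psk encryption=aes-ccm group-encryption=aes-ccm group-key-update=30m passphrase="CHANGE-ME"

# CAPsMAN forwarding: traffic is tunnelled to the controller and bridged there.
/caps-man datapath
add name=dp-pas bridge=WIFI-PAS local-forwarding=no client-to-client-forwarding=no

/caps-man configuration
add name=cfg-pas-2g mode=ap ssid=PAS country="south africa" channel=ch-2g datapath=dp-pas security=sec-pas distance=dynamic hw-retries=15 hw-protection-mode=rts-cts disconnect-timeout=15s keepalive-frames=enabled max-sta-count=30 multicast-helper=full rx-chains=0,1 tx-chains=0,1 guard-interval=any
```

The 5.8GHz configuration for the same SSID would be a copy of cfg-pas-2g that points at a 5GHz channel instead.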
Problems experienced:
Some client devices (mobile devices or laptops) simply go into a connect/disconnect loop. Some devices show "Obtaining IP address" and then simply do not connect. Others connect, get IPs, but then drop at random intervals
Many devices experience the infamous 4-way handshake nightmare
Some devices connect on 2.4GHz, then disconnect and connect on 5.8GHz, get an IP issued by the DHCP server, but cannot access anything (towards the internet of course)
Many devices connect and disconnect for no apparent reason.
Some devices link perfectly, can VPN out to the internet and browse with no issues at all.
****I say SOME DEVICES because the problems do follow the device, but they are not manufacturer dependent. This is what stumps me. If it was iPhones I would say it is the aes-ccm/tkip combo. If it was Android devices it might be noise due to the 2.4GHz saturation. If it was HP laptops we could say it is the colour of the rainbow
Some devices simply send a "deauth: unspecified" to the AP
Either everyone has similar issues, or this issue is unknown or new
So, in order to verify if it is device density per AP or CAPsMAN’s inability to deal with that level of device density per AP, I removed the AP from the CAPsMAN setup and did a local only setup.
Thus, the 2.4GHz radio’s SSID is BLA-GST and it has a virtual AP with SSID BLA-LTP; the same was done for the 5.8GHz radio.
When I do a spectral scan I can only see the 5.8GHz (physical AP) SSID being broadcast. The virtual AP on 5.8GHz and both SSIDs on 2.4GHz do nothing. This tells me it is not CAPsMAN causing the issues listed, but a bug in the wireless package, either across the board or maybe just on these particular APs.
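A standalone test of this kind, with one physical SSID and one virtual AP per radio, could look roughly like the sketch below. The security profile name, the virtual interface name and the key are placeholders, and only the 2.4GHz radio is shown.

```
# Sketch only; sec-test, wlan1-ltp and the key are placeholders.
/interface wireless security-profiles
add name=sec-test mode=dynamic-keys authentication-types=wpa2-psk unicast-ciphers=aes-ccm group-ciphers=aes-ccm wpa2-pre-shared-key="CHANGE-ME"

/interface wireless
# Physical 2.4GHz radio carrying the first SSID
set [ find default-name=wlan1 ] mode=ap-bridge ssid=BLA-GST security-profile=sec-test disabled=no
# Virtual AP on the same radio carrying the second SSID
add name=wlan1-ltp master-interface=wlan1 mode=ap-bridge ssid=BLA-LTP security-profile=sec-test disabled=no
```

A virtual AP always shares the channel of its master interface, so both SSIDs sit on the same frequency.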
If anyone wants to log in remotely, this can be arranged at any time. I will allow access to all network devices (except the core switch). Please support, is there anything we can do? Anybody else maybe?
PS: I still have devices that can see all the various SSIDs, accept the keys and then go into a connect/disconnect loop, both with CAPsMAN and without it
PS2: On the AP (the RBcAPGi-5acD2nD, or cAP ac (arm)), under the “Current Tx Power” tab, only the 2.4GHz radio displays values. Both virtual APs and the 5.8GHz radio do not.
PS3: For some reason, in ROS 6.42.4 CAPsMAN creates twice the number of virtual APs it is supposed to - See attached picture…
RaynoP
If you always think inside the box, you will never live on the outside
Do you have an export of your CAPsMAN configuration?
The multiple virtual APs might be because you have multiple slave configurations under your “provisioning” rules.
Another possibility is that the upgrade slightly altered the configuration and you have old and new interfaces mixed.
What happens when you delete the affected interfaces and do a /caps-man remote-cap provision ?
I don’t have a clue about the issue with the AP with the standalone configuration, maybe start from scratch (factory reset) and a minimal configuration and build/test from there?
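To illustrate the point about slave configurations: should provisioning be used, each rule names one master configuration plus a list of slave configurations, and every slave entry becomes one virtual AP per matching radio. The sketch below assumes six configurations named cfg-pas-2g, cfg-tpl-2g, cfg-gst-2g and their -5g counterparts already exist; those names are placeholders.

```
# Sketch only; the configuration names are placeholders.
/caps-man provisioning
add action=create-dynamic-enabled hw-supported-modes=gn master-configuration=cfg-pas-2g slave-configurations=cfg-tpl-2g,cfg-gst-2g name-format=identity
add action=create-dynamic-enabled hw-supported-modes=ac master-configuration=cfg-pas-5g slave-configurations=cfg-tpl-5g,cfg-gst-5g name-format=identity

# Check what the controller actually created, and produce an export to post
/caps-man interface print
/caps-man export hide-sensitive
```

If the interface list shows both static and dynamic entries for the same radio, that would be consistent with old and new interfaces being mixed.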
I find that sometimes the LOG is my best friend. Look at your LOG and try to figure out which devices are causing these issues. They might have some sort of signal strength problem. You could try adding some access list rules so the APs will disconnect any clients that could cause trouble because of weak signal. I could look into your routers to see what’s what if you want. I’m certainly not a guru but I’m kinda good at solving these issues.
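A minimal sketch of extra logging for this kind of troubleshooting, assuming the default memory action is acceptable and that the caps and wireless topics cover what is needed:

```
# On the controller: CAPsMAN debug messages
/system logging
add topics=caps,debug action=memory
# On a standalone AP: wireless debug messages
add topics=wireless,debug action=memory

# Show only the CAPsMAN entries
/log print where topics~"caps"
```

Debug topics are noisy, so it is probably worth removing these rules again once the affected devices have been identified.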
Good luck and have a nice day.
btw: did you upgrade from a pre-6.40 RouterOS, or did your original setup already have a post-6.40 RouterOS installed?
True that, the forum probably isn’t the best place, but I’ve been around for a while and know they are very busy and cannot always tend to the emails they get promptly (MUMs etc etc). And very often you get referred back to the forums anyway
I only use the provisioning feature “once”, in order to confirm that the manual config I did is correct, and that is IF I use it. Mostly my configs just work with a manual config.
I beg to differ regarding interfaces remaining there after upgrades, as CAPsMAN creates everything dynamically. If you explicitly disable local forwarding, the interfaces it creates are virtual. So comms to CAPs controller = no virtual interfaces at all. If you use local forwarding then your statement is true, AFAIK
The only way I could get rid of them was to remove CAPs control on the device and reboot it. It would not delete the “ghost” interfaces, no matter what I did…
The standalone config is as vanilla as you can get. I very seldom have to redo configs as I know how to fix or change what is required. I am by no means gloating in any way though, so please do not misinterpret my statement. We are all human, we all make mistakes
That is what gets me as well. I set additional logging rules to catch any form of CAPs or wireless errors, debug messages etc etc. Nothing. All you see is devices connecting and disconnecting - normal behaviour of wireless devices. That excludes the couple of errors that do come through, like “4-way handshake timeout” or “disconnecting due to access lists, reconnecting to another AP”. I do have ACLs that kick a device off if its signal is too weak. I use them to force devices off the 2.4GHz band and onto the 5.8GHz band. Thus my TX power on 2.4GHz is very, very low.
What is striking though is that everything was working perfectly until we got 25-odd guys in there, each with at least 2 devices. The APs are close enough that the ACL will kick a device off one AP so that it connects to another. Yes, I staggered each AP’s frequency and dropped TX power for all the good practice reasons
I generally update my CCR at home as soon as a new version comes out. Then, after a day or five of smooth sailing, I roll it out to other devices. These particular devices have 6.40.something on. We bought them a little over three weeks ago. They still have that nice new smell hehe
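For reference, signal-based ACLs of the kind described here are usually written as a pair of CAPsMAN access-list rules. The sketch below is an assumption about their general shape, not the actual rules in use; the -80 dBm threshold and the 10s grace period are purely illustrative.

```
# Sketch only; the threshold and grace period are illustrative values.
/caps-man access-list
# Keep clients at -80 dBm or better; tolerate brief dips below that
add action=accept signal-range=-80..120 allow-signal-out-of-range=10s
# Refuse or drop anything weaker
add action=reject signal-range=-120..-81
# Adding interface=<CAP interface name> to a rule limits it to one radio
```

Rules are evaluated in order, so the accept rule has to come before the reject rule.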
I would like to have a look at your configuration in Winbox because I think this could be an ACL problem. I’m sure you are very experienced, but sometimes fine-tuning of these settings is required alongside extensive testing. If some devices simply go into a connect/disconnect loop, they might not be able to choose the right AP/band to connect to. Try fine-tuning each CAP AP's ACL and look in the LOG files to see where the different devices are connecting. Other than that, I think you configured everything just fine.
Please let me know on your progress
Thank you for the assistance, but I think I found the issue. It simply is device density. I did a spectral analysis yesterday and found 48 x 2.4GHz APs and 17 x 5.8GHz APs while scanning from my laptop. As soon as all external interference (APs) were switched off after hours, most of our problems went away.
I still have the same devices which give “4-way handshake” errors and the connect/disconnect loops, always the same devices. There are no ACLs, so it cannot be ACLs. IMHO a lot more R&D has to be done on CAPsMAN, and the APs need to be hardened to cope with high device density installations. We “upgraded” from one UBNT UniFi AC radio that could do what four CAP ACs cannot do. The devices giving the errors are brand new, so they use software and drivers that have been tried and tested by Samsung, Apple and HP for many years now.
This 4way handshake issue I currently have, I have had on Mikrotik devices since I tried to connect the first apple device to an AP (oh and a lenovo laptop that simply does not work on Mikrotik wireless till today). This was years ago when Routerboard just started building “all-in-one” RBs where APs were built into RBs. Brilliant idea though
We get new toys and new features with new ROS versions, but very often devices and features that have issues simply never get resolved. It has been like this since ROS 4.0…
Thank you for the advice. Bear in mind that in non-local forwarding mode CAPsMAN sets everything on the interface. Even if you try to set something and then enable CAPsMAN, it overwrites the settings as per its configuration. The only way I could get the settings to stick was to enable local forwarding mode. Now the virtual interfaces do not go into their VLAN bridges and remove themselves every day. So I have to log in to every router every day to re-add the virtual interfaces to the correct bridge.
# channel: 2437/20/gn(15dBm), SSID: PAS, local forwarding
set [ find default-name=wlan1 ] disabled=no name=wlan1-2.4 rx-chains=0 ssid=MikroTik tx-chains=0
# managed by CAPsMAN
# channel: 5580/20-eC/ac(20dBm), SSID: PAS, local forwarding
set [ find default-name=wlan2 ] disabled=no name=wlan2-5.8 rx-chains=0 ssid=MikroTik tx-chains=0
/interface vlan
add interface=ether1 name=wifi-gst.e1 vlan-id=152
add interface=ether1 name=wifi-pas.e1 vlan-id=150
add interface=ether1 name=wifi-tpl.e1 vlan-id=151
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/interface bridge port
add bridge=WIFI-PAS interface=wifi-pas.e1
add bridge=WIFI-GST interface=wifi-gst.e1
add bridge=WIFI-TPL interface=wifi-tpl.e1
add bridge=WIFI-PAS interface=wlan1-2.4
add bridge=WIFI-TPL interface=wlan161
add bridge=WIFI-GST interface=wlan162
add bridge=WIFI-PAS interface=wlan2-5.8
add bridge=WIFI-TPL interface=wlan163
add bridge=WIFI-GST interface=wlan164
/interface wireless cap
set caps-man-addresses=1.1.1.1 enabled=yes interfaces=wlan1-2.4,wlan2-5.8 lock-to-caps-man=yes
[RaynoP@MK-AP03] >
I think we may be speaking past each other. The virtual APs (wlan161/162/163/164) change name every day, for what reason I would love to know. Nothing switches off, nothing reboots. Yet every day those virtual APs have different interface names and are therefore no longer part of the correct VLAN; in the bridge > port screen I only have "unknown"s.
As I mentioned in previous comments, a virtual AP is disabled when CAPsMAN does the forwarding; this runs on a UDP stream in the background. If local forwarding is enabled, the interfaces become enabled. You cannot add a disabled interface to a bridge???
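One setting that may be relevant to the renaming, assuming the installed RouterOS version offers it, is the static-virtual option in the CAP menu. As far as the documentation describes it, the CAP then creates static rather than dynamic virtual interfaces and tries to reuse the same interface when it reconnects to CAPsMAN, which should keep manually added bridge ports valid. It is offered here as something to test, not as a confirmed fix.

```
# On each CAP; a sketch to test, not a confirmed fix
/interface wireless cap
set static-virtual=yes
```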
You still did not elaborate as to what you deem as “proper” vlan setup?
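One possible reading of a "proper" VLAN setup with local forwarding, offered as an assumption rather than as what the other poster necessarily meant, is to let CAPsMAN tag each SSID's traffic in the datapath and to let the CAP add its wireless interfaces to a single trunk bridge by itself. The VLAN IDs below are taken from the export above; the datapath and bridge names are placeholders.

```
# On the CAPsMAN controller: one datapath per SSID, tagging on the CAP
/caps-man datapath
add name=dp-pas local-forwarding=yes client-to-client-forwarding=no vlan-mode=use-tag vlan-id=150
add name=dp-tpl local-forwarding=yes client-to-client-forwarding=no vlan-mode=use-tag vlan-id=151
add name=dp-gst local-forwarding=yes client-to-client-forwarding=no vlan-mode=use-tag vlan-id=152

# On each CAP: a single bridge holding the uplink; bridge-trunk is a placeholder
/interface bridge
add name=bridge-trunk
/interface bridge port
add bridge=bridge-trunk interface=ether1
/interface wireless cap
set bridge=bridge-trunk
```

With this layout the CAP needs no per-SSID VLAN interfaces or bridges, and no manual bridge port entries for the virtual APs, so nothing should break when an interface is recreated.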
Thank you for the reply. These are the only ceiling mount AC APs (RBcAPGi-5acD2nD) I can get my hands on. All the others are desktop.
I generally don’t NetInstall an RB unless it is required. These are brand new, out-of-the-box APs which were factory loaded with ROS 6.41 and are literally in a vanilla setup, so I tend to disagree that older ROS versions work better than newer ones (depending on what changed of course hehe). And to be honest, this removal of interfaces from the bridge is something I have had on ROS since CAPsMAN v1.
Don’t misunderstand, everything WORKS. But as soon as we have 40+ devices per AP and old 2.4GHz devices mixed with new ones, I get the 4-way handshake issue. The heavy disconnection of devices was resolved when I switched over to local forwarding. So over and above the 2.4GHz radio not being able to handle a mix of old and new 2.4GHz clients, the virtual APs remove themselves from the bridges.