WirelessRudy
Forum Guru
Topic Author
Posts: 3089
Joined: Tue Aug 08, 2006 5:54 pm
Location: Spain

Random client disconnects in NV2 ROSv6.35.2

Mon Jun 13, 2016 11:46 am

We have finished upgrading some 700+ routers to v6.35.2.
Among them are some 40 APs and the 650 clients connected to them. All APs use NV2.

We now find that whenever an AP disconnects its clients, whether because a text field was changed in the 'comment' field of the AP's wireless access-list, because the AP needed a reboot (for the upgrade), or because it was temporarily down (to do a 'spectral scan' in the middle of the night), some clients don't come back up anymore.

These clients actually need a power cycle to get connected again.

The difference we see from pre-6.35.x versions is that when all clients are dropped for whatever reason and have to associate with the AP again, the TDMA protocol now has problems connecting them.
Before, clients connected in random order, but one after the other, and once a client was associated it stayed connected. Now we see the list of associated clients oscillating: some clients associate, drop off, associate again, drop off again, and associate once more. On an AP with 25 CPEs it can now take a full minute or longer to have all clients associated again, where in the past it would not take half that time. We also find that some clients associate and immediately disconnect again, never to come back up; such a client has to be power cycled.

This is a pain, since at times we need to do a spectral scan in the middle of the night, and the next morning clients call in because their internet isn't working anymore. All it needs is a power cycle, but that should not be necessary.

Has anybody seen the same issue?
Show your appreciation of this post by giving me Karma! Thanks.

Rudy R. Puister

WISP operator based on MT routerboard & ROS.
 
TomjNorthIdaho
Forum Veteran
Posts: 998
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 7:39 pm

WirelessRudy -
Re: Anybody had seen the same issue?
Re:  ... some clients don't come back up anymore
You may want to search some of my posts.  I had a similar problem years ago.  My fix was the following:
On all Mikrotik devices (APs, clients, anything Mikrotik), I have a Netwatch pinging my central office (192.0.2.254 - research what this IP is).  When Netwatch fails a ping, it runs a script.
The script does the following:
 - log something to show that the script was called
 - ping 192.0.2.254
 - if the ping reply from 192.0.2.254 is good, then log something (including the counter) and exit/abort the script
 - if the ping to 192.0.2.254 fails, then counter=counter+1
 - if counter = 30 (or whatever number you decide on), then write a log file stating that an auto-reboot was performed (time & date), then reboot the Mikrotik
 - sleep 10 seconds
 - loop back to the "ping 192.0.2.254" step in the script

I use this on every Mikrotik everywhere in my network.  Note - with this, clients will survive a spectral scan and site surveys.
This procedure gives you (10 seconds * counter) before a Mikrotik reboots; with a counter of 30 that is about 5 minutes.
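
The outline above can be sketched as a short RouterOS script. This is only a sketch under assumptions: the variable names are made up for illustration, the limit of 30 and the 10 second sleep come from the outline, and the step that writes the reboot note to a file is left out. The complete script is posted further down in this thread.

```routeros
# Sketch only: names (target, limit, fails, done) are illustrative, not from the post.
:local target 192.0.2.254
:local limit 30
:local fails 0
:local done false
:log info "reboot-check script started"
:while (!$done) do={
    # /ping returns the number of replies received
    :if ([/ping $target count=1] > 0) do={
        :log info "ping to $target ok after $fails failed attempts; exiting"
        :set done true
    } else={
        :set fails ($fails + 1)
        :if ($fails >= $limit) do={
            :log error "no reply from $target after $fails attempts; rebooting"
            /system reboot
        }
        # wait before the next attempt
        :delay 10s
    }
}
```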

I avoid the stock Mikrotik watchdog feature because it is too aggressive at rebooting when there is a delay in the network or the network is congested/saturated.
North Idaho Tom Jones

EDIT - O - I almost forgot a side benefit - if you want to auto-reboot every Mikrotik in your network at the same time - then just turn off your 192.0.2.254 IP address for a few minutes.  This works super-great when you want to force some new DHCP options to everything.
 
TomjNorthIdaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 8:00 pm

Something else to consider - which is what I also do ...
If you use Cacti (or any other type of SNMP-based bandwidth graphing software), then also have Cacti graph the number of connected clients for each WLAN on every AP that you have.
This gives you a quick graph of total connected clients per AP WLAN in your network.
Also - with Cacti, you can graph your WDS (backhaul) links with information such as signal strength, SNR, tx-rate, rx-rate, frequency, CPU load and temperature.  This comes in handy if you ever change a channel: you can see which channels performed best on your links, and you can go back a day, a week, a month or any time you want to look at your graphs.
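
For Cacti to poll an AP, SNMP has to be enabled on it first. A minimal sketch, assuming a read-only community; the community name "cacti-ro" and the poller address 10.0.0.5 are placeholders, not values from this thread:

```routeros
# Placeholders: community name "cacti-ro" and poller address 10.0.0.5 are assumptions.
/snmp community add name=cacti-ro addresses=10.0.0.5/32 read-access=yes write-access=no
/snmp set enabled=yes
# Manual check of the value worth graphing: the number of registered wireless clients.
:put [/interface wireless registration-table print count-only]
```

The last line is only a quick sanity check from the terminal; the graphing itself would read the equivalent value over SNMP.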

North Idaho Tom Jones
 
WirelessRudy

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 10:02 pm

Guys, these suggestions are very welcome. We'll look at the solutions (we have already started to use Dude, but we are still on the learning curve) and see if and how they can benefit us.

But they don't really cure the main problem of the disassociated unit not coming back. I am in contact with support about it and, honestly, I haven't seen a similar issue in the last few days. But then again, I haven't switched an AP off for any reason in the last few days....
Anyway, together with MT we hope to find a cure....

But as I said, suggestions for network monitoring, alerts etc. are very welcome.
 
ZeroByte
Forum Guru
Posts: 4051
Joined: Wed May 11, 2011 6:08 pm

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 10:31 pm

Great post, Tomj!
TomjNorthIdaho wrote:
O - I almost forgot a side benefit - if you want to auto-reboot every Mikrotik in your network at the same time - then just turn off your 192.0.2.254 IP address for a few minutes.  This works super-great when you want to force some new DHCP options to everything.

Did you anycast this 192.0.2.254 for increased robustness? It sure would be catastrophic if that one host fails / gets isolated and causes your entire operation to take a power nap. Perhaps you have several such ping targets and simplified it for the example, but I felt like this nuance should be pointed out in case anyone else deploys such a solution without realizing this could happen. Or perhaps that IP is configured on your core router so if it's down, the whole kit and kaboodle is down anyway so who cares if CPEs start rebooting like lemmings off a cliff? ;)
When given a spoon,
you should not cling to your fork.
The soup will get cold.
 
TomjNorthIdaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 10:48 pm

re:  It sure would be catastrophic if that one host fails
If that one host fails (my 192.0.2.254), then yes, I have a serious network problem.  That IP address (192.0.2.254) is a local IP address on a loopback interface of my core router.  If that goes down, then I truly have lost all network access to everything everywhere, to/from the Internet and to/from all client customers.
Another trick re the auto-reboot info - which I also do sometimes:
Have all Mikrotik client devices test to 192.0.2.254
Have all core routers & APs & backhauls test to 192.0.2.253
This way you can easily only reboot customers or only reboot your network devices - or both at the same time.
Also you can set up 192.0.2.x zones, where clients in different areas test to a specific 192.0.2.z IP address, thus allowing you to auto-reboot zones instead of everything at the same time.
The IP address 192.0.2.254 does not route through the Internet. It belongs to 192.0.2.0/24, which is reserved (TEST-NET-1, RFC 5737) as a documentation/test network, kinda like 192.168.x.x is reserved for private use.  Thus 192.0.2.0/24 is safe to use because nobody on the Internet can talk to it - only your networks can talk to it.
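
A sketch of how the two test addresses could be hosted and switched off on the core router. It assumes an empty bridge used as the loopback (a common RouterOS idiom); the interface name loopback0 and the comments are made up for illustration, and every device still needs a route to these /32 addresses:

```routeros
# Assumption: an empty bridge named loopback0 acts as the loopback interface.
/interface bridge add name=loopback0
/ip address add address=192.0.2.254/32 interface=loopback0 comment="test target: client devices"
/ip address add address=192.0.2.253/32 interface=loopback0 comment="test target: routers, APs, backhauls"
# To make only the client devices reboot, take their target away for a few minutes...
/ip address disable [find address="192.0.2.254/32"]
# ...and bring it back afterwards.
/ip address enable [find address="192.0.2.254/32"]
```

Per-zone targets would follow the same pattern, with one extra /32 address per zone.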
re: anycast
I treat it just like a normal live IP address.
Thank you for your positive reply
North Idaho Tom Jones

EDIT - added the following - on my client devices I also log to a Mikrotik file the frequency of the AP, the signal strength and the connect rate the client sees when it connects.  If you have multiple APs covering a sector, then you can look at the client log to see what happened: the last auto-reboot, what IP it connected to when the client restarted, and what the signal strength was.

North Idaho Tom Jones

Another EDIT - I am looking to enhance my scripts with some extra code so that, prior to an auto-reboot, the Mikrotik will perform a scan and a frequency usage and write the results to a file.  Then I would have the ability to look at any client and see what the wireless conditions are like at the remote customer location.
 
TomjNorthIdaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 11:08 pm

Give this a try -
(If you do not have a 192.0.2.254 IP address, then change it to the IP address of your core router)
(When you test it, look at the colors in your Mikrotik log)
(Edit - also look in your files directory for a file named "ScriptRebootReason.txt".  The file is time-stamped when the last auto-reboot took place)

1st - Disable the Mikrotik Watchdog IP (if you use the Mikrotik Watchdog to test to an IP address)

2nd - Create a netwatch
Host:     192.0.2.254
Down:     log info "Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script" ; /system script run NetWatchBoot-192.0.2.254

3rd - Create a script
Name:     NetWatchBoot-192.0.2.254
Body of the script (cut and paste):
# Address to test and the wireless interface to report on
:local addresstoping 192.0.2.254;
:local interface "wlan1";
#
:local continue true;
:local counter 0;
# Give up after maxcounter attempts, sleepseconds apart (18 x 10 s = roughly 3 minutes)
:local maxcounter 18;
:local sleepseconds 10;
:local goodpings 0;
:log error "-----> Tom's Netwatch-Script-Warning - Netwatch could not ping $addresstoping - Will begin further testing in $sleepseconds seconds - and will continue for $maxcounter times $sleepseconds seconds";
:while ($continue) do={
    :set counter ($counter + 1);
    :delay $sleepseconds;
    # /ping returns the number of replies received
    :if ([/ping $addresstoping interval=1 count=1] = 0) do={
        :log info "----->ping to $addresstoping failed on attempt $counter of $maxcounter -- Will try again in $sleepseconds seconds";
    } else={
        :log warning "-----> ping success on to $addresstoping attempt $counter of $maxcounter <----- No Further testing needed --- Program will exit -----";
        :set continue false;
        :set goodpings ($goodpings + 1);
        # The link is back: log the wireless status for later review
        /interface wireless monitor $interface once without-paging do={
            :local status $"status";
            :local band $"band";
            :local freq $"frequency";
            :local wprotocol $"wireless-protocol";
            :local noise $"noise-floor";
            :local signal $"signal-strength";
            :local snr $"signal-to-noise";
            :local thruput $"p-throughput";
            :log info "-----> Status: $status --- Band: $band --- Frequency: $freq --- WProtocol: $wprotocol --- NoiseFloor: $noise";
            :log info "-----> Optional Info if Available ---> SignalStrength: $signal --- SNR: $snr --- PThroughput: $thruput";
            :local txr $"tx-rate";
            :local rxr $"rx-rate";
            :local sstr $"signal-strength";
            :local signoise $"signal-to-noise";
            :local curdistance $"current-distance";
            :local txccq $"tx-ccq";
            :local rxccq $"rx-ccq";
            :log info "-----> TxRate: $txr --- RxRate: $rxr --- SignalStreng: $sstr --- SignalToNoise: $signoise --- CurrentDistance: $curdistance --- TxCcq: $txccq --- RxCcq: $rxccq";
        };
    }
    :if ($counter = $maxcounter) do={ :set continue false; }
}
# No ping succeeded at all: record the reason in a file, then reboot
:if ($goodpings = 0) do={
    :log info "-----> Rebooting in 15 seconds";
    # Create ScriptRebootReason.txt first; the delay gives RouterOS time to write it
    /file print file=ScriptRebootReason;
    :delay 5;
    /file set ScriptRebootReason.txt contents="Rebooted by Toms script  on $[/system clock get date] at $[/system clock get time]";
    :log error "-----> Rebooting in 10 seconds";
    :delay 5;
    :log error "-----> Rebooting in 5 seconds";
    :delay 5;
    :log error "-----> Rebooting now";
    :delay 1;
    /system reboot;
}
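
The three steps can also be done from a terminal instead of Winbox. A rough sketch, assuming the Netwatch interval and timeout are left at 1m and 1s (the post does not state them) and that the watchdog itself stays enabled with only its address test turned off:

```routeros
# Step 1: stop the watchdog from testing an IP address.
/system watchdog set watch-address=none
# Step 2: Netwatch entry; interval and timeout are assumptions, not values from the post.
/tool netwatch add host=192.0.2.254 interval=1m timeout=1s \
    down-script="/log info \"Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script\"; /system script run NetWatchBoot-192.0.2.254"
# Step 3: create an empty script entry, then paste the script body into its source.
/system script add name=NetWatchBoot-192.0.2.254
/system script edit NetWatchBoot-192.0.2.254 source
```

The last command opens the built-in editor, so it is meant for interactive use rather than for pasting into another script.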
Last edited by TomjNorthIdaho on Wed Jun 15, 2016 11:27 pm, edited 1 time in total.
 
TomjNorthIdaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 11:15 pm

If you test the netwatch & script I just posted, then your logs will contain something like this:

may/31 11:44:07 system,error,critical router was rebooted without proper shutdown
may/31 11:44:19 interface,info ether1 link up (speed 100M, full duplex)
may/31 11:44:22 script,info Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script
may/31 11:44:22 script,error -----> Tom's Netwatch-Script-Warning - Netwatch could not ping 192.0.2.254 - Will begin further testing in 10 seconds - and will continue for 18 times 10 seconds
may/31 11:44:31 dhcp,info dhcp1 deassigned 192.168.56.120 from C0:C1:C0:7B:FF:CF
may/31 11:44:31 dhcp,info dhcp1 assigned 192.168.56.120 to C0:C1:C0:7B:FF:CF
may/31 11:44:32 script,info ----->ping to 192.0.2.254 failed on attempt 1 of 18 -- Will try again in 10 seconds
may/31 11:44:42 script,info ----->ping to 192.0.2.254 failed on attempt 2 of 18 -- Will try again in 10 seconds
may/31 11:44:52 script,info ----->ping to 192.0.2.254 failed on attempt 3 of 18 -- Will try again in 10 seconds
may/31 11:45:02 script,info ----->ping to 192.0.2.254 failed on attempt 4 of 18 -- Will try again in 10 seconds
may/31 11:45:12 script,info ----->ping to 192.0.2.254 failed on attempt 5 of 18 -- Will try again in 10 seconds
may/31 11:45:19 wireless,info D4:CA:6D:AC:1C:D3@wlan1 established connection on 5660000, SSID Red-Spectrum.com
may/31 11:45:22 script,info ----->ping to 192.0.2.254 failed on attempt 6 of 18 -- Will try again in 10 seconds
may/31 11:45:29 system,info item added
may/31 11:45:29 system,info item added
may/31 11:45:29 dhcp,info dhcp-client on WAN got IP address 66.35.12.10
may/31 11:45:32 script,warning -----> ping success on to 192.0.2.254 attempt 7 of 18 <----- No Further testing needed --- Program will exit -----
may/31 11:45:33 script,info -----> Status: connected-to-ess --- Band:  --- Frequency:  --- WProtocol: nv2 --- NoiseFloor: -114
may/31 11:45:33 script,info -----> Optional Info if Available ---> SignalStrength: -29 --- SNR: -29 --- PThroughput:
may/31 11:45:33 script,info -----> TxRate: 6.5Mbps-20MHz/1S --- RxRate: 6.5Mbps-20MHz/1S --- SignalStreng: -29 --- SignalToNoise: 85 --- CurrentDistance: 1 --- TxCcq:  --- RxCcq: 4
jun/05 09:28:05 script,info Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script
jun/05 09:28:05 script,error -----> Tom's Netwatch-Script-Warning - Netwatch could not ping 192.0.2.254 - Will begin further testing in 10 seconds - and will continue for 18 times 10 seconds
jun/05 09:28:16 script,warning -----> ping success on to 192.0.2.254 attempt 1 of 18 <----- No Further testing needed --- Program will exit -----
jun/05 09:28:16 script,info -----> Status: connected-to-ess --- Band:  --- Frequency:  --- WProtocol: nv2 --- NoiseFloor: -115
jun/05 09:28:16 script,info -----> Optional Info if Available ---> SignalStrength: -27 --- SNR: -27 --- PThroughput:
jun/05 09:28:16 script,info -----> TxRate: 240Mbps-40MHz/2S/SGI --- RxRate: 300Mbps-40MHz/2S/SGI --- SignalStreng: -27 --- SignalToNoise: 88 --- CurrentDistance: 1 --- TxCcq: 94 --- RxCcq: 99
jun/06 13:03:05 script,info Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script
jun/06 13:03:05 script,error -----> Tom's Netwatch-Script-Warning - Netwatch could not ping 192.0.2.254 - Will begin further testing in 10 seconds - and will continue for 18 times 10 seconds
jun/06 13:03:15 script,warning -----> ping success on to 192.0.2.254 attempt 1 of 18 <----- No Further testing needed --- Program will exit -----
jun/06 13:03:15 script,info -----> Status: connected-to-ess --- Band:  --- Frequency:  --- WProtocol: nv2 --- NoiseFloor: -115
jun/06 13:03:15 script,info -----> Optional Info if Available ---> SignalStrength: -19 --- SNR: -19 --- PThroughput:
jun/06 13:03:15 script,info -----> TxRate: 300Mbps-40MHz/2S/SGI --- RxRate: 300Mbps-40MHz/2S/SGI --- SignalStreng: -19 --- SignalToNoise: 96 --- CurrentDistance: 1 --- TxCcq: 100 --- RxCcq: 99
jun/10 10:34:05 script,info Netwatch missed a ping to 192.0.2.254 - starting 5 minute timeout script
jun/10 10:34:05 script,error -----> Tom's Netwatch-Script-Warning - Netwatch could not ping 192.0.2.254 - Will begin further testing in 10 seconds - and will continue for 18 times 10 seconds
jun/10 10:34:15 script,info ----->ping to 192.0.2.254 failed on attempt 1 of 18 -- Will try again in 10 seconds
jun/10 10:34:26 script,info ----->ping to 192.0.2.254 failed on attempt 2 of 18 -- Will try again in 10 seconds
jun/10 10:34:37 script,info ----->ping to 192.0.2.254 failed on attempt 3 of 18 -- Will try again in 10 seconds
jun/10 10:34:48 script,info ----->ping to 192.0.2.254 failed on attempt 4 of 18 -- Will try again in 10 seconds
jun/10 10:34:59 script,info ----->ping to 192.0.2.254 failed on attempt 5 of 18 -- Will try again in 10 seconds
jun/10 10:35:09 script,warning -----> ping success on to 192.0.2.254 attempt 6 of 18 <----- No Further testing needed --- Program will exit -----
jun/10 10:35:09 script,info -----> Status: connected-to-ess --- Band:  --- Frequency:  --- WProtocol: nv2 --- NoiseFloor: -115
jun/10 10:35:09 script,info -----> Optional Info if Available ---> SignalStrength: -18 --- SNR: -18 --- PThroughput:
jun/10 10:35:09 script,info -----> TxRate: 300Mbps-40MHz/2S/SGI --- RxRate: 270Mbps-40MHz/2S/SGI --- SignalStreng: -18 --- SignalToNoise: 97 --- CurrentDistance: 1 --- TxCcq: 99 --- RxCcq: 94
14:29:24 system,error,critical login failure for user admin from 66.35.8.19 via winbox
14:29:38 system,info,account user admin logged in from 66.35.8.19 via winbox
14:37:14 system,info,account user admin logged in from 66.35.8.19 via ssh
 
TomjNorthIdaho

Re: Random client disconnects in NV2 ROSv6.35.2

Wed Jun 15, 2016 11:24 pm

When testing it, if you look at your logs, you can see the warning that the device is going to reboot 15 seconds before the actual reboot.
This gives you time to abort/stop the script/job if you do not want the Mikrotik to actually reboot while you are working on it.

To disable it - just disable the netwatch - then it will no longer run.  This is handy sometimes.
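
From a terminal, disabling the check and aborting a run that is already counting down might look like the sketch below. It assumes the Netwatch entry can be found by its host and that the running job is listed under the script's name; if it is not, `/system script job print` shows what is actually running.

```routeros
# Stop future runs: disable the Netwatch entry for the test address.
/tool netwatch disable [find host=192.0.2.254]
# Abort a run in progress (assumes the job is listed under the script name).
/system script job remove [find script="NetWatchBoot-192.0.2.254"]
# Turn the check back on when the work is finished.
/tool netwatch enable [find host=192.0.2.254]
```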

I wish Mikrotik would modify their watchdog to do something more friendly like this.  It would save all kinds of false/unneeded reboots.

North Idaho Tom Jones
