Random connection dropping vol2

john231 · March 28, 2020, 11:27am

Hello,
I’m still having problems with my MikroTik setup.
Previous posts:
http://forum.mikrotik.com/t/mobile-devices-unusable/128011/1
http://forum.mikrotik.com/t/random-connection-dropping/134297/1
The problem is that from time to time the internet connection drops while a device is still connected to the AP. The problem especially affects mobile devices, but PCs and Macs are affected as well.
The problem now occurs, for example, when browsing the internet or watching videos, and at that precise moment you can’t even connect to the main router (gateway) at 192.168.88.1. The device itself stays connected to the AP. The last fix, turning off RSTP on the switches, helped a lot, but there are still issues.

The setup is very simple and I have attached every device’s configuration export as well (except the switches; you can find that configuration in the previous thread, and it has not changed). The switches have been assigned static IP addresses, and on the 192.168.88.4 switch I have ticked the box that says Long PoE in cable.

Each AP is configured by resetting it with no default config and then pasting the export file attached to this post into the terminal and executing it.

I have also attached the network topology:

Oru network topology-2.png
Every AC lite box runs 2.4GHz on different channels to make sure they don’t overlap.

Can anyone please tell me what I am doing wrong? I have been searching for a solution for months now and still have problems. This, however, is the most stable setup I could get. The default configuration with minor tweaks, such as putting the router in bridge mode and assigning a static IP address, will not work at all.
The SXT-LTE is in its default configuration with only minor changes:

automatic upgrade enabled
sim card pin added
some static ip addresses assigned
dhcp lease time set to 1 day
logging, and some services disabled (telnet, api, ssh)
Oru SXT-LTE (192.168.88.1).txt (4.49 KB)
OruAPMajaKatusTV (192.168.88.6).txt (2.4 KB)
OruAPMaja (192.168.88.2).txt (2.47 KB)
OruAPSaun (192.168.88.3).txt (2.41 KB)

SiB · March 29, 2020, 10:43pm

The best way to diagnose these problems is to check the logs on the devices.
If you have a downtime problem, then you should check whether some device is rebooting. Check in the AP logs whether the client’s MAC shows reconnections or not. Check on the SXT whether the internet is up at that time.
In my opinion, by checking the logs you should find the problem.

Are these problems very short, like 1-2 s, or longer, like 30 s?

If you can reproduce the problem, that would be perfect. If not, then checking the logs is the only good option.
Set up a netwatch script on every ROS device to check the other ones.
If you can install The Dude inside the network, it will be a good help for you; just add the devices for ping testing.
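
A minimal netwatch sketch for the suggestion above, assuming an AP should watch the gateway at 192.168.88.1 (the address from the first post). The interval, timeout and log messages are illustrative choices, not values from this thread:

```
/tool netwatch
# Probe the gateway every 10 seconds and write a log entry on each state change
add host=192.168.88.1 interval=10s timeout=1s \
    down-script=":log warning \"netwatch: 192.168.88.1 is DOWN\"" \
    up-script=":log info \"netwatch: 192.168.88.1 is UP\""
```

The same rule with a different host= value can be added on each device to watch the others.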

About SXT(R) check this:
Simple watchdog: http://forum.mikrotik.com/t/sxt-lte-4g-cat6/134468/1
Advance watchdog: http://forum.mikrotik.com/t/tx-rx-fp-rx-dropped-pppoe-account/132656/1
Watch LTE parameters: http://forum.mikrotik.com/t/sxt-lte-4g-cat6/134468/1

I hope you can find the problem that way.

john231 · April 5, 2020, 10:33am

20-30s

Right now I have not been able to recreate the problem.

Will do that

Fixed the links.
Simple watchdog: http://forum.mikrotik.com/t/sxt-lte-4g-cat6/134468/1
Advance watchdog: http://forum.mikrotik.com/t/tx-rx-fp-rx-dropped-pppoe-account/132656/1
Watch LTE parameters: http://forum.mikrotik.com/t/sxt-lte-4g-cat6/134468/1

john231 · May 6, 2020, 7:48am

Okay, I did some further analysis and this packet drop is repeatable. When pinging 8.8.8.8 from a laptop connected to any AP, I see packet loss of about 10%. When I ping from the SXT router, I have 0 packet loss and an average ping of 23 ms. The laptop is showing RSSI levels of -51 and below.
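
One way to narrow down which hop loses the packets is to repeat the ping test from an AP's own terminal, first towards the gateway and then towards the internet. This is only a sketch; the count of 100 is an arbitrary choice:

```
# Run on an AP: test the wired path from the AP to the gateway (SXT)
/ping 192.168.88.1 count=100
# Run on an AP: test the full path to the internet
/ping 8.8.8.8 count=100
```

If both are clean from the AP while a wireless client still sees about 10% loss, that would point at the wireless or bridging side of the AP rather than at the LTE link.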

SiB · May 6, 2020, 8:09am

Watch LTE parameters: LteLogger should show you changes of the BTS band/cell. Many behaviours like

after a reboot pings are OK
when I wait … it can fix itself
I move the device and it’s fixed

can be traced to which cell ID you are connected to, and when everything works properly you can:
cell lock, to keep the connection on one band and its specific “antenna”.
log changes of the cell ID and that way discover which one is right for you.
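
A quick way to see the current LTE parameters by hand, assuming the LTE interface on the SXT is named lte1 (the usual default name, not confirmed in this thread):

```
# Print the current LTE status once; which fields appear depends on the modem
/interface lte info lte1 once
```

The output typically includes signal values and the current cell, which can be compared between a good period and a bad one.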

All of what I wrote is about 80% true.
Sometimes when you are cell-locked and still see occasional problems… then you can think about:

different signal values, meaning the link quality parameters between you and the BTS
correcting the position of the device, etc…

mutluit · May 6, 2020, 1:56pm

As other posters already said: analyze the log files of the devices.
It seems a device is rebooting due to high heat or due to an internal error, for example when there is an endless loop in internal code or in a script…
Check whether it’s a heat issue. If your devices have active cooling fans, then maybe a fan is defective; if they are passively cooled, then try to cool the device with an external fan, for example a small USB fan… and monitor whether the error still happens.

Another possibility is that you are inadvertently using the wrong power adapter for a device, i.e. maybe its current rating (amperes) is too low. Check and verify against the device documentation/specification.

john231 · May 7, 2020, 5:01am

I have all the logs enabled, even debug, but the log is empty. Is there a guide on how to turn it on?
Screenshot 2020-05-07 09.15.21.png
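
For reference, a minimal sketch of logging rules that write to the router's in-memory log. The choice of topics is an assumption about what is useful here (wireless and DHCP events), not the configuration from the screenshot:

```
/system logging
# Send wireless debug messages and DHCP messages to the built-in "memory" action
add topics=wireless,debug action=memory
add topics=dhcp action=memory
# List the active logging rules, then read the log
/system logging print
/log print
```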

john231 · May 7, 2020, 6:14am

What I said was that the LTE end is OK; there is no problem pinging from the SXT-LTE. There is no heat issue either: the SXT is outside, with temperatures around 15 degrees Celsius.

It seems to be a Layer 2 problem and seems to coincide with DHCP lease time expiring.

john231 · May 7, 2020, 6:17am

Also one thing I noticed, maybe it’s OK, maybe not: when I look at the SXT-LTE neighbours list I see 2 MAC addresses from 1 device,
although on that AP I have ARP disabled on eth1 and wlan1.
Screenshot 2020-05-07 09.17.34.png

john231 · May 7, 2020, 7:24am

Okay, fixing that MAC address problem seemed to do the trick. I had to set the admin MAC address on the bridge.

The admin MAC address on the bridge is set to that of eth1, which is my “WAN” port that is connected to the switch.

I also got my logging working.
Screenshot 2020-05-07 10.23.47.png

SiB · May 7, 2020, 7:31am

I say one thing and you do another.

john231 · May 7, 2020, 8:35am

Like I said, it is a layer 2 problem. It has nothing to do with CellLock or anything of the sort. The SXT-LTE works fine. The problem is with the internal network: when I set up the AP, I added interfaces to the bridge as follows:

/interface bridge port
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=wlan2
add bridge=bridge1 interface=wlan1
add bridge=bridge1 interface=ether1

Thus the auto-mac feature on the bridge interface took the wlan1 MAC address as its own MAC address as well.

It should have been like this:

/interface bridge port
add bridge=bridge1 interface=ether1
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=wlan2
add bridge=bridge1 interface=wlan1

Or, better yet, the bridge should have its own MAC address that is unique on the network.
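
A minimal sketch of pinning the bridge MAC so it no longer depends on the order in which ports are added, assuming the bridge is named bridge1 as in the exports above. The MAC in option 2 is a made-up placeholder:

```
/interface bridge
# Option 1: fix the bridge MAC to ether1's MAC
set bridge1 auto-mac=no admin-mac=[/interface ethernet get ether1 mac-address]
# Option 2 (alternative): use a locally administered MAC that is unique on the LAN
# 02:00:00:88:00:02 is a hypothetical value, pick a different one per device
# set bridge1 auto-mac=no admin-mac=02:00:00:88:00:02
```

Expect a short loss of connectivity when the bridge MAC changes, so it is safer to do this from a session that does not depend on the old MAC.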

SiB · May 8, 2020, 2:24pm

john231

True.
All my bridge interfaces have a unique admin MAC address.
I generate a MAC address by starting to add a new EoIP interface, which generates a MAC address by itself. I do not actually add the EoIP interface; I just open the new tunnel form, copy the new MAC address from it, and press Cancel.

john231 · May 15, 2020, 11:21am

One question still. I now have all my interfaces in a bridge, the bridge has been assigned an IP, and IP neighbor discovery is set to the default “!dynamic”.
When I look at discovered neighbours, I see both the bridge MAC and the eth1 MAC of all the APs. Shouldn’t I only see the bridge one?

/ip neighbor print
 # INTERFACE ADDRESS                                 MAC-ADDRESS      
 0 ether1    192.168.88.1                            B8:69:F4:01:35:55
   bridge1  
 1 ether1    192.168.88.2                            02:2A:F3:AA:A1:E2
   bridge1  
 2 ether1    192.168.88.2                            B8:69:F4:B1:FC:C0
   bridge1  
 3 ether1    192.168.88.3                            02:2F:19:EF:AF:37
   bridge1  
 4 ether1    192.168.88.3                            B8:69:F4:95:63:8E
   bridge1  
 5 ether1    192.168.88.4                            B8:69:F4:23:27:AE
   bridge1  
 6 ether1    192.168.88.5                            B8:69:F4:B4:1D:66
   bridge1

What is the neighbour discovery used for?

SiB · May 15, 2020, 2:14pm

This is normal, because this discovery works at Layer 2 and shows the real interface, even if it is assigned to a bridge… look
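
If you want to limit where an AP takes part in discovery, one possible approach is an interface list that contains only the bridge. This is a sketch under assumptions: "discover" is a hypothetical list name, it needs a RouterOS version that has /ip neighbor discovery-settings, and it may or may not remove the second entry on the SXT:

```
# Create a list that contains only the bridge interface
/interface list
add name=discover
/interface list member
add list=discover interface=bridge1
# Run neighbor discovery only on interfaces in that list
/ip neighbor discovery-settings
set discover-interface-list=discover
```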

john231 · May 17, 2020, 12:57pm

Is there a difference in which order ports should be added to the bridge?

this will cause problems..

/interface bridge port
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=wlan2
add bridge=bridge1 interface=wlan1
add bridge=bridge1 interface=ether1

this will not..

/interface bridge port
add bridge=bridge1 interface=ether1
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=wlan2
add bridge=bridge1 interface=wlan1

why is that?

SiB · May 17, 2020, 1:13pm

no difference

mkx · May 17, 2020, 2:35pm

… if you manually set the MAC address on the bridge first. And even then you might experience a (transitional) loss of connectivity, because the management MAC may change one way or another. If you connect to the RB via IP (and the IP address setup survives changes in the L2 configuration of your RB), you might have to reconnect. If you connect to the RB via MAC, then you’ll have to connect to the new bridge MAC.

john231 · May 20, 2020, 7:21am

Okay, some updates. I set up watchdogs in every direction between all devices, except from and to the switches: for example SXT → OruAPMaja and OruAPMaja → SXT, and so on, at a 10 second interval.
They have been live for 5 days now and have reported no problems (I have tested them multiple times by disconnecting random APs and got instant emails about it).

I also set up a watchdog from every device (except switches) against Google DNS (8.8.8.8), and it has not reported anything wrong either.

However, this morning I noticed that when I moved from one room to another the connection dropped on the iPad and I was not able to ping 192.168.88.1 (SXT, main router) from the iPad.
The issue lasted for about 1.5 minutes and then ping started to work again. Maybe it is a problem with switching from one AP to another? I was moving through the house, so it could have been the 192.168.88.2 (OruAPMaja) or 192.168.88.6 (OruAPMajaKatusTV) AP.

SiB · May 20, 2020, 9:18am

This is a technical forum; we do not speculate… Check the logs for this event. Try to repeat the problem.
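
A sketch of what to check on each AP right after such an event, assuming the APs use the standard wireless package. AA:BB:CC:DD:EE:FF is a placeholder for the iPad's Wi-Fi MAC address:

```
# Is the client currently registered on this AP, and with what signal?
/interface wireless registration-table print where mac-address=AA:BB:CC:DD:EE:FF
# Which connect/disconnect messages mention this client?
/log print where message~"AA:BB:CC:DD:EE:FF"
```

Comparing the timestamps on both APs should show whether the client stayed on the old AP, moved to the new one, or was registered on neither during the 1.5 minutes.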