grzaks
just joined
Topic Author
Posts: 8
Joined: Thu Jun 02, 2022 11:51 am

LtAP Mini ROS v7.6 hanging but no watchdog reaction

Mon Jan 02, 2023 2:28 pm

I have ~30 LtAP Mini devices and one RB912UAG-5HPnD deployed in various locations. Some of them are in cars moving around different cities.

I need your help in diagnosing why those devices "hang" from time to time in a weird state, in which:
  1. the scheduler stops running scheduled scripts
  2. GSM/ppp-client connection dies
  3. ethernet stops responding at all; winbox does not discover the device either by IP or by MAC
  4. CAP LED turns on
  5. system watchdog does not react - the device is not rebooted
  6. there is nothing specific in the last log entries
Some more details:

Ad.1 the scheduler stops running scheduled scripts
I know about that because one of my scheduled scripts reboots the device after 60 failed connection attempts. I know from the backend/server side that those connections are never made, so this script should reboot the device from time to time. But it does not.
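The script itself is not shown in this post. A minimal sketch of such a failure counter, assuming a hypothetical health-check URL (`https://example.invalid/ping`) and a hypothetical global variable name (`failCount`), could look like this:

```
# Hypothetical sketch, not the actual script from these devices.
# failCount and the URL are assumed names for illustration.
:global failCount
:if ([:typeof $failCount] = "nothing") do={ :set failCount 0 }

:do {
    /tool fetch url="https://example.invalid/ping" output=none
    # connection succeeded: reset the counter
    :set failCount 0
} on-error={
    :set failCount ($failCount + 1)
}

# reboot after 60 consecutive failures
:if ($failCount >= 60) do={
    /system reboot
}
```

A counter like this only works while the scheduler keeps firing, which is why a missing reboot points to the scheduler itself having stopped.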

Ad.2 GSM/ppp-client connection dies
The device is configured to keep the openvpn connection to my vpn server, scheduled to fetch some URLs from time to time and also configured to send logs to remote syslog server. All of those die at the same time.

Ad.3 ethernet stops responding at all
This is the most annoying part. From time to time I travel to the location of those devices, and to diagnose an unresponsive device I try to connect a laptop directly via ethernet. In normal circumstances I would be able to ssh to the device at 192.168.88.1, discover it via winbox or connect via the MAC winbox option. None of this works when the device "hangs". I even tried to manually type the MAC address (from a sticker on the device) into winbox to connect to it, but that doesn't work either.

Ad.4 CAP LED turns on
This LED is never on in normal circumstances (except maybe while rebooting) but suddenly turns on when the device gets into this weird unresponsive state.

Ad.5 system watchdog does not react - the device is not rebooted
The watchdog is enabled, but the device never gets restarted after it enters this state.
[admin@dsg-EB820FE18DE1-O-GLO] /tool/netwatch> /system/watchdog/print
          watch-address: none
         watchdog-timer: yes
  ping-start-after-boot: 5m
           ping-timeout: 1m
       automatic-supout: yes
       auto-send-supout: no
Ad.6 there is nothing specific in the last log entries

The devices are configured to send logs to remote syslog, as follows:
[admin@dsg-EB820FE18DE1-O-GLO] /tool/netwatch> /system/logging/print
Flags: * - DEFAULT
Columns: TOPICS, ACTION, PREFIX
#   TOPICS    ACTION  PREFIX
0 * info      remote  dsg-EB820FE18DE1-O-GLO
1 * error     remote  dsg-EB820FE18DE1-O-GLO
2 * warning   remote  dsg-EB820FE18DE1-O-GLO
3 * critical  remote  dsg-EB820FE18DE1-O-GLO
There is nothing specific in the last log entries that managed to reach the remote syslog server before the device "hangs". I don't know what would get logged if the internet connection did not die. I also don't know what's in the logs even when I set `action=memory`, because I can't connect directly via winbox (see point 3). After a manual power cycle the logs are lost.

I would love to provide more details, but when this issue happens, I don't know what else I could do to diagnose what exactly happens on the device. Any hints on what else to do in such a situation?

One perhaps important thing I noticed is that when I had a "sanity reboot" script scheduled every few hours (to reboot the device "just in case"), this "hanging" issue occurred much more often.

Below is some information about the device configuration. Please say if you need more.
[admin@dsg-EB820FE18DE1-O-GLO] /system/resource> print
                   uptime: 1h49m24s
                  version: 7.6 (stable)
               build-time: Oct/17/2022 10:55:40
         factory-software: 6.44.6
              free-memory: 22.4MiB
             total-memory: 64.0MiB
                      cpu: MIPS 24Kc V7.4
                cpu-count: 1
            cpu-frequency: 650MHz
                 cpu-load: 6%
           free-hdd-space: 3568.0KiB
          total-hdd-space: 16.0MiB
  write-sect-since-reboot: 631
         write-sect-total: 42748
               bad-blocks: 0%
        architecture-name: mipsbe
               board-name: LtAP mini
                 platform: MikroTik
                 
[admin@dsg-EB820FE18DE1-O-GLO] /system/resource/usb> print
Columns: DEVICE, VENDOR, NAME, SPEED
# DEVICE  VENDOR                NAME                             SPEED
0 1-0     Linux 5.6.3 ehci_hcd  RB400 EHCI                         480
1 1-1     HP                    HP hs2340 HSPA+ MobileBroadband    480

# jan/02/2023 13:21:16 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/interface ppp-client
add apn=internet disabled=no modem-init="AT+CFUN=1" name=ppp-out1 port=usb2

[admin@dsg-EB820FE18DE1-O-GLO] /system/gps> export hide-sensitive
# jan/02/2023 13:22:34 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/system gps
set coordinate-format=dd enabled=yes gps-antenna-select=external port=serial0 set-system-time=yes

[admin@dsg-EB820FE18DE1-O-GLO] /system/scheduler> export hide-sensitive
# jan/02/2023 13:25:01 by RouterOS 7.6
# software id = 4GWQ-5ML7
#
# model = RB912R-2nD
# serial number = EB820FE18DE1
/system scheduler
add interval=3s name=fetchscript on-event=smspollingv2 policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:01
add interval=2m name=mqttsanity on-event=mqttsanity policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:09
add interval=10s name=gps2mqtt on-event=gps2mqtt policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=dec/27/2022 start-time=21:46:11
add interval=2m name=pppenabler on-event=pppenabler policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-date=jan/02/2023 start-time=12:52:18

[admin@dsg-EB820FE18DE1-O-GLO] /system/package> print
Columns: NAME, VERSION
# NAME      VERSION
0 gps       7.6
1 iot       7.6
2 routeros  7.6
 
grzaks
just joined
Topic Author
Posts: 8
Joined: Thu Jun 02, 2022 11:51 am

Re: LtAP Mini ROS v7.6 hanging but no watchdog reaction

Wed Jan 04, 2023 3:49 pm

Anyone? Any ideas on what I could try to research this issue further?
 
mkx
Forum Guru
Posts: 11444
Joined: Thu Mar 03, 2016 10:23 pm

Re: LtAP Mini ROS v7.6 hanging but no watchdog reaction

Wed Jan 04, 2023 4:11 pm

My guess would be some kind of nasty memory leak and/or kernel crash. If the USB port on those devices is otherwise unused, you can connect a USB flash disk to the device and set up logging to that USB stick ... you might get a few more log entries immediately preceding the crash ... entries that cannot be sent to the remote syslog server and are lost on crash/reboot if the only remaining logging destination is memory.
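A minimal sketch of that USB logging setup follows. The action name `usblog`, the mount point `usb1` and the file name `crashlog` are assumptions; the actual disk name should be checked with `/disk print` after plugging in the stick.

```
# Assumption: the USB stick is formatted and shows up as "usb1".
/system logging action add name=usblog target=disk disk-file-name=usb1/crashlog disk-lines-per-file=1000 disk-file-count=5
# Send the same topics that already go to remote syslog to the stick as well.
/system logging add topics=info action=usblog
/system logging add topics=error action=usblog
/system logging add topics=warning action=usblog
/system logging add topics=critical action=usblog
```

Unlike `action=memory`, files written this way should still be on the stick after a manual power cycle, so it can be read on a laptop afterwards.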

And unless you really require some functionality only introduced in v7, I'd downgrade those devices to v6 ... either stable or long-term (AFAIK there are no unsolved major bugs in stable, a.k.a. 6.49.7, so I'd go with that).

It is known that v7 doesn't run entirely problem-free on lower-end devices. Some run just fine (but possibly with a much simpler setup than yours), some have big problems ... yours seem to be somewhere in between (running fine for a while, then crashing). It's also known that most functions run faster on v6 than on v7; only a few perform better on v7 (because they are multi-core optimized; your device only has a single CPU core, so nothing on v7 will run faster than on v6).
 
Amm0
Forum Guru
Posts: 3259
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: LtAP Mini ROS v7.6 hanging but no watchdog reaction

Wed Jan 04, 2023 4:42 pm

I'd make sure the LTE modem firmware is upgraded to the latest version. It's also curious that it's using PPP for the modem, since it should be using LTE if it's a standard modem.
mkx wrote: "And unless you really require some functionality only introduced in v7, I'd downgrade those devices to v6 ... either stable or long-term (AFAIK there are no unsolved major bugs in stable, a.k.a. 6.49.7, so I'd go with that)."
If you're using a Mikrotik LTE modem, this may be worth a shot for the LtAP mini. I have seen more random crashes on MIPSBE things than ARM with V7 (but watchdog has triggered in my cases, or at least AFAIK). We started using only ARM-based devices, since that's what V7 seems to target best (and likely test the most) as a result.

The only issue is that if you're using a 3rd party LTE modem, or some V7 feature (like Let's Encrypt, Wireguard), you may actually need V7. In which case, some remote log collection or a USB stick as suggested seems like the next step. I still have MIPSBE things and use V7 for MBIM etc. on most now, but V6 is better for MIPSBE IMO if you can use it.
 
Amm0
Forum Guru
Posts: 3259
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: LtAP Mini ROS v7.6 hanging but no watchdog reaction

Wed Jan 04, 2023 4:45 pm

Ah, you have a:
HP hs2340 HSPA+ MobileBroadband
That might need PPP. But are you sure your carrier still supports 3G? In many regions it's being phased out.
