v7.14.3 - 7.15rc3 - 7.15rc4: router was rebooted without proper shutdown, probably kernel failure

After updating the firmware to v7.14.3, and then to 7.15rc3 and 7.15rc4, the router began to reboot by itself.
It rebooted 3 times in 4 days.
There are 260+ Hotspot users online.
This problem has been recurring since version 7.14.3.
Router: CCR2004-1G-12S+2XS.

Critical log entries:
router was rebooted without proper shutdown, probably kernel failure
kernel failure in previous boot

autosupout.rif sent to support, waiting for a reply…

system routerboard print
routerboard: yes
model: CCR2004-1G-12S+2XS
serial-number: HE408SWT4RG
firmware-type: al64
factory-firmware: 7.6
current-firmware: 7.15rc4
upgrade-firmware: 7.15rc4

I haven’t been able to find a solution for 2 weeks now.
Maybe someone had something similar?

Downgrade again to the last known good version.

1- Don’t use beta/rc versions on production devices unless you are willing to accept downtime (a.k.a. proceed at your own risk).
2- If you cannot accept downtime, also don’t use brand-new release versions until you see they have been running stable for most users. Usually such problems pop up rather fast if something is wrong.

Started at 7.15 Stable.



7.15 Stable was released yesterday.

And this is also a clear sign it wasn’t stable:

current-firmware: 7.15rc4
upgrade-firmware: 7.15rc4

Besides, you haven’t found a solution for 2 weeks now?
So definitely not stable.

Sorry, I meant:
7.14.3 - 7.15RC3 - 7.15RC4

Updated to 7.15 stable, now waiting to see whether it reboots.

The reboots started with 7.15, which was released yesterday, but the issue has existed for 2 weeks? I don’t get it.

Did you already have the problem with 7.14.3?
Were there any versions prior to that one where it did not happen?

Anyhow, it does look like a really specific problem since otherwise this place would be swamped with similar reports.
Since you mentioned you made a support ticket with accompanying autosupout.rif, best to wait for their answer.

In the meantime I still suggest downgrading to a version where you did not see that problem.
Unless there is a specific reason why you upgraded? Some other bug for which the fix was needed?

Yes, this problem started in 7.14.3.
That’s why I updated further.

There was no problem in 7.12.1.

Thank you. I updated to 7.15; if it happens again I will downgrade.

CPU load increased in 7.12.1

Did you create supouts and report the possible bug to MT?
(I.e. one supout on a working RoS version (no reboots) and then one on the non-working RoS version (experiencing reboots).)

They will be able to answer your questions more accurately.

On 12/May/24 at 5:43 PM I sent this to support:

Q:
After updating the firmware to v.7.14.3, the router began to reboot itself.
In 3 days it was rebooted 3 times.
Router CCR2004-1G-12S+2XS.

Thanks.

A:
Hello,
Thank you for contacting MikroTik Support.
Per the error in the logs:
May/12/2024 03:58:07 system,error,critical router was rebooted without proper shutdown by watchdog timer
It is related to:
https://help.mikrotik.com/docs/display/ROS/Watchdog
Disable the feature, and please send us a new rif file if it happens again.
See if the device reboots, or maybe you have to “manually” unplug the power if it happens still.
Best regards,


If I disable Watchdog, I will have to travel to another city to power-cycle the MikroTik.

:)

No problem, send airfare, will go to that city for you!!

What’s the reason for using that watchdog?

That was a lousy answer from support. Blame watchdog itself? If watchdog is causing a reboot when it should not, that is a bigger bug!

If you don’t know, you can look at the autosupout.rif yourself: with an account on www.mikrotik.com there is an online viewer for .rif files. It will have the log from the reboot, so you can check for yourself. Perhaps the logs in the .rif have a better clue?

But if support thinks watchdog is broken, enough to recommend disabling it, that’s a concern to me, since I use watchdog on every router.


Well, remote routers are one case. I’d rather have the router try a reboot if some “hang” is detected and email me about the situation so I can review it (and/or see if it’s recurring). AFAIK it’s based on the Linux implementation and uses /dev/watchdog, so it’s kernel-based.

While a netwatch or other script can do similar ping monitoring, the whole idea is that watchdog protects those too, since it operates at a more basic kernel level (i.e. something keeps writing to /dev/watchdog so the kernel does not reboot itself). For example, what if the netwatch process is hung because of runaway CPU?
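To make the idea concrete, here is a toy Python model of a watchdog timer. It is only a sketch of the general concept (a timer that must be "fed" regularly, otherwise it fires), not how RouterOS or the Linux kernel actually implements it. The names `SoftwareWatchdog` and `FakeClock` are made up for this example, and instead of rebooting anything it just reports that the timer expired.

```python
class FakeClock:
    """Manually advanced clock so the example is deterministic (hypothetical helper)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class SoftwareWatchdog:
    """Toy watchdog: considered expired if not fed within `timeout` seconds."""

    def __init__(self, timeout, clock):
        self.timeout = timeout
        self.clock = clock
        self.last_fed = clock()

    def feed(self):
        # A healthy system calls this periodically. On Linux the equivalent
        # is a process writing to /dev/watchdog.
        self.last_fed = self.clock()

    def expired(self):
        # A real watchdog would reset the machine at this point.
        return self.clock() - self.last_fed > self.timeout


if __name__ == "__main__":
    clock = FakeClock()
    wd = SoftwareWatchdog(timeout=10, clock=clock)

    clock.advance(5)
    wd.feed()  # system is alive and feeds the timer
    clock.advance(11)  # system hangs: nobody feeds the timer any more
    print("watchdog expired:", wd.expired())
```

The point of the design is that the check does not depend on the thing being monitored: if a script or process hangs, it simply stops feeding the timer, and the expiry is noticed elsewhere.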

No problem, but there are no planes flying to us.
Kyiv, Ukraine
Welcome!


No :(
May/30/2024 19:30:12 system,error,critical router was rebooted without proper shutdown, probably kernel failure
May/30/2024 19:30:13 system,error,critical kernel failure in previous boot
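Since support's reply was based on a "by watchdog timer" line while the later entries say "probably kernel failure", it may help to sort an exported log by the stated cause. Below is a minimal Python sketch of that. The function names and category labels are hypothetical; the message strings are copied from the log lines quoted in this thread, and other RouterOS versions may word them differently.

```python
import re

# Message fragments copied from the log lines quoted in this thread.
# The labels ("watchdog", "kernel_failure") are made up for this sketch.
PATTERNS = [
    ("watchdog", re.compile(r"rebooted without proper shutdown by watchdog timer")),
    ("kernel_failure", re.compile(r"rebooted without proper shutdown, probably kernel failure")),
    ("kernel_failure", re.compile(r"kernel failure in previous boot")),
]


def classify_reboot(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many log lines fall under each label."""
    counts = {}
    for line in lines:
        label = classify_reboot(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts


SAMPLE = [
    "May/12/2024 03:58:07 system,error,critical router was rebooted without proper shutdown by watchdog timer",
    "May/30/2024 19:30:12 system,error,critical router was rebooted without proper shutdown, probably kernel failure",
    "May/30/2024 19:30:13 system,error,critical kernel failure in previous boot",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

If both kinds of entries show up for the same router, that is worth pointing out explicitly in the support ticket.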

I’m still looking…

I’d ping Mikrotik again. You shouldn’t have to disable watchdog. If it really is a watchdog bug, that’s kinda serious IMO - the last thing you’d want is the monitoring for crashes, to cause crashes…

Most likely hardware…


In the autosupout.rif file, under .startup → @syslog@ _nand_probe, there are these lines:

…Bad eraseblock 768 at 0x000006000000…

…ubi0 error: _stext: the layout volume was not found

ubi0 error: failed to attach mtd0, error -22…

…uncorrected errors found in page 49152! (increased to 1)

yaffs: read_oob failed, chunk 46080, mtd error -74

yaffs: block 721 is bad…

…WARN: missing /reserved-memory/panics…

Or is this normal?
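For comparing supouts from different boots, a small Python sketch like the following could tally the flash-related messages. The patterns are built only from the lines quoted above, the labels are hypothetical, and the sketch says nothing about whether a given count is harmless on this hardware; only MikroTik can confirm that. The two errno names are the standard Linux values for 22 and 74.

```python
import re

# Patterns built from the syslog excerpt quoted above; labels are hypothetical.
SIGNATURES = {
    "bad_eraseblock": re.compile(r"Bad eraseblock (\d+) at (0x[0-9a-fA-F]+)"),
    "ubi_error": re.compile(r"ubi\d+ error:"),
    "uncorrected_ecc": re.compile(r"uncorrected errors found in page (\d+)"),
    "yaffs_read_fail": re.compile(r"yaffs: read_oob failed, chunk (\d+), mtd error (-?\d+)"),
    "yaffs_bad_block": re.compile(r"yaffs: block (\d+) is bad"),
}

# Linux errno names for the two error codes that appear in the excerpt.
LINUX_ERRNO = {22: "EINVAL", 74: "EBADMSG"}


def scan(text):
    """Return {label: [captured groups per matching line]}."""
    hits = {}
    for line in text.splitlines():
        for label, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits.setdefault(label, []).append(match.groups())
    return hits


EXCERPT = """\
…Bad eraseblock 768 at 0x000006000000…
…ubi0 error: _stext: the layout volume was not found
ubi0 error: failed to attach mtd0, error -22…
…uncorrected errors found in page 49152! (increased to 1)
yaffs: read_oob failed, chunk 46080, mtd error -74
yaffs: block 721 is bad…
"""

if __name__ == "__main__":
    for label, groups in scan(EXCERPT).items():
        print(label, groups)
```

Running the same scan on a supout taken after each reboot would show whether the list of bad blocks stays fixed or keeps growing.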

Maybe. Some hardware malfunction causing issues in the kernel is exactly the kind of thing watchdog was designed to catch.

All I know is I’m pretty sure the band-aid of disabling watchdog ain’t going to fix anything. I always use watchdog and have never seen a false positive.

Back in v7.10.2 there was an OOM error related to queues that forcefully rebooted the router. It was supposedly fixed in v7.14.3, but by then I had already reduced the queue memory parameter and reduced complexity as much as possible, so I cannot confirm that the update fixes the issue. Can you roll back to a version before v7.10.2 and see whether the issue persists?

Updated to 7.15.
It worked for 3 days and then rebooted…
LOG:
router was rebooted without proper shutdown, probably kernel failure
kernel failure in previous boot


Support is not responding