I’ve been running 7.1rc4 on an RB4011 for a while.
At some point I noticed that it reboots periodically (consistently after about 3.5 to 4 hours of uptime).
Unfortunately, I do not have much more information than that. I could of course provide a supout, but the forum says to report issues here rather than to support@.
Overall, my configuration is not very complex. The device is just my central home network router and the CAPsMAN controller for two additional hAP ac² units. Those also run 7.1rc4 but do not crash. They have almost no configuration apart from running as CAPs.
Currently I do not have logs from just before the crashes.
Any pointers on how to debug this further?
The config is fine (the time is obviously being set by NTP), but the curious point the other poster was making is that the boot time is always the same, about 2 days ago: all your ‘reboot without proper shutdown’ messages fall back to Oct 01 23:57:49. Is that when you upgraded to or installed 7.1rc4? It sounds like some kind of memory leak, with nothing actually being saved on the underlying system, so it resets back to that point as if it were freshly installed.
Have you tried a proper reboot cycle? Let it shut down gracefully and come back up.
Alternatively, I’d do a fresh Netinstall of rc4 and restore the config from a recent backup. Do take a supout and send it to support, though, in case it’s an error in the upgrade process or similar.
No idea why after boot it falls back to that old date.
In the meantime, I performed a clean shutdown yesterday (more than one, actually).
Also, I installed 7.1rc4 a while ago, not on Oct 1st.
I’m also not totally sure, but I don’t think the problem appeared immediately after the 7.1rc4 update. I first noticed it after installing a few extra packages. I can’t tell whether I simply didn’t notice it before or whether it really started then.
I installed the container, zerotier and user-manager packages because I wanted to start playing with them, but I haven’t configured any of them yet. As soon as I noticed the reboot issue, I disabled and eventually removed those packages one after another to see if that helped, but it did not.
I sent a supout to support today, and I can try a netinstall as soon as the people around me can live without an internet connection for a while.
After a reboot, I get a warning in the terminal that the time has been adjusted by the NTP client.
Normally, on a reboot, the clock starts at 1970-01-01 00:00; that was the case on v6.
v7 could, just like a Raspberry Pi, save the last known date and time when rebooting and use that as the starting time. However, if you boot the device weeks later, that saved value is well out of date.
The same could be happening here: the last known time is saved and reused at boot until NTP syncs or the user sets the date and time manually.
In your case, it seems the saved time is never updated because the router crashes hard.
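A minimal Python sketch of this hypothesis, assuming (as speculated in this thread, not confirmed by MikroTik) that v7 persists the current time only on a clean shutdown and restores it at boot before NTP syncs. `SavedClock` and its methods are hypothetical, and the "true" time is illustrative. Under that assumption, every hard crash restores the same stale date, which would match the repeated Oct 01 23:57:49 boot time.

```python
from datetime import datetime, timedelta


class SavedClock:
    """Hypothetical model: a timestamp persisted only on clean shutdown."""

    def __init__(self, saved: datetime) -> None:
        self.saved = saved  # last value written to storage (assumed)

    def clean_shutdown(self, now: datetime) -> None:
        # A graceful reboot writes the current time before powering off.
        self.saved = now

    def crash(self) -> None:
        # A hard crash never reaches the write, so the stored value is unchanged.
        pass

    def boot_time(self) -> datetime:
        # Until NTP syncs, the clock starts from whatever was stored last.
        return self.saved


clock = SavedClock(datetime(2021, 10, 1, 23, 57, 49))
real_now = datetime(2021, 10, 3, 12, 0, 0)  # illustrative "true" time
for crash_no in range(1, 4):
    real_now += timedelta(hours=4)  # roughly the crash interval reported
    clock.crash()
    # Every boot restarts at Oct 01 23:57:49; only the NTP correction grows.
    print(crash_no, clock.boot_time(), "NTP correction:", real_now - clock.boot_time())
```

The same model predicts that a clean shutdown moves the restored date forward, so a stale date that survives a graceful reboot would argue against it.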
That was exactly the point I was making with my earlier remark.
Going out on a limb… could the crash be related to this internal time not advancing as it should? Some code not reading the ‘real’ time, or something related.
Speculation, I know.
Just for fun, I did a reset on one of my mAP lites running 7.1rc4:
[holvoetn@mAPLite91] > log print
16:21:10 system,info router rebooted
16:21:13 interface,info wireguard link up
16:21:15 script,warning NETWATCH - WG Down
16:21:15 system,info led trigger changed
16:21:16 wireless,info 2C:C8:1B:47:0B:3D@wlan1 established connection on 2452000, SSID HomeLab
16:21:17 dhcp,info dhcp-client on wlan1 got IP address 192.168.92.27
16:21:23 system,critical,info ntp change time Oct/03/2021 14:21:23 => Oct/03/2021 14:21:41
16:21:55 system,info,account user holvoetn logged in from 10.255.255.1 via winbox
16:22:00 system,info led trigger changed
16:22:00 script,info NETWATCH - WG Up
16:22:22 system,info,account user holvoetn logged in from 10.255.255.1 via telnet
[holvoetn@mAPLite91] >
There is an NTP correction but that’s only the time it took to reboot.
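To put a number on that correction, a small Python sketch can parse the `ntp change time` line. It assumes the line is on one line (the original paste wrapped it) and that timestamps use the `Oct/03/2021 14:21:23` format shown; `ntp_correction` is a hypothetical helper, not a RouterOS tool. For this reboot the correction works out to 18 seconds.

```python
import re
from datetime import datetime, timedelta

# Timestamp format as it appears in the RouterOS log, e.g. Oct/03/2021 14:21:23
FMT = "%b/%d/%Y %H:%M:%S"
PATTERN = re.compile(r"ntp change time (\S+ \S+) => (\S+ \S+)")


def ntp_correction(line: str) -> timedelta:
    """Return how far NTP moved the clock in an 'ntp change time' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an 'ntp change time' log line")
    before, after = (datetime.strptime(ts, FMT) for ts in match.groups())
    return after - before


# The log line from the mAP lite above, re-joined into a single line.
line = ("16:21:23 system,critical,info ntp change time "
        "Oct/03/2021 14:21:23 => Oct/03/2021 14:21:41")
print(ntp_correction(line))  # 0:00:18
```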
So why does the OP’s device go back to Oct 1st?
Something to try: downgrade to a previous version and then upgrade to 7.1rc4 again?
Only an analysis of the supout can really tell what’s happening…
I’m starting to think the container package is to blame. I had it installed on my hAP ac2 and it was rebooting all the time; it seems that just removing the package does not fix it, only a full config reset does.
I’ll test my theory on my RB5009, which is not in “production” yet.
Yes, as I wrote above, it feels like my problem started after I installed container, zerotier and user-manager, and removing them did not change anything. So your experience seems to match.
On the hAP ac2 I eventually had to do a netinstall because it was crashing too much. However, the hAP ac2 has another issue, with L2MTU, that also makes it crash, so I had two issues locking up the device.
I just installed it on my RB5009 to test; let’s see.
After a reboot, it will always sync, just as after a crash. The difference is that after a reboot the saved time and the real time are not far apart, because the time is written when you reboot.
On a hard crash, the current time is not written, so the saved time lies much further in the past. I assume MikroTik left this kind of ‘canary’ in to distinguish normal reboots from crashes.
A big difference means a crash; a small difference means a manual or triggered reboot.
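The heuristic described above can be sketched in Python. This only illustrates the poster’s reasoning and is not MikroTik’s actual logic; `classify_restart` and the 10-minute threshold are hypothetical. The two sample values are the 18-second reboot correction and the roughly 4-hour power-loss correction reported in this thread.

```python
from datetime import timedelta

# Hypothetical cut-off: a clean reboot should only lose the reboot itself
# (seconds to a few minutes). The 10-minute value is an assumption.
CRASH_THRESHOLD = timedelta(minutes=10)


def classify_restart(ntp_correction: timedelta) -> str:
    """Guess the kind of restart from the size of the first NTP correction."""
    if abs(ntp_correction) > CRASH_THRESHOLD:
        return "likely crash or power loss"
    return "likely clean reboot"


print(classify_restart(timedelta(seconds=18)))  # likely clean reboot
print(classify_restart(timedelta(hours=4)))     # likely crash or power loss
```

A device that is shut down cleanly and then left powered off for weeks would also show a large correction, so this heuristic cannot tell that case apart from a crash.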
My RB5009 has now been up for 18 h since installing the container package, but it is arm64.
I’ll install it on my hAP ac2 (which uses the same architecture as the RB4011) to see if it crashes…
I understand where you’re going with this, but it isn’t logical.
Why would subsequent crashes always go back to Oct 1st, even though the OP said he downgraded/upgraded after Oct 1st? Upgrading involves at least two reboots, doesn’t it? One for the packages and one for the firmware.
I did another test this morning when leaving home: I simply pulled the plug on my mAP and took it with me to work. I think that counts as an unexpected shutdown.
When it booted 2 hours later, the log correctly showed the unexpected shutdown, and when NTP started it applied a correction of just under 4 hours. That is more than the almost 2 hours my commute took this morning, but not a difference of days.
Those four hours look like a pivotal point here. The crashes are about four hours apart, and your last time adjustment also went back about four hours. Maybe the time update is what triggers the TS’s crashes.
Just an update: I have now tried to netinstall the device a few times.
I simply can’t get netinstall to do anything, even though I followed the manual in great detail. I put the device into netboot via /system routerboard settings set boot-device=try-ethernet-once-then-nand and prepared netinstall. When I try to install, it briefly shows “updating”, but no progress bar or anything, and quickly goes into the “ready” state. After a reboot, everything is back as before.
I’ve tried that many times, and now I’m scared I won’t be able to recover the device if I take the next step of using the reset button to enter netboot.
I tried the same on a spare RB2011, and it worked flawlessly with the same laptop and setup.
I’ll keep trying, but I need a Plan B for connectivity first… (to be continued)
BTW, support replied but apparently has no idea yet. They asked for a serial console connection, which I can’t easily provide because I lack the hardware.
I don’t know if this is relevant to this discussion, but I wanted to post it here anyway in case it leads to something useful.
After upgrading to 7.1rc4 it takes my RB4011GS somewhere between 6 and 18 hours to corrupt itself to the point that it must be reset and rolled back to rc3 before it starts working again.
For some reason it stops routing DNS traffic to my DNS server. It also seems to cause port flapping. Basic routing out to the Internet still works. Very strange behavior, and I was unable to capture anything that helped me pinpoint what was going on. I have gone through this a few times now and I feel that I have proven it happens after the upgrade to rc4, so I won’t be doing that again. I guess I will give it another go with the next rc if the release notes contain anything resembling an acknowledgement and a fix.