CCR2004-1G-12S+2X - random crashes, watchdog, no IP address


I have had this issue for over a year. It seemed to go away for 4 months or so, with little to no reboots, then bam, it’s back and 3 routers do it within 5 hours. All routers are CCR2004s; one is brand new and updated to 6.49, the others are on 6.48. The only thing I was linking this to was an OSPF bug, but that was based only on reading I was doing, with no proof. The only thing in the logs is (system, error, critical) router was improperly shutdown by watchdog timer. Like I said, it sometimes happens 2 or 3 times in a day, then not for a month. But this is just wreaking havoc on my uptime.

I’m at a loss. Does anyone have any ideas? I want to get better logs by hooking up to the console. Can we enable advanced logging to dump what is going on? I can’t shut the watchdog off and let the router hang until I can get to it; that’s insane.

I posted this once but it didn’t show up for some reason. I am getting reboots, and the only log I get is critical, error, system: improper shutdown by watchdog timer. Watchdog is enabled with default settings, with no watch IP address added. Based on reading, I was wondering if this was an OSPF bug, but I’m not sure. I have tried 6.49 and 6.48 code, old hardware and new hardware. What is weird is that the problem seems to go away, then comes back with a vengeance and hits 2 to 3 units within a day. Sometimes there are no reboots for 30 days, sometimes 1 device reboots in 3 days.

This is driving me insane. How do I get more logs via console? Is there advanced logging? Is there a solution? This is wreaking havoc on my uptime! If anyone can help, please let me know.
[Attached image: ccr 2004 - office crash.png]

First thing is to configure other logging destinations … the memory log gets cleared on reboot, so any log entries from before the reboot which might shed some light are gone.

That was why I had asked about console logging. Can I get it to dump to the physical console port, and do I need to enable any advanced logging or extend timeouts on the console? This issue is rampant all over the place, with tons of people having the same issue. Has a root cause been found for the CCR2004 watchdog reboots?




Not sure about console logging … I’d go with logs written to disk files. With 10 log files (rotated as they fill up) and 1000 lines per log file, logs will consume around 1MB of precious disk space in total (100kB per file). Depending on the number of events logged, such a setup might hold more than a month’s worth of log entries.

Logging to internal disk does add some disk writes, but with your rate of reboots you’ll be running extensive disk logging only for a few days at most. I guess.
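
A minimal sketch of the disk logging setup described above, as RouterOS CLI commands. The action name crashlog and the file name prefix are placeholders chosen for illustration, and the topic selection is an assumption about what is worth capturing. Check the parameter names against the logging documentation for your RouterOS version before relying on this.

```routeros
# Hypothetical action name and file prefix: 10 rotated files of 1000 lines each
/system logging action add name=crashlog target=disk disk-file-name=crashlog disk-file-count=10 disk-lines-per-file=1000

# Route the usual topics to the disk action (assumed topic selection)
/system logging add topics=critical action=crashlog
/system logging add topics=error action=crashlog
/system logging add topics=warning action=crashlog
/system logging add topics=info action=crashlog
```

The resulting files should show up in the Files list and survive a reboot. A remote syslog target is another destination that is not lost when the router restarts.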

Any more info on this? I’m having a similar problem with a CCR2116-4S+: reboot from watchdog timer.

It did it about 8 times in a row, and then stabilized.

It got so bad I had to rip and replace. Now I’m getting a reboot logged as “probably power related” on the 2216. It makes zero sense. It’s just the core router; something isn’t stable and they don’t know how to log it.


We also had a CCR2004 that would do random chain reboots every few weeks with the “watchdog” error message. Upgraded it to 7.7 and the problem seemed to worsen (watchdog reboots every few hours). So I replaced the router with a CCR2116-12G-4S+ running version 7.7. Now it’s not chain rebooting, but it is rebooting once or twice daily with the same “watchdog” error. My configuration uses a bonding interface with two SFP+ ports as the members. The bond is an 802.3ad LAG across two stacked switches, and we run VLANs (about 12) on top of the bond interface. Other than that, nothing too crazy (OSPF and some light NAT). Is anyone else experiencing this problem and running bonding? I’ve had a few other sites with the exact same configuration and ended up downgrading to CCR1036s, which solved the problem. I’ve just disabled one of the slave interfaces to see if everything goes stable. I’ve purchased quite a few CCR2116s and even some CCR2216s for my edge. Lots of nice power with these newer routers. But I’m terrified to use them in core/edge infrastructure until I know this problem is resolved.

Did anyone attempt disabling connection tracking in the firewall?
As long as you’re not using anything that requires it (NAT, etc.).
It seems that my router is tracking all the connections that it really has no need to watch. I’m just using it to route public traffic.
I see it watching 40-50k SIP connections all the time. I just want it to pass the packets, with no need to look at them. Let the destination devices’ firewalls deal with it.

I’m going to disable it and see if it makes a difference.

Thoughts?

We have had a CCR2116-12G-4S+ with the same issue:
the router reboots itself due to the watchdog timer.
I have updated the software and firmware all the way up to 7.7 and still experience the same issue. CPU cores go to 100%, then the watchdog reboots the router.

Did you disable connection tracking?

When I reinstalled one of my routers I had the same problem, and after a netboot reinstallation (which took me a lot of time) this watchdog problem was gone. You may also disable the watchdog.


You can enable the ping watchdog by specifying an IP address, and you can disable the software watchdog by unsetting the Watchdog Timer option.

In the log it shows that there is no DNS resolution. It could be a wrong DNS server, a wrong IP, no gateway, or any other networking problem.

If there is no IP on your router and no valid gateway, it cannot ping and will reboot. You need to unset the watchdog timer, set an IP and gateway, and fix your network problems. When this is fixed, set the watchdog timer again.

Hope this helps.

At this point I am trying everything, so I currently have IP > Firewall > Connections > Tracking disabled.
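
For anyone who prefers the CLI to the menu path, a minimal sketch of the same change follows. It assumes the router does no NAT and has no stateful firewall rules, since those depend on connection tracking, and the property values should be checked against the documentation for your RouterOS version.

```routeros
# Show the current connection tracking state
/ip firewall connection tracking print

# Turn tracking off (NAT and stateful filter rules stop working while it is off)
/ip firewall connection tracking set enabled=no

# Revert to the default behaviour if it makes no difference
/ip firewall connection tracking set enabled=auto
```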

Unsetting the watchdog timer option will disable the watchdog. Then fix your networking problems.

There are no networking issues.

The watchdog is resetting the router because the router software glitches and becomes entirely unresponsive, with multiple CPU cores at 100%.
I say software glitches because I had already tried swapping the router with another identical model and the issue persisted.

If the watchdog is disabled, it will leave the router in the hung state and WILL require you to manually reset the router, so I strongly advise against disabling the watchdog.
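
A minimal sketch of the watchdog settings discussed in this thread, as RouterOS CLI commands. It keeps the software watchdog on, leaves the ping target empty, and asks the router to generate a supout file after a watchdog reboot so there is something to send to support. The property names are taken from the RouterOS watchdog menu as I understand it; treat them as assumptions and verify with the print command on your version.

```routeros
# Show the current watchdog settings
/system watchdog print

# Software watchdog on, no ping target, automatic supout file after a watchdog reboot
/system watchdog set watchdog-timer=yes watch-address=none automatic-supout=yes
```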

Are you also using bonding interfaces?

I tried disabling the bonding port and it didn’t seem to make a difference. So I downgraded the CCR2116 back to 7.6, and so far no reboots for a few days. On version 7.7, it was rebooting 2-3 times daily and sometimes chain-rebooting (2-3 times in a row). So 7.7 made rebooting MUCH worse. If 7.6 is anything like what I saw with the CCR2004, I suspect it will reboot much less frequently. So far uptime is almost 2 days with no “watchdog reboots”. I haven’t tried disabling the watchdog timer; I’m worried about locking the router up indefinitely. Also, someone mentioned disabling connection tracking; unfortunately, I am doing some light NAT on this router, so I have to leave it on.

Be sure the router is secured as well. Just an FYI.

We have seen these stop rebooting once we disable connection tracking.

Following. Brand new 2216 with 7.7 and it reboots multiple times. I do have connection tracking on since the router has some NAT on it. The config is a clone of a CCR1036 that has never rebooted in 2 years. As soon as that config was put on the 2216, reboots started within a couple hours.

Same problem. Had a CCR1036 working fine; same config on a CCR2116 and it reboots.
Disabled connection tracking and it has been stable for 3 days.

Is it possible that ROS is having a problem with the ARM64 processor?
The CCR1036 has a Tile processor; maybe that is where MikroTik should start their debugging.
I don’t believe that there is a “CONFIG” problem. I have been emailing supout files to support for 2 months, and they keep telling me to upgrade to the newest firmware.
So I updated to 7.7 and it still rebooted until I disabled connection tracking. Crossing fingers that it stays stable.