no reboot for a very, very long time, and I use it (v7.17.2) as an eBGP router with many peers…
The CPU inside the CCR2004 (Annapurna Alpine) is about a decade newer than the 3.x kernel used in RouterOS v6.
Add to that the heavy customizations MikroTik made to that kernel, and it was never stable.
With RouterOS V7 there are no such issues.
I have several CCR2004s in production, both the 12S+2XS and the newer 16G-2S+ variants,
including some that were confirmed unstable on 6.48.x/6.49.x and were upgraded to v7.
When running RouterOS V7, the CCR2004 runs as reliably as any other modern ARM MikroTik product.
awesome!
Thanks @flapviv and @guipoletto for the updates!
Sorry to dig this fossil of a thread up again…
I realize that several different underlying causes contributed to “the” CCR2004 watchdog reboot problem. There are multiple entries addressing CCR2004 “stability” issues spread across various ROS releases. Then there was a later regression (around 7.6, I think) with almost identical symptoms that wasn’t actually 2004-specific; it was some kind of conntrack-related issue that would show up on all ARM64 devices.
I still have a particular interest in what the actual, final solution was to the problem that seemed to last throughout 6.49.x and was finally fixed in early 7.x. We still have some CCR2004s running 6.49.x that we cannot upgrade at this time, and a subset of those are experiencing the reboot issue. Others are not (and some have crazy uptimes…hundreds of days…).
It would be nice to know what the true underlying cause of the problem that plagued this CCR model for months and months ended up being, so that we know which features, configs, or use-cases to avoid until we are in a position to upgrade these units to 7.x. So far, when comparing our stable units against the unstable ones, I have not found any glaringly obvious commonalities among the ones that behave the same, or glaringly obvious differences between those that behave differently. They vary wildly in config (routing protocols used, conntrack on/off, queues or no queues, filter/nat/mangle rules or not, bridges or not, VLANs or not, PPP tunnels or not, etc.), number of ports used, the link rates of those ports or the SFPs they are populated with, serial number ranges of affected vs. unaffected units, etc. Just when I think I might have found something, I find another unit that is an exception to the “rule” I thought I had discovered.
So…does anybody happen to have a pulse on what the actual issue was with these?
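For anyone doing the same kind of stable-vs-unstable comparison across a fleet, here is a rough sketch of how it could be scripted. Everything in it is an assumption for illustration: the `separating_features` helper, the unit names, and the feature labels are all hypothetical, and the inventory is placeholder data, not real observations from any CCR2004. It only looks for features that are present on every unstable unit and absent from every stable one (and the reverse).

```python
from typing import Dict, Set, Tuple

# Hypothetical inventory type: unit name -> (features in use, is the unit stable?).
Inventory = Dict[str, Tuple[Set[str], bool]]


def separating_features(inventory: Inventory) -> Dict[str, Set[str]]:
    """Find features that cleanly split stable units from unstable ones.

    "only_unstable": on every unstable unit and on no stable unit.
    "only_stable":   on every stable unit and on no unstable unit.
    """
    stable = [feats for feats, ok in inventory.values() if ok]
    unstable = [feats for feats, ok in inventory.values() if not ok]
    if not stable or not unstable:
        # Nothing to compare against, so no feature can separate the groups.
        return {"only_unstable": set(), "only_stable": set()}

    in_all_unstable = set.intersection(*unstable)
    in_any_unstable = set.union(*unstable)
    in_all_stable = set.intersection(*stable)
    in_any_stable = set.union(*stable)

    return {
        "only_unstable": in_all_unstable - in_any_stable,
        "only_stable": in_all_stable - in_any_unstable,
    }


# Placeholder data for illustration only; not real observations.
example: Inventory = {
    "unit-a": ({"bgp", "conntrack", "vlan"}, False),
    "unit-b": ({"ospf", "conntrack", "queues"}, False),
    "unit-c": ({"bgp", "vlan", "bridge"}, True),
    "unit-d": ({"ospf", "queues", "bridge"}, True),
}

result = separating_features(example)
print(result)
```

An empty result for both keys would match the experience described above: no single feature cleanly separates the two groups, so the cause is either a combination of factors or something not captured in the feature list.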
I do not believe the above quotes give an accurate account of the issue. Unfortunately, this urban myth was repeatedly spread by multiple posters here (beyond the two examples I cherry-picked above), until it was accepted as the gospel truth by virtually everyone. MikroTik themselves have never actually come out and claimed this.
This seems unlikely to be the explanation because 1) the new 64-bit Linux 5.x kernel in ROS7.x did not inherently solve the problem out of the gate; the final fix only appeared in 7.2; 2) @sergejs implies that the problem in 6.x is extremely similar in nature but that just the particular implementation of a fix would need to be different, and even teases the possibility of an eventual back-port of the fix to ROS6 (which of course they never did); 3) prior to the 7.2 fix coming out, they were still soliciting feedback from users of BOTH 6.49.x AND 7.1 release candidates about continued crashes, which tells us that even THEY didn’t believe that the problem was inherently with an “unstable” older kernel version, much less with the 32-bitness of it / use of ARMv7 instructions.
So: what was the REAL reason / underlying cause?
I don’t know if it is actually related, but 2 days ago I noticed something very strange:
I was working in the same room where my ccr2004-1g-12s+2xs is located and suddenly the fans went to full speed. (Only one PSU was connected at the time.) The fault LED turned red. All link LEDs were off. I immediately checked my network connection with my phone. I could reach all my VLANs. I had an internet connection the whole time.
After about the time it usually takes for the device to boot, everything became normal again. I fired up Winbox and looked at the uptime. It said several days (since the last time I rebooted after upgrading to 7.18). I had a look at the logs but found nothing suspicious, not even an entry about a reboot.
Any idea what that was?
Upgraded to 7.19.1 today and upgraded firmware as well.
Sounds like some process that handles hardware such as the LEDs and fans crashed. I read of an issue with certain SFPs that corrupt the I2C bus (which seems to be shared somehow) in the device and cause all kinds of strange behavior.
Hmm, interesting … can you recall where you read that? I only use original MikroTik SFPs, except for an SFP ONT (Zyxel PMG3000-D20B) - maybe it is related to that one …