We have started deploying CCR2004s in our network and are having issues with one that we are trying to add to our BGP core.
It rebooted every 10-14 days, so we took it out and replaced it with another one. We then ran a memory test on the first one and got memory errors.
This was 12 days ago…
Today the new one rebooted the same way, causing so many problems in OSPF area 0.0.0.0 that we had to reboot 2 other routers running 6.45.9 to get traffic flowing again. A CCR1016 ended up in such a bad state that /export compact, SNMP, etc. stopped working.
The second CCR2004 (6.47.1) is now connected via a console cable, and running a remote memory test we get errors on this one too. A bad batch, or something else? It feels like too much of a coincidence.
Error in address=0x00000000C0004768, W=0xC0004768 R=0x00000000 X=0xC0004768
Error in address=0x00000000C000476C, W=0xC000476C R=0x00000000 X=0xC000476C
Error in address=0x00000000C0004770, W=0xC0004770 R=0x00000000 X=0xC0004770
Error in address=0x00000000C0004774, W=0xC0004774 R=0x00000000 X=0xC0004774
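The four error lines above follow a visible pattern. As a minimal sketch, assuming W is the value written, R the value read back, and X the bitwise difference between them (X equals W XOR R in every line shown, but that meaning of X is an assumption, not documented behaviour), this hypothetical Python helper parses such lines and checks the pattern:

```python
import re

# Hypothetical parser for memory-test error lines in the format shown above:
#   "Error in address=0x..., W=0x... R=0x... X=0x..."
# Assumption: W = written value, R = read-back value, X = W XOR R.
LINE_RE = re.compile(
    r"Error in address=0x([0-9A-Fa-f]+),\s*W=0x([0-9A-Fa-f]+)"
    r"\s+R=0x([0-9A-Fa-f]+)\s+X=0x([0-9A-Fa-f]+)"
)

SAMPLE = [
    "Error in address=0x00000000C0004768, W=0xC0004768 R=0x00000000 X=0xC0004768",
    "Error in address=0x00000000C000476C, W=0xC000476C R=0x00000000 X=0xC000476C",
    "Error in address=0x00000000C0004770, W=0xC0004770 R=0x00000000 X=0xC0004770",
    "Error in address=0x00000000C0004774, W=0xC0004774 R=0x00000000 X=0xC0004774",
]


def parse_error(line):
    """Return the four fields as integers, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    addr, w, r, x = (int(g, 16) for g in m.groups())
    return {"addr": addr, "w": w, "r": r, "x": x}


def summarize(lines):
    """Check the parsed errors for the patterns visible in the sample."""
    errors = [e for e in (parse_error(line) for line in lines) if e is not None]
    addrs = [e["addr"] for e in errors]
    return {
        "count": len(errors),
        # X matches the bits that differ between written and read values
        "x_is_w_xor_r": all(e["x"] == e["w"] ^ e["r"] for e in errors),
        # The test pattern written is the low 32 bits of the address itself
        "w_is_address": all(e["w"] == e["addr"] & 0xFFFFFFFF for e in errors),
        # Every failing word read back as zero
        "all_read_zero": all(e["r"] == 0 for e in errors),
        # Failing addresses are consecutive 4-byte words
        "contiguous_words": all(b - a == 4 for a, b in zip(addrs, addrs[1:])),
        "first": hex(addrs[0]) if addrs else None,
        "last": hex(addrs[-1]) if addrs else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On these four lines it reports four consecutive 4-byte words from 0xc0004768 to 0xc0004774, each written with its own address and read back as zero. That suggests a small contiguous region returning zeros rather than scattered bit flips, though this is only a reading of the log format, not a diagnosis from MikroTik.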
I’m having lockups on one of our two 2004s as well. It’s an edge router with 4 BGP sessions (no full tables). About every 36 hours it locks up: we can’t access it via WinBox or SSH, and SNMP stops responding. Sometimes it still passes traffic, other times no traffic passes at all.
We have a console server on it now. The last lockup reported nothing at all on the console, but I still had console access. When I tried to generate a supout.rif file via the console, that failed, but immediately afterwards it came back to life and I could generate one via WinBox.
During our crash 2 days ago it was still passing traffic, but we lost all access to it, including the console. We had to pull the power to reboot it.
What is everyone logging to “echo” in the hope of getting useful info in case of a crash? We have the “critical, warning, health, system and event” topics echoing, and nothing at all appeared on the console for our last crash.
We are seeing the same issue on one of our 8 CCR2004s: unexpected reboots 1-2 weeks apart. All of them run 6.47.1 and all run BGP, yet only one is affected, so it would seem to be a hardware problem. SFP28 is in use on all of them, so that isn’t the cause either.
I am shipping one 2004 back under RMA, and our other one reboots every 1-2 weeks. It was on 6.47, and I upgraded it to 6.47.2 two days ago. Both run BGP and OSPF.
MikroTik is running some special debug packages on one of our routers. It’s either a software bug or something deep in the hardware, since they mentioned involving the CPU vendor.
Without the debug packages nothing appeared on the console at crash time, so we are hoping for a new crash soon so this can be resolved. It has currently been up for 7 days.
We have 24 2004s still in their boxes, so it would be nice to solve this issue…
We are running the same debug firmware on ours. Every one of our deployed 2004s has some sort of problem (random reboots, crashes, or both).
We are seeing the same problems on two of the 10 CCR2004s we have deployed. Connection tracking is not enabled. Support says that unless we have the debug package installed and a console attached, there’s no way to catch the problem. Today it happened again on the same units with version 6.47.2. It looks like a hardware issue to me; we will deploy the debug package and a console and see if we can help find the problem.
We just had our 2004 crash with the debug firmware installed. The last console output is:
[admin@AUW-LOOKOUT-EDGE-02] > LOOPER: read_raw read failed: EOF
died with signal
There was nothing before that for hours. About 2 minutes after the crash, we got 2 physical link up/down messages on the console, and nothing else. The router does not respond to console input, we can’t log into it, and it is not passing traffic. I have sent the debug log to support under our open ticket. Hopefully this console message is useful to them and they can get this fixed.
That output is related; it was added to help support troubleshoot this. We are running that firmware at MikroTik’s request to gather more information when it crashes. It has not done anything yet to stop the crashing.
We also have issues with a CCR2004, although there is no BGP/OSPF in our case. The reboots are completely random: sometimes a few times per day, sometimes once in two weeks.
CPU load does not exceed 6%; average traffic is 20-25 Mbps with rare spikes to 50 Mbps. RouterOS: 6.47.4.
We love MT devices, …but those reboots are horrible.
We are running 6.48beta48 on some of the 2004s that were rebooting. It seems to have solved the reboots, but we now face an issue where the routing protocols stop working instead.