kuz8
just joined
Topic Author
Posts: 7
Joined: Sun Mar 02, 2014 10:08 am
Location: Boston, MA

bootloop on CCR1036-12g-4s (almost 5 years old) [SOLVED]

Fri Jan 04, 2019 1:00 am

Hi all,

In addition to my very slow-moving discussion with official support, I figured I'd post this here too.

I had my first CCR1036, purchased in Feb 2014, in a remote datacenter for almost 5 years. It was set up with 2 partitions: part0 with 6.38.x (previous verified stable) and part1, flagged Active, with 6.43.2 (the latest stable at the time, Sep 2018). Everything was stable until it suddenly rebooted, and I found it had booted to the "next" partition, which was part0. During a weekend maintenance window in Nov 2018 I booted to part1 and upgraded it to the then-latest stable 6.43.4, only to find it not answering pings 4.5 days later, on Thursday night. Importantly, I always upgrade the firmware to the matching version as the next step after it boots into the upgraded packages. I called datacenter remote hands to power-cycle it; leaving it off for a minute, multiple times, didn't help, and from then on it bootlooped. The faulty CCR1036 bootlooped for 5 days in a row until a colleague came with a newer spare CCR1036, swapped it in, and restored the config. The replacement has been running happily for 30 days now with the same config on 6.43.8.

I got the faulty CCR1036 back and noticed that it fails to boot right after powering on. It may make quite a few attempts before it finally gets past "starting kernel" on the router screen and prints the "Starting..." kernel message on the console.

Arturs C. from support suggested the following steps, which I followed. I also ran some extra tests that I thought might give a different view of the problem. It still bootloops occasionally:
1. repartition it to 1 partition
2. format nand
3. netboot
4. try both latest and stable with matching firmware

When netbooting, it bootlooped with a "corrupted" message like the one below, until on some attempt it finally got into the netboot kernel:
loading kernel... kernel not found or data is corrupted
trying bootp protocol... OK
Got IP address: 192.168.88.3
resolved mac address 84:8F:69:BB:22:3B
transfer started ............................ transfer ok, time=1.97s
setting up elf image... OK
jumping to kernel code


RouterBOOT booter 6.43.8

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... kernel not found or data is corrupted
trying bootp protocol..... OK
Got IP address: 192.168.88.3
resolved mac address 84:8F:69:BB:22:3B
transfer started ............................ transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code


RouterBOOT booter 6.43.8

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... kernel not found or data is corrupted
trying bootp protocol... OK
Got IP address: 192.168.88.3
resolved mac address 84:8F:69:BB:22:3B
transfer started ............................ transfer ok, time=2.07s
setting up elf image... OK
jumping to kernel code
And after I reformatted the flash and downgraded to long-term/bugfix with the default config, it still bootloops as shown below until an attempt finally succeeds. It may bootloop for quite a while and never get that lucky attempt.
RouterBOOT booter 6.42.10

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... OK
setting up elf image... OK
jumping to kernel code


RouterBOOT booter 6.42.10

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... OK
setting up elf image... OK
jumping to kernel code


RouterBOOT booter 6.42.10

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... OK
setting up elf image... OK
jumping to kernel code


RouterBOOT booter 6.42.10

CCR1036-12G-4S

CPU frequency: 1200 MHz
  Memory size: 4096 MiB
    NAND size: 1024 MiB

Press any key within 2 seconds to enter setup..

loading kernel... OK
setting up elf image... OK
jumping to kernel code
Starting...
Starting services...
MikroTik 6.42.10 (long-term)
MikroTik Login:
I sent support a supout taken during the first successful boot after a series of failed boots, but they didn't find anything pointing to the issue.
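
For anyone who wants to quantify this from a saved serial console capture, here is a minimal Python sketch. It is not a MikroTik tool, and the function name count_boot_attempts is made up for illustration. It assumes each boot attempt prints a "RouterBOOT booter" banner, as in the logs above, and that a line starting with "Starting..." means the kernel actually came up.

```python
import re

# Each boot attempt in the captures above begins with this banner line.
BANNER = re.compile(r"^RouterBOOT booter \S+")


def count_boot_attempts(log_text):
    """Return (attempts, booted) for a console capture.

    attempts: number of RouterBOOT banners seen.
    booted:   True if "Starting..." appeared after the last banner,
              i.e. the final attempt got into the kernel.
    """
    attempts = 0
    booted = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if BANNER.match(line):
            attempts += 1
            booted = False  # a new banner means the previous attempt failed
        elif line.startswith("Starting..."):
            booted = True
    return attempts, booted
```

Feeding it the text of a capture (for example, one saved by a terminal program) gives a rough count of how many tries a power-on needed. A capture that starts mid-attempt, without the first banner, will undercount by one.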

I've also noticed that, unlike the smaller, cheaper hAP ac / wAP ac / CRS317 / CRS326, neither my CCR1036 nor my CCR1072 even has a "bad blocks" line in System / Resources.
How do I find out whether it has issues with flash?

So the question is: is there any chance of diagnosing what's wrong with it? Given its age I may need to retire it, but I would like to know whether I can foresee this kind of issue on my other routers. Right now there is no human-readable explanation of the problem at all.
Last edited by kuz8 on Fri Jan 11, 2019 2:22 am, edited 5 times in total.
 
pcunite
Forum Veteran
Posts: 741
Joined: Sat May 25, 2013 5:13 am
Location: USA

Re: bootloop on CCR1036-12g-4s (almost 5 years old)

Fri Jan 04, 2019 1:03 am

So the question is - any chance diagnosing what's wrong with it? Given the age I may need to retire it, but would like to know if I can proactively foresee such issue happening to my other routers ...

Look at replacing the power supply?
 
kuz8
just joined
Topic Author
Posts: 7
Joined: Sun Mar 02, 2014 10:08 am
Location: Boston, MA

Re: bootloop on CCR1036-12g-4s (almost 5 years old)

Fri Jan 04, 2019 1:15 am

Look at replacing the power supply?
Thank you. A more experienced person I chatted with earlier said that power supply failure is the most frequent problem with CCRs, but surprisingly official support never gave any such clue. PSUs seem to be $26 here in the US; I will order one and report the results.

Most likely that is it. Mine behaves just like in this video, with the voltage dropping to zero just at the moment it decides to boot again: https://www.youtube.com/watch?v=Ylh_Wsd1gTc. The top of the left capacitor is also bulging:
capa-2019-01-03_18-30-14.png
I wasted so much time with no relevant clue in this direction from support...
 
pcunite
Forum Veteran
Posts: 741
Joined: Sat May 25, 2013 5:13 am
Location: USA

Re: bootloop on CCR1036-12g-4s (almost 5 years old)

Fri Jan 04, 2019 2:06 am

... the power supply failure is the most frequent problem with CCRs ... PSUs seem to be $26 here in the US, will try to order one and will report on results.

Ugh, yeah, let us know how it goes.
 
kuz8
just joined
Topic Author
Posts: 7
Joined: Sun Mar 02, 2014 10:08 am
Location: Boston, MA

Re: bootloop on CCR1036-12g-4s (almost 5 years old) [SOLVED]

Fri Jan 11, 2019 2:20 am

Damn that capacitor and whoever designed this dumb PSU (as opposed to a smart PSU like Dell's, which has logic and firmware updates and can tell when it has issues). It looks like it's stable again with the new power supply.
