Bomber67
Member
Topic Author
Posts: 383
Joined: Wed Nov 08, 2006 10:36 am

CCR1072 recovered after NAND failure - good as new?

Thu Feb 24, 2022 10:00 am

A couple of weeks ago my main router, a CCR1072, rebooted for no known reason and sent me an autosupout file, which I did not take any notice of (probably a bad decision).
Yesterday I discovered that the admin password had somehow been cleared, which made me fear the router had been compromised.
I had also lost the ability to write or delete files on it, including creating a backup. I read that this might come from some kind of NAND failure.
I ran /system check-installation, which passed without errors.
Then I rebooted it, but it failed to come up: fans at maximum, an endless loop with the text "Loading kernel" on the display, and a brief red flash of the "Failure" LED before it beeped and tried to reboot again.
I replaced it with a spare CCR1036 and brought it to my desk, where I managed to recover it using Netinstall.

Now it looks "normal", but what are the chances that it has not suffered any kind of permanent damage?
Is this purely a matter of something getting messed up in the software, or can the hardware be affected?

I don't want to re-install it just to end up in the same mess after a while.
 
Bomber67
Member
Topic Author
Posts: 383
Joined: Wed Nov 08, 2006 10:36 am

Re: CCR1072 recovered after NAND failure - good as new?

Thu Feb 24, 2022 7:26 pm

"First things first: connect via console cable and look at it before running Netinstall. I suspect RAM rather than NAND."
OK, but the file system is in NAND/NVRAM, right? Doesn't that point towards problems there?
RAM is purely (volatile) working memory, and for as long as the router ran (12 days after the reboot) it functioned well.
Just my $0.02
 
r00t
Long time Member
Posts: 672
Joined: Tue Nov 28, 2017 2:14 am

Re: CCR1072 recovered after NAND failure - good as new?

Fri Feb 25, 2022 12:28 am

NAND memory does have some spare blocks reserved for replacing failed ones.
This happens on write: when a write hits a bad block, it is redirected to the reserved area and a good block is used instead.
Everything will work just fine as long as the corruption is not widespread and there are enough good spare blocks to replace all the failed ones.
If everything is working, then it's probably OK to return this CCR to full service.
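
To make the write-time replacement idea a bit more concrete, here is a toy Python model. It is only a sketch of the general technique, not how RouterOS or any real NAND driver implements it; the class name ToyNand and its layout (spare blocks numbered right after the data blocks) are made up for illustration.

```python
class ToyNand:
    """Toy model of write-time bad-block replacement (illustrative only)."""

    def __init__(self, data_blocks, spare_blocks, bad=()):
        self.bad = set(bad)   # physical blocks that fail when written
        self.remap = {}       # logical block -> replacement physical block
        # Assumption: spare blocks sit right after the data blocks.
        self.spares = list(range(data_blocks, data_blocks + spare_blocks))
        self.store = {}       # physical block -> stored data

    def write(self, logical, data):
        physical = self.remap.get(logical, logical)
        # Keep redirecting until a good block is found or spares run out.
        while physical in self.bad:
            if not self.spares:
                raise IOError("no spare blocks left")
            physical = self.spares.pop(0)
            self.remap[logical] = physical
        self.store[physical] = data
        return physical

    def read(self, logical):
        return self.store[self.remap.get(logical, logical)]


nand = ToyNand(data_blocks=4, spare_blocks=2, bad={1})
nand.write(0, b"config")        # block 0 is good, written in place
nand.write(1, b"packages")      # block 1 is bad, redirected to a spare
print(nand.remap)
```

Once the spare list is empty, the next write that hits a bad block fails, which is the "not enough good blocks" case.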

In general, NAND memories are quite full of bad blocks right out of the factory, as manufacturers can pretty much sell every chip they make as long as it still has ANY capacity left.
Say they make an 8GB chip, but 99% of the blocks are bad... no problem! Just sell it as 64MB (as it still has ~82MB of good blocks). The operating system just has to deal with this;
there are bad block lists etc. Basically the only guarantee is that a few blocks at the beginning are good (for the bootloader), and that's it. The rest of the chip may look like Swiss cheese...
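
The numbers in that example can be checked quickly. This is just the arithmetic from the post; the function names are made up, and rounding the sellable size down to a power of two is my assumption about how the 64MB figure comes about.

```python
def good_capacity_mb(raw_gb, bad_fraction):
    """Capacity left in MB when bad_fraction of the raw chip is unusable."""
    return raw_gb * 1024 * (1 - bad_fraction)


def sellable_size_mb(good_mb):
    """Largest power-of-two size (in MB) that fits in the good capacity."""
    return 2 ** (int(good_mb).bit_length() - 1)


good = good_capacity_mb(8, 0.99)
print(f"{good:.1f} MB good")               # 81.9 MB good
print(f"{sellable_size_mb(good)} MB sold")  # 64 MB sold
```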
 
chechito
Forum Guru
Posts: 2989
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: CCR1072 recovered after NAND failure - good as new?

Fri Feb 25, 2022 2:03 am

Beware of bad ideas that punish the storage, like running The Dude, excessive logging, or scripts that constantly create files.

Those are especially bad for this small 128 MB storage.
 
Bomber67
Member
Topic Author
Posts: 383
Joined: Wed Nov 08, 2006 10:36 am

Re: CCR1072 recovered after NAND failure - good as new?

Fri Feb 25, 2022 9:05 am

It does just firewalling/shaping and acts as a PPPoE concentrator.
The Dude runs on an RB1100AHx2 Dude Edition with mSATA.
 
Bomber67
Member
Topic Author
Posts: 383
Joined: Wed Nov 08, 2006 10:36 am

Re: CCR1072 recovered after NAND failure - good as new?

Fri Feb 25, 2022 9:08 am

Thanks for your info.
When I start from scratch after a Netinstall, is 100% of the NAND then considered "good", meaning that bad blocks are detected and marked unusable along the way as it is used?
Can I run some comprehensive check to see what the real status of the NAND is now, before configuring and deploying the router once again?
 
r00t
Long time Member
Posts: 672
Joined: Tue Nov 28, 2017 2:14 am

Re: CCR1072 recovered after NAND failure - good as new?

Fri Feb 25, 2022 9:41 pm

Yes, that's how it should work. After Netinstall, any bad blocks hit while writing should have been replaced, so all written data should be good.
This process will continue on every future NAND write operation whenever a bad block is found.
There is a bad block counter in ROS (under /system resource), but I'm not 100% sure whether this info persists across ROS reinstalls.
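
If you want to keep an eye on that counter over time, a rough way is to save the text output of /system resource print now and then, and compare. Below is a small Python sketch that picks the value out of such output. The field names bad-blocks and write-sect-total are how I recall RouterOS labelling them on NAND-based boards, so treat them as assumptions, and the sample values are invented purely for illustration.

```python
def parse_resource(text):
    """Parse 'key: value' lines as shown by a RouterOS print command."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def bad_block_percent(text):
    """Return the bad-blocks percentage as a float, or None if not reported."""
    value = parse_resource(text).get("bad-blocks")  # assumed field name
    return float(value.rstrip("%")) if value is not None else None


# Invented sample, not real output from the router in this thread.
SAMPLE = """\
                  uptime: 1d02:03:04
        write-sect-total: 123456
              bad-blocks: 0.2%
"""

print(bad_block_percent(SAMPLE))
```

A value that keeps climbing between checks would be a reason to keep the spare router close by.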
