BRMateus2
Frequent Visitor
Topic Author
Posts: 73
Joined: Thu Oct 26, 2017 11:18 pm

RB951G-2HND Reboot issues and system corruption

Thu Oct 26, 2017 11:37 pm

This might be a duplicate, because my first topic doesn't show up...

Hello. I bought the RB951G-2HND in September, and it is my first MikroTik. Before that I had a TP-Link running DD-WRT, which froze after a few days, and its stock firmware was even worse.
Everything was perfect: a f* fast router with features I loved, acting as an advanced home router. Then, after an upgrade from 6.35~ to 6.40.4, I rebooted it from the Winbox (3.11) menu and it simply froze at boot. I did a Netinstall using the method below (for any newcomer) and reconfigured it without restoring a backup. I restarted it again and everything was OK: fast, and the rules and QoS seemed optimized.

After that, the power went out once and the router booted perfectly.
Another day (1 week later), the power went crazy (a fuse on the light pole blew) and dropped from 127 volts to 80 volts for 1 hour at 2 a.m. (I have measured this before, so it is very likely); the router kept working without reboots.
I rebooted it the next day, as I thought the electricity had gone out completely rather than just dropping to a lower voltage. Nothing in the logs gave any warning. It rebooted perfectly.
Then, after 7 days without any problems, I decided to reboot to refresh all the data, memory and caches, and it happened again: it froze at boot and I had to do another Netinstall.

Boot-froze: it beeps once (it never gives the second, double beep, meaning it didn't boot properly).

So I did the Netinstall again, and this time I restored my last backup, because I would hate to configure everything again (DDoS deny rules, tarpit, etc.). It restored with issues: after the restore and its automatic reboot, Winbox was slow. I started the supout tool, and it froze and logged me out. So I rebooted again, and then it stopped accepting Winbox logins, so I had to power-cycle it by unplugging it and plugging it back in. It rebooted and worked: logins OK, everything from the backup restored, supout file 300 kB. Then I decided to make another supout, as the first one could have been damaged by the forced reboot; it started, reached 100%, file size 504 kB. Before this, I could not get ANY data other than the backup, not even a supout.

Here are the supouts, for bug fixing if possible:
https://drive.google.com/open?id=0BxFQ1 ... 3RZZ3hhaXM


**********Netinstall
For a fresh format using Netinstall:
Set the PC (server) IP to 192.168.88.3, netmask 255.255.255.0 and gateway 192.168.88.1
In Netinstall, set the Boot Server client IP address to 192.168.88.1
Hold RESET while the MikroTik powers on, until it beeps, shows up in Netinstall, or the LED changes state.
That's all.
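
For anyone who wants to double-check the addresses before starting, the PC running Netinstall and the client address handed to the router must sit in the same subnet. A minimal sketch in Python, using only the standard `ipaddress` module; the helper name `same_subnet` is made up for this example and is not part of Netinstall:

```python
import ipaddress

def same_subnet(server_ip: str, netmask: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside the server's subnet."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in network

# Values taken from the Netinstall notes above.
print(same_subnet("192.168.88.3", "255.255.255.0", "192.168.88.1"))  # prints True
```

If this prints False for your own addresses, Netinstall most likely will not see the router.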
 
pukkita
Trainer
Posts: 3051
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: RB951G-2HND Reboot issues and system corruption  [SOLVED]

Fri Oct 27, 2017 11:05 am

Looks like your NAND is gone. You'd better write to support; short of that, you could netinstall it again, reset it with no default configuration, and reconfigure it to see whether it holds up this time.

Better than using a .backup, you could make an export, so that you can just copy & paste the config on this, or any router (CLI commands). This allows you to copy & paste sections individually for better troubleshooting in the event of the setup failing.
 
BRMateus2
Frequent Visitor
Topic Author
Posts: 73
Joined: Thu Oct 26, 2017 11:18 pm

Re: RB951G-2HND Reboot issues and system corruption

Fri Oct 27, 2017 4:48 pm

Many thanks for your answer! I sent an email right now to support@mikrotik.com entitled "[Bug Ticket request] RB951G-2HND Reboot issues and system corruption".

I love that router and would like to fix it if it's possible. I just looked at the bad blocks count: it's 0.3% after the netinstall (before it was 0.0%)... pretty strange as it's new; manufacturing error?
Sector writes since reboot (12hs uptime): 7 577
Total sector writes: 157 583
Let's hope for the best.
 
pukkita
Trainer
Posts: 3051
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: RB951G-2HND Reboot issues and system corruption

Fri Oct 27, 2017 7:02 pm

Lots of power off/power on cycles and a bad electricity supply can corrupt the NAND format or damage it, especially if you're writing to it constantly (do you have graphs active?).

If you're experiencing such power supply instability, you'd better get a UPS, at least for the router. This router can be powered via either its DC jack or PoE In, and there are lots of small, affordable micro UPSes.

As the router supports being powered from 9-30V, even a 9V cell could power it for some time so that you can shut it down, or at least "absorb" the micro blackouts.

For a hassle-free setup, and possibly a wise investment in your situation, I'd get a mUPS.
 
BRMateus2
Frequent Visitor
Topic Author
Posts: 73
Joined: Thu Oct 26, 2017 11:18 pm

Re: RB951G-2HND Reboot issues and system corruption

Fri Oct 27, 2017 7:45 pm

Thank you for the suggestion! Yes, I did use Graphing; today I disabled the disk writes and I'm graphing only to RAM.
The electricity glitches every one to three months, so it's not that common; as it's summer here and the rainy season, the MikroTik has suffered only one power cut and one low-voltage episode.

Is there a way to use a pendrive as the system partition? The RB has a USB port, which might even work with an external hard disk that is more reliable than its own NAND filtering circuitry.
 
pukkita
Trainer
Posts: 3051
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: RB951G-2HND Reboot issues and system corruption

Sat Oct 28, 2017 12:50 pm

No, USB external disk cannot be used as system partition, and I'm afraid it cannot be used for graphs storage either.

External storage can be used for web proxy cache, samba sharing, etc.
 
BRMateus2
Frequent Visitor
Topic Author
Posts: 73
Joined: Thu Oct 26, 2017 11:18 pm

Re: RB951G-2HND Reboot issues and system corruption

Sat Oct 28, 2017 3:35 pm

So I disabled all writes, even the 24 h DHCP lease store. I might have had bad luck, with the system stored on bad blocks, because today I tested twice: a reboot, and it booted OK; a shutdown, and it booted too.

Lessons learned: never write data to "any" NAND at an interval shorter than 24 h, or never write at all. It's a game of luck, which you lose with bad NAND.

DHCP lease storing turned off (it was 24 h).
Graphing interval changed from 5 minutes (I thought that was the capture interval too) to 24 h, and I disabled all graphing writes to disk. I hope the checkbox works, because the write interval has no "never" option like DHCP does.

I'm graphing to RAM, I hope.

Nothing else was writing to disk. I did a test: I sent a bogus file to the NAND, which filled it, and got no increase in bad blocks (it stayed at 0.3%); then I deleted that file.

Unless support's conclusion adds something to this thread, it's solved then.

Edit---(11:50/30/10/2017 or 201710301150 ISO Time GMT -3) 201710301350 UTC
I am editing this because adding a new post is not worth it (why bump this topic again?)
Support answered that there were no crashes logged in supout.rif.

So, after disabling all possible writes to disk and testing some reboots at random intervals:
Uptime 10:00hs
Sector Writes since Reboot: 277
Total Sector Writes: 223050 (remember that I sent a 110 MB file just for testing; it caused around 100000 writes, with no change in the bad block count)
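
To put the counters in perspective, the two "since reboot" figures quoted in this thread can be turned into hourly rates. This is only a rough sketch in Python; the function name `writes_per_hour` is made up for illustration, and it assumes each counter covers exactly the uptime stated next to it (12 h before the change, 10 h after).

```python
def writes_per_hour(sector_writes: int, uptime_hours: float) -> float:
    """Average number of sector writes per hour of uptime."""
    return sector_writes / uptime_hours

# Before disabling disk writes: 7577 sector writes in 12 h of uptime.
before = writes_per_hour(7577, 12)
# After disabling disk writes: 277 sector writes in 10 h of uptime.
after = writes_per_hour(277, 10)

print(f"before: {before:.1f} writes/h")  # before: 631.4 writes/h
print(f"after:  {after:.1f} writes/h")   # after:  27.7 writes/h
```

So the average write rate dropped by more than a factor of 20 once graphing and DHCP lease storing stopped hitting the NAND.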

I did a reboot just now to test whether it crashes, and it booted perfectly.
The reasons for this topic and my concerns were:
*Two boot lockups happening for no apparent reason, both after reboots, with nothing in the logs giving any warning and 0.0% bad blocks.
*I did not know what was causing them; it could be anything, as I had not experienced such symptoms before and had never had powerful devices with flash inside, only hard disks.

Possible solution, which I have tested many times since the original post:
*Disable all writes to disk, even DHCP leases; graph to RAM or don't graph at all.

Possible cause, going by the answers:
*Bad NAND from the factory OR bad luck with system writes landing on undetected bad blocks; after the netinstall, bad blocks finally rose to 0.3%, which is within the safe range, but who knows about the rest?
