I have submitted this issue to support@mikrotik.com, I think three times now, and have never heard back from them.
After a firmware update (in fact, with any firmware past 6.42.x), multiple devices lock up: the DHCP server does not start, the CPU sits at 100%, and the device barely passes any traffic. The logs show that a program could not be started. Note: this has affected only 4 devices out of hundreds so far, but these are still devices that should be working well.
The issue at this point is that Sonar cannot connect to these devices because they’re on a lower firmware version that does not use the new authentication methods, which I’m assuming is the root of this problem. I wonder if MikroTik can help us fix these or get them RMA’d.
Have you already tried using Netinstall to format them and reinstall the current version with a blank configuration, and then re-configuring them to your needs?
Don’t use backup/restore. At most, do a /export first and use that to re-configure them, either manually or by pasting sections of the /export after you have examined them for anything strange.
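To make the "paste it section by section" approach a bit easier, here is a rough Python sketch that splits a saved /export file into its menu sections so each one can be read before it is pasted back. It assumes the usual export layout (comment lines start with #, a line starting with / names the menu, the commands follow, and long commands are wrapped with a trailing backslash). The function name split_export_sections is made up for this example and is not part of any MikroTik tool.

```python
def split_export_sections(export_text):
    """Group a RouterOS /export into (menu_path, [commands]) pairs for review.

    Assumption: a line starting with "/" opens a new section, "#" lines are
    comments, and a trailing backslash means the command continues on the
    next line.
    """
    sections = []
    current_path, current_cmds = None, []
    pending = ""  # holds the start of a command wrapped with a backslash
    for raw in export_text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            # Wrapped command: drop the backslash and join with the next line.
            pending = line[:-1]
            continue
        if not line or line.startswith("#"):
            continue
        if line.startswith("/"):
            # A new menu path closes the previous section.
            if current_path is not None:
                sections.append((current_path, current_cmds))
            current_path, current_cmds = line, []
        elif current_path is not None:
            current_cmds.append(line)
    if current_path is not None:
        sections.append((current_path, current_cmds))
    return sections


if __name__ == "__main__":
    import sys

    # Usage: python split_export.py router.rsc
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for path, cmds in split_export_sections(fh.read()):
            print(f"{path}  ({len(cmds)} commands)")
            for cmd in cmds:
                print(f"    {cmd}")
```

This only lists what is in the export. Deciding what counts as "strange" is still a manual job.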
When a device has old firmware on it and misbehaves, it has most likely been hacked and now has malware running on it.
Yes, just tried this now. Issue seems to persist.
These units were definitely not hacked. No possible way for that to occur. Access lists etc. and access monitoring logs show no unauthorized accesses.
Make sure there’s nothing attached to the serial console, if the device has one.
This was a very hard issue to track down: we had devices with long serial cables connected to the router, while the other end of the cable (~10 m long) wasn’t connected to anything. The cable picked up EM noise that constantly trickled into the serial port, and thus into the serial console.
The serial console is an interrupt-driven device in RouterOS, so this noise will hammer the CPU all the time. Do a /sys resource irq print to see what you have there:
[admin@hgw] > /sys resource irq print
Flags: ro - read-only
 #     IRQ  USERS    CPU   ACTIVE-CPU          COUNT
 0       3  usb1     auto           0             65
 1       4  switch0  auto           0  3 179 276 774
 2      16  beeper   auto           0            797
 3 ro   19  serial   auto                         18
 4     112  ts       auto           0              0
The count next to serial should be low. If it is high and increasing rapidly, you have the same issue we had.
Then you just need to disconnect the cable…
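If you want a number for "increasing rapidly", one way is to run the irq print twice a few seconds apart and compare the serial COUNT values. The small Python sketch below does that arithmetic. It assumes the spaces inside a COUNT value (as in 3 179 276 774) are just digit grouping, and the helper names parse_count and interrupts_per_second are invented for this example. No threshold is suggested here; compare against a device you know is healthy.

```python
def parse_count(text):
    """Turn a COUNT value such as '3 179 276 774' into an integer.

    Assumption: the spaces are only digit-group separators.
    """
    return int("".join(text.split()))


def interrupts_per_second(count_before, count_after, seconds):
    """Average interrupt rate between two readings taken `seconds` apart."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds


if __name__ == "__main__":
    # Paste the serial COUNT from two prints taken 10 seconds apart.
    first = parse_count(input("first COUNT:  "))
    second = parse_count(input("second COUNT: "))
    print(f"{interrupts_per_second(first, second, 10):.1f} interrupts/s")
```

A count that barely moves between the two readings means the serial line is quiet. A rate that is large compared with a known-good unit points to the floating-cable problem described above.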
From the days we worked with Unix systems and serial terminals I remember well that a long RS232 cable can sometimes “echo” everything.
There is capacitance between the RX and TX leads, and this is enough to couple everything sent to the port back to the RX, especially at high baud rates.
When this occurs, an initial banner can be received back and treated as a response to the login prompt. This results in an error message, which in turn is used as the next login attempt.
In such cases you should see log messages about invalid logins.