I'm sorry for the length of this post, but the details are important to this question. What I am about to describe is very strange, but I have checked it repeatedly:
I have a link: RB493G===RB433AH<-----/------>RB433AH===RB493G. The 493s are router PPPoE concentrators and the 433s are mounted with XR-5 radios from Ubiquiti.
Everything ran beautifully for more than a year, with upgrades through 4.xx.
A month ago I pushed out RouterOS 5.5 to all of them. The link went down. I had to drive to the sites and power cycle the units to get the links to come back up.
A week later the link dropped. I drove to the sites and found that *all four* RBs had done what appeared to be a shutdown at the *same* time. Now, I log to a syslog server, and they were not shut down in the normal way, because I get a different signature in my logs when I shut one down. They just stopped talking on all their ports and the lights were OFF. Power cycle them and they come back up.
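A minimal sketch of how the "same time" observation could be checked against a syslog archive: find the last message logged by each unit and see whether those timestamps cluster together. It assumes classic BSD-style syslog lines ("Mon DD HH:MM:SS host message"), which carry no year, so the year is passed in. The function names, the one-minute window, and the hostnames and messages in the sample are all made up for illustration.

```python
from datetime import datetime, timedelta


def last_seen(lines, year):
    """Return {host: datetime of the last log line seen from that host}."""
    latest = {}
    for line in lines:
        parts = line.split(None, 4)
        if len(parts) < 4:
            continue  # not in the assumed "Mon DD HH:MM:SS host ..." format
        month, day, clock, host = parts[:4]
        try:
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # timestamp did not parse; skip the line
        if host not in latest or stamp > latest[host]:
            latest[host] = stamp
    return latest


def stopped_together(latest, window=timedelta(minutes=1)):
    """True if every host's last message falls within `window` of the others."""
    if not latest:
        return False
    stamps = list(latest.values())
    return max(stamps) - min(stamps) <= window


# Hypothetical sample data: hostnames and messages are invented.
sample = [
    "Jul 14 03:12:01 rb493-a pppoe,info user1 logged in",
    "Jul 14 03:12:40 rb433-a wireless,info link ok",
    "Jul 14 03:12:41 rb433-b wireless,info link ok",
    "Jul 14 03:12:43 rb493-b system,info heartbeat",
]
```

Calling `stopped_together(last_seen(sample, 2011))` on the sample reports whether all four units went quiet within a minute of each other. A real check would read the lines from the syslog server's files, and the window would need to allow for how often each unit normally logs.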
Of course that is weird, but it gets even more interesting... For the last week they have been turning themselves off (well, they have the same physical appearance as one that has been shut down). Was someone messing with me? I changed all the passwords and it continues, so it is not a compromised password.
A couple of days ago I loaded Rev 4.17 onto all four. All went well for a day, and then the link was down again and the units had the appearance of being shut down. Today I went out there, reset the configuration on all of them, and reconfigured them from a backup.
And I discovered that they are shutting themselves down. Not rebooting: shutting themselves down. I thought that clearing the config and reloading it would probably fix whatever happened during the upgrade process. It didn't. I have watched one of them shut itself down repeatedly when it was not connected to the network. I have watched them shut down after a reset configuration. I have watched them shut down when I told them to reboot from Winbox, and I have watched them simply shut down for no apparent reason at all.
The only ways I can think of to duplicate the problem deliberately would be to download the RouterOS sources, modify them to cause it, and install the result, or perhaps to discover a flaw in the IP stack that would let me craft a packet that causes a shutdown. The Linux IP stack is pretty mature and I'd be very surprised to hear that someone has discovered such a flaw, but even that would not explain the unit that shuts itself down while off the network.
Clearly we are going to replace them, but has anyone else seen anything like this with these particular units? Is there possibly some magic combination of firmware upgrades that might lead to something like this?
Thanks in advance, and I wish I could get to another of the training sessions.
Mike Erskine
(You remember me: I'm the CALEA guy who stuck antennas in a foam cup when you were testing 802.11n cards.)