We recently moved to using MikroTik RB5009s in a lot of locations, both indoor (RB5009UG+S+) and outdoor (RB5009UPr+S+) units. Some do PoE, some do not. The network is pretty straightforward: just a WAN in, and then CGNAT to a bridge for LAN out.

We have had nonstop issues with random units having a port lock up. By lock up, I mean the LAN link is there, shows full gigabit, and shows good connectivity, but will not pass traffic. Rebooting the 5009 restores it, but simply moving the problematic cable to another port works too.

I currently have one that I was able to get a technician to, and he moved the cable from port 8 to port 6. Traffic started working again. Port 8 does show LAN disconnected on the interface, but looking at its detailed status, it is still registering a 1 Gbps link while passing no traffic to anything (so, completely locked). Log files show nothing (I have enabled extra logging on some devices but haven’t caught any relevant logs yet).

Most are on the default firmware of 7.8, but I have upgraded others to, I believe, 7.12 or 7.14, and the issue has remained. I have tried disabling and re-enabling the port, new cables, disabling and re-enabling the bridge, and removing the port from the bridge completely and re-adding it. For PoE-out devices experiencing the issue, I have checked the voltage, changed the cabling, killed PoE, etc. It seems nothing short of a reboot (or moving the cable to another port) will fix this.

I have this issue consistently on 6 devices, out of approximately 40 deployed in the field. Any insight, or word from someone who has experienced a similar issue, would be great. I have not been able to find anything similar in the forums.
Can you share the config, just to rule out anything on that part?
/export file=anynameyoulike
Remove serial and any other private info and post in between code tags by using the </> button.
Does this work for you?
Sept2024.rsc (13.1 KB)
One thing I noticed is that you have two bridges. Please remove bridge1-Public; it serves no purpose.
Are you sure you want to have all ports on the same bridge?
Could there be a loop in the network? Could you provide a network diagram?
My first suggestion would be to look at Spanning Tree Protocol:
https://help.mikrotik.com/docs/display/ROS/Spanning+Tree+Protocol
And check the logging.
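As a rough sketch of what to check the next time a port locks up, something like the following could be run from the terminal before rebooting. The port name `ether8` is an assumption (the default name for port 8), so adjust it to match your export.

```
# Assumed name: port "ether8"; change it to match your config.

# Show every bridge with its protocol-mode (none / stp / rstp / mstp)
/interface bridge print detail

# Show the spanning tree role and state of the suspect bridge port
/interface bridge port monitor [find interface=ether8] once

# Show the link status as reported by the Ethernet driver
/interface ethernet monitor ether8 once

# Keep STP and interface events in the in-memory log
/system logging add topics=stp action=memory
/system logging add topics=interface action=memory
```

If the bridge port reports a non-forwarding state while the Ethernet monitor still shows the link as up, that would point toward spanning tree rather than the hardware.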
Last question: will it remain working after connecting to a different port, or will the problem occur again after some time?
For right now, yes, all ports will be in a bridge (we are working through changing networks to OSPF, routed, etc.). I did try loop detect throughout the network and found nothing, and if there were a loop, I would notice it very easily. This isn’t the behavior of a loop. The example I showed is the more complex config compared to others that experience the same issue. The others are simply a PowerBox set as a direct bridge, no routing, etc. The network diagrams themselves are pretty basic at the moment. We have WAN in to a Cisco ASR → RB5009 → wireless out to a series of towers. I will note that this is the core RB5009 at this site, but our more common failure location is in a chain of towers about 5 hops away from the core, and it happens randomly. I will add, though, that this only happens on backhaul feeds. I’ll look into STP for the MikroTiks. I know we are running it on our ASR, and the MikroTik is just using the default RSTP.
To answer your last question, I am unsure whether the problem will carry over to another port. This is the first instance where I have had a tech 10 minutes away, rather than 12 hours, who could move the cable to another port. So, time will tell on that one.