Possible hardware issue/loop CCR1009-7G-1C-1S+

Hi all,

For quite a while now I have been struggling with an issue on one of my company’s CCR1009-7G-1C-1S+ routers.
Every few days, at random, the network disconnects for a few seconds, and the CCR logs “ether0: bridge port received packet with own address as source address (b8:69:f4:e0:77:7a), probably loop”. The MAC in that message is the MAC of the bridge port connected to that location’s LAN segment. This sounds pretty straightforward, so I first went to the site and checked the router and the switches connected to it for physical loops. When I could not find one, I searched the forums for related posts by other users.

Solutions mentioned:

  • Make sure the port is running RSTP (it already was)
  • Disable neighbor discovery (it now only detects our CCRs at other locations, connected through L2TP)
  • Update RouterOS and firmware (it is running 6.49.7 stable)
  • Check the ARP table for a duplicate MAC
  • Check the DHCP lease list for a duplicate MAC
  • MAC ping the looping address (it times out)
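For the two “duplicate MAC” checks, a small script over the ARP table or DHCP lease list can make the comparison less error-prone than eyeballing it. This is a minimal sketch, assuming you have already copied the entries out of the router as (IP, MAC) pairs; `find_duplicates` and its input format are hypothetical, not a RouterOS feature. Note that one MAC behind several IPs is not necessarily wrong (a router or proxy ARP can do that), so treat hits as leads rather than proof.

```python
from collections import defaultdict


def find_duplicates(entries, own_mac=None):
    """Summarise suspicious (ip, mac) pairs from an ARP table or DHCP lease export.

    Hypothetical helper: `entries` is assumed to be an iterable of
    (ip, mac) string tuples copied out of the router by hand.
    """
    ips_per_mac = defaultdict(set)
    macs_per_ip = defaultdict(set)
    for ip, mac in entries:
        ip = ip.strip()
        mac = mac.strip().lower()  # normalise so "AA:BB:.." matches "aa:bb:.."
        ips_per_mac[mac].add(ip)
        macs_per_ip[ip].add(mac)

    own = own_mac.strip().lower() if own_mac else None
    return {
        # one MAC answering for several IPs
        "macs_with_many_ips": {m: sorted(i) for m, i in ips_per_mac.items() if len(i) > 1},
        # one IP claimed by several MACs (possible address conflict)
        "ips_with_many_macs": {i: sorted(m) for i, m in macs_per_ip.items() if len(m) > 1},
        # entries that carry the bridge's own MAC (the one from the loop warning)
        "uses_own_mac": sorted(ips_per_mac.get(own, set())) if own else [],
    }
```

Passing the MAC from the log message as `own_mac` shows whether any host entry is using the bridge’s own address.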

To capture what happens right before the loop, I configured a mangle rule with the sniff-tzsp action to stream all network traffic to a laptop on the LAN running Wireshark.
What seems to happen before/during the disconnect is that the network is flooded with a few million ARP requests over the span of a few seconds. Can ARP requests flood a network so badly that it disconnects for a few seconds? In the screenshot, the 192.168.51.246 address belongs to a HIKVISION camera. The peculiar thing, however, is that the device sending the requests changes with each flood. In a different Wireshark capture, the requests came from an RBwAPR-2nD&R11e-LTE. After I removed the IP requested during the flood from the LTE’s table, a different device started sending the requests.
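Wireshark decodes TZSP on its own, but if you want to process the mirrored stream in a script, the encapsulation is simple to strip. A minimal sketch, assuming the usual TZSP layout (1-byte version, 1-byte type, 2-byte encapsulated protocol where 1 means Ethernet, then tagged fields terminated by the END tag 0x01) and UDP port 37008, the port Wireshark associates with TZSP; check that it matches the target port in your sniff-tzsp rule. `tzsp_payload` and `listen` are hypothetical helper names.

```python
import socket
import struct

TZSP_PORT = 37008          # assumption: match the port in your sniff-tzsp rule
TAG_PADDING, TAG_END = 0x00, 0x01
ENCAP_ETHERNET = 1


def tzsp_payload(datagram: bytes) -> bytes:
    """Strip the TZSP header and tags, returning the encapsulated Ethernet frame."""
    if len(datagram) < 4:
        raise ValueError("datagram too short for a TZSP header")
    version, _msg_type, encap = struct.unpack("!BBH", datagram[:4])
    if version != 1 or encap != ENCAP_ETHERNET:
        raise ValueError(f"unexpected TZSP version {version} / encapsulation {encap}")
    i = 4
    while i < len(datagram):
        tag = datagram[i]
        if tag == TAG_END:           # frame starts right after the END tag
            return datagram[i + 1:]
        if tag == TAG_PADDING:       # padding is a single byte with no length
            i += 1
            continue
        # Any other tag: 1-byte type, 1-byte length, then `length` data bytes.
        if i + 1 >= len(datagram):
            break
        i += 2 + datagram[i + 1]
    raise ValueError("no TZSP END tag found")


def listen(port: int = TZSP_PORT):
    """Yield Ethernet frames received as TZSP over UDP (runs until interrupted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, _addr = sock.recvfrom(65535)
        try:
            yield tzsp_payload(data)
        except ValueError:
            continue  # skip anything that is not a well-formed TZSP datagram
```

Note that Wireshark would need to be stopped, or listening elsewhere, if another program binds the same UDP port.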
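To see which device is sending the requests in each flood, and exactly when, the ARP requests can be tallied per sender and per second. A minimal sketch, assuming you can get the raw Ethernet frames with their timestamps out of the capture (how you extract them is left open here); the function names and the default threshold of 1000 requests per second are arbitrary assumptions, not values from this thread.

```python
import struct
from collections import Counter

ETH_P_ARP = 0x0806
ETH_P_8021Q = 0x8100
ARP_REQUEST = 1


def parse_arp_request(frame: bytes):
    """Return (sender_mac, sender_ip, target_ip) for an Ethernet/IPv4 ARP request, else None."""
    if len(frame) < 14:
        return None
    offset = 12
    ethertype = struct.unpack_from("!H", frame, offset)[0]
    if ethertype == ETH_P_8021Q:             # skip a single 802.1Q VLAN tag
        offset += 4
        if len(frame) < offset + 2:
            return None
        ethertype = struct.unpack_from("!H", frame, offset)[0]
    offset += 2
    if ethertype != ETH_P_ARP or len(frame) < offset + 28:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack_from("!HHBBH", frame, offset)
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, ARP_REQUEST):
        return None
    sha = frame[offset + 8:offset + 14]      # sender hardware address
    spa = frame[offset + 14:offset + 18]     # sender protocol (IPv4) address
    tpa = frame[offset + 24:offset + 28]     # target protocol (IPv4) address
    mac = ":".join(f"{b:02x}" for b in sha)
    return mac, ".".join(map(str, spa)), ".".join(map(str, tpa))


def summarize(frames, threshold_per_second=1000):
    """Count ARP requests per sender/target and list one-second bins above the threshold.

    `frames` is an iterable of (timestamp_seconds, frame_bytes) pairs.
    """
    per_sender = Counter()
    per_target = Counter()
    per_second = Counter()
    for ts, frame in frames:
        arp = parse_arp_request(frame)
        if arp is None:
            continue
        mac, sender_ip, target_ip = arp
        per_sender[(mac, sender_ip)] += 1
        per_target[target_ip] += 1
        per_second[int(ts)] += 1
    storms = sorted(sec for sec, n in per_second.items() if n > threshold_per_second)
    return per_sender, per_target, storms
```

`per_sender.most_common(5)` then lists the loudest senders, and `per_target` shows which IP they are all asking for, which may help tie the changing senders to one common target.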

I know that posting the export could help immensely, but since it is a company router I would like to avoid that if possible, even with the hide-sensitive parameter.
If there is any other info I can provide, I would gladly do so, even if it means taking several screenshots.

Are there any other steps I can take to get to the bottom of what is causing this?
[Attachment: Screenshot 2023-12-14 110023.png]

I’ll share some of my own experience; maybe something here will help you.
I have run into similar situations at sites that also had MikroTik equipment.
The most common causes were:

  1. too many public NAT port forwards to video cameras that were not adequately secured
  2. problems with the firewall rules: the default rules had been deleted and replaced with incomprehensible entries
  3. hacked IP cameras without secure access, which flooded and congested the internal network. It is better to reach the cameras over a VPN than to NAT directly to them.
  4. a defective switch that still seemed to work but caused inexplicable network outages
  5. a damaged network cable that caused unexplained interruptions from time to time. The main cable had been run as a temporary repair and never properly fixed. Someone had rolled over it with a wheeled office chair, and the damage was hard to see because it was at the very edge of the desk, next to a wastepaper basket. This pinched cable caused real problems on the network.
  6. a UPS with faulty internal circuitry that reset part of itself every now and then. Two switches were connected to this UPS… which of course affected the entire LAN.

It’s a broadcast storm: either it’s malicious or you have a network clash.

The first obvious question: you have VLANs, so why are the cameras on the same VLAN as everything else?
It’s much easier to diagnose problems, and to put queue rate limits on things, if it isn’t all in the same VLAN.

The tick will still allow access between the VLANs, but you get to put firewall rules on the traffic between them; in your case, flood protection might be useful 🙂
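To illustrate what a rate limit does to a flood like this, here is a generic token-bucket sketch with one bucket per source, so a single flooding host is throttled without starving the others. This is the textbook technique, not RouterOS’s queue or firewall implementation; the class names and the rate/burst numbers are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: average `rate` packets/s, bursts of up to `burst` packets."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = burst      # start full so an initial burst is allowed
        self.last = None         # timestamp of the previous packet

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` (seconds) may pass."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerSourceLimiter:
    """One bucket per source (e.g. sender MAC), so only the flooding host is throttled."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = self.buckets[source] = TokenBucket(self.rate, self.burst)
        return bucket.allow(now)
```

With `rate=100, burst=50`, a host sending 10,000 ARP requests in one second gets roughly 150 of them through, while a quiet host on the same limiter is unaffected.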