Hi all,
I have been struggling for quite a while now with an issue on one of my company’s CCR1009-7G-1C-1S+ routers.
Every few days, at random, the network disconnects for a few seconds, and the CCR logs a “ether0: bridge port received packet with own address as source address (b8:69:f4:e0:77:7a), probably loop” entry. The MAC mentioned is that of the bridge port connected to that location’s LAN segment. This sounds pretty straightforward, so I first went to the physical location of the router and the switches connected to it to make sure there were no physical loops. When I could not find one, I searched the forums for related posts by other users.
Solutions mentioned:
- Make sure the port is running on RSTP (it already was)
- Disable neighbor discovery (it is now only detecting our other CCRs in other locations connected through L2TP)
- Update RouterOS & firmware (it’s running 6.49.7 stable)
- Check the ARP table for a duplicate MAC
- Check the DHCP lease list for a duplicate MAC
- MAC ping the loop address (times out)
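For the two duplicate-MAC checks above, a minimal Python sketch of the comparison is shown here. It assumes the ARP table or DHCP lease list has already been copied out by hand as (IP, MAC) pairs. The function name and the sample addresses are hypothetical and not taken from the router in question.

```python
from collections import defaultdict


def find_duplicate_macs(entries):
    """Return {mac: sorted list of IPs} for every MAC seen with more than one IP."""
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        # Normalise case so AA:BB:... and aa:bb:... count as the same MAC.
        ips_by_mac[mac.strip().lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical sample data for illustration only.
sample = [
    ("192.168.51.10", "AA:BB:CC:00:00:01"),
    ("192.168.51.11", "aa:bb:cc:00:00:01"),
    ("192.168.51.12", "AA:BB:CC:00:00:02"),
]
print(find_duplicate_macs(sample))
```

A MAC that shows up with several IPs is not necessarily a fault (a device may legitimately hold more than one address), so any hit is only a starting point for a closer look.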
In an attempt to capture what happens right before the loop, I configured a mangle rule with the sniff-tzsp action to stream all network traffic to a laptop on the LAN running Wireshark.
What seems to happen before/during the disconnect is that the network gets flooded with a few million ARP requests over the span of a few seconds. Is it possible for ARP requests to flood a network so badly that it disconnects for a few seconds? In the screenshot, the 192.168.51.246 address belongs to a HIKVISION camera. The peculiar thing, however, is that the device sending the requests changes with each flood. In a different Wireshark capture, the device making the requests was an RBwAPR-2nD&R11e-LTE. After I removed the IP requested during the flood from the table of the LTE, a different device sent the requests during the next flood.
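To compare floods, the ARP requests in a capture could be counted per sender and target. The Python sketch below does that under a few assumptions: the frames are available as raw, untagged Ethernet frames (already stripped of the TZSP/UDP wrapper, no VLAN tag), and only IPv4-over-Ethernet ARP is counted. All function names are hypothetical, and the demo frames are synthetic, with a made-up MAC and target address.

```python
import struct
from collections import Counter

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def parse_arp_request(frame):
    """Return (sender_mac, sender_ip, target_ip) for an ARP request, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    # Only Ethernet/IPv4 ARP requests are considered here.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or oper != ARP_REQUEST:
        return None
    sha, spa, _tha, tpa = struct.unpack("!6s4s6s4s", frame[22:42])
    mac = ":".join(f"{b:02x}" for b in sha)
    return mac, ".".join(map(str, spa)), ".".join(map(str, tpa))


def count_arp_requests(frames):
    """Count ARP requests per (sender MAC, target IP) pair."""
    counts = Counter()
    for frame in frames:
        parsed = parse_arp_request(frame)
        if parsed:
            sender_mac, _sender_ip, target_ip = parsed
            counts[(sender_mac, target_ip)] += 1
    return counts


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build a synthetic broadcast ARP request frame (demo helper only)."""
    sha = bytes.fromhex(sender_mac.replace(":", ""))
    spa = bytes(int(p) for p in sender_ip.split("."))
    tpa = bytes(int(p) for p in target_ip.split("."))
    eth = b"\xff" * 6 + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    return eth + arp + sha + spa + b"\x00" * 6 + tpa


# Synthetic demo: the MAC and the target IP are made up.
frames = [build_arp_request("aa:bb:cc:00:00:01", "192.168.51.246", "192.168.51.1")] * 3
frames.append(build_arp_request("aa:bb:cc:00:00:02", "192.168.51.20", "192.168.51.1"))
for (mac, target), n in count_arp_requests(frames).most_common():
    print(mac, "->", target, n)
```

Running the same count over the captures from two different floods would show whether the requested IP stays the same while the sender changes, or whether both change.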
I know that posting the export could help immensely, but since it is a company router I would like to avoid that if possible, even with the hide-sensitive parameter.
If there is any other info I can provide, I would gladly do so, even if this means taking several screenshots.
Are there any other steps I can take to get to the bottom of what is causing this?
