Hi all,
we have a problem in one of our networks at the moment. We use a CCR1036 (6.43.1) as pppoe-server, which terminates ~1000 sessions. Behind the router we have several CRS switches, CCRs (mainly in bridged mode, no routing), DSLAMs, and sadly also ubnt radio networks. To separate customer traffic from mgmt traffic we use some VLANs.
Now to the problems:
A few days ago we unplugged one of the DSLAMs from its uplink switch and plugged it into a new uplink switch. We expected that our customers behind that DSLAM would have a service interruption of <1min. Of course the ~250 pppoe sessions of the affected customers were dropped at the CCR1036 for that minute. So far as expected… But then the CCR started to kick ALL pppoe sessions! Over a period of a few minutes all sessions were gone! After a reboot things worked fine again. So the result was not a service interruption of <1min for 250 customers but several minutes for everyone behind that CCR because of the reboot. In the log you see hundreds of entries like “PPPoE connection established from [caller MAC]” but no “[user] logged in” entries.
A few days later a branch of the ubnt network went down, so a bit fewer than 100 pppoe sessions were lost. Same behavior at the CCR: all sessions kicked until reboot. We did not have that issue in the past, but I can’t tell when exactly this behavior started because we try to avoid losing a bunch of sessions at once.
Has anyone experienced the same behavior with the pppoe server?
Now to the main problem:
A lot of MT devices behind the CCR (but also the CCR itself) write to the log:
[interface]: bridge port received packet with own address as source address, probably loop
Those messages occur several times a day on the CCR (but sometimes just once…) and sporadically on, I think, all MT devices (the loop symptoms also show up on non-MT devices). I guess it depends on the amount of traffic through the respective bridges. In most cases the bridges contain VLAN interfaces, and we observe the loop messages on all the different VLAN bridges. (Normally a device has at least a bridge for mgmt, which contains the mgmt VLAN interfaces, and a bridge for customer traffic, which contains the customer access VLAN interfaces. The pppoe server runs on the customer access bridge.) Since we observe those messages at all points in the network, it’s hard to find the origin of the error. Is it a real loop or could it be another error? We are checking device by device whether there’s an error in the bridge configuration that causes the loop, but so far we haven’t found anything… The log messages say “probably” loop; what else could lead to bridges at any point in the network complaining about receiving frames with their own src MAC?
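To narrow down where the messages come from, one idea would be to collect the logs of all devices in one place and count the warnings per interface. Below is a minimal Python sketch of that idea. It assumes each log line has the form shown above, optionally with the offending MAC in parentheses after "source address" (I am not sure every RouterOS version prints it). The interface names in the sample and the function name `tally_loop_warnings` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout (not verified for every RouterOS version):
#   <interface>: bridge port received packet with own address as
#   source address[ (<MAC>)], probably loop
LOOP_RE = re.compile(
    r"(?P<iface>\S+): bridge port received packet with own address "
    r"as source address(?: \((?P<mac>[0-9A-Fa-f:]{17})\))?, probably loop"
)


def tally_loop_warnings(lines):
    """Count loop warnings per (interface, MAC) pair; MAC is None if absent."""
    counts = Counter()
    for line in lines:
        match = LOOP_RE.search(line)
        if match:
            mac = match.group("mac")
            key = (match.group("iface"), mac.lower() if mac else None)
            counts[key] += 1
    return counts


# Hypothetical sample lines; real input would come from exported logs.
sample = [
    "vlan100-ether3: bridge port received packet with own address as source address, probably loop",
    "vlan100-ether3: bridge port received packet with own address as source address, probably loop",
    "vlan20-sfp1: bridge port received packet with own address as source address (AA:BB:CC:00:11:22), probably loop",
    "PPPoE connection established from AA:BB:CC:00:11:33",
]

for (iface, mac), count in tally_loop_warnings(sample).most_common():
    print(iface, mac, count)
```

If one interface or one MAC clearly dominates the counts, that would at least tell us where to start looking.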
While researching this I got confused by different statements concerning the MAC addresses of bridges and their slave interfaces: In RouterOS, when I create a bridge and put interfaces in it, the bridge gets the same MAC address as the first physical interface to come up. But I have read that bridges should have a different MAC address than their slave interfaces. In one case someone also had loop error messages until he changed the MAC of the bridge. But I can’t imagine that the default behavior of the RouterOS bridge implementation leads to faulty configurations?!
Also, can it be a problem if several bridges on one device have the same MAC? In my understanding it should not…
Conclusion: I need help with how to handle the loop errors. And: could the strange behavior of the pppoe-server be a symptom of the loop problem? Thank you in advance for any help and hints!!
Michael