Hi,
I’m fairly new to ROS so I’m guessing there is a gap in my understanding, and would appreciate someone explaining to me where I’m going wrong. I have a CCR2004-16G-2S+.
I have a BRIDGE with a BOND (802.3ad) as a member port, which in turn has two ETH interfaces as slaves, feeding upstream to a Unifi USW-8. Nearly all settings are at their defaults (including ARP and HW Offload). When the LAG came up I saw an insane amount of packet loss. Monitoring the BOND, I would see flags (C) and (D) flapping. Disabling one of the slave ETHs alleviated the packet loss.
BRIDGE <--> BOND <--> ETH01 <--> USW-8
                 <--> ETH02 <-->
I first built a basic BOND with two ETHs, with all default settings (ARP Enabled), and everything worked fine. No packet loss. Then I created the BRIDGE and added the BOND to it, swapped the IP address from the BOND to the BRIDGE, and the packet loss returned.
When I disabled HW Offload for the bridge port (BOND), everything returned to normal.
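For reference, a minimal sketch of the setup described above and the workaround that stopped the packet loss. The names bond1, bridge1, ether1 and ether2 and the address 192.0.2.1/24 are assumptions for illustration, not taken from the actual config.

```
# Assumed interface names and placeholder address, adjust to your setup
/interface bonding add name=bond1 mode=802.3ad slaves=ether1,ether2
/interface bridge add name=bridge1
/interface bridge port add bridge=bridge1 interface=bond1
/ip address add address=192.0.2.1/24 interface=bridge1

# Workaround: disable hardware offload on the bond's bridge port only
/interface bridge port set [find interface=bond1] hw=no
```

With hw=no on that port, traffic through the bond is bridged by the CPU instead of the switch chip.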
Would someone please explain to me what’s happening in the switch chip with the config I had, and why disabling HW Offload fixes the problem? Also is there a way I can correctly re-configure the stack to take advantage of HW Offload?
I guess I have the same problem. Once the bond is established and I add it to the bridge, RX is fine but TX causes problems: the traffic saturates. For example:
eth0=100mbps
eth1=30mbps
Of the two ethers, eth0 runs at 100mbps, so once it reaches its maximum it affects the total traffic.
CCR2004-16G-2S+ with one bridge and two bonds as its two member ports (sfp+1&2, eth1&2).
When HW offload is enabled on the bond made of eth1+eth2, I notice massive packet loss on both bonds.
I started to notice it today, without any conscious change to the configuration regarding bonds, VLANs, and so on.
Could the problem be in the underlying bridge STP operation, since STP operation is the opposite of bond/LACP operation?
hmm… what kind of bond mode did you create?
I don’t have a CCR. According to the datasheet, what is the maximum throughput?
Maybe the CPU is being overrun by the bond logic and the amount of traffic passing through?
— edit
Have you tried making bonds from ports of the same speed, i.e. 2 bonds of 2x 100mbps ports each, or 1 bond of 4x 100mbps ports?
I have not seen your config, but have you disabled RSTP on the CCR2004-16G-2S+ bridge? Can you test whether the packet loss disappears when RSTP is enabled?
Edit:
LACPDUs use the destination MAC address 01:80:C2:00:00:02, which STP-compatible bridges do not forward. My guess is that with “protocol-mode=none” and HW offloading enabled, the 88E6393X, 88E6191X and 88E6190 switch chips do not receive them correctly and LACP negotiation fails. With the default “protocol-mode=rstp”, these non-forwardable packets are always redirected to the CPU.
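A sketch of how this guess could be tested, assuming the bridge is named bridge1 and the bond is named bond1 (both names are placeholders):

```
# Assumed names: bridge1, bond1
# Switch the bridge back to the default protocol mode
/interface bridge set bridge1 protocol-mode=rstp

# Watch the LACP state of the bond while traffic is flowing
/interface bonding monitor bond1

# Check whether the bond's bridge port is still hardware offloaded
/interface bridge port print
```

If the guess is right, the bond's LACP state should stop flapping once the bridge runs RSTP, with HW offload still enabled on the port.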
I think the OPs @unlikely and @iamgavinj may be correct: it seems that adding a bond interface to a bridge causes some packet loss. But I am not really sure about their config or their performance loss.
@wiseroute thanks, but it appears we might be discussing a separate topic here. Changing the “hw=yes/no” setting on CHR bridges won’t have any impact. The slight variation in kilobits per second and the occurrence of tx queue drops are likely a result of reaching the maximum limit of the 1Mbps interface.
Agreed. A VM lab doesn’t represent true hardware performance.
But if you take a closer look at the TX/RX errors and drops on the bridged bonded interface versus the plain bonded one, I think the errors and drops would be quite significant under heavy traffic load. Again, I am only using balance-rr; I don’t know the results for other modes.
It seems that I am also affected by this problem, with an 802.3ad bonding interface on the CCR2004-16G-2S+ router.
I would not say there is a “massive” packet loss, but there appears to be packet loss for sure.
It looks like sporadically there is something that delays the traffic by some 500-700ms, e.g. visible when I ping a fixed address.
Probably when this happens and there is enough load, the result is packet loss due to buffer overflow?
I observe when working through the day on a VNC desktop connected via the router, that sometimes the session stalls for a brief moment then resumes.
I also got some complaints about “dropped calls” from users of WiFi calling (4G coverage is poor indoors, so mobile phone users rely on it).
It looks like the traffic sometimes stalls for a short period.
I have enabled RSTP on the bridge, with no difference. But even before that, the 802.3ad status was already showing OK on both sides (router and Aruba switch).
Any progress on this matter, @EdPa?
For now I have removed the bridge, so the bonding interface is used directly (with VLAN interfaces on top). That also removes the HW acceleration.
As the office is now closed it is more difficult to validate that it is resolved.
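A rough sketch of that bridge-less layout, assuming a bond named bond1, a single VLAN with ID 10, and a placeholder address (none of these values are from the actual config):

```
# Assumed names and values: bond1, vlan10, VLAN ID 10, 192.0.2.1/24
# Take the bond out of the bridge
/interface bridge port remove [find interface=bond1]

# Put the VLAN interface directly on top of the bond
/interface vlan add name=vlan10 vlan-id=10 interface=bond1
/ip address add address=192.0.2.1/24 interface=vlan10
```

In this layout nothing is bridged, so all traffic over the bond is handled by the CPU.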