We have the setup shown in the attachment. At each of two locations we have a Cisco router, and the two Cisco routers run BGP peering with each other through MikroTik CRS317-1G-16S+RM devices at both ends.
The WAN link is a 10G link, whereas each of the two links between the two MikroTiks is 1G.
Both MikroTiks run RouterOS 6.43.8 (routeros-arm). The ports are configured as follows:
SFP1 - 10G WAN link to the Cisco router (at both ends)
SFP2 - 1G link between the MikroTiks
SFP3 - 1G link between the MikroTiks
bonding1 is configured with slaves sfp2 + sfp3 at both ends
bridge1 is configured with ports bonding1 + sfp1
bonding mode = balance-rr
transmit hash policy = Layer2 and 3
PROBLEM DESCRIPTION:
1> When both links are active, traffic is balanced over the two links (sfp2, sfp3), but we see a lot of packet drops on internet traffic and latency goes up.
In addition, the CPU of both MikroTik devices stays above 80-85% at all times.
2> When we change the bonding mode to 802.3ad with the Layer2 hash policy, all traffic goes over one link only. There is no balancing at all.
3> When we set the bonding mode to 802.3ad with the Layer2 hash policy and keep one of the two links (sfp2 or sfp3) disabled, all traffic goes over the remaining link with no packet drops or added latency, and CPU usage stays constant at a minimal 1%.
Any suggestions on how to use the MikroTik devices at both ends so that traffic is balanced over both links, without high CPU usage, latency, or packet drops?
Any suggestions would be helpful.
Hello,
Can anybody shed some light on the issue described above? The world is slowly moving from other high-end switches and routers to MikroTik, but for us, if this issue is not resolved, MikroTik's image will be at stake. We are seriously thinking of migrating many of our costlier devices, such as Cisco, to MikroTik, but with multiple issues still unresolved we are not able to take that step. Kindly help us.
What is happening
Your routers are doing the bonding and bridging in software; that is why your CPU goes so high.
Since your traffic goes from only one point to the other, all of it carries the same source and destination MAC addresses, so if you use the layer2 hash, only one path will be chosen.
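To make the layer2 point concrete, here is a minimal Python sketch. It assumes the hash behaves like the Linux bonding driver's documented layer2 policy (last byte of source MAC XOR last byte of destination MAC XOR EtherType, modulo the number of slaves); the RouterOS or switch-chip implementation may differ in detail, and all MAC and IP addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def mac_last_byte(mac: str) -> int:
    """Return the last octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def layer2_slave(pkt: Packet, n_slaves: int, ethertype: int = 0x0800) -> int:
    """Layer2-style hash: only MAC addresses and EtherType are used; IPs are ignored."""
    h = mac_last_byte(pkt.src_mac) ^ mac_last_byte(pkt.dst_mac) ^ ethertype
    return h % n_slaves


# Hypothetical MACs of the two Cisco routers. Every routed packet between
# them carries this MAC pair, whatever customer IPs are inside.
CISCO_A = "00:00:5e:00:53:01"
CISCO_B = "00:00:5e:00:53:02"

# 256 different customer IP pairs, all behind the same router MAC pair.
packets = [
    Packet(CISCO_A, CISCO_B, f"10.0.{i}.1", f"192.0.2.{j}")
    for i in range(16)
    for j in range(1, 17)
]
links_used = {layer2_slave(p, n_slaves=2) for p in packets}
print(links_used)  # prints {1}: every IP pair lands on the same link
```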
Improving a little
It is better to choose layer2+layer3 as the hashing method; traffic will then be more balanced if it has multiple source and destination IP addresses.
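The effect of adding layer3 can be sketched the same way. This assumes a formula modeled on the Linux bonding driver's documented layer2+3 policy (MAC bytes and EtherType, XORed with both IPv4 addresses, then folded); it is an illustration, not RouterOS's exact hash, and the addresses are hypothetical.

```python
import ipaddress
from collections import Counter


def mac_last_byte(mac: str) -> int:
    """Return the last octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def layer2_and_3_slave(src_mac, dst_mac, src_ip, dst_ip, n_slaves, ethertype=0x0800):
    """Layer2+3-style hash: MAC bytes and EtherType, then both IPv4 addresses."""
    h = mac_last_byte(src_mac) ^ mac_last_byte(dst_mac) ^ ethertype
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # Fold the high bits down so every octet of the addresses affects the result.
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves


CISCO_A = "00:00:5e:00:53:01"  # hypothetical router MAC
CISCO_B = "00:00:5e:00:53:02"  # hypothetical router MAC

# Same router MAC pair, but traffic to many different destination IPs.
per_link = Counter(
    layer2_and_3_slave(CISCO_A, CISCO_B, "198.51.100.10", f"203.0.113.{host}", n_slaves=2)
    for host in range(1, 255)
)
print(per_link)
```

With the same MAC pair, the varying destination addresses now spread across both slaves, which is why layer2+layer3 helps when many IP pairs cross the bond.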
Best thing to do
Consider using a CRS3xx. Its bonding is hardware offloaded. Also, when you create a bridge with SFP1 and bonding1, the bridge will be hardware offloaded.
If you choose a CRS3xx, the built-in switch chip always uses Layer2+Layer3+Layer4 as the transmit hash policy; changing the transmit hash policy manually has no effect.
Thank you for your response and valuable time, Mr. Jprietove.
At the beginning I was not aware that the "balance-rr" bonding mode does not support hardware offloading, so I tried it, and as a result the CPU usage went high.
Later, after more research on the MikroTik CRS3xx devices, I learned that only two bonding modes (balance-xor and 802.3ad) support hardware offloading, starting with RouterOS v6.42. I tried "balance-xor" with the hash policy set to Layer-3-and-4, and everything went well for around 30 minutes.
For those 30 minutes, all the traffic (around 1.3 Gbps) was balanced over both links, with no packet drops or latency, and CPU usage stayed at a minimal 1%. I do not know what happened after 30 minutes, but both of my MikroTik CRS317s became isolated and I was unable to ping or access them. Traffic dropped from 1.3 Gbps to around 120 Mbps, and only on a single link. I rebooted the MikroTiks twice and still could not connect to them. In the end I had to reset the configuration to factory defaults; after 10-15 minutes I was able to connect through Winbox and reconfigure it with 802.3ad mode on a single link.
Can you please shed some light on what might have happened in those 30 minutes and afterwards? What did I miss, or where did I go wrong in the config?
Awaiting your valuable suggestions.
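For the balance-xor test described above: the Layer-3-and-4 policy also mixes in TCP/UDP ports, so even a single pair of IP addresses can spread over both links when it carries many connections. A minimal Python sketch, loosely modeled on the Linux bonding driver's documented layer3+4 formula; addresses and ports are hypothetical, and it only illustrates the traffic distribution, not the isolation that followed.

```python
import ipaddress


def layer3_4_slave(src_ip, dst_ip, src_port, dst_port, n_slaves):
    """Layer3+4-style hash: both L4 ports and both IPv4 addresses, no MACs."""
    h = (src_port << 16) | dst_port
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # Fold the high bits down so ports and addresses all affect the result.
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves


# One hypothetical client/server IP pair carrying 100 TCP connections
# (different ephemeral source ports, same destination port).
links = [
    layer3_4_slave("198.51.100.10", "203.0.113.20", sport, 443, n_slaves=2)
    for sport in range(40000, 40100)
]
print(links.count(0), links.count(1))  # prints 50 50
```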
I've been using a CCR1016 with bonding in balance-rr mode for more than a year with 1.7 Gbps of traffic, in software (not hardware), and the CPU hardly goes above 5-6%.
It would be useful to know whether you are using RouterOS or SwitchOS, which RouterOS/SwitchOS version you are running, and to see an export of your config. Maybe your CRSs were under attack… but without more info it is hard to know.
Only the 802.3ad and balance-xor modes are switch-chip accelerated. When you select balance-rr, you are hitting the CPU performance limit.
And 802.3ad is not balancing between multiple links because, most likely, you have only one stream running.
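The single-stream point holds for any per-flow hash: every packet of one connection has the same addresses and ports, so it always maps to the same member link, and that one flow is capped at one link's speed (1G in this setup). The sketch below uses a generic SHA-256 of the 5-tuple as a stand-in; it is an assumption for illustration, not the hash RouterOS or the switch chip actually uses.

```python
import hashlib


def flow_slave(src_ip, dst_ip, proto, src_port, dst_port, n_slaves):
    """Stand-in per-flow hash: any deterministic function of the 5-tuple behaves alike."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_slaves


LINK_SPEED_GBPS = 1.0  # each bond member in this thread is a 1G link

# 10,000 packets of one hypothetical TCP connection.
one_stream = [("198.51.100.10", "203.0.113.20", "tcp", 51000, 443)] * 10_000
links_hit = {flow_slave(*pkt, n_slaves=2) for pkt in one_stream}

# A flow that stays on one member can never exceed that member's capacity.
max_single_flow_gbps = LINK_SPEED_GBPS * len(links_hit)
print(links_hit, max_single_flow_gbps)
```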
Yes, I came to understand this after some research: only the 802.3ad and balance-xor modes are hardware offloaded.
But when I used balance-xor mode with the Layer-3-and-4 hash policy, traffic was balanced over both links for around 30 minutes, with no packet loss or latency and CPU staying at a minimal 1-2%. Then, after around 30 minutes, both of my MikroTik CRS317 devices became isolated and I had no access to them.
I am still wondering what might have gone wrong with the new bonding config. It is supposed to work with balance-xor, right?
Should I try 802.3ad mode with the Layer-2-and-3 hash? I am worried about making any changes, since both devices are in a production environment and almost 1 Gbps of my customers' traffic currently runs through them; it will be affected if something goes wrong or does not work as expected. Kindly advise!
Leaving balance-xor aside, will I be able to load-balance traffic over both links using "802.3ad" with "Layer-2-and-3", with low latency and no packet drops?
Also, I hope the CPU won't go as high as 80%.
I tried 802.3ad bonding mode with Layer-2-and-3 hashing, and traffic still went over one link only. When I disconnected the interface carrying all the traffic, the traffic did not shift to the other link. However, the MikroTik at the other end logged warning messages saying: "bonding 1: bridge port received packet with own address as source address (cc:2d:e0:a3:99:a1). Probably loop."
We are still unable to use the second link for load sharing and are still running all the traffic over a single link. Kindly help!
I need a quick fix: my traffic between these routers is growing beyond 1G, and it is mostly incoming. What is best to use, balance-rr? The routers are a 1100x4 and a CCR, and a CCR and a CCR1036. I tried balance-rr and saw some jitter; I'm not sure whether that is caused by PPPoE or by the bonding.
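On the jitter question: balance-rr does not hash flows at all; it sends each packet out of the next slave in turn. If the two paths differ even slightly in delay, packets of one connection can arrive out of order, which TCP may react to as if packets were lost. A small Python simulation with hypothetical per-link delays shows the effect; it does not rule out PPPoE as a cause.

```python
def round_robin_arrivals(n_packets, send_interval_us, link_delays_us):
    """Return sequence numbers in the order they arrive when striped round-robin."""
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delays_us)  # balance-rr: next packet, next link
        send_time = seq * send_interval_us
        arrivals.append((send_time + link_delays_us[link], seq))
    arrivals.sort()  # the receiver sees packets in arrival-time order
    return [seq for _, seq in arrivals]


# Hypothetical: one packet every 10 us, link 0 has 100 us delay, link 1 has 125 us.
order = round_robin_arrivals(6, send_interval_us=10, link_delays_us=[100, 125])
print(order)  # prints [0, 2, 1, 4, 3, 5]
```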
Hello! Could you please tell me whether you resolved your issue?
Here we have a CRS317-1G-16S+RM, and I'm bonding a CCR1036 to it with two SFP+ ports.
Our CCR1036 handles more than 10 Gbps of aggregate traffic, with fastpath enabled.
Since we created the bonding interface with balance-rr (layer 2-3), the CCR CPU has risen from ~22% to ~40%.
I think that is a lot to consume just for the bonding interface.
What should I do?
The CRS317 is running SwOS 2.10.
The LAG is not created on the CRS side… the LAG interfaces are in passive mode, with port isolation configured for the two ports used by the LAG.
But traffic is working well: no packet loss, and very well balanced.
My concern is the CPU usage on the CCR side.
When using 802.3ad:
The ARP link monitoring is not recommended, because the ARP replies might arrive only on one slave port due to transmit hash policy on the LACP peer device. This can result in unbalanced transmitted traffic, so MII link monitoring is the recommended option.
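The ARP monitoring caveat can be illustrated with a simplified Python model: the LACP peer picks the return link for ARP replies with its own layer2-style hash, and since all slaves of the local bond share one MAC, every reply comes back on the same slave. The ARP monitor on the other slave then sees no replies and marks it down. The MAC addresses and the hash (modeled on the Linux bonding layer2 policy) are hypothetical stand-ins.

```python
def mac_last_byte(mac: str) -> int:
    """Return the last octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def peer_reply_slave(peer_mac: str, bond_mac: str, n_slaves: int) -> int:
    """Port the LACP peer chooses for an ARP reply (layer2-style hash, EtherType 0x0806)."""
    return (mac_last_byte(peer_mac) ^ mac_last_byte(bond_mac) ^ 0x0806) % n_slaves


def arp_monitor(n_slaves: int, peer_mac: str, bond_mac: str, probes_per_slave: int = 3):
    """Send probes out of every slave and record which slave each reply arrives on."""
    replies_seen = [0] * n_slaves
    for _probe_slave in range(n_slaves):
        for _ in range(probes_per_slave):
            # The probe leaves on _probe_slave, but the peer chooses the reply's path.
            replies_seen[peer_reply_slave(peer_mac, bond_mac, n_slaves)] += 1
    return ["up" if seen else "down" for seen in replies_seen]


PEER_MAC = "00:00:5e:00:53:10"  # hypothetical LACP peer
BOND_MAC = "00:00:5e:00:53:a1"  # hypothetical, shared by all slaves of the local bond

print(arp_monitor(2, PEER_MAC, BOND_MAC))  # prints ['down', 'up']
```

MII monitoring avoids this because each slave's state comes from its own link status rather than from where the peer happens to send replies.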
Hi, I am trying to get bonding between two CCRs working at full capacity, but when I try it, the bandwidth drops drastically. I may have a configuration error between the two CCRs.
Can you advise on any issues when using round-robin load balancing of packets? Did you use SFP ports? I want to use a CCR2004, which has only SFP slots, and I have seen that bonding with failover and failback does not work unless I use the broadcast mode, because link monitoring fails on SFP ports that require auto-negotiation to be turned off. I tried both ARP and MII monitoring.