Strange bonding latency problem

Hi tik gurus! How’re you guys?!

A customer of mine has a very strange problem in one installation. It consists of three independent, physically separated point-to-point NV2 links (built with WDS), bonded together by two other RouterBOARDs (one at each end). These do layer 2 bonding (round-robin) of the three physical Ethernet interfaces that connect to the APs at each end. The logical interface “bonding1” is then added as a port of the bridge at each end. In other words, all the equipment in this setup works at layers 1 and 2; there is no IP routing.
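Round-robin bonding (balance-rr) sends each successive frame out of the next slave interface in turn. If the three PtP links do not have identical latency, frames of a single flow can arrive out of order at the far end. The sketch below is a toy model only; the interface names and per-link delays are hypothetical, not measurements from this installation.

```python
from itertools import cycle


def round_robin_schedule(frames, links):
    """Assign frames to links in strict rotation (balance-rr style)."""
    rotation = cycle(links)
    return [(frame, next(rotation)) for frame in frames]


def arrival_order(schedule, one_way_delay_ms, send_interval_ms=0.1):
    """Return frame ids sorted by arrival time at the far end.

    Frame i is sent at i * send_interval_ms and arrives after the
    one-way delay of the link it was scheduled on.
    """
    arrivals = []
    for i, (frame, link) in enumerate(schedule):
        arrivals.append((i * send_interval_ms + one_way_delay_ms[link], frame))
    return [frame for _, frame in sorted(arrivals)]


if __name__ == "__main__":
    links = ["ether1", "ether2", "ether3"]
    # Hypothetical per-link one-way delays in ms (illustration only).
    delays = {"ether1": 2.0, "ether2": 3.5, "ether3": 2.5}
    sched = round_robin_schedule(range(6), links)
    print(sched)
    print(arrival_order(sched, delays))
```

With unequal delays, frames sent in order 0 to 5 come out reordered, which TCP and real-time traffic both have to absorb.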

The real-world aggregate layer 4 throughput is very satisfactory: we can reach 500 Mbps of TCP without any problem.

The problem is that ONLY when there is no heavy activity on the link, latency and jitter are somewhat high (not very high: pings between the two directly connected test PCs are around 20 or 30 ms and change very frequently). When we built the PtP links, we verified that latency was the same on all of them, we chose widely separated frequencies for each link, and we installed each pair of antennas at a reasonable distance from the other pairs.

But when the customer puts a heavy traffic load on the network, such as migrating a virtual machine at ~100 Mbps, latency and jitter simply become VERY good (3 or 4 ms).

http://youtu.be/5V4v4V_XlG0

In this video you can see a demonstration of this absurd problem. Does anyone have a clear explanation for this behaviour?!

ping

That behavior doesn’t sound absurd. It sounds like the algorithm in the bonding code is waiting for a certain amount of data, or for a timeout, before sending. A single ICMP packet is not enough data to trigger an immediate send. When there is a high volume of traffic on the link, the ICMP traffic is carried along more quickly because other traffic is filling the buffers, so the bonding algorithm never has to wait for a timeout to send. Try adjusting the size of your ICMP packets and watch the effects.
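The theory can be illustrated with a toy model of a transmit buffer that is flushed either when it holds enough bytes or when its oldest frame has waited for a timeout. The threshold, timeout and frame sizes below are hypothetical; nothing here describes how RouterOS bonding or NV2 actually schedules frames.

```python
def flush_delays(arrival_times_ms, flush_bytes, timeout_ms, frame_bytes):
    """Queueing delay of each frame in a hypothetical buffer that is
    flushed when it holds flush_bytes, or when its oldest frame has
    waited timeout_ms. All frames are frame_bytes long and
    arrival_times_ms must be sorted ascending.
    """
    delays = []
    pending = []  # arrival times of frames currently buffered
    for t in arrival_times_ms:
        # Timeout flush: the oldest buffered frame expired before t.
        if pending and t >= pending[0] + timeout_ms:
            flush_at = pending[0] + timeout_ms
            delays.extend(flush_at - a for a in pending)
            pending = []
        pending.append(t)
        # Size flush: the buffer is full, so send right away.
        if len(pending) * frame_bytes >= flush_bytes:
            delays.extend(t - a for a in pending)
            pending = []
    # Whatever is left goes out when its timeout expires.
    if pending:
        flush_at = pending[0] + timeout_ms
        delays.extend(flush_at - a for a in pending)
    return delays


if __name__ == "__main__":
    # Lone small pings, one per second: each waits for the timeout.
    print(flush_delays([0, 1000, 2000], flush_bytes=3000, timeout_ms=20, frame_bytes=100))
    # Busy link, full-size frames every 1 ms: size flushes dominate.
    print(flush_delays(list(range(6)), flush_bytes=3000, timeout_ms=20, frame_bytes=1500))
```

In this model a lone small ping always pays the full timeout, while under load the buffer fills and flushes almost immediately. Raising `frame_bytes` for the lone pings to the threshold also removes the wait, which is what changing the ICMP size would probe.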

I also see this sometimes when NV2 has almost no traffic: CCQ drops to 2-3 and ping times get higher. You see the same behavior after reconnecting the link (really low CCQ until traffic is present on the interface).

This is not “scientifically” tested, just an input to test (check CCQ).

@JJCinAZ your theory sounds good, but why doesn’t this behaviour appear when I do classic 802.3ad bonding between a MikroTik board and Cisco switches? The difference is that here I use the round-robin mode, and that could be the cause. But I don’t think so.
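One possibly relevant difference: with 802.3ad, the outgoing link is chosen per flow by a transmit hash (RouterOS exposes this as `transmit-hash-policy`), so all frames between the same two hosts stay on one link and keep their order. The sketch below is a simplified layer-2 hash, loosely modelled on XOR-ing the last MAC octets modulo the slave count; it is not the exact formula any particular driver uses, and the MAC addresses are hypothetical.

```python
def layer2_hash_slave(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    """Pick a slave index from the MAC pair (simplified layer-2 hash)."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_slaves


def distribute(frames, n_slaves):
    """Slave index chosen for each (src_mac, dst_mac) frame."""
    return [layer2_hash_slave(src, dst, n_slaves) for src, dst in frames]


if __name__ == "__main__":
    # Hypothetical locally administered MACs for three test hosts.
    a, b, c = "02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:03"
    frames = [(a, b)] * 4 + [(a, c)] * 2
    print(distribute(frames, n_slaves=3))
```

Every frame of the A-to-B flow lands on the same slave, so a ping session never depends on the state of the other two links, unlike the rotation used by round-robin.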

@samsung172 In fact all the PtP wireless links use NV2, so your theory seems a little more plausible. Does anyone know why NV2 behaves this way? Imagine a single SIP session passing over a PtP NV2 wireless link… with this jitter, the quality would be horrible, wouldn’t it?
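To put a number on the SIP concern: RTP receivers estimate jitter with the smoothed interarrival formula from RFC 3550 (section 6.4.1), J += (|D| - J) / 16, where D is the change in transit time between consecutive packets. A minimal sketch, assuming per-packet transit times in ms are available; ping round-trip times are only a rough stand-in for one-way transit.

```python
def rfc3550_jitter(transit_ms):
    """Running interarrival jitter estimate after each packet pair,
    using J += (|D| - J) / 16 from RFC 3550, section 6.4.1.
    """
    jitter = 0.0
    history = []
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        d = abs(cur - prev)  # change in transit time between packets
        jitter += (d - jitter) / 16.0
        history.append(jitter)
    return history


if __name__ == "__main__":
    # Steady transit times produce zero jitter.
    print(rfc3550_jitter([20, 20, 20]))
    # A single 10 ms swing moves the estimate by 10 / 16.
    print(rfc3550_jitter([20, 30]))
```

The 1/16 gain smooths out single spikes, so RTP jitter statistics reflect sustained variation like the 20 to 30 ms swings described above rather than one-off outliers.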

Any suggestions from tik staff?

Best Regards,

Well, same theory, but the buffering may be occurring in the NV2 code, where it does packet aggregation. I’m not in front of a router now, but you could try disabling that for a test.