NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Very strange (pre-v7) ECMP bug?

Mon Nov 07, 2022 12:36 pm

I can't believe this has been around for as long as it has without anybody noticing it. I tested from the latest 6.49.x back to RouterOS 5.x, but it has probably existed for much longer.

In short: if the route used to reach one of the gateways of a recursively-looked-up ECMP route disappears, and that gateway is the LAST one in the ECMP route's gateway list, all of the routes that resolve recursively through that ECMP route go inactive.

To reproduce:
`
/interface bridge add name=bridge1
/interface bridge add name=bridge2
/interface bridge add name=bridge3

/ip address add address=192.168.1.1/24 interface=bridge1
/ip address add address=192.168.2.1/24 interface=bridge2
/ip address add address=192.168.3.1/24 interface=bridge3

/ip route add dst-address=192.168.0.1/32 gateway=192.168.1.2,192.168.2.2,192.168.3.2
/ip route add dst-address=192.168.100.1/32 gateway=192.168.0.1 target-scope=40
`
Now let's look at the routes, and see that everything is active and working:
`
[admin@MikroTik] > /ip route print detail 
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 A S  dst-address=192.168.0.1/32 gateway=192.168.1.2,192.168.2.2,192.168.3.2 
        gateway-status=192.168.1.2 reachable via  bridge1,192.168.2.2 reachable 
               via  bridge2,192.168.3.2 reachable via  bridge3 
        distance=1 scope=30 target-scope=10 

 1 ADC  dst-address=192.168.1.0/24 pref-src=192.168.1.1 gateway=bridge1 
        gateway-status=bridge1 reachable distance=0 scope=10 

 2 ADC  dst-address=192.168.2.0/24 pref-src=192.168.2.1 gateway=bridge2 
        gateway-status=bridge2 reachable distance=0 scope=10 

 3 ADC  dst-address=192.168.3.0/24 pref-src=192.168.3.1 gateway=bridge3 
        gateway-status=bridge3 reachable distance=0 scope=10 

 4 A S  dst-address=192.168.100.1/32 gateway=192.168.0.1 
        gateway-status=192.168.0.1 recursive via 192.168.1.2,192.168.2.2,
               192.168.3.2 bridge1,bridge2,bridge3 
        distance=1 scope=30 target-scope=40
`
Now let's try to disable the IP belonging to either bridge1 or bridge2, which will cause the corresponding connected route (192.168.1.0/24 or 192.168.2.0/24) to disappear. Or, heck: disable both of them. Everything will still be good, failing over exclusively through 192.168.3.2 via bridge3:
`
[admin@MikroTik] > /ip address disable [find interface~"bridge(1|2)"]
[admin@MikroTik] > /ip route print detail 
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 A S  dst-address=192.168.0.1/32 gateway=192.168.1.2,192.168.2.2,192.168.3.2 
        gateway-status=192.168.1.2 unreachable,192.168.2.2 unreachable,
               192.168.3.2 reachable via  bridge3 
        distance=1 scope=30 target-scope=10 

 1 ADC  dst-address=192.168.3.0/24 pref-src=192.168.3.1 gateway=bridge3 
        gateway-status=bridge3 reachable distance=0 scope=10 

 2 A S  dst-address=192.168.100.1/32 gateway=192.168.0.1 
        gateway-status=192.168.0.1 recursive via 192.168.3.2 bridge3 distance=1 
        scope=30 target-scope=40
`
But 192.168.3.2 is the last ECMP gateway in the list for reaching 192.168.0.1. If we disable the IP address on bridge3 and ONLY that IP address, the entry for 192.168.100.1/32 recursively via 192.168.0.1 goes inactive, even though the IPs on bridge1 and bridge2 are active and their connected routes exist!:
`
[admin@MikroTik] > /ip address enable [find]
[admin@MikroTik] > /ip address disable [find interface=bridge3]
[admin@MikroTik] > /ip route print detail 
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 A S  dst-address=192.168.0.1/32 gateway=192.168.1.2,192.168.2.2,192.168.3.2 
        gateway-status=192.168.1.2 reachable via  bridge1,192.168.2.2 reachable 
               via  bridge2,192.168.3.2 unreachable 
        distance=1 scope=30 target-scope=10 

 1 ADC  dst-address=192.168.1.0/24 pref-src=192.168.1.1 gateway=bridge1 
        gateway-status=bridge1 reachable distance=0 scope=10 

 2 ADC  dst-address=192.168.2.0/24 pref-src=192.168.2.1 gateway=bridge2 
        gateway-status=bridge2 reachable distance=0 scope=10 

 3   S  dst-address=192.168.100.1/32 gateway=192.168.0.1 
        gateway-status=192.168.0.1 recursive via 192.168.1.2,192.168.2.2 bridge1,
               bridge2 
        distance=1 scope=30 target-scope=40
`
?!?!?! It's still reachable via 192.168.1.2 and 192.168.2.2, so...why?

The same thing of course also happens if the underlying interface that the ECMP gateway is reachable over gets disabled or goes inactive, since that also causes the connected route to disappear.

Interestingly, this only happens if the underlying connected route is gone for the last ECMP gateway in the list. If the last gateway is merely deemed unavailable for some other reason, e.g. a failed check-gateway probe, then everything is fine!
`
[admin@MikroTik] > /ip address enable [find]
[admin@MikroTik] > /ip route add dst-address=192.168.101.3 gateway=192.168.3.2 check-gateway=ping 
[admin@MikroTik] > /ip route print detail 
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 0 A S  dst-address=192.168.0.1/32 gateway=192.168.1.2,192.168.2.2,192.168.3.2 
        gateway-status=192.168.1.2 reachable via  bridge1,192.168.2.2 reachable 
               via  bridge2,192.168.3.2 unreachable 
        distance=1 scope=30 target-scope=10 

 1 ADC  dst-address=192.168.1.0/24 pref-src=192.168.1.1 gateway=bridge1 
        gateway-status=bridge1 reachable distance=0 scope=10 

 2 ADC  dst-address=192.168.2.0/24 pref-src=192.168.2.1 gateway=bridge2 
        gateway-status=bridge2 reachable distance=0 scope=10 

 3 ADC  dst-address=192.168.3.0/24 pref-src=192.168.3.1 gateway=bridge3 
        gateway-status=bridge3 reachable distance=0 scope=10 

 4 A S  dst-address=192.168.100.1/32 gateway=192.168.0.1 
        gateway-status=192.168.0.1 recursive via 192.168.1.2,192.168.2.2 bridge1,
               bridge2 
        distance=1 scope=30 target-scope=40 

 5   S  dst-address=192.168.101.3/32 gateway=192.168.3.2 
        gateway-status=192.168.3.2 unreachable check-gateway=ping distance=1 
        scope=30 target-scope=10
`
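To pin down exactly what I'm seeing, here is a toy Python model of the symptoms. It is only a restatement of the three test cases above, not a claim about how RouterOS works internally; the function and parameter names are made up for illustration.

```python
from typing import Sequence


def expected_active(gateway_usable: Sequence[bool]) -> bool:
    """What one would expect: the recursive route stays active
    as long as at least one ECMP gateway is still usable."""
    return any(gateway_usable)


def observed_v6_active(connected_route_present: Sequence[bool],
                       check_gateway_ok: Sequence[bool]) -> bool:
    """Symptom seen on v5.x/v6.x in the tests above (hypothetical model):
    the recursive route additionally goes inactive whenever the connected
    route for the LAST gateway in the list is missing."""
    usable = [c and k for c, k in zip(connected_route_present, check_gateway_ok)]
    return any(usable) and connected_route_present[-1]


all_ok = [True, True, True]

# bridge1 + bridge2 addresses disabled: recursive route stays active
print(observed_v6_active([False, False, True], all_ok))
# only the bridge3 address disabled: recursive route goes inactive
print(observed_v6_active([True, True, False], all_ok))
# last gateway fails check-gateway, connected route still there: stays active
print(observed_v6_active(all_ok, [True, True, False]))
```

The second case is the bug: `expected_active([True, True, False])` is true, but the observed state is inactive.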
Fortunately, ECMP is handled completely differently in v7, so this bug appears to finally be gone, but...what the actual heck...

For the record, IPv6 ECMP in v6.x and earlier also has the exact same problem.
 
mrz
MikroTik Support
Posts: 7038
Joined: Wed Feb 07, 2007 12:45 pm
Location: Latvia

Re: Very strange (pre-v7) ECMP bug?

Mon Nov 07, 2022 12:47 pm

FYI IPv6 ECMP in ROSv6 does not exist. RouterOS was able to handle ECMP IPv6 routes, but actual forwarding can happen only over one gateway.
 
NathanA
Forum Veteran
Topic Author
Posts: 829
Joined: Tue Aug 03, 2004 9:01 am

Re: Very strange (pre-v7) ECMP bug?

Mon Nov 07, 2022 1:31 pm

mrz wrote: "FYI IPv6 ECMP in ROSv6 does not exist. RouterOS was able to handle ECMP IPv6 routes, but actual forwarding can happen only over one gateway."
You mean "packets with the same source address, destination address, source interface, routing mark and ToS are sent to the same gateway" (https://wiki.mikrotik.com/wiki/Manual:I ... MP)_routes), due to the route cache. But it is still possible for a packet to be forwarded to a different "ECMP" gateway in the list of gateways for that covering prefix if any one of those fields is different, yes?

So like LACP, the "hashing" does not create perfect load-balancing. But that still does not explain this bug...
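To illustrate what I mean by the route cache making flows "sticky", here is a rough Python sketch. It only assumes the behavior quoted from the manual (same five fields, same gateway). The round-robin pick on a cache miss, the class name, and the example addresses and interface are my own assumptions for illustration, not RouterOS's actual selection logic.

```python
from itertools import cycle
from typing import Dict, NamedTuple, Sequence


class FlowKey(NamedTuple):
    """The five fields the manual says pin a packet to one gateway."""
    src: str
    dst: str
    in_interface: str
    routing_mark: str
    tos: int


class EcmpCache:
    """Toy route cache: the first packet of a flow picks a gateway,
    later packets with an identical key reuse that cached choice."""

    def __init__(self, gateways: Sequence[str]) -> None:
        # Assumption: gateways are handed out round-robin on a cache miss.
        self._next_gateway = cycle(gateways)
        self._cache: Dict[FlowKey, str] = {}

    def gateway_for(self, key: FlowKey) -> str:
        if key not in self._cache:
            self._cache[key] = next(self._next_gateway)
        return self._cache[key]


cache = EcmpCache(["192.168.1.2", "192.168.2.2", "192.168.3.2"])
flow_a = FlowKey("10.0.0.1", "192.168.100.1", "ether1", "main", 0)
flow_b = flow_a._replace(tos=8)  # same flow except for ToS

print(cache.gateway_for(flow_a))  # first lookup picks a gateway
print(cache.gateway_for(flow_b))  # different key, may get another gateway
print(cache.gateway_for(flow_a))  # identical key, same gateway as before
```

So traffic does spread over the gateways, just per flow key rather than per packet.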

EDIT: I just re-read your post and realized you were talking only about IPv6; my mistake. My original post gave IPv4 examples for the bug in question, though.
