ibutton77
just joined
Topic Author
Posts: 7
Joined: Sat Sep 20, 2008 1:31 am
Location: Bend, OR, USA
Contact:

OSPF ECMP changes from ROS6 -> ROS7?

Wed May 10, 2023 9:24 pm

I've had a network that uses ECMP to load balance traffic along multiple OSPF paths where the cumulative cost adds up to the same value.
As a simplified example, if we have the following 3 sites with MT routers all running ROS6, then traffic from A->C and C->A gets load balanced as we desire over both paths:
Where A<->C is set up with a cost of 20
A<->B cost = 10
B<->C cost = 10
A-----C
 \   /
  \ /
   B
But as of upgrading A (the head end in this example) to ROS7, A->C traffic instead refuses to ECMP by somehow adding a "tie-breaking" step of "number of hops".
In this example, all A->C traffic takes the A->C route.
If A<->C cost increases to 21, then all A->C traffic flows through B.
If A<->C cost is kept at 20, and either A<->B OR B<->C cost is dropped by one to 9, then again all A->C traffic flows through B.
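The cost arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the equal-cost behavior expected from ROS6, not RouterOS code; the function names and the LINKS table are made up for the sketch, and only the link costs come from the example above.

```python
# Link costs from the example above (links are symmetric).
LINKS = {("A", "C"): 20, ("A", "B"): 10, ("B", "C"): 10}


def neighbors(node, links):
    """Yield (neighbor, cost) pairs for an undirected link table."""
    for (u, v), cost in links.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def simple_paths(src, dst, links, path=None, cost=0):
    """Yield (path, total_cost) for every loop-free path from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield path, cost
        return
    for nxt, link_cost in neighbors(src, links):
        if nxt not in path:
            yield from simple_paths(nxt, dst, links, path, cost + link_cost)


def equal_cost_paths(src, dst, links):
    """Return the lowest total cost and every path that achieves it."""
    candidates = list(simple_paths(src, dst, links))
    best = min(cost for _, cost in candidates)
    return best, sorted(p for p, cost in candidates if cost == best)


print(equal_cost_paths("A", "C", LINKS))
# (20, [['A', 'B', 'C'], ['A', 'C']])
```

With A<->C raised to 21, or either leg through B lowered to 9, only the path through B remains in the result, which matches the behavior described above.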

We have so far worked around this by creating a new VLAN through the A->B->C path so that that path *looks* like a single hop, and then ECMP begins to load balance again.
But we have a large network with dozens of site/routers and this quickly becomes a maintenance nightmare.

So what has changed in ROS6 -> ROS7 that's interfered with ECMP working in a rational way?
 
SignumFera
just joined
Posts: 10
Joined: Tue May 23, 2023 2:41 pm

Re: OSPF ECMP changes from ROS6 -> ROS7?

Mon May 29, 2023 9:47 am

removed unnecessary post
Hi

This bit me a while back. It seems like MT updated the default redistribute metric in ROS7.

Consider the LSAs below, as seen on the receiving end of an ECMP setup. Note that LSA 4 is from a ROS6 device and LSA 5 is from a ROS7 device.
[admin@MT7-1] > routing/ospf/lsa/print where id=100.68.0.0
Flags: S - self-originated, F - flushing, W - wraparound; D - dynamic 
 4  D instance=default type="external" originator=100.66.0.2 id=100.68.0.0 
      sequence=0x80000003 age=190 checksum=0x32B9 body=
        options=E
        netmask=255.255.255.0
        forwarding-address=0.0.0.0
        metric=20 type-1
        route-tag=0

 5  D instance=default type="external" originator=100.67.2.5 id=100.68.0.0 
      sequence=0x80000001 age=627 checksum=0x4FAB body=
        options=E
        netmask=255.255.255.0
        forwarding-address=0.0.0.0
        metric=1 type-1
        route-tag=0
[admin@MT7-1] > /ip/route/print where dst-address=100.68.0.0/24
Flags: D - DYNAMIC; A - ACTIVE; o, y - COPY
Columns: DST-ADDRESS, GATEWAY, DISTANCE
    DST-ADDRESS    GATEWAY            DISTANCE
DAo 100.68.0.0/24  100.67.1.3%ether1       110     
Dropping the redistribute metric on ROS6 side (because it is easier than going through a filter on ROS7 to get the defaults to sane values) to match ROS7 values, results in ECMP working again
[admin@MT6-1] > /routing ospf instance set default metric-bgp=1
 ...
[admin@MT7-1] > routing/ospf/lsa/print where id=100.68.0.0
Flags: S - self-originated, F - flushing, W - wraparound; D - dynamic 
 4  D instance=default type="external" originator=100.66.0.2 id=100.68.0.0 
      sequence=0x80000004 age=19 checksum=0x718C body=
        options=E
        netmask=255.255.255.0
        forwarding-address=0.0.0.0
        metric=1 type-1
        route-tag=0

 5  D instance=default type="external" originator=100.67.2.5 id=100.68.0.0 
      sequence=0x80000001 age=929 checksum=0x4FAB body=
        options=E
        netmask=255.255.255.0
        forwarding-address=0.0.0.0
        metric=1 type-1
        route-tag=0
[admin@MT7-1] > /ip/route/print where dst-address=100.68.0.0/24
Flags: D - DYNAMIC; A - ACTIVE; o, y - COPY; + - ECMP
Columns: DST-ADDRESS, GATEWAY, DISTANCE
     DST-ADDRESS    GATEWAY            DISTANCE
DAo+ 100.68.0.0/24  100.67.1.3%ether1       110
DAo+ 100.68.0.0/24  100.67.1.2%ether1       110
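Why the two metrics matter can be sketched as follows. For a type-1 external route, the total cost is the internal OSPF cost to the originating router plus the advertised external metric, so two originators only tie when both sums are equal. The sketch below is an illustration, not RouterOS code: the internal costs of 10 are an assumption (the outputs above do not show them), and the function names are made up.

```python
# Assumed internal OSPF cost from MT7-1 to each originator (hypothetical values;
# the outputs above do not show them, they are simply taken to be equal).
COST_TO_ORIGINATOR = {"100.66.0.2": 10, "100.67.2.5": 10}


def e1_total_cost(cost_to_originator, external_metric):
    """Type-1 external cost: internal cost to the originator plus the advertised metric."""
    return cost_to_originator + external_metric


def best_originators(lsa_metrics, cost_to_originator):
    """Return the originators whose type-1 total cost is the lowest."""
    totals = {
        originator: e1_total_cost(cost_to_originator[originator], metric)
        for originator, metric in lsa_metrics.items()
    }
    best = min(totals.values())
    return sorted(o for o, total in totals.items() if total == best)


# Before the change: ROS6 advertises metric 20, ROS7 advertises metric 1.
print(best_originators({"100.66.0.2": 20, "100.67.2.5": 1}, COST_TO_ORIGINATOR))
# ['100.67.2.5']

# After setting metric-bgp=1 on the ROS6 side, both totals tie.
print(best_originators({"100.66.0.2": 1, "100.67.2.5": 1}, COST_TO_ORIGINATOR))
# ['100.66.0.2', '100.67.2.5']
```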
Hope this helps you.

Kind regards
Last edited by BartoszP on Mon May 29, 2023 11:16 am, edited 1 time in total.
Reason: removed excessive quoting of preceding post; be wise, quote smart, save network traffic
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?

Tue May 30, 2023 5:22 am

hello ibutton77

from your diagram, i don't see anything for the ospf to have ecmp? am I missing something? 🤔

let us say, all links are ether with default cost of 10.

a - c = 10
a - b = 10
a - b - c = 20

i think no matter how you reduce the a - b or b - c, it won't make it apple to apple. and that would be misleading.

that won't be a problem for ospf to have ecmp - if your layout was a - b - c - d. with all 10.

maybe, that is maybe - two bridged tunnels on b, from ab and bc - could give you ac ecmp. i think that is what you did, hence you were introduced to your next problem: how about if there are many sites to config?
 
ibutton77
just joined
Topic Author
Posts: 7
Joined: Sat Sep 20, 2008 1:31 am
Location: Bend, OR, USA
Contact:

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 5:52 am

from your diagram, i don't see anything for the ospf to have ecmp? am I missing something? 🤔

let us say, all links are ether with default cost of 10.
Or, we could say that A<>B is 10, B<>C is 10, and A<>C is 20 like I did in my initial post. 😁

To re-iterate, ROS6 forms ECMP just fine in the above example.
And in all other examples where different paths with potentially different number of hops still add up to the same total costs.

I have been able to reproduce this new ROS7 behavior on the test bench with 4 different kinds of hardware, on every version of ROS7 up to the latest @ 7.9.1, and whether every router in the test is running ROS7 or whether it is a mixture of 6 & 7.
The ROS7 routers refuse to ECMP two paths with different hop numbers regardless of cost settings.

maybe, that is maybe - two bridged tunnels on b, from ab and bc - could give you ac ecmp. i think that is what you did, hence you were introduced to your next problem: how about if there are many sites to config?
Yes that is exactly what I've had to do to work around the problem so far, given that the offending router is a hardware upgrade that can't do less than ROS7.

And we do have many sites to config, and I do want to avoid creating a spaghetti mess of tunnels just to try to stay ahead of the problem. 😂

I'd rather get the "ECMP is impossible unless number of hops is identical" bug out of the way instead, and I'm digging my heels against any more ROS7 deployments until that gets resolved.

🤔 Though if any other folks have ideas for more maintainable workarounds, I am all ears. It doesn't have to work "just like it used to" as long as it can in fact work in some maintainable manner or another.

For example: We've got an ongoing project to try to vet all of our delivery paths for >1504 L2MTU, then maybe MPLS-TE could play a part.

Someone I was chatting with thought that maybe iBGP could work as a stop-gap. Cue thousand-yard stare 😧
 
ibutton77
just joined
Topic Author
Posts: 7
Joined: Sat Sep 20, 2008 1:31 am
Location: Bend, OR, USA
Contact:

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 6:01 am

Dropping the redistribute metric on ROS6 side (because it is easier than going through a filter on ROS7 to get the defaults to sane values) to match ROS7 values, results in ECMP working again
Thank you SignumFera, though I'm not sure I follow what you're describing or what solution you're suggesting?

1. I've never had to deal with redistribute metrics before, likely just one of the values it has been previously safe to leave untouched
2. The ECMP error I am describing does happen if all routers involved run ROS7 as well. Can't change the metric of a ROS6 router in a setting where there aren't any ROS6 routers right? πŸ˜…

Unfortunately I lack a ton of experience troubleshooting OSPF (or SIP as another fine example) from its log output, so I would definitely benefit from a more detailed breakdown of what you are showing from your example. 😯
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 9:18 am

@ ibutton77
then maybe MPLS-TE could play a part.
congratulations.... you have entered your next step in the service provider track 👍🏻😂

for the ecmp path,
1. you could read some articles about eigrp for unequal ecmp.

2. for the time being (ospf way), either you can go with mpls te or you could use bandwidth rsvp thing ---> the last one, i'm sure that that was the cause for you to do ecmp in the first place 🤔😂

- redistribution? from where? static?

3. doing the hard *vlan tunnel* way still could be feasible option - as long as probably you don't need full meshed setup. I'm imagining about 100 sites 😂 ---> of course, you should use that dude nms to configure multiple sites in one single place 👍🏻

have a good day 👍🏻😉
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 9:56 am

@ ibutton77

aaa... yes. please wait. i think @signumfera has the point of using filter to set the metric.

but i don't have time to lab it up.

if you can do some search on ospf filter parameters - just try to grasp this idea :

1. search the inbound route to c, from b's advertisement.
2. check whether you can modify its metric?
3. if you can modify it and apply that as an inbound ospf filter, then you should have ecmp.

just a thought 🤔
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?  [SOLVED]

Fri Jun 02, 2023 3:58 pm

@ ibutton77

well, i dont know whether this will work for your setup, but looks like nice workaround? on v7.6
[admin@a] > ip route/print
Flags: D - DYNAMIC; A - ACTIVE; c, s, o, y - COPY; + - ECMP
Columns: DST-ADDRESS, GATEWAY, DISTANCE
#      DST-ADDRESS    GATEWAY        DISTANCE
  DAc  10.0.11.0/30   vab                   0
  DAc  10.0.12.0/30   vac                   0
  DAo+ 10.0.13.0/30   10.0.12.2%vac       110
  DAo+ 10.0.13.0/30   10.0.11.2%vab       110
  DAc  172.16.1.1/32  lo0                   0
  DAo  172.16.1.2/32  10.0.11.2%vab       110
  DAo+ 172.16.1.3/32  10.0.12.2%vac       110
0  As+ 172.16.1.3/32  10.0.11.2           110
a, lo0 1.1/32
b, lo0 1.2/32
c, lo0 1.3/32

the original route from 1.1 to 1.3 was via iface vac. then i injected a static route from a to c via b (the As+ entry above), using the same distance of 110.

ping results
[admin@a] > ping count=3 172.16.1.3 src-address=172.16.1.1
  SEQ HOST                                     SIZE TTL TIME       STATUS                
    0 172.16.1.3                                 56  64 17ms770us 
    1 172.16.1.3                                 56  64 2ms812us  
    2 172.16.1.3                                 56  64 6ms269us  
    sent=3 received=3 packet-loss=0% min-rtt=2ms812us avg-rtt=8ms950us 
   max-rtt=17ms770us 
traceroute from a to c with different interface
[admin@a] > /tool/traceroute 172.16.1.3 src-address=172.16.1.1 interface=vab
Columns: ADDRESS, LOSS, SENT, LAST, AVG, BEST, WORST, STD-DEV
#  ADDRESS     LOSS  SENT  LAST    AVG  BEST  WORST  STD-DEV
1  10.0.11.2   0%       5  3.3ms   3.6  2.3   4.9    0.9    
2  172.16.1.3  0%       5  11.7ms  6.2  2     11.7   3.8    
but i think those 2 tests don't really demonstrate ecmp on their own.

note that while doing this lab for you, the routers crashed when i injected that static route. i dont know why - but maybe you could lab it first. 3 routers abc only.


hope this helps.
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO
Contact:

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 4:00 pm

This is a KNOWN and reproduced bug in RouterOS 7.9.1 (min), and they (MikroTik) know about it and have confirmed it.

Using OSPF costs only, there is NO ECMP across v7.9.1 routers when the cost is the same on a path. :(

They could not provide an ETA to fix this.
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 4:11 pm

@gmsmstr

aaa... thank you for the confirmation.

i dont know, changing those interface path costs made some of the ospf routes become inactive. is that a bug as well?
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO
Contact:

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 4:23 pm

Nothing is inactive, it just chooses one route to install vs ECMPing them. Even though the OSPF metric is the same.
 
ibutton77
just joined
Topic Author
Posts: 7
Joined: Sat Sep 20, 2008 1:31 am
Location: Bend, OR, USA
Contact:

Re: OSPF ECMP changes from ROS6 -> ROS7?

Fri Jun 02, 2023 8:50 pm

Using OSPF costs only, there is NO ECMP across v7.9.1 routers when the cost is the same on a path. :(
To add clarity though, per my testing ROS7 will form ECMP iff two paths have the same number of hops.

When they have different numbers of hops (but the same total OSPF cost), that's when ROS7 seems to cause problems using the hop number as a counter-productive tie-breaker.

This behavior differs from ROS5 & ROS6 which would ECMP paths with the same total cost regardless of the number of hops. That behavior was actually useful in practice, whereas breaking a same-cost tie would be far simpler to do by adjusting a cost somewhere if that's what an admin actually desired to happen.
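The difference between the two behaviors can be modeled roughly as below. This is a sketch of what I observe from the outside, not MikroTik's actual route selection code; the hop-count tie-break is inferred from testing, and the function name and the hop_tie_break flag are made up for the illustration.

```python
# Candidate A->C paths from the 3-site example, as (path, total OSPF cost).
PATHS = [(["A", "C"], 20), (["A", "B", "C"], 20)]


def select_paths(paths, hop_tie_break):
    """Return the paths that would be installed for one destination.

    hop_tie_break=False models the ROS5/ROS6 behavior: all lowest-cost paths.
    hop_tie_break=True models the behavior observed on ROS7 (an inference from
    testing): among lowest-cost paths, keep only those with the fewest hops.
    """
    best_cost = min(cost for _, cost in paths)
    best = [path for path, cost in paths if cost == best_cost]
    if hop_tie_break:
        fewest_hops = min(len(path) - 1 for path in best)
        best = [path for path in best if len(path) - 1 == fewest_hops]
    return best


print(select_paths(PATHS, hop_tie_break=False))
# [['A', 'C'], ['A', 'B', 'C']]

print(select_paths(PATHS, hop_tie_break=True))
# [['A', 'C']]
```

In this model the VLAN workaround succeeds because it makes A->B->C count as a single hop, so both paths survive the tie-break.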

For example my frustrating workaround of plumbing fresh vlans over longer paths to make them look like one hop is enough to allow ROS7 to play ball.. but the workaround is all of fragile, hell to maintain, significantly increased surface area for error, etc etc. ☹
 
wiseroute
Member
Posts: 352
Joined: Sun Feb 05, 2023 11:06 am

Re: OSPF ECMP changes from ROS6 -> ROS7?

Sat Jun 03, 2023 2:28 am

@ ibutton77

For example my frustrating workaround of plumbing fresh vlans over longer paths to make them look like one hop is enough to allow ROS7 to play ball.. but the workaround is all of fragile, hell to maintain, significantly increased surface area for error, etc etc. ☹
well, i think you don't need to plumb vlans across network now. see my last post - i have made a simple workaround for your problem. but you need to test it out in your lab first.

ecmp 👍🏻
