astounding
Member Candidate
Topic Author
Posts: 121
Joined: Tue Dec 16, 2008 12:17 am

Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 3:51 am

I set up a test network consisting of 4 RB750GL boxes. All switching and bridging was removed so ethernet ports were NOT connected to other ethernet ports internally (by bridge or switch). Then I made a ring using the four boxes, A, B, C, and D, connecting ports 1 and 5 on each as follows:
...[D]e5<=>e1[A]e5<=>e1[B]e5<=>e1[C]e5<=>e1[D]e5<=>e1[A]...
Then I set up IP networks on each link:
10.20.30.0/30 between A eth5 (.1) and B eth1 (.2)
10.20.30.4/30 between B eth5 (.5) and C eth1 (.6)
10.20.30.8/30 between C eth5 (.9) and D eth1 (.10)
10.20.30.12/30 between D eth5 (.13) and A eth1 (.14)
Additionally, a non-connected loopback bridge was created on each--by nonconnected I mean that in the "/interface bridge ports" section, there were ZERO entries--thus the bridge did not actually bridge anything. And a /32 was assigned to each device's loopback bridge:
10.20.30.64/32 to A
10.20.30.65/32 to B
10.20.30.66/32 to C
10.20.30.67/32 to D
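The addressing plan above can be sanity-checked mechanically before it is typed into four boxes. The following is a minimal sketch using Python's standard `ipaddress` module; the dictionary layout and the `check_plan` helper are illustrative assumptions, only the addresses come from the post.

```python
import ipaddress

# Link and loopback plan copied from the post; the data layout is illustrative.
LINKS = {
    "10.20.30.0/30": {"A": "10.20.30.1", "B": "10.20.30.2"},
    "10.20.30.4/30": {"B": "10.20.30.5", "C": "10.20.30.6"},
    "10.20.30.8/30": {"C": "10.20.30.9", "D": "10.20.30.10"},
    "10.20.30.12/30": {"D": "10.20.30.13", "A": "10.20.30.14"},
}
LOOPBACKS = {
    "A": "10.20.30.64",
    "B": "10.20.30.65",
    "C": "10.20.30.66",
    "D": "10.20.30.67",
}


def check_plan(links, loopbacks):
    """Return a list of problems found in the plan (empty list if none)."""
    problems = []
    nets = [ipaddress.ip_network(n) for n in links]

    # No two link networks may overlap.
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")

    # Each /30 must use exactly its two usable host addresses.
    for net_text, ends in links.items():
        net = ipaddress.ip_network(net_text)
        usable = set(net.hosts())
        assigned = {ipaddress.ip_address(ip) for ip in ends.values()}
        if assigned != usable:
            problems.append(f"{net}: endpoints do not match usable hosts")

    # Loopbacks must be unique and must not fall inside a link network.
    if len(set(loopbacks.values())) != len(loopbacks):
        problems.append("duplicate loopback address")
    for box, ip in loopbacks.items():
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in nets):
            problems.append(f"loopback of {box} falls inside a link network")

    return problems


if __name__ == "__main__":
    print(check_plan(LINKS, LOOPBACKS) or "plan looks consistent")
```

A check like this only rules out addressing mistakes; it says nothing about OSPF state on the routers themselves.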
Next, OSPF was set up on each device as follows. The config below is from box D; for each of the other boxes, change the router-id to that box's loopback address and change the two /30 networks listed in the "/routing ospf network" section to the two /30 networks that box uses to connect to its two neighbors:
/routing ospf instance
set default disabled=no distribute-default=never in-filter=ospf-in metric-bgp=auto metric-connected=20 \
    metric-default=1 metric-other-ospf=auto metric-rip=20 metric-static=20 name=default out-filter=ospf-out \
    redistribute-bgp=no redistribute-connected=as-type-1 redistribute-other-ospf=no redistribute-rip=no \
    redistribute-static=no router-id=10.20.30.67
/routing ospf area
set backbone area-id=0.0.0.0 disabled=no instance=default name=backbone type=default
/routing ospf interface
add authentication=md5 authentication-key=fooBARbaz authentication-key-id=1 cost=10 dead-interval=8s disabled=no \
    hello-interval=2s instance-id=0 interface=ether1 network-type=point-to-point passive=no priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
add authentication=md5 authentication-key=fooBARbaz authentication-key-id=1 cost=10 dead-interval=8s disabled=no \
    hello-interval=2s instance-id=0 interface=ether5 network-type=point-to-point passive=no priority=1 \
    retransmit-interval=5s transmit-delay=1s use-bfd=no
/routing ospf network
add area=backbone disabled=no network=10.20.30.12/30
add area=backbone disabled=no network=10.20.30.8/30
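The per-box substitutions described above (router-id and the two /30 networks) can be generated rather than hand-edited. This is a rough sketch, not a tested deployment tool: the `ospf_overrides` helper and the data layout are assumptions made for illustration, and the emitted lines reuse only the parameters shown in the box D config.

```python
import ipaddress

# Link networks keyed by the pair of boxes they connect (from the post).
LINK_NETS = {
    ("A", "B"): "10.20.30.0/30",
    ("B", "C"): "10.20.30.4/30",
    ("C", "D"): "10.20.30.8/30",
    ("D", "A"): "10.20.30.12/30",
}
LOOPBACKS = {
    "A": "10.20.30.64",
    "B": "10.20.30.65",
    "C": "10.20.30.66",
    "D": "10.20.30.67",
}


def ospf_overrides(box):
    """Return the RouterOS lines that differ per box: router-id and networks."""
    # Pick the two link networks this box participates in.
    nets = [net for pair, net in LINK_NETS.items() if box in pair]
    # Sort numerically so the output is deterministic.
    nets.sort(key=ipaddress.ip_network)

    lines = [f"/routing ospf instance set default router-id={LOOPBACKS[box]}"]
    for net in nets:
        lines.append(
            f"/routing ospf network add area=backbone disabled=no network={net}"
        )
    return lines


if __name__ == "__main__":
    for box in sorted(LOOPBACKS):
        print(f"# box {box}")
        print("\n".join(ospf_overrides(box)))
```

For box D this yields the router-id 10.20.30.67 and the networks 10.20.30.8/30 and 10.20.30.12/30, matching the config shown above (the order of the network entries differs, which should not matter).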
Everything worked exactly as expected with one exception: Box D and A would NOT form an OSPF neighbor relationship. No matter what I tried (removing IP addresses and adding them again, removing OSPF configuration items and adding them back, rebooting devices, changing to alternate ethernet ports), NOTHING would work. Inevitably, box A would show state "Init" for the A-to-D link in the "/routing ospf neighbor" section, but D would show nothing, no entries at all.

I made a snapshot of the configuration and then manually reset box D to factory settings. Then using the exported configuration, I configured box D again EXACTLY HOW IT WAS CONFIGURED THE FIRST TIME. Let me emphasize that. It was an EXACT DUPLICATION of the original NON-WORKING configuration.

But it worked. Resetting to factory worked. Rebooting had not worked.
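One way to back up the "exact duplication" claim is to diff the export taken before the reset against an export taken afterwards. The sketch below uses Python's standard `difflib`; the helper names are hypothetical, and it assumes that export files may contain `#` comment lines (such as a timestamp header) and backslash line continuations like the config above, both of which it normalizes away.

```python
import difflib


def normalize_export(text):
    """Join backslash-continued lines, drop comments, collapse whitespace."""
    result = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation: hold this piece and glue the next line onto it.
            pending += line[:-1].rstrip() + " "
            continue
        line = pending + line
        pending = ""
        if not line or line.startswith("#"):
            continue
        result.append(" ".join(line.split()))
    if pending.strip():
        result.append(pending.strip())
    return result


def diff_exports(before, after):
    """Return only the added/removed config lines between two exports."""
    delta = difflib.unified_diff(
        normalize_export(before),
        normalize_export(after),
        "before",
        "after",
        lineterm="",
        n=0,
    )
    return [
        line
        for line in delta
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    old = "/routing ospf network\nadd area=backbone network=10.20.30.12/30\n"
    new = "/routing ospf network\nadd area=backbone network=10.20.30.8/30\n"
    for line in diff_exports(old, new):
        print(line)
```

An empty diff shows only that the visible configuration matches. It cannot reveal state that "print" and "export" do not show, which is the question raised here.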

So.....

WHY???

What hidden configuration item is there that the RouterOS command line cannot see (using "print" and "export" commands) that somehow changed?

What OSPF state exists hidden from administrative control that BREAKS THINGS even when the configuration is 100% correct and 100% accurate and SHOULD WORK???

This is a huge problem when devices are deployed in the field and a tested, working configuration is created and tested in the lab, but fails upon deploying to the remote devices. Then the non-working devices require a truck to roll and a visit to manually reset them on premises. Then the tested configuration starts working. Sorry, but this is unacceptable.

Can I reproduce this? Who knows. What caused this bizarreness anyway? I have no idea. I can't reproduce it. If it happens again, I certainly will generate a support file BEFORE resetting to factory.

Sadly, this isn't the first time RouterOS has behaved in a quirky manner for me. I just wish I'd documented the oddness in times past and created support files. The reputation RouterOS has among my coworkers for hard-to-reproduce, seemingly random misbehavior has them always telling me to give up and go Cisco. (But I like these Routerboard boxes... I just want them more consistently reliable.)

Any ideas? Any suggestions beyond capturing support files and reporting bugs should I run into this in the future?

Puzzled and frustrated,
Aaron out.
 
astounding
Member Candidate
Topic Author
Posts: 121
Joined: Tue Dec 16, 2008 12:17 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 3:54 am

I must express appreciation to MikroTik for making MTUs on OSPF interfaces get ignored by default now. At least it seems that way, as during my testing, at times I had MTU mismatches yet OSPF worked just fine (the aforementioned problem excepted).

Thanks, MikroTik! (Next test of MTU ignoring will be RouterOS <=> Cisco...)

Aaron out.
 
astounding
Member Candidate
Topic Author
Posts: 121
Joined: Tue Dec 16, 2008 12:17 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 3:58 am

I should have mentioned all four boxes are running 5.5.

Aaron out.
 
fewi
Forum Guru
Posts: 7717
Joined: Tue Aug 11, 2009 3:19 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 4:20 am

If it ignores MTU mismatches that's horrible, even more so if it's quietly (at least seeing that turned on as a "feature" is a nice big red flag). If you have mismatched MTU values between OSPF peers sooner or later stuff will break. If you're unlucky it's later, and you're stuck troubleshooting something perplexing.
 
astounding
Member Candidate
Topic Author
Posts: 121
Joined: Tue Dec 16, 2008 12:17 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 4:56 am

fewi wrote: If it ignores MTU mismatches that's horrible, even more so if it's quietly (at least seeing that turned on as a "feature" is a nice big red flag).
You've got a point there. If it silently ignores it without a specific setting specifying that I want it ignored, that is bad.
fewi wrote: If you have mismatched MTU values between OSPF peers sooner or later stuff will break.
I've got several places where I deliberately have MTU mismatches. They're all link networks with no endpoint hosts. And these midpoint networks are running jumbo-sized MTUs (>1500 bytes). End-point hosts that communicate across it are all on networks with ordinary 1500 byte MTUs. Having larger MTUs, some of which are mismatched (deliberately), in the pathway has caused me absolutely zero problems.

WARNING: I don't recommend it, however, because, yes, MTU mismatches can cause some real head scratching when one runs into issues. (Or in other words, in general I agree with you, but in specific cases, it can be done safely if you're very careful and know what you're doing.)

Having the option to ignore MTU mismatches for OSPF does come in handy when one is readjusting MTUs across one's network and one doesn't want OSPF sessions to die while the work is going on.

Aaron out.
 
fewi
Forum Guru
Posts: 7717
Joined: Tue Aug 11, 2009 3:19 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 5:18 am

By the sounds of it that's a different case though. I'm not advocating the same MTU be used across the entire path. OSPF is supposed to - as per the RFC anyways - throw a warning only when two peers have different MTUs to each other (so if two sides of a p2p link have different MTUs, or when different links on the same broadcast network have different MTUs). Routers with interfaces in networks with different MTUs are not an issue, and shouldn't be.

If you have link networks that only routers are on that's fine when both routers on that link have the same MTU. You have an issue when two routers, e.g. 1.1.1.1/30 and 1.1.1.2/30, connect to each other and the MTUs mismatch and this doesn't cause the adjacency to fail. At some point you may run into an LSU or DBD packet being bigger than the MTU the peer can support, so the peer discards it, and the databases become unsynchronized (and then the adjacency fails due to missed ACKs).

This is particularly problematic because in a lab or small network the packets are probably rather small. Then the network grows over the years (or it gets put into production from the lab) and suddenly something that worked initially fails, which can be rather surprising when several years have passed.
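The size argument above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a simplified model, not a description of how RouterOS builds packets: it assumes a 20-byte IPv4 header without options, the 24-byte OSPFv2 header, the 8 fixed bytes of a Database Description (DBD) packet and 20 bytes per LSA header, and it ignores the MD5 authentication trailer and any IP fragmentation. The function names are hypothetical.

```python
IP_HEADER = 20    # IPv4 header without options
OSPF_HEADER = 24  # OSPFv2 common packet header
DBD_FIXED = 8     # interface MTU, options, flags, DD sequence number
LSA_HEADER = 20   # one LSA header per database entry described

OVERHEAD = IP_HEADER + OSPF_HEADER + DBD_FIXED


def max_lsa_headers(mtu):
    """How many LSA headers fit into one DBD packet at the given MTU."""
    return (mtu - OVERHEAD) // LSA_HEADER


def dbd_fits(lsa_count, sender_mtu, receiver_mtu):
    """True if the largest DBD the sender would build fits the receiver's MTU."""
    # The sender fills a packet up to its own MTU, or with all LSAs if fewer.
    in_packet = min(lsa_count, max_lsa_headers(sender_mtu))
    size = OVERHEAD + in_packet * LSA_HEADER
    return size <= receiver_mtu


if __name__ == "__main__":
    # Jumbo sender (9000) talking to a standard receiver (1500).
    for lsas in (10, 72, 73, 400):
        print(lsas, "LSAs ->", "fits" if dbd_fits(lsas, 9000, 1500) else "too big")
```

Under these assumptions a 1500-byte receiver can take a DBD describing at most 72 LSAs, so a lab-sized database fits comfortably and the mismatch only starts to bite once the link-state database grows past that point.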
 
astounding
Member Candidate
Topic Author
Posts: 121
Joined: Tue Dec 16, 2008 12:17 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 3:08 pm

I'm sorry I even brought up the MTU thing, as I was hoping this thread would be about the OSPF issue. Oops. It's a non-issue for me. But since I did mention it, I appreciate the reasons MTU mismatches should be avoided being mentioned, in case a future reader of the thread thinks otherwise.

Aaron out.
 
fewi
Forum Guru
Posts: 7717
Joined: Tue Aug 11, 2009 3:19 am

Re: Riddle Me This, OSPF (buggy?)

Thu Jul 21, 2011 4:20 pm

Yeah, sorry I was hijacking. My apologies.

Unfortunately I have not much to add to the original topic. It shouldn't have happened, but without being able to reproduce it I doubt they'll give a bug report much credence.
