OSPF Unstable high peer count (6.45.1 - CCR1016/CCR1036)

Thu Jul 04, 2019 5:27 pm

Hi All,

I was testing OSPF (and MPLS/LDP) on the CCR1016, CCR1036, CCR1072.

The idea is:
The CCR1072 only has 7 ports, but I need 10G connectivity to all 80 of my POP areas. To do this, I have 2x layer 2 switches in the datacenter, trunked together with 100G ports. Each switch is then connected to its own CCR1072 (two CCR1072s in total). The connections between the layer 2 switches and the CCR1072s would be 2 LAG/bonds of 30G each.

Each POP contains a single CCR1072, connected to 6x CCR1016/CCR1036 at customer locations.

All connections are dark fiber / DAC for same rack.

Before implementing this, I wanted to test how the CCR1072 or CCR1036 would handle multiple VLANs + OSPF + MPLS on a single physical interface. It turns out that some OSPF sessions randomly disconnect.

The test setup is a CCR1036 connected to a CCR1016 with a DAC cable on sfpplus1.

I have attached the full configuration files for both test routers.
Before looking at the config:
- CCR1016 and CCR1036 are connected directly using a DAC. No layer 2 switch in between.
- Yes, routes should not be redistributed 'from connected' - they should be in the network table. For testing purposes, this should not matter.
- Yes, the loopback interface has 2 IPv4 addresses. This is because LDP does not work (is not stable) on an address that is used both as a loopback (/32) and on a point-to-point link (/32).
- Yes, 1500 routes is 'a lot', but not really.
- Yes, 150 OSPF neighbors on a single interface (with VLANs) feels like a lot, but this setup will require it, which is exactly what I'm testing.
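
For anyone who wants to reproduce something similar without opening the attachments, here is a rough Python sketch that prints RouterOS v6-style commands for 150 VLANs on one port, each with its own /30 and an OSPF network entry. This is not the attached config: the VLAN ID range (starting at 100), the 10.255.0.0/16 address pool, the use of /30 subnets, the backbone area, and the function name gen_vlan_ospf_config are all assumptions made up for illustration. Only the parent interface name sfpplus1 and the count of 150 come from the test described above.

```python
import ipaddress


def gen_vlan_ospf_config(parent="sfpplus1", count=150, first_vlan=100,
                         supernet="10.255.0.0/16", side=0):
    """Return RouterOS v6-style commands for `count` VLANs on one port.

    side=0 uses the first host address of each /30, side=1 the second,
    so the same function can produce both ends of the link.
    All addressing and VLAN IDs here are illustrative assumptions.
    """
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=30)
    lines = []
    for i, net in zip(range(count), subnets):
        vlan_id = first_vlan + i
        name = f"vlan{vlan_id}"
        host = list(net.hosts())[side]
        # One VLAN sub-interface per OSPF neighbor on the shared port
        lines.append(
            f"/interface vlan add name={name} vlan-id={vlan_id} interface={parent}"
        )
        # Point-to-point style addressing, one /30 per VLAN
        lines.append(
            f"/ip address add address={host}/{net.prefixlen} interface={name}"
        )
        # Put the link into OSPF via the network table (not redistribution)
        lines.append(f"/routing ospf network add network={net} area=backbone")
    return lines


if __name__ == "__main__":
    # Print the commands for one side; use side=1 for the other router
    print("\n".join(gen_vlan_ospf_config(side=0)))
```

Running it with side=0 for one router and side=1 for the other gives matching ends of each /30, which should be enough to bring up 150 adjacencies over a single DAC link for comparison.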

Anyone have any idea why the OSPF neighbors drop?

Edit: on router reboot or OSPF instance down/up, all neighbors come back up. The test instance has 0 traffic flowing through it.
