PPPoE disconnection over VPLS/MPLS and OSPF

Hi, I have a large fiber-optic network with MikroTik 2011 routerboards. How can I find out why the PPPoE clients are disconnecting from the RouterOS server?
I can see a lot of these disconnections in the log, but I can’t tell what the reason is. Could it be an MTU problem? I don’t know where to start looking.




In some cases there are no more Ethernet ports available for customers on the box, so we add an unmanaged Layer 2 switch.

Any help will be appreciated!

Thanks.

Are all the PPPoE clients PCs? Do you have any MikroTik device acting as a PPPoE client? Does it disconnect frequently too?

Can you determine whether these disconnections are more frequent on a certain network branch?
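A simple way to look for a per-branch pattern is to count disconnects per branch and divide by the number of subscribers on that branch, so that bigger branches do not stand out just because they are bigger. This is a minimal sketch, assuming the disconnect events have already been pulled out of the log as (user, branch) pairs; the branch names, subscriber counts and events below are hypothetical.

```python
from collections import Counter


def disconnect_rate(events, subscribers):
    """Disconnects per subscriber for each branch.

    events: iterable of (user, branch) pairs (hypothetical, pre-parsed from the log).
    subscribers: dict mapping branch name -> number of subscribers (must be > 0).
    """
    counts = Counter(branch for _user, branch in events)
    # Counter returns 0 for branches with no disconnects at all.
    return {branch: counts[branch] / n for branch, n in subscribers.items()}


# Hypothetical data: 3 disconnects among 10 users on one branch, 1 among 2 on another.
events = [("user1", "vlan-north"), ("user2", "vlan-north"),
          ("user1", "vlan-north"), ("user7", "vpls-south")]
subscribers = {"vlan-north": 10, "vpls-south": 2}
print(disconnect_rate(events, subscribers))  # {'vlan-north': 0.3, 'vpls-south': 0.5}
```

A branch whose rate is clearly above the others would be the first place to check cabling, SFPs and switches; roughly equal rates everywhere suggest the cause is not a single bad L2 segment.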

What brand/model of L2 switches are you using?

Based on your symptoms, I would look for L2 problems: a bad switch, bad cable/fiber run, bad port, bad SFP…

Something that can help you diagnose network health is SmokePing; have a look at https://www.youtube.com/watch?v=lZfhO_jTv84
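SmokePing sends repeated probes to each target and graphs packet loss together with the latency spread, drawing the median RTT as the main line. The sketch below only illustrates those two numbers for a single round of pings, assuming a lost probe is recorded as None; it is not how SmokePing itself is implemented, and the sample RTTs are made up.

```python
from statistics import median


def summarize_probes(rtts_ms):
    """Summarize one round of probes as (loss percentage, median RTT in ms).

    rtts_ms: round-trip times in milliseconds, with None for a probe that got
    no reply. The median is None when every probe was lost.
    """
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    med = median(replies) if replies else None
    return loss_pct, med


# Hypothetical round of 10 probes: 2 lost, one slow outlier.
probes = [1.2, 1.1, None, 1.3, 9.8, 1.2, None, 1.1, 1.4, 1.2]
print(summarize_probes(probes))  # (20.0, 1.2)
```

Using the median rather than the mean keeps a single slow reply (9.8 ms here) from hiding the typical latency, while the loss figure is what usually correlates with PPPoE sessions dropping.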

Are all the pppoe clients PCs?

No, almost 80% of them are routers in customers’ homes.

Do you have any mikrotik device acting as a pppoe client? Does it disconnect frequently too?

None.

Can you isolate if these disconnections are more frequent on a certain network branch?

The PPPoE servers run on VLANs or VPLS, but I can’t find a pattern in the disconnections. They happen in all areas of the network.

What brand/model L2 switches are you using?

They are 3Com and HP.

By your symptoms I would look for L2 problems: bad switch, bad cable/fiber run, bad port, bad SFP..

If that were true, the symptoms would appear in one VLAN or one VPLS, not across the whole network.

And MTU? What would be the correct values for my network?

Thanks.

That will depend on your hardware capabilities and your settings. This presentation may help you understand it: http://mum.mikrotik.com/presentations/US13/kirnak.pdf
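As a rough illustration of how the numbers depend on your settings, the sketch below adds up the per-layer overhead for PPPoE carried inside VPLS. The header sizes are the standard ones (a 6-byte PPPoE header plus a 2-byte PPP protocol field, a 14-byte inner Ethernet header, 4 bytes per 802.1Q tag, an optional 4-byte VPLS control word, 4 bytes per MPLS label with two labels in the worst case). It also assumes that the MPLS MTU value includes the label stack; check both assumptions against your actual setup.

```python
# Standard header sizes in bytes (assumed encapsulation: PPPoE inside VPLS over MPLS).
PPPOE_OVERHEAD = 8   # PPPoE header (6) + PPP protocol field (2)
ETH_HEADER = 14      # inner Ethernet header carried in the pseudowire, no FCS
VLAN_TAG = 4         # each 802.1Q tag on the inner frame
CONTROL_WORD = 4     # optional VPLS control word
MPLS_LABEL = 4       # each MPLS label stack entry


def required_mpls_mtu(pppoe_mtu, vlan_tags=0, control_word=True, labels=2):
    """MPLS MTU needed to carry a full-size PPPoE packet through VPLS
    without fragmentation, assuming the MPLS MTU counts the label stack."""
    size = pppoe_mtu + PPPOE_OVERHEAD            # Ethernet payload of the PPPoE frame
    size += ETH_HEADER + vlan_tags * VLAN_TAG    # whole inner Ethernet frame
    if control_word:
        size += CONTROL_WORD
    return size + labels * MPLS_LABEL            # transport label + VPLS label


print(required_mpls_mtu(1480))               # 1514
print(required_mpls_mtu(1492))               # 1526
print(required_mpls_mtu(1480, vlan_tags=1))  # 1518
```

Every hop on the VPLS path must then be able to pass that size, so the smallest L2MTU along the path is what really limits the PPPoE MTU.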

Thanks!

I’m asking myself whether these disconnections are really caused by MTU, because there are no complaints from customers. I see these disconnections in the RouterOS log, but the customers don’t call me.

Another question: do I need to use VLANs in this topology? What would the benefits be? (I am thinking of using this network for IPTV in the future.)

Best regards.

They just turn off their modems. That’s why they disconnect…

Just a question: what version of RouterOS are you running?

There was a bug fixed in 6.37.2 with dynamic simple queues (commonly used with PPPoE) where the deletion of a queue (caused by a customer disconnecting) could, on a heavily loaded router, cause a CPU spike that would lead to a cascade effect where other customers would get disconnected as well.
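Checking whether a router is still on an affected release is a plain version comparison. A small sketch, assuming purely numeric dotted version strings like the ones in this thread (suffixes such as "rc" are not handled):

```python
def parse_version(version):
    """Turn a dotted RouterOS version such as '6.37.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def has_queue_fix(running, fixed_in="6.37.2"):
    """True if `running` is at or above the release that fixed the queue bug."""
    return parse_version(running) >= parse_version(fixed_in)


print(has_queue_fix("6.37.1"))  # False
print(has_queue_fix("6.37.4"))  # True
```

Comparing tuples of integers avoids the usual string-comparison trap where "6.9" would sort after "6.37".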

@mducharme, I am using version 6.37.1.

If you use VPLS, watch the maximum MTU capacity of your media converters. Some time ago we used VPLS to deliver PPPoE servers to remote sites, but we switched the network to Layer 3 and segregated the PPPoE servers, because our converters could not pass an L2MTU larger than 1504. With mpls-mtu set to 1504, that resulted in a lot of fragmentation on the VPLS tunnels.

Try changing your MPLS MTU to 1504 on all routers and use the default 1480 on the PPPoE servers. Also, look at the packets on the VPLS tunnels to see whether there is any fragmentation. I’m almost certain your problem is caused by MTU.
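This advice can be checked with a little arithmetic working backwards: start from the smallest L2MTU on the path (the media converters, in sneeep’s case) and subtract the encapsulation overhead to see how large a PPPoE packet fits without fragmentation. The sketch assumes standard header sizes (8 bytes for PPPoE plus PPP, a 14-byte inner Ethernet header, an optional 4-byte control word, two 4-byte MPLS labels) and that the MPLS MTU includes the labels; the 1600-byte hops are hypothetical. Under those assumptions a 1504-byte ceiling leaves 1470 bytes for PPPoE with a control word (1474 without), so it is worth confirming the overheads on your own equipment before settling on the 1504/1480 pair.

```python
PPPOE_OVERHEAD = 8   # PPPoE header (6) + PPP protocol field (2)
ETH_HEADER = 14      # inner Ethernet header carried in the pseudowire
CONTROL_WORD = 4     # optional VPLS control word
MPLS_LABEL = 4       # each MPLS label stack entry


def max_pppoe_mtu(hop_l2mtus, control_word=True, labels=2):
    """Largest PPPoE MTU that fits through VPLS without fragmentation.

    hop_l2mtus: L2MTU of every device/link on the path; the smallest one
    caps the MPLS MTU that can actually be used.
    """
    mpls_ceiling = min(hop_l2mtus)
    overhead = PPPOE_OVERHEAD + ETH_HEADER + labels * MPLS_LABEL
    if control_word:
        overhead += CONTROL_WORD
    return mpls_ceiling - overhead


path = [1600, 1504, 1600]   # hypothetical path with one 1504-byte media converter
print(max_pppoe_mtu(path))                      # 1470
print(max_pppoe_mtu(path, control_word=False))  # 1474
```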

@sneeep,
Where can I find the count of fragmented packets?

I have discovered this: in The Dude I see a lot of outages on the CRS125 switch, and sometimes, though not always, its CPU is at 100%.
Could this be the reason the VPLS tunnel links go down?

Thanks for your support,

Try looking at the overall stats on your VPLS interfaces; if there are a lot of 64-byte or 65-127-byte packets, you probably have some fragmentation.
Your CRS is acting as a router. Do you have another box to test in place of this CRS? Try running Tool > Profile on the CRS when the CPU is at 100%. The heavy CPU load may in fact be dropping your tunnels…

Sorry, I meant the overall stats on the physical router interfaces.
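The heuristic above can be expressed as the share of frames that land in the two smallest size buckets. This is a rough sketch, assuming you copy the per-size frame counters of a physical interface into a dict; the bucket names and counter values are hypothetical. A high share is only a hint, since TCP ACKs and VoIP also produce small frames, so comparing the value over time or between interfaces tells you more than a single reading.

```python
SMALL_BUCKETS = ("64", "65-127")  # hypothetical names for the 64 and 65-127 byte buckets


def small_frame_share(buckets):
    """Fraction of all counted frames that fall into the smallest size buckets.

    buckets: dict mapping a size-bucket name to its frame count.
    """
    total = sum(buckets.values())
    if total == 0:
        return 0.0
    small = sum(buckets.get(name, 0) for name in SMALL_BUCKETS)
    return small / total


# Hypothetical counters read from one physical interface.
counters = {"64": 120_000, "65-127": 380_000, "128-255": 90_000,
            "256-511": 60_000, "512-1023": 50_000, "1024-max": 300_000}
print(small_frame_share(counters))  # 0.5
```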

I don’t have another box to test with, but I can try to borrow one.

Below are the attachments with the CPU information and the overall stats.




There is too much load on the routing protocol; this heavy load is probably the cause of your VPLS disconnects.
One question about your design: you set all switch ports as slaves with ether1 as master. Are the routers behind the switch forming OSPF adjacencies with your CRS? Or are you using point-to-multipoint with your router at the top of the design?

If you can use your CRS only as a switch, that may fix the heavy routing load problem.

My English isn’t good, but practice makes perfect.

One question about your design, you put all switch ports at slave and set master on ether1, the routers after the switch are making ospf adjacency with your CRS ? Or, you are using point to multipoint with your router on top of the design ?

The CRS had the default configuration; I added the bridge “loopback”, then the OSPF settings and the IP addresses. In theory, with this, MPLS should have passed through the switch itself without any further settings, but it did not, and the VPLS link did not come up.
But once I added the MPLS interfaces and the ID settings, everything worked normally.

How do I set up the switch as a switch (not a router) with OSPF and MPLS routing? Is this possible?
(I have another CRS125 for backup purposes, and I can test it in the lab.)

My native language is Spanish, but I am learning too.

How have you set up the CRS? Are all ether ports slaves (using the switch chip) or on software bridges?

Have you checked the CRS Firmware? It should act transparently in L2.

I’d upgrade it to the latest bugfix release (6.37.4), then reset it with no default configuration and test again.

I don’t know if this is possible, because I used OSPF+MPLS/VPLS a long time ago and didn’t need a switch in the network.

What do you think about putting your CRS in pure switch mode at wire speed on all ports, and making a point-to-multipoint adjacency between your core router and the other routers behind the switch? It’s supposed to work. I’ve never tested an MPLS/VPLS scenario with point-to-multipoint, but if you can switch at wire speed on your CRS, it will remove a lot of CPU load.

@pukkita,

It should act transparently in L2.

Are you saying that the CRS switch should carry OSPF and MPLS transparently, without any settings?
But I have a lot of devices configured with different IP addresses and routing. Do I have to change all of this? OMG

Please look at my network again. If the CRS runs as a transparent switch, can you tell me what I should change?

Thanks for your support.

@oxigeno20, you need to assign a /24 (for example) to the interfaces that point to the switch, put the switch entirely in L2, and change OSPF to “point to multipoint” on the routers that form adjacencies with the core router. It should work.
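The addressing part of this suggestion amounts to handing out addresses from one shared subnet to every router interface that faces the switch, core router first. A minimal illustration with Python’s ipaddress module; the 10.10.10.0/24 prefix and the router names are hypothetical, and the OSPF point-to-multipoint change is separate and not shown.

```python
import ipaddress


def plan_shared_subnet(cidr, routers):
    """Assign one address per switch-facing interface from a single subnet.

    The core router gets the first usable host address and the routers
    behind the switch get the following ones, in the order given.
    """
    net = ipaddress.ip_network(cidr)
    needed = len(routers) + 1                 # +1 for the core router
    if needed > net.num_addresses - 2:        # minus network and broadcast
        raise ValueError(f"{cidr} is too small for {needed} interfaces")
    hosts = net.hosts()
    plan = {"core": f"{next(hosts)}/{net.prefixlen}"}
    for name in routers:
        plan[name] = f"{next(hosts)}/{net.prefixlen}"
    return plan


# Hypothetical subnet and router names.
plan = plan_shared_subnet("10.10.10.0/24", ["rb2011-a", "rb2011-b", "rb2011-c"])
for name, addr in plan.items():
    print(name, addr)   # core 10.10.10.1/24, rb2011-a 10.10.10.2/24, ...
```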

Are you saying that the CRS switch should transport the “OSPF and MPLS” in transparency way and without any settings?.
But.. i am with a lot of devices configurated with differents IP addresses and routing. I must change all this ? OMG

Look up to my network again please, can you tell me (if CRS run like a transparent switch) what i should change ?

Yes, any true transparent switch allows that.

You basically need to do what sneeep posted: lay out a proper subnet between the x86 core router and the 2011A’s and change the interface addressing to accommodate that, or go the lazy way and make the x86 “impersonate” the CRS:

1.- Put the CRS in hardware switch mode
2.- Connect three more interfaces from the x86 to the switch
3.- Assign 192.168.101.1/30, 192.168.102.1/30 and 192.168.103.1/30 addresses to those interfaces
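The three links in this layout are independent /30 subnets, each with exactly two usable addresses: the x86 takes .1 and the 2011A at the other end keeps the remaining one. A quick sanity check of that plan with Python’s ipaddress module, assuming the 2011A side of each link uses the other usable address of its /30 (.2):

```python
import ipaddress

# Addresses to put on the three new x86 interfaces, as listed in step 3.
CORE_SIDE = ["192.168.101.1/30", "192.168.102.1/30", "192.168.103.1/30"]


def link_peers(core_side):
    """Map each core-side address to the other usable host of its /30.

    Raises ValueError if any two link subnets overlap.
    """
    ifaces = [ipaddress.ip_interface(a) for a in core_side]
    for i, a in enumerate(ifaces):
        for b in ifaces[i + 1:]:
            if a.network.overlaps(b.network):
                raise ValueError(f"{a.network} overlaps {b.network}")
    peers = {}
    for iface in ifaces:
        others = [h for h in iface.network.hosts() if h != iface.ip]
        peers[str(iface.ip)] = str(others[0])   # a /30 has exactly one other host
    return peers


for core, peer in link_peers(CORE_SIDE).items():
    print(core, "<->", peer)   # e.g. 192.168.101.1 <-> 192.168.101.2
```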

Forget about the switch being there and imagine there are only cables directly linking the x86 to the 2011A’s.