I have 13 wireless PoPs and a small fiber-to-the-home deployment (about 100 customers on the FTTH). Each of the 13 wireless PoPs consists of several APs, all connected via Ethernet to a CCRSwitch. Port 1 of the CCRSwitch is always the backhaul radio. Near our NOC is a tower where all of the wireless comes in. On the tower are 9 backhauls and 5 APs that all connect to a CCRSwitch on the ground. Port 1 of this CCRSwitch is connected to another CCRSwitch in our NOC via fiber. The CCRSwitch in the NOC has 3 ports in use. Port 24 is connected to the CCRSwitch at the tower, so all wireless traffic comes into port 24. Port 22 is connected to the CCR PPPoE server, so all the wireless PPPoE comes and goes there. Port 23 on the NOC switch is connected to the 10.10 port on the core router, so all the management traffic from the radios, switches and other devices comes and goes here.
Our small FTTH deployment is a Calix GPON system that connects via fiber directly between the Calix C7 and our Mikrotik PPPoE server, so there are no Mikrotik switches between the fiber customers and the PPPoE server.
This used to work wonderfully. We had very happy customers and almost no issues with the network itself. Then our almost 10-year-old x86 RouterOS PPPoE server died...
The old x86 server was running RouterOS 6.3something when it died. The replacement is a CCR1036-12G-4S-EM, so the backup config from the old x86 couldn't be loaded onto the new server and I had to configure it from scratch (something I hadn't done for 10'ish years). We were also having some major connection issues (which turned out to be caused by a faulty port on a core switch), and one of the many things we did while trying to fix them was upgrade all the Mikrotik routers and switches on the network to 6.42.3. Most of them had previously been running 6.3something.
For 10 years the old x86 PPPoE server ran with a 1492 MTU, and we never changed the factory default MSS Clamping setting on the Canopy, 450i and ePMP radios we installed. It was a great network, everything ran great and I had 800ish very happy PPPoE customers. Then, after installing the new PPPoE server and upgrading all the Mikrotik stuff to 6.42.3, everything sucks now... constant PPPoE and connectivity problems.
The first thing we discovered was that no one could reach half the internet. They could load MSN, eBay, Gmail, Google and YouTube just fine, but Netflix, Yahoo Mail, Fast.com and lots of other websites wouldn't load at all. The Hulu and Amazon Prime websites would load, but movies/shows would never load a single frame.
Turning on MSS Clamping on the customer radios, a setting we had not messed with in the 10 years of doing PPPoE, fixed it... mostly. It seems that turning MSS Clamping on would fix the problem for hours or days, but eventually the problem would return. However, with MSS Clamping on, rebooting the customer radio would fix it... again, for a while.
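For reference, the arithmetic behind MSS clamping can be sketched as below. This is a minimal illustration assuming plain IPv4 and TCP headers with no options (20 bytes each) and the standard 8 bytes of PPPoE overhead on a 1500-byte Ethernet link; the function names are made up for the example and are not settings on the radios or on RouterOS.

```python
# Sketch of the MTU/MSS arithmetic, assuming IPv4 and TCP headers without options.
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20     # bytes, no IP options
TCP_HEADER = 20      # bytes, no TCP options


def pppoe_mtu(ethernet_mtu: int = 1500) -> int:
    """Largest IP packet that fits in one PPPoE frame on the given Ethernet MTU."""
    return ethernet_mtu - PPPOE_OVERHEAD


def clamped_mss(link_mtu: int) -> int:
    """TCP MSS a clamp would advertise so that full segments fit in link_mtu."""
    return link_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492, 1480):
        print(f"MTU {mtu} -> MSS {clamped_mss(mtu)}")
```

With these assumptions a 1492 MTU corresponds to an MSS of 1452 and a 1480 MTU to an MSS of 1440. Clamping only helps TCP; anything else that sends full-size packets still depends on path MTU discovery working.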
So, it looks like something between 6.3.3'ish and 6.42 broke MTU discovery?
Our fiber-to-the-home customers (only about 100 right now) were not affected by this. They all come into a Calix C7 which connects straight into the PPPoE server, and they have their own PPPoE service configured on it. (The PPPoE server has 2 PPPoE services configured: one for FTTH, bound to one of the SFP ports, and the other for the wireless customers, bound to Ethernet port 4, so each group cannot see the other's PPPoE service.)
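One way to check whether large packets are being silently dropped somewhere on the path is to send don't-fragment pings of decreasing size from a client behind the PPPoE link. The sketch below is only an illustration: `ping_df` assumes a Linux host with iputils `ping` (the `-M do` and `-s` flags), and both function names are hypothetical. The search itself takes any probe function, so it can be tried without touching the network.

```python
import subprocess
from typing import Callable, Optional

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def ping_df(host: str, mtu: int) -> bool:
    """Send one don't-fragment ping whose total IP size equals `mtu`.

    Assumes Linux iputils ping; other platforms use different flags.
    """
    payload = mtu - ICMP_OVERHEAD
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_passing_mtu(
    probe: Callable[[int], bool], low: int = 576, high: int = 1500
) -> Optional[int]:
    """Binary-search the largest packet size in [low, high] for which probe passes.

    Returns None if even the smallest size fails. Assumes that if a size
    passes, every smaller size passes too.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

A real run would look like `largest_passing_mtu(lambda m: ping_df("example.com", m))`, where the hostname is a placeholder. If the result is lower than the MTU configured on the PPPoE service, something in between is dropping the larger frames.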
We were also having issues with many of the wireless radios connecting to PPPoE and instantly dropping. The PPPoE server logs would show they were authenticating and then instantly hanging up.
We changed the MTU to 1480 (from the 1492 it had always been) and suddenly the people on the fiber started having the same "can't reach 1/2 the internet" problem. Rebooting their pppoe client (usually a netgear/belkin router) would fix it until it happened again.
So, we set the MTU/MRU to 1492 on the fiber PPPoE settings and set the wireless PPPoE server to 1480. We had several VPN users who were having a great many disconnects every day after the big Mikrotik upgrade, and changing the 1492 to 1480 seems to have reduced, but not eliminated, the number of times their VPN disconnects every day.
I made a lot of use of the whole Master / Slave port grouping on the Mikrotik switches, and that was eliminated with 6.42. I never messed with the bridge settings on the switches, but I'm seeing that the upgraded switches all have a bridge now with all the ports in it and one seemingly random port called "root". The root port is never the backhaul port; it seems to have been designated as root by some random process and doesn't appear to be a setting I can change.
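If that "root" label is the spanning tree port role (an assumption, since the bridge config isn't shown here), then it isn't random: each switch marks the port with the best path toward the elected root bridge, and the root bridge is the one with the lowest bridge ID, meaning lowest priority first and lowest MAC address as the tie-breaker. A rough sketch of that election, with made-up switch names and MAC addresses:

```python
from typing import Iterable, NamedTuple, Tuple


class Bridge(NamedTuple):
    name: str
    priority: int  # 32768 (0x8000) is the usual STP/RSTP default
    mac: str       # colon-separated hex, e.g. "64:D1:54:00:00:01"


def bridge_id(bridge: Bridge) -> Tuple[int, int]:
    """Bridge ID used for comparison: priority first, then MAC as an integer."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: Iterable[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


if __name__ == "__main__":
    # Hypothetical switches, all left at the default priority.
    switches = [
        Bridge("noc", 32768, "64:D1:54:00:00:30"),
        Bridge("tower", 32768, "64:D1:54:00:00:20"),
        Bridge("pop-07", 32768, "64:D1:54:00:00:10"),
    ]
    print(elect_root(switches).name)  # the lowest MAC wins when priorities are equal
```

With every switch at the default priority, whichever one happens to have the lowest MAC becomes root, which could explain why the "root" port points in an unexpected direction. Giving the NOC switch a lower bridge priority than the rest would make it the root instead.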
Anyway, I'm just trying to figure out why, after updating to 6.42, I had to turn on MSS Clamping and change the MTU/MRU from 1492 to 1480 to stop the "can't access half the internet" problem, and why VPN users are now being disconnected several times a day (many times a day if I set MTU/MRU back to 1492) on any devices passing through a Mikrotik switch. On the FTTH, where there are no Mikrotik switches (or any other switches) between the customer PPPoE client and the PPPoE server and the MTU/MRU remains at 1492, we have no problems (we actually have problems on the FTTH if we lower the MTU/MRU). I can't speak to MSS Clamping there, because the fiber customer's PPPoE client is their own device, mostly Netgear and Belkin routers that, as far as I know, don't have a setting to enable/disable MSS Clamping, and it is set to "default" on the PPPoE server.