CCR freezes on mass disconnection after power cuts

Hello everyone,
First of all, I apologize for my English if anything is lost in translation. I live in a small town with frequent electricity problems. I currently run a CCR 1036 as a PPPoE server with about 2000 active sessions. The problem is that power cuts in some neighborhoods cause around 400 or 500 users to disconnect at the same time. When this happens, the CCR gets stuck and can stay that way for 2 to 10 minutes while it processes all those disconnections. Is there any way to prevent the CCR from getting stuck? I have set a high time to live, but it doesn't seem to help when the disconnection is physical. Thank you very much for the help.

First, you need to protect the CCR from power outages with an uninterruptible power supply (UPS)…

A good-quality UPS with AVR (automatic voltage regulation) that will protect your CCR, switches and ISP gear starts at around $1,000 or more, depending on how many hours of runtime you expect from the UPS for all the gear you want to protect. The more runtime, the costlier the UPS.

You can start by checking the following link: Schneider Electric's fully integrated uninterruptible power supply.

Hello friend, I already have protection at my headend. The problem is at the users' houses: the power goes out and comes back, which sends too many disconnect and connect requests to the CCR at the same time, and it gets stuck because of this.

Is your CCR doing only PPPoE, or is it by chance also doing the NAT for all those customers?

Hello, only PPPoE

OK… then I don't know. I remember a talk where someone explained why using "masquerade" for lots of PPPoE clients causes problems when many sessions disconnect: for each disconnect, the router walks the entire NAT connection table and deletes all entries belonging to that client. That can overload the CPU.
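To see why a full-table walk per disconnect hurts during a mass outage, here is a rough Python cost model. It is a hypothetical sketch, not how RouterOS is implemented: it assumes every disconnect scans every tracked connection, and that each client owns the same number of entries. The function names are made up for illustration.

```python
def full_scan_cost(num_clients: int, conns_per_client: int, disconnects: int) -> int:
    """Count table entries inspected when `disconnects` clients drop one after
    another and each drop walks the whole connection table (hypothetical model)."""
    # One table entry per tracked connection, tagged with the owning client's id.
    table = [client for client in range(num_clients) for _ in range(conns_per_client)]
    inspected = 0
    for victim in range(disconnects):
        inspected += len(table)  # full walk: every remaining entry is examined
        table = [owner for owner in table if owner != victim]  # drop the victim's entries
    return inspected


def full_scan_cost_closed_form(num_clients: int, conns_per_client: int, disconnects: int) -> int:
    """Same count without simulating: c * sum_{j=0}^{k-1} (N - j)."""
    n, c, k = num_clients, conns_per_client, disconnects
    return c * (k * n - k * (k - 1) // 2)


def indexed_cost(conns_per_client: int, disconnects: int) -> int:
    """If entries were indexed per client, each drop would touch only its own entries."""
    return conns_per_client * disconnects
```

With the thread's figures (about 2000 sessions, 500 dropping at once) and an assumed 50 tracked connections per client, `full_scan_cost_closed_form(2000, 50, 500)` gives 43,762,500 entry inspections, against 25,000 from `indexed_cost(50, 500)`. The work grows with (clients dropped) × (table size), which is why a burst of disconnects costs far more than the same number spread out over time.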

The log shows a number of interfaces going down one by one, along with all the routes and the IPs of each client, until all of them are disconnected. It stops responding and does not reconnect them, even though the power only goes off for a second.

The generic issue is that each interface going down takes some CPU resources to remove sessions and routes. At some point the load can become so high that connections that were still alive start failing as well, because they time out while the CPU is busy with all the cleanup. Then an avalanche effect occurs.
I have no personal experience with it; I only wanted to mention it so you can search for other articles on the topic.
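The avalanche effect can be illustrated with a toy tick-based model in Python. Everything here is an assumption for illustration, and none of the names or numbers come from RouterOS: each teardown costs one unit of CPU work, the CPU clears a fixed `capacity` units per tick, and whenever the remaining backlog would take longer than `timeout_ticks` to clear, a fixed `fail_fraction` of still-live sessions miss their keepalives and get torn down too.

```python
def simulate_avalanche(live: int, initial_drops: int, capacity: int,
                       timeout_ticks: int, fail_fraction: float,
                       max_ticks: int = 10_000) -> tuple[int, int]:
    """Toy model of cleanup overload (all parameters hypothetical).

    Returns (collateral_drops, sessions_still_live)."""
    live -= initial_drops      # sessions lost to the power cut itself
    backlog = initial_drops    # pending teardown work, one unit per session
    collateral = 0             # sessions lost only because the CPU fell behind
    for _ in range(max_ticks):
        if backlog == 0:
            break
        backlog -= min(backlog, capacity)  # CPU clears at most `capacity` units this tick
        # If the remaining work would take longer than the keepalive timeout,
        # some healthy sessions time out and add their own teardown work.
        if backlog > capacity * timeout_ticks:
            failed = int(live * fail_fraction)
            live -= failed
            backlog += failed
            collateral += failed
    return collateral, live
```

With these made-up numbers, `simulate_avalanche(2000, 500, 100, 10, 0.05)` absorbs the 500 drops with no collateral losses, while `simulate_avalanche(2000, 500, 20, 10, 0.05)` starts timing out healthy sessions on the very first tick. The exact values mean nothing; the point is the threshold behavior: once cleanup falls behind the keepalive timeout, the failures feed themselves.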