CCR1036-12G-4S - Low traffic

Hello guys, could anyone help me here?

For some weeks now I’ve been having problems with one of my CCRs.

I have about 1000 clients on each one (using Hotspot) and about 800 clients connected simultaneously at rush hour, which is normally when the problem happens.
Basically, the problem is that the traffic starts to drop steadily (from 400MB down to 70/90) and the clients can’t connect or log back in to the hotspot with their credentials. CPU usage is never over 10%. Access to the interface works normally using The Dude, and the traffic gets back to normal after a reboot.

Does anyone have any idea what is going on here?

Thanks

The problem has just happened again. This time I noticed that the current was around 1.1290 mA, while my other CCR was running fine at 1.350 mA.

We moved the CCR that was crashing to another city with far fewer clients. We noticed that the CCR that had been crashing here in our main network stopped crashing, while the CCR that had been in the smaller city and had never had this issue started crashing exactly like the old one. Any ideas about what is going on?


Thanks

Check power supply caps

Check CPU usage on each core.

You can do it in System > Resources > CPU.

Using Tools > Profile you can see the source of the CPU usage, either in total or per core.

At the moment I see some DNS peaks on some of them, but nothing irregular. If the problem occurs again, I will check while it is happening.

Thanks man.

As @chechito says, check the profile when it happens. Remember that even though your CPU is “only” hitting 10%, you have 36 cores making up that 100%, and if one of those is running at full steam (RouterOS likes to single-thread things), there is a high chance you are maxing out what that device can do.
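To make that concrete, here is a minimal Python sketch of the check. It is not a RouterOS tool; the function name `saturated_cores`, the 95% threshold and the sample readings are all assumptions for illustration. It just shows how one pinned core can hide behind an overall figure below 10%.

```python
def saturated_cores(per_core_load, threshold=95.0):
    """Return the average load and the indices of cores at or above `threshold`.

    `per_core_load` is a list of per-core CPU percentages. The threshold
    is an arbitrary illustrative value, not a RouterOS setting.
    """
    average = sum(per_core_load) / len(per_core_load)
    hot = [i for i, load in enumerate(per_core_load) if load >= threshold]
    return average, hot


# Hypothetical readings: 36 cores, one pinned at 100%, the rest around 7%.
sample = [7.0] * 36
sample[5] = 100.0

average, hot = saturated_cores(sample)
print(f"average {average:.1f}%, saturated cores: {hot}")
```

With readings like these the average stays under 10% even though core 5 has no headroom left, which is why the per-core view matters more than the number on the main screen.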

Cde, when you say power supply caps, are you referring to the caps inside the CCR or to my UPS units (nobreaks) and batteries? (I have both CCRs in this shelter connected to the same UPS.)

If it is something related to my UPS, batteries or power plug, don’t you think the problem would happen every time the CCR reaches a certain level of usage? We have had days when we ran above the peaks we saw the last time the problem happened.

And if you are talking about the caps inside the CCR, wouldn’t the problem start to happen in the other shelter, where we moved the CCR that had the issue first?

I’m not sure if this is right, but I’m more or less ruling out a power issue here, because I think a power supply issue would be more of a 0-or-1 problem; it would not occur randomly. Does that make sense?

Thanks.

Actually I haven’t paid attention to that, since I keep my eyes on the “overall” CPU usage on the main screen. I’ll pay attention to that, thanks man.

Exactly, that’s the point.

To add an example:

You can have 2-3 cores at 100% due to some problem, slowing down the entire router, with the other 33-34 cores totally idle, and the total CPU usage will be between 8 and 11%.
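The arithmetic behind that can be sketched in a few lines of Python. The function name `overall_cpu_percent` and the `idle_load` parameter are made up for illustration; the only figure taken from the thread is the 36 cores.

```python
CORES = 36  # core count of the CCR1036 discussed in this thread


def overall_cpu_percent(busy_cores, idle_load=0.0, total_cores=CORES):
    """Average CPU percentage when `busy_cores` run at 100% and every
    other core sits at `idle_load` percent (an assumed background load)."""
    if not 0 <= busy_cores <= total_cores:
        raise ValueError("busy_cores must be between 0 and total_cores")
    idle_cores = total_cores - busy_cores
    return (busy_cores * 100.0 + idle_cores * idle_load) / total_cores


for busy in (1, 2, 3):
    print(f"{busy} saturated core(s): {overall_cpu_percent(busy):.1f}% overall")
```

With the other cores at exactly 0%, two or three saturated cores work out to roughly 5.6% and 8.3% overall. If the remaining cores carry a small background load of about 3% (an assumption, not a measured value), the same cases come to roughly 8.4% and 11.1%, which lines up with the 8 to 11% range mentioned above.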

Hey guys, thank you for all the support.

We have figured out what is going on here. You are right about the cores: at some point one core hits 100% load and the traffic starts to drop.

We noticed that the usage was related to DNS and that our max UDP packet size was bigger than we need. We have set it properly now.

Anyway we are keeping an eye on it to see if the problem happens again.

Thanks to all of you.

Interesting that you have the exact same CCRs as us. We have a few hundred PPPoE sessions terminated on ours and one of our engineers suspected that DNS load was causing intermittent problems.

We decided to just move DNS onto dedicated resolvers (we set up a couple of Linux VMs), which according to the stats are handling ~100 requests a second on average.

Lucasrbs,
Same problem here: 1 core hitting 100% CPU and traffic starts to drop when using Hotspot (also around 800 users).
When I disable Hotspot, CPU usage drops immediately.

What was your solution exactly?