For some weeks now I’ve been having problems with one of my CCRs.
I have about 1000 clients on each one (using Hotspot), with about 800 connected simultaneously at rush hour, which is normally when the problem happens.
Basically, the traffic starts to drop steadily (from 400MB down to 70/90) and clients can’t connect or log back in to the hotspot with their credentials. CPU usage is never over 10%. Access to the interface works normally using The Dude, and the traffic gets back to normal after a reboot.
Does anyone have any idea what is going on here?
The problem has just happened again. This time I noticed that the current was around 1.1290 mA, while my other CCR was running fine at 1.350 mA.
We moved the CCR that was crashing to another city with far fewer clients. The CCR that had been crashing here in our main network stopped crashing, while the CCR that was in the smaller city, and had never had this issue, started to crash exactly like the old one. Any ideas about what is going on?
As @chechito says, check Profile when it happens. Remember that even though your CPU is “only” hitting 10%, that figure is spread across 36 cores. RouterOS likes to single-thread things, so if one of those cores is running at full steam there is a high chance you are maxing out what the device can do.
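To make the arithmetic concrete, here is a minimal Python sketch. The per-core figures are made-up sample values and the function names are my own; nothing here is a RouterOS API. On a real router the per-core numbers would have to come from Profile or a per-core CPU listing.

```python
from statistics import mean


def aggregate_load(per_core):
    """Average load across all cores, i.e. what a single CPU% figure reports."""
    return mean(per_core)


def saturated_cores(per_core, threshold=95.0):
    """Indices of cores whose load is at or above the threshold (percent)."""
    return [i for i, load in enumerate(per_core) if load >= threshold]


# Hypothetical sample: 36 cores, core 0 pinned at 100%, the rest idle.
sample = [100.0] + [0.0] * 35

print(f"aggregate: {aggregate_load(sample):.1f}%")    # aggregate: 2.8%
print(f"saturated cores: {saturated_cores(sample)}")  # saturated cores: [0]
```

So a single pinned core barely moves the overall figure, which is why the per-core view matters more than the total here.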
Cde, when you say power supply caps, are you referring to the caps inside the CCR or to my UPS units and batteries? (I have both CCRs in this shelter connected to the same UPS.)
If it is something related to my UPS units, batteries or power plug, don’t you think the problem would happen every time the CCR reaches a certain level of usage? We have had days where we ran above the peaks we had the last time the problem happened.
And if you are talking about the caps inside the CCR, wouldn’t the problem have started happening in the other shelter, where we moved the CCR that first had the issue?
I’m not sure if this is right, but I’m more or less ruling out a power issue here, because I think a power supply problem would be all-or-nothing (0 or 1); it would not occur randomly. Does that make sense?
Actually, you can have 2-3 cores at 100% due to some problem, slowing down the entire router, while the other 33-34 cores are totally idle, and the total CPU usage will be between 8 and 11%.
Interesting that you have the exact same CCRs as us. We have a few hundred PPPoE sessions terminated on ours and one of our engineers suspected that DNS load was causing intermittent problems.
We decided to just move DNS onto dedicated resolvers (we set up a couple of Linux VMs), which according to the stats are handling ~100 requests a second on average.
Lucasrbs,
Same problem here, 1 core hitting 100% CPU and traffic starts to drop when using Hotspot (also around 800 users).
When I disable Hotspot, CPU usage drops immediately.