
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

CCR1036-8G-2S+ - 100% cpu usage

Fri Oct 28, 2016 1:14 pm

Hello.
We are migrating our clients from a Linux router (DHCP + iptables + iproute + ipmark and a few other strange names ;)) to a CCR1036-8G-2S+ with PPPoE. So far about 200 clients have been migrated to the MikroTik (MT), and about 200 Mbps flows through it.

We create a profile for each client:
/ppp profile add name=10.10.84.150 local-address=10.10.87.250 remote-address=network-pppoe idle-timeout=5m use-mpls=default use-compression=default use-encryption=default only-one=yes change-tcp-mss=default use-upnp=default rate-limit=1126k/16896k dns-server=1.2.3.4,4.3.2.1
/ppp secret add name=amakh1 service=pppoe password=amakh1 profile="10.10.84.150" remote-address=10.10.84.150 comment=10.10.84.150
Each client has his own speed defined in the profile, which means each client gets his own simple queue.

The firewall has 55 filter rules (access, connlimit and DDoS trap) and 54 NAT rules (src/dst NAT for public addresses). There are no mangle, L7 or other rules.

At the moment some of our clients have problems with infected cameras, DVRs and other devices, which can open millions of connections at the same time. On the Linux router we have a few rules using the connlimit and recent modules, and that is enough: the router only logs information about blocked connections.

Yesterday this happened with clients connected to the CCR1036-8G-2S+ via PPPoE. Within one second CPU usage grew to 100%, and after a minute the router was rebooted by the watchdog. It happened 8 times in 10 hours. While it was happening the console and Winbox were unavailable. Once I managed to see 80% firewall usage in /tool profile.

Because I knew about the infected devices, I created two traps. Connlimit:
chain=forward action=jump jump-target=block-connection src-address=10.10.80.0/21 log=no log-prefix="" 
chain=forward action=drop src-address-list=connlimit log=no log-prefix="" 
chain=block-connection action=return connection-limit=!2000,32 log=no log-prefix="" 
chain=block-connection action=add-src-to-address-list address-list=connlimit address-list-timeout=10m log=yes log-prefix="" 
and ddos protection:
chain=forward action=jump jump-target=block-ddos connection-state=new log=no log-prefix="" 
chain=forward action=drop connection-state=new src-address-list=ddoser dst-address-list=ddosed log=no log-prefix="" 
chain=block-ddos action=return dst-limit=100,100,src-and-dst-addresses/10s log=no log-prefix="" 
chain=block-ddos action=add-dst-to-address-list address-list=ddosed address-list-timeout=10m log=yes log-prefix="" 
chain=block-ddos action=add-src-to-address-list address-list=ddoser address-list-timeout=10m log=yes log-prefix="" 
(thx @chupaka for the idea)

And these rules work great. But they do not help when 10 clients make thousands of connections: the MikroTik goes down.

First I checked the RouterOS version. It was old (about 6.33.rc.4), so I upgraded to the latest stable. It helped, but only partially: CPU still grows to 80-99%, but the router is still alive ;).
Second, I set change-tcp-mss to no in all profiles. That looks like a good change, because the router has now been working for 14 hours without a reboot or high CPU usage.

I know simple queues are not such a good idea, but I read that simple queues work better in RouterOS 6, and with this router 200 queues shouldn't be a problem. So what is happening? What can I do to eliminate this problem? The Linux router can handle this problem (attackers) with 1000 users and 1 Gbps, so why can't the MikroTik with only 200 users and 200 Mbps? Maybe it's not because of the connections, but for some other reason?

I can't replace simple queues with HTB at short notice, because the MT is integrated with our customer panel.

Thanks for any ideas.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Fri Oct 28, 2016 1:44 pm

Another question.
Each client profile creates one simple queue with queue type default-small, which is of kind pfifo. Maybe it should be a pcq type, but I don't know whether that matters for simple queues?
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Fri Oct 28, 2016 9:20 pm

Anybody? MikroTik team? It's not a cheap device; it shouldn't behave like this :(
 
Kozmess
just joined
Posts: 8
Joined: Sat Oct 29, 2016 12:40 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sat Oct 29, 2016 2:59 am

Do you use IPs in queues? If so, try to use interfaces instead of IPs. It is much easier for the MikroTik to look only at a packet's interface than at EACH packet's header and the IPs in it.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sat Oct 29, 2016 1:35 pm

I think I'm talking to myself ;)
I have an idea: is it possible to create a firewall loop with an address list?
Today I tried to block an attack on my client: lots of hosts around the world were trying to connect to UDP port 53 on the client's router. He has src/dst NAT:
chain=dstnat action=dst-nat to-addresses=10.10.82.155 dst-address=1.2.3.4
chain=srcnat action=src-nat to-addresses=1.2.3.4 src-address=10.10.82.155 out-interface=bridge-vlan2
First I created a simple rule:
chain=forward action=drop protocol=udp dst-address=10.10.82.155 dst-port=53 log=no log-prefix=""
and it works! Since I have a special chain for individual client rules, I tried to add this client to that chain. Here is the jump rule:
chain=forward action=jump jump-target=client-rules src-address-list=client-rules log=no log-prefix=""
The client's IP is already on the client-rules list. But as you can see, the rule filters on src-address-list=, so traffic to my client doesn't match. So I also set dst-address-list=client-rules, and suddenly CPU usage grew to 100%. Fortunately the console was still available, so I could disable the rule, and CPU usage went back to normal in a few seconds.

@MikroTik team, can you look at this case? It's possible this is the cause, because when the problem first appeared I was doing something with another address list.
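
A side note on the rule described above, as a hedged sketch rather than a diagnosis: matchers inside a single RouterOS firewall rule are combined with AND, so one rule carrying both src-address-list=client-rules and dst-address-list=client-rules only matches packets whose source and destination are both on the list. If the intent was "traffic from or to a listed client", two jump rules would express that. The second and third rules below are hypothetical and are not taken from the poster's configuration, and nothing here explains the CPU spike itself.

```
/ip firewall filter
# Traffic coming FROM a listed client (the rule the post already has).
add chain=forward action=jump jump-target=client-rules src-address-list=client-rules
# Traffic going TO a listed client (hypothetical second rule).
add chain=forward action=jump jump-target=client-rules dst-address-list=client-rules
# Inside the chain, a rule for inbound traffic then has to match on dst-address.
add chain=client-rules action=drop protocol=udp dst-address=10.10.82.155 dst-port=53
```

Note that add appends to the end of the rule list; on a live router the jump rules would need to be placed ahead of the later forward rules (for example with place-before) to take effect where intended.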
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sat Oct 29, 2016 5:07 pm

Kozmess wrote:
> Do you use IPs in queues? If so, try to use interfaces instead of IPs. It is much easier for the MikroTik to look only at a packet's interface than at EACH packet's header and the IPs in it.
PPPoE creates the queues itself, with the interface name as target, for example <pppoe-login>. Please look at the attachment.
[Attachment: w.png]
 
Kozmess
just joined
Posts: 8
Joined: Sat Oct 29, 2016 12:40 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sat Oct 29, 2016 6:44 pm

miq wrote:
> .....
> chain=forward action=jump jump-target=client-rules src-address-list=client-rules log=no log-prefix=""
> The client's IP is already on the client-rules list. But as you can see, the rule filters on src-address-list=, so traffic to my client doesn't match. So I also set dst-address-list=client-rules, and suddenly CPU usage grew to 100%. Fortunately the console was still available, so I could disable the rule, and CPU usage went back to normal in a few seconds.
>
> @MikroTik team, can you look at this case? It's possible this is the cause, because when the problem first appeared I was doing something with another address list.
Don't use "src-address-list"; try to use an interface list of clients instead.
Or at least use !interface-list to exclude from your rule the interfaces that definitely don't need to be checked against "src-address-list".
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sun Oct 30, 2016 9:45 am

Kozmess wrote:
> .....
> Don't use "src-address-list"; try to use an interface list of clients instead.
> Or at least use !interface-list to exclude from your rule the interfaces that definitely don't need to be checked against "src-address-list".
Thanks, but I know the syntax and understand this problem :) What I am trying to figure out is how this situation is connected with the high CPU usage.
 
Kozmess
just joined
Posts: 8
Joined: Sat Oct 29, 2016 12:40 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sun Oct 30, 2016 10:39 pm

It's all about MikroTik's analysis of packets.
With queues using only IP-based rules (plus some firewall rules), a 1009 begins to lose packets above 40K pps with my config
...and the CPU is not even at 100%.
If I change the queues to limit interfaces, there is no problem.
It's some kind of hardware/firmware limit, I think.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Sun Oct 30, 2016 11:58 pm

OK. But why does this device work perfectly stably for a month with 200 users, using about 1-2% CPU, and then suddenly, within one second, CPU usage goes to 100%? It doesn't depend on traffic or pps. And I can reproduce the problem by setting some options.
 
Kozmess
just joined
Posts: 8
Joined: Sat Oct 29, 2016 12:40 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 1:50 am

miq wrote:
> OK. But why does this device work perfectly stably for a month with 200 users, using about 1-2% CPU, and then suddenly, within one second, CPU usage goes to 100%? It doesn't depend on traffic or pps. And I can reproduce the problem by setting some options.
Try to find some pattern. Maybe there is an error in a rule for one specific client.
In my situation it was all about rules using only IPs, so the MikroTik needs to analyze EACH packet going through. Even one rule (IP-specific only) makes it analyze ALL traffic and packets to find the IPs in their headers.
Another problem is a "loop" inside the router caused by wrong rules, so packets could be analyzed more than once.
P.S. Could you look at the profile while it's at 100%? Are there any specific anti-DDoS rules in the firewall?
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 11:53 am

Kozmess wrote:
>> OK. But why does this device work perfectly stably for a month with 200 users, using about 1-2% CPU, and then suddenly, within one second, CPU usage goes to 100%? It doesn't depend on traffic or pps. And I can reproduce the problem by setting some options.
> Try to find some pattern. Maybe there is an error in a rule for one specific client.
> In my situation it was all about rules using only IPs, so the MikroTik needs to analyze EACH packet going through. Even one rule (IP-specific only) makes it analyze ALL traffic and packets to find the IPs in their headers.
> Another problem is a "loop" inside the router caused by wrong rules, so packets could be analyzed more than once.
> P.S. Could you look at the profile while it's at 100%? Are there any specific anti-DDoS rules in the firewall?
When CPU usage goes to 100% I can't do anything; Winbox and the console freeze, and it happens within one second. And I did find a pattern: it happens when I do something with an address list and a jump target. So I think it may be some internal loop.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 1:13 pm

And it happened again. Another client opened thousands of connections to remote ports, CPU grew to 100%, and the console and Winbox froze. MikroTik, this device costs $1000. This shouldn't happen.
 
Kozmess
just joined
Posts: 8
Joined: Sat Oct 29, 2016 12:40 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 8:21 pm

miq wrote:
> And it happened again. Another client opened thousands of connections to remote ports, CPU grew to 100%, and the console and Winbox froze. MikroTik, this device costs $1000. This shouldn't happen.
Did you look at Profile? What's eating your CPU?
Could you post the config of that part?
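
For readers following along, the profiler being asked about is the /tool profile mentioned in the first post. A minimal sketch of the invocation, assuming the console stays responsive long enough to run it (which the thread suggests is not always the case):

```
# Show CPU usage per process class (firewall, queuing, ppp, ...) for every core.
/tool profile cpu=all
```

Leaving this running in a terminal during normal load gives a baseline to compare against when a spike starts.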
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 11:19 pm

Kozmess wrote:
>> And it happened again. Another client opened thousands of connections to remote ports, CPU grew to 100%, and the console and Winbox froze. MikroTik, this device costs $1000. This shouldn't happen.
> Did you look at Profile? What's eating your CPU?
> Could you post the config of that part?
As I wrote in my first message: "During this situation the console and Winbox were unavailable. Once I managed to see 80% firewall usage in /tool profile". It happens really quickly, and even if I catch it live I don't have time to do anything.

Here are all the rules in my /ip firewall filter:
 0    ;;; permanent ban
      chain=input action=drop src-address-list=block log=no log-prefix="" 

 1    ;;; Drop invalid connections
      chain=input action=drop connection-state=invalid log=no log-prefix="" 

 2    ;;; Allow limited pings
      chain=input action=accept protocol=icmp limit=/5s,2 log=no log-prefix="" 

 3    ;;; Drop excess pings
      chain=input action=drop protocol=icmp log=no log-prefix="" 

 4    ;;; Accept established connections
      chain=input action=accept connection-state=established log=no log-prefix="" 

 5    ;;; Accept related connections
      chain=input action=accept connection-state=related log=no log-prefix="" 

 6    ;;; Ssh accept list
      chain=input action=accept protocol=tcp src-address-list=ssh-accept dst-port=22 log=no log-prefix="" 

 7    ;;; Winbox access list
      chain=input action=accept protocol=tcp src-address-list=winbox-accept dst-port=8291 log=no log-prefix="" 

 8    ;;; Proxy, lack of payment
      chain=input action=accept protocol=tcp src-address=192.168.80.0/21 dst-port=81 log=no log-prefix="" 

 9    ;;; Proxy no payment
      chain=input action=accept protocol=tcp src-address=192.168.80.0/21 dst-port=82 log=no log-prefix="" 

10    ;;; Snmp 
      chain=input action=accept protocol=udp src-address=111.222.202.131 log=no log-prefix="" 

11 XI  ;;; test chain disabled
      chain=input action=log dst-address=!255.255.255.255 in-interface=bridge-vlan2 log=no log-prefix="DROP INPUT" 

12 XI  ;;; test chain disabled
      chain=input action=log dst-address=!255.255.255.255 in-interface=bridge-vlan10 log=no log-prefix="DROP INPUT" 

13    ;;; Drop everything else
      chain=input action=drop log=no log-prefix="" 

14    ;;; Permanent ban
      chain=forward action=drop src-address-list=block log=no log-prefix="" 

15    ;;; one of clients, has special chain. He has a lot of attacks
      chain=forward action=drop protocol=udp dst-address=192.168.82.155 dst-port=53 log=no log-prefix="" 

16    ;;; Special clients, has infected devices, and make lot of attacks
      chain=forward action=jump jump-target=client-rules src-address-list=client-rules log=no log-prefix="" 

17    ;;; connlimit
      chain=forward action=jump jump-target=block-connection connection-state=new src-address=192.168.80.0/21 log=no log-prefix="" 

18    ;;; connlimit
      chain=forward action=drop connection-state=new src-address-list=connlimit log=no log-prefix="" 

19    ;;; Antiddos
      chain=forward action=jump jump-target=block-ddos connection-state=new log=no log-prefix="" 

20    ;;; Antiddos
      chain=forward action=drop connection-state=new src-address-list=ddoser dst-address-list=ddosed log=no log-prefix="" 

21    ;;; SMTP connlimit
      chain=forward action=drop tcp-flags=syn connection-limit=5,32 protocol=tcp src-address=192.168.80.0/21 dst-port=25 log=yes log-prefix="SMTP connlimit" 

22    ;;;  SYN logging
      chain=forward action=log tcp-flags=syn protocol=tcp src-address=192.168.80.0/21 log=yes log-prefix="SYN-FORWARD" 

23    ;;; Forward to 111.222.202.64/26
      chain=forward action=accept src-address=0.0.0.0 dst-address=111.222.202.64/26 log=no log-prefix="" 

24    ;;; Forward to 111.222.202.64/26
      chain=forward action=accept src-address=111.222.202.64/26 dst-address=0.0.0.0 log=no log-prefix="" 

25 XI  ;;; spammers protection disabled when problems happen
      chain=forward action=jump jump-target=spammers protocol=tcp src-address=192.168.80.0/21 dst-port=25,578,465 log=no log-prefix="" 

26 XI  ;;; spammers protection disabled when problems happen
      chain=forward action=drop protocol=tcp src-address-list=spammers dst-port=25,465,587 log=no log-prefix="" 

27    ;;; Local dns access
      chain=forward action=accept protocol=udp src-address=192.168.80.0/21 dst-address=192.168.80.0/21 dst-port=53 log=no log-prefix="" 

28    ;;; Local dns access
      chain=forward action=accept protocol=udp src-address=192.168.80.0/21 dst-address=111.222.202.0/23 dst-port=53 log=no log-prefix="" 

29    ;;; Access to  111.222.202.0/23 from 192.168.80.0/21
      chain=forward action=accept protocol=tcp src-address=192.168.80.0/21 dst-address=111.222.202.0/23 log=no log-prefix="" 

30    ;;; Block cheaters
      chain=forward action=drop in-interface=bridge-vlan10 out-interface=bridge-vlan2 log=yes log-prefix="Block manual configuration" 

31    ;;; No payment - no internet
      chain=forward action=drop dst-address=!111.222.202.0/23 src-address-list=nopayment log=no log-prefix="" 

32    ;;; connlimit
      chain=block-connection action=return connection-limit=!2000,32 log=no log-prefix="" 

33    ;;; connlimit
      chain=block-connection action=add-src-to-address-list address-list=connlimit address-list-timeout=10m log=yes log-prefix="" 

34    ;;; Antiddos
      chain=block-ddos action=return dst-limit=100,100,src-and-dst-addresses/10s log=no log-prefix=""

35    ;;; Antiddos
      chain=block-ddos action=add-dst-to-address-list address-list=ddosed address-list-timeout=10m log=yes log-prefix="" 

36    ;;; Antiddos
      chain=block-ddos action=add-src-to-address-list address-list=ddoser address-list-timeout=10m log=yes log-prefix="" 

37 XI  ;;; spammers protection - return public address
      chain=spammers action=return src-address-list=public log=no log-prefix="" 

38 XI  ;;; spammers protection
      chain=spammers action=return connection-limit=!5,32 log=no log-prefix="" 

39 XI  ;;; spammers protection
      chain=spammers action=add-src-to-address-list address-list=spammers address-list-timeout=1d log=yes log-prefix="" 

40    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.82.240 dst-port=23 log=no log-prefix="" 

41    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.82.240 dst-port=2323 log=no log-prefix="" 

42    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.80.20 dst-port=23 log=no log-prefix="" 

43    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.80.20 dst-port=2323 log=no log-prefix="" 

44    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.80.160 dst-port=2323 log=no log-prefix="" 

45    ;;; Rule for special client
      chain=client-rules action=drop protocol=tcp src-address=192.168.80.160 dst-port=23 log=no log-prefix="" 

 
jarda
Forum Guru
Posts: 7603
Joined: Mon Oct 22, 2012 4:46 pm

Re: CCR1036-8G-2S+ - 100% cpu usage

Mon Oct 31, 2016 11:43 pm

You should not blindly apply a set of rules you saw written by someone many years ago. You have to think and create only the rules you really need. Also use the optimisations available in newer versions of ROS (fasttrack, raw table, multi-state rules). Look at the statistics and rethink the order, so that pointless rules at the beginning are not checked needlessly every time, and so on...
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Tue Nov 01, 2016 1:41 pm

jarda wrote:
> You should not blindly apply a set of rules you saw written by someone many years ago. You have to think and create only the rules you really need. Also use the optimisations available in newer versions of ROS (fasttrack, raw table, multi-state rules). Look at the statistics and rethink the order, so that pointless rules at the beginning are not checked needlessly every time, and so on...
These are only the rules I need. The only chain copied from the manual is the anti-DDoS one, and in my opinion it is placed in the right position. Except for the rules that filter DoS attacks from and to my clients, all rules show the expected stats. If you see some big mistake in these rules that causes the problem, please show me where my thinking is wrong.
 
jarda
Forum Guru
Posts: 7603
Joined: Mon Oct 22, 2012 4:46 pm

Re: CCR1036-8G-2S+ - 100% cpu usage

Tue Nov 01, 2016 1:53 pm

You have to check the rule processing statistics first. Dropping invalid packets and allowing pings before accepting established and related connections (which can be done in one rule) could be significant. Dropping in raw can help too. And fasttrack works miracles.
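
The "one rule" suggestion can be sketched roughly as follows. This is an illustration under assumptions, not the poster's configuration: the comments are made up, and whether a similar accept rule belongs at the top of the forward chain depends on whether later drop rules (such as the "no payment" rule) must still apply to connections that are already open.

```
/ip firewall filter
# One multi-state rule instead of separate "established" and "related" rules.
add chain=input action=accept connection-state=established,related comment="accept established,related"
# Invalid packets can be dropped right after it.
add chain=input action=drop connection-state=invalid comment="drop invalid"
# Per-rule packet and byte counters, to see which rules actually do the work.
print stats
```

Fasttrack is deliberately left out of the sketch: fasttracked connections bypass simple queues, which this setup relies on for per-client rate limits.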
 
Chupaka
Forum Guru
Posts: 8309
Joined: Mon Jun 19, 2006 11:15 pm
Location: Minsk, Belarus

Re: CCR1036-8G-2S+ - 100% cpu usage

Tue Nov 01, 2016 2:59 pm

I'd also check
/ip fi connection tracking print
looking at total-entries and max-entries

If there are many entries, you should probably move DDoS detection to the RAW firewall table.
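
A minimal sketch of what the raw-table part could look like, reusing the ddoser and ddosed list names from the first post. It only covers dropping already-listed pairs; the comment text is made up, and where the detection rules end up is left open.

```
/ip firewall raw
# Drop traffic between already-listed attacker/target pairs before connection tracking sees it.
add chain=prerouting action=drop src-address-list=ddoser dst-address-list=ddosed comment="antiddos raw drop"
```

Two caveats apply. The raw table runs before connection tracking, so connection-state matchers are not available there. Raw prerouting also sees packets before dst-nat, so for clients behind a dst-nat rule the destination is still the public address, while lists filled from the forward chain hold the translated one.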
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Tue Nov 01, 2016 6:36 pm

jarda wrote:
> You have to check the rule processing statistics first. Dropping invalid packets and allowing pings before accepting established and related connections (which can be done in one rule) could be significant. Dropping in raw can help too. And fasttrack works miracles.
I don't have a problem with input connections, only with forward. And I found this about fasttrack: "The idea is to "fasttrack" some specific machine without slowing it's traffic for processing. Let's say you have a network of users, you have firewall and queues for them. But then you have a VIP customer (or your own PC) that you will not filter or slow down, and you want the best available speed for it. This is the situation for fasttrack." (http://forum.mikrotik.com/viewtopic.php?t=96302#p479816). So I think fasttrack isn't for me.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Tue Nov 01, 2016 7:12 pm

Chupaka wrote:
> I'd also check
> /ip fi connection tracking print
> looking at total-entries and max-entries
>
> If there are many entries, you should probably move DDoS detection to the RAW firewall table.

Good idea (raw table), and I will do it. But as I wrote, when the problem occurs the console and Winbox become unavailable within one second. And I think the number of connections is not the problem; please look at the attachments. Today something happened right before 2 pm: I see client disconnections in the log, and gaps in the graphs. Right after that, entries about blocked DDoS appear in the log, up to 5 pm. Only at the beginning was something wrong with the router; after the first shock it handled all connections without any problems.
[Attachments: conntrack.png, mem.png, cpu.png]
And finally: whenever I caught the problem live, conntrack was never too big. That's why I'm looking at the order of rules in the anti-DDoS chains.
 
miq
Frequent Visitor
Topic Author
Posts: 63
Joined: Fri Nov 06, 2009 3:18 am

Re: CCR1036-8G-2S+ - 100% cpu usage

Wed Nov 02, 2016 10:14 pm

Chupaka wrote:
> I'd also check
> /ip fi connection tracking print
Chupaka, right after my last post I moved the rule:
chain=forward action=drop connection-state=new src-address-list=ddoser dst-address-list=ddosed log=no log-prefix=""
before the address list checking. I did the same with the connlimit chain. Since then I have had 10 connlimit and 11 DDoS actions. Maybe it's a coincidence, or the attacks weren't so big, but I haven't had any problems with the router. As I said before, I think there is some internal problem with address lists and a lot of connections in a short time.

Some packet stats (SYN flag only) after my changes:
connlimit - drop actions - 232k
connlimit - add to block list - 11
antiddos - drop actions - 5k
antiddos - add to ddoser/ddosed - 6.

So I think it's better to block a host for good and re-check it once in a while than to check it on every packet. It shouldn't be a huge problem, but as I said, I think there is some performance problem with address lists.
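
One reading of the reordering described above, sketched with the rules from the earlier filter listing. The exact final positions were not posted, so the order shown is an assumption: the cheap address-list drops come first, and only traffic that survives them reaches the detection chains.

```
/ip firewall filter
# 1) Cheap checks first: hosts already on a list are dropped immediately.
add chain=forward action=drop connection-state=new src-address-list=connlimit
add chain=forward action=drop connection-state=new src-address-list=ddoser dst-address-list=ddosed
# 2) Only the remaining new connections are evaluated by the detection chains.
add chain=forward action=jump jump-target=block-connection connection-state=new src-address=192.168.80.0/21
add chain=forward action=jump jump-target=block-ddos connection-state=new
```

With this order, a host that is already listed no longer goes through connection-limit or dst-limit evaluation for each new connection during the 10-minute list timeout.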
