We are migrating our clients from a Linux router (dhcp + iptables + iproute + ipmark and a few other strange names) to a CCR1036-8G-2S+ with PPPoE. At the moment about 200 clients have been migrated to the MT, with about 200 Mbps flowing through it.
We create a profile and a matching secret for each client:
/ppp profile add name=10.10.84.150 local-address=10.10.87.250 remote-address=network-pppoe idle-timeout=5m use-mpls=default use-compression=default use-encryption=default only-one=yes change-tcp-mss=default use-upnp=default rate-limit=1126k/16896k dns-server=22.214.171.124,126.96.36.199
/ppp secret add name=amakh1 service=pppoe password=amakh1 profile="10.10.84.150" remote-address=10.10.84.150 comment=10.10.84.150
The firewall has 55 filter rules (access, connlimit and a DDoS trap) and 54 NAT rules (src/dst NAT for public addresses). There are no mangle, L7 or other rules.
At the moment some of our clients have problems with infected cameras, DVRs and other devices, which can open millions of connections at the same time. On the Linux router we have a few rules using the connlimit and recent modules, and that is enough: the router only logs information about blocked connections.
Yesterday we had this situation with clients connected to the CCR1036-8G-2S+ via PPPoE. Within one second CPU usage grew to 100%, and after a minute the router was rebooted by the watchdog. It happened 8 times in 10 hours. During these episodes the console and Winbox were unavailable. Once I managed to see 80% usage by firewall in /tool profile.
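For anyone who wants to compare numbers, these are the kind of commands that can be run during a spike, as long as the console still responds. This is only a sketch of a diagnostic session, not output from my router:

```
# Per-process CPU usage (where I saw firewall at 80%)
/tool profile
# Number of entries currently in the connection tracking table
/ip firewall connection print count-only
# Connection tracking settings and limits
/ip firewall connection tracking print
# Overall CPU load, uptime and memory
/system resource print
```

If the connection count jumps sharply at the same moment the firewall share in /tool profile goes up, that would point to the connection flood rather than to the queues.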
Because I knew about the infected devices, I created two traps. Connlimit:
chain=forward action=jump jump-target=block-connection src-address=10.10.80.0/21 log=no log-prefix=""
chain=forward action=drop src-address-list=connlimit log=no log-prefix=""
chain=block-connection action=return connection-limit=!2000,32 log=no log-prefix=""
chain=block-connection action=add-src-to-address-list address-list=connlimit address-list-timeout=10m log=yes log-prefix=""
DDoS trap:
chain=forward action=jump jump-target=block-ddos connection-state=new log=no log-prefix=""
chain=forward action=drop connection-state=new src-address-list=ddoser dst-address-list=ddosed log=no log-prefix=""
chain=block-ddos action=return dst-limit=100,100,src-and-dst-addresses/10s log=no log-prefix=""
chain=block-ddos action=add-dst-to-address-list address-list=ddosed address-list-timeout=10m log=yes log-prefix=""
chain=block-ddos action=add-src-to-address-list address-list=ddoser address-list-timeout=10m log=yes log-prefix=""
These rules work great, but they do not help when 10 clients make thousands of connections: the MikroTik goes down.
First I checked the RouterOS version. It was old (about 6.33.rc.4), so I upgraded to the latest stable. That helped, but only partially: CPU still climbs to 80-99%, although the router stays alive.
Second, I set change-tcp-mss to no in all profiles, and that looks like a good move, because the router has now been running for 14 hours without a reboot or high CPU usage.
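A change like this can be applied to every profile in one command. A minimal sketch, assuming all PPP profiles on the router should be changed (a bare [find] matches every profile, including the built-in default ones):

```
# Set change-tcp-mss=no on all PPP profiles
/ppp profile set [find] change-tcp-mss=no
# List the profiles to verify the new value
/ppp profile print
```

I assume sessions that are already connected keep their old setting until they reconnect, so the effect may not be visible immediately.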
I know simple queues are not such a good idea, but I read that in RouterOS 6 simple queues perform better, and with this router 200 queues shouldn't be a problem. So what is happening? What can I do to eliminate this problem? The Linux router can handle this problem (attackers) with 1000 users and 1 Gbps, so why can't the MikroTik with only 200 users and 200 Mbps? Maybe it's not because of the connections, but for some other reason?
I can't replace simple queues with HTB in a short time, because the MT is integrated with our customer panel.
Thanks for any ideas.