CCR BUG: All ports flap simultaneously with mangle route mark

I use peerapp and cachemara transparent caching solutions. One is basically a squid and the other is more like an IDS.
I have 4 sites running these caches, all with a CCR1016. Recently I upgraded one site from an RB1100AHx2 (which was working fine) to a CCR and got the problem there too.

I set up firewall mangle rules that route-mark all TCP traffic on port 80 and ports above 1024, so it follows a default route to the cache server.
The problem is that all running interfaces flap at the same time (go to down state). After a few seconds they come up again; some take a little longer. It happens at irregular intervals, and the more traffic, the more often: usually once an hour at 300-500 Mbit/s. No other errors appear in the log and there are no drops or errors on the interfaces. CPU load is low, about 2-5%.

I am unable to use my cache servers because of this bug. I will try upgrading one site to a 1036 to see if it still happens.

Today I tried disabling autonegotiation on my cache server interface (MikroTik side). After some time the interface went down, and traffic stopped going to the cache and went to the gateway instead (the cache route uses check ping). Although no redirection was taking place, the CCR still flapped. If I disable the mangle rules, the CCR stops flapping.
I was on ROS 6.1 and am now on 6.2, with the same problem. It happens with both cache servers (peerapp and cachemara).

Cachemara Site M:
4 chain=prerouting action=mark-routing new-routing-mark=cache passthrough=yes
protocol=tcp src-address-list=cache in-interface=!ether12
dst-port=80,2710,3310,5869,6500,6969

5 chain=prerouting action=mark-routing new-routing-mark=cache passthrough=yes
protocol=tcp dst-address-list=cache in-interface=!ether12
src-port=80,2710,3310,5869,6500,6969


Peerapp Site T:
4 X ;;; peer out
chain=prerouting action=mark-routing new-routing-mark=cache
passthrough=yes protocol=tcp src-address-list=cache in-interface=!ether18
dst-port=80,1024-65535

5 X ;;; peer in
chain=prerouting action=mark-routing new-routing-mark=cache
passthrough=yes protocol=tcp dst-address-list=cache in-interface=!ether18
src-port=80,1024-65535
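
For readers following along, the rules above only set the routing mark; the matching route is not shown in the post. A minimal sketch of what such a route could look like in RouterOS v6 syntax follows. The gateway address 192.0.2.10 is a placeholder, not the poster's actual cache server address, and the comment text is illustrative only.

```
# Assumed example: default route used only by traffic marked "cache".
# 192.0.2.10 is a placeholder for the cache server address.
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.10 \
    routing-mark=cache check-gateway=ping comment="to cache server"
```

With check-gateway=ping, the route becomes inactive when the cache server stops answering, and marked traffic falls back to the main routing table.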

Tried to underclock the CCR. Tried to use address lists and fixed src/dst. Changed interfaces, changed caching servers. Nothing works.

Please help, I don't know what to do anymore.

Hi.

I have the same issue with a CCR1016-12G. I have about 200-300 Mbit/s, and suddenly all traffic on all interfaces goes down for a few seconds and then returns to normal.

I have a web proxy on ether12 and some APs, servers and routers on other ports.

I use a routing mark to redirect the HTTP traffic to the proxy server.

I tried 6.0, 6.1 and 6.2 and have the same issue. The previous RB1100AHx2 worked fine.

I blamed the proxy server, but if you have the same issue with a CCR, the problem could be there.

M.

Are you storing the proxy cache in the NAND, on a microSD, or on a USB disk? Have you tried the latest v6.3?

Seems to me they both have external proxy servers.

They are external proxies.

M.

I am having random port flaps too. All interfaces except the one connected to my main internet connection. Interestingly there is almost exactly 7 days in between flaps. I had some mangle rules that were marking packets to see netflix and youtube traffic. I have deleted those in order to see if my ports no longer flap.

7 days later and only one port flapped instead of all of them. The port that flapped is directly connected to my monitoring computer.

Same problem with version 6.4. Three interfaces are active; all of them go down at the same time and come up again after 1-3 seconds.
The problem starts when traffic goes over 400 Mbit/s, and it happens at irregular intervals.
The CCR is running BGP, mangle and NAT.

Please note that I did not have this problem in version 6.1. It started after upgrading to version 6.4.

Same problem here! I have a CCR1016-12G with RouterOS v6.4. It started flapping today.
As a precaution I disabled my three mangle rules and rebooted the device. I will monitor the behavior now and hope this will not happen again.

EDIT: OK, deactivating my mangle rules hasn't helped! Now I will disable the SNMP service.


Best regards,
Patrick

Since I disabled the SNMP service, the interface flapping has gone.
I will let you know if it happens again.

best regards,
Patrick
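
For anyone who wants to try the same workaround, a minimal sketch of turning SNMP off from the RouterOS terminal is shown here. It assumes the default SNMP configuration and is not taken from Patrick's router.

```
# Disable the SNMP service, then print the settings to confirm enabled=no.
/snmp set enabled=no
/snmp print
```

If external monitoring depends on SNMP polling, it will stop receiving data while the service is off, so this is better treated as a diagnostic step than a permanent fix.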

I’m still having random single port flaps on one of the ports. I will try to disable SNMP as well and see what happens.

I am seeing the same thing. I have not caught it often enough to generate a support file and send it off. It happens randomly. My CCR is pushing close to 400 Mbit/s at peaks. I have disabled SNMP as well to see if we have any further complications. I updated to 6.4 last night.
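
A rough sketch of how a flap could be captured for a support ticket follows. It assumes link state changes are logged under the interface topic and that the log message contains the text "link down", which is the usual RouterOS wording but is not confirmed anywhere in this thread; the file name flap-supout.rif is only an example.

```
# List logged link-down events to find out when the flaps occurred.
/log print where topics~"interface" and message~"link down"

# Generate a support output file soon after a flap (example file name).
/system sup-output name=flap-supout.rif
```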

Same here. As soon as we cross 300mbit, it starts to flap non stop. Does not happen on x86 platform. Using 6.6 and it is worse than 6.5.

Hi All,

We have been replacing all our 1100x2’s with CCR’s… same issue on a few CCR’s… random ALL PORTS FLAP. We replaced the flapping CCR’s with new CCR’s…same issue. I’m sorry to say but I’m losing my faith in MT… This issue needs to get fixed. We are running 6.6 on all our CCR’s. Nothing has changed from 1100x2 to CCR deployment…except ROS version, 1100x2 was on 5.28.

v6.7 should improve this situation, please upgrade and report back, thanks!

OK, we will try. Thank you.

I can definitely confirm that v6.7 fixed port flapping for a number of our customers' CCR1016s :)

Yes… we confirm too.

We are up to 6.11 and still have flapping issues on our 1016-12G. Port 9 goes to a Cisco switch (gigabit); port 6 is my laptop (gigabit), which I use for monitoring. This CCR has almost no traffic moving over it: it has a 100 Mb internet connection and only 2 computers on it at the moment, doing web browsing.

CCR1016-12S-1S+
5 x S-RJ01 modules
60 mangle rules
27 NAT rules
1 PPPoE client connection
9 L2TP client connections
Versions 6.45.8, 6.46.5, 6.47beta60.
ALL interfaces go down and come back up after 1-2 seconds.
2 to 6 times per day
CPU utilization 1-2%

For 6 months I have been unable to get rid of the interface flapping.
As a test, I transferred the configuration to a CRS236-24G-2S+.
The processor was heavily loaded, but no ports went down.
I bought a new CCR1009-7G-1C-1S+.
Exactly the same problem.
I contacted MikroTik support. The ticket is still open. There is no solution or help so far.
Please help.

https://prnt.sc/s5p0ap
In addition, all of these routers report the connector type of this module incorrectly as LC. It should be RJ45.

https://prnt.sc/s5p5bq
For example, a 2011 detects the S-RJ01 correctly and works fine.
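
The connector type shown in the screenshots can also be read from the terminal. This is a sketch only: sfp1 is an assumed interface name and should be replaced with the port that actually holds the S-RJ01 module.

```
# Read the SFP module details once; the output includes the
# sfp-connector-type field that the router reports for the module.
/interface ethernet monitor sfp1 once
```

Running the same command on a router that detects the module correctly gives a side-by-side comparison that could be attached to the support ticket.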