inteq
Member Candidate
Topic Author
Posts: 281
Joined: Wed Feb 25, 2015 8:15 pm
Location: Romania

Dude causing massive packet loss/disruption of service

Fri Jun 11, 2021 4:37 pm

Hello,

Had some issues with a lot of RouterBOARDs causing internet service disruption and massive packet loss.
Randomly, the router would not be accessible for 10-30 seconds. No interface flapping is logged.
Even weirder, a subnet behind the router also loses connectivity when this happens.
A netwatch on the router pinging the ISP gateway every second logs problems at the exact time I am unable to access the location remotely.
I have tried several RB1100AHx4 units running The Dude and finally moved to CHR in a VM on a Supermicro server with an Intel Xeon, ECC RAM and Intel 10 Gb/s NICs.
At first I thought my ISP had issues, seeing that so many MikroTik routers and even CHR behaved the same.
For a time I decided not to pay attention to this problem, until I decided to disable The Dude.
Lo and behold, the packet loss and service disruptions stopped.
Kept The Dude disabled for one week = no problems at all.
Enabled The Dude again for one week = several issues a day for the whole week.
Disabled The Dude again = no problems at all.
Now it seems very clear to me that The Dude is the cause of all these issues, but at the same time very few people are reporting it. Only one other user, to be precise.
I have tried limiting the number of monitored devices, increasing the polling interval, and monitoring only ICMP without any other services, all without success.
The Dude database is only 8 MB, so very small.
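To put numbers on the "Dude enabled" versus "Dude disabled" weeks, the failed netwatch probes could be grouped into outage windows and counted per day. The following is only a minimal Python sketch: it assumes the failure timestamps have already been exported from the netwatch log into `datetime` objects, and the function names and the 2-second gap are hypothetical choices, not anything taken from RouterOS or The Dude.

```python
from datetime import datetime, timedelta


def group_outages(failures, max_gap=timedelta(seconds=2)):
    """Merge failed-probe timestamps into (start, end, failed_count) windows.

    Two failures belong to the same outage when they are at most
    max_gap apart (assumed value, tune it to the probe interval).
    """
    windows = []
    for ts in sorted(failures):
        if windows and ts - windows[-1][1] <= max_gap:
            start, _, count = windows[-1]
            windows[-1] = (start, ts, count + 1)
        else:
            windows.append((ts, ts, 1))
    return windows


def outages_per_day(windows):
    """Count outage windows by the calendar day on which they started."""
    counts = {}
    for start, _, _ in windows:
        counts[start.date()] = counts.get(start.date(), 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data: two bursts of failed 1-second probes.
    t0 = datetime(2021, 6, 11, 16, 0, 0)
    sample = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 3, 600, 601)]
    for start, end, count in group_outages(sample):
        print(start, end, count)
```

Comparing the per-day counts for the weeks with and without The Dude gives a figure that can be attached to a support ticket instead of an impression.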

I am thinking that if this were a Dude issue, a lot more topics would show up in a search. Then again, I cannot see any other possibility, besides maybe the ISP seeing a lot of ICMP at once, considering it a threat and temporarily disabling the connection (which they do not admit to).

Is anyone else having massive packet loss while using The Dude?
 
KayBur
just joined
Posts: 18
Joined: Thu Apr 29, 2021 3:33 pm
Location: Springfield

Re: Dude causing massive packet loss/disruption of service

Mon Jun 14, 2021 12:55 pm

Maybe you need to update The Dude. A very strange problem. Have you noticed The Dude freezing before the packet loss starts?
 
loloski
Frequent Visitor
Posts: 71
Joined: Mon Mar 15, 2021 9:10 pm

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 1:58 am

I monitor fewer than 15 devices with The Dude, all of them through SNMP, on an internet connection of a few gigs, and so far I haven't seen the issues you describe. I hate to admit it, but The Dude does sometimes trigger a false positive reporting that my distribution switch towards the OLT is down, most likely because of a missed SNMP poll; no real issue was observed. My advice: if you suspect this could be an upstream issue, deploy Cacti, Zabbix, Icinga or any other NMS you are comfortable with, just for a quick comparison.
 
eddieb
Member Candidate
Posts: 224
Joined: Thu Aug 28, 2014 10:53 am
Location: Netherlands

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 9:43 am

As The Dude just polls devices and does not change anything, it is very unlikely that Dude monitoring is the real problem.
Devices "doing strange things" because they are polled might cause problems, and the root cause must be found there.
To start, send all your syslog to a syslog server and try to analyse what happens.
Perhaps an STP issue or something else?
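Once the logs are collected centrally, one way to analyse them is to list which log events fall shortly before or during each disruption. The snippet below is only a rough Python sketch under stated assumptions: the outage windows and the syslog entries are assumed to be already parsed into `datetime` values, and the function name, the 30-second lookback and the sample messages are all hypothetical.

```python
from datetime import datetime, timedelta


def events_near_outages(outages, events, lookback=timedelta(seconds=30)):
    """For each (start, end) outage, collect log messages seen shortly before or during it.

    outages: list of (start, end) datetimes
    events:  list of (timestamp, message) tuples from the syslog server
    """
    result = []
    for start, end in outages:
        nearby = [msg for ts, msg in events if start - lookback <= ts <= end]
        result.append(((start, end), nearby))
    return result


if __name__ == "__main__":
    # Hypothetical sample data, not real RouterOS log lines.
    t0 = datetime(2021, 6, 15, 9, 0, 0)
    outages = [(t0, t0 + timedelta(seconds=20))]
    events = [
        (t0 - timedelta(seconds=10), "placeholder event A"),
        (t0 + timedelta(minutes=5), "placeholder event B"),
    ]
    for window, msgs in events_near_outages(outages, events):
        print(window, msgs)
```

If the same kind of message keeps appearing in front of every outage, that is the device or protocol to look at first.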

Is The Dude running on a dedicated monitoring device or on the same device you use as router/gateway?

I have been running The Dude on a dedicated monitoring device (an RB750Gr3 now) for a couple of years and have not seen problems like yours.
Most hiccups in my network are caused by SNMP traffic not arriving because of some UDP congestion ...
Running 6.48.3 (stable) on :
CCR1009-8G-1S (2x ipsec/l2tp site-to-site, ipsec/l2tp roadwarrior, dhcpd, dns), CRS125-24G-1S, RB1100, RB962UiGS-5HacT2HnT (10pc), RB931-2nD, RB951, RB750GL ,RB2011UAS-RM, PWR-LINE-AP, RBwAPGR-5HacD2HnD, RB750Gr3 running dude
 
inteq
Member Candidate
Topic Author
Posts: 281
Joined: Wed Feb 25, 2015 8:15 pm
Location: Romania

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 10:49 am

The Dude is running on the router. Physical, VM, makes no difference.
 
eddieb
Member Candidate
Posts: 224
Joined: Thu Aug 28, 2014 10:53 am
Location: Netherlands

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 12:08 pm

Did you look at the resource usage on that device?

By the way, I never run 2 totally different functions on 1 device if that device is critical for production ...
A router is critical; a monitor should run on a different device.
 
inteq
Member Candidate
Topic Author
Posts: 281
Joined: Wed Feb 25, 2015 8:15 pm
Location: Romania

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 1:09 pm

Yes, CPU and RAM are barely used.
No comment regarding "I never run 2 totally different functions on 1 device".
 
eddieb
Member Candidate
Posts: 224
Joined: Thu Aug 28, 2014 10:53 am
Location: Netherlands

Re: Dude causing massive packet loss/disruption of service

Tue Jun 15, 2021 1:23 pm

In that case, create a supout.rif directly after such a disruption and file a ticket with support@mikrotik.com ...
 
KayBur
just joined
Posts: 18
Joined: Thu Apr 29, 2021 3:33 pm
Location: Springfield

Re: Dude causing massive packet loss/disruption of service

Thu Jun 24, 2021 12:59 pm

eddieb wrote:
"Did you look at the resource usage on that device?
By the way, I never run 2 totally different functions on 1 device if that device is critical for production ...
A router is critical; a monitor should run on a different device."

By the way, that is a good point. Maybe the device does not have enough power to handle all the processes after all. You need to either stop unnecessary processes or split them across a couple of devices.
