Since the last upgrade to v6.47, my Dude client crashes on a number of devices when I mouse over them in the Map view (charts start to appear, then the client disconnects and starts reconnecting).
The same thing happens when I try to view those devices’ details / settings screen. Other devices are clean and work normally.
It appears to be related to some of the devices’ recorded chart data: if I go into the “services” view, where no data is enumerated, remove all the services on a device and then rediscover them, that device is fixed and works fine (at least for a bit).
I tried taking a Dude backup and using the sqlite3 DB’s “obj” INSERT statements on a clean, new Dude DB (as described in https://wiki.mikrotik.com/wiki/Manual:The_Dude_v6/Malformed_db_repair) to remove all of the historical chart data, but the problem returned shortly after, even with all the historical data removed and ‘fresh’ devices created in the DB.
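For anyone repeating that export step, a minimal Python sketch of keeping only one table's INSERT statements from a plain-text sqlite3 dump could look like this. The table name `objs`, the sample dump lines and the `filter_inserts` helper are assumptions for illustration, so check the wiki page and your own dump for the real table name.

```python
def filter_inserts(dump_lines, table):
    """Return only the INSERT statements that target `table`.

    `dump_lines` is any iterable of SQL text lines, for example the
    output of the sqlite3 shell's .dump command read from a file.
    """
    # A dump may or may not quote the table name, so accept both forms.
    # The trailing space keeps "objs" from matching "objs_other".
    prefixes = (
        f'INSERT INTO "{table}" ',
        f"INSERT INTO {table} ",
    )
    return [line for line in dump_lines if line.startswith(prefixes)]


# Hypothetical dump content; table names and values are placeholders.
sample_dump = [
    "BEGIN TRANSACTION;",
    "CREATE TABLE objs (id integer primary key, obj blob);",
    "INSERT INTO \"objs\" VALUES(1,X'00');",
    "INSERT INTO \"chart_values_raw\" VALUES(1,2,3);",
    "INSERT INTO objs VALUES(2,X'01');",
    "COMMIT;",
]

kept = filter_inserts(sample_dump, "objs")
```

The kept lines can then be replayed into a fresh database, which leaves the chart history tables empty.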
I’ve also tried playing with the rollup values (Raw keep time / 10 min / 2hr / 1 day rollup times) but it doesn’t seem to have had any effect.
I have used the Dude for about 4 or 5 years now, and used to have a lot of historical data and a somewhat developed configuration for my environment. But given the recent problems I’m considering finding other monitoring solutions. I already trashed the old backup, so historical data doesn’t matter now, but I’d really like the Dude client to stop crashing so that I can continue using it normally again.
Reported as well. Hope they will look into it. The current state simply means I can’t use it at all, and I don’t know whether I should start looking for something else or not. Even with its small bugs and lack of development, TheDude was a much friendlier monitoring system than any other I have tried.
edit: According to the support reply [SUP-20571], they were able to reproduce it (not surprising) and they will look into it.
I believe I narrowed the problem down to certain Data Sources. I have a lot of SNMP data sources, and I think the recorded data sometimes gets botched, which is probably the cause.
I found a workaround: since I have autodiscover profiles covering 99% of the services, I can remove all of the services from the Services view and let them be autopopulated when discovery runs.
I added the remaining services back in by hand.
So far the clean slate seems to be working. I suspect it’s a matter of time until the crashes start again. I’ll try and see if I can narrow it down to a particular service, then. It definitely seems to be SNMP or more granular data, rather than the latency times.
So, the problem came back an hour or two later. But it wasn’t consistently reproducible, just random / sporadic. And then some devices disappeared from the map. And some images, too. And then it seems like I started to hit every other bug and edge case that I’ve ever read about.
Deleting every service and adding them back in seems to work for a little while.
It does seem to have something to do with data from SNMP.
RouterOS devices always seem to break first in the Dude, then high-latency devices, and then others. Windows devices break less often than Linux or RouterOS ones, but they will still do it, too.
We are having the same issue. Roll back to 6.46.3 and the issue goes away; anything above 6.46.3 has this issue. I have been fighting this for 2 weeks, and for now we are just using the older Dude that can’t read routers…
The Dude was broken in 6.46.4 when they changed the Dude stuff, and it no longer functions properly.
I have the same problem as the OP. It exists from 6.47 up to 6.47.2. The Dude client returns a variety of errors:
connection closed
operation failed: 10053
operation failed: 10054
After a downgrade to 6.46.6 it works like it should again. I also narrowed it down to drawing charts/service-history from an old SNMP data source. I hope it will be fixed soon.
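For reference, the two numeric codes in those messages match the standard Windows socket (Winsock) error numbers: 10053 is WSAECONNABORTED and 10054 is WSAECONNRESET. A small lookup sketch in Python follows; the `describe` helper and its output format are made up for illustration and are not part of the Dude.

```python
# Standard Winsock error numbers matching the codes in the client messages.
WINSOCK_ERRORS = {
    10053: ("WSAECONNABORTED", "software caused connection abort"),
    10054: ("WSAECONNRESET", "connection reset by peer"),
}


def describe(message):
    """Translate an 'operation failed: <code>' line into a readable hint."""
    prefix = "operation failed: "
    if not message.startswith(prefix):
        return message  # e.g. "connection closed" is passed through as-is
    code_text = message[len(prefix):].strip()
    if not code_text.isdigit():
        return message
    name, meaning = WINSOCK_ERRORS.get(
        int(code_text), ("unknown", "not in table")
    )
    return f"{message} ({name}: {meaning})"


hints = [describe(m) for m in (
    "connection closed",
    "operation failed: 10053",
    "operation failed: 10054",
)]
```

Both codes only say that the TCP connection to the server went away. That fits the disconnect/reconnect behaviour described in this thread, but it does not point at a cause by itself.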
After much research and testing it almost exclusively seems to be related to cpu/memory/disk/virt SNMP data sources, and possibly RouterOS ones, too.
I rolled back the Dude server to LTS (6.45.9), and had to switch all the RouterOS-integrated devices to SNMP and move anything monitored by remote agents to the local Dude server, and things seem to be OK now.
Would really like to see this addressed and some attention given to The Dude in an upcoming release…
We hope that we have managed to resolve this problem. Unfortunately, the upgrade is required not only on the server but also on all of the monitored devices.
All of the upcoming RouterOS public releases will contain this fix.
Best regards
Mārtiņš S.
I asked what is meant by upgrading all monitored devices (I have many Linux servers monitored via SNMP, which cause crashes too) and got this reply:
Hello
Yes, all of the RouterOS powered devices must be upgraded - Dude server and all of the monitored devices that are running RouterOS.
Best regards
Mārtiņš S.
That does not make much sense. Why upgrade all RouterOS boxes when the problem is obviously on the Dude side? But okay.
I didn’t ask about this detail, but I assume they are just pointing out that any TheDude Agent (any RouterOS device) must be the same version as the TheDude Server.
That is a known requirement for TheDude Agents, but fortunately, if you don’t use Agents, you don’t need to upgrade monitored devices.
I am pretty sure “all of the monitored devices that are running RouterOS” was just poorly worded, because the TheDude client crashes even when I look at a non-RouterOS device (so it has nothing to do with the monitored device itself).
Obviously, once the update is released, I will test it and report back on how it actually behaves.
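The agent requirement mentioned above (agents on the same version as the server) is easy to check with a small script. This is only a sketch: the inventory dictionary, the host names and the `mismatched_agents` helper are invented for illustration, and how you collect the version strings from your devices is up to you.

```python
def parse_version(text):
    """Turn a RouterOS-style version such as '6.47.2' into (6, 47, 2).

    Plain numeric versions only; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def mismatched_agents(server_version, agent_versions):
    """Return the names of agents whose version differs from the server's."""
    wanted = parse_version(server_version)
    return sorted(
        name
        for name, version in agent_versions.items()
        if parse_version(version) != wanted
    )


# Hypothetical inventory; names and versions are placeholders.
agents = {
    "agent-a": "6.45.9",
    "agent-b": "6.47.2",
    "agent-c": "6.45.9",
}

mismatched = mismatched_agents("6.45.9", agents)
```

Anything listed in `mismatched` would need to be brought to the server's version before it is used as an agent. Devices that are only polled over SNMP are not covered by this check.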