When a remote device is down, Dude keeps polling it to the point where we see near packet storms on our Trango APs. Perhaps the polling algorithm needs some work to avoid this? For example, when things are running normally, there are at most about 20 packets per second of SNMP traffic on a particular outgoing interface at one of our tower locations. When a remote device connected to one of the four access points on this interface goes down, the polling never gives up. Instead it seems to get more aggressive, to the point of over 2000 packets per second going out the interface to a device that is offline. This behavior often causes my Trango APs to lock up. Any suggestions?
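One way to picture the kind of polling change being asked for here is exponential backoff: each consecutive failed poll doubles the wait before the next attempt, up to a cap, and a single success resets it. The Python below is only a minimal sketch of that idea, not how Dude actually schedules its probes; the function names, the 30-second base interval, and the one-hour cap are all assumptions made for illustration.

```python
def next_poll_interval(consecutive_failures, base_interval=30.0, max_interval=3600.0):
    """Return the number of seconds to wait before polling a device again.

    All names and default values here are hypothetical, chosen for illustration.
    """
    if consecutive_failures <= 0:
        # Device answered the last poll: use the normal interval.
        return base_interval
    # Double the interval per consecutive failure. The exponent is clamped
    # so the multiplication cannot overflow for very long outages.
    exponent = min(consecutive_failures, 32)
    return min(base_interval * (2 ** exponent), max_interval)


def simulate(poll_results, base_interval=30.0, max_interval=3600.0):
    """Walk a sequence of poll outcomes (True means the device answered)
    and return the interval chosen after each poll."""
    failures = 0
    intervals = []
    for answered in poll_results:
        failures = 0 if answered else failures + 1
        intervals.append(next_poll_interval(failures, base_interval, max_interval))
    return intervals
```

With a scheme like this, a device that stays down is polled less and less often instead of more and more, so the traffic aimed at it through the AP shrinks during an outage rather than growing.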
What type of queries are you running to the AP? Which AP is it (5830/2500/900)? Dude shouldn’t be able to lock up an AP. I’d let David in Trango Tech Support know about the AP lockup with SNMP. There are some polling knobs to tweak in the latest V3 Beta that I had asked for.
However, your question about Dude should be addressed. I even see SNMP queries sent to devices that don’t have any SNMP probes (v2.2).
I have spoken to David about this. His opinion is that we need to identify and fix the source of “packet storms” in our network. The symptom in the Trango AP is the “pipe” filling up with undeliverable packets. This is IMHO something that an AP should not allow to happen, but nevertheless…
The queries to the AP, in this case only pings, are not the issue. The problem occurs when another device, which is accessible THROUGH the Trango AP, goes down. At this point, Dude is polling the device, but of course the packets can’t be delivered. Those packets wind up clogging the Trango AP “pipe”. Monitoring the outbound interface to which these APs are connected shows the high packet rates, with Dude as the source and the down device as the destination.
It’s difficult to understand the behavior of either Dude or the Trango APs in this situation. Why would the AP allow its buffer to fill with undeliverable packets to the point of near or even complete lockup? Why would Dude attempt to poll an unreachable device at such a high rate? Some combination of these two seems to cause a runaway situation. Probably the answer is that both need some “code tweaks” to break out of the loop…
I’ve ordered remote reboot devices as a workaround. I have also reduced the impact by queuing SNMP packets, but it still happens too often. Thanks for your help.
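The idea behind queuing or rate-limiting the SNMP traffic can be sketched as a token bucket: packets toward the tower are only allowed out while tokens remain, and tokens refill at a fixed rate. This is a rough Python illustration of the technique, not the actual queue configuration used on this network; the class name and parameters are hypothetical, and the 20 packets per second figure is simply borrowed from the normal rate quoted in the first post.

```python
class TokenBucket:
    """Hypothetical limiter: allow at most `rate` packets per second on
    average, with bursts of up to `burst` packets."""

    def __init__(self, rate, burst):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens the bucket can hold
        self.tokens = float(burst)   # start with a full bucket
        self.last = None             # timestamp of the previous call

    def allow(self, now):
        """Return True if a packet may be sent at time `now` (in seconds)."""
        if self.last is not None:
            # Refill according to the time elapsed since the previous call.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: 2000 packets offered in the same instant, limited to 20.
bucket = TokenBucket(rate=20, burst=20)
sent_in_burst = sum(bucket.allow(0.0) for _ in range(2000))
```

In real use the timestamp would come from something like `time.monotonic()`. A limiter like this only caps what reaches the AP; it does not stop the poller from generating the packets in the first place, which matches the experience that queuing reduces the impact without removing it.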
We’ve been running Dude with a hundred Trango nodes on a thousand-node wireless network. We definitely noticed how Dude would and could “take out” Trango devices. It cost us a bunch in false alarms and truck rolls… To avoid taking our nodes down, we have pared Dude back to only ICMP echo/ping monitoring. We can’t monitor any other SNMP services or it will take the Trango nodes down. Sure as night and day.
It’s interesting to hear that it is a flaw in the Trango AP for accepting non-deliverable SNMP packets. We have “struggled” with Trango over the years as they have released premature or non-existent product lines and announcements, and especially as they went back and forth between outside sales and factory sales. Not to mention their week-long RMA authorization qualification before you can send back a part.
Needless to say, we are busy replacing our Trango nodes with faster, “less spooky” Mikrotik/RouterBoard products.
Are you using any Atlas M5010 Backhauls? We are having serious issues with them at the moment when trying to do anything other than ICMP Echos. I can crater a M5010 pretty quickly. The older 5830APs and Tlinks seem to do fine.
We struggled with Trango equipment for the best part of 2 years. After all the problems, firmware fixes that actually made things worse, equipment crashes, and advertised ranges that we never came close to achieving, we finally caved in, performed a $260K overhaul, and went to Motorola Canopy.
It was the best choice we ever made.
If Trango could ever get their act together, they might at some point have a reliable product.
Thanks George. I have been working with Trango for the past couple of weeks, giving them access to a couple of Atlas BHs in our lab. It doesn’t take long to lock them up with simple snmpgets without any data traffic flowing (the units aren’t even associated… opmode off). They can’t repro the problem in house though.