SNMP issues on some routers

Hi,

I have various RB2011s, Metal 5SHPns, NetMetals, and CRS125s. Of all my devices with nearly identical configurations (deployed using Ansible, so the chance of user error is at least lower), I have one Metal and two RB2011s that won't respond to SNMP; I just get timeouts. Torch on those devices shows the request packets arriving and no packets being sent back out. The remote machine can, however, SSH to those routers just fine, and I can also query SNMP on devices adjacent to the three that don't work.
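
(For reference, the torch check was something along these lines, exact interface name assumed:
/tool torch interface=ether1-local protocol=udp port=161
and it showed the incoming UDP/161 packets but nothing going back out.)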

Here’s an export from the Metal:

[ryan_turner@sec3.hil] > /snmp export
# dec/07/2015 22:36:58 by RouterOS 6.33.1
# software id = LTVD-TT50
#
/snmp community
set [ find default=yes ] addresses=44.34.128.0/21 name=hamwan
/snmp
set contact="#HamWAN on irc.freenode.org" enabled=yes
[ryan_turner@sec3.hil] > /ip firewall export
# dec/07/2015 22:37:44 by RouterOS 6.33.1
# software id = LTVD-TT50
#
/ip firewall mangle
add action=change-mss chain=output new-mss=1378 protocol=tcp tcp-flags=syn \
    tcp-mss=!0-1378
add action=change-mss chain=forward new-mss=1378 protocol=tcp tcp-flags=syn \
    tcp-mss=!0-1378

And here’s the timeout from the remote:

root@monitor:/var/log/prometheus# snmpwalk -v1 -chamwan sec3.hil.memhamwan.net 1.3.6.1.4.1.14988.1.1
Timeout: No Response from sec3.hil.memhamwan.net

Corresponding sniff:

[ryan_turner@sec3.hil] > /tool sniffer quick interface=ether1-local port=snmp
INTERFACE             TIME    NUM DI SRC-MAC           DST-MAC           VLAN
ether1-local         1.897      1 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
ether1-local         2.912      2 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
ether1-local         3.901      3 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
ether1-local         4.904      4 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
ether1-local         5.911      5 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
ether1-local         6.908      6 <- 4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E
[ryan_turner@sec3.hil] > /interface ethernet print
Flags: X - disabled, R - running, S - slave
 #    NAME        MTU MAC-ADDRESS       ARP        MASTER-PORT      SWITCH
 0 R  ether1...  1500 00:0C:42:6E:6C:1E enabled    none             switch1

So… what gives? I’m stumped as to why this isn’t working.

Either there is some firewall rule blocking the traffic, or the source address is not in the network specified for the SNMP community.
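
(For the second possibility, a quick way to check on the router itself, using the names from your export, would be:
/snmp community print detail
and compare the "addresses" field against the source IP your queries come from.)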

It’s definitely within the allowed address range:

root@monitor:/etc/grafana# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0c:29:07:fb:1f
inet addr:44.34.128.171 Bcast:44.34.128.191 Mask:255.255.255.224

In my previous post I showed the firewall settings, as well as sniffer output showing that traffic was in fact being received. Even if there were a firewall in between filtering traffic going from sec3.hil to monitor, the local sniffer on sec3.hil should still have seen the reply packets being sent out.

Still stumped by this… anywhere else to check? Any other services I should try?

When you did the sniff, did you also examine one of the packets to see what the IP source address was?
Your posting only shows src/dst MAC addresses.

Basically, one of three things is happening:
1. The MikroTik is discarding the SNMP requests on their way up the IP stack during ingress
2. The MikroTik’s SNMP service is ignoring the requests / failing to process them
3. The SNMP replies are being discarded on their way down the IP stack during egress (or are being mis-routed due to an IP routing issue)
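
One way to narrow that down, assuming there is nothing else in your firewall, is to add a pair of non-blocking log rules and watch /log print while you run the walk:
/ip firewall filter add chain=input protocol=udp dst-port=161 action=log log-prefix="snmp-in"
/ip firewall filter add chain=output protocol=udp src-port=161 action=log log-prefix="snmp-out"
If you see snmp-in but never snmp-out, the requests reach the IP stack but the SNMP service never generates a reply; if snmp-out shows up while the sniffer still shows nothing leaving, the replies are being generated and then lost on the way out (routing or egress).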

Are you trying to do control-plane isolation with VRFs on these devices? I’ve had problems trying this when the control plane is on anything other than the default VRF.

Here’s a full sniff with the IPs; they match what they’re supposed to be:

[ryan_turner@sec3.hil] > /tool sniffer quick interface=ether1-local port=snmp
INTERFACE                                                                             TIME    NUM DIR SRC-MAC           DST-MAC           VLAN   SRC-ADDRESS                         DST-ADDRESS                         PROTOCOL   SIZE CPU
ether1-local                                                                         0.821      1 <-  4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E        44.34.128.171:38138                 44.34.128.101:161 (snmp)            ip:udp       86   0
ether1-local                                                                         1.824      2 <-  4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E        44.34.128.171:38138                 44.34.128.101:161 (snmp)            ip:udp       86   0
ether1-local                                                                         2.829      3 <-  4C:5E:0C:89:7E:BF 00:0C:42:6E:6C:1E        44.34.128.171:38138                 44.34.128.101:161 (snmp)            ip:udp       86   0

[ryan_turner@sec3.hil] > /ip address print
Flags: X - disabled, I - invalid, D - dynamic
 #   ADDRESS            NETWORK         INTERFACE
 0   44.34.128.101/28   44.34.128.96    ether1-local
 1   44.34.128.145/28   44.34.128.144   wlan1-gateway
 2   44.34.128.102/28   44.34.128.96    vrrp1

Not doing anything with VRF:

[ryan_turner@sec3.hil] > /ip route vrf export
# dec/08/2015 18:08:03 by RouterOS 6.33.1
# software id = LTVD-TT50
#

Please note that when packets are dropped by a firewall rule, they still appear in the packet sniffer output!
So their appearance in that output does not prove your firewall is OK.
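
(If you want to rule the firewall out from the router's side, the per-rule counters are more useful than the sniffer, e.g.:
/ip firewall filter print stats
/ip firewall mangle print stats
Any rule silently eating the replies would show its packet counter climbing while you run the walk.)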

OK, I understand that, but in my first post I showed the /ip firewall export; there aren’t any rules except the two change-mss mangle rules.

Strange problem; I can’t reproduce it here on any MikroTik equipment (including an RB2011 and some access points).

Any other advice or things to look into? I guess I’ll have to make duplicates of these units and then go to the field and swap them out…

In other threads about mysterious problems I have sometimes read the advice to export the configuration, then reset the
router to defaults with the initial config taken from that file (stored on the router's local flash).
Not that I would dare to do that on a remote router without experience… :)
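
Roughly, and with the file name only as an example, the sequence would be something like:
/export file=known-good
/system reset-configuration no-defaults=yes keep-users=yes run-after-reset=known-good.rsc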

Yeah, one is easy to get to, but the other is only accessible every few months. I’ll just have to buy spares and swap them out. Very disappointed that this has happened.

Maybe you can try that method on the accessible one, and if it succeeds and fixes the problem, try it on the other one.
In principle it can be done remotely; the only problem, of course, is that the router is down if something goes wrong,
whereas right now it probably works but just cannot be monitored.

In the meantime I have done some of those configuration resets on remote routers, and while they went through
without problems, they never solved my (different) issues; after the reset the situation was exactly the same
as before, which is what you would expect, of course.