Disappearing routers

I keep having routers drop off the face of the earth… well, not really, but both Nagios & The Dude start reporting them as down/unreachable.

This example uses a Cisco 3560 and an RB532A, though it happens randomly on a wide variety of RouterBOARD hardware (333, 433, 411, 133, 532A) and RouterOS versions (3.0, 3.30, 4.11, 4.16, 4.17).

From the switch/router/thingie (Cisco 3560):

sylwt-sw1#sh ip arp | inc a895
Internet  172.17.87.74            7   0015.6d63.a895  ARPA   Vlan543
sylwt-sw1#ping 172.17.87.74

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.17.87.74, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

From the MT unit (I used another unit to get into it):

[admin@LFK_AP2_AP3] > /ip arp pr
Flags: X - disabled, I - invalid, H - DHCP, D - dynamic 
 #   ADDRESS         MAC-ADDRESS       INTERFACE                             
 0 D 172.17.87.72    00:15:6D:63:9B:F7 vlan543                               

[admin@LFK_AP2_AP3] > /ip route pr
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          172.17.87.65       1       
 1 ADC  172.17.87.64/27    172.17.87.74    vlan543            0
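
Side note for anyone comparing the two ARP tables: the Cisco prints MACs as 0015.6d63.a895 and the MT prints them as 00:15:6D:63:9B:F7, which makes them awkward to match by eye. Here is a minimal sketch in Python (my choice of language, and normalize_mac is just a helper name I made up) that reduces both notations to the same form. Note these two entries belong to different units (.74 and .72), so they are not expected to match; the MT table above also has no entry for the gateway, 172.17.87.65.

```python
import re

def normalize_mac(mac: str) -> str:
    """Return a MAC as 12 lowercase hex digits, whatever separators it used."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

# Values copied from the outputs above.
cisco_entry = "0015.6d63.a895"        # 172.17.87.74, from "sh ip arp" on the 3560
mikrotik_entry = "00:15:6D:63:9B:F7"  # 172.17.87.72, from "/ip arp pr" on the MT

cisco_mac = normalize_mac(cisco_entry)
mikrotik_mac = normalize_mac(mikrotik_entry)

print(cisco_mac)                          # 00156d63a895
print(mikrotik_mac)                       # 00156d639bf7
print(cisco_mac == mikrotik_mac)          # False: two different units
print(cisco_mac[:6] == mikrotik_mac[:6])  # True: same first three octets
```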

Now, let’s ping the network address:

[admin@LFK_AP2_AP3] > /ping 172.17.87.64
172.17.87.64 ping timeout
172.17.87.73 64 byte ping: ttl=255 time=1 ms
172.17.87.77 64 byte ping: ttl=255 time=2 ms
172.17.87.68 64 byte ping: ttl=255 time=4 ms

I get responses from 3 of the 4 switches (all of these are 2960) on this VLAN, but none from the gateway (and oddly enough, no response from any of the 9 other MT routers).
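
To be clear about what I am pinging: 172.17.87.64 is the network address of the /27 from the route table, not a host. Some IP stacks treat that all-zeros host address like an old-style broadcast, which would explain why several devices answer one ping; that part is my assumption, not something the output proves. A quick sketch with Python's ipaddress module (variable names are mine) to double-check the addressing:

```python
import ipaddress

# Values copied from the route table and ping output above.
net = ipaddress.ip_network("172.17.87.64/27")
gateway = ipaddress.ip_address("172.17.87.65")
router = ipaddress.ip_address("172.17.87.74")
ping_target = ipaddress.ip_address("172.17.87.64")
responders = ["172.17.87.73", "172.17.87.77", "172.17.87.68"]

print(net.network_address)                 # 172.17.87.64
print(net.broadcast_address)               # 172.17.87.95
print(ping_target == net.network_address)  # True: the target is the network address
print(gateway in net, router in net)       # True True

# Every device that answered sits inside the same /27.
all_inside = all(ipaddress.ip_address(r) in net for r in responders)
print(all_inside)                          # True
```

So the addressing itself checks out: the gateway, this router, and everything that answered are all in 172.17.87.64/27.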

Let’s ping the gateway directly:

[admin@LFK_AP2_AP3] > ping 172.17.87.65
172.17.87.65 64 byte ping: ttl=255 time=3 ms
172.17.87.65 64 byte ping: ttl=255 time=2 ms

Yay! It works! My network is NOT broken. Let’s ping it again!

[admin@LFK_AP2_AP3] /ip arp> /ping 172.17.87.64
172.17.87.73 64 byte ping: ttl=255 time=3 ms
172.17.87.77 64 byte ping: ttl=255 time=3 ms
172.17.87.65 64 byte ping: ttl=255 time=4 ms
172.17.87.68 64 byte ping: ttl=255 time=4 ms

OK, now we get a response from all 4 Cisco switches on the VLAN…

Anyone know what’s up with this?

Thanks!

-T

BUMP

I’m surprised nobody else has seen this before.