Crazy routing problem.

Hello,
I have three RB750Gs set up to route to each other. For some reason I can’t reach the last subnet in the chain of routers until I log in to a remote machine and start pinging out to the Internet. Here is a drawing:

Internet <--> main layer3 switch <--> RB750G B1 <--> RB750G B2 <--> RB750G B3

Each link is connected with a /30 subnet with dedicated ports.
B1 provides a NAT’d subnet on one of its ports.
B2 also provides a NAT’d subnet on one of its ports, along with a normal public subnet.
B3 is similar to B2 in that it has a NAT’d subnet and a public subnet.

For some reason hosts in B3’s subnet cannot be reached by traceroute/ping from the outside world. The traceroute stops at B2! If I connect a machine to B3 and start pinging the outside world, then the traceroute works!!! Why would B2 care whether the machine is pingable? Shouldn’t B3 worry about that?

All RB750Gs have their master-ports set to ‘none’.

I have all the routing information statically configured. No routing protocols. Traceroute seems to show that I don’t have any mistakes in my routing tables. What could be wrong here?

Could you post the following information?

/ip address export
/ip route export
/ip firewall filter export

I removed all the default filter configuration on all the routers. Here is the configuration on the last router (B3). The other routers are configured similarly, except they have two /30 links instead of one.

[admin@comstock] > /ip address export
# jan/03/1970 22:24:52 by RouterOS 5.2
# software id = JXNK-LP90
#
/ip address
add address=xx.xx.60.66/30 disabled=no interface=to-wilshireterrace network=\
    xx.xx.60.64
add address=xx.xx.61.1/25 disabled=no interface=cablenet network=xx.xx.61.0
add address=192.168.88.1/24 disabled=no interface=admin network=192.168.88.0
add address=172.21.1.1/24 disabled=no interface=cablenet network=172.21.1.0
add address=172.21.2.1/24 disabled=no interface=cablenet network=172.21.2.0
[admin@comstock] > /ip route export
# jan/03/1970 22:24:52 by RouterOS 5.2
# software id = JXNK-LP90
#
/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=xx.xx.60.65 scope=\
    30 target-scope=10
[admin@comstock] > /ip firewall filter export
# jan/03/1970 22:24:54 by RouterOS 5.2
# software id = JXNK-LP90
#
[admin@comstock] >

What about on B2…could you show me the output of

/ip route print where 38.119.61.0 in dst-address
/ip route export

Also, you said if you start a ping from a host connected off of B3 then the traceroutes into the 38.119.61.0/25 subnet magically start working?

Yes. I also tried to set up multiple IPs on a single host, but only the main IP was pingable.

I take that back, I had a wrong mask on the additional IP.

It seems kind of flaky; now all the IPs are pinging. I will reboot the machines and see if they come back up and still ping.

Okay… I confirmed it. I added another IP address in the 61.0/25 subnet and it was not pingable until I pinged outward. It is as if there is some sort of stateful firewall somewhere. But I have filters turned off. Is there somewhere else I should be looking? This is driving me nuts. I wonder if the other routers are doing the same thing, but since there is a lot of traffic on those routers, the symptom is not showing up. B3 is a new network with almost no traffic coming from it.

Wild guess, and I do not know what would cause it, but check the ARP table before and after the outbound ping. Maybe the router for some reason can’t ARP for the destination IP because the ARP request doesn’t get answered. When the host pings out, it ARPs for the router instead, and the router then finally learns the mapping.
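A rough sketch of that check on B3, assuming RouterOS 5.x and the interface names from the address export above. B2's interface names are not shown in this thread, so the equivalent check there would need its own B3-facing interface name.

```
# On B3: ARP entries on the /30 link toward B2, before the test
/ip arp print where interface=to-wilshireterrace
# Hosts in the public and NAT'd subnets are learned on cablenet
/ip arp print where interface=cablenet
# Now ping outward from a host behind B3, then run both prints again
# and compare which entries are new.
```

If an entry only appears after the outbound ping, that would suggest ARP requests from the router are not being answered until the other side speaks first.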

Nope. ARP entry was added.
I just found out another thing. The incoming ping from the Internet starts working when I ping the B2 (!) /30 link. So maybe the problem is with the B2 router, not the B3 router. The B2 router is also pretty clean. No filtering rules. Only a NAT rule on the building subnet port (not the /30 link to the building).

What would cause the B2 router to decide to start forwarding packets to B3? Especially since it is already forwarding packets to the B3 /30 link.

Here is the dump on those commands:

[admin@wilshireterrace] /ip route> /ip route print where xx.xx.61.0 in dst-address
Flags: X - disabled, A - active, D - dynamic, 
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, 
B - blackhole, U - unreachable, P - prohibit 
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 0 A S  0.0.0.0/0                          xx.xx.60.69        1       
 4 X S  ;;; Comstock building
        xx.xx.61.0/25                     xx.xx.60.66       1       
[admin@wilshireterrace] /ip route> /ip route export
# jan/03/1970 12:33:26 by RouterOS 5.2
# software id = JHMD-045I
#
/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=xx.xx.60.69 scope=\
    30 target-scope=10
add comment="Comstock building" disabled=yes distance=1 dst-address=\
    xx.xx.61.0/25 gateway=xx.xx.60.66 scope=30 target-scope=10
add comment="Comstock NAT" disabled=yes distance=1 dst-address=172.21.1.0/24 \
    gateway=xx.xx.60.66 scope=30 target-scope=10
[admin@wilshireterrace] /ip route>

I have been able to make it work, but it is a total hack. I set up an IPIP tunnel between B1 and B3, and the route works without any issues.

Do you happen to see anything in the configuration that I missed? This is driving me crazy. It appears one of the routers has some sort of connection tracking firewall turned on. That should not be the case since I removed all the filter rules. I should have a plain router.

/ip firewall connection tracking set enabled=no

Have you done that on the router?

Isn’t that required for any sort of NAT? I NAT a block of IPs at each building.

You’re right. It is needed for NAT. I only skimmed your post and suggested it because you seemed interested in disabling it.
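For reference, connection tracking and the NAT rules can be inspected without switching anything off. This is only a sketch, assuming the RouterOS 5.x menu paths:

```
# Show whether connection tracking is enabled, plus its timeouts
/ip firewall connection tracking print
# List the NAT rules; src-nat/masquerade rules depend on connection tracking
/ip firewall nat print
# List the connections currently being tracked
/ip firewall connection print
```

With no filter rules present, connection tracking on its own should not drop forwarded traffic, so output like this mostly helps rule the firewall out.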

I’m curious, why are those two routes disabled on B2? They would be the ones that route the traffic to B3.

They’re turned off right now because I’m using an IPIP tunnel from B3 to B1 to get around this crazy problem.

Is there any way you could post your entire configs? It’s difficult to see what the problem may be with the information we’ve seen so far.

You could contact me outside of this site via telephone if you wish. I’m also currently in the #mikrotik chat on irc.freenode.net if you want to PM me there.

Thanks to Blake for his time today. The problem has been solved.

He suggested turning on WDS for the wireless link between B2 and B3. These were new radios (not MikroTik ones) that were recently added to the mix. Although they were talking over a /30 subnet, the radios were one MAC address short. This is when he suggested turning on WDS. So we had an Ethernet-level (wireless) problem, not the IP-level problem I thought I had. Once WDS was turned on, all the MACs that should be there appeared in the radios and things magically started working!
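For anyone chasing a similar symptom, here is one way to spot it from the routers themselves. It is a sketch only, assuming RouterOS 5.x and assuming the missing MAC shows up as a mismatch in the neighbor's ARP table (the thread does not say exactly how it was found):

```
# On B3: note the MAC address of the port facing the radio
/interface ethernet print
# On B2: see which MAC address is recorded for B3's /30 address
/ip arp print
```

If the MAC that B2 holds for B3's /30 address is missing, or belongs to a radio rather than to B3's port, the wireless link is probably not passing Ethernet frames transparently, and enabling WDS (or the radio vendor's equivalent transparent bridging mode) is worth trying.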