Routing/arp problem [solved]

I have a relatively standard Mikrotik setup where the VPN connections appear as interfaces with /32 addresses. The internal machines are on a bridge spanning ether3-ether5, with the upstream directly connected to ether1.

I’m not sure whether my problems started with some configuration changes on our part or with my provider changing some internal parameters. As the problem seems to start happening with the 4

I used to be able to reach the internal machines from the VPN, and to some extent I still can, but the bridge is now often unreachable from the VPN. Every time I try to ping one of the internal addresses from the VPN (beyond the 192.168.XX.1 gateway, which works), an entry is added to the ARP table on the upstream interface, like:

[admin@MikroTik] > /ip arp print where address~"XX" 
Flags: X - disabled, I - invalid, H - DHCP, D - dynamic, P - published, C - complete 
 #    ADDRESS         MAC-ADDRESS       INTERFACE                                                                                             
 0 DC 192.168.XX.NNN SO:ME:MA:CA:DD:RE ether1
 1 DC 192.168.XX.NNN SO:ME:MA:CA:DD:RE ether1
 2 DC 192.168.XX.NNN SO:ME:MA:CA:DD:RE ether1

SO:ME:MA:CA:DD:RE is the MAC address of my upstream router, which seems to have a greedy proxy-arp set up. I have since set “secure-redirects” to no, to see whether the Mikrotik was redirecting the internal machines to use the VPN gateway directly, which is not reachable from them. No difference.

Now, I don’t understand why the Mikrotik is even doing ARP on the upstream interface. The error is driving me crazy and I need to fix it.

It seems to be erratic but sticky, as if it depended on some hardware acceleration or initialization order: typically some of the VPN connections work and others don’t.

The routes seem to be right, as far as the limited ways of querying the router can tell:

[admin@MikroTik] > /ip route check 192.168.XX.NNN
     status: ok
  interface: <l2tp-user>
    nexthop: 192.168.XX.NNN

(the result does not change for any router IP address that we use in src-ip)

I would need either a way to avoid the arp lookup or a way to fix the effective routing, as it is randomly impeding my VPN<->internal traffic, which is the primary reason why we use a VPN.

It is very difficult to debug the packet flow, but it looks like the Mikrotik sometimes, instead of using the route that /ip route check returns (see above), tries to send those packets through its own default route, where the sticky proxy-arp eats them :(
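
One way this suspicion could be checked from the router itself is the built-in packet sniffer. A minimal sketch, assuming the upstream port is ether1 as described above; 192.168.XX.NNN is the same anonymized placeholder used in this post, not a real address:

# Show packets to or from the VPN client address that cross the upstream port.
# Traffic meant for the VPN showing up here would mean it is leaving via the
# default route instead of the l2tp interface.
/tool sniffer quick interface=ether1 ip-address=192.168.XX.NNN

If nothing shows up on ether1 while the ping is running, the packets are being lost somewhere else and the routing itself is probably not the culprit.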

When I ping from one of the “unreachable” machines (ZZ is the bridge network, XX the VPN one), I get:

[admin@MikroTik] > /system ssh address=192.168.ZZ.NNN user=root command="ping -c 4 192.168.XX.NNN"    
password:
PING 192.168.XX.NNN (192.168.XX.NNN) 56(84) bytes of data.

--- 192.168.XX.NNN ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms


Welcome back!

YY.NNN is one machine in the internal network, unreachable from the VPN but perfectly reachable from the router. It has only one default route:

[admin@MikroTikParc] > /system ssh address=192.168.YY.NNN user=root command="ip route; echo; ip route get 192.168.XX.NNN" 
password:
default via 192.168.YY.1 dev eth0 
192.168.YY.0/24 dev eth0  proto kernel  scope link  src 192.168.YY.NNN

192.168.XX.NNN via 192.168.YY.1 dev eth0  src 192.168.YY.NNN 
    cache 

Welcome back!

Any help would be welcome.

Regards,
Santiago

I’d check the routes, because it doesn’t make sense: why would the router even try to send ARP requests for 192.168.XX.NNN to the upstream ether1 (where, as I understand it, this address definitely isn’t)?

After carefully ruling out everything else, I found the root cause. I’m explaining it here to help others:

In my original post I simplified the description of the problem to avoid swamping you with data.
We really have dual upstreams here, and I was using the solution Dual WAN Load-Balancing with Fail-over.
This solution uses two mangle rules to “mark” the connections, taken from mikrotik book blog:

/ip firewall mangle
add chain=prerouting dst-address-type=!local in-interface=bridge per-connection-classifier=both-addresses-and-ports:2/0 action=mark-connection new-connection-mark=WAN1_conn passthrough=yes
add chain=prerouting dst-address-type=!local in-interface=bridge per-connection-classifier=both-addresses-and-ports:2/1 action=mark-connection new-connection-mark=WAN2_conn passthrough=yes
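
To see which flows these rules are catching, one can list the tracked connections that carry the marks. This is only a sketch: it assumes connection tracking is enabled, and 192.168.XX is the anonymized VPN network from this thread, to be replaced with the real prefix:

# Connections opened from the VPN network that ended up marked for WAN1.
/ip firewall connection print where connection-mark=WAN1_conn src-address~"192.168.XX"
# Same check for the second mark.
/ip firewall connection print where connection-mark=WAN2_conn src-address~"192.168.XX"

Connections between the VPN and the bridge should not appear in either list, since they are not meant to go out through a WAN link.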

The problem is the !local selector: as is typical of Mikrotik setups, the VPN client addresses are not local addresses of the router, so the VPN<->bridge traffic matched these rules and was sent down the WAN* routing tables. It cost me quite a bit of cleanup and tracing time to understand it. The problem was compounded by my not being on site and not being too familiar with Mikrotik; otherwise I could have plugged into the “upstream” switch and debugged it with Wireshark or similar, or would have known how to use the internal tools properly.
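
For anyone hitting the same thing, one possible way to keep VPN-bound traffic out of the load balancing is to exclude the VPN network from both rules. This is a sketch, not necessarily the exact change applied here; it assumes the VPN clients all live in 192.168.XX.0/24 (the /24 mask is an assumption, and XX is the placeholder used throughout this thread):

/ip firewall mangle
# Same PCC rules as above, with an extra matcher so that packets from the
# bridge towards the VPN network are never given a WAN connection mark.
add chain=prerouting dst-address-type=!local dst-address=!192.168.XX.0/24 in-interface=bridge per-connection-classifier=both-addresses-and-ports:2/0 action=mark-connection new-connection-mark=WAN1_conn passthrough=yes
add chain=prerouting dst-address-type=!local dst-address=!192.168.XX.0/24 in-interface=bridge per-connection-classifier=both-addresses-and-ports:2/1 action=mark-connection new-connection-mark=WAN2_conn passthrough=yes

With several internal or VPN networks, an address list matched with dst-address-list=!<list name> may be easier to maintain than one dst-address per rule.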

Thanks to everyone who took the time to think about it.