To give you an exact answer, I would have to see the complete output of the following commands, with passwords and public addresses replaced by bogus ones to keep things secure:
/ip ipsec export
/ip address export
/ip firewall export
/ip route export
Even without that, there are several things which are very different in Mikrotik’s implementation of IPsec as compared to other VPN implementations, and even to other IPsec implementations.
Other VPNs usually use a virtual interface with its own IP configuration to represent the tunnel. So if you want to send a packet through the tunnel, you set up a route through a gateway in the IP subnet of that virtual interface. Received packets which came through the tunnel reach the firewall twice: once still encrypted, marked with the physical interface as the incoming interface and carrying the “outer” source and destination sockets, and a second time already decrypted, marked with the virtual interface as the incoming interface and carrying the “inner” source and destination sockets.
In Mikrotik’s IPsec implementation, there is no virtual interface, so even a packet which came through an IPsec tunnel is still marked with the physical interface as its incoming interface when it is offered to the firewall rules after decryption. So by the incoming interface alone, you cannot distinguish between a packet which came through that interface unencrypted (e.g. a DNS response from a public DNS server) and a packet which came through the same physical interface encrypted, encapsulated into ESP which was further encapsulated into UDP, and reached the firewall after decapsulation and decryption. To discriminate between these two cases, you need to match on the ipsec-policy attribute of the packet, which is given as a direction plus a policy type: “in,ipsec” matches the decrypted packets and “in,none” matches the other ones. It may seem that you could use the “inner” addresses to identify decrypted packets instead, but security-wise that is a bad idea, because a packet arriving unencrypted with spoofed addresses would match too. The ipsec-policy attribute is there on purpose.
So I suspect that your existing firewall rules at some stage drop all packets coming in through the internet-facing interface which didn’t match any of the “accept” rules preceding the final “drop the rest” one, and that includes the decrypted IPsec packets. By adding the rule you have mentioned before that “drop the rest” rule, you provide an exception for the decrypted IPsec packets which come in through the same interface. Although it is not displayed in the configuration export, the default action of a firewall rule is “accept”.
The difference between pinging the Mikrotik’s own IP and other IPs in the same subnet is that incoming packets towards its own IPs are handled by the firewall chain “input”, while incoming packets towards other IPs are handled by the firewall chain “forward”.
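As a rough sketch of such exception rules, covering both chains: the interface name ether1 is only a placeholder for your internet-facing interface, as I haven’t seen your configuration. Rules added like this land at the end of each chain, so they have to be moved above the “drop the rest” rules (or added with place-before) to have any effect.

```
# "ether1" is an assumed name of the internet-facing interface
/ip firewall filter
# decrypted IPsec packets addressed to the router's own IPs
add chain=input action=accept in-interface=ether1 ipsec-policy=in,ipsec comment="decrypted IPsec to router"
# decrypted IPsec packets to be forwarded to other hosts
add chain=forward action=accept in-interface=ether1 ipsec-policy=in,ipsec comment="decrypted IPsec to LAN"
```

The action=accept is spelled out here only for readability; as said above, it is the default and will not show up in the export.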
To avoid further surprises once you get past the “pinging phase”, I’d recommend reading carefully chapter 16.5.4 of the IPsec Wiki manual - https://wiki.mikrotik.com/wiki/Manual:IP/IPsec#NAT_and_Fasttrack_Bypass. In short:
- while ICMP is not subject to fasttracking, TCP and UDP connections are, and fasttracking collides with IPsec policy matching, so you have to prevent fasttracking of the packets which are to be handled by IPsec; to compensate for the resulting slowdown a bit, you may want to exclude them from connection tracking altogether.
- while packets sent from the IP address of the internet-facing interface itself are not source-NATed (there is no point in doing that), packets forwarded through that interface typically are. The problem is that IPsec policy matching takes place only after any source-NAT operation has been applied. So for the to-be-encrypted packets routed out through the internet-facing interface, you have to set up an exception from source-NAT, so that the IPsec policy can recognize them and encrypt and encapsulate them before they are actually sent out.
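The two points above could look roughly like this, in the spirit of the example in the manual chapter linked above. The subnets 192.168.10.0/24 (local) and 192.168.20.0/24 (remote) are made up for illustration and have to be replaced by the ones from your own IPsec policy. The place-before=0 assumes that the NAT table already contains at least one rule (typically the masquerade one).

```
# assumed subnets: 192.168.10.0/24 is local, 192.168.20.0/24 is remote
/ip firewall raw
# exclude traffic matching the IPsec policy from connection tracking,
# which also keeps it from being fasttracked
add chain=prerouting action=notrack src-address=192.168.10.0/24 dst-address=192.168.20.0/24
add chain=prerouting action=notrack src-address=192.168.20.0/24 dst-address=192.168.10.0/24
/ip firewall nat
# exempt to-be-encrypted traffic from source NAT; this rule has to sit
# above the masquerade/src-nat rule, hence place-before=0
add chain=srcnat action=accept src-address=192.168.10.0/24 dst-address=192.168.20.0/24 place-before=0
```

Keep in mind that packets excluded from connection tracking show up in the filter rules with connection-state=untracked, so if your “accept” rules only list established and related, you would probably have to add untracked there as well.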