Configuration - 2 CCR1009-8G-1S connected via an IPIP tunnel with IPSec (MD5, AES256-cbc). The tunnel has been working well for 3 months without a problem - speed is OK, and it’s been very stable.
Issue - The tunnel suddenly stopped passing data (no config change, no adjustments). The connection shows that it’s running. Addresses from either side of the tunnel are showing up in Torch, but data is only transmitted on each router’s IP tunnel interface, not received. Ex.:
Site 2 Torch - IP Tunnel interface (Src addresses are from Site 1 LAN, Dest are from Site 2 LAN)
Src          Dest        TX Pkt  RX Pkt
10.10.0.28   10.0.0.5    1       0
10.10.1.53   10.0.0.5    1       0
10.10.1.49   10.0.0.30   1       0
Site 1 Torch - IP Tunnel interface (Src addresses are from Site 2 LAN, Dest are from Site 1 LAN)
Src          Dest        TX Pkt  RX Pkt
10.0.0.5     172.16.0.2  1       0
10.0.0.15    10.10.0.80  1       0
Troubleshooting steps:
verified basic connectivity between sites - ping, telnet, SSH
verified firewall settings
verified IPSec settings, changed IPSec settings (the SAs install and increment Current Bytes in both directions)
updated to latest bugfix software on both sides of the tunnel
tunnel passes data with encryption disabled.
I’ve engaged the ISP, but haven’t gotten much info back yet. They’re looking at a possible routing issue. Traceroute shows the same address twice from Site 2 to Site 1, but not vice versa. It seems to me, though, that if basic connectivity works, then the tunnels should too - what am I missing?
with encryption - a ping from a device on address 10.10.0.5 to 10.0.0.5 for example fails
without encryption - same ping works.
enabling IPSec logging on both ends - no errors
monitoring ‘Installed SAs’ in the IPSec window:
– Site 2 shows the inbound and outbound entries incrementing
– Site 1 shows only the outbound entries incrementing - this may indicate that packets are flowing from Site 1 to 2 but not from 2 to 1.
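The reasoning behind that reading of the counters can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout and the status strings are made up for the example, and the True/False values are the ones reported above, not anything read from the routers.

```python
def diagnose(site_a, site_b, sa):
    """Classify each direction of an IPsec tunnel from SA byte counters.

    sa maps a site name to {"out": bool, "in": bool}, where True means the
    current-bytes value of that site's outbound/inbound SA keeps increasing.
    (Hypothetical layout, chosen for this example only.)
    """
    results = {}
    for src, dst in ((site_a, site_b), (site_b, site_a)):
        sent = sa[src]["out"]      # src is encrypting and sending ESP
        received = sa[dst]["in"]   # dst is receiving and decrypting ESP
        if sent and received:
            status = "ok"
        elif sent:
            status = "lost in transit"
        elif received:
            status = "inconsistent counters"
        else:
            status = "not sent"
        results[(src, dst)] = status
    return results


# Counters as described in this post.
observed = {
    "site1": {"out": True, "in": False},
    "site2": {"out": True, "in": True},
}
for direction, status in diagnose("site1", "site2", observed).items():
    print(direction, status)
```

With the values from this post, the Site 1 to Site 2 direction comes out as "ok" and Site 2 to Site 1 as "lost in transit", which matches the suspicion that the ESP packets from Site 2 never reach Site 1.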
One further wrinkle - when the VPN is active (encrypted), Site 2 is showing a constant stream of dropped packets from an address 2 hops upstream from Site 1 (not the gateway, but the next device in a traceroute).
I recreated your scenario in a lab and I am not running into any issues. One thing I did differently: all of the route gateways are set to their respective interface rather than to an IP address.
i.e.:
[admin@MikroTik] > ip ipsec proposal print
Flags: X - disabled, * - default
name="default" auth-algorithms=md5 enc-algorithms=aes-256-ctr lifetime=30m pfs-group=none
[admin@MikroTik] > interface ipip print
Flags: X - disabled, R - running, D - dynamic
# NAME MTU ACTUAL-MTU LOCAL-ADDRESS REMOTE-ADDRESS KEEPALIVE DSCP
0 R ipip-t... auto 1450 1.2.3.4 1.2.2.3 inherit
[admin@MikroTik] > ip route print
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
# DST-ADDRESS PREF-SRC GATEWAY DISTANCE
0 A S 0.0.0.0/0 ether1 1
1 ADC 1.2.3.0/25 1.2.3.4 ether1 0
2 A S 10.0.0.0/24 ipip-tunnel1 1
3 ADC 10.10.0.0/24 10.10.0.254 ether2 0
4 ADC 172.17.0.0/30 172.17.0.2 ipip-tunnel1 0
5 ADC 192.168.5.0/24 192.168.5.2 ether5 0
6 ADC 192.168.20.0/24 192.168.20.1 ether3 0
7 A S 192.168.21.0/24 ipip-tunnel1 1
0 E spi=0xFB5A506 src-address=1.2.2.3 dst-address=1.2.3.4 state=mature auth-algorithm=md5 enc-algorithm=aes-ctr
auth-key="27a35dd7632224c48aae871ade016131"
enc-key="c5e9e326325055fecc1d41df8451444f7bd80d484d778229f18f88d074095cef1c7b1bfa"
addtime=jul/02/2016 23:05:02 expires-in=9m39s add-lifetime=24m/30m current-bytes=3532 replay=128
1 E spi=0x211FFA9 src-address=1.2.3.4 dst-address=1.2.2.3 state=mature auth-algorithm=md5 enc-algorithm=aes-ctr
auth-key="29d0500056f5acb05a35a3f06cc4902f"
enc-key="e992e8ae51bfe66a72ec012b023f22fb5dd4abdf074f1e3803d38efb19bd2545cddeae97"
addtime=jul/02/2016 23:05:02 expires-in=9m39s add-lifetime=24m/30m current-bytes=3647 replay=128
Everything else is the same as for the topic starter:
it worked fine until today (for more than a year), it works if I disable IPsec (just a plain IP-IP tunnel), and IPsec comes up without errors (established).
If you didn’t change anything, it’s unlikely that you’d hit a bug in RouterOS after such a long time. A problem outside of your control, somewhere on the path between the routers, seems more likely. You can’t do much about that, but it should be possible to discover what exactly is happening (some lost or filtered packets, perhaps), although even that may not be easy. One easy thing to check: IPSec is not a single connection, so even if it shows as established, that’s only the first part. The second part is the actual tunnelled data, which is a different kind of packet (protocol 50, unless there’s NAT in the way). You can see if anything is coming in using “/ip ipsec installed-sa print”. I know you wrote that you have everything else the same as the OP, and the OP wrote that the counters are increasing, but just to be sure that it really is the same…
Recreating the tunnels and rebooting both routers didn’t help. But this morning the tunnel started working again.
It seems that there were some problems on the ISP side.
Since the issue was with encrypted IPIP, you’ve posted way too little of the actual configuration. The manually configured mtu values on the /interface ipip rows suggest that you did think about not having the IPsec transport packets fragmented. I mention this because broken handling of fragments on the path between your peers is one of the things that may break your IPIP/IPsec tunnel.
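To put rough numbers on the fragmentation point, here is a minimal sketch of the size arithmetic. It assumes ESP in transport mode over IPv4 without NAT-T, a 20-byte outer IP header without options, HMAC-MD5-96 (12-byte ICV) and AES-CBC (16-byte IV, 16-byte blocks) as in the OP’s setup. The function names and the 1500-byte path MTU are assumptions for the example, not values taken from either router.

```python
OUTER_IP = 20     # outer IPv4 header, no options
ESP_HEADER = 8    # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2   # pad-length byte + next-header byte


def esp_packet_size(inner_len, iv_len=16, block=16, icv_len=12):
    """On-the-wire size of an ESP transport-mode packet carrying one
    IPIP-encapsulated inner packet of inner_len bytes.
    Defaults model AES-CBC with HMAC-MD5-96."""
    # Payload plus trailer is padded up to a multiple of the block size.
    padded = -(-(inner_len + ESP_TRAILER) // block) * block
    return OUTER_IP + ESP_HEADER + iv_len + padded + icv_len


def max_inner_size(path_mtu, iv_len=16, block=16, icv_len=12):
    """Largest inner packet that still fits into path_mtu unfragmented."""
    room = path_mtu - OUTER_IP - ESP_HEADER - iv_len - icv_len
    return (room // block) * block - ESP_TRAILER


print(esp_packet_size(1450))                     # AES-CBC
print(esp_packet_size(1450, iv_len=8, block=4))  # AES-CTR (8-byte IV, 4-byte alignment)
print(max_inner_size(1500))                      # AES-CBC, assumed 1500-byte path MTU
```

Under these assumptions a 1450-byte inner packet becomes 1512 bytes on the wire with AES-CBC and exactly 1500 bytes with AES-CTR, and the largest inner packet that avoids fragmentation with AES-CBC on a 1500-byte path is 1438 bytes.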
However, you haven’t stated whether the WAN addresses of both routers are public ones or not; since IPIP works without IPsec encapsulation, it seems that they are. If so, the IPsec transport is bare ESP, which the ISPs also sometimes break. It would be surprising, however, if delivery of ESP were broken while delivery of IPencap was not.
/tool sniffer running at both ends is your best friend in many situations, and this is one of them - if you can see an ESP packet leave one peer and never arrive at the other one, it’s the ISP. If you can see ESP packets reaching their destinations in both directions, it’s something in your routers.
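If the captures from both ends are saved to files, the comparison can be done offline. The following is a minimal sketch that counts ESP packets (IP protocol 50) per source/destination pair. It assumes the capture is a classic pcap file with Ethernet link type and plain IPv4 frames (no VLAN tags); the function name is made up for this example.

```python
import socket
import struct
from collections import Counter

ESP_PROTO = 50  # IP protocol number of ESP


def count_esp(pcap_bytes):
    """Count ESP packets per (src, dst) pair in a classic pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # The link type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, incl_len, orig_len.
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Need an Ethernet header (14) plus a minimal IPv4 header (20).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        if ip[9] == ESP_PROTO:
            src = socket.inet_ntoa(ip[12:16])
            dst = socket.inet_ntoa(ip[16:20])
            counts[(src, dst)] += 1
    return counts
```

Running it on the capture from each peer and comparing the counts for the same (src, dst) pair shows where ESP goes missing: packets counted on the sender’s capture but absent from the receiver’s capture point at the path between the routers.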