VPN stops passing traffic overnight

Hi,

Sudden Tuesday morning strangeness :open_mouth:

I have two offices with two RB2011UiAS routers running firmware 6.49.8. I have them set up pretty well identically. KISS :slight_smile:

Both sit behind a bridged PPPoE router. No fancy networking - just a few port forwards to a local linux server each end, and a Hurricane IPv6 tunnel just for testing IPv6.

They each have an IPsec VPN to connect to each other, and another to a cloud-based libreswan server.

Remote subnet 10.0.0.0/24
Local subnet 192.168.10.0/24
Remote Libreswan subnet 192.168.98.0/24

This morning for no apparent reason the remote Mikrotik decided to stop passing VPN traffic to this end, and the asterisk server.

My local Mikrotik still sends traffic to and from the libreswan server.

The VPNs on the remote all seem up and happy, but no traffic flows on an IP ping.

10.0.0.0/24 ↔ 192.168.10.0/24 VPN Password - up - Fail
10.0.0.0/24 ↔ 192.168.98.0/24 VPN Certificates - up - Fail
192.168.10.0/24 ↔ 192.168.98.0/24 VPN Certificates - up - Pass

I have restarted both routers to no avail. Restarted the Libreswan server. Mine connects and passes traffic, the remote connects but does not pass traffic. The two ends connect, but do not pass traffic.

Nothing was touched on either router or the libreswan server prior to this - it had been working happily for ages.

I can’t seem to trace what is going on - there must be packet blockage on the remote router but I can’t see what.

FW attached (yes it is a cobble together from several sources and may well be wrong, but then this end is pretty well identical)

Any thoughts or suggestions welcome. I can post a full config too if required.

Thanks.
UK-FW.txt (7.23 KB)

Something I just noticed.

The Libreswan box 192.168.98.1 can ping the remote router 10.0.0.250 but not the IPs behind it eg 10.0.0.1

The remote router 10.0.0.250 cannot ping 192.168.98.1

The local boxes on 192.168.10.* cannot ping 10.0.0.*

I guess it must be firewalling but I have no idea why it just stopped or how to fix it :frowning:
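
For anyone debugging something similar, these are the kind of checks that show whether packets are reaching the tunnel at all. This is only a sketch using stock RouterOS v6 commands. The log rule is a temporary addition and its prefix is a made-up label; neither is from my config.

```
# Are the SAs installed, and do their byte counters move while pinging?
/ip ipsec installed-sa print
# Are the policies present and active?
/ip ipsec policy print
# Which filter and mangle rules are counting packets?
/ip firewall filter print stats
/ip firewall mangle print stats
# Temporary rule to log Libreswan-to-remote traffic in the forward chain.
# It is appended to the end of the chain, so move it to the top to see everything.
/ip firewall filter add chain=forward action=log src-address=192.168.98.0/24 dst-address=10.0.0.0/24 log-prefix="vpn-test"
```

If the SA counters climb in one direction only, the packets are being lost on the router before or after encryption rather than on the path between the sites.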

Well, it isn’t my ISP - at least on old copper ADSL.

So I:

Disabled IPsec for this connection on the Mikrotik router - both Policy and Peer
Set a route for the local network to the remote subnet:
10.0.0.0/24 ↔ 192.168.98.0/24 via 10.0.0.251
Flipped the ipsec incoming IP on the Libre server - the ONLY change made on it
Fired up the old Endian router which uses the identical ipsec setup

And it connected immediately.

ping 192.168.98.1
PING 192.168.98.1 (192.168.98.1) 56(84) bytes of data.
From 10.0.0.250: icmp_seq=1 Redirect Host(New nexthop: 10.0.0.251)
64 bytes from 192.168.98.1: icmp_seq=1 ttl=63 time=27.6 ms
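
The temporary route from the steps above would look roughly like this on the remote Mikrotik (10.0.0.250), assuming the Endian box sits at 10.0.0.251 as the redirect in the ping output suggests. The comment text is just a label I made up.

```
# Send traffic for the Libreswan subnet to the Endian box
# while the Mikrotik's own IPsec policy and peer are disabled
/ip route add dst-address=192.168.98.0/24 gateway=10.0.0.251 comment="temp: via Endian"
```

The Redirect Host line in the ping output is the Mikrotik telling the client that 10.0.0.251 is the better next hop for that destination.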

Whatever it is, it is in the Mikrotik. No idea what to do next.

I can try ripping out all the firewall rules and adding them again?

Any suggestions appreciated as I am still Voip Phoneless (we only use Voip over VPN for security)

After a reboot I did a quick ping from there to here and got one response, and then radio silence.

[root@remote ~]# ping 192.168.10.1

PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=1 ttl=62 time=42.2 ms
^C
--- 192.168.10.1 ping statistics ---
5 packets transmitted, 1 received, 80% packet loss, time 4001ms
rtt min/avg/max/mdev = 42.218/42.218/42.218/0.000 ms

try again.

[root@remote ~]# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
^C
--- 192.168.10.1 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 6999ms

Note I was just cross checking settings between here and there and trying to ensure they were identical. Literally line by line in the config files.

The router here is a few years old. The remote one should be identical but was only recently purchased. I have another one there the same age as this one so might try that tomorrow so I can discount the newer hardware/chipsets.

I’ll continue talking to myself :slight_smile:

Interesting.

Disabled ipsec connection on the remote end.
Fired up old Endian box on old copper ADSL - predecessor to the Mikrotik+fibre
Connected immediately to asterisk after changing just the Asterisk incoming IP address
Connected immediately to Mikrotik here after changing the incoming IP address

Added a couple of temporary static routes on the far end Mikrotik to reroute the subnets and we are running our Voip again.

Next. Fire up the older Mikrotik - same age as this one - and see what happens when we set that up with exactly the same config.

Well,

Got the original router plumbed in.

Test ipsec is up, but I have the same issue with it failing to ping one way when it comes up. As soon as the Mikrotik passes traffic to the remote end then the remote can pass back to the Mikrotik.

Same as earlier here.

https://forum.mikrotik.com/posting.php?mode=reply&t=198637&sid=419d5140b52913db8ff87572dafca03e#pr1018987

That also leads to breakage when it re-keys.

I have tried lots of different rules but nothing seems to crack it :frowning:

I think this is to do with the incoming packet from the remote end not being forwarded correctly.

I can see that I have the fasttrack rule for established & related, but not new

add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related

Clearly missed something in the firewall rules :frowning:
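
For reference, the stock RouterOS default firewall keeps IPsec traffic away from fasttrack by accepting it earlier in the forward chain. A sketch of those two rules is below. The comments are my own labels, and I am assuming the rules end up above the fasttrack rule, since a rule added this way lands at the end of the chain.

```
/ip firewall filter
# Accept traffic that has just been decrypted by an IPsec policy
add chain=forward action=accept ipsec-policy=in,ipsec comment="accept in ipsec policy (sketch)"
# Accept traffic that is about to be encrypted by an IPsec policy
add chain=forward action=accept ipsec-policy=out,ipsec comment="accept out ipsec policy (sketch)"
# Both must sit above the fasttrack rule to have any effect.
```

Fasttracked packets skip most of the firewall and are not picked up by IPsec policies, which is why the order matters here.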

Well, it looks like some QoS rules involving DSCP were the source of the issue.

If I disable these we work.

add action=change-dscp chain=prerouting comment="Voip" disabled=yes dst-address=192.168.98.0/24 new-dscp=6 passthrough=yes
add action=change-dscp chain=prerouting comment="Files" disabled=yes dst-address=192.168.99.0/24 new-dscp=6 passthrough=yes

I have no idea why - the other router which is essentially a mirror image has those rules without any problems. I am no packet wrangler - I tried to see if I could find out why but could not.
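
If anyone wants to narrow it down further, the two rules can be toggled one at a time from the terminal. This is just a sketch that finds them by the comments shown above; re-test the ping after each step.

```
# Re-enable only the Voip rule, test, then disable it again
/ip firewall mangle enable [find comment="Voip"]
/ip firewall mangle disable [find comment="Voip"]
# Repeat with the Files rule
/ip firewall mangle enable [find comment="Files"]
/ip firewall mangle disable [find comment="Files"]
```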

The only difference between the two boxes is that on the one which works, the ISP router in bridge mode still gives a ‘local’ DHCP address to the WAN interface, but on the one that fails it does not. That one gets just the PPPoE address, so if you still want to access the router you need to add an IP and route to it manually.

Beyond that they are essentially identical bar their local subnet and their WAN IPs.

Go figure.

I have now added a bit of QoS via some mangles and a queue tree. I have a ton of bandwidth each end but just like to make sure the ipsec/voip gets priority.