VPN over UDP dies over time

Hi

I have a strange problem with a Metal 2SHP. I have a VPN connection routed over that box, on UDP port 443. Everything works fine for a while (anywhere from 10 minutes to 4-8 hours), but then the VPN suddenly stops working. It looks like the router stops forwarding the packets; the VPN server on the other side is still fine (checked many times over another internet connection that bypasses the router).

The problem goes away immediately if I reboot the 2SHP. The VPN then reconnects right away and everything is fine again. When I kill the VPN client and just let “normal” traffic pass through the 2SHP, it is still alive and everything looks normal. So I am 99% sure it only stops forwarding these port 443 UDP packets.

Changing the port on client and server seems to have no influence on the problem.

I wonder what type of problem that might be, because the box is still routing everything except the VPN traffic. Might there be some accounting or traffic shaping on the 2SHP that I overlooked? When I’m on the VPN, the only traffic going through the box is the single UDP/443 connection. Might that saturate something on the router, so that it eventually throttles or completely cuts the connection?

I have just default queues on the WLAN & ether side.

PS: the router is used as a point-to-point link to another WLAN that is some 200 meters away (but as I said, non-VPN traffic works fine when the problem occurs, e.g. I can surf the web or log in via SSH to the remote VPN server without trouble, so it is definitely not the next hop, it is the 2SHP itself).

It is starting to drive me crazy :confused:

thanks

Hi,

is connection tracking enabled and used on one of your Metals? If so, it might be the UDP connection timers.
I have had quite a similar issue with SIP sessions over UDP and I think it is related to the UDP connection timers.

Ape

I think there is a bug somewhere.
I see the same issue with TCP sessions with long lifetime, e.g. SSH.

The connection tracking timers do not get reset when there is traffic. So, a connection that lasts longer than the
timeout in the connection tracking will experience problems. When the first traffic after a timeout is from the “inside”
the router will just create a new connection tracking entry with maximal timer, but when the traffic is from “outside”
it is just dropped.
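
To make the suspected behaviour concrete, here is a toy model in Python. It is only a sketch of what I describe above, not RouterOS or netfilter code; the class name, the 180 second timeout and the `refresh_on_traffic` switch are all made up for illustration.

```python
# Toy model of a connection tracking table. NOT RouterOS code: the names,
# the 180 s timeout and the refresh switch are assumptions for illustration.
class ToyConntrack:
    def __init__(self, timeout, refresh_on_traffic):
        self.timeout = timeout
        self.refresh_on_traffic = refresh_on_traffic
        self.expires_at = {}  # flow id -> time at which the entry expires

    def packet(self, flow, now, from_inside):
        """Return True if the packet is forwarded, False if it is dropped."""
        expiry = self.expires_at.get(flow)
        if expiry is not None and now >= expiry:
            del self.expires_at[flow]  # entry timed out
            expiry = None
        if expiry is None:
            if not from_inside:
                return False  # no entry and packet comes from outside: dropped
            self.expires_at[flow] = now + self.timeout  # new entry, full timer
            return True
        if self.refresh_on_traffic:
            self.expires_at[flow] = now + self.timeout  # expected behaviour
        return True


def simulate(refresh_on_traffic):
    """Client sends once at t=0, then only the server sends, every 60 s."""
    table = ToyConntrack(timeout=180, refresh_on_traffic=refresh_on_traffic)
    flow = ("udp", "client:40000", "server:443")
    results = [table.packet(flow, now=0, from_inside=True)]
    for now in range(60, 301, 60):
        results.append(table.packet(flow, now=now, from_inside=False))
    return results


if __name__ == "__main__":
    print("timer refreshed by traffic:", simulate(True))
    print("timer never refreshed:     ", simulate(False))
```

With the refresh enabled every packet gets through. Without it, the packets from “outside” are dropped as soon as the original timer runs out, even though the flow was never idle, which matches what I see.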

Hopefully MikroTik will fix this sometime. However, I do not know how to file bugs with them.

Write an email to support@mikrotik.com

Yeah, I think it has connection tracking on. However, I have never experienced such a bug on any other Linux device with connection tracking.

I think the traffic over that UDP port does not die completely but slows down towards 0, asymptotically.

Recently I’ve enabled something called flow monitoring - or so - and the VPN/UDP has been stable since then. It is still under observation, though.

Is there no fillable form or some bug tracking somewhere that asks for the relevant information and maybe allows
to search for existing reports?
Sending a free format mail will probably result in replies “asking for more information”.
But I see the above issue here and have posted about it before, others confirm it.

I have just updated to version 6.31 (expecting that this is the first thing that would be replied when I submit an
issue for an earlier version) and hey, it looks like it is no longer reproducible :slight_smile:

So maybe this is a solution for the original poster of this topic as well…

Hi,

@kuranga

Sounds plausible, but every UDP packet belonging to the identified UDP “connection” should reset the timer, which it obviously does not. It really looks like a bug to me. I agree, I have never encountered such a problem with connection tracking on a Linux / iptables firewall either.

@pe1chl

Just write an e-mail to support@mikrotik.com.
Describe your issue in as much detail as possible and include the supout.rif.

I have never had contact with MikroTik’s support, but they’ll probably ask you to use the most recent RouterOS to fix your issue.
Good luck! :slight_smile:

Ape

Hi

I think that I’ve found the cause of the problem… it looks like for some reason the NAT table saturates and there are no more ports available (for MASQ). Somehow it kicks out the UDP connection too.

I’ve no solution yet… so far trying to reduce the timeout values in NAT / Tracking.
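
Why I expect shorter timeouts to help: a rough Python sketch of a masquerade port pool. This is only a toy model under my own assumptions (a pool of 100 ports, one new short-lived flow per second, ports released only after an idle timeout); it is not how RouterOS actually allocates ports, and all names are made up.

```python
# Toy model of NAT port exhaustion. NOT RouterOS code: pool size, flow rate
# and the release-after-idle-timeout rule are assumptions for illustration.
class ToyMasqueradePool:
    def __init__(self, ports, idle_timeout):
        self.free = list(ports)
        self.idle_timeout = idle_timeout
        self.in_use = {}  # port -> time the port was last used

    def expire(self, now):
        """Return ports whose flow has been idle for idle_timeout to the pool."""
        for port, last_seen in list(self.in_use.items()):
            if now - last_seen >= self.idle_timeout:
                del self.in_use[port]
                self.free.append(port)

    def allocate(self, now):
        """Return a source port for a new flow, or None if the pool is empty."""
        self.expire(now)
        if not self.free:
            return None
        port = self.free.pop(0)
        self.in_use[port] = now
        return port


def failed_allocations(idle_timeout, pool_size=100, duration=600):
    """Open one new short-lived flow per second and count the failures."""
    pool = ToyMasqueradePool(range(50000, 50000 + pool_size), idle_timeout)
    return sum(pool.allocate(now) is None for now in range(duration))


if __name__ == "__main__":
    for timeout in (180, 30):
        print(f"idle timeout {timeout:>3} s:",
              failed_allocations(timeout), "failed allocations")
```

In this model the long timeout keeps dead flows holding ports until new flows cannot get one, while the short timeout frees ports faster than they are consumed.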

Hi,

How many connections are NATed on the Metal 2SHP?
Is this legitimate traffic or some sort of attack?

I don’t have any real-life experience with many NATed connections on a Metal 2SHP, but from the specs (400MHz, 64MB RAM) I guess it should be able to handle a few hundred NATed connections.

Ape