Stalling PPP links

Hello,

I’m experiencing an issue with PPP links that “stall”, for lack of a better word.
This happens with both SSTP and L2TP protocols and maybe with OVPN, though I can’t confirm at this point.
Note that this is not a “general” issue. My CHR-based router serves ~800 VPN connections, with only some showing this issue.

The common factor appears to be connection quality.

What happens is:

PPP establishes fine and is working.
At some point, the dynamic IP address and route assigned to the interface generated for the connection disappear. The PPP connection itself remains up.
On the remote side, it also remains up. The result is a link that passes no traffic but never tries to reconnect by itself, so the remote users, being non-technical, call to report an outage.

I’ve set up static server bindings for each affected secret, and static routes with the new, static interface as a gateway. But on each connection, even though the interface binds to the static one, a dynamic IP address and route are also created.
When the incident triggers again, it removes the dynamic IP and route.
If I also add a static IP address, the PPP connection cannot establish because “the address already exists”.

I’m looking for a way to fix this so it doesn’t happen, or to stop the PPP connection from creating dynamic routes and addresses, so I can assign them statically and know they’re always there.
The current workaround is to run a custom script that finds the connections without an address and disconnects them, but that just hides the issue and I don’t consider it a scalable solution.
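
The logic of that workaround can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the actual script running on the router: the function name, the record layout (dicts with `name` and `interface` keys) and the interface names are assumptions made for the example, and fetching the data from the router and actually disconnecting the sessions are left out.

```python
def find_stalled_sessions(active_sessions, addresses):
    """Return the names of active PPP sessions whose interface has no IP address.

    active_sessions: list of dicts with hypothetical keys "name" and "interface"
    addresses:       list of dicts with a hypothetical key "interface"
    """
    # Collect every interface that currently has at least one address.
    interfaces_with_address = {addr["interface"] for addr in addresses}

    # A session is considered stalled when it is still active
    # but its interface no longer appears in the address list.
    return [
        session["name"]
        for session in active_sessions
        if session["interface"] not in interfaces_with_address
    ]


if __name__ == "__main__":
    # Made-up sample data: two active sessions, only one still has an address.
    active = [
        {"name": "site-a", "interface": "l2tp-site-a"},
        {"name": "site-b", "interface": "sstp-site-b"},
    ]
    addrs = [{"interface": "l2tp-site-a"}]

    for name in find_stalled_sessions(active, addrs):
        print("would disconnect:", name)
```

Each name returned is a session that is still up but has lost its address, which is what the script then disconnects so that the client reconnects.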

What version are you running? My experience in 6.47.10 is different: when the client device gets a new dynamic address, it creates a new (SSTP in my case) connection, but the old connection survives for minutes on the server side until all timeouts expire. Routes via both connections remain active, so I cannot communicate with the subnets on that client until I remove the old connection.

In any case, detailed logging of L2TP into a file, sniffing the public IP of the client into another file, and creating supout.rif once it happens is the only way to collect enough evidence for Mikrotik to analyze and fix the issue.

Hello sindy,

Thanks for the input.
The CHR (VPN server) is on 6.49.1.
The clients run anything between 6.45.6 and 6.49.2. The “affected” clients are all at least on 6.48.4, as updating them was part of the troubleshooting.

Actually getting the required info is very hard, plus I’m a bit anal about sharing detailed info. :stuck_out_tongue: