I have a large number of IPsec tunnels in my network, which consists of only MikroTik routers.
Typically it is all set up as one central-router which the “satellite”-routers connect to.
There are many ways of setting up the IPsec policies and peers, such as dynamic policies and what not.
I chose a while back to set it up with static policies and peers, managing it all using a few scripts and a DNS server. It all works fine, but there are a few questions about how best to manage this.
The biggest question is how DPD (Dead Peer Detection) works best.
The default is 120 seconds with 5 failures. This seems like a very long time, and in theory I don’t want the central-router to keep the tunnels alive; I want the “satellite”-routers to keep the tunnel up.
Currently I have the central-router’s DPD disabled, and the “satellite”-routers have a 20 second interval with a maximum of 1 failure. The goal is to detect problems with the IPsec tunnel and re-establish the connection quickly.
So what is best practice for DPD?
Why is default interval so long?
Is 1 maximum failure too few?
Should it be identical in both ends?
Or is one-way DPD ok?
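For a rough sense of the numbers above, here is a minimal Python sketch that compares how long each setting takes to declare a peer dead. It assumes a simplified model (one probe per interval, peer declared dead after the configured number of consecutive unanswered probes); the function name `dpd_detection_window` is made up for illustration, and actual RouterOS timing may differ.

```python
def dpd_detection_window(interval_s: int, max_failures: int) -> int:
    """Rough upper bound, in seconds, before a silent peer is declared dead.

    Assumption (not taken from the manual): one probe is sent per interval
    and the peer is dropped after max_failures unanswered probes in a row.
    """
    if interval_s <= 0 or max_failures <= 0:
        raise ValueError("interval and failure count must be positive")
    return interval_s * max_failures


# Settings mentioned in this post.
settings = {
    "default (120 s, 5 failures)": (120, 5),
    "satellite (20 s, 1 failure)": (20, 1),
}

for label, (interval, failures) in settings.items():
    print(f"{label}: about {dpd_detection_window(interval, failures)} s")
```

Under this model the default takes about ten minutes to give up on a peer, while the 20 second / 1 failure setting gives up after a single missed probe.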
The manual doesn’t give much insight into this, though my limited knowledge may be at fault. All I see is:
I normally use 30 seconds and 3 failures for fast response.
I think indeed it may be fatal when the DPD interval * retries at one side is smaller than the DPD interval at the other side,
but I am not sure. I am investigating a problem where one user cannot maintain the connection and is reported as
a dead peer; this might be the reason (DPD on his side is set to 120, on our side to 30*3).
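The suspected condition can be written down as a small Python check. This is only a sketch of the hypothesis as stated here, not a confirmed failure mode; the function name and parameters are made up for illustration.

```python
def gives_up_before_remote_interval(local_interval_s: int,
                                    local_max_failures: int,
                                    remote_interval_s: int) -> bool:
    """True when the local side's total DPD budget (interval * failures)
    is shorter than a single DPD interval on the remote side.

    This merely restates the suspicion above; it does not show that the
    mismatch actually causes the peer to be reported dead.
    """
    return local_interval_s * local_max_failures < remote_interval_s


# The case under investigation: 30 s * 3 on our side, 120 s on the user's side.
print(gives_up_before_remote_interval(30, 3, 120))
# Same settings on both sides, for comparison.
print(gives_up_before_remote_interval(30, 3, 30))
```

For the case under investigation the check comes out true (90 seconds against 120), which is why the settings look suspicious, but that alone proves nothing about the cause.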
I cannot see anything in the RFC that makes it mandatory for DPD to be the same on both sides. According to RFC 3706, section 5, DPD Protocol:
To this end, each peer may have different requirements for detecting
proof of liveliness. Peer A, for example, may require rapid
failover, whereas peer B's requirements for resource cleanup are less
urgent. In DPD, each peer can define its own "worry metric" - an
interval that defines the urgency of the DPD exchange. Continuing the
example, peer A might define its DPD interval to be 10 seconds.
Then, if peer A sends outbound IPSec traffic, but fails to receive
any inbound traffic for 10 seconds, it can initiate a DPD exchange.
Peer B, on the other hand, defines its less urgent DPD interval to be
5 minutes. If the IPSec session is idle for 5 minutes, peer B can
initiate a DPD exchange the next time it sends IPSec packets to A.
What’s more, with DeadPeerDetectionRate = Disabled on the macOS side and DPD enabled on the MikroTik side (60 s, 3 attempts), the DPD messages reach the macOS client and it responds to them correctly. The RFC does indicate that these messages must be answered.
@pe1chl Did you have these problems because of the different ways of configuring DPD?