Question involving multiple IPSEC tunnels

hapoo · May 15, 2021, 7:43pm

I’m struggling a bit with my setup and need some help.

I manage a group of clients who all have an IPsec tunnel to a business. They’re each given a /30 range and connect to 10.0.0.0/8. I’ve set them all up with a hEX S, and they’re all behind their ISP routers. I have no control over the business side of this connection, so when issues pop up or we need more tunnels, I’m at their mercy. Due to their setup I also have no way of connecting from one client to another through the tunnels. As a result I’ve set up an outside “Management Router” that all the hEXes also connect to, using IPsec tunnels from their /30 range to 10.1.1.0/24, through which I can access them and their attached computers.

We’ve recently decided to expand the scope of this management tunnel and I’m having trouble figuring out the best way of handling it. For starters, we have other networks attaching to it on different subnets (instead of the 10.1.1.0/24 subnet they’re on the 192.168.1.0/24 subnet). We also want to set up the Management Router as a backup connection to the Business; in case their direct connections fail, it should take over.

Ignoring my existing solution of using IPsec tunnels with multiple policies, what’s the proper way of setting this up? Tunnels? L2TP? GRE? Using RIP/OSPF? A lot of technologies I’m vaguely familiar with but don’t have enough experience to choose between or implement.

Thanks in advance.
[Attachment: Screen Shot 2021-05-15 at 15.22.29 PM.png]

sindy · May 16, 2021, 8:40am

If you were starting from scratch:

bare IPsec takes least overhead and is most different from normal routing
IPsec-encrypted IPIP tunnels allow you to use normal routing with dynamic routing protocols but there’s additional overhead, albeit a few bytes smaller than with GRE
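
To put rough numbers on the overhead comparison, here is a minimal sketch. It assumes an outer IPv4 header without options (20 bytes) and a basic GRE header without key, checksum or sequence number (4 bytes), and it deliberately ignores the ESP overhead, which depends on the chosen ciphers. The function names and the 1500-byte path MTU are only illustrative.

```python
# Per-packet header bytes added by the tunnel itself, before IPsec encryption.
OUTER_IPV4_HEADER = 20  # assumption: outer IPv4 header without options
GRE_BASE_HEADER = 4     # assumption: GRE without key, checksum or sequence number


def tunnel_overhead(kind: str) -> int:
    """Return the bytes a tunnel type puts in front of the inner packet."""
    if kind == "ipip":
        return OUTER_IPV4_HEADER
    if kind == "gre":
        return OUTER_IPV4_HEADER + GRE_BASE_HEADER
    raise ValueError(f"unknown tunnel type: {kind}")


def inner_mtu(path_mtu: int, kind: str) -> int:
    """Largest inner packet that still fits, ignoring ESP overhead."""
    return path_mtu - tunnel_overhead(kind)


for kind in ("ipip", "gre"):
    print(kind, tunnel_overhead(kind), inner_mtu(1500, kind))
```

Under these assumptions the difference between the two tunnel types is 4 bytes per packet, which is small compared to what ESP itself adds.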

However, you’re not starting from scratch, and even worse, you’ve got no control over the HQ side.

Since 6.47.something, RouterOS permits a policy to be associated with two peers; only a single pair of SAs for that policy is established at a time, to one of the peers. So you could set up the following arrangement:

A     B
|\   /|
| \ / |
|  X  |
| / \ |
|/   \|
1-----2
 \   /
  \ /
   3
   |
   |
   C

C is the host in the branch office that needs to talk to hosts A and B in the headquarters’ network
3 is the branch office router with two IPsec peers, 1 and 2

One component of the overall security concept of IPsec is that packets that reverse-match a traffic selector of an existing policy, even an inactive one, must be dropped if they came in some other way than via a security association linked to that policy. It is therefore essential that the policies at 1 and 2 are generated from a template rather than configured statically; otherwise these two routers wouldn’t be able to hand over the traffic from the remote peer directly to one another, as it would be coming in the “wrong” way to the other one. This is true for RouterOS; whether it is the same for the IPsec router used by the Business is up to you to find out.
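
The reverse-match rule can be illustrated with a simplified model. This is only a sketch of the rule as described above, not RouterOS code; the `Policy` class, the function names and the 192.168.50.0/30 branch subnet are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    name: str
    src: str  # local traffic selector (CIDR)
    dst: str  # remote traffic selector (CIDR)


def reverse_matches(policy: Policy, pkt_src: str, pkt_dst: str) -> bool:
    """True if an inbound packet is the mirror image of the policy's selector."""
    return (ip_address(pkt_src) in ip_network(policy.dst)
            and ip_address(pkt_dst) in ip_network(policy.src))


def accept_inbound(policies: List[Policy], pkt_src: str, pkt_dst: str,
                   arrived_via_sa_of: Optional[str]) -> bool:
    """Drop packets that reverse-match a policy but did not arrive via its SA."""
    for policy in policies:
        if reverse_matches(policy, pkt_src, pkt_dst):
            return arrived_via_sa_of == policy.name
    return True  # no policy covers the packet, so plain routing applies


# Router 1 with a statically configured policy towards the branch (hypothetical subnet).
static = [Policy("to-branch", "10.0.0.0/8", "192.168.50.0/30")]
# Packet from C handed over by router 2 as plain traffic: dropped.
print(accept_inbound(static, "192.168.50.2", "10.0.0.5", None))
# Same packet when router 1 has no policy installed (template not instantiated): accepted.
print(accept_inbound([], "192.168.50.2", "10.0.0.5", None))
```

With a template, router 1 only holds the policy while it terminates the SA itself, so traffic handed over by router 2 is not hit by the drop rule.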

Another reason for dynamic generation of policies from a template is that both 1 and 2 can then have a static regular route to 3’s subnet via the other one; this regular route is overridden by the dynamically created policy.

If 1 and 2 should act as stateful firewalls, you’d need them to use VRRP synchronized with the IPsec failover instead of both being active, as a stateful firewall doesn’t handle asymmetric routing well. If a SYN packet from A to C goes via 1 and 2 but the SYN,ACK response from C to A takes a shortcut from 2 directly to A, the stateful firewall at 1 won’t see the SYN,ACK, so it won’t let subsequent packets from A to C through. Hence A needs to send packets for C directly to 2 whenever the SA is active between 2 and 3.
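
A toy model of the handshake tracking shows why the shortcut breaks things. It is a deliberately strict simplification (real connection tracking has more states and tunables), and the class and state names are made up for illustration.

```python
class StatefulFirewall:
    """Tracks only the TCP three-way handshake, strictly in order."""

    def __init__(self) -> None:
        self.state = {}  # (initiator, responder) -> handshake state

    def see(self, src: str, dst: str, flags: str) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        if flags == "SYN":
            self.state[(src, dst)] = "syn_seen"
            return True
        if flags == "SYN,ACK" and self.state.get((dst, src)) == "syn_seen":
            self.state[(dst, src)] = "synack_seen"
            return True
        if flags == "ACK" and self.state.get((src, dst)) in ("synack_seen", "established"):
            self.state[(src, dst)] = "established"
            return True
        return False


# Asymmetric path: the firewall at 1 sees the SYN but never the SYN,ACK.
fw1 = StatefulFirewall()
print(fw1.see("A", "C", "SYN"))  # forwarded towards 2 and 3
# SYN,ACK goes 3 -> 2 -> A directly, so fw1.see() is never called for it.
print(fw1.see("A", "C", "ACK"))  # dropped: the handshake looks incomplete at 1
```

When all three packets pass the same firewall, the final ACK is accepted, which is what the VRRP arrangement is meant to guarantee.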

So if you can agree with the Business to arrange their topology accordingly, the above is one possible way to go. It should be possible to set up a separate policy for communication between 3 and your management router, with the management router acting as a backup for the router of the Business.

To migrate to a setup based on IPsec-encrypted IPIP tunnels, you’d also have to work in tight cooperation with the administrators of the Business. They would have to change the setup for the branch offices from bare IPsec to IPsec-encrypted tunnels one by one to minimize the outage; for them, such an arrangement is much more complicated as they need to configure one tunnel per branch office, whereas with bare IPsec the policies may get created dynamically (at least in RouterOS).
A combination of bare IPsec towards their 1 and IPsec-encrypted IPIP tunnels towards your 2 is the most complex one for you to handle at 3, as you’d need to activate an action=none policy to shadow the (currently inactive) one towards 1 each time the IPsec session to 1 goes down, and I can see no advantage of such a mixed approach.

So all in all, my private opinion is that dynamic routing protocols are advantageous in mesh type networks; since each 3 only has two paths to the rest of the network, and since dynamic generation of IPsec policies substitutes for a dynamic routing protocol in the sense that it “installs a route to each 3” at 1 or 2, I’d stay with bare IPsec in this particular topology.

Things to consider are whether the default failover time of 100 seconds (DPD messages are sent every 10 seconds and 10 of them must go unanswered before the peer is declared down) is sufficient for you or whether you’d change that to something faster, and whether you want to force the security association back to 1 after it recovers; RouterOS keeps the SA on 2 until 2 itself fails. To force it back to 1, you have to disable and re-enable the peer representing 2, so you need some kind of monitoring script to watch for recovery of 1 and take action. It is also good practice not to rush moving back to the primary path if the switchover is not hitless (which is the case here: it takes some time for the SA to establish, so a few packets may get lost). And the IPsec code doesn’t currently provide any way to run a script on a state change, similar to what DHCP or PPP offer, so you’d have to schedule a periodic run of the fallback script.
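
The timing and the “don’t rush back” logic can be sketched as follows. This is only a model of the decision, written in Python rather than as a RouterOS script; the class name, the `required_good_checks` parameter and the idea of counting consecutive successful checks are assumptions for illustration. The DPD figures are the ones quoted above.

```python
def dpd_failover_seconds(interval_s: int, max_failures: int) -> int:
    """Approximate time until a silent peer is declared dead."""
    return interval_s * max_failures


class FallbackDecider:
    """Recommend moving back to the primary only after it stayed reachable for a while."""

    def __init__(self, required_good_checks: int) -> None:
        self.required = required_good_checks
        self.good_streak = 0

    def observe(self, primary_reachable: bool, sa_on_backup: bool) -> bool:
        """Feed one periodic check; return True when the fallback should be triggered."""
        if not primary_reachable:
            self.good_streak = 0  # any failure restarts the hold-down
            return False
        self.good_streak += 1
        if sa_on_backup and self.good_streak >= self.required:
            self.good_streak = 0
            return True  # the caller would now disable and re-enable the peer for 2
        return False


print(dpd_failover_seconds(10, 10))  # 100

decider = FallbackDecider(required_good_checks=3)
checks = [True, True, False, True, True, True]
print([decider.observe(ok, sa_on_backup=True) for ok in checks])
```

In this run the fallback is only recommended on the last check, after three consecutive successes following the failure; how the reachability of 1 is actually tested is left open here.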

Also, things become complex responsibility-wise once the 2 under your administration becomes a backup for the 1 under their administration, as in that case you’ve got access to an element which may play a firewall role in their network (policing where C is allowed to go within the headquarters network).

hapoo · May 16, 2021, 7:44pm

sindy,

Every single time I have a question, you’re quick to respond with thorough and accurate responses. Thank you.

Your answer was helpful, but as you may have guessed, I have no control over the business side’s chosen setup, so I can’t switch them to IPIP and might be stuck with a mixed network setup. I’d love your help on this and would be willing to compensate you for your time. Unfortunately the forum doesn’t seem to allow direct user messages.

sindy · May 17, 2021, 9:02pm

I didn’t have to guess, you’ve stated that clearly. But the thing is that no matter what you do at your router “2”, you cannot make it a backup point of access to their network without cooperation with their network administrator.

So if there is no chance that they cooperate, you can give up on this part and only concentrate on the management access to the client devices. And for the management, it is enough to use a public IP for the end of the management tunnel that is at the management server. Public IPs are (well, should be) globally unique, so IPsec policies between any two private (RFC1918) addresses cannot conflict with the policy used for management. And if you configure the “management” peer at the client devices with a domain name of the management server, you can even migrate the management server to another IP address if you update the DNS record accordingly. Use of mode-config to assign an IP to the client router and split-include to make it generate a policy dynamically will make the client devices always create a correct policy to reach the management server. It is not unusual that a policy in tunnel mode has sa-xxx-address and xxx-address identical at one end, so you don’t need separate public IPs for the SA and for the payload.
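
The non-overlap argument is easy to check with a few lines. This is just a sketch; the function name is made up, and 203.0.113.10 is a documentation address standing in for the real public IP of the management server.

```python
from ipaddress import ip_network

# The three private ranges defined by RFC1918.
RFC1918 = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def conflicts_with_private_policies(selector: str) -> bool:
    """True if a traffic selector overlaps any RFC1918 range."""
    net = ip_network(selector)
    return any(net.overlaps(private) for private in RFC1918)


# Placeholder for the management server's public address.
print(conflicts_with_private_policies("203.0.113.10/32"))
# The current management subnets can collide with private-to-private policies.
print(conflicts_with_private_policies("10.1.1.0/24"))
print(conflicts_with_private_policies("192.168.1.0/24"))
```

A /32 selector on a public address can never fall inside a policy whose both ends are private, whereas 10.1.1.0/24 sits inside the 10.0.0.0/8 range the clients already use towards the Business.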

I can give you my public key so that you could send me your contact info encrypted via here if you think it’s worth it, but I’m quite busy these days so my help may not be as quick as you expect.

hapoo · May 17, 2021, 9:33pm

> So if there is no chance that they cooperate, you can give up on this part and only concentrate on the management access to the client devices.

I obviously don’t understand enough to know why that is. If I have an IPsec tunnel from my management router to the business, and all the clients are connecting to me, even if I can’t just pass through their connection, I should be able to NAT them through, right?

EDIT: I think I understand what you mean. While the VPNs handed to us have been in tunnel mode, there really was no need for it since no one on the business end actually needs access to the clients. Only the clients need access to resources on the Business network. Because of this I believe our setup should work just fine NATing a single link.

> And if you configure the “management” peer at the client devices with a domain name of the management server, you can even migrate the management server to another IP address if you update the DNS record accordingly. Use of mode-config to assign an IP to the client router and split-include to make it generate a policy dynamically will make the client devices always create a correct policy to reach the management server. It is not unusual that a policy in tunnel mode has sa-xxx-address and xxx-address identical at one end, so you don’t need separate public IPs for the SA and for the payload.

Parts of that are similar to what we currently have set up thanks to your previous help. The management router is set up with a DNS record and all the clients connect with a dynamic policy. I looked into split-include to cover parts of our network that are outside the 10.0.0.0/8 range (like our 192.168.1.0 network in my original post) rather than using multiple policies, but I thought that mode-config was for transport mode, not tunnel mode. Not sure if it makes a practical difference, but that’s just how we had it set up.

> I can give you my public key so that you could send me your contact info encrypted via here if you think it’s worth it, but I’m quite busy these days so my help may not be as quick as you expect.

I’d like that and would appreciate it. I don’t expect everything to be fixed tomorrow, but I also know that with your understanding, you can do in one hour what would take me 2 days of experimentation.

sindy · May 18, 2021, 6:23pm

OK. As private messages have been disabled again after a few months of working, you can send me your e-mail address and/or mobile phone number using this instruction (the method at line 16). After creating the-encrypted-short-file, run openssl base64 -e -in the-encrypted-short-file and paste the output of that here as text. My public key follows:

hapoo · May 18, 2021, 7:25pm

Zome619kUkdB3H5CIPRsvaDatyNSMHaR1dNWf4xv4kMaUryCIhGsQTyRuZwY7akE
YkP4DKog7Uk6Dp2wi8Lz0bidrOrh7/veo7SyAoXeDp5NZwAQUk5+vuUTzeMdQnOF
L5F7rRrmm9OcPIOhf9L5o5CHDqIhSUI5WDxhQl80C0ZUfcqpPf4vZMlsoG7PM6dH
+W1Ub8clX52pC9LoJy7389VmbDFJEu1o4jbMBv0DZgBc1Jeasb2F6VInacDJ+Z6T
L8SkhjwN3aZKURb5crOhnK8dg6JLujGikPBUjDQs0ZC16cEIbnlfIQ7du76zMs0c
Row25U7GzLe6CT7aCkKrjaVwGgLqpsxT7HlMfcMdqeIcCfpq4dgVU5EmkR7H3rrW
0IGo+pd8/YKEwNhXXgUdqMK18yhOAveApv5FM5hKXpsffUQUSpMPRMQJ+G/UuFI0
Lsz8XjCe4O6LiWkFSP3kbvYjkE0EqcoLtHT+hvlLs+ZmwaUG6jJ6Eyvcy2BmTU0q
MRVHEwq1FuSXYjhoJ6ag9jwR2hcy3x9VxKIIHcouJyZMDqWorzw+BWFbqYqT/Chn
9zp7Vz4/PGLeboyhS9grjE+aB9vHU2KAsfQyyAHIvE0cNYnzKmuM+MTMEGSmjmVQ
RG0o2X7AdFhfMjf5zX4cfdrTfeV+LlWieROmAnIs5Ns=