oliveirapaulo
just joined
Topic Author
Posts: 3
Joined: Sun Mar 12, 2017 1:28 am
Location: Brazil

IPsec communication problems over VRRP configuration

Thu Sep 12, 2019 9:37 pm

Hello, everyone!

I am facing an annoying problem with my IPsec tunnels. Hope you guys can help me find the solution.

Scenario:

I have a router (RouterBOARD CCR1036-8G-2S+) working in a high-availability configuration using VRRP. My network ID is xxx.xxx.xxx.8/29. My master router has xxx.xxx.xxx.14/29 on its LAN interface and xxx.xxx.xxx.13/32 on its VRRP interface (I am using a /32 netmask because the wiki says so, but I don't really understand why. I guess it has something to do with broadcast. If someone can explain it to me later, I'll be very thankful). My backup router has xxx.xxx.xxx.12/29 on its LAN interface and again xxx.xxx.xxx.13/32 on its VRRP interface, which remains disabled until something goes really wrong.

On my master router I have multiple IPsec tunnels with different peers: SonicWall, pfSense and MikroTik. In all of the transform sets I use the local address xxx.xxx.xxx.13 to force the tunnel to use my VRRP IP for HA reasons (my backup has all the same configuration, so if one router goes down, the tunnels come back online on the other). All tunnels show the same problem at some point...

The problem:

For no apparent reason, the traffic between some of the networks (e.g. MyRouter -> SonicWall; MyRouter -> pfSense; MyRouter -> other MikroTik) suddenly stops. OUT OF NOWHERE. It does not happen often and follows no pattern at all; it just happens. Sometimes with one of the tunnels, sometimes with another. I use a tunnel without worries for a long time (like 1 month) and out of nowhere the traffic just stops. A while later everything goes back to normal without me doing anything.

Today I was looking into this problem in depth and I found something really interesting and odd at the same time: when the traffic stops, if I generate some logs I can see that the traffic is being translated to another IP address, and when the traffic comes back, there is no translation.

Example: When the traffic stopped, I created 4 rules on the filter tab:
/ip firewall filter 
add action=accept chain=input dst-address=xxx.xxx.xxx.xxx src-address=MyPeerIP
add action=accept chain=output dst-address=MyPeerIP src-address=xxx.xxx.xxx.xxx

I created the two rules above just to see if the traffic is reaching my router and leaving it. (According to the packet flow, the packet enters my firewall on the input chain after the routing decision, because of the destination IP the peer uses, and leaves through the output chain because the IPsec policies re-encapsulate the packets with the outgoing IP so that the other peer accepts them.) So far no problems: I can see the packets coming in and going out.

Then I created two more rules to see if the payload itself is getting through the firewall after decryption occurs on the router. In these rules I used the RFC 1918 IPs because the packets have already passed the IPsec policies, been decrypted, and are going through the forward chain. 192.168.110.0/24 is one of the CCR's network IDs and 192.168.100.0/24 is one of the remote network IDs:

/ip firewall filter 
add action=accept chain=forward dst-address=192.168.110.10 src-address=192.168.100.23
add action=accept chain=forward dst-address=192.168.100.23 src-address=192.168.110.10

So, here is the problem: when the traffic stops, I see no traffic on these last two rules, meaning that something is going wrong with the decryption in my transform set. I realized this because I tested the same rules on the other tunnels that were working at the time, and there I can see normal traffic on all of the rules (public IPs coming in and going out wrapped in ESP headers, and RFC 1918 IPs coming in and going out wrapped in TCP headers... everything as expected). But on the tunnels where the problem occurs, I see no traffic on the last two rules.

So I logged the first two rules (the ones with public IPs) and I saw something REALLY strange:

In the input rule (the one that shows me the traffic coming from the peer), I noticed that the destination IP (the one that should be .13, my VRRP IP address) was being translated to the LAN IP of my router (.14). The log shows:

<srcaddress> -> <xxx.xxx.xxx.14>, NAT <srcaddress> -> (<xxx.xxx.xxx.13> -> <xxx.xxx.xxx.14>). In short: the destination address is not .13 (the IP on which the tunnel was established); it's being NATed to the LAN IP of the router, and I have no idea why. I have already created all the accept rules you can imagine to bypass this, but it didn't work.

So, thinking about the traffic flow, I looked at what was going on before the forward chain, e.g. prerouting (mangle, dstnat...). I created a rule accepting the src address of my peer in dstnat to try to bypass this, but without success. Then, looking at mangle, I also created a rule to log the traffic, and this is what I'm getting:

[Image: screenshot of the mangle log]

So, the tunnel is established over the .13 IP and the traffic in mangle shows the destination as .13, but for some reason it is being translated to .14. While this happens, the IPsec transform set does not work, because by the time the traffic reaches the input chain the destination IP is already .14 and not .13, so it bypasses my IPsec rules. At that point I see no traffic on the two final rules (RFC 1918 IPs) and no traffic gets through at all.

The other odd thing is that out of nowhere, after a while, like 20 minutes or more, everything comes back to normal without me doing anything. After the traffic comes back, I remove all the rules I've created and everything keeps working.

Has anybody else faced something like this? Can anyone help me?

I have no problems with MikroTik IPsec in other scenarios, only in this one with VRRP.

Note: I use v6.43.16 on my CCR.
Be the better version of you.
 
16again
newbie
Posts: 48
Joined: Fri Dec 29, 2017 12:23 pm

Re: IPsec communication problems over VRRP configuration

Fri Sep 13, 2019 12:27 am

Review the logs to see whether VRRP hiccuped.
Also, I'd focus on src-nat rules. Exclude the VRRP source address from ever being masqueraded or src-NATed.
Or, in the filter output chain, block IPsec packets to peers that are sourced from .14.

20 minutes recovery time... the conntrack timeout for ESP is 10 minutes. Look into the conntrack table during the error condition.
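
A rough sketch of those two suggestions in RouterOS syntax, untested against this setup. It assumes xxx.xxx.xxx.13 is the VRRP address from the first post, that the srcnat chain already contains at least one rule (so that place-before=0 has something to refer to), and that the connection table reports ESP under the same protocol name the firewall rules use:

```
# During the outage: list tracked ESP connections and check which
# local address each entry was established or translated with.
/ip firewall connection print detail where protocol=ipsec-esp

# Accept rule at the top of srcnat, so that packets the router itself
# sends from the VRRP address never reach the masquerade rule.
/ip firewall nat
add action=accept chain=srcnat comment="no NAT for VRRP-sourced traffic" src-address=xxx.xxx.xxx.13 place-before=0
```

An existing conntrack entry keeps its old translation until it expires or is removed, so a new NAT rule alone may not change the behaviour of a tunnel that is already stuck.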
 
Sob
Forum Guru
Posts: 4527
Joined: Mon Apr 20, 2009 9:11 pm

Re: IPsec communication problems over VRRP configuration

Fri Sep 13, 2019 12:43 am

oliveirapaulo wrote:
> <srcaddress> -> <xxx.xxx.xxx.14>, NAT <srcaddress> -> (<xxx.xxx.xxx.13> -> <xxx.xxx.xxx.14>). In short: the destination address is not .13 (the IP on which the tunnel was established); it's being NATed to the LAN IP of the router, and I have no idea why. I have already created all the accept rules you can imagine to bypass this, but it didn't work.

This could use some more info: what dstnat rules you have (some of them must be doing this), what exactly you added to fix that (I can imagine different rules, but can't know what you actually tried), ...
People who quote full posts should be spanked with ethernet cable. Some exceptions for multi-topic threads may apply.
 
oliveirapaulo
just joined
Topic Author
Posts: 3
Joined: Sun Mar 12, 2017 1:28 am
Location: Brazil

Re: IPsec communication problems over VRRP configuration

Fri Sep 13, 2019 3:54 pm

16again wrote:
> Review the logs to see whether VRRP hiccuped.
> Also, I'd focus on src-nat rules. Exclude the VRRP source address from ever being masqueraded or src-NATed.
> Or, in the filter output chain, block IPsec packets to peers that are sourced from .14.
>
> 20 minutes recovery time... the conntrack timeout for ESP is 10 minutes. Look into the conntrack table during the error condition.

Actually this makes sense... My src-nat rule that allows internet access for my local clients is set to masquerade on an outgoing interface list, which consists of my LAN (.14) and my VRRP (.13) interfaces. I am almost sure that you have, with these few words, illuminated my mind. Thinking about it now, it makes sense. I have some src-nat rules to prevent the traffic originating from the local networks that participate in the transform sets from being masqueraded.

e.g.:

/ip firewall nat
add action=accept chain=srcnat comment="IPsec ByPass" dst-address-list=ListOfDestination out-interface-list=Wan src-address=192.168.110.0/24

With this rule I allow traffic from my local network to go out without NAT to the transform sets and therefore get encrypted as I wish. BUT the problem, I guess, comes after this... At the bottom of my NAT chain I have the masquerade rule I mentioned above, which is masquerading the encrypted outgoing traffic to the outgoing interface IP, that is .14. So when I get a reply, I think conntrack sees that this traffic was destined to .13 but left my router with .14, so the magic occurs and it translates to the "wrong" IP.

Does that make sense to you too?

I will try to masquerade the outgoing traffic only for my LAN IP (.14). Do you think I would have any kind of problem doing this? I mean, like clients from the outside trying to connect to the VRRP IP (.13) through a dst-nat rule and not getting the connection because it can't go out masqueraded?

I mean, I would masquerade the outgoing traffic only for the LAN interface (.14), but all my outside clients connect to my internal services via dst-nat rules on the VRRP IP (.13) for HA purposes. Would I get any trouble with this kind of connection if I only masquerade .14?
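
One way that idea could be written down, as a sketch under assumptions and not a tested configuration: the interface list name Wan and the 192.168.110.0/24 network are taken from the bypass rule above, and the comment text is made up. The masquerade rule matches on the LAN source network only, so packets the router itself sends from .13 do not match it:

```
/ip firewall nat
# Masquerade only forwarded traffic from the local network.
# Router-originated IPsec packets sourced from the VRRP address (.13)
# have a different source address and are left untouched by this rule.
add action=masquerade chain=srcnat comment="LAN clients only" out-interface-list=Wan src-address=192.168.110.0/24
```

Replies to connections that came in through a dst-nat rule are translated back by connection tracking itself, not by srcnat rules, so narrowing the masquerade rule this way should not by itself affect them.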

Thank you.
 
oliveirapaulo
just joined
Topic Author
Posts: 3
Joined: Sun Mar 12, 2017 1:28 am
Location: Brazil

Re: IPsec communication problems over VRRP configuration

Fri Sep 13, 2019 4:04 pm

Sob wrote:
> oliveirapaulo wrote:
> > <srcaddress> -> <xxx.xxx.xxx.14>, NAT <srcaddress> -> (<xxx.xxx.xxx.13> -> <xxx.xxx.xxx.14>). In short: the destination address is not .13 (the IP on which the tunnel was established); it's being NATed to the LAN IP of the router, and I have no idea why. I have already created all the accept rules you can imagine to bypass this, but it didn't work.
>
> This could use some more info: what dstnat rules you have (some of them must be doing this), what exactly you added to fix that (I can imagine different rules, but can't know what you actually tried), ...

I am really sorry for not posting a print of the log here; I didn't have the chance to collect the filter log. I will get it next time.

Actually I didn't create any rule that fixed the problem... I was creating rules TRYING to fix it, but it fixed itself... After the traffic comes back to normal (after something like 20 minutes), I remove the rules I created and the traffic keeps flowing.

I don't believe my dst-nat rules do any harm here, because I made them really specific, with in-interface, dst-ports, protocols, etc. All the traffic that comes into the router from IPsec comes wrapped in the ESP header, so I think that is not the problem. I think I am masquerading my outgoing traffic really wrongly by using both the LAN IP and the VRRP IP.

What do you think about it?
Last edited by oliveirapaulo on Fri Sep 13, 2019 5:26 pm, edited 1 time in total.
 
Sob
Forum Guru
Posts: 4527
Joined: Mon Apr 20, 2009 9:11 pm

Re: IPsec communication problems over VRRP configuration

Fri Sep 13, 2019 5:19 pm

I think it's possible that I was wrong and the dstnat rules are innocent. If the log is for a response packet of a connection initiated from this router, it can be srcnat.
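
A possible way to check that, sketched with a hypothetical diagnostic rule (the log prefix is made up, MyPeerIP is the placeholder from the first post, and place-before=0 assumes the srcnat chain is not empty):

```
/ip firewall nat
# Log ESP connections to the peer as they enter srcnat. NAT rules only
# see the first packet of a connection, so one log line per new
# connection is expected, showing the source address before translation.
add action=log chain=srcnat dst-address=MyPeerIP log-prefix="esp-srcnat" protocol=ipsec-esp place-before=0
```

If the logged source is .13 and the connection table afterwards shows the same flow leaving as .14, a srcnat rule further down the chain is the likely culprit.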
