I have been working with MikroTik routers (mostly RB2011 and RB4011) for a year now, and I have to say they are versatile and powerful devices, but I still struggle with some recurring issues.
The main issue I struggle with is IPsec instability. Let me explain:
Every IPsec tunnel (or GRE over IPsec) I’ve set up stops functioning at some point, and I have to reboot one side in order to restore connectivity.
In the log, I have a message saying “phase 1 negotiation failed due to time up”, nothing more.
I think the problem may occur just after a brief WAN link outage, but I can’t swear that’s always the case…
Has anyone here already had this kind of issue, and does anyone know what causes it?
From the wiki:
“phase1 negotiation failed due to time up” what does it mean? There are communication problems between the peers. Possible causes include - misconfigured Phase 1 IP addresses; firewall blocking UDP ports 500 and 4500; NAT between peers not properly translating IPsec negotiation packets.
This error message can also appear when local-address parameter is not used properly.
Source: https://wiki.mikrotik.com/wiki/Manual:IP/IPsec#Mode_configs
This issue is often caused by intermediary routers that have a stateful firewall and NAT.
For example, AVM Fritz!Box routers have a known problem with this.
When there is no explicit forwarding entry for UDP ports 500 and 4500 and the IPsec connection is running, a hiccup in the internet connection can leave the Fritz!Box unable to recognize the new connection attempt; it will then either block it or apply a source port translation.
This will make the IPsec connection fail until you disable it for a short time (allowing the Fritz!Box to remove the incorrect entry) and then enable it again.
Unfortunately this even happens when the MikroTik router is set as the exposed host. Only an explicit port forwarding in the Fritz!Box more or less fixes this.
The same may happen with other routers.
@Zacharias: the thing is that the tunnel works most of the time, so the configuration seems to be OK.
@pe1chl: That’s the kind of thing I thought about, but most of my routers are directly connected via PPPoE interfaces.
So the only firewall involved is the one in RouterOS.
Could it be related to the fact that I have dual WAN? I wondered: if the WAN interface used for the VPN goes down for an instant, could that cause problems?
I will try adding a blackhole route to make sure that the router never sends packets out of the bad WAN interface. Could that be useful?
When you have dual WAN you should always have some route rules and/or firewall filters to make sure that only packets with the correct source address leave each interface.
And when you have dual WAN you can also set up a VPN tunnel over each of them, i.e. 2 GRE/IPsec tunnels, each with a different local address, to the same destination, and then use some mechanism to send the payload traffic via the GRE tunnel you prefer. When one WAN goes down, the other GRE tunnel will be used.
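To make the source-address rule above concrete, here is a minimal sketch in Python rather than RouterOS configuration. It only models the check that route rules or firewall filters would enforce; the interface names `wan1`/`wan2`, the addresses (documentation ranges) and the helper `may_leave` are all hypothetical.

```python
from ipaddress import ip_address

# Hypothetical WAN interfaces and their local addresses (documentation ranges).
WAN_ADDRESSES = {
    "wan1": ip_address("192.0.2.10"),
    "wan2": ip_address("198.51.100.20"),
}


def may_leave(out_interface: str, src_address: str) -> bool:
    """Allow a packet only if its source address belongs to the interface it leaves by."""
    expected = WAN_ADDRESSES.get(out_interface)
    return expected is not None and ip_address(src_address) == expected


if __name__ == "__main__":
    print(may_leave("wan1", "192.0.2.10"))  # source matches wan1
    print(may_leave("wan2", "192.0.2.10"))  # wan1's address leaving via wan2
```

A packet carrying the first WAN's address but leaving through the second WAN is the case the rule is meant to drop.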
I have dual tunnels on most of my routers (some have a NATed backup WAN which doesn’t support VPN passthrough).
So, about preventing packets from leaving the router from the bad interface, what is the best way to do this?
I thought about adding a blackhole route with priority 2 for the public IP of the other side, but maybe a more specific firewall rule (blocking only ports 500 and 4500) would be better?
“When you have dual WAN you should always have some route rules and/or firewall filters to make sure that only packets with the correct source address leave each interface.”
Almost true… but a very general description.
“So, about preventing packets from leaving the router from the bad interface, which is the best way to do this?”
There are no good or bad interfaces…
The actual truth is that IPsec uses the main routing table. So if, for example, the tunnel would need a custom table named ABC rather than the main routing table to get established, it will fail, because IPsec will still try to reach the destination using the main routing table…
I had the same problem again today on a router with 3 GRE over IPsec tunnels: one of them disconnected for no apparent reason.
The associated IPsec rule was in state “ready to send” on both sides.
In the log, I had a message like “phase 1 failed due to time out”.
I tried to reboot the router, same issue.
So I disabled IPsec, then enabled it again: same issue.
So I tried disabling IPsec and working with GRE only, and the tunnel came up instantly, but I then hit another issue, which I asked about in another thread.
After a while (about 30 minutes) I enabled IPsec again, and everything worked.
So I just spent an hour trying to repair something which wasn’t broken, and I know it will happen again.
Does anybody have an idea of what happened here?
This is well known: RouterOS IPsec is ridiculously unstable at the moment. Reboot both ends at the same time (killing the connection or flushing the SPIs will not help). You can even observe the same behaviour on two directly connected routers that have no other interfaces or configuration but a GRE tunnel with IPsec protection.
That’s not my experience: I have many GRE/IPsec tunnels operational without issue. Different types of router, different recent RouterOS versions, they all work OK.
The only issue is when operating over a provider-supplied NAT router like the AVM Fritz!Box, as described above.
I sometimes have to use provider-supplied routers, but not in this particular case: on both sides I have PPPoE interfaces on the MikroTik.
Since davidcx and I are experiencing the same kind of problems, and pe1chl isn’t, maybe there is a mistake that we made in our IPsec configuration?
Or something we haven’t done: when using GRE over IPsec, RouterOS uses the default peer profile and the default proposal, and I didn’t modify anything.
So, pe1chl, did you modify the default profile or proposal ?
I’m using different setups, both with the “easy IPsec” config (just set a key in the GRE interface and use default profile) and with GRE configured in plain mode and a separate peer profile that encrypts protocol 47. Usually with settings similar to defaults.
I also use L2TP/IPsec.
With PPPoE you need to watch out for MTU issues. The PPPoE MTU is usually 1480, can often be forced to 1492, but in my case it is 1500 because my provider allows it (RFC4638).
MTU issues can be quite hard to track down, especially when tunnels are involved. To be safe, watch the MTU chosen automatically on the GRE tunnel and subtract 20 or 8 to allow for the GRE overhead.
So, just to make sure I understand: I look at the automatically chosen MTU (1406 in this case), subtract 20 (giving 1386), and force this value in the GRE settings?
Yes, that would be a good thing to try!
(Assuming you have a 1480-byte MTU on the PPPoE interface. You can usually force that to 1492 by setting MTU/MRU to 1492, and in that case you only need to subtract 8.)
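For anyone who wants to check the arithmetic, here is a small Python sketch of the subtraction discussed above. It rests on one assumption that is my reading of the 20 and 8 figures, not something stated in the thread: the amount to subtract is 1500 minus the PPPoE MTU. The function name `forced_gre_mtu` is made up for illustration.

```python
ETHERNET_MTU = 1500  # assumed baseline that the automatic GRE MTU was computed against


def forced_gre_mtu(auto_gre_mtu: int, pppoe_mtu: int) -> int:
    """Return the GRE MTU to force, given the auto-chosen value and the PPPoE MTU."""
    if not 0 < pppoe_mtu <= ETHERNET_MTU:
        raise ValueError("pppoe_mtu must be between 1 and 1500")
    shortfall = ETHERNET_MTU - pppoe_mtu  # 20 for 1480, 8 for 1492, 0 for 1500
    return auto_gre_mtu - shortfall


if __name__ == "__main__":
    print(forced_gre_mtu(1406, 1480))  # 1386, the value from the thread
    print(forced_gre_mtu(1406, 1492))  # 1398, when PPPoE is forced to 1492
```

With a 1500-byte PPPoE MTU (RFC 4638) the shortfall is zero and the automatic value is left unchanged.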