I still have a lot of issues with IPsec: my GRE over IPsec tunnels go down for no apparent reason, with a log message about a phase 1 timeout.
So I tried simply disabling IPsec encryption on one tunnel, and it came up instantly.
But there is a big, big “but”!
I can ping any device on the other side of the tunnel, but I cannot establish any TCP connection (SIP phones don’t register, and I can’t access the web admin of a device).
This is weird, and I don’t understand what is happening.
Has anyone had this issue in the past and knows how to fix it?
Did you set up the GRE/IPsec tunnel using the wiki example?
Anything could be wrong, since you have problems with IPsec as well…
Could you export, with hide-sensitive, your IPsec config (policies, proposals, peers, everything), along with your NAT rules, your firewall and your routes?
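For reference, the requested export could be produced roughly like this. This is a sketch assuming RouterOS v6 console syntax; on v7, sensitive values are hidden by default, so the flag may be unnecessary there.

```
# Run on each router and mask public IP addresses before posting.
/ip ipsec export hide-sensitive
/ip firewall export hide-sensitive
/ip route export
/interface gre export hide-sensitive
```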
The problems I experience with IPsec are not the same.
Actually, I have problems with IPsec with or without GRE; those are instability problems (tunnels stop working without any apparent reason).
The fact is that the same GRE tunnel works great with IPsec (except for the instability), and doesn’t work at all without IPsec.
I did indeed follow the wiki example when I set up my first GRE tunnels, but I always used IPsec encryption, so I never noticed the issues without it.
It’s difficult to get anything wrong following the wiki example, since it’s basically 3 lines on each side, but I could have made a mistake with everything that is not in the example, I admit.
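For readers who don't have the wiki at hand, the "3 lines on each side" look roughly like the following. This is a sketch, not a copy of the wiki text; the interface name and all addresses are placeholders.

```
# Site A (placeholders: 192.0.2.1 = local WAN IP, 198.51.100.1 = remote WAN IP)
/interface gre add name=gre-tunnel1 local-address=192.0.2.1 remote-address=198.51.100.1
/ip address add address=172.16.1.1/30 interface=gre-tunnel1
/ip route add dst-address=10.1.202.0/24 gateway=172.16.1.2

# Site B mirrors this with the addresses swapped (172.16.1.2/30 on the tunnel,
# and a route to site A's LAN via 172.16.1.1).
# Assumption: the IPsec variant adds ipsec-secret=<shared secret> to the
# "/interface gre add" line on both sides.
```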
Fun fact: yesterday, I was able to establish a fully functional tunnel between a MikroTik 4011 and a Zyxel USG20.
The MikroTik was connected via an LTE router (NAT on the router + NAT on the provider side), and the Zyxel was connected through the provider’s router (with NAT).
So I don’t understand why I can’t do the same with my two RB2011 connected through PPPoE interfaces…
Hi himvas, thanks for answering.
I don’t think the issue is in the firewall: my GRE interface is in a list named GRE, which is included in the LAN interface list.
Plus, I tried putting my GRE interface directly in the LAN list, and also adding some filter rules to accept anything coming in and going out through this interface…
When you have issues like “one type of traffic works and another type doesn’t” you need to debug your firewall.
You write:
I don’t think the issue is in firewall, my gre interface is in a list named GRE, which is included in the LAN interface list.
but that isn’t even possible in RouterOS!
You may have lists with the same name as interfaces but that does not make them the same thing. Lists can only have interfaces as members, not other lists.
So first get that straight.
If TCP connections still don’t work after that, start debugging MTU issues. You mention you use PPPoE, so check the MTU of the PPPoE interface (usually 1480 or 1492), see how much less than 1500 it is (20 or 8 in these cases), and subtract that number from the MTU that was automatically set on the GRE interface (1476 by default, so change it to 1456 or 1468).
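As a worked example of that arithmetic, here is a sketch assuming a PPPoE MTU of 1492; the interface names pppoe-out1 and gre-tunnel1 are hypothetical.

```
# Check the MTU actually in use on the PPPoE interface (look at actual-mtu).
/interface print detail where name=pppoe-out1

# PPPoE MTU 1492 -> shortfall is 1500 - 1492 = 8
# GRE default MTU 1476 -> 1476 - 8 = 1468
/interface gre set [find name=gre-tunnel1] mtu=1468

# With a PPPoE MTU of 1480 the shortfall is 20, giving 1476 - 20 = 1456.
```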
When that still does not fix it, apply this rule to the mangle list:
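The rule itself is not shown here. Going by the later mention of clamping the MSS to the path MTU, it was presumably something along these lines (a typical clamp-to-PMTU rule, not necessarily the poster's exact one):

```
# Rewrite the MSS option on forwarded TCP SYN packets so that segments
# fit the path MTU discovered for the outgoing route.
/ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn \
    action=change-mss new-mss=clamp-to-pmtu passthrough=yes \
    comment="clamp TCP MSS to PMTU"
```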
It does indeed seem to be TCP-related, since UDP works well, and ICMP too.
I tried lowering the MTU value (I tried 1400 and even 1300), but it doesn’t resolve anything.
I tried the mangle rule to clamp the MSS to the PMTU: nothing.
If I try a ping with the “don’t fragment” flag, it works up to the MTU value.
Any idea?
Here is what I’ve done:
After setting up my tunnel with the default MTU, I checked the maximum packet size with the ping tool: 1440.
From a device on the network, the maximum packet size is 1412, which I think is normal.
I tried enabling and disabling the “Clamp TCP MSS” option in the GRE tunnel configuration: no change.
I tried with and without the mangle rule: no change.
I tried enabling and disabling “Allow FastPath”: without this option, I cannot even ping, which is weird.
With the “Allow FastPath” option enabled, I can ping and I can connect SIP phones over the tunnel using UDP, but not using TCP, and I can’t establish any HTTP connections.
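The 1440 versus 1412 difference reported above is what one would expect if the router's ping counts the whole IP packet while the device's ping counts only the ICMP payload. A sketch of the two measurements follows; the tunnel address 172.16.1.2 is a placeholder, and a Linux client is assumed for the second command.

```
# On the router: size= is the full IP packet, so 1440 should pass and 1441 fail.
/ping 172.16.1.2 size=1440 do-not-fragment count=3
/ping 172.16.1.2 size=1441 do-not-fragment count=3

# On a Linux host behind the router, -s sets the ICMP payload only:
#   ping -M do -s 1412 <address behind the remote router>
# 1412 payload + 8 (ICMP header) + 20 (IPv4 header) = 1440
```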
I checked what happens during an HTTP connection by running the packet sniffer on the WAN port:
I see the initial packet from my laptop leaving the router, and I see the ACK coming back to my router, with correct information inside.
But some time later, I see retransmissions of my initial packet, which means that the ACK is received by the router but not forwarded to my laptop…
I double-checked by capturing packets on my laptop: the ACK never arrives.
So it seems that, for some reason, the router doesn’t forward the received ACK, which is why TCP doesn’t work.
And just to be sure the firewall isn’t the cause, I added rules to accept anything coming from and going through my GRE interface. It doesn’t change anything, but the counter of the rule for incoming traffic doesn’t increase, so the problem seems to occur before the firewall. It’s as if RouterOS doesn’t decapsulate the packets.
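A check of this kind could be set up roughly as follows. This is a sketch with hypothetical interface names (gre-tunnel1, pppoe-out1); the rules are appended at the end of the chain, so they would need to be moved above any drop rules to be meaningful.

```
# Accept-and-count rules for traffic entering and leaving via the tunnel.
/ip firewall filter add chain=forward in-interface=gre-tunnel1 \
    action=accept comment="debug: from GRE"
/ip firewall filter add chain=forward out-interface=gre-tunnel1 \
    action=accept comment="debug: to GRE"

# Watch the packet counters while retrying the TCP connection.
/ip firewall filter print stats where comment~"debug"

# Confirm that encapsulated GRE packets do arrive on the WAN side.
/tool sniffer quick interface=pppoe-out1 ip-protocol=gre
```

If the "from GRE" counter stays at zero while the sniffer shows GRE packets arriving, the packets are being lost before they reach the forward chain, which matches the observation above.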
I think I solved my problem!
I had to add a filter rule accepting the GRE protocol in the input chain, and I had to place it before the default rule that drops invalid connections.
Without this rule, ICMP and UDP work, but only with the “Allow Fast Path” option enabled. With the rule, I can disable that option, and TCP works.
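The fix described above could look like this. It is a sketch that assumes the default-configuration comment text on the drop rule; `place-before` needs the `find` to match exactly one rule.

```
# Accept incoming GRE before the "drop invalid" rule in the input chain.
# Optionally add src-address=<tunnel peer's public IP> to limit it to the peer.
/ip firewall filter add chain=input protocol=gre action=accept \
    comment="accept GRE" \
    place-before=[find chain=input comment="defconf: drop invalid"]
```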
I have a small question, just to understand exactly what happened : why is this rule
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
This is due to a recently introduced bug, which resulted from fixing another (apparent) bug in the firewall’s handling of GRE.
The problem was noticed soon after it was introduced, but it seems the promised fix has not been delivered yet.
I did not know that it could affect only part of the GRE traffic; I would have thought it affected the basic GRE tunnel traffic, so everything sent over GRE.
Actually, it does affect all the traffic.
But since I had enabled the “Allow Fast Path” option, I think that UDP and ICMP didn’t pass through the firewall… (I’m not sure exactly how fast path works, so I’m speculating here.)
I still have issues with other GRE tunnels…
So far I have managed to establish a working tunnel between an RB2011 connected directly via PPPoE and another RB2011 behind a NAT router.
So tonight I tried to establish another tunnel between the same PPPoE-connected RB2011 and another PPPoE-connected RB2011, with exactly the same settings, and it doesn’t work!
I ran multiple tests and found that almost every packet arriving at my first router’s input from the second router has a raw length of only 90 bytes, while the IPv4 total length is much larger than that (for example, 576 bytes).
Wireshark displays this:
[Expert Info (Error/Protocol): IPv4 total length exceeds packet length (76 bytes)]