I have an EOIP tunnel configured between two mikrotiks running 6.44.5. The EOIP tunnels are bridged to physical ports on each device. When, for example, an SSH connection is attempted from one end to the other, I see SYN from A to B, and SYN/ACK from B to A. The SYN/ACK goes in the tunnel to A, but does not come out the physical port on A. It seems like it should be pretty simple and straightforward:
Nice job testing and describing the problem!
Could it possibly be an MTU issue? I have many EoIP tunnels and they certainly don’t block anything. I literally just tested SSH on my production machines and it went through without any issue.
The only other option I can think of is some bridge trouble (bridge filter, arp-proxy stealing packets, wrong VLAN config etc?)
To further investigate:
You said the SYN/ACK going from B to A is visible going into the tunnel and does not come out of the physical port. Now the question is where the heck it gets lost. Can you run the sniffer on router A on all interfaces? In theory, you should see:
SYN RX on Ether (received from your computer)
SYN TX on EoIP (sent to tunnel)
SYN/ACK RX on EoIP (received from tunnel - I really wonder about this one)
SYN/ACK TX on Ether (sent to your computer - you already said you don’t see this)
You should also see corresponding flow on router B, but I assume you investigated that one because you confirmed that “SYN/ACK goes in the tunnel” so that should be fine (unless the MAC is wrong)
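A minimal sketch of that capture on router A, assuming RouterOS 6.x, the interface names that appear in this thread (ether2, eoip-tunnel) and an SSH test on TCP port 22; adjust names and port to your setup. Run each command in its own terminal session while repeating the connection attempt:

```
# physical port: expect the SYN as RX and the SYN/ACK as TX
/tool sniffer quick interface=ether2 ip-protocol=tcp port=22

# tunnel interface: expect the SYN as TX and the SYN/ACK as RX
/tool sniffer quick interface=eoip-tunnel ip-protocol=tcp port=22
```

The first of the four steps that fails to show up tells you which hop is eating the packet.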
I also don’t believe it’s an MTU issue because the connection isn’t getting far enough for large packets to be involved.
I’m not sure how the mac address would affect some packets and not others, but:
[eoip-gw] /interface bridge host> print
Flags: X - disabled, I - invalid, D - dynamic, L - local, E - external
 #      MAC-ADDRESS        VID  ON-INTERFACE  BRIDGE       AGE
 0  D   5C:A6:2D:42:27:30       eoip-tunnel   eoip-bridge  24s
 1  D   5C:A6:2D:42:27:65       eoip-tunnel   eoip-bridge  0s
 2  D   A2:1E:80:00:00:0A       ether2        eoip-bridge  3s
 3  D   C0:64:E4:71:8D:09       ether2        eoip-bridge  20s
 4  DL  E4:8D:8C:1B:AD:15       ether2        eoip-bridge
 5  DL  FE:CF:B7:E5:4B:BB       eoip-bridge   eoip-bridge
 6  D   FE:FE:8C:81:8C:A4       eoip-tunnel   eoip-bridge  13s
I’ve attached a more detailed diagram:
The goal is a layer 2 virtual circuit from LAN 1 to LAN 2
however, it must go over a PPP IP link. To this end, eoip-gw
was installed to bridge from the EOIP tunnel to the vlan that
carries the data the rest of the way through various infrastructure.
eoip-gw and eoip-client are mikrotiks; eoip-gw is “A” in the original post
core switch is a juniper
core switch has a port mirror that allows monitoring ge-0/0/5 or
ge-0/0/3. SYN/ACKs from LAN1 go out ge-0/0/5 in the EOIP
tunnel, but do not come in ge-0/0/3.
A perhaps better option would be to end the tunnel in the
juniper, however I don’t know that there is a common protocol
for doing so, as it’s my understanding that EOIP is mikrotik
specific.
If “some” and “others” differ by direction, the bridge forwarding table can be related. If they differ by protocol (TCP doesn’t get through but UDP or ARP does) but the direction is the same, I agree with you that MAC address should not affect that.
But you have provided the bridge host table only from one end (the eoip-gw), and haven’t highlighted/listed the MAC addresses of the two devices attempting to talk to each other, so not much can be deduced from this information.
Unfortunately, EoIP is indeed Mikrotik-specific; what might work would be EoIPv6 because, unlike in the IPv4 case, Mikrotik has apparently implemented it as the standard GRETAP encapsulation (but I don’t remember where I’ve seen that information and I haven’t tested that myself yet). Yet another possibility would be to use PPP’s BCP for the purpose, but here too I’m not sure whether Juniper supports it.
So I’d say send a more complete info on the bridge forwarding tables and follow @vecernik87’s suggestion regarding sniffing on the two Mikrotiks directly to find out where the packets disappear. Also, try pinging in addition to TCP, see how ARP requests and responses go through in both directions etc. while sniffing, so that the picture is more complete.
ICMP and UDP work; it’s just TCP’s 3-way handshake that is being broken. That sounds like a firewall issue, but a firewall should not affect layer 2 traffic (none of ether2, eoip-bridge or eoip-tunnel have an IP address assigned to them), and in any case I permitted all traffic with “chain=forward action=accept” and it made no difference.
Since the packets go into eoip-gw and don’t come out, I don’t think anything on eoip-client is relevant.
So you can see the SYN,ACK arriving encapsulated inside the EoIP on the wire, do I get you right? I would still confirm this by sniffing on eoip-gw’s ether1 internally, then on eoip-tunnel, then on ether2. Bugs do exist, but this one is especially weird. You can sniff into a file and then open it using Wireshark; it’s a regular .pcap format.
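For reference, a file-based capture along those lines might look like the sketch below. It assumes RouterOS 6.x and the interface name used in this thread; the file name is made up for illustration. Filtering on GRE shows the EoIP packets while they are still encapsulated:

```
# capture only GRE (EoIP) packets on the port that carries the tunnel
/tool sniffer set filter-interface=ether1 filter-ip-protocol=gre
/tool sniffer start
# ... repeat the failing TCP connection attempt, then:
/tool sniffer stop
# write the captured packets to a .pcap file (hypothetical file name)
/tool sniffer save file-name=eoip-ether1.pcap
```

Repeat with filter-interface set to eoip-tunnel and then ether2 (and without the GRE filter) to see the inner frames after decapsulation.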
You are right that the IP firewall is unrelated, unless you’ve set use-ip-firewall to yes under bridge settings.
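To check that setting, something like the following should do on RouterOS 6.x (a sketch; only the second command changes anything):

```
# shows use-ip-firewall, use-ip-firewall-for-vlan and related bridge settings
/interface bridge settings print

# only if it shows "yes" and nothing else depends on it
/interface bridge settings set use-ip-firewall=no
```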
Packet captures on eoip-gw indicate that the SYN/ACKs are getting dropped on input (capturing on the eoip interface does not show them).
A test setup on my desk works fine. All four mikrotiks upgraded to 6.46.7. Diffing configs does not show anything obvious - the config for this is really pretty simple. This is the diff of the /interface export (lgw.cfg is the one not working):
$ diff testgw.cfg lgw.cfg
1,2c1,2
< # jan/03/1970 01:22:45 by RouterOS 6.46.7
< # software id = I3A4-QMEW
---
> # sep/17/2020 19:03:54 by RouterOS 6.46.7
> # software id = JVYT-02MY
5c5
< # serial number = 8B00099B199D
---
> # serial number = 8B00095FBEB4
9,11c9,10
< set [ find default-name=ether1 ] comment=Management
< set [ find default-name=ether2 ] comment="EOIP tunnel"
< set [ find default-name=ether3 ] comment="Ethernet endpoint"
---
> set [ find default-name=ether1 ] comment="EOIP tunnel"
> set [ find default-name=ether2 ] comment="Upstream Link"
13c12,13
< add mac-address=FE:63:FB:68:55:84 name=eoip-tunnel remote-address=10.1.1.2 tunnel-id=0
---
> add mac-address=FE:A6:EA:31:45:88 name=eoip-tunnel remote-address=10.103.1.202 \
> tunnel-id=0
16c16,17
< add bridge=eoip-bridge interface=ether3
---
> add bridge=eoip-bridge interface=ether2
>
Do the “wanna-be-production” and test devices have the same CPU architecture? I have recently found that there are issues which only occur on some architectures. In this particular case I have no clue why encapsulated TCP should be treated differently due to architectural differences (in those other cases it made more sense), but the reason I am asking is to remind you to provide this information when you open a ticket at https://help.mikrotik.com/servicedesk (or by e-mail to support@mikrotik.com).
But if the CPU architectures do differ between the two devices, there is one more thing to do: if there are any firewall filter rules in the input chain on the device which doesn’t let the SYN,ACK through, add a rule like action=accept protocol=gre there (with some restrictive conditions on in-interface and src-address if it helps and you’ll keep it for production), before (above) action=drop connection-state=invalid. The voodoo behind it is that since some GRE vulnerability was patched, GRE packets are sometimes considered connection-state=invalid and dropped. And EoIP is an application using the GRE protocol, except that the four bytes of the optional GRE identifier are used in a proprietary manner, where two bytes carry the “EoIP ID” and the other two bytes carry the payload size.
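As a sketch, such a rule could look like this on the eoip-gw device. The src-address is the EoIP remote-address from the export posted earlier in the thread (10.103.1.202), and in-interface=ether1 is an assumption based on the same export. The number passed to place-before must be whatever the preceding print shows for the first drop rule; 0 here is only a placeholder:

```
# list the input chain so that the item numbers are known
/ip firewall filter print where chain=input

# accept GRE (which carries EoIP) from the tunnel peer, above the drop rules
/ip firewall filter add chain=input action=accept protocol=gre \
    src-address=10.103.1.202 in-interface=ether1 \
    comment="allow EoIP (GRE) from tunnel peer" place-before=0
```

Watching the packet counter of the new rule while retrying the TCP connection shows whether it is actually being hit.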
If adding the “accept GRE on input” rule doesn’t help, would you mind posting a .pcap (attachment to a post, see the Attachments tab next to the Options one right below the form field where you compose the post) of the (outgoing) TCP SYN and the (incoming) TCP SYN,ACK still encapsulated in EoIP (i.e. sniffed on the ethernet port) from the eoip-gw device where the SYN,ACK is not getting decapsulated?
Also, can you change the EoIP ID from 0 to something else (1..65535)? I know it works with a 0 on the test pair, but still.
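A sketch of that change, assuming the tunnel interface is called eoip-tunnel on both routers (it is on eoip-gw according to the export; the name on eoip-client is a guess). The value 10 is arbitrary, it only has to match on both ends:

```
# run on BOTH ends of the tunnel
/interface eoip set [find name=eoip-tunnel] tunnel-id=10
```

The tunnel stays down between changing the first and the second end, so do this from a management path that does not run through the tunnel.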
I think I had the same or a similar issue: I made a tunnel between 2 routers, and on the other side I linked it to an SSID. You can connect, you get an IP from DHCP, you can ping anything, use an IP scanner to scan the entire network, ping servers like 8.8.8.8, but web pages don’t open, RDC (TCP) can’t connect, and network shares can’t be accessed.
I tried with the firewall completely turned off on both sides.
Hard to say. If you were as thorough as the OP (@abatie) back when you encountered that and tracked it down to SYN,ACK not passing through, it is likely that it was the same issue; if you didn’t dig that deep, an MTU issue is more likely to be the root cause in general whenever “everything works except TCP”.
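One rough way to tell the two apart is to ping through the tunnel with fragmentation prohibited and different packet sizes. The sketch below is run from a RouterOS device that has an IP address on the tunnelled LAN; 192.0.2.10 is a made-up placeholder for a host on the far side, and the sizes are only examples:

```
# small packet: should pass in either case
/ping 192.0.2.10 size=100 do-not-fragment count=3

# full-size packet: fails if the path through the tunnel cannot carry it
/ping 192.0.2.10 size=1500 do-not-fragment count=3
```

If small pings pass and large ones do not, it is an MTU problem; if the TCP handshake itself never completes, as in the OP’s case, it is something else.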
It appears the rule chain=input action=drop was blocking the traffic; I guess this makes sense now that I think about it, since the EOIP tunnel is an IP connection to the router - on the other hand, why didn’t it affect all tunnel traffic? Added “chain=input action=accept protocol=gre” and it’s working now.
That’s why I’m asking you for all the additional information about the CPU architecture of the “working” and “non-working” devices and for captures of the EoIP packets carrying the TCP SYN and the TCP SYN+ACK. The point is that the Mikrotik firewall has some trouble even with “plain” GRE, i.e. the one carrying IP payload, and with EoIP (which is a proprietary use of GRE) as well. And the trouble seems to be related to the CPU architecture somehow, as it works differently on different Routerboard models (and CHRs). But there is too much noise in my observations, so a clear example of packets which are handled correctly on one architecture and incorrectly on another is necessary to create a model case that lets Mikrotik support reproduce and fix the issue.
Normally, the GRE packets to the router itself should be accepted by the “accept established” rule in the input chain if the router itself has previously sent a GRE packet in the opposite direction, but this currently does not (always) happen. In your case, the EoIP packet carrying the SYN one was sent by the router, so the responding one carrying the SYN+ACK should have been let in by the “accept established” rule alone (since you’ve mentioned you had the “drop” one in the input chain, I assume you had the complete set of default firewall rules in place). I routinely see cases where the two directions of EoIP are seen as two separate tracked connections. I also routinely see cases where, if I do not add the “accept gre” rule before enabling the IP (not EoIP) GRE tunnel, I have to disable the tunnel for more than the lifetime of a GRE pinhole, although /ip firewall connection print where protocol~"gre" shows nothing on the machine which blocks the traffic. But “giving it a rest” makes it work.
So there are multiple issues in the firewall handling of GRE, and more of them appeared since the GRE vulnerability fix somewhere in 6.44.x.
The router in question was originally a 1036 with a tile processor; having run into issues with the tile processor before, we swapped it out with a hex (750gr3). Both had the problem.
I can test the setup once my LtAP arrives (same MT7621A like RB750gr3), but not without the knowledge of the packets which are known to be mishandled that way. I need the outer and inner IP addresses (those of the EoIP session and of the TCP SYN,ACK one). Every bit may make the difference.
Hello @abatie, my replacement LtAP arrived last week after all (it seems that the whole world wants the LTE6 modules), but in the meantime PM has been switched off again on the forum, so I cannot access the information you’ve sent. If you are still interested in me testing the issue, I can send you my public key and a tutorial on how to use it to encrypt your contact information so that you can send it to me here.
I have two 1072 tiles linked together. This morning they were running 6.49.2, now 7.15.3, and without the “gre rule” in the FW it didn’t work either!
It cost me 2 days of desperation and sweat, until I found this thread.