Input to Mikrotik issue when dual WAN is implemented

Hello!

I have the following scenario. The problem occurs on RouterOS v7.x regardless of the hardware model. I have also tried it in a different location with different ISPs, with no change.

The Mikrotik acts as the gateway for local users and has two ISP uplinks:
ether1_WAN - connected to the ISP1 gateway with local address 10.110.110.11 (the ISP1 router's local address is 10.110.110.10; it has a public IP for access from the internet, and ports 22 and 8728 are forwarded to 10.110.110.10).
ether5_WAN_reserve - connected to the ISP2 gateway; it has a public IP for access from the internet.

I have configured two default routes in the main routing table:

  1. 0.0.0.0/0 via 10.110.110.10 with distance 1
  2. 0.0.0.0/0 via <ISP2_gateway> with distance 2
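For reference, the two main-table routes above could be entered roughly like this in the v7 CLI. This is a sketch, not the exact configuration from the post; <ISP2_gateway> stays a placeholder to be replaced with the real gateway address.

```routeros
# Main routing table: ISP1 preferred (distance 1), ISP2 as backup (distance 2).
# <ISP2_gateway> is a placeholder: substitute the real ISP2 gateway IP.
/ip route
add dst-address=0.0.0.0/0 gateway=10.110.110.10 distance=1
add dst-address=0.0.0.0/0 gateway=<ISP2_gateway> distance=2
```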

I’ve created an additional routing table for each ISP:

  1. PBR-to-ISP1
  2. PBR-to-ISP2

Then I configured a default route in each of the additional routing tables:

  1. 0.0.0.0/0 via 10.110.110.10 with distance 1 in PBR-to-ISP1 table
  2. 0.0.0.0/0 via <ISP2_gateway> with distance 1 in PBR-to-ISP2 table
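A minimal RouterOS v7 sketch of the two extra tables and their default routes, assuming the table names from the post. In v7 a table has to exist under /routing table (with the fib flag) before routes or routing marks can refer to it; <ISP2_gateway> is again a placeholder.

```routeros
# Create the policy routing tables; "fib" makes them usable for forwarding.
/routing table
add name=PBR-to-ISP1 fib
add name=PBR-to-ISP2 fib
# One default route per table. <ISP2_gateway> is a placeholder.
/ip route
add dst-address=0.0.0.0/0 gateway=10.110.110.10 distance=1 routing-table=PBR-to-ISP1
add dst-address=0.0.0.0/0 gateway=<ISP2_gateway> distance=1 routing-table=PBR-to-ISP2
```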

Then I configured the following mangle rules:
[admin@MikroTik] > ip firewall mangle print detail
Flags: X - disabled, I - invalid; D - dynamic
0 chain=input action=mark-connection new-connection-mark=ISP1-conn passthrough=yes in-interface=ether1_WAN
1 chain=input action=mark-connection new-connection-mark=ISP2-conn passthrough=yes in-interface=ether5_WAN_reserve
2 chain=output action=mark-routing new-routing-mark=PBR-to-ISP1 passthrough=yes connection-mark=ISP1-conn
3 chain=output action=mark-routing new-routing-mark=PBR-to-ISP2 passthrough=yes connection-mark=ISP2-conn

Now I can route traffic from specific local subnets to a specific ISP using prerouting rules, for example:
;;; bridge10_test_default_ISP
chain=prerouting action=mark-routing new-routing-mark=PBR-to-ISP1 passthrough=yes dst-address-type=!local in-interface=bridge10_test

This works fine. My issue is not with the local subnets, but with access to the router itself.

I periodically need to connect to the router via SSH (port 22) and API (port 8728). About 90% of the time I can connect successfully using both the ISP1 and the ISP2 address of the router. But sometimes I noticed that the connection times out on either port 22 or port 8728, on both ISPs simultaneously.

I started troubleshooting this issue and set up monitoring of ports 22 and 8728 on both WAN addresses of the router. Every 20 seconds my monitor tries to establish 4 TCP connections to:

  1. ISP1 ip address and port 22
  2. ISP1 ip address and port 8728
  3. ISP2 ip address and port 22
  4. ISP2 ip address and port 8728

My monitoring software is located in another network and connects via the internet.

This revealed the following problem. At random moments the router stops answering on either port 22 or port 8728. It lasts for 2-3 consecutive checks, i.e. 40-60 seconds. While it happens, I cannot connect to port 22 even from other hosts on the internet, on both ISP IP addresses, while 8728 keeps responding on both ISP IP addresses (or vice versa). Very rarely both ports are affected at once, seemingly at random.
It happens several times an hour for each port.
Existing connections are not broken; only new connections are affected.

I sniffed packets on all interfaces while the problem was occurring. In the dump I see that my monitor host tries to connect by sending a TCP SYN packet to the Mikrotik. The Mikrotik receives it but does not respond. After that I see several SYN retries from the monitor host and, again, no response from the Mikrotik.
If I look at the router's connection table, the TCP state of these connections is “syn sent”. But in fact the router does not even try to respond: there are no response packets from the router at all.

After 40-60 seconds the problem disappears and the router starts answering normally again. This is strange to me!

If I disconnect the cable from either ether1_WAN or ether5_WAN_reserve, the remaining uplink works fine and the problem does not occur.
I have also tried this scenario with routing rules instead of mangle rules, and the problem remained the same.
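The post does not show which routing rules were used. One plausible form, assuming the rules pick the table by the source address of the router's reply (10.110.110.11 on ether1_WAN, and a hypothetical <ISP2_public_IP> placeholder on ether5_WAN_reserve), would be:

```routeros
# Hypothetical routing-rule equivalent of the mangle setup:
# replies sourced from a WAN address are looked up only in that WAN's table.
# <ISP2_public_IP> is a placeholder for the address on ether5_WAN_reserve.
/routing rule
add src-address=10.110.110.11/32 action=lookup-only-in-table table=PBR-to-ISP1
add src-address=<ISP2_public_IP>/32 action=lookup-only-in-table table=PBR-to-ISP2
```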

Could this be a bug? Or should I configure something else? It looks like a bug to me, because the router does not even try to send a reply, yet it marks the TCP state of the connection as “syn sent”.

Mikrotik experts, please help me…

If you are sure that already established TCP sessions keep working even while the anomaly shows up, the defense against SYN flooding is the first thing that comes to my mind. As you state elsewhere that you cannot see the “possible SYN flooding on tcp port” messages in the log, it may indeed be a bug as you suspect. But if you define a bug as “a behavior different from the author’s intention”, the actual bug may just be the missing log message, while the throttling mechanism still works.

Another possibility would be missing ARP responses from the gateways. However, besides the fact that established sessions keep working, another reason why that doesn’t seem to be the answer is that the problem only occurs when both WANs are up.

I assume you are aware that connection tracking handles received packets very early, so the “syn sent” state indicates nothing more than that the packet did arrive and constitutes a new connection. What happened to the packet afterwards, i.e. in mangle, routing, filter, and finally in the local receiving process such as the SSH server, cannot be concluded from it. Printing the details of the connection might reveal whether the connection mark was assigned to those connections, but that wouldn’t help much: you were sniffing on all interfaces and didn’t see the response anywhere, so a routing error is not really likely (the gateway of the route chosen for the response would have to be dead, i.e. not responding to ARP requests).
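To check whether the stuck connections got a connection mark, and whether both gateways resolve in ARP, something like the following could be used. It is a sketch; filtering on tcp-state=syn-sent is just one way to narrow the connection list down to the half-open attempts.

```routeros
# Half-open inbound attempts; "detail" also shows connection-mark, if one was set.
/ip firewall connection print detail where protocol=tcp and tcp-state=syn-sent
# Are both gateways present in the ARP table with a MAC address?
/ip arp print where interface=ether1_WAN
/ip arp print where interface=ether5_WAN_reserve
```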

Since a SYN flood remains the most likely cause, I would start by using “in-interface=etherX protocol=tcp tcp-flags=syn,!ack action=passthrough” rules in raw to count incoming SYN packets (and later perhaps a script that stores the counters to a file once per second). It is quite possible that there is a SYN rate threshold common to all inputs, i.e. that the rate of attempts arriving at a single address is not enough on its own to trigger the defense. TCP port 22 is one of the most attacked ports, especially if it responds on a particular address, and 8728 is likely to be in the top 100.
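The counting rules suggested above might look like this. It is a sketch: the comments are arbitrary labels added so the rules can be selected, and the once-per-second script that writes the counters to a file is not included; print stats with an interval just refreshes the counters on screen.

```routeros
# Count inbound SYNs (without ACK) per WAN; passthrough only updates the counters.
/ip firewall raw
add chain=prerouting in-interface=ether1_WAN protocol=tcp tcp-flags=syn,!ack action=passthrough comment="SYN-count ether1_WAN"
add chain=prerouting in-interface=ether5_WAN_reserve protocol=tcp tcp-flags=syn,!ack action=passthrough comment="SYN-count ether5_WAN_reserve"
# Watch the packet counters, refreshed every second.
/ip firewall raw print stats interval=1s where comment~"SYN-count"
```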

Thank you all! I’ve solved this issue. I noticed that the problem occurs when someone else is trying to connect via SSH or API at the same time. If someone starts a TCP handshake to, e.g., port 22 and I do the same at that moment, our TCP handshakes cross by timestamps, and as a result the Mikrotik just drops both connections and does not answer my TCP SYN.

I resolved this simply by adding a firewall filter rule that drops all packets destined to port 22 or 8728 whose source address is not on my allowed list.
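A sketch of such a rule, assuming a hypothetical address list named mgmt-allowed; the addresses are documentation placeholders, not taken from the post.

```routeros
# Trusted sources for SSH and API (hypothetical list name, placeholder addresses).
/ip firewall address-list
add list=mgmt-allowed address=198.51.100.10 comment="monitor host"
add list=mgmt-allowed address=203.0.113.0/24 comment="admin network"
# Drop SSH/API attempts from everyone else. It must sit above any rule that
# accepts these ports, otherwise it is never reached.
/ip firewall filter
add chain=input protocol=tcp dst-port=22,8728 src-address-list=!mgmt-allowed action=drop comment="SSH/API only from mgmt-allowed"
```

If the filter chain already has rules, add place-before=0 (or move the rule afterwards) so it is evaluated first.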

While I’m glad that you have resolved your issue, I am struggling to understand the root cause you’ve identified. Normally, a TCP server has no problem accepting simultaneous connection attempts from multiple clients, as these clients connect from distinct sockets (at least the client-side port has to differ even if the client-side address is the same). One of the purposes of TCP timestamps is indeed to filter out packets that belong to an old session, but this is only relevant for packets of sessions whose socket pairs are identical (i.e. a packet from the old session that arrives after the client-side socket has been reused for a new one). Am I missing something?
