I have a MikroTik Cloud series router connected with 1Gb fibre to my ISP, and I have a second RouterBOARD on which I created another subnet. Each router does DHCP for its own subnet, and the RB uses the Cloud as its default router. The RB is running RouterOS 7.5 and the Cloud RouterOS 6.44.6.
Ping and traceroute show <1ms latency between the RB and the Cloud on both IPv4 and IPv6. With real traffic, however, the Cloud reaches test-ipv6 over IPv4 in 3ms, while the RB takes a whole 7 seconds over IPv4. Web browsing is painfully slow and IMAP keeps timing out. So far Wireshark only shows missing responses and a lot of retransmits, yet ICMP is super fast. Both RB and Cloud use the same DNS.
If traceroute is showing no latency, and Wireshark doesn’t help, what’s the next step in troubleshooting?
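One possible next step, since ICMP looks fine but TCP does not, is to time a plain TCP handshake from a client behind each router and compare the numbers. The snippet below is only a rough sketch: the function name `tcp_connect_time` and the target host and port are my own placeholders, not anything from the setup described above.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=10.0):
    """Return the seconds taken to complete a TCP handshake with host:port.

    Raises OSError (including a timeout) if the connection cannot be made.
    """
    start = time.monotonic()
    # create_connection resolves the name and performs the three-way handshake
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical target; substitute a host that is slow from behind the RB.
    elapsed = tcp_connect_time("example.com", 443)
    print(f"TCP connect took {elapsed * 1000:.1f} ms")
```

If the handshake takes milliseconds from behind the Cloud but seconds from behind the RB, the delay is in how TCP packets are forwarded rather than in DNS or the ISP link.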
I’m not familiar with interface lists in RouterOS 7.x, so I’m not sure how LAN and WAN are defined. The default NAT is configured for WAN.
However the RB is not routing to the Cloud. It’s a layer2 bridge between the routers.
That brings up another interesting point about VLANs. Port 8 on the Cloud is connected to port 1 on the RB. The RB has no VLANs defined, but port 8 on the Cloud is bridged to a VLAN. Let’s draw a picture …
There might be a way to direct NAT on the main router to router 2 and masquerade on that router then. I won’t have time until Friday to test my theory, though - and I might not be knowledgeable enough to get it to work, since I always set NAT on the router connected to the gateway. In this sense, the 1 router failover guide provided by MT is simple and logical to me.
I played with this a bit, and the major issue for me is the path switching between both routers. NAT is required by the gateway in IPv4, so obviously using two routers, even if you reverse the path, still incurs double NAT.
I got failover working (following the MT guide) quite easily when the second router is used as a switch (no NAT, no DHCP on R2). Is there a reason why you can’t follow that topology?
A simple ICMP redirect is causing the huge latency. Because the routers are bridged, the Mtik-RB is using ICMP redirects to send traffic to the Mtik-Cloud rather than routing it. All the clients are simply taking ages to handle this and aren’t updating their default router.
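For anyone wondering when a router decides to send a redirect at all: roughly, it happens when a packet would be forwarded back out of the interface it arrived on and the sender sits on the same subnet as the better next hop. Here is a small sketch of that check. The function name and all the addresses are made up for illustration and are not the addresses used in this thread.

```python
import ipaddress


def would_send_redirect(src, next_hop, ingress_net, same_interface=True):
    """Approximate the usual conditions for a router emitting an ICMP redirect.

    src            -- address of the client that sent the packet
    next_hop       -- address the router would forward the packet to
    ingress_net    -- subnet configured on the interface the packet arrived on
    same_interface -- True if the packet leaves via the interface it came in on
    """
    net = ipaddress.ip_network(ingress_net)
    src_on_link = ipaddress.ip_address(src) in net
    hop_on_link = ipaddress.ip_address(next_hop) in net
    # The client could reach the next hop directly, so the router tells it so.
    return same_interface and src_on_link and hop_on_link


if __name__ == "__main__":
    # Hypothetical addresses: client and next hop share one bridged subnet.
    print(would_send_redirect("192.168.88.10", "192.168.88.1", "192.168.88.0/24"))
    # Hypothetical addresses: next hop is on another subnet, so no redirect.
    print(would_send_redirect("192.168.88.10", "10.0.0.1", "192.168.88.0/24"))
```

With both routers on one layer 2 segment, the first case is the one that applies, which would fit the behaviour described here.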
After changing the DHCP default router setting everything works perfectly. I guess when it’s necessary to use cellular I’ll just have to use a script to update the DNS settings and then disable/enable all the interfaces to cause a DHCP renew.