
 
Cha0s
Forum Veteran
Topic Author
Posts: 767
Joined: Tue Oct 11, 2005 4:53 pm

IPv6 intermittent timeouts to random IPs

Mon Jul 09, 2018 5:24 pm

I have a setup in a datacenter running 2 CCR1036 in an active/standby setup.

Both CCRs have identical configuration and use VRRP for the failover.
This setup has been in use for over 4 years (and I suspect the problem I describe below is just as old).

Everything works perfectly fine except IPv6.

When CCR1 is master (which is pretty much always), IPv6 is almost unusable.
It will stop routing traffic for random IPs. The affected clients cannot even ping the router. This may last a few seconds for one IP, or hours or even days for another. It is totally random (both the timing and the affected IPs). I haven't found any pattern whatsoever as to what causes it or which IPs it hits.

On the other hand, when I fail over to CCR2 (which, remember, has a configuration identical to CCR1's), there is no packet loss to any IP for days (so far).

The IPv6 setup is rather simple.
Neither router uses the firewall (connection tracking is completely disabled), and neither has any special IPv6 config.
We announce our /48 via BGP, have various /64s on various VLANs behind the CCRs, and have some static routes for a few /128s and /64s.
IPv4 traffic works perfectly fine on both CCRs for years without any issue.

I tried resetting the configuration on CCR1 and reconfigured everything from scratch using an export (not restoring a full backup).
Nothing changed. The problem is exactly the same.

At this point the only thing I can think of is that the ROS installation on CCR1 is somehow bad and causes these issues.
The next thing to try would be to do a netinstall.
I tried doing this remotely (using a Windows VM at the datacenter with its network interface bridged to port 8 of the CCR), but no matter how many times I tried, the router never showed up in Netinstall. I also tried port 1, to no avail.

So I'll have to go to the datacenter myself to do a proper netinstall with my laptop connected directly to port 8.

My question at this point is whether anyone else has had similar issues and found a solution that might save me the trip to the DC.
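
For anyone trying to find a pattern in outages like these, one option is to log periodic probe results per IP and collapse them into outage intervals that can be compared against each other and against events such as VRRP transitions. Below is a minimal Python sketch of that idea. The `Sample` record and the `outage_intervals` function are hypothetical names, and the sketch assumes the probe results are collected separately (for example by pinging each address from a server behind the router).

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Sample(NamedTuple):
    when: datetime   # time the probe was sent
    target: str      # probed IPv6 address
    ok: bool         # True if a reply came back


def outage_intervals(samples):
    """Collapse consecutive failed probes per target into
    (target, first_failure, last_failure, failed_probe_count) tuples."""
    open_runs = {}   # target -> [start, end, count] for a run still in progress
    closed = []
    for s in sorted(samples, key=lambda s: s.when):
        run = open_runs.get(s.target)
        if not s.ok:
            if run is None:
                open_runs[s.target] = [s.when, s.when, 1]
            else:
                run[1] = s.when
                run[2] += 1
        elif run is not None:
            # First successful probe after a failure run closes that run.
            closed.append((s.target, run[0], run[1], run[2]))
            del open_runs[s.target]
    # Runs that never recovered are reported too, ending at the last failed probe.
    for target, run in open_runs.items():
        closed.append((target, run[0], run[1], run[2]))
    return sorted(closed, key=lambda r: r[1])
```

Sorting the resulting intervals by start time makes it easier to see whether several addresses drop at the same moment or fully independently.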
 
pe1chl
Forum Guru
Posts: 4451
Joined: Mon Jun 08, 2015 12:09 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 10:48 am

When you say "clients cannot ping the router", do you mean clients at your local network or clients elsewhere on the internet?
When local, I would think it is something related to ND.

W.r.t. the reinstall, I have suggested in another topic that there should be a feature where you can put a clean RouterOS install
from a running router with 2 partitions into the inactive partition, optionally copy the config, and then switch over the active
partition and reboot to have a clean install. This feature could save a lot of netinstalls (well, except on new low-end hardware
with only 16MB flash of course!) and would be very handy for routers in datacenters or for doing such a reinstall outside business
hours when not onsite.
 
Cha0s
Forum Veteran
Topic Author
Posts: 767
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:12 pm

pe1chl wrote:
When you say "clients cannot ping the router", do you mean clients at your local network or clients elsewhere on the internet?

I mean local clients (servers) behind the router cannot ping the router (gateway).
They can ping each other (those under the same prefix of course).
 
pe1chl
Forum Guru
Posts: 4451
Joined: Mon Jun 08, 2015 12:09 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:24 pm

So that could be an ND issue... Check what is happening in IPv6->Neighbors
(interestingly, the menus "ND" and "Neighbors" are swapped in IPv6)
 
Cha0s
Forum Veteran
Topic Author
Posts: 767
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:30 pm

pe1chl wrote:
So that could be an ND issue... Check what is happening in IPv6->Neighbors
(interestingly, the menus "ND" and "Neighbors" are swapped in IPv6)

ND is disabled.

Neighbors doesn't show anything useful apart from status 'failed' when an IP is not reachable.

At the same time, the same exact configuration works without a single packet lost on the second CCR.
So I am not sure this is a configuration issue. If it were, common sense says it should have affected both CCRs.
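
Since the neighbor entries end up in 'failed', it may help to capture neighbor discovery traffic for an affected address on both sides of the link. Neighbor solicitations are sent to the target's solicited-node multicast group (ff02::1:ffXX:XXXX, built from the low 24 bits of the address), which maps to an Ethernet MAC starting with 33:33. Below is a small Python sketch that computes both values so they can be used in a capture filter. The function names are hypothetical, and the address in the usage note is from the documentation prefix, not from this network.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group (ff02::1:ffXX:XXXX) for an IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to its Ethernet MAC (33:33 followed by the low 32 bits)."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)
```

For example, `solicited_node_multicast("2001:db8:0:10::1")` gives `ff02::1:ff00:1`, and `multicast_mac` of that group gives `33:33:ff:00:00:01`. If solicitations for an affected address show up on the wire but get no advertisement back from one router while the other answers, that would narrow the problem down to how that router handles ND.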
