Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

IPv6 intermittent timeouts to random IPs

Mon Jul 09, 2018 5:24 pm

I have a setup in a datacenter running 2 CCR1036 in an active/standby setup.

Both CCRs have identical configuration and use VRRP for the failover.
This setup has been in use for over 4 years (and I suspect the problem I will describe is just as old).

Everything works perfectly fine except IPv6.

When CCR1 is master (which is pretty much always), IPv6 is almost unusable.
It stops routing traffic for random IPs, and the affected clients cannot even ping the router. This can last a few seconds for one IP, or hours or days for another. It is totally random (both the timing and the affected IPs); I haven't found any pattern whatsoever as to what is causing this or which IPs are hit.

On the other hand, when I fail over to CCR2 (which, remember, has an identical configuration to CCR1), there is no packet loss to any IP for days (so far).

The IPv6 setup is rather simple.
Neither router uses a firewall (connection tracking is completely disabled), and there is no special IPv6 config.
We announce our /48 via BGP and have various /64s on various VLANs behind the CCRs, plus some static routes for some /128s and /64s.
IPv4 traffic works perfectly fine on both CCRs for years without any issue.

I tried resetting the configuration on CCR1 and reconfigured everything from scratch using an export (not restoring a full backup).
Nothing changed. The problem is exactly the same.

At this point the only thing I can think of is that the ROS installation on CCR1 is somehow bad and causes these issues.
The next thing to try would be to do a netinstall.
I tried doing this remotely (using a Windows VM at the datacenter with its network interface bridged to port 8 of the CCR), but no matter how many times I tried, the router never showed up in Netinstall. I also tried port 1, to no avail.

So I'll have to go to the datacenter myself to be able to do a proper netinstall with my laptop connected directly to port8.

My question at this point is whether anyone else has had similar issues and found a solution that might save me the trip to the DC.
 
pe1chl
Forum Guru
Posts: 4868
Joined: Mon Jun 08, 2015 12:09 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 10:48 am

When you say "clients cannot ping the router", do you mean clients at your local network or clients elsewhere on the internet?
When local, I would think it is something related to ND.

W.r.t. the reinstall, I have suggested in another topic that there should be a feature where you can put a clean RouterOS install
from a running router with 2 partitions into the inactive partition, optionally copy the config, and then switch over the active
partition and reboot to have a clean install. This feature could save a lot of netinstalls (well, except on new low-end hardware
with only 16MB flash of course!) and would be very handy for routers in datacenters or for doing such a reinstall outside business
hours when not onsite.
 
Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:12 pm

pe1chl wrote:
When you say "clients cannot ping the router", do you mean clients at your local network or clients elsewhere on the internet?

I mean local clients (servers) behind the router cannot ping the router (gateway).
They can ping each other (those under the same prefix of course).
 
pe1chl
Forum Guru
Posts: 4868
Joined: Mon Jun 08, 2015 12:09 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:24 pm

So that could be an ND issue... Check what is happening in IPv6->Neighbors
(interestingly, the menus "ND" and "Neighbors" are swapped in IPv6)
 
Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Fri Jul 13, 2018 6:30 pm

pe1chl wrote:
So that could be an ND issue... Check what is happening in IPv6->Neighbors
(interestingly, the menus "ND" and "Neighbors" are swapped in IPv6)

ND is disabled.

Neighbors doesn't show anything useful apart from status 'failed' when an IP is not reachable.

At the same time, the same exact configuration works without a single packet lost on the second CCR.
So I am not sure this is a configuration issue. If it were, common sense says it should have affected both CCRs.
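
On a Linux host behind the router, the counterpart of the router's 'failed' neighbor status is a FAILED or INCOMPLETE entry in the host's own neighbor cache. A minimal sketch for listing those entries, assuming iproute2's `ip -6 neigh show` output with the state as the last field on each line; the function name `failed_neighbors` is made up for illustration:

```shell
#!/bin/sh
# Print neighbor-cache entries whose state is FAILED or INCOMPLETE.
# Reads `ip -6 neigh show` output on stdin, so it also works on saved output.
# Assumption: the state keyword is the last whitespace-separated field.
failed_neighbors() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE"'
}

# Typical use on the affected host (needs iproute2):
#   ip -6 neigh show | failed_neighbors
```

If the gateway's address shows up here while the outage lasts, the host side is also failing to resolve the router, which would point at ND rather than routing.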
 
Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Thu Aug 02, 2018 2:58 pm

So far I've narrowed down this to VLANs.

Using IPv6 on normal interfaces works without any packet loss.
Using IPv6 on VLAN interfaces (under an SFP+ interface, if that somehow makes any difference) causes random packet loss to random IPs.
It's as if the neighbor solicitation/advertisement packets are getting lost somewhere and the router can no longer find the MAC address for an IP.

What's driving me crazy is that the second CCR, with the exact same configuration (I rechecked a full export of both CCRs line by line), in the same physical location and connected to the same physical switches, does not exhibit this behavior.
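
One way to check the lost solicitation/advertisement theory is to capture ND traffic on a Linux host in the affected VLAN and compare how many solicitations and advertisements are seen. A rough sketch, assuming tcpdump's usual text output ("neighbor solicitation" / "neighbor advertisement" in each line); the interface name `eth0.100` and the function name `nd_summary` are hypothetical:

```shell
#!/bin/sh
# Count neighbor solicitations and advertisements in tcpdump text output
# read from stdin. Assumption: tcpdump labels these packets with the phrases
# "neighbor solicitation" and "neighbor advertisement".
nd_summary() {
    awk '
        /neighbor solicitation/  { ns++ }
        /neighbor advertisement/ { na++ }
        END { printf "solicitations=%d advertisements=%d\n", ns + 0, na + 0 }
    '
}

# Typical use (as root; eth0.100 is a made-up VLAN interface name).
# ICMPv6 types 135 and 136 are neighbor solicitation and advertisement;
# ip6[40] is the ICMPv6 type byte when no extension headers are present.
#   tcpdump -l -n -i eth0.100 -c 200 \
#       'icmp6 and (ip6[40] == 135 or ip6[40] == 136)' | nd_summary
```

Many solicitations for the gateway with few or no advertisements coming back would be consistent with ND packets being dropped somewhere on the VLAN path.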
 
ugenk
just joined
Posts: 4
Joined: Sun Nov 23, 2014 7:46 pm

Re: IPv6 intermittent timeouts to random IPs

Wed Sep 05, 2018 10:23 pm

We have the same situation with a CCR1036 and a CentOS 7 KVM VM with a static IPv6 address.

If we run ip -6 nei flu all, the OS resends ND and everything works for some time.

We have Proxmox and two Nexus switches with the latest software between the VM and the 1036 (running 6.40.8). We've disabled firewalls/iptables everywhere and disabled IGMP snooping. Nothing changed.

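Workaround (haha):
# cat cron_fix_ipv6.sh
#!/bin/sh
# Flush the IPv6 neighbor cache and ping the gateway about once per second,
# 55 times, so that ND is redone on every pass.
# ipv6gw is the IPv6 gateway (name kept as in the original script).
for i in $(seq 1 55)
do
    ip -6 neigh flush all
    ping -6 -c 1 ipv6gw
    sleep 1
done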
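
A gentler variant of the same workaround would flush the neighbor cache only when the gateway has actually stopped answering. This is just a sketch under assumptions, not something tested against the problem in this thread: `GW`, `PING`, `IP` and the function name `flush_if_unreachable` are all made up here, and the defaults expect iputils ping and iproute2.

```shell
#!/bin/sh
# Overridable settings (hypothetical names); ipv6gw is the gateway as above.
GW=${GW:-ipv6gw}
PING=${PING:-ping}
IP=${IP:-ip}

# Ping the gateway once; if it does not answer within 1 second, flush the
# IPv6 neighbor cache so the next packet triggers a fresh ND exchange.
# Returns 0 if the gateway answered, 1 if a flush was performed.
flush_if_unreachable() {
    if $PING -6 -c 1 -W 1 "$GW" >/dev/null 2>&1; then
        return 0
    fi
    $IP -6 neigh flush all
    return 1
}

# Typical use inside the same kind of loop as above:
#   for i in $(seq 1 55); do flush_if_unreachable; sleep 1; done
```

This keeps the working neighbor entries intact most of the time, at the cost of up to a second or two of loss before the flush kicks in.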
 
Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Wed Sep 05, 2018 10:33 pm

I still haven't found any solution.

I did a netinstall and the problem persists.

As a temporary workaround I've set up a VM with MikroTik which acts as the router for IPv6.
So I have a static route from the CCRs to that VM via a physical interface instead of the VLAN interfaces, and then I have the VLANs configured on the VM.
IPv6 on non-vlan interfaces does not have any problems. So this static route works fine.

For the time being this is somewhat acceptable since IPv6 traffic is pretty much 3-4% of our total traffic and routing via a VM is not a bottleneck yet.
But it still drives me crazy what on earth can possibly be causing this behavior. Especially when CCR2 with an identical configuration works perfectly fine.

Any solution that involves the clients/VMs/servers is not acceptable to us, as we provide colocation/dedicated servers/VMs and have no access to them whatsoever.
 
Cha0s
Forum Veteran
Topic Author
Posts: 827
Joined: Tue Oct 11, 2005 4:53 pm

Re: IPv6 intermittent timeouts to random IPs

Wed Sep 05, 2018 10:44 pm

Also, after the netinstall I configured everything manually. I didn't restore the configuration from a backup, just to make sure the 'problem' was not restored with it. It didn't make any difference.
 
mducharme
Trainer
Posts: 666
Joined: Tue Jul 19, 2016 6:45 pm

Re: IPv6 intermittent timeouts to random IPs

Thu Sep 06, 2018 12:22 am

When you have this issue, can the client ping the gateway via its link-local address?

Share the IPv6 addressing, ND and VRRP portions of your config(s); I might know what the problem is.
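
For the link-local test, something along these lines could be run on an affected Linux client while the problem is happening. It is only a sketch: the function name `check_gateway`, the `PING` override and the example addresses are made up, and it assumes iputils ping (`-6`, `-c`, `-W`) with the `address%interface` scope syntax for link-local targets.

```shell
#!/bin/sh
# PING can be overridden; defaults to the system ping (iputils assumed).
PING=${PING:-ping}

# check_gateway IFACE LINK_LOCAL GLOBAL
# Pings the gateway once via its link-local address (scoped to IFACE) and
# once via its global address, then prints which of the two answered.
check_gateway() {
    iface=$1
    ll=$2
    global=$3
    if $PING -6 -c 1 -W 2 "${ll}%${iface}" >/dev/null 2>&1; then
        ll_ok=yes
    else
        ll_ok=no
    fi
    if $PING -6 -c 1 -W 2 "$global" >/dev/null 2>&1; then
        gl_ok=yes
    else
        gl_ok=no
    fi
    echo "link-local=$ll_ok global=$gl_ok"
}

# Typical use (interface and addresses are placeholders):
#   check_gateway eth0 fe80::1 2001:db8::1
```

A result of link-local=yes with global=no would suggest ND for the global address is what breaks, while no on both would point lower down, at the VLAN path itself.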
