I've been having a problem for a few months now: at 10am every day my routers appear to die for 2 minutes. Every day, at the exact same time and for the same length of time, they start getting higher and higher ping until they just time out and the network dies.
I have 2x RB1100AH configured as a primary router and a backup router. When the primary dies the secondary kicks in, but after a few seconds it too gets overwhelmed and dies. I've done all I can think of, monitoring ports (including the ports on my switches) to see where the traffic is coming from, but I'm not able to pinpoint it.
My network has 2x RB1100AH, 15 HP Procurve switches and approx 200 users.
Aside from just unplugging parts of the network at 10am to find the source, I'm out of ideas. It looks to me like a scheduled task of some sort, but I've checked all servers and devices and nothing has a task running at that time.
If anyone could suggest somewhere to start troubleshooting this I would be very grateful.
The problem is I don't know what to look for. I had an idea that it may have been antivirus updates all going at once, so I set up some logging for that, but that gave me nothing. In fact, during the time the routers drop out they don't seem to log everything.
What sort of things should I be looking for? I set up some graphs to monitor traffic on each port, but it doesn't look like traffic volume is the cause, as I see no spikes at that time.
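One thing worth checking alongside the volume graphs is packet rate, since a flood of very small packets can load a router's CPU without showing up as a bandwidth spike. A rough Python sketch of the arithmetic follows. The sample format (timestamp, byte counter, packet counter) and the numbers are assumptions for illustration only, not readings from this network.

```python
def rates(sample_a, sample_b):
    """Return (bits_per_second, packets_per_second) between two counter samples.

    Each sample is a hypothetical tuple:
    (timestamp_seconds, total_bytes, total_packets).
    """
    dt = sample_b[0] - sample_a[0]
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    bits_per_second = (sample_b[1] - sample_a[1]) * 8 / dt
    packets_per_second = (sample_b[2] - sample_a[2]) / dt
    return bits_per_second, packets_per_second


# Made-up example: 600,000 frames of 64 bytes each, seen over 10 seconds.
before = (0, 0, 0)
after = (10, 600_000 * 64, 600_000)
bps, pps = rates(before, after)
print(f"{bps / 1e6:.2f} Mbit/s, {pps:.0f} packets/s")
```

In that made-up case the byte graph shows only about 31 Mbit/s, which would barely register on a gigabit port, while the packet rate is 60,000 per second. If the port counters expose packets as well as bytes, graphing both may show something the volume graph hides.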
If you have a syslog server, add log rules on all firewall chains for the interface connecting the network, and send them to the remote syslog. (Or you could set the log action to disk, but make sure you don't leave that on.)
At least you should get SOMETHING from the event telling you what the culprit is.
I did a Wireshark capture of the VLAN I suspected as the cause of the issue, and from what I can see one IP address has sent an ARP request for every IP in the 10.0.0.0/16 range.
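For anyone wanting to confirm the same pattern in their own capture, here is a minimal Python sketch that counts how many distinct addresses each sender has ARPed for inside a given range. It assumes the ARP requests have already been exported from the capture as (sender IP, target IP) pairs; that input format, the function name and the 50% threshold are all assumptions for illustration.

```python
import ipaddress
from collections import defaultdict


def find_arp_sweepers(requests, network, threshold=0.5):
    """Return {sender_ip: distinct_target_count} for senders that sent ARP
    requests for more than `threshold` (a fraction) of the addresses in `network`.

    `requests` is an iterable of (sender_ip, target_ip) strings, a
    hypothetical export format, one pair per ARP request seen.
    """
    net = ipaddress.ip_network(network)
    targets = defaultdict(set)
    for sender, target in requests:
        # Only count targets that fall inside the range of interest.
        if ipaddress.ip_address(target) in net:
            targets[sender].add(target)
    limit = threshold * net.num_addresses
    return {s: len(t) for s, t in targets.items() if len(t) > limit}


# A /16 like 10.0.0.0/16 holds 65,536 addresses, so a full sweep is a lot of ARP.
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)

# Small made-up example on a /24: one host sweeps the range, another is normal.
sample = [("10.0.0.50", f"10.0.0.{i}") for i in range(256)]
sample.append(("10.0.0.7", "10.0.0.1"))
print(find_arp_sweepers(sample, "10.0.0.0/24"))
```

Counting distinct targets rather than raw requests means a chatty but normal host (one that keeps re-ARPing for its gateway, say) is not flagged; only a sender walking through most of the range is.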