One of my routers decides, out of the blue, to stop routing traffic on its own. All the IPs are in the ARP table, all the static and dynamic routes show as running, and all the port negotiations are proper (or at least show as proper), but for some odd reason the router provides connectivity to some of the IPs in the range and not to others in the same range. The whole thing is completely random: it happens at random times and goes away at random times. Also, the APs that host the subscribers (also MikroTik, and directly connected to the router) CAN ping the very same IPs that the router chooses not to. And when you MAC telnet into any of the "dead ping" subscribers and start pinging the router from the command prompt, the ping and all connectivity come back. Please help. I run a huge ISP entirely on MikroTik routers, and this is having disastrous consequences for my business.
I saw this a few times in VERY isolated cases on one of my x86 routers before (even on 3.17 - you don't mention what version you are running). After battling with it and being unable to resolve it, I replaced the router. It never happened again. If you put a console on the router, you'll get some nasty error message flooding the screen until kingdom come - can't remember what the error is now, though.
In my case it seemed to happen when the PPS through the router was high.
Considering new hardware solved it, I suspect I had some faulty hardware or memory or something.
Another thing I noticed just now is that when the unresponsive IP is manually removed from the ARP table, all connectivity returns a few seconds later, when the entry comes back dynamically.
And yes, the router is an x86, 2.4 GHz dual core, running version 3.18.
No, that's definitely different from what I had… I couldn't even access my router to check ARP when it happened to me… Never mind it turning back to normal, even for a while.
Is it perhaps not an issue with 3.18, considering it was released just an hour ago! Make a support output file when it happens again, mail it off to MT, and, I guess, try and see if the same happens on 3.17…
You really shouldn't be deploying new versions to production / critical routers within a few minutes of their being released.
Testing things is king in this business.
This was also happening on 3.14 and 3.17, and I have changed the hardware (with an identical set, mind you), so it's definitely not the hardware. I simply had to try the new version, I'm reaching desperation here…
How do I make a support output file?
In Winbox click Make supout.rif …
Third menu from the bottom
For fun you may want to /system hardware set multi-cpu=no (I think that's the command), reboot, and see if your problem persists. Just a thought.
Scott
There's a high chance that the problem is not MT related. When you remove entries from the ARP table, you are also refreshing the Layer 2 switches' paths and MAC tables, because your router will send a broadcast to which the clients respond, and the switches learn the destination port again. So your problem can be any of:
- some client or connected device with proxy ARP enabled. This makes the device respond to ARP requests that are not meant for it, which stops connectivity to the real device. Check all of your connected devices for proxy ARP, and disable it where needed.
- some clients with the same MAC address, which confuses the switches. Check all of your connected devices for duplicate MAC addresses.
- some clients with the same IP address, which confuses the router! Check all of your connected devices for duplicate IP addresses.
- a faulty switch in the path from the router to the clients. If so, replace the faulty hardware.
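If you can collect (IP, MAC) pairs from the router's ARP table over a period of time, a small script can flag the proxy ARP and duplicate-address cases above. Here is a minimal Python sketch. The function name `find_arp_conflicts` and the input format (a list of `(ip, mac)` tuples) are assumptions made for illustration; how you export the pairs from the router is up to you.

```python
from collections import defaultdict


def find_arp_conflicts(observations):
    """Flag suspicious entries in a list of (ip, mac) pairs.

    Hypothetical helper: `observations` is assumed to be pairs collected
    from the ARP table over time, e.g. [("10.0.0.5", "00:0C:42:AA:BB:01")].
    Returns two dicts:
      - IPs seen with more than one MAC (possible duplicate IP or proxy ARP)
      - MACs seen with more than one IP (possible proxy ARP or duplicate MAC)
    """
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)

    for ip, mac in observations:
        mac = mac.lower()  # normalise case so AA:BB and aa:bb compare equal
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)

    ips_with_many_macs = {
        ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1
    }
    macs_with_many_ips = {
        mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1
    }
    return ips_with_many_macs, macs_with_many_ips


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("10.0.0.5", "00:0C:42:AA:BB:01"),
        ("10.0.0.5", "00:0C:42:AA:BB:02"),  # same IP answered by a second MAC
        ("10.0.0.6", "00:0C:42:AA:BB:02"),  # same MAC answering for two IPs
        ("10.0.0.7", "00:0C:42:AA:BB:03"),
    ]
    dup_ips, dup_macs = find_arp_conflicts(sample)
    print("IPs with more than one MAC:", dup_ips)
    print("MACs with more than one IP:", dup_macs)
```

A hit is only a hint, not proof: a device with several addresses will legitimately show one MAC for multiple IPs, so check each flagged entry by hand.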
Good luck!