Strange Wireless Issue

Hi guys, we have a 5.8GHz link set up for a customer with a RIC522 at each end, simple enough.
The link is only about 400 m long. It’s not the best of links from a setup perspective, but it has had a solid signal for nearly 12 months without an issue.
This link is a routed connection with an EoIP bridge across it. The EoIP tunnel starts and terminates on routerboards other than the RIC522s, so the RIC522s simply route a small subnet between them.

Recently, the station end of the link started dropping, and it came straight back up after a reboot. It started to happen more and more often, to the point where it now stops a couple of times a day in the worst case. Now here is the weird thing: when it stops working, I can jump on the AP and see the station still in the registration table with good signal strength, but I can’t ping any addresses on it or past it. I can, however, MAC Telnet to it and do anything I want on the router. There is just no IP traffic getting there at all. As mentioned, after a reboot it works fine again. I can’t find anything in the logs to help diagnose the issue.

Right now I am down to conceding that this could be my first routerboard fault, and I am trying to figure out the best way to replace the board, as it is 1000 km away and the only person there doesn’t know the gear…
At the moment I just monitor it, and when it goes down I log into the AP, MAC Telnet to the station and reboot it, then off it goes for another X hours.

Any suggestions would be greatly appreciated.

The things I have tried so far are:

  • 5GHz turbo and non-turbo mode
  • All the 5GHz frequencies available
  • Nstreme and non-Nstreme
  • Upgrading to the latest stable ROS, 2.9.15 (I didn’t want to risk an upgrade to v3.x remotely)
  • When it is down, I have tried disabling ethernet interfaces, disabling IP addresses, everything I can think of, to no avail.

Regards
Paul

So you can MAC ping the station from the AP but you can’t IP ping the station from the AP, correct? What changed? Yes, I know, “nothing changed” is the standard answer, but something probably did. If you have MAC connectivity and no IP connectivity, then look for an ARP issue. Did you enable proxy-arp somewhere and forget? Get simple and build back up.
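
To make that reasoning concrete, here is a rough sketch in Python of how the two probe results point at a layer. It is only an illustration: the function name and the result strings are made up for this example and have nothing to do with RouterOS itself.

```python
def likely_fault(mac_ping_ok: bool, ip_ping_ok: bool) -> str:
    """Map the two probe results to the area worth checking first.

    Hypothetical helper for illustration only; not a RouterOS feature.
    """
    if ip_ping_ok:
        # IP traffic gets through, so nothing to chase at this level.
        return "no fault seen"
    if mac_ping_ok:
        # Layer 2 works but layer 3 does not: suspect ARP or IP config.
        return "check ARP and IP addressing"
    # Neither probe works: the problem is at or below layer 2.
    return "check radio link / layer 2"


# The situation described in this thread: MAC ping works, IP ping fails.
print(likely_fault(mac_ping_ok=True, ip_ping_ok=False))
```

With MAC ping working and IP ping failing, the sketch lands on the ARP/IP branch, which is why ARP is the first thing to look at.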

Thanks for the reply. Yes, MAC ping works and IP ping doesn’t. In all seriousness, nothing changed; it just started playing up one day. We didn’t even have remote access at that stage, and nobody onsite has access to the routers.

If it is an ARP issue, do you think I should be looking on the AP or the station side?

Regards
Paul

When you are in the failed state, look at the ARP table on the AP side to see if it has an entry for the IP on the other side. If so, see which interface it matches. If you have a bridge on the client side (you should), ensure the bridge uses an admin-assigned MAC. Don’t let it inherit a MAC from the interfaces in the bridge, because that can cause the bridge’s MAC address to change on you.
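
The ARP check itself boils down to a lookup plus an interface comparison. Below is a minimal Python sketch of that logic, assuming you copy the AP’s ARP entries by hand into a list. The field names (`address`, `mac`, `interface`), the interface names and the addresses (taken from the 192.0.2.0/24 documentation range) are all placeholders for illustration, not RouterOS output.

```python
def check_peer_arp(entries, peer_ip, expected_iface):
    """Return 'ok', 'missing', or a wrong-interface note for peer_ip.

    entries: list of dicts with hypothetical keys
             'address', 'mac' and 'interface'.
    """
    for entry in entries:
        if entry["address"] == peer_ip:
            if entry["interface"] == expected_iface:
                return "ok"
            # Entry exists but was learned on an unexpected interface.
            return "wrong interface: " + entry["interface"]
    # No entry at all: the AP never resolved the peer's IP.
    return "missing"


# Placeholder data: only an ethernet-side host is present.
ap_arp = [
    {"address": "192.0.2.10", "mac": "02:00:00:00:00:10", "interface": "ether1"},
]
print(check_peer_arp(ap_arp, "192.0.2.2", "wlan1"))  # prints "missing"
```

A result of "missing" means the AP never got an ARP answer from the station, while a wrong-interface result would point at addressing or bridging confusion instead.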

OK thanks, I’ll check it when it breaks next.

The station end does appear under IP/Neighbours, but I will check the ARP table next time, thanks for the advice.

I’ll manually set a MAC address on the bridge then and give that a try.

Regards
Paul

Since you are remote to the site, remember that SAFE-MODE is your friend.

OK, it broke again. The only entries in the ARP table were IPs from the ethernet side of the radio, not the wireless side, so it’s as if the wireless card stops processing IP packets or something crazy. Perhaps the first thing to try is replacing the card on the station end. Any other suggestions?

Regards
Paul

What’s the bridge-mode on the station side set to? Look at the status of the bridge ports when you are in the failed state and see if that gives you a clue. Also, are your IP addresses assigned to the bridge, the wlan interface, or the ether1 interface? If something is happening to the bridge and the IP addresses are assigned to the bridge, then the system may not respond to ARP requests. Maybe the bridge is in STP mode and gets BPDUs from something plugged into the LAN side, which is confusing it?

I doubt replacing the card will help, since it knows nothing of layer 3. Since you have to ship something out to the remote site anyway, I’d just ship a new router+card. Another WAG would be to execute a system reset on the station and reprogram it from scratch, on the theory that something in the config has gotten out of sync between what RouterOS has in its data files and what’s in the underlying Linux OS config. Again, that’s a complete shot in the dark. I rather think something is happening to the bridge.

But only the ethernet port and an EoIP port are in the bridge. The wireless ports are just routed, nothing fancy there, and I can’t even ping from wireless to wireless when this happens. I have confirmed the bridges don’t have STP on. I agree with not replacing the card only; I just can’t see how the card would cause this, though it’s possible…

Problem is I don’t have a spare complete unit, and at the price of the RIC522s I can’t really afford to keep a spare :(

Next time it fails I will check the status of the bridge ports anyway, just in case it tells me something.

Thanks for the suggestions, it is appreciated.

Regards
Paul

OK, I looked at the bridge port status on the station side when it failed. No problems there; the ports were looking OK.
Interesting thing though: I checked the ARP table on the station, and it could see hosts on both the ethernet interface and the wlan while it was in the failed state, but the AP side did not have anything in its ARP table for the wlan interface. Strange. This means the station side CAN see IP from the AP, but the AP can’t see IP from the station.

Regards
Paul
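
For anyone following along, the asymmetry described above can be written down as a small comparison of the two ARP tables. This is just a Python sketch under stated assumptions: the tables are plain sets of IP strings typed in by hand, and the function name and the 192.0.2.x addresses are placeholders, not values from the real link.

```python
def arp_visibility(ap_arp, station_arp, ap_ip, station_ip):
    """Report which side has learned the other's IP via ARP.

    ap_arp / station_arp: sets of IP strings seen in each ARP table
    (hypothetical inputs, copied by hand for this illustration).
    """
    ap_sees_station = station_ip in ap_arp
    station_sees_ap = ap_ip in station_arp
    if ap_sees_station and station_sees_ap:
        return "both directions"
    if station_sees_ap:
        return "station has AP only"
    if ap_sees_station:
        return "AP has station only"
    return "neither direction"


# Failed state as reported: the station's table holds the AP,
# but the AP's table has nothing for the wireless side.
print(arp_visibility(
    ap_arp={"192.0.2.10"},
    station_arp={"192.0.2.1", "192.0.2.20"},
    ap_ip="192.0.2.1",
    station_ip="192.0.2.2",
))  # prints "station has AP only"
```

A "station has AP only" result matches the failed state: traffic from the AP reaches the station, but nothing at the IP level makes it back.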