Wireless interface - it looks like a bug in ARP reply-only

Hello,

I have about 50 RBs with wireless interfaces and each one has this issue:

The interfaces have ARP set to reply-only, and of course there are static ARP entries. Sometimes after the MT reboots (for example, after a power outage), wireless clients can't reach the network. That is, all the clients appear as connected in the wireless registration table, but they are inaccessible: I can't ping them. The fix is to disable the wireless interface on the MT, wait several seconds, and enable it again. The clients reconnect and are accessible again; I can ping them.

This issue affects different RouterBoards (RB433, RB600, RB532), different RouterOS versions (4.10, 4.17, 5.1, 5.6) and different wireless cards (Senao EMP3601, Ubiquiti UB-5, R52H). I haven't noticed this problem on Ethernet interfaces.

I tried to investigate this problem, but it's quite hard because often everything is OK after the RB reboots. Sometimes I reboot the RB more than 10 times and everything is OK every time. Other times I reboot it several times and the clients are inaccessible after every reboot (I have to disable and re-enable the wireless interface).

While the problem persists (clients are inaccessible), there is little traffic on the interface. I captured it with a sniffer, and this is what I see:

  1. MikroTik (10.50.0.1) sends an ICMP packet to the client (10.50.0.2): this is correct.
  2. The client device asks via ARP "Who has 10.50.0.1?": this is still correct.
  3. … MikroTik doesn't reply to the ARP request, and this is the problem!
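To watch the same exchange on the router itself rather than with an external sniffer, something like the sketch below could be used while the problem is active. It assumes the `/tool sniffer quick` syntax of newer RouterOS releases (parameter names may differ on 4.x/5.x), and `wlan1` is a placeholder for whichever interface is affected.

```routeros
# list the clients the interface believes are associated
/interface wireless registration-table print
# show the ARP entries the router holds for that interface
/ip arp print where interface=wlan1
# print ARP frames seen on the interface live (stop with Ctrl+C);
# with the bug present, "who-has 10.50.0.1" requests should appear with no reply
/tool sniffer quick interface=wlan1 mac-protocol=arp
```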

Please find the sniffer capture attached. The attachment also contains the configuration of the RouterOS device on which I examined the issue.

When I switch the interface's ARP setting from "reply-only" to "enabled", the problem disappears: I can reboot the MT and the clients connect without any problem every time. But of course that is not a solution :slight_smile:

Currently I use this scheduler entry (at startup):

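# wait for the wireless interfaces to come up after boot
:delay 10s
# disable every enabled reply-only wireless interface and tag it
/interface wireless set [find disabled=no && arp="reply-only"] comment="restart" disabled=yes
:delay 5s
# re-enable the tagged interfaces and clear the tag
/interface wireless set [find comment="restart"] comment="" disabled=no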

It works well, but it is a clumsy workaround. I hope you can help me find a better solution :slight_smile:
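A possible variant of the same workaround, sketched below, keeps the interface IDs in a local variable instead of tagging interfaces with a comment, so any existing comments survive. It assumes the code is saved as a script (the name `wlan-arp-restart` is hypothetical) because `:local` variables only persist within a single script run, and it assumes `disable`/`enable` accept the ID list returned by `find`.

```routeros
# hypothetical script name: wlan-arp-restart
# wait for the wireless interfaces to come up after boot
:delay 10s
# remember which enabled wireless interfaces use arp=reply-only
:local ifaces [/interface wireless find where disabled=no and arp=reply-only]
# bounce them without touching their comments
/interface wireless disable $ifaces
:delay 5s
/interface wireless enable $ifaces
```

It can then be run at boot with a scheduler entry such as `/system scheduler add name=wlan-arp-restart start-time=startup on-event=wlan-arp-restart`.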

Regards

PS. I reported this issue to support [Ticket#2011042566000258] but... sad to say, MT has the WORST support I've ever seen! I sent them the files they asked for and have had no reply for 7 weeks! I have reminded them about this issue 3 times and there is still no reply from support…
config-and-sniff.zip (11.5 KB)

I'm sorry you haven't received a reply in time. I'm checking your ticket.

As far as I know, we have tested for this problem on a few different routers. The routers were rebooted more than 30 times without reproducing it. A support engineer is examining your sniffer file now. We even tried to reproduce the problem with your configuration, but the clients communicated correctly the whole time.
Meanwhile, I can suggest an alternative solution: use the wireless access list to filter out unauthorized clients (and, for additional security in your network, also use WPA/WPA2).
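A minimal sketch of the access-list approach is shown below. The MAC addresses and interface names are placeholders borrowed from the static ARP entries in the test setup later in this thread; substitute your own clients.

```routeros
/interface wireless access-list
# allow only the known client MACs on their respective interfaces
add mac-address=00:19:E0:A5:E6:F2 interface=wlan1 authentication=yes
add mac-address=00:02:72:6E:7A:23 interface=wlan2 authentication=yes
/interface wireless
# reject any client without a matching access-list entry
set wlan1 default-authentication=no
set wlan2 default-authentication=no
```

Note that MAC filtering only limits which stations can associate; it does not replace WPA/WPA2 encryption.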

Thank you very much for the time you spent investigating my problem. I'm sorry for the harsh words; I was just disappointed after waiting 7 weeks without any answer.

Below is a step-by-step procedure showing how I configure the MT so that the problem appears. The configuration is very simple; I changed just a few settings.

I used:

  • RouterBoard 433 (a new one)
  • 2 R52H wireless cards (I used two because with only one wireless interface the problem appears very rarely)
  • 2 wireless access point clients: TP-Link WA501G and Minitar MWGAR (I think any other cheap APCs would do)
  1. Plug the R52H cards into the J401 and J402 miniPCI slots.

  2. Power on the RB using an 18V PoE adapter.

  3. Connect the PC to RB ether1 (through the PoE adapter).

  4. Upgrade the RB to 5.6.

  5. Reset the RB configuration. After the reboot, click "Remove Configuration".

  6. Configure the wireless interfaces:

/interface wireless
set 0 mode=ap-bridge arp=reply-only band=2ghz-b frequency=2412 disabled=no
set 1 mode=ap-bridge arp=reply-only band=2ghz-b frequency=2452 disabled=no ssid=MikroTik2
  7. Set the IP addresses:
/ip address
add address=10.0.0.1/24 interface=ether1
add address=10.0.1.1/24 interface=wlan1
add address=10.0.2.1/24 interface=wlan2
  8. Add static ARP entries:
/ip arp
add mac-address=00:19:E0:A5:E6:F2 address=10.0.1.2 interface=wlan1
add mac-address=00:02:72:6E:7A:23 address=10.0.2.2 interface=wlan2
  9. Disconnect the PC from the RB and configure the other devices:
  • on the TP-Link: SSID MikroTik, IP address 10.0.1.2/24, gateway 10.0.1.1
  • on the Minitar: SSID MikroTik2, IP address 10.0.2.2/24, gateway 10.0.2.1
  • on the PC: IP address 10.0.0.2, gateway 10.0.0.1
  10. Connect the PC to RB ether1 again and check from the command line that the configuration is correct:
ping 10.0.1.2 -t

Reply from 10.0.2.2: bytes=32 time=1ms TTL=254
Reply from 10.0.2.2: bytes=32 time=2ms TTL=254
Reply from 10.0.2.2: bytes=32 time=2ms TTL=254
Reply from 10.0.2.2: bytes=32 time=2ms TTL=254
Reply from 10.0.2.2: bytes=32 time=2ms TTL=254
Reply from 10.0.2.2: bytes=32 time=2ms TTL=254



ping 10.0.2.2 -t
Reply from 10.0.1.2: bytes=32 time=1ms TTL=63
Reply from 10.0.1.2: bytes=32 time=1ms TTL=63
Reply from 10.0.1.2: bytes=32 time=1ms TTL=63
Reply from 10.0.1.2: bytes=32 time=2ms TTL=63
Reply from 10.0.1.2: bytes=32 time=1ms TTL=63
Reply from 10.0.1.2: bytes=32 time=1ms TTL=63

So it’s OK.

I used the -t parameter, so ICMP packets are sent and received continuously.

  11. Reboot the RB:
/system reboot 
y
  12. Check whether there are ping responses after the reboot. If yes, reboot the RB again. If no, the problem has appeared.
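The check in step 12 could also be done on the router itself, for example from a startup script, as in the sketch below. It assumes a RouterOS version where `[/ping ... count=N]` evaluates to the number of replies received; the address list is a placeholder matching this test setup.

```routeros
# placeholder client addresses; adjust to the static ARP entries in use
:foreach addr in={10.0.1.2;10.0.2.2} do={
    # [/ping ...] returns how many of the 3 echo requests were answered
    :if ([/ping address=$addr count=3] = 0) do={
        :log warning ("no ping reply from " . $addr . " after reboot")
    }
}
```

Running it a few minutes after each reboot and reading `/log print` would show which clients were unreachable without keeping a PC pinging.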

I caught the problem after the first reboot. Both devices were listed in the wireless registration table, but the Minitar was unreachable (ping timeouts) while the TP-Link was OK. Rebooting the Minitar didn't help. I disabled wlan2 on the MT, waited 3 seconds and enabled wlan2, and then I had a connection to the Minitar again.

After the second reboot, both devices worked well, but after several seconds the Minitar stopped responding. I had to disable and re-enable wlan2 again before the Minitar was accessible.

I rebooted the MT 5 times, and each time there was a problem connecting to the Minitar. Then I made a change: I connected the Minitar to wlan1 and the TP-Link to wlan2. Over 5 reboots, the Minitar was OK each time and the TP-Link was unreachable 4 times.

The problem is not always on wlan2. On different RouterBoards with different cards and configurations, I have had this issue on every wireless interface. I spent many, many hours trying to find the factor that determines whether the problem appears, but without success.

I went through this step-by-step procedure again, and this time everything was OK (I rebooted the MT 10 times). Then I stopped pinging both devices, rebooted the MT and waited several minutes. When I started pinging again, both devices were unreachable. It's really strange.

If you still can't reproduce this problem, please try:

  • adding a third wireless interface and a third client device
  • turning off the RB for several minutes
  • waiting several minutes after the RB reboots
  • doing anything in Winbox (disabling another interface, changing ARP to enabled and then back to reply-only, etc.)
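The ARP toggle from the last hint can also be done from the terminal. This is a sketch assuming the interface name `wlan2` from the test setup above; the 5-second pause is an arbitrary choice.

```routeros
# switch wlan2 to arp=enabled, then back to reply-only
/interface wireless set wlan2 arp=enabled
:delay 5s
/interface wireless set wlan2 arp=reply-only
# then check which ARP entries the router holds for that interface
/ip arp print where interface=wlan2
```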

I'm sure you will catch this problem because it appears often. If it appeared only once in 20 reboots, I wouldn't care about it.

Regards

Thank you very much for the detailed description.
The person responsible for testing this issue followed your instructions precisely, and the ARP problem didn't show up after multiple reboots on different devices.
Now we have taken a few other boards to repeat the same tests.

Thank you. I'm quite surprised, because I could reproduce this issue easily. Did he follow these hints?

  • turning off the RB for several minutes or longer (leave the RB powered off for 2 hours and then check again)
  • waiting several minutes after the RB reboots and then starting to ping
  • doing anything in Winbox (disabling another interface, changing ARP to enabled and then back to reply-only, etc.)
  • changing the static ARP entries (moving client devices from wlan2 to wlan1, etc.)

The problem shows up only after a reboot. It has never appeared suddenly while the MT was already running.

Reboot only the RB; the client devices should stay up the whole time.

Which wireless client devices did he use? I didn't test this with an MT-to-MT connection. The problem seems to appear regardless of the client device (I had the issue with various cheap devices: TP-Link, Minitar, Ubiquiti...), but maybe an MT-to-MT link is OK. Do you have some cheap access points other than MikroTik?

A long time ago I sent you a Supout.rif file that was generated while the problem was active. Wasn't there anything suspicious in it? I also sent the sniffer file, so you can clearly see that the MT didn't work properly: it behaved as if ARP were disabled, not reply-only.

Yes, correct: we had been using a MikroTik-to-MikroTik network when trying to reproduce the problem.
Now, for the first time, we were able to catch the problem (now it will be easier to find and fix the cause).

Yeah, thank you. So I'm waiting for the fix :slight_smile:

Thank you very much for the detailed problem report and patience.
The problem is fixed in RouterOS version 5.7.

Hi …

Nice to hear about that. I used to have this issue with reply-only ARP on ap-bridge interfaces, and I was using the very same workaround: a small script, run 30 seconds after reboot, which disables the interface and enables it again.

The radio card is an R52n (2 GHz, G only) in an RB433UAH, and I have noticed this behaviour since version 4.x.

Regards

I've just upgraded to 5.7 and tested several RBs, and the problem seems to be resolved :slight_smile:

Thank you!