Weird issue with NAT which is puzzling me

I am encountering a very weird issue and cannot work out what’s going on:

I have a simple setup as follows:
1 x Mikrotik RB750 v4.11 acting as my main router, doing PPPoE to the ISP, DHCP, NAT and masquerade
1 x Linksys WRT300N used as an AP, with NAT and DHCP disabled, so all wireless clients just use it as a bridge.
The network serves about 3-5 workstations on wired connections through unmanaged switches, plus about 3-5 wireless devices.

The issue is that while some workstations work 100%, others randomly have issues with internet access.

The symptom is NO browsing, even though DNS resolution, ping and traceroute to the requested sites all work fine. In this state Skype sometimes works as well.

To get around this, I enabled the proxy on the Mikrotik, and then I am able to browse fine. However, this is not a solution, especially for mobile wireless devices: it's frustrating to have to disable/enable the proxy, and some devices, such as mobiles, do not support it.

At one point I was suspecting that the issue was with the Linksys AP, since upon rebooting it, it normally solved the problem. However gradually the problem started affecting some wired workstations as well.

I also suspected that I had some configuration on MT messed up due to various upgrades etc, so I started with a fresh config, however the problem reproduced itself.

I have checked arp, routing, DHCP leases, DNS cache, NAT rules (very simple), used torch to check that packets are arriving etc but can’t find anything wrong.

Has anyone encountered anything similar? Does anyone have any idea what I can do to find the issue?

Many thanks in advance.

I continued investigating this and did the following:

  1. Removed an intermediary switch, right next to the MT, and instead used 3 ports directly on the MT as a switch. I did this to eliminate the possibility of this small switch acting funny. Problem persisted.
  2. Changed the Linksys AP from AP bridge to router (doing NAT and DHCP on a different subnet). Of course this means double NAT. Problem persisted.
  3. Replaced the Linksys AP with a different one, same settings, just AP bridge. Problem persisted.

At this point it is very clear that the issue is the MT.

So I started using Wireshark to see what’s going on and compare between a machine experiencing the issue and one not.
I am seeing that on problematic machines there are some lost packets, so I checked using netstat and torch on the MT. The sessions and ports match up and are established, but it seems that traffic from the MT gets interrupted and does not reach the machine. The session then sits in the FIN_WAIT_1 state for a long period.

To me it seems that the MT is somehow doing a sort of tarpit on the traffic, but I do not have any such settings, and queues are all disabled.
Please, any ideas?

Hi,

I don’t know whether it will work or not, but at least try once after changing tcp-mss to 1400 on both the in and out interfaces.
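
For reference, here is a rough sketch of the arithmetic behind that suggestion. It assumes plain IPv4 and TCP headers without options (20 bytes each) and the usual 8 bytes of PPPoE overhead on a 1500-byte Ethernet link; the function names are made up for illustration and are not anything from RouterOS.

```python
IP_HEADER = 20       # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20      # bytes, TCP header without options (assumption)
PPPOE_OVERHEAD = 8   # bytes, PPPoE header (6) plus PPP protocol ID (2)


def max_mss(link_mtu: int) -> int:
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    return link_mtu - IP_HEADER - TCP_HEADER


def clamp_mss(advertised_mss: int, link_mtu: int) -> int:
    """MSS a router would write into a SYN so segments fit the link."""
    return min(advertised_mss, max_mss(link_mtu))


ethernet_mtu = 1500
pppoe_mtu = ethernet_mtu - PPPOE_OVERHEAD   # 1492

print(max_mss(ethernet_mtu))        # 1460, what a LAN host normally advertises
print(max_mss(pppoe_mtu))           # 1452, the most that fits through PPPoE
print(clamp_mss(1460, pppoe_mtu))   # 1452
```

So 1400 sits safely below the 1452 limit for a standard PPPoE link; it gives up a little payload per packet in exchange for some margin.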

Hmm… interesting. I have tried lowering the MTU without any change, but I have not tried with MSS yet.

Correct me if I am wrong, but this has to be done with Mangle right?

Correct, on both the in interface and the out interface, for traffic in both directions.

So I did the mangle on the PPPoE (DSL) interface, in and out, and indeed it did improve the situation. Some laptops that previously had no browsing except through the proxy now work.

But not yet 100% there, I am still experiencing the same problem intermittently.

Any help?

BTW, is there any way I can monitor the MSS situation on the MT?
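
One way to check this off the router, sketched here as an assumption rather than an MT feature: capture the SYN packets (for example with Wireshark on a client), export the raw TCP header bytes, and read the MSS option out of them. The helper name `tcp_mss_option` and the hand-built sample header are purely illustrative.

```python
import struct


def tcp_mss_option(tcp_header: bytes):
    """Return the MSS advertised in a raw TCP header, or None if absent."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # The upper 4 bits of byte 12 hold the header length in 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    options = tcp_header[20:header_len]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # end of option list
            break
        if kind == 1:            # NOP, single-byte padding
            i += 1
            continue
        if i + 1 >= len(options):
            break                # truncated option
        length = options[i + 1]
        if length < 2:
            break                # malformed option, stop parsing
        if kind == 2 and length == 4 and i + 4 <= len(options):
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


# Hand-built SYN header for illustration: ports 50000 -> 80, data offset of
# 6 words, SYN flag set, then an MSS option (kind 2, length 4) of 1452.
syn = struct.pack("!HHIIBBHHH", 50000, 80, 0, 0, 6 << 4, 0x02, 65535, 0, 0)
syn += struct.pack("!BBH", 2, 4, 1452)
print(tcp_mss_option(syn))
```

Comparing the value in a SYN as it leaves the client with the value seen on the far side of the router would show whether the MSS is actually being rewritten.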

Hi,

If your RB750 acts as the PPPoE client, check that ‘change-tcp-mss=yes’ is set in the current PPP profile and leave your MTU at the default of 1500.
This will add two dynamic mangle rules and do the trick :)

Hope this helps, Grzegorz.

Done mate, it's very easy this way :)

Unfortunately I am still experiencing problems, but now I suspect it's the Linksys WRT300N wireless router. Randomly, internet browsing does not work through it.