Can't get internet access apart from dns,ping (5.0 RC1)

I am trying a very simple setup of one PPPoE connection to the internet and one Ethernet connection to my LAN. I have routing, NAT and DNS set up. Ping to any site on the net works fine, but I cannot access web pages. I am not sure if running the OS on a VMware server is affecting things, and I know it complains about the root user when it starts up, but I can’t work out what is wrong. I have no firewall rules, so that should not be blocking things. It looks like traffic can get out but not get back in unless it is a ping, for some reason.

Any tips on what I could be doing wrong?

Post the output of

/ip address print detail
/ip route print detail
/ip firewall export

and a network diagram.

OK, here is the output from the commands:

[admin@MikroTik] > /ip address print detail
Flags: X - disabled, I - invalid, D - dynamic
 0   ;;; default configuration
     address=192.168.88.1/24 network=192.168.88.0 broadcast=192.168.88.255
     interface=ether1 actual-interface=ether1

 1   ;;; added by setup
     address=192.168.1.1/24 network=192.168.1.0 broadcast=192.168.1.255
     interface=ether1 actual-interface=ether1

 2 D address=202.164.195.51/32 network=203.123.72.66 broadcast=0.0.0.0
     interface=aaNet actual-interface=aaNet
[admin@MikroTik] > /ip route print detail
Flags: X - disabled, A - active, D - dynamic,
C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme,
B - blackhole, U - unreachable, P - prohibit
 0 ADS  dst-address=0.0.0.0/0 gateway=203.123.72.66
        gateway-status=203.123.72.66 reachable aaNet distance=1 scope=30
        target-scope=10

 1 ADC  dst-address=192.168.1.0/24 pref-src=192.168.1.1 gateway=ether1
        gateway-status=ether1 reachable distance=0 scope=10

 2 ADC  dst-address=192.168.88.0/24 pref-src=192.168.88.1 gateway=ether1
        gateway-status=ether1 reachable distance=0 scope=10

 3 ADC  dst-address=203.123.72.66/32 pref-src=202.164.195.51 gateway=aaNet
        gateway-status=aaNet reachable distance=0 scope=10
[admin@MikroTik] > /ip firewall export
# oct/18/2010 19:49:10 by RouterOS 5.0rc1
# software id = P8IY-IBFL
#
/ip firewall connection tracking
set enabled=yes generic-timeout=10m icmp-timeout=10s tcp-close-timeout=10s \
    tcp-close-wait-timeout=10s tcp-established-timeout=1d \
    tcp-fin-wait-timeout=10s tcp-last-ack-timeout=10s \
    tcp-syn-received-timeout=5s tcp-syn-sent-timeout=5s tcp-syncookie=no \
    tcp-time-wait-timeout=10s udp-stream-timeout=3m udp-timeout=10s
/ip firewall filter
add action=log chain=forward disabled=no in-interface=ether1 log-prefix=\
    forward
add action=log chain=input disabled=yes in-interface=ether1 log-prefix=input
add action=log chain=output disabled=no log-prefix="" out-interface=aaNet
add action=log chain=input disabled=no in-interface=aaNet log-prefix=\
    "aanet in"
/ip firewall nat
add action=masquerade chain=srcnat disabled=no out-interface=aaNet
/ip firewall service-port
set ftp disabled=no ports=21
set tftp disabled=no ports=69
set irc disabled=no ports=6667
set h323 disabled=no
set sip disabled=no ports=5060,5061
set pptp disabled=no
[admin@MikroTik] >

While I do have a complicated setup of multiple networks and multiple internet connections, for the minute I am trying to start off with something very simple. ether1 is my LAN on IP address 192.168.1.1/24 with DHCP. ether3 has my DHCP connection to my internet provider. That PPPoE connection is named aanet. I am trying to give the LAN full access to the internet via aanet, and to block incoming connection requests via aanet once I get that working.

If it still needs a diagram to make it clear, let me know and I will post one.

Chris

Guys, I have now confirmed this exact same setup works perfectly with 4.11, so it seems that it is a v5.0 RC1 issue. I know this is the wrong forum, but I did not know for sure that 5.0 RC1 was the issue before. Can anyone help? I am also trying to downgrade, but I have issues there too.

OK, now I really need some help. As I mentioned before, I have another VM set up exactly the same as my 5.0 RC1 VM, and it works perfectly. So I have now downgraded my 5.0 RC1 VM to 4.11 as well, and it still has issues. I am sure people are thinking it is config related, because I did as well, so I have attached two config files: one from the working VM and one from the downgraded VM that does not work. Apart from MAC addresses, a slight case difference in the PPPoE connection name and some user section at the end, they are identical as far as I can tell.

Can anyone give me ideas on how to fix this? The stuffed one is my licensed one, and the working one is not for my license.


*** EDIT: removed files ****

You have this line in your notworking one that isn’t present in the working one, but it doesn’t appear it should be making a difference (unless you’re using 192.168.88 somewhere else)

add address=192.168.88.1/24 broadcast=192.168.88.255 comment="default configuration" disabled=no interface=ether1 network=192.168.88.0

You also have a slight difference in /tool user-manager customer, which shouldn’t affect connectivity. (this is the user section at the end you referred to?)

I assume you’re connecting everything to the same ports? And you’re running these in virtual machines? Are you sure the virtual machines are configured the same, particularly which ethernet ports match up to which virtual ports?

When the notworking one is running, is the PPPoE connection up, and did it add a 0.0.0.0 route to the routing table? What happens if you traceroute somewhere?

When you do these things do you see anything appear in the /ip firewall connection pr?

Sorry, I got distracted several times while composing that last response, I see you already posted some of what I asked for, and I didn’t read your problem carefully enough.

So ping to the internet works, but not web pages, and this is the case with both 4.11 and 5.0rc1 on your notworking vm? Can you ping the website by name and by address? (is it just a DNS resolution issue?). If those work, what happens if you telnet to port 80 of the website? If you get a blank screen try typing something and hitting enter. It will probably give you an error message and kick you out if you’re really connected to a web server.

Can you ping using larger packet sizes and see what happens?

How much memory are you giving the VMs? Is there anything suspicious in /log pr?

Yes, both the not-working 4.11 and 5.0 RC1 can ping any internet domain name without an issue, so it is not DNS related.

With telnet to port 80 I get a blank screen, but text I type does not show up and I get nothing back. Looking at the router, it looks like it establishes the connection but something blocks all the data coming back via the router, but I could just be reading things incorrectly.

Sorry, I am not sure how to do the larger ping test, but considering I have another 4.11 set up exactly the same, I am not sure how this would be different between the two. Happy to test if you can give me the command.

OK, this is the slight difference between the VMs: the working one has 256MB and the not-working one has 512MB.

There are no errors or anything in the log. It just looks like normal startup stuff, like the PPPoE connection coming up and me logging in with the admin client.

I have sent an email to support, but I would like to keep working on it here, and I pointed support to this forum so they can see what we have tested so far. Is there any way to blow away the current install and reinstall without losing the softkey, so I can add my license back in? I tried putting the install CD in the drive to do a reinstall, but the CD is ignored and the OS just boots, so I could not do a reinstall that way.

Regards

Chris

/ping a.b.c.d size=1400

Both should be fine and plenty. The VMs are on the same physical hardware with the same physical resources mapped to virtual resources and the virtual resources are the same? The VMs have the same CPU count? And how much disk space? (for instance, if its vmware, one isn’t using bridged networking while the other one is using natted or host networking?)

You can do a /system reset-configuration, which should be the equivalent, mostly. This should just reset the configuration as if you had just installed it, and leave the key intact, but you might want to backup your vm container just to be safe.

You could start a packet capture (/tool sniffer, easier in winbox) on the pppoe interface (after it’s up) and see if packets are coming back, and/or watch the interface counters. Can you access the router on whatever IP its pppoe session got from an external location? (The winbox packet sniffer won’t fill in the packets until after you stop it, so don’t expect to see anything until you stop it.)

In the two routers (working and notworking) do they have the same packages enabled, and are they all the version they are supposed to be? (/system package pr)

You could also look in the /system resource stuff and make sure everything is the same, IRQs and such (to verify the VMs are the same).

The most likely thing I can think of is that one of the VMs has a different physical to virtual ethernet configuration in some way. Or you have two (or more ) VMs running that have the same “virtual” mac address and packets that should be going to one VM are getting directed to the wrong one. (you might want to try that telnet to the web server directly from the routeros VM, if it wasn’t before (/system telnet a.b.c.d port=80) and see what happens).

Immediately after upgrading/downgrading a routeros version it is a good idea to look in the log and see if there are any errors. It should say something along the lines of “successfully installed routeros 5.0rc1” or words to that effect (for whatever version). Personally, after a couple of nasty instances, I’ve found it’s a good idea when uploading a new version to download it immediately after uploading and compare the two before actually rebooting, to verify that what was uploaded is not corrupted. (RB192s we had were notorious for getting uploads to the flash filesystem corrupted if they came in on any port besides ether1.)

Upgrade procedure that I use:
upload routeros.
download routeros (into different directory than where you uploaded from)
in windows command prompt/cmd, use comp command with path to uploaded version and path to downloaded version.
in unix use diff command with same arguments.
if they compare ok, reboot router, otherwise delete uploaded image.
after reboot, check in /log to verify everything went well.
if routerboard, check in /system routerboard to see if firmware needs to be upgraded. if so, /system routerboard upgrade, reboot, check in log again (there is no success message for this one, just make sure there are no errors).
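
If you would rather not depend on comp or diff, the compare step in that procedure can be sketched in Python. This is only a rough illustration: the function names are made up for this post, and it compares SHA-256 digests of the two files rather than doing a byte-by-byte diff.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(uploaded_from: Path, downloaded_copy: Path) -> bool:
    """True if the local image and the copy fetched back from the router
    have the same SHA-256 digest (hypothetical helper, not a RouterOS tool)."""
    return sha256_of(uploaded_from) == sha256_of(downloaded_copy)
```

If images_match() returns False, delete the uploaded image from the router instead of rebooting.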

OK, I sort of got somewhere now, but I have no idea what it means. What I have discovered is that apps that are not HTTP or HTTPS seem to work, like gtalk, msn and skype. I did not notice them connecting before with all my other issues. It looks like only HTTP/HTTPS traffic is affected. What gets even stranger is that the web browser does seem to get content back from the web server, but only a small amount, maybe one packet of data. If I do view source while the page is still trying to load, I can see a bit of the page source there, even for sites I have never visited before.

This really has me stumped.

VMware ESXi is different to VMware Server. The network connections connect directly to virtual switches that connect to network cards and real switches. All my VMs are set to the same virtual switches, and I shut one down before starting the next one. I run a totally different firewall to make these posts, plus I have 3 different RouterOS firewalls I am testing with different versions and configs to try to work it out.

Any ideas are welcome. I still need to do the large ping test and try the telnet from the router.

Telnet to a web server from the router works, but not from the desktop.

I also sniffed the packets of a web request. Can anyone tell me if they look correct or not? I have attached an image of the packet sniff.
packetsniffing.jpg

It could be an MTU issue (which is usually the case when things seem to work except for websites, especially https websites), but that shouldn’t change between the two VMs.

Since your telnet test seemed to work from the router, but not the PC, it sounds like its either not NATting correctly, or not routing correctly, or the MTU problem is on the ethernet side of the router.

From that packet sniff snippet though everything seems ok (I forget what causes the 6 byte discrepancy but I think its normal, I think its recording extra information on inbound that it doesn’t need to record for outbound).

Probably need to see more of the sniff to see what it looks like when the packets are large.

xxii, there was not much more in that packet sniff log apart from traffic from winbox to manage the router. I will try again tonight when I get home to get more info. I have been playing with the MTU, but so far that has not helped, and whenever I changed the MTU on the Ethernet side things broke. I also noticed that while the MRU is set on the PPPoE connection, the actual one shown on the status screen is larger than the one set on the PPPoE config screen, but I am not sure what that means.

While I have been playing around with networks, firewalls etc. for years, I have to say this issue is beyond my abilities. Isn’t there some way to locate the issue? If it was an MTU issue, wouldn’t packets be dropped or something, and wouldn’t that show up in the packet sniffing? Is there something I can change in my config to better locate the issue?

At this stage I am thinking of taking my modem out of bridge mode, but then port mapping etc. becomes a pain, as I would have 2 firewalls and 2 port mappings to do on both routers.

Chris

I’ve been trying to approach this from what could be different between your VMs, and unless there is a difference between the VMs handling of ethernet somehow, I don’t see how it could be MTU.

So on the packet sniff, are there any large packets, and which side was the first to not reply? Were there only those 4 packets in your screenshot? Otherwise, did any packets come in the ethernet and not go out aanet, or vice versa? (Basically, as you mentioned, we’re trying to see if packets were dropped somewhere.) If you browse to the routeros VM itself, does it come up? (I mean the routeros running in the VM. routeros has a web interface.) If you lower the MTU on whatever you’re browsing from to 1480 or less and try to browse, what happens?

Did you ever try those large pings (from the router and from your pc (if it’s windows ping -s 1400, add -f to set don’t-fragment flag))? You can experiment with different sizes and dont-fragment options to figure out what the MTU is between you and whatever you’re pinging.
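
To make the "experiment with different sizes" part less of a guessing game, here is a rough Python sketch of the arithmetic and of a binary search over payload sizes. It assumes a plain IPv4 header with no options (20 bytes) plus the 8-byte ICMP echo header. The probe argument is a hypothetical stand-in for "send one don't-fragment ping with this payload and report whether a reply came back"; it is not a real ping API, and the search assumes that if one size gets through, every smaller size does too.

```python
from typing import Callable

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_largest_payload(probe: Callable[[int], bool],
                         low: int = 0, high: int = 1472) -> int:
    """Binary search for the largest payload size for which probe() succeeds.

    probe is a hypothetical callback (for example, a wrapper that runs one
    don't-fragment ping). Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid got through, look for something larger
        else:
            high = mid - 1   # mid was too big, look below it
    return low


# Example with a fake probe standing in for a path limited to MTU 1492:
fake_probe = lambda payload: payload <= max_ping_payload(1492)
largest = find_largest_payload(fake_probe)
path_mtu = largest + IPV4_HEADER + ICMP_HEADER
```

Adding the 28 bytes of headers back to the largest working payload gives the MTU of the path, as the last line does.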

Basically it’s a matter of poking and prodding until you figure out where the problem is. As I recall, the two VMs are configured identically (except for memory) but aren’t behaving identically (and the broken one is the one with your license), which means something that makes a difference, somewhere, isn’t really identical: either the VMs themselves, or the stuff in the VMs, or your upstream is seeing some difference and acting differently (firewalls based on mac-addresses for instance, although why they would only kill web traffic seems strange).

Another thing you could try, is on the VM that works, open FILE in winbox, click on backup. copy off the resulting file, move it into the other VM’s FILE (making sure the other VM is running the same routeros version with the same packages enabled), and do a restore. That should ensure both have an identical configuration. (however, if you’ve renamed any physical interfaces (from routerOS’s point of view) (I don’t remember that you did), it won’t restore properly).

MTU issues generally arise because one end has a different MTU than the other, AND someone along the way thought it was a good idea to block ICMP (it isn’t, at least not bluntly), AND fragmentation isn’t allowed or is blocked somewhere. You have change-MSS enabled on your pppoe connection in both VMs as I remember, so that should mitigate that. (Simplified: change-mss alters the initial tcp packets passing through the lower-MTU PPPoE connection so the other side (or your side) doesn’t ever try to send too-large packets to start with.) Old routerOS versions had a bug with this and didn’t apply change-MSS correctly. I forget when exactly it was fixed, but I think it was in the early 3.x or very late 2.9.x series. If there is an old RouterOS with PPPoE somewhere between you and the other end, that could be an issue, however that wouldn’t change between your VMs.
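
For anyone following along, the numbers behind change-MSS can be sketched in a few lines of Python. This is just the arithmetic, not how RouterOS implements it: PPPoE adds 8 bytes of overhead to each Ethernet frame (6-byte PPPoE header plus 2-byte PPP protocol field), and the sketch assumes 20-byte IPv4 and TCP headers with no options. The function names are made up for this post.

```python
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20    # bytes, assuming no IP options
TCP_HEADER = 20     # bytes, assuming no TCP options


def pppoe_mtu(ethernet_mtu: int = 1500) -> int:
    """MTU left for IP packets once PPPoE overhead is taken out of the frame."""
    return ethernet_mtu - PPPOE_OVERHEAD


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP segment payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_mss(advertised_mss: int, link_mtu: int) -> int:
    """What an MSS clamp does to the MSS option in a SYN: lower it if it is
    too big for the link, otherwise leave it alone."""
    return min(advertised_mss, mss_for_mtu(link_mtu))


# A PC on plain Ethernet advertises MSS 1460; over PPPoE it gets clamped.
clamped = clamp_mss(1460, pppoe_mtu())
```

With a standard 1500-byte Ethernet MTU this gives a PPPoE MTU of 1492 and a clamped MSS of 1452, so neither side ever builds a TCP segment too large for the PPPoE link.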

I rambled a bit, I hope there is something helpful in there.

Your help has been great, and between your suggestions and support I am hoping to fix this issue.

Following your suggestion I did ping tests from Windows, but I used -l instead of -s; not sure if it makes a difference. Anyway, that was fine and Windows could handle up to 1470. Support suggested I do a similar test from the router to my network as well, so I went to give that a go and found it failed with just a normal ping.

I can talk to the router via its web management page or via winbox, and I can ping it just fine, but it seems to be unable to ping me at all. Just in case it is some stupid hidden protection in Win7, I am taking home a Win XP laptop, since I know how to make sure that firewalls etc. are not affecting that test, but it looks like the router is having some sort of issue talking back to the network.

Oh, as for packet logs, I did another one last night but forgot to post it, so I will do so tonight or tomorrow morning. It looks like the first big packet is 1500 and comes in over the PPPoE connection. Is there a way to turn on packet fragmentation support just to see if that is the issue, or is that not a good idea?

---- added ----
I just realised the router where the pings failed runs a different config than the rest as a test. I will test the other routers tonight to make sure it is not just the experimental config that caused the failed pings.


Chris

Sorry, typo, in unix land it’s -s. I remember verifying that it was -l on windows, but I forgot to actually fix it in the post.

The windows firewall is great at protecting the rest of the internet from you (but not from any viruses you may have) :)

I think the windows 7 (and vista) firewall by default does not respond to pings.

Fragmentation support is an IP feature. Packets can explicitly mark that they cannot be fragmented, and some firewalls may drop fragments (they shouldn’t), but otherwise it should just work. Normally, the MTU of the entire path is detected by the probing side sending a packet at what it believes is the MTU, with the dont-fragment bit set. If a router along the way has a lower MTU, it’s supposed to send back an ICMP too-large message which contains the largest MTU that router can accept on that path (and this is where a lot of broken firewalls screw up by dropping the ICMP packet). The originating host can then adjust its MTU accordingly for that destination. This may occur more than once until the packet makes it all the way to the other side with no ICMP messages coming back.
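
That discovery loop can be sketched as a toy Python simulation. It is a simplified model of the process just described, not real networking code: hop_mtus is a made-up list of link MTUs along the path, and icmp_filtered models a firewall that drops the ICMP too-large message.

```python
from typing import List, Optional, Tuple


def discover_path_mtu(local_mtu: int, hop_mtus: List[int],
                      icmp_filtered: bool = False) -> Tuple[Optional[int], int]:
    """Simulate path MTU discovery with the dont-fragment bit set.

    Returns (path_mtu, probes_sent). path_mtu is None when a too-large
    packet is dropped and the ICMP reply never arrives, which is the
    "connects but then hangs" case.
    """
    size = local_mtu
    probes = 0
    while True:
        probes += 1
        # The first hop whose MTU is smaller than the packet rejects it.
        blocker = next((mtu for mtu in hop_mtus if mtu < size), None)
        if blocker is None:
            return size, probes   # packet reached the far end
        if icmp_filtered:
            return None, probes   # sender never learns the lower MTU
        size = blocker            # ICMP too-large reported this MTU


# Ethernet host, PPPoE link in the middle of the path:
working = discover_path_mtu(1500, [1500, 1492, 1500])
broken = discover_path_mtu(1500, [1500, 1492, 1500], icmp_filtered=True)
```

In the working case the sender settles on 1492 after two probes. In the broken case the first full-size packet simply vanishes, while small packets such as a normal ping still get through.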

If the dont-fragment bit is not set, then the packet is just broken up and sent on in pieces to the destination. It may get broken up again on the way if any of the pieces are still too big. (Incidentally, this is hard on CPUs and buffers, and IPv6 has altered the way this process occurs to improve it, by mandating MTU discovery and not allowing intermediate nodes to fragment at all.)

Grr, you were right, this was the stupid Win7 firewall. On XP you turn the firewall off and it is off. Win 7 tries to put the network as public each time you change the gateway and turns on the stupid firewall, grrrrrrr.

Below is a copy of the email I sent to support. I thought I would post it here as well in case it gives someone insight into my issue.
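
As a rough illustration of what "broken up" means in practice, here is a small Python sketch that splits one IPv4 packet for a smaller MTU. It assumes a 20-byte header with no options and only models sizes and offsets, not real packet bytes. The one detail worth noticing is that every fragment except the last must carry a multiple of 8 bytes of data, because the fragment offset field counts in 8-byte units.

```python
from typing import List, Tuple

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment(total_length: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Split an IPv4 packet of total_length bytes to fit a link MTU.

    Returns a list of (data_offset_bytes, fragment_total_length,
    more_fragments) tuples, one per fragment.
    """
    if total_length <= mtu:
        return [(0, total_length, False)]  # fits, nothing to do
    payload = total_length - IPV4_HEADER
    # Data per fragment, rounded down to a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(per_fragment, payload - offset)
        more = offset + chunk < payload
        fragments.append((offset, chunk + IPV4_HEADER, more))
        offset += chunk
    return fragments


# A full 1500-byte packet arriving at a 1492-byte PPPoE link:
pieces = fragment(1500, 1492)
```

Here the 1500-byte packet becomes one 1492-byte fragment plus a tiny 28-byte one, which is part of why fragmenting every full-size packet is so wasteful.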

So after doing more tests, this is what I found. Hopefully this will allow you to point me in the correct direction.

My ISP has a transparent proxy. As a test I contacted them and asked to be removed from the proxy. Since they have done so, some web pages will load, but very slowly (more than 30 seconds for a basic page). Before they removed me, I tested FTP, as other non-HTTP protocols on different ports seemed fine. FTP works flawlessly, like normal. I then decided to test the web proxy built into RouterOS, since when I telnetted to a web server from RouterOS it was very fast. Once I enabled the RouterOS web proxy and told my browser to use it, even with caching turned off, every web page loaded quickly (a few seconds) and I could not find any that did not work.

Does that help you locate my issue at all? I worked out that the ping from router to PC issue was just the Win7 firewall. I am considering setting up a transparent proxy in RouterOS to make the web work, but it really should not be required, and it will be an issue for me moving forward, as I plan to have 2 connections with different web data going out different connections based on the source, and I don’t think I can do that with the proxy.

I have no idea if this is important, but I am located in Australia, so maybe something is different over here. I have no idea what else to tell you to get this issue resolved. If there is anything I can do to get this issue resolved quicker, please let me know and I will do so if it is at all feasible.

Regards

Chris

I have done more testing and found that the fault relates to PPPoE connections in RouterOS. I changed the ADSL modem from bridge mode to router mode and made RouterOS use it as a gateway, and everything works perfectly with just a NAT rule. When I switch it back to bridge mode, with RouterOS doing the PPPoE connection, everything else seems to work, but web will not work unless it goes via the cache service in RouterOS.

Does this give you any idea of what is wrong? I would prefer to keep my ADSL modems in bridge mode if at all possible, but at this stage it does not seem possible due to issues RouterOS seems to have with PPPoE, at least when running on VMware ESXi 4.1.

I am testing this on 5.0 RC1.

It’s sounding like MTU again, except it apparently works fine with the router’s proxy, so router to the internet and router to your PC seem to be fine, but PC through router to internet is not. Make sure change TCP-MSS is enabled in your PPPoE client profile.

I think you had the MTUs on your PPPoE client set to 1480? Try 1492 (also for the MRU) and see what happens. Also, when the DSL modem is being a router, can you see what it’s using for MTU?

If you look in the “/ip firewall mangle” rules while the PPPoE session is up, you should see two rules that should be fixing the MTU difference caused by PPPoE; are they present?