Problems with DNS for www.google.com

Hello all,
I have a very strange issue that has been driving me crazy for the last week or two. I really hope someone here can shed some light.

The internet will be working great for all of my customers, but then I’ll start getting phone calls saying that they can’t get to www.google.com. I’ve even had it happen to me while I had myself set up with a CPE. I can ping google.com just fine, but when I add the www, ping acts as if the domain doesn’t exist. If I do an nslookup on www.google.com, it usually finds the IP address, which makes me think that the DNS server is actually working fine. I ran Wireshark and watched while I tried to ping www.google.com, and the strange thing is that it didn’t even look like it was sending a DNS query. I would see a couple of NetBIOS requests, but that was it. Just going to google.com worked fine.

OK, here is how my system is set up. I have a Mikrotik PPPoE server. My customers are on wireless Mikrotik (900 MHz) or UBNT (2.4 GHz) CPEs. They log in via PPPoE, are configured to use OpenDNS for their DNS servers, and perform NAT to the customer’s LAN. It doesn’t seem to matter if they have a router at their place or not. So far I have heard of problems from a couple dozen Mikrotik clients, and only one UBNT client. Most of them use Windows, but a few have Macs, and I have heard from one person who has a Mac and is having the same problem.

When a DNS query is performed, the client’s computer should ask the CPE, which should directly forward the request to OpenDNS’s servers. Has anyone else seen any similar issues? I’m going crazy trying to figure out if it is a problem caused by a Windows update (unlikely since it appears to be happening on a Mac too), if it is something wrong with my CPEs (strange considering it just started out of the blue even for clients who haven’t been touched in many months), or if Google is having a major problem (unlikely since I’m assuming it would be all over the news).

One last thing I have noticed: when I am logged into WinBox and go to tools/ping, I can always ping google.com, but sometimes www.google.com gives me an error saying that it “expected an IP address”.

Any thoughts?

Thanks,
Joe

Do you have a NAT rule in place that redirects all DNS requests? If yes, disable the rule and check if the issue is resolved.

Have you tried another DNS provider during the “outage”? OpenDNS works by caching your lookups for a long time, so it could easily be poisoned with bad info.

Sounds like a DNS issue to me.

Tom

skillful: I only have one NAT rule in place for each customer, and that is simply to masquerade all traffic out the PPPoE connection.

Tom: Yes, I have tried a few different DNS providers.

Like I mentioned, running nslookup has always worked as far as I am aware, so the DNS server itself appears to be working fine… It just seems as if the DNS requests simply aren’t getting to where they need to be.

One more weird thing I should mention… I have 3 computers at my house, which is set up like any other client of mine: one desktop and two laptops. At any given time, all three can be working, all three can be not working, or any combination of working/not working. I can sit two of the machines side by side and type “ping www.google.com” and one will work while the other gives me a “ping could not find host” message.

I’m almost at my wits’ end trying to figure this out, and a handful of my customers are breathing down my neck to get it fixed, and I honestly can’t even tell them if it’s my problem or not.

Thanks,
Joe

If your DNS works sometimes, then not, then it’s clearly an intermittent DNS problem.

Check all the bits down the chain.

OpenDNS, then the Mikrotik that’s acting as your gateway, any APs in the middle, then the CPE.

Each one can/will cache the DNS query results (depending on how you set it), as will your client PC/Mac.

It sounds like the DNS sometimes works, so the ‘correct’ DNS answer is cached somewhere, so it works. Then DNS fails, and the ‘failure’ is cached too, so it doesn’t work.

On my networks I set one Mikrotik AP to act as a DNS proxy for all devices on the network.

That narrows down the search for DNS problems a lot.

My bet is on big DNS replies and the PPPoE MTU.

Does that DNS traffic go over UDP or TCP?

I once was using a wrong DNS server which eventually cut TCP support, and big requests didn’t work.

http://forum.mikrotik.com/t/edns-not-implemented/32004/1
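For anyone who wants to check the UDP/TCP angle from a packet capture: a classic DNS reply over UDP is limited to 512 bytes unless EDNS is in use, and a server that cannot fit its answer sets the TC (truncated) flag so the client retries over TCP. The following is only a rough Python sketch for inspecting a raw reply payload; the function name and the sample headers are made up for illustration and are not taken from any post here.

```python
import struct

CLASSIC_UDP_LIMIT = 512  # bytes; the pre-EDNS UDP payload limit from RFC 1035
TC_FLAG = 0x0200         # "truncated" bit in the 16-bit DNS header flags


def is_truncated(response: bytes) -> bool:
    """Return True if the DNS reply has the TC flag set.

    `response` is the raw DNS payload (no IP/UDP headers), for example
    bytes copied out of a Wireshark capture.
    """
    if len(response) < 12:
        raise ValueError("a DNS message has at least a 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


# Hypothetical headers: same reply flags with and without the TC bit.
truncated_header = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
complete_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
print(is_truncated(truncated_header), is_truncated(complete_header))
```

If truncated replies show up and the follow-up TCP query never gets an answer, that would point at the resolver or something in the path dropping DNS over TCP, which is the failure mode described above.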

Hello, I have about 100 Mikrotik routers in my network, and I also have the problem with http://www.google.com in some sectors of the network. This is not a bug in DNS, because I entered the http://www.google.com IP from OpenDNS, 208.69.34.231, into the browser in two different parts of the network at the same time, and google.com worked fine in one part and did not work in the other. I have the problem in two parts of the network that run the new Mikrotik version 3.30. The other parts run 3.20 and older versions, and http://www.google.com has no problem there. If I reboot the Mikrotiks in the part where google.com does not work, it works perfectly for some time and then the problem comes back. I am now going to downgrade and test whether that removes the problem.

Good luck Libor.

I have now tested it again and the problem really can be in DNS, because the IP 208.69.34.231 works in all parts of the network, but if I search from google.com it does not work: it jumps to the link http://www.google.cz/#hl=cs&source=hp&q=fifa&btnG=Vyhledat+Googlem&lr=&aq=f&oq=fifa&fp=c17295b8fee82121 and that uses DNS.

The temporary solution for this problem is:

/ip dns static
add address=208.69.34.230 disabled=no name=www.google.com ttl=1d
add address=208.69.34.231 disabled=no name=www.google.com ttl=1d

When things get this strange, I have solved it by doing the following:
a) Make a backup of the entire router
b) Reset to defaults
c) Rebuild the configuration from the backup.

On at least a couple of occasions this solved similar problems for me. It’s like a bug that builds up inside the router and is impossible to get rid of otherwise.
Good luck!

Well, as suddenly as things quit working… they have started working again. No rhyme or reason. I had given up and quit messing with things, and 2 days later, nobody was having the problem. I have still seen it once or twice, but then it goes away again.

I tried that on the very first mikrotik box that the requests should have gone to. The PCs were configured to use the MT router as the DNS server, so nothing else should have been in between caching the requests. Pinging www.google.com still failed, so I’m clearly confused as to why things weren’t working.

Thanks for the responses everyone. If it crops up again, I’ll try to revive this thread with any new info I can find out.

Joe

Hi …

I noticed this behaviour some time ago, not only for Google but for a couple of web sites.

Since flushing the DNS cache worked for me when I was trying to figure out what was going on, my dirty workaround was a small ‘script’ that flushes the DNS cache from time to time, typed directly into the scheduler, something like this:

/ip dns cache flush

The scheduler runs it every 10 or 15 minutes, I don’t remember. Maybe once an hour works too, I don’t know. OK, maybe I lose the main benefit of caching, but better a few extra milliseconds on a name resolution every 10 minutes than people screaming because google.com does not load.

Besides, most of those very ‘dynamic’ sites set the TTL on their IP addresses to 5 minutes or so anyway …

Regards;

Well, the DNS problem seems to be slowly reappearing for me, and the biggest thing I notice is that when people have the problem, their DNS record for www.google.com has a TTL of a week. Normally the TTL is only 30 seconds or so.

Does anyone have any idea on what would cause the occasional record to get cached for way too long?
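One way to watch for this is to ask a resolver directly and read the TTL it hands back, then compare the router’s answer with the upstream server’s. Below is a minimal Python sketch using only the standard library. It builds a plain A-record query and pulls the TTLs out of the reply; all function names are invented for this example, and the router address in the usage note is a placeholder.

```python
import socket
import struct
from typing import List, Tuple


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # a compression pointer is two bytes
            return offset + 2
        offset += 1 + length


def answer_ttls(packet: bytes) -> List[Tuple[str, int]]:
    """Return (address, ttl_seconds) for each A record in the answer section."""
    _id, _flags, qdcount, ancount = struct.unpack("!HHHH", packet[:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    results = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:
            results.append((socket.inet_ntoa(packet[offset:offset + 4]), ttl))
        offset += rdlength
    return results


def lookup_ttls(name: str, server: str, timeout: float = 3.0) -> List[Tuple[str, int]]:
    """Send one UDP query to `server` and return the A records with TTLs."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _ = sock.recvfrom(4096)
    return answer_ttls(packet)
```

For example, lookup_ttls("www.google.com", "192.168.88.1") would query a router at that (placeholder) address. A TTL near 604800 seconds from the router, next to a much smaller one from the upstream resolver, would match the one-week symptom described above.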

Try

/ip dns set cache-max-ttl=15m

I tried this:
/ip dns
set udp-packet-size=768 cache-max-ttl=15m
But I get an “expected end of command” error.
Where am I going wrong?

The right syntax is “set max-udp-packet-size=768 cache-max-ttl=15m”

regards

RB493AH Version 4.2

I wanted to mention that I recently had a similar problem with Google. I could ping google, but when I typed it into the address bar in Internet Explorer, it was unable to load. I could load it using the IP address returned by ping, but not google.com.

A reboot seemed to fix it.

I’ve gone through different topics, and it seems the problem is caused by the DNS server reporting a TTL of 1w for Google, which then breaks communication with the server (correct me if I’m wrong). Is the solution to set a lower TTL?

Hi there,

A couple of months ago I also noticed the problem with www.google.com.

I’ve noticed it at the same time at the office and at home.

In both cases the design is a simple NAT router that connects to a network and gives connectivity to other clients, using different types of media and passing through different intermediate switches or bridges. The two networks are bridged using PPTP.

EXACTLY like jcremin said:

The internet will be working great for all of my customers, but then I’ll start getting phone calls saying that they can’t get to www.google.com. I’ve even had it happen to me while I had myself set up with a CPE. I can ping google.com just fine, but when I add the www, ping acts as if the domain doesn’t exist. If I do an nslookup on www.google.com, it usually finds the IP address, which makes me think that the DNS server is actually working fine. I ran Wireshark and watched while I tried to ping www.google.com, and the strange thing is that it didn’t even look like it was sending a DNS query. I would see a couple of NetBIOS requests, but that was it. Just going to google.com worked fine.

I also want to say that I use OpenDNS and a central DNS proxy built with Mikrotik [all the stations/clients connect to the central DNS proxy].

I went completely crazy, and while digging I found something about a strange error with OpenDNS. For me, just changing the DNS provider resolved the problem.

The truth is that I didn’t find a real solution to the problem.

Best Regards,

RG.

As an update to all of my earlier posts: I now know a little more about this problem. The issue definitely is that www.google.com (and occasionally other Google domains) is cached for 1 week. Once the problem has affected one router, anything that relies on that router’s cache is affected, because the record (along with the wrong TTL) is passed along right down to the customer’s computer. The only fix is to flush the DNS cache all the way down the chain.

The cause of the issue seems to only occur on Mikrotik routers. I originally saw the problem on other brands of routers too, but that was because all of my CPE routers were set to use my main Mikrotik router as their DNS server which was forwarding and caching DNS requests.

I have since changed my network so that all of my CPE devices currently point right to OpenDNS. What I have seen is that occasionally, a small handful of Mikrotik CPEs still get the 1 week problem. The problem still gets passed along to any routers and computers they may have, but the scope of who gets affected is much smaller now. The problem definitely does not stem from OpenDNS. I have used a handful of different DNS servers (even tried setting up my own server which directly used the root servers) and the problem still crops up from time to time, but again, only on Mikrotik routers.

This is a tough problem to troubleshoot since it only happens once in a while and there is no way to duplicate it that I know of. Manually overriding the max TTLs in the Mikrotik settings SHOULD work in theory, but I haven’t actually tried it as I’d prefer not to mess too much with workarounds.
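To make the chain effect concrete, here is a toy Python model of how a bad TTL travels from one cache to the next, and what a per-cache maximum TTL would do to it. It assumes each cache hands out the remaining lifetime of a record, which is how DNS caches normally behave. It is not a model of RouterOS internals, and the function name and numbers are invented for illustration.

```python
from typing import List, Optional


def ttl_stored_per_hop(
    upstream_ttl: int,
    hop_delays: List[int],
    max_ttl: Optional[int] = None,
) -> List[int]:
    """Return the TTL (seconds) each cache in the chain stores for a record.

    upstream_ttl: TTL handed to the first cache (e.g. the main router).
    hop_delays:   seconds the record sits in each cache before the next
                  device down the chain asks for it.
    max_ttl:      optional cap every cache applies before storing.
    """
    stored = []
    ttl = upstream_ttl
    for delay in hop_delays:
        if max_ttl is not None:
            ttl = min(ttl, max_ttl)  # clamp before storing
        stored.append(ttl)
        ttl = max(ttl - delay, 0)    # next hop receives the remaining lifetime
    return stored


ONE_WEEK = 7 * 24 * 60 * 60  # 604800 seconds

# Main router -> CPE -> customer PC, each asked one minute after the last.
print(ttl_stored_per_hop(ONE_WEEK, [60, 60, 60]))
print(ttl_stored_per_hop(ONE_WEEK, [60, 60, 60], max_ttl=15 * 60))
```

Without a cap, every device ends up holding the record for nearly a week. With a 15 minute cap, the bad entry ages out of the whole chain within 15 minutes, which is the reasoning behind trying a lower maximum TTL.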

Hopefully this info helps. It’s a frustrating problem and I wish it could be gone for good.

Joe