Problems with Googlebot after a domain move


alansch - 4:21 pm on Oct 30, 2002 (gmt 0)


I've run into a weirdness involving an Apache server and Google spidering. It may also affect other spidering search engines.
It seems that Google, in the interests of conserving bandwidth, remembers IP addresses so it doesn't have to do millions of DNS lookups as it spiders the web.
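To make the suspected mechanism concrete, here is a toy Python model of a crawler that caches hostname-to-IP lookups. It is only a sketch under stated assumptions. Google's actual crawler internals aren't public, and `CrawlerDnsCache`, the one-day TTL and the example addresses (taken from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges) are all hypothetical. What it shows is how a cached answer keeps sending requests for a moved domain to the old server until the entry expires.

```python
import time
from typing import Callable, Dict, Tuple


class CrawlerDnsCache:
    """Hypothetical toy model of a crawler-side DNS cache (not Google's code)."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve      # e.g. socket.gethostbyname in real use
        self._ttl = ttl_seconds      # assumed lifetime of each cached entry
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expiry)

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]         # fresh entry: no DNS query is made
        ip = self._resolve(host)     # missing or expired: ask DNS again
        self._entries[host] = (ip, now + self._ttl)
        return ip


# Demo: the domain moves while the crawler still holds a cached answer.
dns = {"moved-site.example": "192.0.2.10"}           # old hosting server
fake_now = [0.0]
cache = CrawlerDnsCache(dns.__getitem__, ttl_seconds=86400,
                        clock=lambda: fake_now[0])
print(cache.lookup("moved-site.example"))            # 192.0.2.10
dns["moved-site.example"] = "198.51.100.20"          # DNS now points elsewhere
print(cache.lookup("moved-site.example"))            # still 192.0.2.10
fake_now[0] = 86401.0                                # one day later
print(cache.lookup("moved-site.example"))            # 198.51.100.20
```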

Our hosting server has recently had a couple of domains move elsewhere. But Google is still spidering our server for these domains (which proves to me that Google DOES remember the IP addresses, BTW) and indexing our server's main pages *as if they belonged to the domain that has left*!

On doing some checking with WebBug, I find that I can query our server's IP address with the Host: request header set to any hostname the server doesn't know about and, instead of getting the expected 404 Not Found status, I receive a 200 OK status and the server's own home page.
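The WebBug check can be reproduced with Python's standard library. In this sketch, `probe_virtual_host` is a hypothetical helper name. It connects to the IP address directly but sends a `Host:` header naming a different site; when a `Host` entry is supplied in `headers`, `http.client` does not add its own automatic one.

```python
import http.client
from typing import Tuple


def probe_virtual_host(server_ip: str, host_header: str, port: int = 80,
                       timeout: float = 10.0) -> Tuple[int, str]:
    """Send GET / to server_ip while claiming to want host_header.

    Returns (status, reason). A 200 for a hostname the server does not
    host suggests it is falling back to a default site.
    """
    conn = http.client.HTTPConnection(server_ip, port, timeout=timeout)
    try:
        # Our explicit Host header replaces the one http.client would
        # otherwise derive from server_ip.
        conn.request("GET", "/", headers={"Host": host_header})
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return resp.status, resp.reason
    finally:
        conn.close()
```

For example, `probe_virtual_host("192.0.2.10", "no-such-site.example")` (with 192.0.2.10 standing in for your server's real IP) returning `(200, 'OK')` would reproduce the behaviour described above, while a server that rejects unknown names would return a 4xx status instead.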

Is this an Apache configuration issue? If so, what configuration parameters need to be tweaked, and how, to prevent this behaviour? Or, *horrors*, is it a hitherto unsuspected weakness in the backbone of the Internet?
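For reference, Apache's documentation on name-based virtual hosting states that when the `Host:` header matches no `ServerName` or `ServerAlias` among the virtual hosts for that address, the first listed virtual host for that address handles the request. The Python below is a simplified model of that rule, not Apache code. `VirtualHost`, `select_vhost`, the hostnames and the paths are all hypothetical, and wildcards, ports and IPv6 literals are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualHost:
    server_name: str
    aliases: List[str] = field(default_factory=list)
    docroot: str = ""


def select_vhost(vhosts: List[VirtualHost],
                 host_header: Optional[str]) -> VirtualHost:
    """Simplified model of name-based vhost selection for one IP:port.

    vhosts must be non-empty and in configuration-file order.
    """
    if host_header:
        name = host_header.split(":", 1)[0].lower()  # drop any ":port"
        for vh in vhosts:
            names = [vh.server_name.lower()] + [a.lower() for a in vh.aliases]
            if name in names:
                return vh
    # No match (or no Host header at all): the first listed vhost wins.
    return vhosts[0]


sites = [
    VirtualHost("www.our-host.example", docroot="/www/main"),
    VirtualHost("www.customer.example", ["customer.example"], "/www/customer"),
]
print(select_vhost(sites, "customer.example").docroot)        # /www/customer
print(select_vhost(sites, "www.moved-away.example").docroot)  # /www/main
```

In this model, an unknown hostname, such as a domain that has moved away, always lands on whichever vhost is listed first, which matches the behaviour observed above. A commonly suggested remedy is to list a dedicated catch-all vhost first so that unmatched names get an error rather than the main site; check the documentation for your Apache version before relying on that.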

I've just done a sort of informal survey of several dozen servers (from my browser bookmarks).

Every Apache server I checked (including www.w3.org, who should be able to "get it right" if anybody can) displays this same behaviour.

Netscape Enterprise Server (on the couple I stumbled across) also seems to display the same behaviour.

Much as it *PAINS* me to admit it, IIS *sometimes* (about 4 times out of 5) seems to return a four-oh-something error under these circumstances.
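An informal survey like this can also be scripted. The sketch below is hypothetical: `survey`, `summarize` and the `no-such-site.invalid` probe name are illustrative (`.invalid` is reserved by RFC 2606, so it can never belong to a real site). It connects to each server by its real name, sends the bogus `Host:` header, and tallies 2xx responses (a default site was served) against 4xx responses.

```python
import http.client
from typing import Dict, Iterable, Union

BOGUS_HOST = "no-such-site.invalid"  # .invalid is reserved (RFC 2606)

Outcome = Union[int, str]  # HTTP status code, or an error description


def survey(servers: Iterable[str], bogus_host: str = BOGUS_HOST,
           port: int = 80, timeout: float = 10.0) -> Dict[str, Outcome]:
    """Ask each server for '/' while claiming to want bogus_host."""
    results: Dict[str, Outcome] = {}
    for server in servers:
        conn = http.client.HTTPConnection(server, port, timeout=timeout)
        try:
            conn.request("GET", "/", headers={"Host": bogus_host})
            results[server] = conn.getresponse().status
        except (OSError, http.client.HTTPException) as exc:
            # DNS failure, refused connection, timeout, malformed reply...
            results[server] = f"error: {type(exc).__name__}"
        finally:
            conn.close()
    return results


def summarize(results: Dict[str, Outcome]) -> Dict[str, int]:
    """Count servers that served a page (2xx), refused (4xx), or neither."""
    counts = {"2xx": 0, "4xx": 0, "other": 0}
    for outcome in results.values():
        if isinstance(outcome, int) and 200 <= outcome < 300:
            counts["2xx"] += 1
        elif isinstance(outcome, int) and 400 <= outcome < 500:
            counts["4xx"] += 1
        else:
            counts["other"] += 1  # redirects, 5xx, and connection errors
    return counts
```

A run over a bookmark list would then look like `summarize(survey(["www.example.com", "www.example.org"]))`, with the server names replaced by your own.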

Can anyone offer any light on this rather arcane subject or point me to a post that has covered this previously, please?


Thread source: http://www.webmasterworld.com/website_technology/1331.htm