

MSNBot requesting pages from the wrong VirtualServer

apache, search, spider, msnbot, bing

         

avatarworf

8:34 am on Feb 9, 2010 (gmt 0)

10+ Year Member



Hello,
I was wondering whether anybody has noticed a similar problem with MSNBot.
I have several virtual hosts on a single IP (none of them using SSL). For some reason, MSNBot is requesting pages from the first VirtualServer that actually live on another VS.

The bot seems to be making HTTP/1.1 requests. Would this mean it may be sending Host: <ip>, Host: <first VS>, or even no Host: header at all?

The simplest way to make MSNBot happy would probably be to make the first VS my main site (or at least the one that's getting some indexing). However, this could also be happening to the other VSs, so that doesn't really solve the problem.

jdMorgan

2:47 pm on Feb 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the requests are HTTP/1.1, then they must include a hostname; otherwise, you'd see 400 Bad Request errors in your logs.
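To make the Host-header point concrete, here is a simplified Python model of how name-based virtual hosting on a single IP picks a site. It is not Apache's code: the host names first.example and second.example are hypothetical, ports are simply stripped, and wildcard ServerAlias patterns are ignored. It only mirrors two behaviours: an HTTP/1.1 request without Host is rejected with a 400, and a Host value that matches no ServerName/ServerAlias (such as the bare IP) falls through to the first virtual host.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VirtualHost:
    # Hypothetical stand-ins for ServerName / ServerAlias
    server_name: str
    aliases: Tuple[str, ...] = ()


def select_vhost(vhosts: Sequence[VirtualHost],
                 http_version: str,
                 host_header: Optional[str]) -> Optional[str]:
    """Return the ServerName that would handle the request,
    or None if the request would be rejected with 400 Bad Request."""
    if host_header is None:
        if http_version == "HTTP/1.1":
            return None  # HTTP/1.1 requires a Host header
        return vhosts[0].server_name  # HTTP/1.0 without Host: default vhost

    # Normalise: case-insensitive, drop an optional :port (hostnames/IPv4 only)
    name = host_header.strip().lower().partition(":")[0]

    for vh in vhosts:
        if name == vh.server_name or name in vh.aliases:
            return vh.server_name

    # No name matched (e.g. "Host: 192.0.2.10"): the first vhost is the default
    return vhosts[0].server_name
```

In this model, a request carrying Host: <ip> lands on whichever site is listed first, which would be consistent with the symptom described above.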

The primary question here is whether these requests for the wrong hostnames actually resolve. What is the server's response to them? If it is not a 404, then you have a configuration problem.
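One way to answer "what is the server's response?" is to tally msnbot's status codes from the access log. This is a minimal sketch assuming Apache's "combined" log format and a separate log file for the first VirtualServer; the function name summarize_bot_hits is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_bot_hits(lines: Iterable[str], agent_substring: str = "msnbot"
                       ) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count status codes for requests whose User-Agent contains
    agent_substring, and list every such request that was NOT a 404."""
    statuses: Counter = Counter()
    non_404: List[Tuple[str, str]] = []
    needle = agent_substring.lower()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None or needle not in m.group("agent").lower():
            continue  # unparseable line or a different client
        status = m.group("status")
        statuses[status] += 1
        if status != "404":
            non_404.append((status, m.group("request")))
    return statuses, non_404
```

Passing an open file object for the first VirtualServer's log works directly, since iterating a file yields lines. Anything in the non-404 list other than files that really belong to that site (robots.txt, for instance) would point to a configuration problem.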

If it is a 404, then you can either 301-redirect the requests that you *know* will resolve on the other host, or ignore this and hope the msnbot team fixes the bug. They've had some rather spectacular bugs in the past, and it wouldn't surprise me if this were another.
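The "301 only what you know will resolve" rule can be sketched as a short list of path patterns, roughly what a RedirectMatch 301 line in Apache's mod_alias would express. The patterns and the host www.second.example are hypothetical placeholders, and query strings are ignored in this sketch.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules: path patterns that are known to exist on the other host.
REDIRECT_RULES = [
    (re.compile(r"^/articles/"), "http://www.second.example"),
    (re.compile(r"^/widgets/\d+\.html$"), "http://www.second.example"),
]


def respond(path: str, rules=REDIRECT_RULES) -> Tuple[int, Optional[str]]:
    """Decide how the first vhost answers a path it does not serve itself:
    301 to the other host if a rule matches, otherwise a plain 404."""
    for pattern, target_host in rules:
        if pattern.match(path):
            return 301, target_host + path  # permanent redirect, path preserved
    return 404, None
```

Keeping the patterns narrow matters: an overly broad rule would just move msnbot's 404s onto the other host.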

Jim

avatarworf

3:08 pm on Feb 9, 2010 (gmt 0)

10+ Year Member



Thanks for the quick reply.
Yeah, the responses are 404s, except for things like robots.txt (which exists in the primary VirtualServer).

The two sites concerned are completely different, so I could try 301-ing all those pages, or at least pattern-match a bunch of them.

Hope this works out.

jdMorgan

3:37 pm on Feb 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If they're returning 404s, I'd leave it alone and ignore the errors.

Look for and close any 'back doors' that msnbot might have used to mis-index these sites. For example, redirect requests that use the IP address as the hostname to the proper canonical hostname, or to a 'catch-all' default server that either offers a list of links to the valid hostnames or just returns a standard Apache "default page."
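The 'back door' advice boils down to a small routing decision. In Apache it is usually configured as a dedicated first (default) virtual host that redirects or serves a placeholder page; the Python below only sketches the decision logic. CANONICAL_HOSTS, DEFAULT_CANONICAL and route are hypothetical names, and first.example/second.example stand in for the real sites.

```python
import ipaddress

# Hypothetical values: the real hostnames served on this IP,
# and the one that IP-address requests should be sent to.
CANONICAL_HOSTS = {"first.example", "second.example"}
DEFAULT_CANONICAL = "first.example"


def hostname_only(host_header: str) -> str:
    """Lower-case the Host header value and drop any :port suffix."""
    h = host_header.strip().lower()
    if h.startswith("["):  # IPv6 literal, e.g. "[2001:db8::1]:80"
        end = h.find("]")
        return h[1:end] if end != -1 else h[1:]
    return h.partition(":")[0]  # hostname or IPv4 address


def is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def route(host_header: str, path: str):
    """Return ("serve", host), ("redirect", url) or ("default", None)."""
    host = hostname_only(host_header)
    if host in CANONICAL_HOSTS:
        return "serve", host
    if is_ip_literal(host):
        # Close the IP-address back door with a 301 to the canonical name
        return "redirect", f"http://{DEFAULT_CANONICAL}{path}"
    # Any other unknown name goes to a catch-all default page
    return "default", None
```

Sending IP-address requests to a 301 while leaving other unknown names on a neutral default page follows the two options above; pointing both cases at the catch-all would work just as well.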

Jim