Forum Moderators: goodroi


msnbot visits robots.txt and exits

         

pemba76

5:29 am on Nov 22, 2006 (gmt 0)

10+ Year Member



My site is four months old and written in PHP. MSN indexed only the homepage, on 21 October 2006. I did all I could to get the inner pages indexed, but the spider simply did not move ahead. Today even the homepage has been removed from the MSN index. Please help.

The logs show that msnbot visits the robots.txt file and does not crawl any further. The following appears at least once daily:
207.46.98.38 - - [21/Nov/2006:00:43:10 -0600] "GET /robots.txt HTTP/1.0" 200 484 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
207.46.98.38 - - [21/Nov/2006:00:43:11 -0600] "GET / HTTP/1.0" 200 174139 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
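
For anyone checking their own logs the same way, here is a minimal Python sketch that lists the paths a given bot requested. It assumes the Apache combined log format seen in the two lines above; the function name `paths_fetched_by` and the regular expression are illustrative only and do not come from any particular tool.

```python
import re

# Combined log format: ip, ident, user, [time], "request", status, size, "referer", "agent"
LOG_LINE = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

def paths_fetched_by(lines, bot_token):
    """Return the request paths from log lines whose user-agent contains bot_token."""
    paths = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path, agent = match.group(4), match.group(8)
        if bot_token.lower() in agent.lower():
            paths.append(path)
    return paths
```

If the result for "msnbot/" only ever contains /robots.txt and /, the bot is stopping at the homepage, which matches what the log excerpt shows.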

The robots.txt file is simple:
User-agent: *
Disallow: /images

I have also included a meta tag for msnbot:
<meta name="msnbot" content="follow all,index">

I even have some 40+ backlinks from well ranked sites.

Comments please.

goodroi

2:59 pm on Nov 22, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



hi pemba,

There are many things that could be going on here. To be technically correct, you should add a trailing slash to the images path in robots.txt (Disallow: /images/). I doubt that is causing your problem, though.

I would also remove the msnbot meta tag. MSNbot will index and follow a page automatically, so your meta tag is redundant. I'm also not sure "follow all" is a valid MSN command.
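
To show what the trailing slash changes, here is a small sketch using Python's standard urllib.robotparser. That parser matches Disallow paths as plain prefixes; whether msnbot matches in exactly the same way is an assumption. The helper name `allowed` and the sample URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, path):
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

without_slash = "User-agent: *\nDisallow: /images"
with_slash = "User-agent: *\nDisallow: /images/"

# Hypothetical page whose name merely starts with "images"
page = "/images-of-paris.html"
print(allowed(without_slash, "msnbot", page))  # blocked by the prefix "/images"
print(allowed(with_slash, "msnbot", page))     # not blocked by "/images/"
```

Without the slash, the rule also catches any URL that merely begins with /images, not just the directory.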

40 backlinks is a small amount. I would go after more backlinks. Make sure to get some backlinks to link deep into your site and not to have all of your backlinks point to your homepage.

Also look at your homepage and make sure there are clear links that the search engines can follow. Search engines generally ignore javascript links and links in flash.

If all else fails, there is another reason why MSN might not include your url in its index - because they banned it. This is very rare and I doubt it applies here.

pemba76

5:29 pm on Nov 23, 2006 (gmt 0)

10+ Year Member



Thanks a ton, goodroi. I have made the changes that you suggested. I am keeping my fingers crossed that my site gets indexed soon and that it is not banned.

jdMorgan

6:57 pm on Nov 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



msnbot currently appears to be broken, in that the various versions can't do proper matching on the User-agent strings in robots.txt. For example, with

User-agent: msnbot/
Disallow: /cgi-bin/

User-agent: *
Disallow: /


msnbot/1.0 will (correctly) index the site, except for the cgi-bin directory path. But msnbot-media still continues to try to fetch pages. No "/" follows "msnbot" in msnbot-media/1.0's user-agent string, so it should disregard the "User-agent: msnbot/" record and instead obey the "User-agent: *" catch-all at the end of the file. I have reported this problem to the MSNbot team, but they have been unresponsive.

The take-home message is that msnbot is still a bit fragile, and you must make sure your robots.txt and robots meta data are 100% correct. Otherwise, who knows what they might do.
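
The expected matching can be sketched in a few lines of Python. This is only an illustration of the substring rule described above, not msnbot's actual code; the function name `pick_record` and the (token, disallowed paths) data layout are assumptions made for the example.

```python
def pick_record(user_agent, records):
    """Return the rules of the first record whose token occurs in user_agent,
    falling back to the "*" record when no specific token matches."""
    ua = user_agent.lower()
    fallback = None
    for token, disallowed in records:
        if token == "*":
            fallback = disallowed
        elif token.lower() in ua:
            return disallowed
    return fallback

# The robots.txt from the example, as (User-agent token, Disallow paths) pairs
records = [("msnbot/", ["/cgi-bin/"]), ("*", ["/"])]

# "msnbot/" occurs in the first string but not in the second
print(pick_record("msnbot/1.0 (+http://search.msn.com/msnbot.htm)", records))        # ['/cgi-bin/']
print(pick_record("msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)", records))  # ['/']
```

Under this rule msnbot-media/1.0 falls through to the catch-all and should fetch nothing.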

Although goodroi is correct about your not needing the "index,follow" or "all" robots meta data, the attribute combination you included was incorrect. Considering only the "index" and "follow" attributes, the following are the correct combinations:

<meta name="robots" content="index,follow"> ----- (default robot behavior, so redundant)
<meta name="robots" content="all"> -------------- (equivalent to the above, so redundant)
<meta name="robots" content="index,nofollow"> --- (index this page, but do not follow the links)
<meta name="robots" content="noindex,follow"> --- (do not index this page, but follow the links)
<meta name="robots" content="noindex,nofollow"> - (do not index this page or follow the links)
<meta name="robots" content="none"> ------------- (equivalent to "noindex,nofollow")
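
As a rough check of a content value against the list above, here is a minimal Python sketch. It considers only the index/follow attributes plus "all" and "none", reports anything else as unrecognized, and treats index,follow as the default. The function name `parse_robots_content` is hypothetical, and real crawlers may accept other values or tolerate errors differently.

```python
# Maps each recognized token to the setting it controls and the value it sets
KNOWN = {
    "index": ("index", True),
    "noindex": ("index", False),
    "follow": ("follow", True),
    "nofollow": ("follow", False),
}

def parse_robots_content(content):
    """Return (index, follow, unrecognized_tokens) for a robots meta content value."""
    state = {"index": True, "follow": True}  # default robot behavior
    unknown = []
    for raw in content.split(","):
        token = raw.strip().lower()
        if token == "all":
            state = {"index": True, "follow": True}
        elif token == "none":
            state = {"index": False, "follow": False}
        elif token in KNOWN:
            key, value = KNOWN[token]
            state[key] = value
        elif token:
            unknown.append(token)
    return state["index"], state["follow"], unknown

print(parse_robots_content("noindex,follow"))    # (False, True, [])
print(parse_robots_content("follow all,index"))  # (True, True, ['follow all'])
```

The second call shows why the original tag was a problem: "follow all" is not one of the recognized tokens.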

Jim

pemba76

4:34 am on Nov 24, 2006 (gmt 0)

10+ Year Member



Hi Jim
Well, after I removed the meta tags and appended "/" to images in the robots.txt file, I have seen a major change in my logs. Googlebot and Yahoo Slurp have done a huge amount of crawling. There has been no visit by msnbot yet, but I hope it visits soon.
By the way, I know what msnbot is, but what is this msnbot-media all about? Also, do you suggest that I list each User-agent individually in the robots.txt file, or will the "*" do the work for MSN too?

Thanks
Pemba

pemba76

4:54 am on Nov 24, 2006 (gmt 0)

10+ Year Member



just checked the logs again.....:(
65.55.212.166 - - [23/Nov/2006:13:05:05 -0600] "GET /robots.txt HTTP/1.0" 200 489 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.212.166 - - [23/Nov/2006:13:05:07 -0600] "GET / HTTP/1.0" 200 178700 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"

And there was no trace of msnbot.
Seems like they don't like my robots.txt file!

pemba76

12:13 pm on Nov 28, 2006 (gmt 0)

10+ Year Member



hey guys,
My homepage is in the MSN index today. It shows the cache date as 27 Nov. Any more advice for the inner pages?