Moderators: mack

Bing Search Engine News Forum

MSN Live not respecting robots.txt rules
What's their problem?

 9:02 am on Sep 26, 2007 (gmt 0)

MSN Live Search is one of the few mainstream search engines that keeps getting caught in my bot trap, which is, obviously, disallowed in my robots.txt file. What's their problem? Why do they request files they shouldn't? Shouldn't they be focusing on evaluating real pages instead? It's not like they're sending any real traffic anyway... that's just another strike against tolerating their bot, and my patience has limits.


Matt Probert

 9:06 am on Sep 26, 2007 (gmt 0)

Robots.txt makes suggestions and requests. There is no obligation for any spider or bot to make use of the robots.txt file or its contents; it's just good manners if they do.
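For anyone wondering what "good manners" looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/bot-trap/` path, the `example.com` URLs, and the `is_allowed` helper are assumptions made up for illustration; they are not taken from anyone's actual site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /bot-trap/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler runs this check before every request.
    print(is_allowed(ROBOTS_TXT, "msnbot", "http://example.com/bot-trap/page.html"))
    print(is_allowed(ROBOTS_TXT, "msnbot", "http://example.com/index.html"))
```

Nothing forces a bot to run a check like this, which is the point above: the file only works when the crawler chooses to consult it.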



 5:37 pm on Sep 26, 2007 (gmt 0)

Their search engine is junk; they are not able to respect robots.txt or the crawl rate. It's 2007, not 1998; seems like they have been hiding under a bridge.

I think they should give up on building their search engine; it is too late.

[edited by: SEOPTI at 5:38 pm (utc) on Sep. 26, 2007]


 8:17 am on Oct 2, 2007 (gmt 0)

Yup - just checked logs this morning. MSNbot appears to randomly select items from the robots file to ignore and subsequently index.

That said, it does such a poor job of crawling the site (despite a few sitemap files) that it is hard to say whether it would ignore all of the robots exclusions if it ever worked properly.

Also discovered that it does not appear to understand a sitemap index file. Only when you explicitly put all the sitemaps in the robots file does the silly bot retrieve them.
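To illustrate the workaround, here is a rough sketch of a robots.txt that lists each sitemap explicitly, parsed with Python's standard `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later). The sitemap URLs and the `/bot-trap/` path are invented for the example, not copied from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every sitemap is listed individually instead of
# relying on a single sitemap index file. All URLs here are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/

Sitemap: http://example.com/sitemap-articles.xml
Sitemap: http://example.com/sitemap-forums.xml
"""


def listed_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in robots_txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in listed_sitemaps(ROBOTS_TXT):
        print(url)
```

Each `Sitemap:` line stands alone and applies regardless of the `User-agent` group it appears near, so the lines can go anywhere in the file.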


 1:47 pm on Oct 6, 2007 (gmt 0)

MSN failed big time to build a search engine; they are stuck in 1998. They had almost 10 years to write a function which respects robots.txt.
What a failure.


 3:20 pm on Oct 6, 2007 (gmt 0)

A useful technique for this situation is to detect known-good 'bot requests for your 'trap' URLs, and internally rewrite them to a minimal page containing a link to your home page and a <meta name="robots" content="noindex"> tag.

Yes, it's cloaking, but with no intent to deceive anyone.
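A minimal sketch of that idea in Python, for illustration only. This is not the poster's actual setup (the post does not say how the rewrite is done). The trap path, the list of bot names, and the `handle_request` helper are all assumptions. It also matches on the User-Agent string alone, which is easy to spoof; a real deployment would probably want to verify the requester some other way as well.

```python
from typing import Tuple

# Assumed trap location; use whatever path your robots.txt disallows.
TRAP_PREFIX = "/bot-trap/"

# Illustrative substrings for crawlers you do not want to ban.
KNOWN_GOOD_BOTS = ("googlebot", "msnbot", "slurp")

# Minimal page served to known-good bots instead of springing the trap.
SAFE_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex">
<title>Nothing to see here</title>
</head>
<body><a href="/">Home page</a></body>
</html>"""


def is_known_good_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the known-good list."""
    ua = user_agent.lower()
    return any(name in ua for name in KNOWN_GOOD_BOTS)


def handle_request(path: str, user_agent: str) -> Tuple[str, str]:
    """Decide what to do with a request; returns (action, body)."""
    if not path.startswith(TRAP_PREFIX):
        return ("serve", "")           # normal page, handled elsewhere
    if is_known_good_bot(user_agent):
        return ("safe_page", SAFE_PAGE)  # internal rewrite, no ban
    return ("trap", "")                # unknown client hit the trap


if __name__ == "__main__":
    action, _ = handle_request("/bot-trap/x.html", "msnbot/1.0")
    print(action)
```

The known-good bot never sees the trap content and is told not to index the placeholder, while unknown clients still trigger the trap as before.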


