Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
MSN Live not respecting robots.txt rules
What's their problem?
koan




msg:3461190
 9:02 am on Sep 26, 2007 (gmt 0)

MSN Live Search is one of the only mainstream search engines that keeps getting caught in my bot trap, which is, obviously, disallowed in my robots.txt file. What's their problem? Why do they visit files they shouldn't? Shouldn't they be focusing on evaluating real pages instead? It's not like they're sending any real traffic anyway... that's just another strike against tolerating their bot, and my patience has limits.
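For anyone who wants to confirm this from their own logs, here is a minimal Python sketch that counts trap hits per user agent. It assumes a combined-format access log and a hypothetical trap path of `/bot-trap/`; the sample lines and user-agent strings are invented for illustration, not taken from a real log.

```python
import re

# Hypothetical trap location; it is assumed to be disallowed in robots.txt.
TRAP_PREFIX = "/bot-trap/"

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def trap_hits(log_lines):
    """Count requests to the trap path, grouped by user agent."""
    counts = {}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").startswith(TRAP_PREFIX):
            agent = match.group("agent")
            counts[agent] = counts.get(agent, 0) + 1
    return counts


# Invented sample lines, for illustration only.
sample_log = [
    '192.0.2.10 - - [26/Sep/2007:09:00:01 +0000] "GET /bot-trap/index.html HTTP/1.1" 200 512 "-" "msnbot/1.0 (sample)"',
    '192.0.2.11 - - [26/Sep/2007:09:00:05 +0000] "GET /articles/1.html HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [26/Sep/2007:09:01:12 +0000] "GET /bot-trap/page2.html HTTP/1.1" 200 512 "-" "msnbot/1.0 (sample)"',
]

hits = trap_hits(sample_log)
for agent, count in hits.items():
    print(f"{count} trap hit(s) from {agent}")
```

A user agent that shows up here has requested a path it was asked to stay out of, although the string itself can be spoofed, so treat the count as a hint rather than proof.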

 

Matt Probert




msg:3461192
 9:06 am on Sep 26, 2007 (gmt 0)

Robots.txt makes suggestions and requests. There is no obligation for any spider or bot to make use of the robots.txt file or its contents; it's just good manners if they do.
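To make that concrete, here is a rough sketch of what the check looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt content, the `/bot-trap/` path, the `ExampleBot` name, and the example.com URLs are all assumptions for illustration. Nothing on the server enforces the answer; a bot only stays out if its author bothers to ask.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the trap path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent, url):
    """Return True if a well-behaved crawler would treat this URL as fetchable."""
    return parser.can_fetch(user_agent, url)


# A polite bot runs this check before every request; a rude one simply skips it.
trap_allowed = polite_fetch_allowed(
    "ExampleBot", "http://www.example.com/bot-trap/index.html"
)
page_allowed = polite_fetch_allowed(
    "ExampleBot", "http://www.example.com/articles/1.html"
)
print(trap_allowed, page_allowed)
```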

Matt

SEOPTI




msg:3461729
 5:37 pm on Sep 26, 2007 (gmt 0)

Their search engine is junk; they are not able to respect robots.txt or crawl rate. It's 2007, not 1998. Seems like they have been hiding under a bridge.

I think they should give up on building their search engine; it is too late.

[edited by: SEOPTI at 5:38 pm (utc) on Sep. 26, 2007]

justguy




msg:3466760
 8:17 am on Oct 2, 2007 (gmt 0)

Yup - just checked logs this morning. MSNbot appears to randomly select items from the robots file to ignore and subsequently index.

That said, it does such a poor job of crawling the site (despite a few sitemap files) that it is hard to say whether it would ignore all of the robots.txt exclusions if it ever worked properly.

Also discovered that it does not appear to understand a sitemap index file. Only when you explicitly put all the sitemaps in the robots file does the silly bot retrieve them.
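As a sketch of that workaround, the robots.txt below lists each sitemap on its own `Sitemap:` line instead of pointing at a single index file. The file names and the example.com host are hypothetical. Python's `RobotFileParser.site_maps()` (available from Python 3.8) is used here only to show what a parser reads back from those lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names every sitemap explicitly,
# rather than relying on the bot to follow a sitemap index file.
ROBOTS_WITH_SITEMAPS = """\
User-agent: *
Disallow: /bot-trap/

Sitemap: http://www.example.com/sitemap-articles.xml
Sitemap: http://www.example.com/sitemap-forum.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(ROBOTS_WITH_SITEMAPS.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
listed_sitemaps = sitemap_parser.site_maps()
print(listed_sitemaps)
```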

SEOPTI




msg:3470671
 1:47 pm on Oct 6, 2007 (gmt 0)

MSN failed big time to build a search engine; they are stuck in 1998. They had almost 10 years to write a function that respects robots.txt.
What a failure.

jdMorgan




msg:3470707
 3:20 pm on Oct 6, 2007 (gmt 0)

A useful technique for this situation is to detect known-good 'bot requests for your 'trap' URLs, and internally rewrite them to a minimal page containing a link to your home page and a <meta name="robots" content="noindex"> tag.

Yes, it's cloaking, but with no intent to deceive anyone.
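Here is a minimal Python sketch of the idea, not a drop-in implementation; on a real server this would more likely live in the web server's rewrite rules. The trap path, the list of bot tokens, and the function names are assumptions for illustration. It also trusts the user-agent string, which can be spoofed, so a production setup would want a stronger check on who the requester really is.

```python
# Hypothetical trap location and an illustrative list of user-agent tokens
# for crawlers you consider known-good; adjust both for your own site.
TRAP_PREFIX = "/bot-trap/"
KNOWN_GOOD_BOTS = ("googlebot", "msnbot", "slurp")

# Minimal page: a link to the home page plus a noindex robots meta tag.
MINIMAL_PAGE = (
    '<html><head><meta name="robots" content="noindex"></head>'
    '<body><a href="/">Home</a></body></html>'
)


def is_known_good_bot(user_agent):
    """Case-insensitive substring match against the known-good tokens."""
    agent = user_agent.lower()
    return any(token in agent for token in KNOWN_GOOD_BOTS)


def handle_request(path, user_agent):
    """Decide what to do with a request.

    Returns ("minimal", page) for a known-good bot hitting the trap,
    ("trap", None) for anyone else hitting the trap, and
    ("normal", None) for every other request.
    """
    if not path.startswith(TRAP_PREFIX):
        return ("normal", None)
    if is_known_good_bot(user_agent):
        return ("minimal", MINIMAL_PAGE)
    return ("trap", None)


print(handle_request("/bot-trap/index.html", "msnbot/1.0")[0])
print(handle_request("/bot-trap/index.html", "EvilScraper/2.0")[0])
```

The "trap" result is where your usual banning logic would run; the known-good bot never reaches it and sees nothing worth indexing.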

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved