Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Robots.txt - Block all but Most Popular?

 6:08 pm on Aug 26, 2010 (gmt 0)

Hi Folks!

I'm re-visiting an old site of mine and any time I look into robots.txt I see the same thing - people making huge files to block specific bots and adding new ones to the list on a daily basis.

For my old site, almost all traffic comes from Google so it got me thinking, wouldn't it be smarter to just block everything but the biggest search engines?

If that works, how would I do it and what are the biggest engines?




 6:32 pm on Aug 26, 2010 (gmt 0)

Directives in the robots.txt file do not 'block' anything at all.

They are merely a polite request, which many rogue bots will ignore.

If you want to physically block stuff, you will need allow/deny directives or mod_rewrite code.
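To make the difference between a request and an actual block concrete, here is a minimal sketch of server-side refusal by User-Agent. It is written as a Python WSGI wrapper rather than the Apache allow/deny or mod_rewrite rules mentioned above, and the deny-list entries, `deny_bad_agents`, and `hello_app` are hypothetical names made up for illustration. A bot can forge its User-Agent string, so treat this as one layer only.

```python
# Hypothetical deny-list of User-Agent substrings, for illustration only.
DENIED_AGENT_SUBSTRINGS = ("badbot", "scraper")


def deny_bad_agents(app, denied=DENIED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so that requests from listed user agents get a 403."""

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header under this key.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in denied):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = deny_bad_agents(hello_app)
```

Unlike a robots.txt entry, the refusal here happens on the server, so it does not depend on the bot choosing to cooperate.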


 1:42 am on Aug 28, 2010 (gmt 0)

It depends partly on how far you want to go, and on how much information about your configuration and your definition of a "good bot" you are willing to expose.
This thread should cover many of the issues and methods.

In the era of increasing numbers of bad bots is robots.txt irrelevant?:
http://www.webmasterworld.com/forum93/871.htm


 3:16 am on Aug 28, 2010 (gmt 0)

I whitelist the big 4 (now three) search engines that interest me and disallow all others. While many bad bots ignore robots.txt, a surprising number DO honor it. Create that invitation-only robots.txt and you won't have to wade through tons of bots to find the ones that do evil. In other words, deal only with the misbehaved ones and let the rest fetch robots.txt, obey it, and go away. If nothing else, robots.txt acts as a filter that separates well-behaved bots from those that misbehave.
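A rough sketch of what such an invitation-only robots.txt could look like, generated and checked with Python's standard `urllib.robotparser`. The post does not name the engines, so the three tokens in `ALLOWED_BOTS` are an assumption, and `build_robots_txt` and `is_allowed` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Assumed allow-list; the post does not say which engines it means.
ALLOWED_BOTS = ["Googlebot", "Bingbot", "Slurp"]


def build_robots_txt(allowed):
    """Return robots.txt text that admits the named bots and disallows all others."""
    lines = []
    for name in allowed:
        # An empty Disallow value means nothing is off limits for this bot.
        lines += [f"User-agent: {name}", "Disallow:", ""]
    # Catch-all record for every crawler not named above.
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url="/"):
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = build_robots_txt(ALLOWED_BOTS)

if __name__ == "__main__":
    print(ROBOTS_TXT)
    for agent in ("Googlebot", "SomeRandomBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/page.html"))
```

The check only shows how a compliant parser reads the file. Bots that ignore robots.txt are unaffected and still have to be handled on the server.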

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved