
Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt - Block all but Most Popular?
mmiller

5+ Year Member



 
Msg#: 4192676 posted 6:08 pm on Aug 26, 2010 (gmt 0)

Hi Folks!

I'm revisiting an old site of mine, and any time I look into robots.txt I see the same thing: people making huge files to block specific bots and adding new ones to the list on a daily basis.

For my old site almost all traffic comes from Google, so it got me thinking: wouldn't it be smarter to just block everything but the biggest search engines?

If that works, how would I do it and what are the biggest engines?

Thanks!

 

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4192676 posted 6:32 pm on Aug 26, 2010 (gmt 0)

Directives in the robots.txt file do not 'block' anything at all.

They are merely a polite request, which many rogue bots will ignore.

If you want to physically block stuff, you need server-side access control, such as Apache allow/deny directives or mod_rewrite rules.
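As a rough illustration of refusing requests at the server instead of asking politely in robots.txt, here is a minimal sketch in Python: a WSGI middleware that answers 403 Forbidden when the User-Agent header matches a deny-list. It stands in for the Apache allow/deny or mod_rewrite approach and is not a translation of any particular config; the deny-list entries and the names `block_bad_agents`, `hello_app` and `status_for` are hypothetical. Because the client chooses its own User-Agent header, this only stops bots that identify themselves honestly.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical deny-list: lowercase substrings of User-Agent headers to refuse.
# These are placeholders, not a vetted list of bad bots.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "evilscraper", "sitesucker")


def block_bad_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app and return 403 Forbidden for blocked user agents.

    Similar in effect to an Apache pair such as
    RewriteCond %{HTTP_USER_AGENT} badbot [NC] and RewriteRule .* - [F].
    A bot that lies about its User-Agent will still get through.
    """
    lowered = tuple(s.lower() for s in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in agent for s in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Trivial placeholder site used only for the demonstration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(app, user_agent):
    """Call the WSGI app in-process and return the status line it set."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    list(app(environ, start_response))  # consume the body iterable
    return captured["status"]


site = block_bad_agents(hello_app)
print(status_for(site, "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # 200 OK
print(status_for(site, "BadBot/1.0"))  # 403 Forbidden
```

The matching is a case-insensitive substring test, so one entry catches version variants of the same bot name; the trade-off is that a short or careless entry can also catch legitimate agents.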

phranque

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4192676 posted 1:42 am on Aug 28, 2010 (gmt 0)

It depends partly on how far you want to go, and on how much information about your configuration and your definition of a "good bot" you want to expose.
This thread should cover many of the issues and methods:

In the era of increasing numbers of bad bots is robots.txt irrelevant?:
http://www.webmasterworld.com/forum93/871.htm

tangor

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4192676 posted 3:16 am on Aug 28, 2010 (gmt 0)

I whitelist the big four (now three) engines that interest me and disallow all others. While many bad bots ignore robots.txt, a surprising number DO honor it. Create an invitation-only robots.txt and you won't have to wade through tons of bots to find the ones that do evil. In other words, you only deal with the misbehaving ones and let the rest take a peek at robots.txt, accept it, and go away. If nothing else, robots.txt acts as a filter separating well-behaved bots from those that misbehave.
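An invitation-only robots.txt like the one described above can be sketched as a file that gives a few named crawlers full access and asks every other agent to stay out. The Python below is a minimal sketch that builds such a file and checks it with the standard library's `urllib.robotparser`. The crawler tokens (Googlebot, Bingbot, Slurp) and the helper name `build_whitelist_robots` are assumptions for illustration, not necessarily the engines meant above; confirm each engine's current token before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens for illustration; check each engine's documentation.
WHITELIST = ("Googlebot", "Bingbot", "Slurp")


def build_whitelist_robots(agents):
    """Return robots.txt text that allows only the given user-agent tokens."""
    lines = []
    for agent in agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow:")  # empty Disallow means nothing is disallowed
        lines.append("")           # blank line ends this record
    lines.append("User-agent: *")
    lines.append("Disallow: /")    # every agent not named above: crawl nothing
    return "\n".join(lines) + "\n"


robots_txt = build_whitelist_robots(WHITELIST)
print(robots_txt)

# Check the result with the stdlib parser. It matches on the token before
# any "/" in the agent string, so bare tokens are passed here.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in ("Googlebot", "bingbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "/page.html"))
```

Named agents get an empty `Disallow:` line, while `Disallow: /` under `User-agent: *` covers everyone else. As pointed out earlier in the thread, this only filters bots that choose to honor robots.txt.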


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved