Exclude ALL bots but major search engines

Trying to make the perfect robots.txt



11:03 am on Oct 24, 2007 (gmt 0)

This is what I currently have:

User-agent: Googlebot
User-agent: Slurp
User-agent: MSNBot
Disallow:

User-agent: *
Disallow: /

It is my understanding that this robots.txt will refuse all (well-behaved) bots and just let Google, Yahoo! and Live Search crawl my site. I am not interested in letting ANY other bots on my site, unless they are likely to provide a lot of traffic for me.
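One way to test that understanding is to feed the file to an actual parser. The sketch below uses Python's standard `urllib.robotparser` and compares two versions: one where the allowed User-agent lines have no rule of their own, and one where they are closed by an empty `Disallow:` line. Under the original robots exclusion standard every record needs at least one Disallow line, and an empty value means nothing is disallowed. The sketch only shows how Python's parser reads the file, not necessarily how Googlebot, Slurp or msnbot read it; `SomeOtherBot` is a made-up name, and `Slurp` is used because that is the token Yahoo! documented for robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Allowed agents listed with no rule of their own before the blank line.
WITHOUT_RULE = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: MSNBot

User-agent: *
Disallow: /
"""

# Same file, but the allowed record is closed by an empty "Disallow:",
# which means "nothing is disallowed" for the agents above it.
WITH_EMPTY_DISALLOW = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: MSNBot
Disallow:

User-agent: *
Disallow: /
"""

AGENTS = ("Googlebot", "Slurp", "MSNBot", "SomeOtherBot")  # last one is made up


def verdicts(robots_txt, agents=AGENTS, path="/index.html"):
    """Map each agent name to whether urllib.robotparser lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    print("without rule:       ", verdicts(WITHOUT_RULE))
    print("with empty Disallow:", verdicts(WITH_EMPTY_DISALLOW))
```

As far as I can tell from CPython's implementation, a User-agent group that reaches a blank line without any Disallow line is simply dropped, so in the first version Googlebot falls through to `User-agent: *` and is shut out along with everything else. Running the script on your own Python shows which behaviour you get.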

Are there any other search engines that I should include? Maybe Ask? What User-agent does it use? Is it worth it? Or should I just stick with these top three?

[edited by: encyclo at 10:20 pm (utc) on Nov. 10, 2007]


7:33 am on Oct 25, 2007 (gmt 0)

10+ Year Member

Only the major search engines obey robots.txt, so you should block the others via .htaccess. I'm not an expert on it either, but I'm sure somebody else here will be able to help you.


9:18 am on Oct 25, 2007 (gmt 0)

WebmasterWorld Senior Member woz is a WebmasterWorld Top Contributor of All Time 10+ Year Member

serpmaster, thegreatpretender is making the point that your efforts to block all but authorised bots via robots.txt will prove largely fruitless, because only the major bots obey robots.txt directives, and even they do so only usually.

At the other end of the scale, bots that spider your content unscrupulously are hardly likely to stop just because you ask them to. That would be like leaving your front door open with a polite sign on it asking would-be robbers to please leave your property alone.

If you want to ensure you stop all unauthorised bots, then you need to take more effective measures than robots.txt alone. If your server is Apache-based, then .htaccess is the way to go, as thegreatpretender suggests. With other servers you need to use other methods.
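This is not an .htaccess rule, but the same idea (refusing unwanted crawlers on the server instead of asking them nicely in robots.txt) can be sketched in Python as WSGI middleware, which is one option on non-Apache setups. Everything here is a hypothetical illustration: the allowlist, and the crude test that a User-Agent containing "bot", "crawler" or "spider" is a robot, so that ordinary browsers still get through. It only catches bots that identify themselves honestly; one that sends a browser User-Agent string passes.

```python
"""Minimal WSGI middleware sketch: refuse self-identified crawlers not on an allowlist."""

# Hypothetical lists; substrings are matched case-insensitively.
ALLOWED_BOTS = ("googlebot", "slurp", "msnbot")
BOT_MARKERS = ("bot", "crawler", "spider")  # crude "is this a robot?" heuristic


def is_blocked(user_agent):
    """True if the User-Agent looks like a crawler but matches no allowed bot."""
    ua = user_agent.lower()
    if not any(marker in ua for marker in BOT_MARKERS):
        return False  # probably a browser (or an empty User-Agent): let it through
    return not any(bot in ua for bot in ALLOWED_BOTS)


class BotFilter:
    """Wrap a WSGI app and answer 403 Forbidden to unwanted crawlers."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in application used only for the demonstration."""
    body = b"Hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, BotFilter(hello_app)).serve_forever()
```

To use it, wrap an existing WSGI application, e.g. `application = BotFilter(application)`; the `__main__` block just serves the stand-in app on port 8000 with `wsgiref` for a quick local try.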



9:25 am on Oct 25, 2007 (gmt 0)

WebmasterWorld Senior Member woz is a WebmasterWorld Top Contributor of All Time 10+ Year Member





12:03 pm on Oct 25, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Although I disallow some sections to all robots, this is effectively what I'd have if the site were completely open to them. The 'mix' of allowed robots depends on many factors, such as your primary market (e.g. U.S. or E.U.), whether your site is listed in the ODP, whether you want thumbnail images of your pages to appear on MSN and Ask, whether you have mobile-device pages on your site, and whether you want your site archived to support copyright claims, etc.

So there's no simple answer, and what's right for me is likely not right for you.

# Googlebots, msnbots, Yahoo, and Ask
User-agent: Googlebot
User-agent: msnbot/
User-agent: searchpreview
User-agent: slurp
User-agent: Teoma
User-agent: YahooSeeker/M1A1-R2D2
# DMOZ/ODP, Verizon, girafa page thumbnailer, Internet Archiver
User-agent: Robozilla
User-agent: Verizon Superpages Web Crawler
User-agent: girafa
User-agent: ia_archiver
Disallow:

# disallow all others
User-agent: *
Disallow: /
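Since, as jdmorgan says, the right mix differs from site to site, it may help to keep the allowlist in one place and generate the file from it. A minimal sketch with a hypothetical `build_robots_txt` helper: the allowed agents share one record; any sections disallowed to everyone are repeated inside that record, because a robot follows only the record that matches it rather than the `*` record as well; and an empty `Disallow:` closes the record when there is nothing to hide. The agent list and the `/cgi-bin/` path in the demo are only examples.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allowed_agents, private_paths=()):
    """Build a robots.txt that admits only `allowed_agents`.

    The allowed agents share one record.  Paths in `private_paths` are
    disallowed to them; if there are none, an empty "Disallow:" line
    closes the record, which means nothing is disallowed.  Every other
    robot gets "Disallow: /".
    """
    lines = [f"User-agent: {agent}" for agent in allowed_agents]
    if private_paths:
        lines += [f"Disallow: {path}" for path in private_paths]
    else:
        lines.append("Disallow:")
    lines += ["", "User-agent: *", "Disallow: /", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    text = build_robots_txt(["Googlebot", "Slurp", "Teoma", "ia_archiver"], ["/cgi-bin/"])
    print(text)
    # Cross-check the generated file with Python's own robots.txt parser.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for agent in ("Googlebot", "Teoma", "SomeOtherBot"):
        print(agent, parser.can_fetch(agent, "/page.html"), parser.can_fetch(agent, "/cgi-bin/x"))
```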

