User-agent: Googlebot
User-agent: Yahoo! Slurp
User-agent: MSNBot
Disallow:
User-agent: *
Disallow: /

It is my understanding that this robots.txt will refuse all other (well-behaved) bots and let only Google, Yahoo! and Live Search crawl my site. I am not interested in letting ANY other bots onto my site unless they are likely to send me a lot of traffic.
Are there any other search engines I should include? Maybe Ask? What User-agent does it use? Is it worth it? Or should I just stick with these top three?
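One way to check that understanding is to feed the file to a robots.txt parser. Below is a minimal sketch using Python's standard-library `urllib.robotparser`. The agent names passed to `can_fetch` are illustrative product tokens (the version numbers and `SomeOtherBot` are made up), not the full User-Agent strings real crawlers send, and Python's parser is only one implementation, so an actual crawler may match names a little differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above: three named crawlers share one group
# with an empty Disallow (nothing blocked); everyone else gets "Disallow: /".
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Yahoo! Slurp
User-agent: MSNBot
Disallow:
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative agent names (hypothetical versions), checked against one page.
for agent in ("Googlebot/2.1", "Yahoo! Slurp", "MSNBot", "SomeOtherBot/1.0"):
    allowed = parser.can_fetch(agent, "http://example.com/page.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as-is, it reports the three named crawlers as allowed and `SomeOtherBot/1.0` as blocked, which matches the reading above. It says nothing about bots that ignore robots.txt altogether.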
[edited by: encyclo at 10:20 pm (utc) on Nov. 10, 2007]
At the other end of the scale, bots that spider your content unscrupulously are hardly likely to stop just because you ask them to. That would be like leaving your front door open with a nice polite sign on it asking all would-be robbers to please leave your property alone.
If you want to make sure you stop all unauthorised bots, you need more effective measures than robots.txt alone. If your server is Apache-based, then .htaccess is the way to go, as thegreatpretender suggests. With other servers you need to use other methods.
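The .htaccess approach is Apache-specific. For a site served by a Python application, the same idea (refusing the request at the server based on its User-Agent header, rather than politely asking in robots.txt) can be sketched as WSGI middleware. The allowlist, the "looks like a bot" markers and the `hello_app` stand-in are all hypothetical choices for illustration. Because the User-Agent header is supplied by the client, this only stops bots that identify themselves; one that pretends to be a browser still gets through, but unlike robots.txt it does not depend on the bot choosing to obey.

```python
from typing import Callable, Iterable

# Hypothetical allowlist: name fragments of crawlers we want to let in.
ALLOWED_BOTS = ("googlebot", "slurp", "msnbot", "teoma")
# Hypothetical fragments suggesting a client is a bot rather than a browser.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_blocked(user_agent: str) -> bool:
    """Block clients that look like bots but are not on the allowlist."""
    ua = user_agent.lower()
    looks_like_bot = any(marker in ua for marker in BOT_MARKERS)
    allowlisted = any(name in ua for name in ALLOWED_BOTS)
    return looks_like_bot and not allowlisted


def block_unlisted_bots(app: Callable) -> Callable:
    """Wrap a WSGI app so unlisted bots get 403 Forbidden before the app runs."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_unlisted_bots(hello_app)
```

Served with, for example, `wsgiref.simple_server.make_server("", 8000, application)`, a request whose User-Agent is `Googlebot/2.1` or an ordinary browser string reaches the site, while something like `EvilScraper-bot/1.0` gets a 403. Substring matching keeps the sketch short but is crude; a real deployment would want a more careful list.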
Onya
Woz
So there's no simple answer, and what's right for me is likely not right for you.
# Googlebots, msnbots, Yahoo, and Ask
User-agent: Googlebot
User-agent: msnbot/
User-agent: searchpreview
User-agent: slurp
User-agent: Teoma
User-agent: YahooSeeker/M1A1-R2D2
# DMOZ/ODP, Verizon, girafa page thumbnailer, Internet Archiver
User-agent: Robozilla
User-agent: Verizon Superpages Web Crawler
User-agent: girafa
User-agent: ia_archiver
Disallow:
# disallow all others
User-agent: *
Disallow: /
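As with the first file, this one can be sanity-checked with Python's `urllib.robotparser`. The sketch below reproduces it line for line and asks about a few agents; the version numbers and `SomeScraper` are made up for illustration, and other parsers (including the crawlers' own) may match names differently.

```python
from urllib.robotparser import RobotFileParser

# The second robots.txt from the thread, reproduced line for line.
ROBOTS_TXT = """\
# Googlebots, msnbots, Yahoo, and Ask
User-agent: Googlebot
User-agent: msnbot/
User-agent: searchpreview
User-agent: slurp
User-agent: Teoma
User-agent: YahooSeeker/M1A1-R2D2
# DMOZ/ODP, Verizon, girafa page thumbnailer, Internet Archiver
User-agent: Robozilla
User-agent: Verizon Superpages Web Crawler
User-agent: girafa
User-agent: ia_archiver
Disallow:
# disallow all others
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative agent names (hypothetical versions), checked against the home page.
for agent in ("Googlebot/2.1", "Teoma", "ia_archiver", "msnbot/1.1", "SomeScraper/1.0"):
    verdict = "allowed" if parser.can_fetch(agent, "/") else "blocked"
    print(f"{agent}: {verdict}")
```

One thing this check turns up: Python's parser compares only the part of the agent name before the first "/", so the `msnbot/` line never matches and `msnbot/1.1` falls through to `Disallow: /`. A crawler's own matching may be more forgiving, but writing the name without the trailing slash, as the first robots.txt in the thread does, sidesteps the question.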