joined:Oct 3, 2007
User-agent: Yahoo! Slurp
It is my understanding that this robots.txt will turn away all (well-behaved) bots and let only Google, Yahoo! and Live Search crawl my site. I am not interested in letting ANY other bots on my site unless they are likely to send me a lot of traffic.
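One way to check that understanding before going live is to feed the file to a parser and ask it which crawlers may fetch a page. The sketch below uses Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are an assumed reconstruction of the intent described here, not the file from this post, and the tokens `Googlebot`, `Slurp`, `msnbot` and `Teoma` as well as the `www.example.com` URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow allows everything for the
# named crawler, and the final "*" record shuts out everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def crawl_report(robots_text, agents, url="http://www.example.com/index.html"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["Googlebot", "Slurp", "msnbot", "Teoma", "SomeScraper"]
    for agent, allowed in crawl_report(ROBOTS_TXT, agents).items():
        print(f"{agent:12} {'allowed' if allowed else 'disallowed'}")
```

This only shows how a compliant parser reads the rules. It says nothing about bots that never request robots.txt in the first place.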
Are there any other search engines that I should include? Maybe Ask? What User-agent does it use? Is it worth it? Or should I just stick with these top three?
[edited by: encyclo at 10:20 pm (utc) on Nov. 10, 2007]
At the other end of the scale, unscrupulous bots that are already spidering your content are hardly likely to stop just because you ask them to. That would be like leaving your front door open with a polite sign asking all would-be robbers to please leave your property alone.
If you want to be sure of stopping all unauthorised bots, you need to take more effective measures than robots.txt alone. If your server is Apache-based, then .htaccess is the way to go, as thegreatpretender suggests. With other servers you need to use other methods.
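For servers without .htaccess, the same idea (refuse the request on the server instead of asking politely) can be sketched in application code. The example below is a rough Python WSGI middleware, not a recommendation from this thread. `ALLOWED_TOKENS`, `BOT_HINTS`, `is_blocked` and `block_unlisted_bots` are hypothetical names, and the token lists are assumptions you would need to tune for your own traffic.

```python
# Assumed allowlist of crawler tokens and assumed hints that a client is a bot.
ALLOWED_TOKENS = ("googlebot", "slurp", "msnbot")
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_blocked(user_agent):
    """True if the User-Agent looks like a bot that is not on the allowlist."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWED_TOKENS):
        return False
    return any(hint in ua for hint in BOT_HINTS)


def block_unlisted_bots(app):
    """Wrap a WSGI app so that unlisted bots receive 403 Forbidden."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site, used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_unlisted_bots(demo_app)
```

A bot can send any User-Agent string it likes, so a check like this only stops crawlers that identify themselves honestly. Anything stricter has to look at other signals, such as IP address.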
So there's no simple answer, and what's right for me is likely not right for you.
# Googlebots, msnbots, Yahoo, and Ask
# DMOZ/ODP, Verizon, girafa page thumbnailer, Internet Archiver
User-agent: Verizon Superpages Web Crawler
# disallow all others