You block everything by default first:

User-agent: *
Disallow: /

Then after that, you can allow the good ones:

User-agent: Googlebot
Allow: /

User-agent: Slurp
Allow: /

etc etc.
This keeps ALL the bad bots out (well, the ones that actually obey robots.txt) but allows access to the good ones.
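One gotcha worth knowing: a crawler that finds a group naming it specifically ignores the * group completely, so any path you still want kept off-limits has to be repeated inside the named group. A quick sketch, using /private/ as a made-up example path:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private/
Allow: /

Here Googlebot reads only its own group, so /private/ stays blocked for it while everything else is open.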
Thoughts on this?
---
Instead of Allow: /, you can use an empty Disallow: directive. An empty Disallow means "disallow nothing", and unlike Allow it is part of the original robots.txt standard, ie:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:
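If several bots get identical rules, the standard also lets you stack User-agent lines on one record instead of repeating the whole thing, so the two allow records above could just as well be:

User-agent: Googlebot
User-agent: Slurp
Disallow: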
---
Named bots can be mentioned in any order (a compliant crawler obeys the group that names it specifically and ignores the User-agent: * group, wherever each appears in the file), though it might be tidier to mention them before the User-agent: * record.
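For what it's worth, the file from the previous post rearranged that way would read:

User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /

A compliant crawler ends up with the same rules either way; putting the named records first just makes the exceptions easier to spot.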
---
More important for you is to know the consequences of your actions: by allowing only a select few bots, you help maintain their monopoly and prevent startups with good bots, which generally obey robots.txt, from having a level playing field. Bad bots won't care about robots.txt at all, and people who want to scrape your site for republishing won't care about it either. So really, you are not protecting yourself in any meaningful way, yet you reinforce the existing status quo of a handful of search engines driving traffic to your site.
My advice: allow all good, robots.txt-obeying bots to crawl your site.
There are indeed some bots that obey robots.txt but are known to be bad in terms of making too many requests or the like; check the robots.txt on a site like Wikipedia and you will see a fair few of them listed. Copying that list could be a quick approach, but really, if you don't even notice the bots, just let them go about their business. The worst offenders (site scrapers) won't care about robots.txt anyway.
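If you do go the blocklist route, the records are just per-bot denials. A sketch (these bot names are made up; crib real ones from a list like Wikipedia's robots.txt):

User-agent: ExampleHeavyBot
Disallow: /

User-agent: AnotherNoisyCrawler
Disallow: /

Any bot not named still falls through to whatever your User-agent: * record says.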