There seem to be three different approaches by the bigger websites out there when it comes to writing robots.txt:
- Allow all (Google, nbcnews)
- Allow all except for certain "known bad bots" (Wikipedia)
- Allow only the "best of the best" search engines and disallow any other bot (Facebook, LinkedIn, Nike)
Over the years, I've built up a robots.txt file with more than 60 "known bad bots". It's obnoxious to maintain it this way - always adding/removing/modifying bot version numbers, etc. So I'm considering moving to the "Allow only the 'best of the best'" model. Is anyone else doing this currently? I have my cherry-picked bots that I'd like to add now, but I'm a bit shy about pulling the trigger until the idea is vetted by some other folks.
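
For reference, the allowlist model I'm considering would look roughly like this - the user-agents here are just illustrative examples, not my final list:

    # Allow a handful of trusted crawlers (empty Disallow = allow everything)
    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    # Catch-all: every other bot is blocked site-wide
    User-agent: *
    Disallow: /

The appeal is that the catch-all rule at the end does the heavy lifting, so I'd only ever edit the short allowlist at the top instead of chasing 60+ bad-bot entries.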