---- Robots.txt strategy: allow only good, or disallow individual bad?
bigtoga - 7:06 pm on May 10, 2013 (gmt 0)
There seem to be three different approaches taken by the bigger websites out there when it comes to writing robots.txt:
Allow all (Google, nbcnews)
Allow all except for certain "known bad bots" (Wikipedia)
Allow only the "best of the best" search engines and disallow any other bot (Facebook, LinkedIn, Nike)
Over the years, I've built up a robots.txt file with more than 60 "known bad bots". It's obnoxious to maintain it this way - always adding/removing/modifying bot names, version numbers, etc. So I'm considering moving to the "allow only the 'best of the best'" model. Anyone else doing this currently? I have my cherry-picked bots that I'd like to allow now, but I'm a bit shy about pulling the trigger until the idea has been vetted by some other folks.
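For concreteness, the whitelist-style file I have in mind would look roughly like the sketch below. The two named crawlers are just placeholders for whichever engines end up on my allow list, not a recommendation:

    # Whitelisted crawlers: an empty Disallow means no restriction
    User-agent: Googlebot
    Disallow:

    User-agent: bingbot
    Disallow:

    # Every other user-agent is blocked from the whole site
    User-agent: *
    Disallow: /

As I understand the robots.txt convention, a compliant crawler obeys the group whose User-agent line matches it most specifically, so the named bots use their own records and everything else falls through to the catch-all block. Bots that ignore robots.txt obviously won't care either way, which is part of why I'm asking.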