"if they don't ask for robots.txt they can't come in"
That used to be the standard test of whether a bot is legit or not. However, with the rise of social media and interest-based marketing, that dynamic has changed.
"Validators & retrievers have never seen themselves as required to support robots.txt."
Except the W3C family of validators, which are scrupulous about robots.txt. (At least the link checker is; that's the only one that has to visit live sites.) Twitterbot is also scrupulous about robots.txt. Partly for this reason, it is the only robot permitted to visit my test site.
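For anyone wanting to try the same "one robot only" setup, here is a minimal sketch using Python's standard urllib.robotparser to check how such a robots.txt would be read. The robots.txt text, the "SomeOtherBot" name and the /page.html path are my own illustrative assumptions, not the actual file from the test site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Twitterbot may fetch everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An empty Disallow line means "nothing is disallowed" for that user agent.
twitterbot_ok = parser.can_fetch("Twitterbot", "/page.html")
# Any other name falls through to the catch-all "*" record.
other_ok = parser.can_fetch("SomeOtherBot", "/page.html")
```

Bear in mind this only describes what a compliant robot is told; a robot that never asks for robots.txt is not stopped by it.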
"The vertical bots, those that get their targets from a list, also have never seen a need to support robots.txt."
I've got one area that is visited by a slew of RSS-following robots every time something new gets added to a curated directory. The Great Divide is between the ones that ask for robots.txt and the ones that don't. If they don't ask for robots.txt, and the UA string offers no way to get information about who they are and why they're visiting, why the ### should I let them in?
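As a rough illustration of that Great Divide, the sketch below sorts visitors from already-parsed log entries by whether they ever asked for robots.txt and whether the UA string carries a URL. The (ip, user agent, path) tuple shape, the function name, the sample bots and the three verdict labels are all assumptions for the example; in particular the middle "review" bucket is my own addition, and the rule stated above only covers the "block" case.

```python
import re
from collections import defaultdict

# Assumption: a UA string "offers information" if it contains an http(s) URL,
# e.g. "ExampleBot/1.0 (+http://example.com/bot.html)".
INFO_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def classify_visitors(requests):
    """Classify each (ip, user_agent) pair seen in the requests.

    requests: iterable of (client_ip, user_agent, path) tuples.
    Returns a dict mapping (client_ip, user_agent) to a verdict string.
    """
    asked_for_robots = defaultdict(bool)
    for ip, ua, path in requests:
        key = (ip, ua)
        # Ignore any query string when checking for a robots.txt request.
        if path.split("?")[0] == "/robots.txt":
            asked_for_robots[key] = True
        else:
            asked_for_robots[key] = asked_for_robots[key]  # record the visitor

    verdicts = {}
    for key, asked in asked_for_robots.items():
        identifies_itself = bool(INFO_URL.search(key[1]))
        if asked:
            verdicts[key] = "allow"    # asked for robots.txt
        elif identifies_itself:
            verdicts[key] = "review"   # no robots.txt request, but says who it is
        else:
            verdicts[key] = "block"    # neither: why let it in?
    return verdicts


# Hypothetical sample entries (documentation-range IP addresses).
sample_log = [
    ("203.0.113.5", "PoliteBot/1.0 (+http://example.com/bot)", "/robots.txt"),
    ("203.0.113.5", "PoliteBot/1.0 (+http://example.com/bot)", "/feed/item1"),
    ("198.51.100.7", "FeedGrabber", "/feed/item1"),
    ("192.0.2.9", "ListBot/2.0 (+https://example.org/about)", "/feed/item1"),
]
sample_verdicts = classify_visitors(sample_log)
```

Keying on the IP plus UA pair is a simplification; robots that rotate addresses, or fetch robots.txt from a different address than the one they crawl from, would need a looser match.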