What is needed is a standard that says robots are not allowed to access the site "unless file x is present".
The only question is how to do it.
It would only work if server software could somehow be configured to accurately recognize spider vs. human visitors, and block spiders by default. But that would fall more into the .htaccess area than the robots.txt area.
But... however this is enforced, it will have important repercussions on the concept of search engines in general. I imagine only a small proportion of webmasters (if you can call them that) are aware of the 'robots.txt' file, so as a consequence only a very small proportion of web pages would be indexable. That defeats the point of a "www" search engine, since only the enlightened people would get into the databases.
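To make the consequence concrete, here is a minimal sketch of how a well-behaved spider might apply the proposed "no file, no access" rule. It assumes the inclusion file is just an ordinary robots.txt, and `may_crawl` is a made-up helper name, not part of any standard. Only `urllib.robotparser` is real (Python standard library).

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: Optional[str], agent: str, path: str) -> bool:
    """Hypothetical default-deny check for a polite spider.

    robots_txt is the text of the site's robots.txt, or None if the
    site has no such file.
    """
    # Proposed inclusion rule (an assumption, not today's behaviour):
    # a missing file means the whole site is off limits.
    if robots_txt is None:
        return False

    # The file exists, so fall back to the usual exclusion rules.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/"
    print(may_crawl(None, "AnyBot", "/index.html"))
    print(may_crawl(rules, "AnyBot", "/index.html"))
    print(may_crawl(rules, "AnyBot", "/private/page.html"))
```

Every site whose owner has never heard of the file falls into the first branch, which is exactly the indexing problem described above. And of course this only binds spiders whose authors choose to run a check like it.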
As for rogue spiders, they are 'rogue' simply because we're enforcing our exclusion / inclusion 'standards' loosely. 'Robots.txt' is a bit like a polite notice "Please don't let your dog pee on the lawn." And just like dogs can't really control their bladder, web-hackers / S.E. staff don't especially want to control their spiders.
What we need is an electrified fence with barbed wire around the lawn, and a single steel door with secure footprint identification ;) How about hard-coding the robots inclusion/exclusion standard within the web server? A few extra lines in the .htaccess would do the trick nicely. Come to think of it, isn't this already possible with Apache??
Alex
That's the problem... we're talking about putting a sign on the lawn saying "Only Fido and Rover are allowed to pee here" and then expecting all the other neighborhood dogbots to obey. Unless their owner/programmers see the sign, and decide to program their dogbots accordingly, it ain't gonna work...
So using your electric barbed wire .htaccess file fence is the only thing that will keep badly trained dogbots off your lawn. Which leaves us back exactly where we are now.
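For what it's worth, the fence logic itself is simple. Below is a rough sketch in Python rather than actual .htaccess syntax, just to show the decision a server would make. Everything in it is an assumption for illustration: the marker list is a crude guess at what spider User-Agent strings contain, and "FidoBot" and "RoverBot" are invented names for the two invited dogs.

```python
# Substrings assumed to hint that a User-Agent belongs to a spider.
# This is a crude heuristic, not a reliable detector.
BOT_MARKERS = ("bot", "spider", "crawl")

# Hypothetical allowlist: the only spiders invited onto the lawn.
ALLOWED_BOTS = ("fidobot", "roverbot")


def http_status_for(user_agent: str) -> int:
    """Return the HTTP status a server-side fence would send."""
    ua = user_agent.lower()
    looks_like_bot = any(marker in ua for marker in BOT_MARKERS)

    if not looks_like_bot:
        return 200  # treated as a human visitor
    if any(name in ua for name in ALLOWED_BOTS):
        return 200  # an invited spider
    return 403      # any other self-declared spider is refused


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (Windows NT 10.0)", "FidoBot/1.0", "EvilSpider/2.0"):
        print(ua, http_status_for(ua))
```

Note the weak spot: a spider that sends a browser-style User-Agent lands in the "human" branch and walks straight in, so the fence only stops dogbots honest enough to wear a collar.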
"Hey, is that a lamppost? Ohhhh... Aha, nice tree!"
This dog metaphor can be taken even further ;) I think the inclusion standard won't work because dogs like to mark their territory, just like SEs like to index vast quantities of pages and brag about them (Wisenut, Google :) Putting a polite sign on the lawn will reduce their territory, so why respect it?
Besides, an inclusion standard is just a lazy alternative to what we have already (listing all exclusions :)... might as well stick to that! If you need anything more, dive into Apache.
Fair?