Robots exclusion revisted

An oldie but well worth a read:

[kollar.com...]

10 years on, spiders of every kind are visiting as many sites as they can, as often as possible. Of course, site owners should be able to turn away unwanted spiders using the robots exclusion protocol (either a robots.txt file or a robots meta tag).

How do webmasters feel about the robots exclusion protocol? Has an adequate standard been established, and is the documentation sufficient?

To quote from the article, "[webmasters] will usually let you know if they think that [robots.txt] is being accessed too often :)".

Do today's webmasters feel that they have a way to respond to the spiders visiting their sites? How can the internet community ensure that undesired or uncontrolled spidering doesn't occur?

From a personal perspective, there seems to be shortcomings with robots exclusion, e.g. support for more complex rules, e.g regular expressions. There are also useful potentially useful robot controls that are not supported by enough spiders, or are not well documented e.g. crawl-delay

Most additions to the protocol seem to come from search engines themselves, rather than from the webmasters who provide the reason for robots to visit.

Is there anything else that would help webmasters control robot activity?

Robots exclusion revisted

How far have we come with controlling robots?

pixel_juice

jbinbpt

pixel_juice

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week