


The Rise & Fall of Robots.txt

Perspective on the Robots Text Standard

3:47 am on Sep 8, 2017 (gmt 0)

Moderator from US 

keyplyr

joined:Sept 26, 2001
votes: 692

Robots.txt has reached the point of *almost* becoming archaic and outdated. It makes little sense unless you wish to give specific instructions to specific agents that actually support the file, mostly the major search engines. Here's why I feel this way...

For years website owners have used the /robots.txt file to give instructions about their site to web robots; this was called the Robots Exclusion Protocol [robotstxt.org], sometimes referred to as the Robots Text Standard.
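The protocol itself is just a plain-text file served at the site root, pairing user-agent names with allow/disallow path rules. As a minimal sketch (the agent names and paths below are made up for illustration), Python's standard `urllib.robotparser` shows how a compliant crawler would interpret such a file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler asks before fetching each URL:
print(rp.can_fetch("GoodBot", "https://example.com/index.html"))  # True
print(rp.can_fetch("GoodBot", "https://example.com/private/x"))   # False
print(rp.can_fetch("BadBot", "https://example.com/index.html"))   # False
```

Note that the check happens entirely on the crawler's side: the file expresses a request, and nothing about it prevents an agent that never calls `can_fetch` (or never requests /robots.txt at all) from fetching whatever it likes.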

The robots.txt file was proposed by Martijn Koster while working for Nexor in February 1994. It quickly became accepted, and future web crawlers were expected to follow it. Early supporters included the search engines WebCrawler, Lycos and AltaVista. A few years later, when Yahoo and then the Google and Microsoft search engines came into wide use, they also supported robots.txt, sometimes contributing their own proprietary extensions.

However, despite the major search engines championing this, it was never fully accepted across the web and never really became a standard. Is it really a "standard" if only a handful support it? Many website owners have never used a robots.txt file and it hasn't impacted their success either way. Since it never stopped agents that didn't support it, many considered the file a futile effort.

At one time, years ago, its support was a valid talking point and a measure of whether a User Agent was an "honest" robot. But ever since social media came on the scene with a vengeance, few of those socially connected UAs ask for or obey robots.txt. Many of these agents do not consider themselves robots at all, since they do not actually "crawl" links. They are vertical data retrievers and link validators, but their boundaries keep getting pushed farther and farther outward toward being a spider/robot/crawler.

IMO, support for robots.txt should not be an indicator of whether an agent is honest or beneficial and worthy of access to one's files. Most of my UA allow list does not ask for it (those agents probably never even considered it), while most of my UA deny list does ask for it, and may even support the directives.

I'm certainly not trying to convince website owners not to use a robots.txt file, only not to expect much support beyond the several top players and a few others.
4:18 am on Sept 8, 2017 (gmt 0)

Senior Member from US 

tangor

joined:Nov 29, 2005
votes: 609

It ain't dead, but it's not a TOOL.

Bots that respect it are why robots.txt exists. All others ignore it.

Since robots.txt has no TEETH, it's merely a request, and we all know how that works in real life.

Never was a tool, but it is part of the tool kit. Use it for those who honor it; otherwise, harden the site against those who do not.
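Hardening happens at the server layer, where a rule actually has teeth. As a hypothetical sketch ("BadBot" is a placeholder user-agent substring, not a real bot), an Apache mod_rewrite rule can return 403 Forbidden regardless of whether the agent ever read robots.txt:

```apache
# Deny any request whose User-Agent contains "badbot" (case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule ^ - [F]
```

Unlike a robots.txt directive, this does not rely on the bot's cooperation; the request is refused before any content is served.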
