Forum Moderators: goodroi

The Rise & Fall of Robots.txt

Perspective on the Robots Text Standard

     
3:47 am on Sep 8, 2017 (gmt 0)

Moderator from US 

keyplyr: WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month

joined:Sept 26, 2001
posts:11474
votes: 692


Robots.txt has *almost* reached the point of being archaic. It makes little sense unless you wish to give specific instructions to specific agents that actually support the file, which mostly means the major search engines. Here's why I feel this way...

For years, website owners used the /robots.txt file to give instructions about their site to web robots. This was called the Robots Exclusion Protocol [robotstxt.org], sometimes referred to as the Robots Text Standard.
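
To make the "instructions to specific agents" idea concrete, here is a minimal sketch of how a cooperating crawler might consult the file before fetching a page, using Python's standard `urllib.robotparser`. The agent name `ExampleBot`, the paths, and the `example.com` URLs are made up for illustration, and the helper `is_allowed` is hypothetical, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The agent name and paths are
# illustrative assumptions, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot has its own record: everything except /private/ is open to it.
    print(is_allowed("ExampleBot", "https://example.com/index.html"))
    print(is_allowed("ExampleBot", "https://example.com/private/page.html"))
    # Any other agent falls through to the catch-all record, which disallows everything.
    print(is_allowed("OtherBot", "https://example.com/index.html"))
```

Nothing here is enforced by the server. The check only happens if the bot's author chooses to run something like it.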

The robots.txt file was proposed by Martijn Koster while working for Nexor in February 1994. It quickly became accepted, and future web crawlers were expected to follow it. Early supporters included the search engines WebCrawler, Lycos and AltaVista. A few years later, when Yahoo and then the Google and Microsoft search engines came into wide use, they also added support for robots.txt, sometimes contributing their own proprietary exclusions.

However, despite the major search engines championing this, it was never fully accepted across the web and never really became a standard. Is it really a "standard" if only a handful support it? Many website owners have never used a robots.txt file and it hasn't impacted their success either way. Since it never stopped agents that didn't support it, many thought the file to be a futile effort.

At one time, years ago, its support was a valid talking point and a measure of whether a User Agent was an "honest" robot. But ever since social media came on the scene with a vengeance, few of those socially connected UAs ask for or obey robots.txt. Many of these agents do not consider themselves robots at all, since they do not actually "crawl" links. They are vertical data retrievers and link validators, but their boundaries keep getting pushed farther and farther outward towards being a spider/robot/crawler.

IMO support for robots.txt should not be an indicator of whether an agent is honest or beneficial and worthy of access to one's files. Most of the agents on my UA allow list do not ask for it and probably never even considered it, while most of those on my UA deny list do ask for it and may even support the directives.
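
For what it's worth, an allow/deny list keyed on the User-Agent header can be sketched roughly like this. The tokens and the function name `classify_user_agent` are hypothetical placeholders, not the actual lists described above, and since the User-Agent string is supplied by the client, this is only a first-pass filter.

```python
# Hypothetical tokens for illustration only; a real list would be
# built from the site's own logs and policy.
ALLOW_TOKENS = ("examplepreviewbot",)
DENY_TOKENS = ("examplescraper",)


def classify_user_agent(user_agent: str) -> str:
    """Return "deny", "allow", or "unknown" based on substring matches."""
    ua = user_agent.lower()
    # Deny is checked first so a denied token wins if both lists match.
    if any(token in ua for token in DENY_TOKENS):
        return "deny"
    if any(token in ua for token in ALLOW_TOKENS):
        return "allow"
    return "unknown"


if __name__ == "__main__":
    print(classify_user_agent("Mozilla/5.0 (compatible; ExamplePreviewBot/1.0)"))
    print(classify_user_agent("ExampleScraper/2.3"))
    print(classify_user_agent("SomethingElse/0.1"))
```

Note that the decision does not depend on whether the agent ever requested robots.txt, which is the point being made above.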

I'm certainly not trying to convince website owners not to use a robots.txt file, only not to expect much support beyond several top players and a few others.

4:18 am on Sept 8, 2017 (gmt 0)

Senior Member from US 

tangor: WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month

joined:Nov 29, 2005
posts:8167
votes: 609


It ain't dead, but it's not a TOOL.

Bots that respect it are why robots.txt exists. All others ignore it.

Since robots.txt has no TEETH, it's merely a request, and we all know how that works in real life.

It never was a tool, but it is part of the tool kit. Use it for those who honor it; otherwise, harden the site for those who do not.
 
