
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


X-Robots-Tag - controlling Googlebot via HTTP headers

     
2:12 pm on Jul 28, 2007 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


From the recent Google Blog posting Robots Exclusion Protocol: now with even more flexibility [googleblog.blogspot.com], Google have announced the availability of the unavailable_after meta element, which enables you to give an expiry date to your pages. (See the thread Google Plans a New Meta Tag - "unavailable_after" [webmasterworld.com] for more information.)

However, there is a second, more interesting announcement in the same entry: the X-Robots-Tag header, which gives you the ability to control Googlebot behavior via HTTP headers rather than on-page meta elements.

We've extended our support for META tags so they can now be associated with any file. Simply add any supported META to a new X-Robots-Tag directive in the HTTP Header used to serve the file.

As mentioned in the post, this is very useful for non-HTML content such as PDF, Word or plain text [webmasterworld.com] files, where you cannot insert meta elements. You can also reduce clutter in the document itself, as well as control indexing via the server configuration rather than editing the files.

One caveat not mentioned by Google is that only Googlebot supports this syntax - unless the other search engines decide to follow suit - so you will still need meta elements for Yahoo or MSN. Also, how long do you reckon we'll have to wait until the first case of a hacked server being modified to send a noindex HTTP header with every request?
3:24 am on July 29, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 14, 2005
posts:55
votes: 0


Yes, I would definitely be concerned about the ease with which a website on a compromised server could be destroyed.
3:30 am on July 29, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


THANK YOU!
3:02 pm on July 30, 2007 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:23241
votes: 357


This follows on from our earlier post on the matter.
[webmasterworld.com...]
7:44 pm on July 30, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 31, 2003
posts:1316
votes: 0


wait until the first case of a hacked server being modified to send a noindex HTTP header with every request

If your server is hacked, search engine placement is the least of your worries.
7:53 pm on July 30, 2007 (gmt 0)

Junior Member

joined:July 27, 2007
posts:125
votes: 0


Am I right in thinking that this is basically a "Is-robot: true" or "Is-robot: 1" HTTP header? So we no longer have to sniff the User-agent string and guess whether it's a bot or a human using a Web browser?
9:46 pm on July 30, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 2, 2004
posts:124
votes: 0


If your server is hacked, search engine placement is the least of your worries.

LOL, no doubt! Last time my main Unix server was hacked I didn't have any search engine placement worries; in fact, I didn't have any websites left on it at all. Thank goodness for my backup dedicated hosting: the downtime was minimal.

If someone has unauthorized access to your website, there are already plenty of ways they can break down your business without any need for a new meta tag; someone can already put a nofollow tag and get you out of the serps if they have access to your server.

12:10 am on July 31, 2007 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9068
votes: 4


someone can already put a nofollow tag and get you out of the serps if they have access to your server

The comment about hackers was merely an aside and not the main part of my post in any way, but I'll just reply to this: the HTTP header is much more unobtrusive, and therefore much harder to detect, than actually modifying the pages themselves or changing the robots.txt (something which has been reported as occurring in the past in order to remove a site from the index).
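For anyone who wants to keep an eye on this, here is a minimal sketch in Python of a check that flags an unexpected noindex in the response headers. The function names (blocking_directives, fetch_x_robots_tag) are made up for illustration, and the parsing is deliberately simple: it only splits the header value on commas and looks for noindex or none, so treat it as a starting point rather than a complete parser.

```python
import urllib.request
from typing import Iterable, List

# Directive keywords that would keep a page out of the index.
BLOCKING = {"noindex", "none"}


def blocking_directives(header_values: Iterable[str]) -> List[str]:
    """Return the blocking directives found in X-Robots-Tag header values."""
    found: List[str] = []
    for value in header_values:
        for token in value.split(","):
            token = token.strip().lower()
            if token in BLOCKING and token not in found:
                found.append(token)
    return found


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> List[str]:
    """Send a HEAD request and return every X-Robots-Tag value in the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # get_all returns None when the header is absent.
        return response.headers.get_all("X-Robots-Tag") or []
```

Running something like blocking_directives(fetch_x_robots_tag(url)) against a handful of important URLs on a schedule would at least surface a header that nobody on the team added.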

Am I right in thinking that this is basically a "Is-robot: true" or "Is-robot: 1" HTTP header? So we no longer have to sniff the User-agent string and guess whether it's a bot or a human using a Web browser?

This is not anything sent by the bot itself, so it doesn't help in identifying Googlebot - it is an HTTP header that you can add to your server's response to a GET request, which offers similar functionality to the usual robots meta elements more commonly seen. You can add the HTTP headers via a server-side scripting language (PHP, etc.) or via the server configuration (Apache httpd.conf, IIS...).
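To illustrate the scripting route, here is a rough sketch in Python rather than PHP, written as a bare WSGI application. The body text and the choice of "noindex, nofollow" are placeholders, not anything from Google's post; the only point is that the header is set on the response, next to Content-Type, and not inside the document.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny WSGI application that serves plain text and asks robots not to index it."""
    body = b"Internal price list, not meant for search results.\n"
    headers: Headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The response header discussed in this thread; the value uses the
        # same keywords as a robots meta element.
        ("X-Robots-Tag", "noindex, nofollow"),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local test this could be served with wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() from the standard library, and the header inspected with any tool that shows response headers.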

1:23 pm on July 31, 2007 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts:4281
votes: 25


This is great but nobody is saying how you would do such a thing. How do you modify your http header on IIS and Apache?
1:50 pm on July 31, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 1, 2004
posts:607
votes: 0


How do you modify your http header on IIS and Apache?

That's something you'd usually handle at application level, e.g. in PHP / ASP / whatever.
9:00 pm on July 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


Here's a simple example for Apache that you can include in your .htaccess file to keep Googlebot (and hopefully others in time) from indexing image files. With some modification it can be used to control robot access to other files or file types:

# Requires mod_headers to be enabled
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "noindex"
</Files>

The X-Robots-Tag directive is a small step towards making robots.txt obsolete.

9:50 pm on July 31, 2007 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts:4281
votes: 25


I don't really see the need for this. Why not just stick those files in their own directory and disallow it?
10:03 pm on July 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


If your only intention is to disallow access to a file or files then a robots.txt would work just fine.

However, you can't use noarchive, nofollow, nosnippet, or unavailable_after in a robots.txt file. The X-Robots-Tag header is a much more powerful tool. It allows us to use these directives without needing to edit files. It also allows us to use these directives for media files, PDF files, etc., that can't have meta tag directives inserted in them. It can also be used for user-agent/IP delivery.
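As a rough illustration of applying those directives per file type without touching the files, here is a small Python sketch. The RULES table and the x_robots_header function are hypothetical, and the directive chosen for each extension is only an example; a real setup would pick its own values and plug the result into whatever sends the response.

```python
import posixpath
from typing import Dict, Optional, Tuple

# Hypothetical policy: file extension -> X-Robots-Tag value.
RULES: Dict[str, str] = {
    ".pdf": "noarchive, nosnippet",
    ".doc": "noindex",
    ".gif": "noindex",
    ".jpg": "noindex",
    ".png": "noindex",
}


def x_robots_header(path: str) -> Optional[Tuple[str, str]]:
    """Return the (name, value) header pair for a URL path, or None if no rule applies."""
    extension = posixpath.splitext(path)[1].lower()
    value = RULES.get(extension)
    if value is None:
        return None
    return ("X-Robots-Tag", value)
```

With this table, x_robots_header("/docs/Report.PDF") yields the noarchive, nosnippet pair, while an ordinary HTML page gets no extra header and keeps relying on its own meta elements.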

 
