Home / Forums Index / Google / Google SEO News and Discussion

Google SEO News and Discussion Forum

    
X-Robots-Tag - controlling Googlebot via HTTP headers
encyclo
msg:3407139
2:12 pm on Jul 28, 2007 (gmt 0)

In the recent Google Blog posting Robots Exclusion Protocol: now with even more flexibility [googleblog.blogspot.com], Google announced support for the unavailable_after meta element, which lets you give your pages an expiry date. (See the thread Google Plans a New Meta Tag - "unavailable_after" [webmasterworld.com] for more information.)
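For anyone generating the expiry value in a script, here is a minimal Python sketch. It assumes a date written as day, abbreviated English month, year, 24-hour time and GMT (for example 25 Aug 2007 15:00:00 GMT); check Google's own documentation for the formats it actually accepts. The helpers unavailable_after and googlebot_meta are hypothetical names, not part of any Google API.

```python
from datetime import datetime, timezone

# English month abbreviations, spelled out so the output does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after(expiry: datetime) -> str:
    """Return an 'unavailable_after: ...' directive for a timezone-aware datetime.

    Hypothetical helper; the assumed date style is '25 Aug 2007 15:00:00 GMT'.
    """
    if expiry.tzinfo is None:
        raise ValueError("expiry must be timezone-aware")
    utc = expiry.astimezone(timezone.utc)
    stamp = f"{utc.day} {MONTHS[utc.month - 1]} {utc.year} {utc:%H:%M:%S} GMT"
    return f"unavailable_after: {stamp}"


def googlebot_meta(directive: str) -> str:
    """Wrap a directive in a googlebot meta element for an HTML page (hypothetical helper)."""
    return f'<meta name="googlebot" content="{directive}">'


if __name__ == "__main__":
    expiry = datetime(2007, 8, 25, 15, 0, tzinfo=timezone.utc)
    print(googlebot_meta(unavailable_after(expiry)))
    # <meta name="googlebot" content="unavailable_after: 25 Aug 2007 15:00:00 GMT">
```

The same directive string could equally be sent as the value of an X-Robots-Tag header instead of being placed in a meta element.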

However, the same entry contains a second, more interesting announcement: the X-Robots-Tag header, which lets you control Googlebot's behavior via HTTP headers rather than on-page meta elements.

We've extended our support for META tags so they can now be associated with any file. Simply add any supported META to a new X-Robots-Tag directive in the HTTP Header used to serve the file.

As mentioned in the post, this is very useful for non-HTML content such as PDF, Word or plain text [webmasterworld.com] files, where you cannot insert meta elements. You can also reduce clutter in the document itself, as well as control indexing via the server configuration rather than editing the files.
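As a rough illustration of setting the header outside the documents themselves, the Python sketch below wraps any WSGI application and adds X-Robots-Tag: noindex to responses whose path ends in .pdf, .doc or .txt. The suffix list, the demo application and the name add_x_robots_tag are assumptions made for the example; on most sites the same effect would come from the web server configuration.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]

# Hypothetical list of file types that cannot carry a robots meta element.
NON_HTML_SUFFIXES = (".pdf", ".doc", ".txt")


def add_x_robots_tag(app: Callable, value: str = "noindex",
                     suffixes: Tuple[str, ...] = NON_HTML_SUFFIXES) -> Callable:
    """Wrap a WSGI app so responses for matching paths carry an X-Robots-Tag header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def start_with_header(status: str, headers: Headers, exc_info=None):
            # Only files with a listed extension get the extra header.
            if path.endswith(suffixes):
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns a placeholder body for any path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"placeholder"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve the wrapped demo app on a hypothetical local port.
    make_server("127.0.0.1", 8000, add_x_robots_tag(demo_app)).serve_forever()
```

Because the files are untouched, changing the policy means changing one argument rather than editing every document.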

One caveat not mentioned by Google is that only Googlebot supports this syntax (unless the other search engines decide to follow suit), so you will still need meta elements for Yahoo or MSN. Also, how long do you reckon we'll have to wait until the first case of a hacked server being modified to send a noindex HTTP header with every request?

 

Inspired
msg:3407488
3:24 am on Jul 29, 2007 (gmt 0)

Yes, I would definitely be concerned about the ease with which a website on a compromised server could be destroyed.

Key_Master
msg:3407489
3:30 am on Jul 29, 2007 (gmt 0)

THANK YOU!

engine
msg:3408541
3:02 pm on Jul 30, 2007 (gmt 0)

This follows on from our earlier post on the matter.
[webmasterworld.com...]

mcavic
msg:3408867
7:44 pm on Jul 30, 2007 (gmt 0)

wait until the first case of a hacked server being modified to send a noindex HTTP header with every request

If your server is hacked, search engine placement is the least of your worries.

mikomido
msg:3408875
7:53 pm on Jul 30, 2007 (gmt 0)

Am I right in thinking that this is basically an "Is-robot: true" or "Is-robot: 1" HTTP header? So we no longer have to sniff the User-agent string and guess whether it's a bot or a human using a Web browser?

jeffgroovy
msg:3408965
9:46 pm on Jul 30, 2007 (gmt 0)

If your server is hacked, search engine placement is the least of your worries.

LOL, no doubt! The last time my main Unix server was hacked I didn't have any search engine placement worries; in fact, I didn't have any websites left on it at all. Thank goodness for my backup dedicated hosting: the downtime was minimal.

If someone has unauthorized access to your website, there are already plenty of ways they can break down your business without any need for a new meta tag; someone can already put a nofollow tag and get you out of the serps if they have access to your server.

encyclo
msg:3409060
12:10 am on Jul 31, 2007 (gmt 0)

someone can already put a nofollow tag and get you out of the serps if they have access to your server

The comment about hackers was merely an aside and not the main point of my post, but I'll reply to this: the HTTP header is much less obtrusive, and therefore much harder to detect, than modifying the pages themselves or changing robots.txt (which has been reported in the past as a way to remove a site from the index).

Am I right in thinking that this is basically an "Is-robot: true" or "Is-robot: 1" HTTP header? So we no longer have to sniff the User-agent string and guess whether it's a bot or a human using a Web browser?

This is not something sent by the bot itself, so it doesn't help in identifying Googlebot. It is an HTTP header that you add to your server's response to a GET request, and it offers functionality similar to the more familiar robots meta elements. You can add the header via a server-side scripting language (PHP, etc.) or via the server configuration (Apache httpd.conf, IIS...).
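Because X-Robots-Tag travels in the server's response rather than in the robot's request, any HTTP client can see it, which also offers a way to spot the kind of quiet tampering encyclo describes above. A minimal Python sketch using only the standard library follows; robots_directives and fetch_robots_directives are hypothetical helper names, the parser assumes the simple comma-separated values discussed in this thread, and some servers may not answer HEAD requests.

```python
import urllib.request
from typing import Iterable, Set


def robots_directives(header_values: Iterable[str]) -> Set[str]:
    """Split one or more X-Robots-Tag header values into lower-case directives."""
    directives = set()
    for value in header_values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_robots_directives(url: str, timeout: float = 10.0) -> Set[str]:
    """Send a HEAD request and collect directives from every X-Robots-Tag header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # get_all returns None when the header is absent.
        return robots_directives(response.headers.get_all("X-Robots-Tag") or [])


if __name__ == "__main__":
    found = fetch_robots_directives("http://www.example.com/")
    if "noindex" in found:
        print("Warning: this URL asks search engines not to index it")
```

Run periodically against a few key URLs, a check like this would flag an unexpected noindex that never shows up in the page source.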

ogletree
msg:3409487
1:23 pm on Jul 31, 2007 (gmt 0)

This is great, but nobody is saying how you would do such a thing. How do you modify your http header on IIS and Apache?

zCat
msg:3409506
1:50 pm on Jul 31, 2007 (gmt 0)

How do you modify your http header on IIS and Apache?

That's something you'd usually handle at the application level, e.g. in PHP, ASP or whatever language you use.
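As an illustration of the application-level approach, here is a minimal sketch using Python's built-in http.server rather than PHP or ASP. The handler name, the placeholder PDF body and port 8000 are assumptions; the relevant part is the single send_header call. In PHP the equivalent is a header() call made before any output is sent.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class RobotsHeaderHandler(BaseHTTPRequestHandler):
    """Serves a placeholder file and asks robots not to index or archive it."""

    def do_GET(self):
        body = b"placeholder PDF content\n"
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(body)))
        # The response header that carries the robots directives.
        self.send_header("X-Robots-Tag", "noindex, noarchive")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RobotsHeaderHandler).serve_forever()
```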

Key_Master
msg:3409937
9:00 pm on Jul 31, 2007 (gmt 0)

Here's a simple example for Apache that you can include in your .htaccess file to keep Googlebot (and hopefully other robots in time) from indexing image files. With some modification it can be used to control how robots treat other files or file types:

# Requires mod_headers. Sends "X-Robots-Tag: noindex" with .gif, .jpg, .jpeg and .png files
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "noindex"
</Files>
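Before deploying a <Files ~> pattern it is worth checking which filenames it actually matches. For a simple expression like this one, Python's re module behaves the same way as Apache's regular expressions, so the sketch below compares \.(gif|jpe?g|png)$ with the pattern as first posted (reading its ¦ characters as pipes). The character class jp[eg] matches .jpg and .jpe but not .jpeg. The sample filenames are made up; note that matching is case-sensitive, so PHOTO.JPG gets no header from either pattern.

```python
import re

FIXED = re.compile(r"\.(gif|jpe?g|png)$")
ORIGINAL = re.compile(r"\.(gif|jp[eg]|png)$")  # as first posted, with | for the broken bar

# Hypothetical filenames to test against both patterns.
SAMPLES = ["logo.gif", "photo.jpg", "photo.jpeg", "photo.jpe",
           "icon.png", "PHOTO.JPG", "index.html"]


def matching(pattern, names):
    """Return the names that the pattern would match, in their original order."""
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print("fixed:   ", matching(FIXED, SAMPLES))
    # fixed:    ['logo.gif', 'photo.jpg', 'photo.jpeg', 'icon.png']
    print("original:", matching(ORIGINAL, SAMPLES))
    # original: ['logo.gif', 'photo.jpg', 'photo.jpe', 'icon.png']
```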

The X-Robots-Tag directive is a small step towards making robots.txt obsolete.

ogletree
msg:3409972
9:50 pm on Jul 31, 2007 (gmt 0)

I don't really see the need for this. Why not just put those files in their own directory and disallow it in robots.txt?
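ogletree's alternative, moving the files into one directory and disallowing that directory, can be sanity-checked with Python's urllib.robotparser. The /images/ directory and the example.com URLs are placeholders. A Disallow rule only tells compliant robots not to fetch the files; it cannot express directives such as noarchive or nosnippet.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all image files live under /images/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for url in ("http://www.example.com/images/logo.png",
                "http://www.example.com/index.html"):
        allowed = parser.can_fetch("Googlebot", url)
        print(url, "allowed" if allowed else "disallowed")
    # http://www.example.com/images/logo.png disallowed
    # http://www.example.com/index.html allowed
```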

Key_Master
msg:3409980
10:03 pm on Jul 31, 2007 (gmt 0)

If your only intention is to disallow access to a file or files, then robots.txt works just fine.

However, you can't use noarchive, nofollow, nosnippet or unavailable_after in a robots.txt file. The X-Robots-Tag header is a much more powerful tool. It lets us use these directives without editing the files, and it lets us apply them to media files, PDF files, etc., which can't have meta tag directives inserted in them. It can also be used for user-agent/IP delivery, i.e. sending the header only to particular robots.
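The user-agent delivery mentioned above could look something like the Python sketch below: several directives go out in one comma-separated header value, but only when the User-Agent string contains "Googlebot". The function name, the robot list and the browser string are assumptions for illustration. A User-Agent is trivial to fake, so this is a convenience for well-behaved crawlers rather than a security measure; matching on the requesting IP address instead is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical: crawlers that should receive the header, matched case-insensitively.
TARGET_ROBOTS = ("googlebot",)


def robots_headers(user_agent: str, directives: Sequence[str],
                   robots: Sequence[str] = TARGET_ROBOTS) -> List[Tuple[str, str]]:
    """Return an X-Robots-Tag header for matching crawlers, or no headers at all."""
    agent = user_agent.lower()
    if directives and any(name in agent for name in robots):
        return [("X-Robots-Tag", ", ".join(directives))]
    return []


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0"
    print(robots_headers(googlebot, ["noarchive", "nosnippet"]))
    # [('X-Robots-Tag', 'noarchive, nosnippet')]
    print(robots_headers(browser, ["noarchive", "nosnippet"]))
    # []
```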

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved