| 9:41 am on Mar 9, 2013 (gmt 0)|
Uhm... What exactly are you trying to do? Ordinarily when you talk about x-robots you're talking about indexing. But if it isn't there, there's nothing to index. And robots don't spend a lot of time reading error documents.
| 11:25 am on Mar 9, 2013 (gmt 0)|
welcome to WebmasterWorld, Jerome!
you can set an arbitrary header in a FilesMatch or similar container if that would be sufficient to identify the requested resource(s) that are generating 410 responses.
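for illustration, a minimal sketch of the kind of container i mean (the filename pattern is hypothetical; this assumes mod_headers is loaded, and note the "always" condition, which is required for the header to be attached to non-2xx responses such as a 410):

```apache
# send a robots header for matching files; "always" is needed
# so the header also rides along on error responses like 410
<FilesMatch "\.(html|php)$">
    Header always set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```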
while technically possible, you should answer lucy24's question before pursuing this.
a 410 will remove a url from the index.
(I'm assuming that's the purpose of the X-Robots-Tag)
how is the 410 being generated?
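a 410 doesn't happen by itself - it is normally configured explicitly, e.g. with mod_alias or mod_rewrite. a hedged sketch, with hypothetical paths:

```apache
# mod_alias: mark a single removed URL as permanently gone
Redirect gone /old-seo-page.html

# mod_rewrite: return 410 for everything under a removed directory
RewriteEngine On
RewriteRule ^seo-pages/ - [G,L]
```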
| 11:50 pm on Mar 11, 2013 (gmt 0)|
Thanks lucy24 and phranque.
What I'm trying to do is set "X-Robots-Tag: none, nosnippet, noarchive" for all the 410 URLs we have on our site.
We bought this site with a lot of SEO pages and decided not to maintain them. And unfortunately most of the URLs are non-standard, so I can't use FilesMatch as phranque suggested.
| 12:14 am on Mar 12, 2013 (gmt 0)|
If the pages are already gone, nobody will ever see the header. It's like the server saying "The page you requested isn't here, but if it were here, this is the header you would get." You might be able to rig some kind of jiggery-pokery using a php script, but not in Apache alone.
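Something like the following is what I have in mind, purely as a sketch (you'd route the dead URLs to this script with a RewriteRule or ErrorDocument; the header name is real, the message text is illustrative):

```php
<?php
// handler for removed SEO pages: send the 410 status and the
// robots header together, so the header is attached to the
// response that is actually served
http_response_code(410);
header('X-Robots-Tag: noindex, noarchive');
echo 'This page has been permanently removed.';
```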
I think what you really want to do is remove them via gwt. I guess the FilesMatch problem applies here too, unless all the defunct files live in the same directories.
Come to think of it, how are you identifying the pages so they can return that 410 in the first place? Unlike 404, a 410 doesn't happen on its own; you have to take some intentional action.
Can't help suspecting that when you say "410 pages" you really mean "404 pages that were intentionally removed". And that's a whole nother question.
| 12:25 am on Mar 12, 2013 (gmt 0)|
As implied above, using a robots HTTP header in addition to a 410 status code is redundant at best and open to confusion at worst. Your server response is going to look like this:
HTTP/1.1 410 Gone
Date: Tue, 12 Mar 2013 21:42:43 GMT
So after you say the URL is gone, you will also suggest not indexing it. But no search engine indexes pages with 4xx status codes, so there is no point in also including a robots HTTP header.
Perhaps if you explain why you think this is a good idea we can offer alternate suggestions.
| 12:27 am on Mar 12, 2013 (gmt 0)|
Thanks for the quick reply lucy24. Actually this is an SEO initiative, and I only asked whether it is possible to do this in Apache instead of touching the application level.
But as you said, I may also need to use a PHP script, which I think can solve my problem.
Thanks for the big help lucy24. :)
| 1:02 am on Mar 12, 2013 (gmt 0)|
Come to think of it... Would this part of the header come from the document that is requested, or the document that is actually served? Quick detour to Live Headers confirms that the "content-length" element belongs to the error document; there's no difference between asking for the doc by name and getting it as a response to a 40x request. And caching/expiration definitely belong to the file served, not the file requested. (I've taken advantage of this detail when logging.)
| 7:41 am on Mar 12, 2013 (gmt 0)|
What we really want is to remove them from GWT. And this tag will now be part of the document that is being served.
Anyway, thanks a lot for the big help really appreciate it. :)
| 8:15 am on Mar 12, 2013 (gmt 0)|
how is the 410 being generated?
if googlebot is requesting one of these 410 urls, G Search will remove the url from the index and GWT will essentially report the error (410) and ignore the header.
| 9:42 pm on Mar 12, 2013 (gmt 0)|
Come to that: It's not a bad idea to attach the no-index tag to all your error documents, just to protect yourself against awkward mistakes. That's the error document itself, not the requested document that generated the error.
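In Apache terms that might look something like this (assuming the error documents live at the site root; the filenames are hypothetical and mod_headers is assumed):

```apache
ErrorDocument 404 /404.html
ErrorDocument 410 /410.html

# keep the error pages themselves out of the index if someone
# requests them directly by name
<FilesMatch "^(404|410)\.html$">
    Header always set X-Robots-Tag "noindex"
</FilesMatch>
```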
| 10:31 pm on Mar 12, 2013 (gmt 0)|
|Come to that: It's not a bad idea to attach the no-index tag to all your error documents |
I'd be careful with this. Google already indexes URLs that are excluded by x-robots-tag headers and robots.txt - that's why they appear in search results (because third party references to a URL are enough to get a URL indexed). You run the risk of creating URLs for evaluation should a search engine prioritise robots directives over HTTP status codes.
| 11:42 pm on Mar 12, 2013 (gmt 0)|
|excluded by x-robots-tag headers and robots.txt |
Now, wait, that's something entirely different. Something it took me years to wrap my head around, in fact ;)
robots.txt prevents crawling. It doesn't prevent indexing.
I recently had to add a line of text to my test site's front page saying something like "sorry, folks, there's nothing here". The entire domain is roboted-out. So when a search query points straight to the domain name, it will show up in search results as a listing without a text snippet.
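For the record, "roboted-out" here just means a blanket robots.txt like this one, which blocks crawling of everything but does nothing to stop the bare URL from appearing in the index:

```
User-agent: *
Disallow: /
```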