HTTP 302 URLs, robots.txt and Google
I need to remove indexed HTTP 302 redirect URLs from Google's index.
I have a homepage listing a bunch of items. Each one has some information about a particular external page (image, title, description, some other info bits).
Each item also links to an HTTP 302 redirect. For example: /go/123456/
On that intermediate redirect step I capture info about the user/request (such as datetime, referrer, browser, etc).
The redirect goes to the final page, which is always on an external site.
Google has now indexed my redirect URLs, so the SERPs show the title and description of the final external pages but list my redirect URL (again, /go/123456/).
An SEO tool I am using reported this as bad practice. I now want to remove those URLs before they hurt my site.
I added a robots.txt to the root of my website with this information:
Google Webmaster Tools now seems to honor the new restriction, judging by the number of URLs reported under "Blocked URLs" in the "Health" section.
However, the URLs continue to show up in the SERPs when I do a search like "site:mysite.com".
Shouldn't Google automatically remove previously indexed URLs? Or should I do it by hand (the "Remove URLs" section, under "Optimization")?
Thanks in advance,
No. They will not remove the URLs since they already know they "exist". The robots.txt directive merely stops them being fetched by Google.
The way to remove the URLs from the index is to add the meta robots noindex tag to the HTML part of the redirect page.
Alternatively, using a 301 redirect in place of the 302 redirect would do the trick.
I'll bet the other sites are not at all impressed with your URLs appearing in the SERPs with their titles.
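As a rough illustration of the first point in the reply above (robots.txt only controls fetching), here is a minimal sketch using Python's standard urllib.robotparser. The Disallow rule is an assumption, since the actual robots.txt from the original post is not shown; mysite.com and /go/123456/ are the placeholders used in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: the real robots.txt content is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() only answers "may this crawler request the URL?".
# It says nothing about whether an already-known URL stays in an index.
blocked = not parser.can_fetch("Googlebot", "http://mysite.com/go/123456/")
home_ok = parser.can_fetch("Googlebot", "http://mysite.com/")

print(blocked, home_ok)
```

A side effect worth keeping in mind: while the rule is in place, a crawler that obeys it never requests the redirect URL, so it also never sees a noindex signal served there.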
How can I add an HTML meta tag if the response has no content or body, as it is only an HTTP 302 (Found) response? For instance, using curl, we get something like:
$ curl -i http://mysite.com/go/3360/
HTTP/1.1 302 FOUND
Date: Fri, 15 Jun 2012 20:01:55 GMT
Content-Type: text/html; charset=utf-8
Alter your redirect script (PHP is it?) to append an HTML body after the HTTP header.
Or you could add this header to your response:
X-Robots-Tag HTTP header specifications:
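The two suggestions above (an HTML body carrying a meta robots noindex tag, or an X-Robots-Tag response header) could be combined roughly as follows. This is only a sketch written as a plain WSGI app in Python, not the poster's actual script: TARGET_URL, redirect_app and call_app are hypothetical names, and the header value noindex is an assumption because the exact header line is missing from the post.

```python
# Hypothetical destination; a real script would look it up from the /go/<id>/ path.
TARGET_URL = "http://example.com/final-page"

# Minimal HTML body carrying the meta robots noindex tag.
NOINDEX_BODY = (
    b'<html><head><meta name="robots" content="noindex">'
    b"<title>Redirecting</title></head>"
    b"<body>Redirecting...</body></html>"
)


def redirect_app(environ, start_response):
    """WSGI app: answer every request with a 302 that also signals noindex."""
    headers = [
        ("Location", TARGET_URL),
        ("X-Robots-Tag", "noindex"),  # assumed value; header-level signal
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(NOINDEX_BODY))),
    ]
    start_response("302 Found", headers)
    return [NOINDEX_BODY]


def call_app(app, path):
    """Invoke the WSGI app in-process and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(redirect_app, "/go/123456/")
print(status)
print(headers["X-Robots-Tag"])
```

Either signal can only be read by a crawler that is allowed to request the URL, so a robots.txt rule blocking /go/ would presumably have to be lifted for this to have any effect.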
© Webmaster World 1996-2014 all rights reserved