
Sitemaps, Meta Data, and robots.txt Forum

    
HTTP 302 URLs, robots.txt and Google
I need to remove indexed HTTP 302 redirect URLs from Google's index.
nabucosound



 
Msg#: 4466000 posted 7:43 pm on Jun 15, 2012 (gmt 0)

Scenario

  • I have a homepage listing a bunch of items. Each one has some information about a particular external page (image, title, description, some other info bits).
  • Each item also links to an HTTP 302 redirect on my site. For example: /go/123456/
  • On that intermediate redirect step I capture info about the user/request (such as datetime, referrer, browser, etc).
  • Redirect goes to the final page, always an external site.

    Problem

  • Google has now indexed my redirect URLs: its results show the title and description of the final external pages, but under my redirect URL (again, /go/123456/).
  • An SEO tool I am using reported this as bad practice. I now want to remove those URLs before they hurt my site.

    Workaround

  • I added a robots.txt to the root of my website with this information:

    User-agent: *
    Disallow: /go/


  • Google Webmaster Tools now seems to honor the new restriction, judging by the number of "Blocked URLs" it reports under the "Health" section.
  • The URLs still show up in the SERPs when I run a search like "site:mysite.com".
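A quick way to confirm what the two robots.txt lines above actually do is to run them through a parser. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_fetch_allowed` and the "Googlebot" user-agent string are illustrative assumptions, and the rules are parsed in memory rather than fetched from the site. It only answers "may this URL be fetched?", not "is this URL indexed?".

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post, parsed in memory (no network access needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""


def is_fetch_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Redirect URLs are blocked from crawling; the homepage is not.
print(is_fetch_allowed("http://mysite.com/go/123456/"))
print(is_fetch_allowed("http://mysite.com/"))
```

A blocked result here means the crawler will not request the URL again. It does not say anything about whether an already known URL stays in the index.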

    The Question

  • Shouldn't Google automatically remove previously indexed URLs? Or should I do it by hand (the "Remove URLs" section, under "Optimization")?

    Thanks in advance,

    Hector


    g1smd

    WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



     
    Msg#: 4466000 posted 7:59 pm on Jun 15, 2012 (gmt 0)

    No. They will not remove the URLs since they already know they "exist". The robots.txt directive merely stops them being fetched by Google.

    The way to remove the URLs from the index is to add the meta robots noindex tag to the HTML part of the redirect page.

    Alternatively, using a 301 redirect in place of the 302 redirect would do the trick.

    I'll bet the other sites are not at all impressed with your URLs appearing in the SERPs with their titles.

    nabucosound



     
    Msg#: 4466000 posted 8:06 pm on Jun 15, 2012 (gmt 0)

    Thanks @g1smd.

    How can I add an HTML meta tag if the response has no content or body, as it is only an HTTP 302 (Found) response? For instance, using curl, we get something like:

    $ curl -i http://mysite.com/go/3360/

    HTTP/1.1 302 FOUND
    Server: nginx/0.7.65
    Date: Fri, 15 Jun 2012 20:01:55 GMT
    Content-Type: text/html; charset=utf-8
    Connection: keep-alive
    Location: http://www.externalsite.com/page/slug/stuff
    Vary: Accept-Encoding
    Content-Length: 0

    g1smd




     
    Msg#: 4466000 posted 8:18 pm on Jun 15, 2012 (gmt 0)

    Alter your redirect script (PHP is it?) to append an HTML body after the HTTP header.
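The redirect script's language is not stated in the thread, so here is a minimal sketch of the idea in Python as a plain WSGI app. The `TARGETS` table, the `redirect_app` name and the HTML markup are all hypothetical; a real site would look the target up in its own data store. Whether a given crawler reads the body of a redirect response is not guaranteed, so treat this as an illustration of the suggestion rather than a confirmed fix.

```python
from html import escape

# Hypothetical lookup table; the real site would read targets from its database.
TARGETS = {"3360": "http://www.externalsite.com/page/slug/stuff"}


def redirect_app(environ, start_response):
    """WSGI app: 302 redirect for /go/<id>/ with a small noindex HTML body."""
    # "/go/3360/" -> "3360"
    item_id = environ.get("PATH_INFO", "").strip("/").split("/")[-1]
    target = TARGETS.get(item_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]

    # The body carries the meta robots tag; the Location header still redirects.
    body = (
        "<!DOCTYPE html><html><head>"
        '<meta name="robots" content="noindex">'
        "<title>Redirecting</title></head>"
        f'<body><a href="{escape(target, quote=True)}">Continue</a></body></html>'
    ).encode("utf-8")

    start_response("302 Found", [
        ("Location", target),
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` serves it on port 8000, and `curl -i http://localhost:8000/go/3360/` should then show a non-zero Content-Length.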

    phranque

    WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



     
    Msg#: 4466000 posted 12:38 am on Jun 16, 2012 (gmt 0)

    or you could add this header to your response:

    X-Robots-Tag: noindex


    X-Robots-Tag HTTP header specifications:
    http://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
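As a rough sketch of the header approach, the WSGI wrapper below adds `X-Robots-Tag: noindex` to every response under /go/ without touching the redirect logic itself. The names `add_noindex_header` and `bare_redirect` are made up for illustration, and `bare_redirect` is only a stand-in for the existing script. One caveat: a crawler can only see this header if it is allowed to fetch the URL, so the `Disallow: /go/` rule in robots.txt would need to be lifted for the header to have any effect.

```python
def add_noindex_header(app, prefix="/go/"):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "").startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def bare_redirect(environ, start_response):
    # Stand-in for the existing redirect: 302 with an empty body,
    # as in the curl output earlier in the thread.
    start_response("302 Found", [
        ("Location", "http://www.externalsite.com/page/slug/stuff"),
        ("Content-Length", "0"),
    ])
    return [b""]


app = add_noindex_header(bare_redirect)
```

The same header can usually be set in the web server configuration instead of application code, which avoids changing the script at all.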


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved