HTTP 302 URLs, robots.txt and Google

I need to remove indexed HTTP 302 redirect URLs from Google's index.

   
7:43 pm on Jun 15, 2012 (gmt 0)



Scenario

  • My homepage lists a number of items. Each one shows some information about a particular external page (image, title, description and a few other details).
  • Each item also links to an HTTP 302 redirect, for example /go/123456/.
  • During that intermediate redirect step I capture information about the user and the request (date and time, referrer, browser, etc.).
  • The redirect then goes to the final page, which is always on an external site.
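
The redirect step in the scenario can be sketched as a small Python WSGI app. This is an illustration under assumptions, not Hector's actual code: the thread only shows nginx in front, not the framework behind it, and DESTINATIONS, CLICK_LOG and go_redirect_app are hypothetical names. The app logs the date and time, referrer and browser, then answers with an empty-body 302.

```python
import re
from datetime import datetime, timezone

# Hypothetical lookup table: item id -> external destination URL.
DESTINATIONS = {"123456": "http://www.externalsite.com/page/slug/stuff"}

# Hypothetical in-memory click log; a real site would persist this somewhere.
CLICK_LOG = []

GO_PATH = re.compile(r"^/go/(\d+)/$")


def go_redirect_app(environ, start_response):
    """WSGI app: record the click, then 302-redirect /go/<id>/ to the external page."""
    match = GO_PATH.match(environ.get("PATH_INFO", ""))
    target = DESTINATIONS.get(match.group(1)) if match else None
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]

    # Capture request details (date/time, referrer, browser) before redirecting.
    CLICK_LOG.append({
        "item": match.group(1),
        "when": datetime.now(timezone.utc).isoformat(),
        "referrer": environ.get("HTTP_REFERER", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
    })

    # A bare 302 with no body: nothing here tells a crawler not to index /go/ URLs.
    start_response("302 Found", [("Location", target), ("Content-Length", "0")])
    return [b""]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, go_redirect_app).serve_forever() serves the app on port 8000; a curl -i request for /go/123456/ should then return the 302 with an empty body.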

Problem

  • Google has now indexed my redirect URLs: the SERPs show the title and description of the final external page, but with my redirect URL (again, /go/123456/).
  • An SEO tool I use flagged this as bad practice, and I now want to remove those URLs before they hurt my site.

Workaround

  • I added a robots.txt file to the root of my website with these lines:

    User-agent: *
    Disallow: /go/


  • Google Webmaster Tools now seems to honor the new restriction, judging by the count on its "Blocked URLs" page in the "Health" section.
  • However, the URLs still show up in the SERPs when I search for "site:mysite.com".
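
Python's standard urllib.robotparser can show what those two robots.txt lines actually say. A sketch assuming the site is http://mysite.com/ as in the thread: the rules only tell compliant crawlers not to fetch /go/ URLs; nothing in robots.txt asks a search engine to drop URLs it has already indexed.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawlers that obey robots.txt may fetch the homepage but not the redirect URLs.
for url in ("http://mysite.com/", "http://mysite.com/go/123456/"):
    print(url, "fetch allowed:", parser.can_fetch("Googlebot", url))
```

It should report the homepage as fetchable and the /go/ URL as blocked.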

The Question

  • Shouldn't Google automatically remove previously indexed URLs? Or should I remove them by hand (the "Remove URLs" page under "Optimization")?

Thanks in advance,

Hector

7:59 pm on Jun 15, 2012 (gmt 0)

g1smd (WebmasterWorld Senior Member)



No. Google will not remove the URLs, since it already knows they "exist". The robots.txt directive merely stops Google from fetching them.

To remove the URLs from the index, add the meta robots noindex tag to the HTML part of the redirect page.

Alternatively, using a 301 redirect in place of the 302 would do the trick.

I'll bet the other sites are not at all impressed to see your URLs appearing in the SERPs with their titles.

8:06 pm on Jun 15, 2012 (gmt 0)



Thanks @g1smd.

How can I add an HTML meta tag when the response has no content or body? It is only an HTTP 302 (Found) response. For instance, with curl I get something like:

    $ curl -i http://mysite.com/go/3360/

    HTTP/1.1 302 FOUND
    Server: nginx/0.7.65
    Date: Fri, 15 Jun 2012 20:01:55 GMT
    Content-Type: text/html; charset=utf-8
    Connection: keep-alive
    Location: http://www.externalsite.com/page/slug/stuff
    Vary: Accept-Encoding
    Content-Length: 0

8:18 pm on Jun 15, 2012 (gmt 0)

g1smd (WebmasterWorld Senior Member)



Alter your redirect script (is it PHP?) to send an HTML body after the HTTP headers.
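
g1smd's suggestion, sketched as a Python WSGI app rather than PHP (the thread never confirms the script language): keep the 302 status and Location header, but send a small HTML page whose head carries a meta robots noindex tag. TARGET and redirect_with_noindex_body are hypothetical names, and the sketch only illustrates the suggestion; the thread does not establish how Google treats a tag in the body of a redirect. A crawler can only read the tag if it is allowed to fetch the URL, so with Disallow: /go/ in robots.txt the tag would never be seen.

```python
from html import escape

# Hypothetical destination; in practice this comes from the /go/<id>/ lookup.
TARGET = "http://www.externalsite.com/page/slug/stuff"


def redirect_with_noindex_body(environ, start_response):
    """302-redirect to TARGET, but include an HTML body carrying a robots noindex meta tag."""
    href = escape(TARGET, quote=True)
    body = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta name="robots" content="noindex">\n'
        "<title>Redirecting</title>\n"
        "</head><body>\n"
        f'<p>This page has moved to <a href="{href}">{href}</a>.</p>\n'
        "</body></html>\n"
    ).encode("utf-8")
    start_response("302 Found", [
        ("Location", TARGET),
        ("Content-Type", "text/html; charset=utf-8"),
        # The body is no longer empty, so Content-Length must match it.
        ("Content-Length", str(len(body))),
    ])
    return [body]
```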

12:38 am on Jun 16, 2012 (gmt 0)

phranque (WebmasterWorld Administrator)



Or you could add this header to your response:

    X-Robots-Tag: noindex


X-Robots-Tag HTTP header specifications:
http://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
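
phranque's alternative needs no HTML body: the directive travels as a response header on the 302 itself. A minimal Python WSGI sketch with hypothetical names (TARGET, redirect_with_x_robots_tag). As with the meta tag, a crawler only sees the header if robots.txt lets it fetch the /go/ URL.

```python
# Hypothetical destination; in practice this comes from the /go/<id>/ lookup.
TARGET = "http://www.externalsite.com/page/slug/stuff"


def redirect_with_x_robots_tag(environ, start_response):
    """Empty-body 302 that also carries an X-Robots-Tag: noindex response header."""
    start_response("302 Found", [
        ("Location", TARGET),
        ("X-Robots-Tag", "noindex"),
        ("Content-Length", "0"),
    ])
    return [b""]
```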