HTTPS pages still not de-indexed in Google via noindex, waited a year

amznVibe

12:50 pm on Mar 15, 2014 (gmt 0)

I'm dealing with a site whose pages got indexed at both HTTP and HTTPS URLs, creating duplicate content.

So we added the noindex meta tag to the HTTPS pages:

<meta name="robots" content="noindex" />

Except nearly a year later, Google still retains those pages.

Google says NOT to use the removal tool to remove HTTPS pages
http://support.google.com/webmasters/answer/1269119
(at the bottom)
but it wouldn't be an option anyway with 40,000 pages.

Is this possibly because they are now orphaned pages, since Google sees the "noindex" on the parent pages and stops following their links? Doesn't Google visit orphaned pages sooner or later anyway?

netmeg

2:36 pm on Mar 15, 2014 (gmt 0)

Why would you noindex them instead of just redirecting them?
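
For anyone wondering what the redirect would look like in practice, here is a minimal sketch in Python. The names http_equivalent and redirect_response are made up for illustration, and it assumes the HTTPS URLs carry no explicit :443 port. A real site would normally do this in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit

def http_equivalent(url):
    """Return the HTTP version of an HTTPS URL; leave other URLs unchanged.

    Assumption: the URL has no explicit :443 port in it.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    return parts._replace(scheme="http").geturl()

def redirect_response(requested_url):
    """Build a permanent redirect (status line plus headers) to the HTTP URL."""
    target = http_equivalent(requested_url)
    return "301 Moved Permanently", [("Location", target)]

if __name__ == "__main__":
    print(redirect_response("https://www.example.com/pathto/pagename.htm"))
```

A 301 tells the crawler that the HTTPS URL has permanently moved, so the page itself no longer needs a noindex tag.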

bumpski

3:01 pm on Mar 15, 2014 (gmt 0)

Is Google still crawling the HTTPS pages?

I have successfully used canonicalization to remove duplicate HTTPS pages from Google's index. The canonical declaration must include the protocol specifier, "http://", and it goes on the HTTPS copy of your page:

<link rel="canonical" href="http://www.example.com/pathto/pagename.htm">

But if Google is not crawling the HTTPS pages, neither noindex nor canonicalization will work.
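
Before waiting on a recrawl, it may be worth confirming what the HTTPS copy actually serves. The following is a rough sketch using only the Python standard library; HeadDirectives and inspect_head are hypothetical names, and the HTML is passed in as a string, so fetching the page is left to you.

```python
from html.parser import HTMLParser

class HeadDirectives(HTMLParser):
    """Collect the robots noindex directive and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            tokens = [t.strip().lower() for t in a.get("content", "").split(",")]
            if "noindex" in tokens:
                self.noindex = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href") or None

def inspect_head(html):
    """Return which indexing hints the given HTML carries."""
    parser = HeadDirectives()
    parser.feed(html)
    parser.close()
    return {"noindex": parser.noindex, "canonical": parser.canonical}

if __name__ == "__main__":
    page = ('<head><link rel="canonical" '
            'href="http://www.example.com/pathto/pagename.htm"></head>')
    print(inspect_head(page))
```

If the canonical value starts with "https://" or is missing, the HTTPS copy is not pointing Google back to the HTTP version.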

You may be able to use Webmaster Tools to force Google to recrawl your entire HTTPS version of the site, but if there are no links to all the HTTPS pages, that won't help either. If you have relative links somewhere it may work. Another thought is to create another non-HTTPS page that links to the HTTPS home page of the site, start the WMT crawl on that page, and hope the crawler then follows the links over HTTPS. Webmaster Tools does not allow you to specify crawling with HTTPS.

kenroar

7:57 pm on Mar 17, 2014 (gmt 0)

You want Google to index those pages so they see a 404 returned. They will then remove the pages.

lucy24

10:05 pm on Mar 17, 2014 (gmt 0)

You want Google to index those pages so they see a 404 returned.

?
Do you mean crawl the pages? A page with a noindex meta tag will still be crawled periodically; conversely a roboted-out page (one blocked in robots.txt) can still appear in search results.
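
The crawl/index difference can be sketched in a few lines of Python with the standard urllib.robotparser module. The function name noindex_can_take_effect and the sample robots.txt are invented for illustration. The idea is only that a noindex tag can be seen if the page is allowed to be crawled in the first place.

```python
from urllib.robotparser import RobotFileParser

def noindex_can_take_effect(robots_txt, url, page_has_noindex, agent="Googlebot"):
    """True only if the crawler may fetch the page AND the page carries noindex.

    A page blocked in robots.txt is never fetched, so a noindex tag on it
    is never seen by the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(agent, url)
    return crawlable and page_has_noindex

if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    for path in ("/private/page.htm", "/public/page.htm"):
        url = "https://www.example.com" + path
        print(path, noindex_can_take_effect(robots, url, True))
```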

It took me a very long time to wrap my brain around the difference, and I'm not letting go of it now.