
Google SEO News and Discussion Forum

    
Secure pages (https) overtaking search results
jakegotmail




msg:4653345
 4:03 pm on Mar 12, 2014 (gmt 0)

Two part question ...

1) Has anyone else observed an increasing number of results showing up as secure (httpS) pages instead of non-secure?

2) Of all the types of duplicate content Google has to identify and act on, shouldn't they be able to tell that a page is an exact copy, just in a secure version, and ignore it? Errors in coding and design aside, this is by far the dumbest Google issue I've had to deal with.

I've only really begun to take notice of #1 because I've been dealing with issues related to #2 over the last few weeks, so I'm unsure whether the secure results have been there all along.

Is it possible that Google's push for secure searches has led them to look for and overvalue httpS pages on a site?

Our issue appears to stem from relative links within secure areas of the website, which let bots begin crawling a copy of every page, just in secure form.
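
As a minimal illustration of that mechanism (the domain and paths below are hypothetical): a relative link inherits the scheme of the page it sits on, so once a crawler reaches one https URL, every relative link keeps it on the https copy of the site.

<!-- on https://www.example.com/account/orders.html -->
<a href="/widgets/blue-widget.html">Blue widget</a>
<!-- resolves to https://www.example.com/widgets/blue-widget.html,
     not the http:// version linked from the rest of the site -->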

As a result, most pages previously ranking high on page two have been replaced by other options (related pages, child pages, etc.) farther down on pages 2-3. These "option B" pages, if you want to call them that, are still being cached as non-secure.

The occasional page does still rank in the same or a similar spot even though it is now cached as the secure version.

We are attempting to remedy the issue by serving a separate robots.txt for the secure version of the site, with some success in what has been recrawled so far.

Any thoughts? Suggestions? Similar experiences?

 

bumpski




msg:4653391
 6:22 pm on Mar 12, 2014 (gmt 0)

At least since mid-2013, Google has been looking for https versions of http pages. Also, many webmasters don't seem to realize their webhost may have a secret domain path to their site as well, typically some acronym of the website's domain tied to the webhost's domain, like "MyWebSiteAcronym.example.com".
I have found that Google has "triple indexed" hundreds of thousands, if not millions, of sites: identical pages show up in the index as http, https, and acronym.example.com, even though they are clearly identical in overall content.

I have found that simply using the rel=canonical mechanism will flush all of the duplicate pages out of Google's index, though it will take a month or two. You must provide the canonical path even on the webhost's secret domain name for your site (if there is one).

You will probably find that the "ever crawled" statistic in Webmaster Tools (Google Index > Index Status > Advanced) has taken a big jump.
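
For reference, the rel=canonical mechanism is just a link element in the <head> of every duplicate variant pointing at the preferred URL; a minimal sketch (the domain and path are placeholders):

<!-- served identically on the http, https, and webhost-alias copies of the page -->
<link rel="canonical" href="http://www.example.com/widgets/blue-widget.html">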

ColourOfSpring




msg:4653399
 6:30 pm on Mar 12, 2014 (gmt 0)

Can someone show me an example of where an https:// page has different content to its http:// counterpart?

jakegotmail




msg:4653412
 7:04 pm on Mar 12, 2014 (gmt 0)

@ColourOfSpring

I guess that's part of my frustration ... they can properly ignore scraped content all over the place, but somehow exact copies on my own site under a secure designation cause me to drop in rankings?

The only differences between the two versions are 1) one is secure and 2) the non-secure version has much more authority (links, age, etc.).

There's no reason other than it being a "secure" page for it to outrank or replace the non-secure version (in the cases where it does), and no logical reason at all for Google not to recognize the unintentional duplicate on the same domain.

@bumpski

I guess you could call it an alternate version of the www vs. non-www canonical issue some sites face, but it's somewhat different.

Rather than use canonicals, we're going with a change to the .htaccess file that reroutes robots.txt requests on the secure port to a robots_ssl.txt file disallowing all, while leaving other requests alone.

netmeg




msg:4653413
 7:06 pm on Mar 12, 2014 (gmt 0)

Errors in coding and design aside, this is by far the dumbest Google issue I've had to deal with.


So if I understand this correctly, your organization made a coding mistake, but it's Google's fault for crawling and indexing it the way it was configured?

Sorry, you lose me on this one.

jakegotmail




msg:4653419
 7:16 pm on Mar 12, 2014 (gmt 0)

@netmeg

Not looking for a guilty party on either end, or a scapegoat, just looking for logical explanations.

If you think about it in terms of duplicate/scraped content anywhere else on the web, where Google has enough common sense to know the original source, drop the others, and not change anything in the SERPs, this doesn't make a single bit of sense.

And the site has always been configured this way (10+ years), yet somehow only now does this pop up? To me, that speaks to a recent emphasis on secure search.

netmeg




msg:4653427
 7:36 pm on Mar 12, 2014 (gmt 0)

If you think about it in terms of duplicate/scraped content anywhere else on the web, where Google has enough common sense to know the original source, drop the others, and not change anything in the SERPs, this doesn't make a single bit of sense.


It makes sense to me, because I for one do not want Google making those kinds of decisions about how to pick up my content (any more than they already do). I prefer not to make Google think (more than it already does), and if I make some mistakes and they pick those up too, that's my bad.

Who knows why they picked it up now - maybe you never noticed before, or maybe someone linked to some of your erroneous https content, or maybe they expanded the crawl budget for your site, or maybe, as you say, they're looking for secure content.

But I don't find it odd in the least that they found it and indexed it.

dstiles




msg:4653428
 7:40 pm on Mar 12, 2014 (gmt 0)

There is almost certainly a rise in HTTPS sites, ever since it was discovered last year that the US NSA was scraping content which might prove detrimental to a site's customers (e.g. medication or shopping searches). HTTPS is far harder to eavesdrop on than HTTP.

The downside is that, according to at least one authority, there is more danger of HTTPS traffic being retained by the NSA until such time as they can break the encryption.

It is likely that other governments are working to the same end: scraping everyone's browsing.

netmeg




msg:4653451
 8:16 pm on Mar 12, 2014 (gmt 0)

Also, SSL certificates have gotten a ton cheaper; don't overlook that aspect. I recently ran across someone who had one just because GoDaddy included it with his hosting package - he didn't *need* it, but it was thrown in, so why not.

(Of course, it wasn't configured correctly and gunked the site up, and I had to try to fix it, but that's beside the point.)

Robert Charlton




msg:4653464
 9:26 pm on Mar 12, 2014 (gmt 0)

The https/http confusion is a canonicalization issue, and in my experience, once it gets out into the wild, it must be taken care of with mod_rewrite.

Here's a discussion that covers many of the basics, including likely causes...

Cross-site canonical meta tag questions
http://www.webmasterworld.com/google/4539413.htm [webmasterworld.com]

IMO, robots.txt is not the way to approach this problem.
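
A minimal .htaccess sketch of that mod_rewrite approach, assuming the plain http version is the one you want indexed and that nothing on the site genuinely requires https (the domain is a placeholder):

RewriteEngine on
# any request arriving on the SSL port gets a permanent redirect to the http URL
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The 301 lets Google consolidate the duplicate https URLs back onto the http versions, rather than merely blocking them from being crawled.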

jakegotmail




msg:4653466
 9:46 pm on Mar 12, 2014 (gmt 0)

For starters, we experimented with including manual canonical elements on a few targeted pages. The https versions of those slowly began to disappear, and our rankings returned.

Continuing from that idea, but in a way that can be applied site-wide, we are now doing a combination of mod_rewrite and robots.txt:

Adding a /robots_ssl.txt file to the site with the following:

User-agent: *
Disallow: /

and including these directives in our .htaccess file:

RewriteEngine on
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

Results are promising so far, although it's too early to tell for the site as a whole.

lucy24




msg:4653473
 10:20 pm on Mar 12, 2014 (gmt 0)

shouldn't they be able to tell that a page is an exact copy

They should, but they can't. This applies pretty universally, not just to http vs. https.

jakegotmail




msg:4653478
 10:43 pm on Mar 12, 2014 (gmt 0)

And yet they do, with regularity; otherwise almost every site would have duplicate content issues.

There are certainly cases where they fail to do so -- plenty of threads out there about scraper sites outranking primary sources, etc. -- but of all the instances, I would think this would be the easiest one to get right ... ?

Dymero




msg:4653486
 11:17 pm on Mar 12, 2014 (gmt 0)

Yep, our site started showing HTTPS URLs a couple of weeks ago. We're ecommerce, so we need it for checkout, but the whole site was available under HTTPS. That could cause errors due to unsecured elements, though, so we had to fix it.

A redirect from HTTPS to HTTP and the canonical tag cleared up the issue.
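
A sketch of what that kind of selective redirect might look like in .htaccess, assuming hypothetical /checkout/ and /cart/ paths are the only areas that must stay on https (the real paths and domain will differ):

RewriteEngine on
# leave the secure checkout and cart areas alone
RewriteCond %{REQUEST_URI} !^/(checkout|cart)/
# everything else requested over SSL goes back to the plain http URL
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

A canonical tag pointing at the http URL on each page then reinforces the same preference while the stray https URLs drop out of the index.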

ColourOfSpring




msg:4653631
 11:25 am on Mar 13, 2014 (gmt 0)

Personally speaking (as a programmer), it does seem like a simple problem to solve:

http:// + mydomainname.com/mypage.html

AND

https:// + mydomainname.com/mypage.html

...if they have the same content and identical URLs after the protocol (http or https), treat them as the same page; the version with the most authority becomes the canonical page, and there's no need to index the other.
