
Google Will Seek Out HTTPS Pages By Default

     
5:06 pm on Dec 18, 2015 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:25917
votes: 881


Google continues to step up its push for HTTPS. Not only is Google giving HTTPS a slight ranking boost [webmasterworld.com], it's now going to seek out more HTTPS pages, even if they're not linked from any page.

....we'd like to announce that we're adjusting our indexing system to look for more HTTPS pages. Specifically, we’ll start crawling HTTPS equivalents of HTTP pages, even when the former are not linked to from any page. When two URLs from the same domain appear to have the same content but are served over different protocol schemes, we’ll typically choose to index the HTTPS URL if:

  • It doesn’t contain insecure dependencies.
  • It isn’t blocked from crawling by robots.txt.
  • It doesn’t redirect users to or through an insecure HTTP page.
  • It doesn’t have a rel="canonical" link to the HTTP page.
  • It doesn’t contain a noindex robots meta tag.
  • The sitemap lists the HTTPS URL, or doesn't list the HTTP version of the URL.
  • The server has a valid TLS certificate.
    Googlebot Will Seek Out HTTPS Pages By Default [googlewebmastercentral.blogspot.com]
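
For a site that wants the HTTPS version indexed, the redirect condition above usually comes down to one clean hop. A minimal sketch in Apache mod_rewrite (hostname is a placeholder, not from Google's post):

    # Send every plain-http request straight to its https equivalent
    # with a single 301 -- no detour through another http page.
    RewriteEngine On
    RewriteCond %{HTTPS} !=on
    RewriteRule (.*) https://example.com/$1 [R=301,L]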
    9:16 pm on Dec 18, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:Apr 9, 2011
    posts:15507
    votes: 752


    we’ll start crawling HTTPS equivalents of HTTP pages, even when the former are not linked to from any page

    I wish I had that kind of (crawl) budget, don't you? I did some casual experimenting with sites that I know for a fact don't have an https version. (My own, duh. Shared hosting.) Depending on browser version, it can take a very long time before the "can't connect" response comes back, possibly after one or more intervening steps. Does google keep an index of specific IPs that they already know don't listen on port 443 or, er, whatever the heck the number is? Or do they randomly try https://example.com/ and then write off the domain if they don't get through?

    Advice: If your site has any non-https content-- up to and including the whole site-- see if you can get there using https. If you can, tweak your domain-name-canonicalization redirect so that any one page can only be reached in one way (same as with/without www).
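
    A minimal sketch of that tweak for a site staying on plain http, assuming Apache mod_rewrite (hostname hypothetical) -- and assuming the server actually answers on 443 at all, otherwise there is nothing to redirect:

        # Fold the protocol check into the existing canonicalization:
        # a wrong hostname OR an https connection both 301 to one form.
        RewriteCond %{HTTPS} =on [OR]
        RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
        RewriteRule (.*) http://example.com/$1 [R=301,L]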
    10:15 pm on Dec 18, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Aug 4, 2008
    posts:3549
    votes: 328


    Advice: If your site has any non-https content-- up to and including the whole site-- see if you can get there using https

    Lucy -- I don't understand how you could get to something that doesn't exist. Unless maybe somebody without your knowledge has created an https version of your site on the same server.

    But my real question is why the heck would google do this in the first place

    Actually I don't understand any of this very well
    2:26 am on Dec 19, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:Apr 9, 2011
    posts:15507
    votes: 752


    I don't understand how you could get to something that doesn't exist.

    It depends on how your server is set up. (As far as I can tell, mine simply doesn't listen on 443.)

    If you're in something like ecommerce, there will be parts of your site that require https-- at least I should hope so!-- and parts that don't. You need to make sure the "parts that don't" can only be reached one way.

    http and https aren't separate physical files; they're just connection methods. Same as for with-and-without www: 99 times out of 100 it's the same content.

    The exact error you get when trying to https to an http-only site depends on the browser. I don't know what's happening behind the scenes, but I kinda think the browser makes two separate stops: one to check the certificate, another to reach the actual site.
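
    To make the "connection methods" point concrete, here is a sketch of an Apache setup (all names and paths hypothetical) where one set of files answers on both ports:

        # The same DocumentRoot served two ways; http vs https is purely
        # how the connection is made, not which files exist.
        <VirtualHost *:80>
            ServerName example.com
            DocumentRoot /var/www/example
        </VirtualHost>
        <VirtualHost *:443>
            ServerName example.com
            DocumentRoot /var/www/example
            SSLEngine on
            SSLCertificateFile /etc/ssl/example.com.crt
            SSLCertificateKeyFile /etc/ssl/example.com.key
        </VirtualHost>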
    1:21 pm on Dec 19, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Aug 4, 2008
    posts:3549
    votes: 328


    Thanks Lucy. I think I understand it better now. The protocol won't match.
    1:38 pm on Dec 19, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

    joined:Oct 5, 2012
    posts:921
    votes: 181


    Funny, I was looking at this back in March (2015) [webmasterworld.com...]

    Glad to see they finally announced that they are doing it.
    1:41 pm on Dec 19, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member Top Contributors Of The Month

    joined:Nov 2, 2014
    posts:668
    votes: 330


    I've seen Google searching for https sites for a couple of months now. Those on shared hosting should be cautious. Some of the major hosting providers route all SSL traffic to the one SSL-enabled site, so when Google requests the https version of any add-on domain under that account, it gets back the SSL site's content under the wrong domain. This has serious implications for SEO and can do quite a bit of damage (duplicate content). I'm sure some people have problems because of this and do not know it. At a bare minimum, make sure the SSL site has canonical tags. That should help to avoid duplicating the SSL site's content over multiple add-on domains.

    I really wish Google would not do this. I liken it to port scanning, since Google has no legitimate business or directive to search for the SSL content. And I know some big ecommerce platforms have problems with this too: the same content displays under both http and https, and the host does not allow a separate robots.txt file for each variation or the ability to 301 the traffic to the preferred domain. The most these people can do on that platform is set canonical tags and add their preferred domain to WMT. There's no guarantee Google will honor this.
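
    If you control the SSL site's server config but not its page templates, a hedged alternative on Apache (assuming mod_setenvif and mod_headers are available; hostname hypothetical) is the HTTP-header form of rel=canonical, which Google documents as equivalent to the in-page tag:

        # Capture the request path, then name this site's own canonical
        # URL in a Link header on every response it serves.
        SetEnvIf Request_URI "^(.*)$" CANON_PATH=$1
        Header always set Link "<https://www.example.com%{CANON_PATH}e>; rel=\"canonical\""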
    3:24 pm on Dec 19, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Aug 4, 2008
    posts:3549
    votes: 328


    I tried it on two different http: sites hosted on shared servers at two different hosting companies. In both cases I got system error pages and no log entries, so apparently the failures took place at a higher level. So I don't know if there is anything I can do on my own to try to protect my sites from a foul-up.
    4:41 pm on Dec 19, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member Top Contributors Of The Month

    joined:Nov 2, 2014
    posts:668
    votes: 330


    I tried it on two different http: sites hosted on shared servers at two different hosting companies.

    Bluehost is one shared hosting provider that is loading the content of SSL sites under non-https add-on domains. I'm not sure if it impacts HostGator, but they are both owned by Endurance International Group.
    6:39 pm on Dec 19, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Aug 4, 2008
    posts:3549
    votes: 328


    glakes -- Is that a case where the root domain has a certificate but the add-on domain doesn't?
    8:46 pm on Dec 19, 2015 (gmt 0)

    Senior Member from GB 

    WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:May 14, 2008
    posts:3234
    votes: 17


    I do not like G at all but there are reasons for their SSL push, which will soon be the norm for many sites. SSL sites are (or should be, if properly set up) encrypted end to end, so no one, neither criminals nor government, can view the traffic between web site and browser. This is important for some people. There is also more resistance to exploit intrusion, which is very important. Also, more people are now wary of non-SSL sites since Snowden's releases, and there is a gradual move to SSL anyway.

    I recommend reading up on this: there are several good sites that explain it and a good test site at [ssllabs.com...]

    SOME web hosting services may return invalid certificates, and possibly pages, from an HTTPS request IF they have set up several domains on a single IP, one of which has an SSL cert and uses port 443. This is the sign of a dangerously inept web hosting service and must be avoided.

    The traditional way around this has been to assign one SSL site per IP. Obviously this has become impractical since it uses more IPs than are available. Web servers can now (as late as the past couple of years in some cases) be set up to host several SSL domains on a single IP using Server Name Indication.
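
    For reference, the SNI arrangement described above looks roughly like this in Apache 2.4 (certificate paths and hostnames hypothetical) -- two certificates, two SSL sites, one IP:

        <VirtualHost *:443>
            ServerName example.com
            SSLEngine on
            SSLCertificateFile /etc/ssl/example.com.crt
            SSLCertificateKeyFile /etc/ssl/example.com.key
        </VirtualHost>
        <VirtualHost *:443>
            ServerName example.net
            SSLEngine on
            SSLCertificateFile /etc/ssl/example.net.crt
            SSLCertificateKeyFile /etc/ssl/example.net.key
        </VirtualHost>

    The server picks the right certificate from the hostname the browser sends during the TLS handshake, which is why very old clients that don't send SNI get the first (default) vhost's certificate instead.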

    And now SSL certificates are available for as little as FREE, unless you're hosting on Windows where it isn't quite that cheap yet; but GBP4 a year is certainly not expensive.

    There is no longer an excuse for non-SSL sites IF CIRCUMSTANCES DICTATE SSL! If you decide your site does NOT need SSL, fine. I have several of those myself, although I will migrate several of them to SSL over the next year or so.
    9:46 pm on Dec 19, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:Apr 9, 2011
    posts:15507
    votes: 752


    i don't know if there is anything I can do on my own

    If you can't reach your http content with https you don't need to do anything. It's google's time being wasted, not yours.

    While looking this up, I found that my (shared hosting) logs live inside a directory called-- or aliased-- "http". Don't know if this means that https, if it existed on my server, would be separately logged as "https". I guess it's theoretically possible to set up your vhosts so some domains on a single server can use https and some can't-- or some listen on 443 and some don't-- but it seems like more trouble than it's worth.

    Now, can anyone explain why google would look for https URLs that are linked from nowhere (their own prose)? I can't figure out who benefits.
    2:12 am on Dec 20, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member Top Contributors Of The Month

    joined:Nov 2, 2014
    posts:668
    votes: 330


    Is that a case where the root domain has a certificate but the add-on domain doesn't?

    Not necessarily the root domain, but the only SSL domain among many http sites under the same shared hosting account. I ran across this issue some months back when viewing the log files of a website and noticed the referring page was another domain on the same account. Canonical tags were used on the SSL site, and Google did NOT list the pages with the wrong domain in their index. The other domains under the account were never part of my work, past or present, so I don't know whether the mixup harmed their traffic or not.
    6:26 pm on Dec 21, 2015 (gmt 0)

    Senior Member from AU 

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

    joined:Aug 22, 2003
    posts: 2233
    votes: 142


    Now, can anyone explain why google would look for https URLs that are linked from nowhere (their own prose)? I can't figure out who benefits

    I can't even understand Google's apparent obsession with https.

    What possible beneficial reason could there be in having the "Blue Widgets Gardening Tips" site on https?

    The same could be said of a site catering to one specific locality, e.g. "Joe's Pizza Emporium of [insert locality]". Don't laugh; we have something similar here in my town, and there are absolutely no online financial transactions.

    Absolutely pointless.
    8:58 pm on Dec 21, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:June 28, 2013
    posts:3375
    votes: 716


    What possible beneficial reason could there be in having the "Blue Widgets Gardening Tips" site on https?

    Dunno about gardening tips, but here's a New York Times blog post titled "Embracing HTTPS" that may be illuminating. It's about the use of https on journalism sites:

    [open.blogs.nytimes.com...]
    9:20 pm on Dec 21, 2015 (gmt 0)

    Senior Member from AU 

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

    joined:Aug 22, 2003
    posts: 2233
    votes: 142


    No doubt there are appropriate applications, though usually where there has always been a genuine need for https.

    However, Google's customary blanket approach to so many issues is now becoming beyond absurd. Recently we had the [largely non-]issue of sites complying with EU cookie laws. Around the same time we had the in-vogue push to become mobile friendly.

    The implication is always: if you don't comply, your site will be punished.
    10:05 pm on Dec 21, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:June 28, 2013
    posts:3375
    votes: 716


    But, in the case of https, Google isn't implementing a blanket approach. You can use https or not, as you see fit.
    10:34 pm on Dec 21, 2015 (gmt 0)

    Senior Member from AU 

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

    joined:Aug 22, 2003
    posts: 2233
    votes: 142


    Well, as engine said in the introduction:
    Google To Give Secure HTTPS Sites A Ranking Boost

    A Ranking Boost? Why? My own view, having watched Google evolve over roughly 15 years, is that Google wants things done their way. Mr. Google is of course perfectly entitled to set the parameters which suit them; however, they seem to have long ago abandoned their original ideals.

    I myself stepped off the Google dance quite a few years back. The bulk of my traffic originates from a vast network in my own genre, few of whose sites would appear prominently in Google SERPs. Ironically, it also contains some of the best content [from every country] for the genre across the world - but you are unlikely to find it on Google unless you drill right down through several search pages of flim flam.
    11:30 pm on Dec 21, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Apr 14, 2003
    posts: 4319
    votes: 42


    So, great, now I have to listen to a bunch of morons talking about how you have to have https to rank for anything.

    If you have a problem with your pages not being found by Google, https is not going to help you any. All this is about page discovery, not actual ranking.
    3:22 am on Dec 22, 2015 (gmt 0)

    Junior Member

    joined:Aug 3, 2013
    posts: 113
    votes: 32


    Been chewing on this one myself a bit lately. It's easy and cheap (sometimes free) to set up a secure certificate on most hosting platforms, but a lot of shared hosting people might have trouble with it.

    It's a little bit mean to give https a ranking boost. Seems like another easy excuse to further demote the little guys.

    On my server it only takes a few minutes to set up https for any site, but much longer to explore the potential indexing and implementation issues. For now I have sites that seem to respond identically both ways, and I've not seen any effect on rank or traffic from google yet. No signs of duplicate URLs indexed or any other troubles. I definitely need to research this more and keep an eye on the search index.
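
    For anyone in that "responds identically both ways" position who wants to pick a side, a hedged sketch of collapsing all four variants (http/https x www/non-www) onto one canonical host in Apache mod_rewrite (hostname hypothetical):

        # Anything that isn't already https://www.example.com/... gets
        # exactly one 301 to that form -- four variants, one survivor.
        RewriteEngine On
        RewriteCond %{HTTPS} !=on [OR]
        RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
        RewriteRule (.*) https://www.example.com/$1 [R=301,L]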
    4:08 pm on Dec 22, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:June 28, 2013
    posts:3375
    votes: 716


    If you have a problem with your pages not being found by google https is not going to help you any. All this is about page discovery not actual ranking.

    Exactly. Google is simply telling us that, if we do create https versions of our pages, its crawlers will manage to find and index those pages. Why would anyone find that objectionable?
    5:12 pm on Dec 22, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Apr 14, 2003
    posts: 4319
    votes: 42


    It sounds to me like Google is going to hit every URL twice now, once as https and once as http. If it gets an https page, it will consider that to be the canonical page unless told otherwise.
    9:34 pm on Dec 22, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:Apr 9, 2011
    posts:15507
    votes: 752


    unless told otherwise

    Shouldn't the complete absence of links count as telling them otherwise?

    a bunch of morons talking about how you have to have https to rank for anything

    I'm not a blanket google defender, but that seems a pretty harsh descriptor. Remember, this is about a Horse's Mouth pronouncement using the phrase "ranking boost".

    Google is going to hit every URL twice

    They're going to request every URL twice. That's not necessarily the same thing, because unlike with/without www, a request using the wrong protocol will often not succeed at all.

    :: happily scurrying back to exact-text ebook searches, where ranking is not an issue because it's rare for the search to turn up more than ten results total ::
    9:52 pm on Dec 22, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member

    joined:June 2, 2006
    posts:2239
    votes: 8


    reach your http content with https


    Hm, in the case of one of my sites it says (Firefox) Untrusted Connection, then asks me if I want to continue. Explorer says Certificate Error, and also offers to continue.

    In all cases, if continued, it gets to https://www.example.com/cgi-sys/defaultwebpage.cgi

    This is the VPS setup. In the case of some other sites, it says "Unable to connect"

    Is it better to redirect https to http or make it give the error right away without those warnings and choices to continue?

    Thanks
    9:55 pm on Dec 22, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Apr 14, 2003
    posts: 4319
    votes: 42


    My point is that Google is now going to request more URLs from our sites than it used to. I just know I'm already hearing a bunch of people saying "you have to have SSL for SEO"
    10:17 pm on Dec 22, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Aug 4, 2008
    posts:3549
    votes: 328


    Google To Give Secure HTTPS Sites A Ranking Boost

    I just did a google search (incognito) for "ssl certificate" (without quotes). There were three adwords ads above the organic results, plus eight more ads on the right side.

    So google tells everyone that they'll get a rankings boost if they get a certificate, then people go searching for a certificate and click one of the ads. So more profits for google.
    11:03 pm on Dec 22, 2015 (gmt 0)

    Junior Member

    joined:Aug 3, 2013
    posts: 113
    votes: 32


    It seems fairly easy to get in "compliance" but... Forget for a moment that you are a hard-core coder.

    Think of yourself as a writer or publisher. You post 1 page. Google tells you you have 4 pages: www, non-www, http, https.... then just for good measure they throw some random url parameters at your static pages for you to deal with.

    Security is good. Tons of sites have been all https for a very long time.

    With that said, I predict this will cause disruption & we will see some perfectly good sites get creamed and never recover.
    12:15 am on Dec 23, 2015 (gmt 0)

    Senior Member from US 

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

    joined:Apr 9, 2011
    posts:15507
    votes: 752


    Is it better to redirect https to http or make it give the error right away without those warnings and choices to continue?

    I was thinking about nanoseconds. In Apache it's a change from
    RewriteCond %{HTTP_HOST} !^(example\.com)?$
    RewriteRule (.*) http://example.com/$1 [R=301,L]
    to something like
    RewriteCond %{HTTP_HOST} !^(example\.com)?$ [OR]
    RewriteCond %{HTTPS} on
    RewriteRule (.*) http://example.com/$1 [R=301,L]
    i.e. one more condition that has to be evaluated on every request ever ... even if the condition will always fail because I don't think an https request on a site without https would ever even get that far.

    Anyone know a good place to look up exactly what happens after you type "https://" into your browser's address bar?

    Come to think of it, what about non-page content? Plenty of sites have https pages but use plain http for images and stylesheets.
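
    On the non-page content question, one partial answer (in browsers new enough to support it, and assuming mod_headers) is the upgrade-insecure-requests CSP directive, which asks the browser to fetch a page's plain-http images and stylesheets over https without rewriting every URL by hand:

        # Supporting browsers upgrade http:// subresource fetches to
        # https:// before the requests leave the browser; older ones ignore it.
        Header always set Content-Security-Policy "upgrade-insecure-requests"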
    1:10 am on Dec 23, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

    joined:Dec 27, 2004
    posts:1977
    votes: 68


    As I see it there is absolutely nothing wrong with throwing 403 at GBot for HTTPS requests when it is something that you don't want the bot to see. NO is NO.
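
    A sketch of that 403 approach in mod_rewrite terms (assuming the https request actually reaches your vhost, i.e. something is listening on 443):

        # Refuse every https request outright: nothing served over
        # https, so nothing for the bot to index there.
        RewriteEngine On
        RewriteCond %{HTTPS} =on
        RewriteRule ^ - [F]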

    I remember when Slurp was trying to crawl all directories in URL paths, which was a completely pathetic waste of time on their part.

    If they can't access it there is no dupe content issue whatsoever.

    Like Busta Rhymes once said:
    If you really wanna party with me
    Let me see just what you got for me
    Put all your hands where my eyes can see


    @raseone,

    You forgot the https for non-www ;)
    1:51 am on Dec 23, 2015 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

    joined:Dec 27, 2004
    posts:1977
    votes: 68


    RewriteRule (.*) http://example.com/$1 [R=301,L]


    I never liked that rule. That $1 at the end is a can of worms that needs to be hidden far far away. Imagine if someone puts a few hundred links to a proper URI on your site but with some curse words in the query string. That rule says "URI found and moved to a new location", then the linking page gets crawled and your server redirects it.

    I prefer to use map files or, even better, to do it in the application code itself when available.
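
    On the query-string worry specifically: mod_rewrite passes the incoming query string through a redirect untouched unless told otherwise, so a hedged fix is to discard it during canonicalization. The QSD flag needs Apache 2.4; on 2.2 a literal trailing ? on the substitution does the same:

        # Same canonicalization, but whatever query string came in is
        # dropped (QSD) instead of being echoed back in the Location.
        RewriteCond %{HTTP_HOST} !^(example\.com)?$
        RewriteRule (.*) http://example.com/$1 [R=301,L,QSD]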