

Moved http to https - nothing yet in webmaster tools

     
7:10 pm on Feb 8, 2017 (gmt 0)

New User

joined:Nov 29, 2015
posts:26
votes: 2


Hi guys,

I moved my site from http to https yesterday - I followed a guide and am sure I did everything right - permanent redirects, all internal links, etc.

I added the new https web property in Webmaster Tools and resubmitted the sitemap - but no pages are indexed and it looks like the original http site is still getting data.

Analytics looks like it's worked, but I'm not sure about Webmaster Tools. Bing Webmaster Tools is also showing "pending".

Is there anything else you would check to make sure Google is OK with the new https site?

Thanks for the help!
10:25 pm on Feb 8, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3080
votes: 208


Have you used "Fetch as Googlebot" to see what it finds, checking both the old and new URLs for several different pages? You might also try "Submit to Index".
7:04 am on Feb 9, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


A lot will depend on the size of your site and Google's usual crawl frequency (which isn't the same for all sites), but it certainly won't be instant. When I moved my site the GSC index took weeks to complete, so there isn't any cause for concern after only 24 hours.

GSC isn't always completely up-to-date anyway, so check periodically by searching site:yourdomainname.com (again, this took some time to complete with my own site, and both versions of some pages were showing in the results for much of the transition period).

You can try hurrying it along by using Request Indexing under Fetch as Google - see aristotle's comment - in the Crawl section of GSC, but this will be laborious if you have many pages. However, an important thing you should also check is your own server logs, as these will show errors that might not otherwise be visible (for example, check that .htaccess redirects haven't created any loops). The last thing you want is for Googlebot to hit an error when it finally does try to re-index a page.
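
If you have access to the server configuration, one way to make redirect behaviour visible in those logs is mod_rewrite's per-module log level. A minimal sketch, assuming Apache 2.4+ (the log path is illustrative, and this goes in the server or virtual-host config, not in .htaccess):

# Temporarily raise mod_rewrite logging so every RewriteCond/RewriteRule
# decision is written to the error log while you test the migration;
# turn it back down afterwards, as trace logging is verbose
LogLevel warn rewrite:trace3
ErrorLog /var/log/apache2/example-error.log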

Assuming everything is OK, however, you just need to give it time.
7:29 am on Feb 9, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7742
votes: 262


Agree with Wilburforce, give it a few days.

You may wish to resubmit your updated sitemap.xml to tell Google about the changes.
6:14 am on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


It will take months if not years. You can certainly bang your head against Fetch as Google and hope it will make a difference. In the end, Google will take as long as it will take, and there will be many periods of reverting to the old http pages. Some of the pages might even permanently stay http in the search results - that is certainly the case for me.
7:02 am on Feb 12, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7742
votes: 262


@NoobOperator - that is not accurate. If that is your experience, you may have had other issues.

Done correctly, reindexing should take just a few days.
7:20 am on Feb 12, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13531
votes: 403


It will take months if not years.

Mercifully, that's an exaggeration. Assuming for the sake of discussion that very few sites are less interesting to Google than mine (my name is not wikipedia dot org), I went back and checked logs from when I moved sites near the end of 2013. Barring requests from www.google.com/search (which I think is bogus but won't bother to investigate right now), redirects had trickled away to almost nothing within a couple of weeks.

The "almost" is one exceedingly obscure page that was redirected by google.ca as recently as March 2015 (more than a year after the move). Everything else dried up before the end of January 2014.

In my case, it took a few days for the most popular pages to get re-indexed. Obviously there exist sites--ahem! cough-cough!--where things get indexed much, much faster.*

Now, if you want redirects that go on forever and ever and ever, try Bing image search. To this day, I'm finding redirected requests. Even well-liked pages got redirects up to 6 months after the fact. But this is the google subforum, so no worries.

Caveat: I'm looking at a site redirect, not a protocol change. But I see no reason to think the underlying search-engine behavior would be any different; at bottom, it's still a question of indexing URLs.


* Sometimes even before they happen. Not long ago I landed on a SERP that claimed a particular page had been indexed--or at least discovered--some 12 hours before it was actually created. Impressive feat, there.
7:33 am on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@keyplyr
Done correctly, reindexing should take just a few days.


I guess Google doesn't like me. I have a tiny site of 30 pages. It has a single redirect, as indicated below. How can anyone fail to do it correctly? My http pages are still coming and going in the search results 3 months after the redirect, and some simply won't go at all, meaning I will have them for life.

<VirtualHost *:80>
ServerName mydomain.com
ServerAlias www.mydomain.com
ServerAlias mail.mydomain.com

# redirects http to https except for a getssl request that must be http
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/.well-known
RewriteRule ^/(.*) h.t.t.p.s://www.mydomain.com/$1 [R=301,L]
</VirtualHost>
7:58 am on Feb 12, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3268
votes: 20


h.t.t.p.s


I'm sure you were highlighting this for some reason? There should be no dots - it should be just https
8:20 am on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@topr8

It doesn't show in the post otherwise. Like I said, it's very hard to make a mistake with something this simple. Unless deliberately.
8:29 am on Feb 12, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7742
votes: 262


That's why it's best to use code brackets. They're designed for that purpose. Also, please use example.com. Thanks.
8:53 am on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@keyplyr

I would use code brackets if I could find them. It took me 2 days to work out how to quote, for instance. As a mod, perhaps you can have a talk with the people running this forum and get them to make things easier. I am a lifelong geek and I'm struggling here. Some basic functions are needed at all times; it makes no sense to hide them.
9:11 am on Feb 12, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7742
votes: 262


In addition to using the editor, you can type it.

[ code ]text here[/ code]

(remove the space before code)
9:16 pm on Feb 12, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


@NoobOperator

I don't know how your use of .well-known in the RewriteCond statement affects the subsequent RewriteRule, but it obviously depends on server-specific conditions that might - if any https request did not call for .well-known - create a redirection loop. Similarly, if any http request did call for .well-known, it would bypass the RewriteRule.

If loops can occur, Google might fail to index the https version of a page because every request for it results in a redirect. This could happen if, for example, the user-agent gets whatever it needs from .well-known for https://www.example.com/page1.htm, but having obtained it doesn't need to get it again in the same session for other pages. This also raises the possibility of anomalies that might be agent- or browser-specific.

More reliable, therefore - as it doesn't rely on any external conditions - would be e.g.

#If not already https
RewriteCond %{HTTPS} !=on
#Use https version
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,NE,R=301]


However, as you have already waited three months without resolution, I'd be inclined to go the whole hog and make redirection specific to each page (which won't take many minutes to do for 30 pages), as follows:


RewriteEngine on
#If already using https, skip the next 30 RewriteRules
RewriteCond %{HTTPS} =on
RewriteRule ^/* - [S=30]
#Redirect each of your 30 pages
RewriteRule ^page1.htm https://www.example.com/page1.htm [R=301,L]
RewriteRule ^page2.htm https://www.example.com/page2.htm [R=301,L]
...
RewriteRule ^page30.htm https://www.example.com/page30.htm [R=301,L]
#If not already https
RewriteCond %{HTTPS} !=on
#Redirect http requests for any page not specified above
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,NE,R=301]


Having done that and tested it, use GSC to Request Indexing (see my previous post), which again won't take long for 30 pages.
9:31 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@Wilburforce

I have a single redirect for everything, and the pages are all static, simple HTML. There's no reason the redirect would fail for only some of them. My rule doesn't produce a loop.

In any case, it's not critical that everything comes through as https from the search results, because requests will be redirected regardless. My comment was only that it takes time for Google to work, and that sometimes it may never work. But it doesn't really matter to me if Google lists some of my pages as http and some as https. So long as they are listed, they are all good, and will always lead to the right place.
9:48 pm on Feb 12, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11763
votes: 270


If Google lists multiple versions, you can end up splitting your inbound link vote.
10:11 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@Robert Charlton

It lists only one version, either http or https, although some of them swap places from time to time.

For a laugh I just did this: site:https://www.example.com

Some pages are listed as http. I wonder if Google offers different levels of service. I am a freeloader; perhaps they give me the lowest priority, so my indexing could be correct in about 5 years.
10:23 pm on Feb 12, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13531
votes: 403


Incidentally ...
RewriteCond %{REQUEST_URI} !^/.well-known

What's the leading . for? Typo, or do you have a cluster of directories named "awellknown", "bwellknown" and so on?
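
If the intent is to exclude only the literal /.well-known/ directory, the usual form escapes the dot - a minimal sketch of what that line might look like (an assumption about the intent, not what is actually on the server):

# Escape the dot so the regex matches the literal hidden directory only
RewriteCond %{REQUEST_URI} !^/\.well-known/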
10:58 pm on Feb 12, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


My rule doesn't produce a loop.


Because?

If Google hasn't indexed your site correctly in three months, there is an error somewhere, and it probably isn't theirs.
11:02 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@lucy24

A directory whose name starts with a dot is hidden on Linux, so URL/.well-known refers to a hidden subdirectory in the HTML root. Since Google doesn't know about this subdirectory, it will never ask for it. The subdirectory has no effect on indexing.

[edited by: NoobOperator at 11:10 pm (utc) on Feb 12, 2017]

11:05 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@Wilburforce

It certainly is their error to list http when I ask for site:https://www.example.com.

My conclusion is that Google is buggy, and they are too rich and too much of a monopoly to be bothered with fixing it.
11:15 pm on Feb 12, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11763
votes: 270


It lists only one version, either http or https although some of them swap places from time to time.

That's the point, and is what I meant by "multiple versions".

Mod's hat on: Are you being purposely obtuse?

11:22 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@Robert Charlton

No. I am purposefully saying there is only one variant of a page at any one time. Then Google does its dance, followed by one variant disappearing and the other variant appearing in the search results for a while, until the next dance.
11:23 pm on Feb 12, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


Since Google doesn't know about this subdirectory, it will never ask for it.


So what effect does RewriteCond %{REQUEST_URI} !^/.well-known have on Google's request for https://www.example.com/anypage.htm?
11:48 pm on Feb 12, 2017 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Feb 8, 2017
posts:53
votes: 1


@Wilburforce

No effect. The rule only applies to http requests. All https requests are served as-is, as plain HTML/GIF/JPG.


<VirtualHost *:443>
ServerName example.com
ServerAlias www.example.com
ServerAlias mail.example.com
DocumentRoot /var/www/example.com

SSLEngine on
SSLCertificateFile \
/etc/example.com/getssl/example.com/example.com.crt
SSLCertificateKeyFile \
/etc/example.com/getssl/example.com/example.com.key
SSLCertificateChainFile \
/etc/example.com/getssl/example.com/chain.crt

Header always set Strict-Transport-Security \
"max-age=31536000; includeSubDomains"

# enforces page change check for every access
Header Set Cache-Control "no-cache"
</VirtualHost>
3:10 am on Feb 13, 2017 (gmt 0)

New User

joined:Dec 20, 2012
posts:10
votes: 1


Well, I have seen the same behavior @NoobOperator mentions. I did an http to https move on a site around 3-4 months ago and I still see some http:// and https:// URLs dancing on the SERPs. When I check the cache of the http:// result, it's the new updated https page, but in the results it shows as http even with a 301 redirect in place.

I suppose that Google needs to crawl the old page several times in order to finally catch the change, so it is not unreasonable to say that on a medium-to-big site it could take months.
8:53 am on Feb 13, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


@NoobOperator

Sorry, I had missed your use of VirtualHost, but this probably just adds more worms to the can. If you want the whole site to function as https you don't need two different hosted versions of it.

The fact - as you advised Robert Charlton - that Google keeps toggling between http and https versions of pages is a fairly clear indication that Google doesn't know which version to use.

It isn't possible to find the cause(s) of the problem without getting more forensic on what is going on under the Apache bonnet, but I am confident that having a single host using .htaccess to direct all requests to https would solve it.
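
For what it's worth, a minimal sketch of that kind of single consolidated redirect, assuming Apache with mod_rewrite and placed in the site's .htaccess (example.com is a placeholder, and the /.well-known/ exception is kept only so certificate-renewal requests over http still work):

RewriteEngine on
# Only rewrite requests that are not already https, so the rule cannot loop
RewriteCond %{HTTPS} !=on
# Leave ACME/getssl verification requests on plain http
RewriteCond %{REQUEST_URI} !^/\.well-known/
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L,NE]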
5:44 pm on Feb 13, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13531
votes: 403


The complication is that, unlike most indexing queries, studying raw logs won't help. You can see if a Google request got a 301--but you don't know what they requested:
http://example.com/
https://example.com/
http://www.example.com/
et cetera.

Headers don't show protocol, do they? (I don't know, because I haven't any https content. I know that headers do show the exact hostname.)

The point of all this is that there are two aspects* to Google indexing. One is what they get when crawling; the other is what they do with the information. You want to make sure they're crawling where you want them to crawl. It wouldn't hurt to double-check your html and make sure all your internal links point only to https (most easily done by using only site-absolute links beginning in / so once you're in https you stay there).


* I almost typed “two prongs” but that leaves an undesirable mental picture.
9:28 pm on Feb 13, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7742
votes: 262


...studying raw logs won't help. You can see if a Google request got a 301--but you don't know what they requested...
@lucy24 - depends on the server config, how the admin set up the accounts & how they set up the access log reports.

Example: a well known "dreamy" shared hosting company gives us both http & https raw logs. Comparing the two would indicate which protocol (http or https) Googlebot requested.
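
Where the server configuration is under your own control, the same separation can be set up directly. A sketch, assuming Apache with mod_log_config (log paths and the format nickname are illustrative, and "combined" is the usual distro-defined format):

# One access log per protocol, so http and https requests can be compared directly
<VirtualHost *:80>
ServerName example.com
CustomLog /var/log/apache2/example-http-access.log combined
</VirtualHost>
<VirtualHost *:443>
# SSL directives omitted from this sketch
ServerName example.com
CustomLog /var/log/apache2/example-https-access.log combined
</VirtualHost>

# Alternatively, add the canonical server port (%p) to the log format so that
# 80 vs 443 is visible on every line of a single shared log
LogFormat "%h %l %u %t \"%r\" %>s %b %p" combined_port
CustomLog /var/log/apache2/access.log combined_port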
11:40 pm on Feb 13, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 955
votes: 61


shared hosting company gives us both http & https raw logs


For me that is a prerequisite for my choice of hosting company (and there are many available that offer it). The one I currently use gives separate access and error logs for both http and https (the error logs list server-errors, not http 4xx responses).

These make it easy to check that redirects are working as they should, as a simple text search should reveal no 200 responses in the http logs, or unintended 301s in the https logs.

Without them, it would be much harder.