The first item in the 'Indexed pages in your site' list in Google is the following:
Title of the Website
This is the description of the site and what it is all about, etc, etc.
[sitename.com...] - 43k - Cached - Similar pages
Note the https:// at the start of the URL; previously, with my old provider, it was http://.
It has been 6 months since I switched providers, and none of my inner pages are being re-indexed by Google. The site is over 3 years old and has never played in bad neighborhoods. One interesting note: when I add an 's' to the URL of one of my product pages, it shows a PR, but the http version of that page does not.
This does not make any sense; it would seem that somewhere Google has been told that https:// is the main URL to follow.
There are over 1,400 404 Not Found pages on my site. Would that have any bearing? Most of these appear to be left over from my old provider.
Do you have an https page on the site at all? If so, go to that page and then, if there is some sort of main navigation bar, click "home", or any other page for that matter. What do you see?
Does it bring you to an https page? Often when people set up an https page, they don't realize they are also creating a situation where https gets added to the front of all the other URLs (usually via relative links), creating two versions of the site: one in http and the other in https.
If Google is indexing these https pages, they must exist.
Might not be the problem but worth a look.
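To see why a single https entry point can snowball, here is a tiny illustration (the domain and paths are placeholders, not the poster's real URLs): relative links resolve against the page they sit on, so once a visitor or Googlebot lands on an https page, every relative link it follows stays on https.

from urllib.parse import urljoin

# Placeholder https page; imagine Googlebot arrived here via a secure link.
https_page = "https://www.example.com/checkout.html"

# Typical relative links found in a shared navigation bar.
for relative_link in ("/", "/products/widget.html", "about.html"):
    print(urljoin(https_page, relative_link))

# Output:
#   https://www.example.com/
#   https://www.example.com/products/widget.html
#   https://www.example.com/about.html
# Every resolved URL inherits https, which is how a duplicate https copy
# of the whole site can get crawled.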
Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.

For your http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /

Webmaster Help Center: http://www.google.com/support/webmasters/bin/answer.py?answer=35302&query=https&topic=&type=
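If http and https are served from the same docroot or application, you need some way to hand back a different robots.txt depending on the scheme. As a rough sketch only — assuming a Python/Flask front end, which is almost certainly not what the poster's host actually runs (on Apache the same effect is usually achieved with a rewrite rule keyed on HTTPS) — it could look like this:

from flask import Flask, Response, request

app = Flask(__name__)

# http: let Googlebot crawl everything; https: keep it out entirely.
ROBOTS_HTTP = "User-agent: *\nAllow: /\n"
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"

@app.route("/robots.txt")
def robots():
    # request.is_secure is True when the request arrived over https.
    # (Behind a TLS-terminating proxy you would also need to trust
    # X-Forwarded-Proto, e.g. via Werkzeug's ProxyFix middleware.)
    body = ROBOTS_HTTPS if request.is_secure else ROBOTS_HTTP
    return Response(body, mimetype="text/plain")

if __name__ == "__main__":
    app.run()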
One other very sound approach (I would call it the best practice) is to install the secure cert only on a dedicated subdomain such as secure.example.com. Then there is no chance of https confusion with your regular URLs.
Once your server stops telling Googlebot that a page exists at a given URL, or tells Googlebot not to index the URL through some mechanism such as robots.txt or a robots meta tag, the repair is under way.
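A quick way to confirm the repair is taking hold is to check what status code each protocol now returns for a few inner pages: anything other than 200 on the https side (a 301 back to http, or a 404) means the server has stopped vouching for those URLs. A minimal check, with sitename.com and the path standing in for the real domain and a real product page:

import http.client

def status_of(host, path, secure):
    """Return the raw status code for one request. Redirects are NOT
    followed, so a 301 shows up as 301 rather than as its target."""
    conn_cls = http.client.HTTPSConnection if secure else http.client.HTTPConnection
    conn = conn_cls(host, timeout=10)
    conn.request("HEAD", path)
    status = conn.getresponse().status
    conn.close()
    return status

host, path = "sitename.com", "/products/some-product.html"  # placeholders
print("http :", status_of(host, path, secure=False))  # expect 200
print("https:", status_of(host, path, secure=True))   # expect 301 or 404 once fixed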