Help with https urls indexed, vs http


pmdono

5:46 pm on Dec 28, 2006 (gmt 0)

10+ Year Member



What would cause Google crawlers to index our site under an [www....] url versus [www....]? Google has crawled and assigned a PR to the pages that start with https (only), rather than [#*$!xx.com....] Our home page at the regular (http) url has a PR of 4. I believe PR does not mean much anymore, but it is still an indication of being crawled and recognized?

The first item under 'Indexed pages in your site' list in Google is the following

Title of the Website
This is the description of the site and what it is all about, etc, etc.
[sitename.com...] - 43k - Cached - Similar pages

Note the https:// instead of [,...]; previously it was http:// with my old provider.

It has been 6 months since I switched providers, and none of my inner pages are being re-indexed by Google. The site is over 3 years old and has never played in bad neighborhoods. One interesting note: when I add an 's' to the url of one of my product pages, it shows a PR, but the http version of that page does not.

This does not make any sense; it would seem that somewhere Google has been told that https:// is the main url to follow.

There are over 1,400 404 Not Found pages on my site; would that have any bearing? Most of these appear to be left over from my old provider.

randle

7:18 pm on Dec 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you have any pages on your site that are https?

If so, go to that page and then, if there is some sort of main navigation bar, click "home" (or any other page, for that matter). What do you see?

Does it bring you to an https page? Often when people set up an https page, they don't realize they are also creating a situation where https gets added to the front of all the other file names as well, creating two versions of the site: one in http and one in https.
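To illustrate why this happens (a minimal sketch using Python's standard library; www.example.com and the file names are placeholders): a relative link inherits the scheme of the page it sits on, so a plain "home" link clicked from a secure page keeps the visitor, and a crawler, on the https version.

```python
from urllib.parse import urljoin

# A root-relative "home" link on a secure page resolves to an https url,
# because relative links inherit the current page's scheme:
print(urljoin("https://www.example.com/checkout/cart.html", "/index.html"))
# → https://www.example.com/index.html

# A fully qualified link pins the scheme explicitly, avoiding the duplicate:
print(urljoin("https://www.example.com/checkout/cart.html",
              "http://www.example.com/index.html"))
# → http://www.example.com/index.html
```

This is why one https checkout page can quietly spawn an https mirror of the whole site: every relative link crawled from it stays on https.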

If Google is indexing these https pages, they must exist.

Might not be the problem but worth a look.

tedster

7:29 pm on Dec 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the https protocol urls get a page from your server with a 200 OK http header, then your server is where the repairs need to start. There are several approaches, including this from Google:

Each port must have its own robots.txt file. In particular, if you serve
content via both http and https, you'll need a separate robots.txt file for each
of these protocols. For example, to allow Googlebot to index all http pages
but no https pages, you'd use the robots.txt files below.

For your http protocol (http://yourserver.com/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):

User-agent: *
Disallow: /

[url=http://www.google.com/support/webmasters/bin/answer.py?answer=35302&query=https&topic=&type=]Webmaster Help Center[/url]

One other very sound approach (I would call it the best practice) is to install the secure certificate only on a dedicated subdomain such as secure.example.com. Then there is no chance of https confusion with your regular urls.

Once your server stops telling Googlebot that a page exists at a given url, or tells Googlebot not to index the url through some mechanism such as robots.txt or a robots meta tag, the repair is under way.
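One way to stop the server from answering 200 OK on the https urls is a permanent redirect back to http. This is only a hypothetical sketch, assuming an Apache server with mod_rewrite enabled; www.example.com stands in for your domain, and any genuinely secure pages (checkout, login) would need to be excluded from the rule:

```apache
# Hypothetical sketch: 301-redirect any https request for a regular page
# back to its http counterpart, so crawlers stop seeing a duplicate
# https version of the site. Assumes mod_rewrite is enabled.
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

A 301 (rather than 302) tells Googlebot the move is permanent, which encourages it to consolidate the indexed https urls back onto the http versions over time.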