Forum Moderators: phranque

SSL (HTTPS) and indexing concerns

Ensuring proper indexing and preventing duplicate content while using HTTPS

ClickScientist

7:42 pm on Sep 4, 2008 (gmt 0)

10+ Year Member



Hello everyone,

I've lurked here for years, but finally found myself with a need for help that I couldn't get from searching the forums. If this has been posted before, please let me know, and I apologize.

I'm having some concerns about employing SSL on a client's site, and I'd like to ask for your thoughts.

The site is a membership site (some memberships are free, some cost money). Members can earn activity points for performing certain actions (posting in the forums, making a blog entry, referring a friend, etc.). These points can be exchanged for goods and services, so it's important to protect them. I'm planning to install an SSL cert to protect the actual payment page (using PayPal Website Payments Pro), but I'm thinking it would be smart to also use the cert on the rest of the community area of the site, where points are earned and spent, to protect against session-stealing attacks that could cost members their points.

The community area, with a few exceptions, will be open to the public for reading, and I want the search engines to spider and index the pages in that area as they normally would. My concern is that if I employ the cert, the engines may not index those pages, or, if they do, may hit the site with a duplicate content penalty because each page would be reachable via both http and https. Although I would link to everything using https, there's no way to stop someone from linking in via http and thus causing a second copy of each page to be read by the engines.

I believe the indexing is no longer a problem for Google; can anyone confirm this? What about the other major engines?

Does anyone have any thoughts on my dupe content concern?

Please note that this is a site reconstruction project. The site has been around for years and has good rankings in the major SEs, but the community area I mentioned is under construction and not yet open to the public. I mention this because there are no 301 issues with this part of the site.

Any other suggestions or input would be much appreciated. Thank you for helping! I'll do the same for you, now that I'm no longer just a lurker!

Thanks,
Bill

pageoneresults

8:23 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld, ClickScientist!

Does anyone have any thoughts on my dupe content concern?

Yes, be sure to have logic in place to force http or https as appropriate; that will prevent the duplication. Google will index both http and https fine, but you have to make sure that the same address cannot be browsed over both http and https. That is where the challenges come in.
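As a rough illustration only (untested, assuming Apache with mod_rewrite in .htaccess, and a hypothetical /checkout/ path standing in for your secure payment area), rules like these would bounce any https request outside the secure area back to http, so only one version of each URL ever answers:

RewriteEngine on
# Request arrived on the SSL port...
RewriteCond %{SERVER_PORT} ^443$
# ...and is not for the secure checkout area (example path, adjust to yours)
RewriteCond %{REQUEST_URI} !^/checkout/
# ...so send it back to the plain-http version of the same URL
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

You would substitute your own domain and paths, and mirror the rule in the other direction for pages that must be https.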

There are also things you can do to prevent the indexing of https pages, such as serving a separate robots.txt file whenever the request comes in over https. We typically keep all https out of the indices.

ClickScientist

9:11 pm on Sep 4, 2008 (gmt 0)

10+ Year Member



Thanks for your reply, and for the welcome.

I don't know how to force a separate robots.txt file when https is requested... could you elaborate on this?

Are you saying that this could be done with a combination of robots.txt and mod_rewrite?

Perhaps I can get around the problem by rewriting all links as https in pages served to logged-in members, while leaving them as http for everyone else, in combination with the above methods.

tedster

9:51 pm on Sep 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's one of several threads in the Google Search forum that discusses the issue:
[webmasterworld.com...]

On Apache, if you create a separate robots.txt file for the secure connection (let's call it robots_ssl.txt), then:

# In .htaccess: when the request arrives on the SSL port (443),
# serve robots_ssl.txt in place of robots.txt
RewriteEngine on
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
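In robots_ssl.txt you would then put a blanket exclusion, for example:

User-agent: *
Disallow: /

if, as pageoneresults suggests, the goal is to keep everything served over https out of the indices.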

ClickScientist

5:05 am on Sep 5, 2008 (gmt 0)

10+ Year Member



Thanks for the link, tedster. I understand how to exclude all HTTPS pages now, but that isn't quite what I want to do. I want the community area's content to be indexable, yet I also want to protect it against session-stealing attacks. Imagine that MySpace had a points system like the one I've described and needed to treat points exchanges as securely as cash transactions. Wouldn't they have to use HTTPS? And yet they'd still want the engines to crawl the content created by members (blogs, comments, forums, etc.), so they couldn't just use robots.txt to block everything served over HTTPS.

My understanding about protection against session-stealing attacks is that every page of the site from the login page onwards needs to be secured via HTTPS. In other words, an attacker must be prevented from reading the session ID at any point in the (logged-in) session. So I don't think it would do to have only some of the pages use HTTPS, because a logged-in user's session ID would be vulnerable when they request one of the HTTP pages.
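From what I've read (and I'd welcome corrections), one complementary measure is flagging the session cookie as Secure, so browsers refuse to send it over plain http at all. If the server has mod_headers, I believe something like this would append the flag (Apache 2.2.4 or later, if I understand the docs):

# Append the Secure attribute to every Set-Cookie header the site sends,
# so the session ID is never transmitted over an unencrypted connection
Header edit Set-Cookie ^(.*)$ "$1; Secure"

That wouldn't remove the need for HTTPS on the logged-in pages themselves, but it would at least keep the session ID from leaking if a logged-in member requests a plain-http page.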

I realize I may be overthinking this :)

pageoneresults

4:38 pm on Sep 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I want the community area's content to be indexable, yet I also want to protect it against session-stealing attacks.

I think you're going to have a lot more to do than what is being discussed here. It appears you are on Apache? You might find better information available in the Apache Forum. You may even want to request that this be moved to that forum so the right people see it.

ClickScientist

7:04 am on Sep 6, 2008 (gmt 0)

10+ Year Member



Ok, that makes sense. Can I just request that move right here, or do I need to send a message to an admin?

ClickScientist

6:35 pm on Sep 7, 2008 (gmt 0)

10+ Year Member



Apache users, do you have any thoughts on this?

What if I just used HTTPS on every page of the site, starting with the landing page? That would solve the security issue I'm concerned with. However, the site has good rankings on many competitive keyphrases, and those rankings are all for the HTTP URLs. Could I just use mod_rewrite to 301 every HTTP page to its HTTPS version? I wonder how Google would react to that.
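For what it's worth, what I have in mind is just a blanket rule like this (untested; assuming Apache with mod_rewrite, with our real domain in place of example.com):

RewriteEngine on
# Any request that did not arrive on the SSL port...
RewriteCond %{SERVER_PORT} !^443$
# ...gets permanently redirected to the same URL over https
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]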

jdMorgan

7:39 pm on Sep 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Changing your URLs is never good, and processing SSL for pages that don't really need it may bog down your server. Both of these problems come in varying degrees, ranging from minor nuisance to major disaster.

We have reports here from webmasters whose URL changes caused only a minor traffic bump for a few weeks, while others say they spent nine months relegated to page six of the results because of the sudden loss of "trust" in their URLs.

Server load is so difficult to predict that I don't try. Either model it (if you're a huge corporation) or test it, rather than guessing.

Figure out what you *need* to do before trying to move to the implementation phase -- it saves a lot of grief.

If the "community area" of the site can be reliably identified based solely on characteristics of its URLs, then it's a relatively simple matter to redirect all community-area requests to https and all other requests back to http. There may be some exceptions or additions to this simple rule, and it's best to identify all of them before starting to code.
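As a skeleton only (assuming, purely for illustration, that every community-area URL starts with /community/ -- substitute your real pattern):

RewriteEngine on
# Community-area pages requested over plain http -> redirect to https
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(community/.*)$ https://www.example.com/$1 [R=301,L]
# Anything else requested over https -> redirect back to http
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{REQUEST_URI} !^/community/
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Work out the exceptions first (the PayPal payment page, for instance, will need to stay on https and be excluded from that second block).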

Jim