Forum Moderators: phranque
I've lurked here for years, but finally found myself with a need for help that I couldn't get from searching the forums. If this has been posted before, please let me know, and I apologize.
I'm having some concerns about employing SSL on a client's site, and I'd like to ask for your thoughts.
The site is a membership site (some memberships are free, some cost money). Members can earn activity points for performing certain actions (posting in the forums, making a blog entry, referring a friend, etc). These points can be exchanged for goods and services, so it's important to protect them. I'm planning to install an SSL cert to protect the actual payment page (using Paypal Website Payments Pro), but I'm thinking it would be smart to also use the cert on the rest of the community area of the site, where points are earned and spent, in order to protect against session-stealing attacks that could cost members their points.
The community area, with a few exceptions, will be open to the public for reading, and I want the search engines to spider and index those pages as they normally would. My concern is that if I employ the cert, the engines may not index those pages, or if they do, may hit the site with a duplicate content penalty, because each page would be reachable via both http and https. Even though I would link everything internally using https, there's no way to stop someone from linking in via http and thus causing a second copy of the page to be read by the engines.
I believe the indexing is no longer a problem for Google; can anyone confirm this? What about the other major engines?
Does anyone have any thoughts on my dupe content concern?
Please note that this is a site reconstruction project. The site has been around for years and has good rankings in the major SEs, but the community area I mentioned is under construction and not yet open to the public. I mention this because there are no 301 issues with this part of the site.
Any other suggestions or input would be much appreciated. Thank you for helping! I'll do the same for you, now that I'm no longer just a lurker!
Thanks,
Bill
Does anyone have any thoughts on my dupe content concern?
Yes. Be sure to have logic in place to force http or https as appropriate; that will prevent the duplication. Google will index both fine. You do have to make sure, though, that the same address cannot be browsed over both http and https; that's where the challenges come in.
There are some other things you can do to prevent the indexing of https by using a separate robots.txt file that is served when https is requested. We typically keep all https out of indices.
I don't know how to force a separate robots.txt file when https is requested... could you elaborate on this?
Are you saying that this could be done with a combination of robots.txt and mod_rewrite?
Perhaps I can get around the problem by rewriting all links in pages that are served to logged-in members with https, while leaving them as http links for anyone else, in combination with the above methods.
On Apache, if you create a separate robots.txt file for the secure connection (let's call it robots_ssl.txt), then:

RewriteEngine on
# When the request comes in on the SSL port (443)...
RewriteCond %{SERVER_PORT} ^443$
# ...silently serve robots_ssl.txt in place of robots.txt
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
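The robots_ssl.txt served on the https side would then typically block crawling entirely, so the engines never index the secure copies. A minimal example of such a file:

```
User-agent: *
Disallow: /
```

With that pair in place, crawlers fetching https://example.com/robots.txt get the blocking rules, while the normal http robots.txt stays untouched.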
My understanding about protection against session-stealing attacks is that every page of the site from the login page onwards needs to be secured via HTTPS. In other words, an attacker must be prevented from reading the session ID at any point in the (logged-in) session. So I don't think it would do to have only some of the pages use HTTPS, because a logged-in user's session ID would be vulnerable when they request one of the HTTP pages.
I realize I may be overthinking this :)
I want the community area's content to be indexable, yet I also want to protect it against session-stealing attacks.
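For what it's worth, one complementary safeguard (my own suggestion, not something anyone above has said is in place): if the session cookie is flagged Secure, the browser will refuse to send it over plain http at all, so even a stray http link can't leak the session ID. Assuming Apache with mod_headers, and a PHP session cookie named PHPSESSID (both assumptions on my part), a sketch might look like:

```apache
# Hypothetical sketch: append the Secure flag to the session cookie
# so browsers only ever transmit it over https.
# Requires mod_headers; the cookie name PHPSESSID is an assumption.
Header edit Set-Cookie ^(PHPSESSID=.*)$ "$1; Secure"
```

This only helps, of course, if the cookie is set on an https page in the first place; it's a belt-and-braces measure alongside serving the logged-in area over https.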
I think you're going to have a lot more to do than what is being discussed here. It appears you are on Apache? You might find better information available in the Apache Forum. You may even want to request that this be moved to that forum so the right people see it.
What if I just used HTTPS on every page of the site starting with the landing page? This would solve the security issue I'm concerned with. However, the site has good rankings on many competitive keyphrases but they're all HTTP. Could I just mod_rewrite the HTTP to HTTPS and 301 all the HTTP pages to their HTTPS versions? I wonder how Google would react to that.
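Something like this, I assume (a sketch for .htaccess, assuming Apache with mod_rewrite enabled and the https variable available):

```apache
RewriteEngine on
# If the request did not arrive over https...
RewriteCond %{HTTPS} !=on
# ...301-redirect it to the same URL on https
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
```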
We have reports here from webmasters saying URL changes caused only minor traffic bumps for a few weeks, while others say they spent nine months relegated to page six of the results because of the sudden loss of "trust" in their URLs.
Server load is so difficult to predict that I don't try. It should either be modeled (if you're a huge corporation) or tested rather than trying to guess or predict.
Figure out what you *need* to do before trying to move to the implementation phase -- it saves a lot of grief.
If the "community area" of the site can be reliably identified based solely on characteristics seen in its URLs, then it's a relatively simple matter to redirect all "community area" requests to https, and redirect all non-community-area requests back to http. There may be some exceptions or additions to this simple rule, and it's best to identify all of them before starting to code this function.
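For example, if everything in the community area lived under a single path, say /community/ (that path is just an illustration), the rule pair might be sketched in .htaccess like this, assuming mod_rewrite:

```apache
RewriteEngine on
# Community-area requests arriving over plain http: send to https
RewriteCond %{HTTPS} !=on
RewriteRule ^community/ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# Everything else arriving over https: send back to http
RewriteCond %{HTTPS} =on
RewriteRule !^community/ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

The exceptions mentioned above (the payment page, for one) would each need their own condition before the catch-all "back to http" rule.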
Jim