Forum Moderators: Robert Charlton & goodroi
I have done the following:
What else can be done to remove these pages from the cache and avoid duplicate content penalties?
Thanks,
AjiNIMC
There's a section on Duplicate Content in Hot Topics, which is pinned to the top of the Google Search Forum home page. Take a look particularly at this thread:
HTTPS versus HTTP [webmasterworld.com] - one more duplicate area
mod_rewrite (on Apache)
I will check the topic. I can't do any canonicalization as I want both http and https to appear with the same content for a better user experience.
Let me check the topic before adding more to it.
I can't do any canonicalization as I want both http and https to appear with the same content for a better user experience.
If you insist on this, you won't have a better search engine experience. Same content on more than one url is essentially the definition of duplicate content. I'd read through all those Hot Topics articles on dupe content carefully.
...what is a better way of removing it from cache and listing?
Again... mod_rewrite.
Thanks for the replies.
AjiNIMC
URL removal only works with URLs that are already served (or not served) in a way that keeps them from being indexed (e.g. 404/410, ROBOTS NONE metas and such), and it only speeds up the process; it won't initiate it. They list robots.txt as one of the 'proper' signals, but... experience shows that it's much slower than in-page directives or NOT FOUND status codes.
If you don't like mod_rewrite, here's an html / programming solution...
1.: remove the robots.txt disallows so that Google takes notice of the changes...
2.: and add NOINDEX, NOARCHIVE or any other synonyms (dynamically)
3.: when Google crawls the pages they'll drop out
4.: if cache is updated /or indexed dupes don't fall out (they will tho) THEN use the URL removal tool
5.: to be on the safe side, and to save bandwidth, once they're out you could add the robots.txt disallows again
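To make step 2 concrete, here is a minimal sketch of adding the directive "dynamically". It assumes the same page template serves both versions, that the http version is the one you want kept in the index, and that your framework tells you the request scheme. The function names are hypothetical, not from any particular framework.

```python
import html


def robots_meta(scheme: str) -> str:
    """Return a robots meta tag for the given request scheme.

    Assumption: only the https copies should drop out of the index.
    """
    if scheme.lower() == "https":
        return '<meta name="robots" content="noindex, noarchive">'
    return ""


def render_head(title: str, scheme: str) -> str:
    """Build a <head> section, adding the robots tag only on https."""
    parts = [f"<title>{html.escape(title)}</title>"]
    meta = robots_meta(scheme)
    if meta:
        parts.append(meta)
    return "<head>" + "".join(parts) + "</head>"
```

Remember that the bot has to be able to crawl the https pages to see the tag, which is why step 1 comes first.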
...
I sometimes wonder (if they are not doing it already) why search engines can't understand the common mistakes a webmaster might make with duplicate content.
If you're serving up both versions of the content intentionally, how can they possibly know?
By adding the WMT preferences, Google has taken one step toward reading your mind... but you've still got to contend with site visitors who might want to link to you. If multiple canonical versions of your site are available, visitors who like your site are likely to link to the version of your site that they happen to see. This helps perpetuate the error. If you've successfully blocked search engine bots from the https version but visitors still see it, you risk splitting your inbound link votes.
Going back to your previous comment about user experience...
I can't do any canonicalization as I want both http and https to appear with the same content for a better user experience.
It seems what you really want is for the user to find your site whether they type in https or http. This is, in fact, what permanent redirection with mod_rewrite would accomplish. What it would change is that the incorrect version doesn't appear in the address window. With proper setup, pages that are supposed to be https would continue to be https, and pages that are supposed to be http would be displayed that way.
With a proper setup of DNS and mod_rewrite, you can even correct typos in the number of w's in "www," so "ww" or "wwww", for example, would be rewritten to "www". Etc....
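The redirect logic described above is normally written as mod_rewrite rules, but the decision itself can be sketched in a few lines of Python. This is only an illustration: `CANONICAL_HOST` and `SECURE_PREFIXES` are assumed values, and the sketch assumes the server only answers for hostnames that belong to the site (so any other host, including "ww" or "wwww" typos, is sent to the canonical one).

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the preferred hostname
SECURE_PREFIXES = ("/signup/",)     # assumption: paths that must be https


def canonical_url(url: str) -> str:
    """Return the single URL under which this page should be served."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Secure sections stay on https; everything else is served on http.
    scheme = "https" if path.startswith(SECURE_PREFIXES) else "http"
    return urlunsplit((scheme, CANONICAL_HOST, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the request needs a permanent redirect, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

The visitor still reaches the page whichever form is typed in; only the address shown afterwards changes, and search engines see one URL per page.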
If you're serving up both versions of the content intentionally, how can they possibly know?
It seems what you really want is for the user to find your site whether they type in https or http
AjiNIMC
match the content, if it is serving same content then consider it to be same without imposing penalties.
You're not getting a penalty when PR is split between two URLs with the same content. You're just getting back a true reflection of what your server is doing. Any search engine needs to rank by the url, and not just by the "content". This is the technical nature of the web. Clarifying the technical side of your website is important if you want to be "heard" clearly and unambiguously.
It's not about that, but about avoiding the IE warning when you shift from http to https and vice versa.
I assume you're talking about the warning message that says that the name on the security certificate is invalid or does not match the name on the site, and asks you to click yes if you want to proceed.
This warning is most likely an indicator that you have other canonical problems on your site as well. E.g., if you had a www canonical issue and had bought your certificate for [example.com...], the certificate would not be valid on [example.com...], and anyone accessing your secure pages without the www would get the message.
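A rough sketch of why the mismatch happens, assuming a certificate issued for www.example.com only. The function below is a simplified illustration of hostname matching, not the full rule set a browser applies.

```python
from typing import Iterable


def cert_matches(hostname: str, cert_names: Iterable[str]) -> bool:
    """Simplified check: does any name on the certificate cover this host?"""
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        name = name.lower()
        if name == host:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # e.g. ".example.com"
            label = host[: -len(suffix)] if host.endswith(suffix) else ""
            # A wildcard covers exactly one leftmost label, never the bare domain.
            if label and "." not in label:
                return True
    return False
```

With these assumptions, `cert_matches("example.com", ["www.example.com"])` is false, which is the situation that produces the warning for visitors who leave off the www.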
Here's a thread that discusses that in more detail:
SSL Certificate problems
SSL not showing correctly
[webmasterworld.com...]
This is another reason to clean up your canonical issues.
If it's not about the typing of https:// or http://, and all you want to do is speed the removal of these https:// pages from the indices, it's a little bit more work of course, but you could make the secure site a subdomain [secure.example.com...], place a new disallow-all robots.txt file on that, and make the [example.com...] or [example.com...] return a 404/410 page. This will remove those files a lot quicker.
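For what it's worth, a disallow-all robots.txt for the secure subdomain is only two lines, and you can sanity-check it with Python's standard `urllib.robotparser` before uploading. The hostname and the /signup/ path below are just examples.

```python
from urllib.robotparser import RobotFileParser

# Contents of the robots.txt to place on the secure subdomain only.
SECURE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(SECURE_ROBOTS_TXT, "Googlebot", "https://secure.example.com/signup/"))
```

This only confirms the file says what you think it says; how quickly a search engine acts on it is a separate matter.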
But it does sound as if you have other canonical issues that should also be fixed while you are doing this.
Vimes.
"warn if changing between secure and not secure mode"
Then when you click on an https link from http pages it says, "You are about to view pages over a secure connection" (Click here to continue or something)
Then when you click on http from https pages it says
"You are about to leave a secure Internet connection. It will be possible for others to view information you send"
For a customer who has trouble understanding what a browser is, this is confusing stuff. As a marketer I would like to avoid such things as much as possible. Since I am dealing with a lot of $ on the site, it is even more painful to hear customers tell you that your website is giving errors (which are basically these warnings).
AjiNIMC
If you go to IE >> Tools >> Option >> Advance and then check this option under security
AjiNIMC - You can't assume that any of your customers will go into IE and check or uncheck anything. We're suggesting you fix it on your server.
I feel your pain about how steep the learning curve on all this stuff is. You do need to do that canonicalization you're resisting. It's the only dependable way I know of to do it. I've also had no luck with the "https/ssl robots file" approach.
One thing I should add, btw... the images on your secure pages also need to be on a secure server. If your pages are on a secure server but their images aren't, that would also trigger the warning message.
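If you want to hunt down the offending images, a quick sketch like this can list every `<img>` on a page whose `src` is hard-coded to http://. It uses only Python's standard `html.parser`; the class and function names are made up for the example, and it only looks at images, not scripts or stylesheets.

```python
from html.parser import HTMLParser


class InsecureImageFinder(HTMLParser):
    """Collect <img> sources that are hard-coded to http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().startswith("http://"):
                self.insecure.append(value)


def insecure_images(page_html: str) -> list:
    """Return the http:// image URLs found in the given HTML."""
    finder = InsecureImageFinder()
    finder.feed(page_html)
    finder.close()
    return finder.insecure
```

Relative paths such as /img/logo.gif are not flagged, since they are fetched over whatever scheme the page itself uses.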
Example: people search for XYZ and land on http pages. When they click on /signup/ and it stays on http, there is no warning.
People search for XYZ and land on http pages. When they click on /signup/ and it goes to https, it prompts the above message, which sometimes scares the visitor into aborting.
I am in an urgent meeting, so I am not able to post in detail; I will do that later tonight.
Thanks,
AjiNIMC