|De-indexing duplicate content related to http vs https|
| 7:14 pm on Apr 10, 2013 (gmt 0)|
Due to duplicate http and https content, I have a redirect on every page that determines whether the page should be served over http or https,
and I am already seeing some changes in indexing.
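Roughly the kind of rule I mean, sketched for an Apache/mod_rewrite setup (the /secure/ path is just a placeholder, not my actual folder):

    RewriteEngine On

    # Placeholder: force https for pages under /secure/
    RewriteCond %{HTTPS} off
    RewriteRule ^secure/ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

    # Force everything else back to plain http
    RewriteCond %{HTTPS} on
    RewriteCond %{REQUEST_URI} !^/secure/
    RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]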
However, in one folder I have pages similar to those in the root, i.e. the content is the same but the menu, footer, etc. are different. My robots.txt states that these pages should not be spidered.
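The relevant part of the robots.txt is roughly like this (the folder name here is only a placeholder):

    User-agent: *
    Disallow: /similar-folder/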
When I check my indexed pages I see too many: these pages are indexed as both http and https, but the Google results state that no description is available because of robots.txt.
In order to get the https pages dropped more quickly, should I let Google spider them?
On the other hand, I suppose Google will sooner or later spider them and de-index the https pages anyway, but I suppose it takes longer since my robots.txt says not to index these pages.
| 9:00 pm on Apr 10, 2013 (gmt 0)|
Your robots.txt excludes Googlebot from crawling, but it doesn't say anything about indexing.
You will need to allow crawling to solve the protocol canonicalization problem.
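In practice (assuming the folder is blocked with a Disallow line like the one above), that means removing or commenting out the block so Googlebot can fetch those URLs, follow the 301 redirects, and drop the https duplicates; roughly:

    User-agent: *
    # Disallow: /similar-folder/   (removed so the redirects can be crawled and the https duplicates dropped)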
| 9:31 pm on Apr 10, 2013 (gmt 0)|
Thanks, I will do so.
I just hope Google doesn't see the pages as duplicate content;
they aren't 100% equal, but very similar.
| 9:34 pm on Apr 10, 2013 (gmt 0)|
|I just hope Google doesn't see the pages as duplicate content|
Even if they do, all that should happen is that they pick one version of the page to show in the results. No penalty, and no other hugely devastating issues any more; those have mostly been cleared up for quite some time now.
Obviously, it's almost always better to control which version is treated as the canonical one (the one shown in the results), so there's no confusion or glitches on their end and they show the one you want. But it's normally not a huge deal any more if you have two essentially identical pages on the same site; they just do their best to pick the best one to show people.
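One common way to signal that, alongside the 301 redirects already in place, is a rel="canonical" link element in the head of each page pointing at the preferred URL; a sketch with a placeholder address:

    <link rel="canonical" href="http://www.example.com/page.html">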
| 10:01 pm on Apr 10, 2013 (gmt 0)|
Thanks, I've just given Google permission.