Forum Moderators: Robert Charlton & goodroi
A year or two ago one site I work with had problems with https duplicates getting indexed. The origin of the problem was that legitimate https pages in the shopping cart were using the same templates as the rest of the site, which mostly used relative URLs for navigation.
The relative URLs meant that https pages were effectively linking to other pages as https too, so they'd get spidered as https.
When a page whose URL has unintentionally become "https-ified" is spidered, any relative links on it resolve to https too. That's how the cancer spreads and duplicate problems grow....
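The mechanics described above can be sketched with a quick check of how relative links resolve against a page's base URL (a minimal illustration; the example.com paths are hypothetical):

```python
from urllib.parse import urljoin

# A cart page that was crawled over https...
cart_page = "https://www.example.com/cart/checkout.aspx"

# ...links to the rest of the site with relative URLs,
# so those links resolve to https as well:
print(urljoin(cart_page, "/products/widgets.aspx"))
# https://www.example.com/products/widgets.aspx

# An absolute URL in the template would have pinned the scheme:
print(urljoin(cart_page, "http://www.example.com/products/widgets.aspx"))
# http://www.example.com/products/widgets.aspx
```

That first resolution is exactly how one https page quietly spawns https duplicates of the whole navigation.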
the other is configuring your server to respond with a 200 OK for non-canonical requests instead of 301 redirecting to the canonical url.

Could you please tell me why you said that? Or does it apply only to https:// non-canonical requests where the server returns a 200 response?
I'm guessing that you've got a shopping cart on your site, which is going to involve pages with the https protocol somewhere.
As phranque suggests, 301 redirecting all requests to the proper canonical form of your urls is the right way to handle the situation.
Won't that redirect the secure shopping cart URLs from https:// to http:// as well?
...have a list of URLs that should be served as https, and if URL requested is not in this list, to do a 301 redirect to http version, and vice versa (however, be careful to link internally to a correct version of the URL).
No... as aakk9999 suggests, you've got to decide which pages should be https (the specific secure shopping cart pages) and which should be http (most of the rest of your site), and make a list of which should be which.
There is a third way to address it, and this is to have a list of URLs that should be served as https, and if URL requested is not in this list, to do a 301 redirect to http version, and vice versa (however, be careful to link internally to a correct version of the URL)
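The list approach could be sketched along these lines (a hypothetical helper, not anyone's actual implementation; the paths in HTTPS_PATHS are made up):

```python
from urllib.parse import urlsplit, urlunsplit

# Paths that should only ever be served over https (hypothetical list)
HTTPS_PATHS = {"/cart/checkout.aspx", "/cart/payment.aspx"}

def canonical_redirect(url):
    """Return the 301 target if the requested URL uses the wrong
    scheme, or None if it is already canonical."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    wanted = "https" if path in HTTPS_PATHS else "http"
    if scheme == wanted:
        return None
    return urlunsplit((wanted, netloc, path, query, fragment))

print(canonical_redirect("https://www.example.com/page1.aspx"))
# http://www.example.com/page1.aspx  (https request for a non-secure page)
print(canonical_redirect("http://www.example.com/cart/checkout.aspx"))
# https://www.example.com/cart/checkout.aspx
```

And, as the post says, the internal links themselves should already point at the correct version so visitors and spiders rarely hit the redirect at all.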
But how do I redirect [example.com...] to http://www.example.com/page1.aspx
By the way, is it normal to configure the server to redirect a request for [example.com...] to http://www.example.com? Is that set up by default?
In my case the HTTPS pages (all within a folder) are blocked via robots.txt and meta robots as well.

How would you normally block all the https: requests through robots.txt? Is there a specific syntax for it?
it wouldn't be normal if www.example.com was intended to be secure content

Haha yeah, it never makes sense, does it? I thought there might be some shopping-cart-type pages, or pages requiring a secure login, on the website I am talking about. But there aren't any. So I can block all the https:// requests, right? Can you please explain to me how to start over?
you would ideally design your url structure so that you can easily distinguish secure and non-secure content, and then use mod_rewrite techniques (for apache), or whatever techniques your environment requires, to make sure all non-canonical requests are redirected to the canonical url.
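With a url structure where, say, all secure pages live under /cart/, the mod_rewrite rules might look something like this (a rough .htaccess sketch, not a tested config for any particular site; the /cart/ prefix is an assumption):

```apache
RewriteEngine On

# https request for anything outside /cart/ -> 301 to http
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^/cart/
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R=301,L]

# http request for anything inside /cart/ -> 301 to https
RewriteCond %{HTTPS} !on
RewriteCond %{REQUEST_URI} ^/cart/
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

This is the config-level equivalent of keeping a list of which URLs should be which, expressed as a path prefix instead of an explicit list.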
there are several recent discussions about robots.txt-excluded urls which appear in the index.
How would you normally block all the https: requests through Robots.txt? Is there a specific syntax for it?
User-agent: *
Disallow: /
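Note that a Disallow: / like that blocks the whole host, so it only works if it's served for https requests alone; the http version of the site needs its normal robots.txt. One common way to do that on Apache is to keep a second file and rewrite https requests for robots.txt to it (a sketch; the robots_ssl.txt filename is an assumption):

```apache
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]
```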
Can you please explain to me how to start over?
So it's no longer just the snippet where Google used to show the URL-only version of blocked content? Is it showing the complete page now?
A description for this result is not available because of this site's robots.txt - learn more [support.google.com].
I also got to know from the same forum that if we include that page in the sitemap, it will get crawled and indexed no matter whether we block it in robots.txt or not. Is that also true?