By the way, using the removal tool in this situation has caused a LOT of trouble for some sites, with both secure and regular URLs disappearing.
Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
[google.com...]
Ideally (and this is true for some domains I looked at recently) I would hope that Google will make the https: duplicates a Supplemental Result -- and then they will just gently fade away and not show up in the search results.
I'd like to think that they can sort this out for the average site without needing lots of people to go to extraordinary lengths. I'd like to.
Please confirm that if I simply replace my existing robots.txt with exactly the following:
.............................
For your http protocol (http://www.mysite.co.uk/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://www.mysite.co.uk/robots.txt):
User-agent: *
Disallow: /
....................................
then I will be banning robots from the https:// version of the site, and the existing pages will be removed from the index at the next crawl?
The http and https files are the SAME, in the same root folder. So how do I serve a different robots.txt to the http and https versions?
The certificate was installed on the whole site so that any page prefixed with https loaded as secure.
So how do you install a different robots.txt file on each "port"?
The result is zero rank and zero hits for that page as all links point to the http version.
Normally Google would not choose https over http for the same page, but they have recently, and it's caused us major problems.
The answer IS NOT to host secure pages on the same domain as the non-secure pages.
We have changed our structure now but are still trying to get rid of the https pages in the index so that the http can return!
Yes, I have found out that you can use ASP on IIS to do exactly the above:
Make the robots.txt file an ASP script. You can then have the script check whether the request came in over https or http and output the correct rules.
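For anyone who wants to try this, here's a minimal sketch of the idea in classic ASP (VBScript). On IIS, the HTTPS server variable reports "on" for secure requests; note that you'd also need IIS configured to run robots.txt through the ASP engine (a script mapping for that file or extension), so treat this as a starting point rather than a drop-in file:

<%@ Language="VBScript" %>
<%
' Serve protocol-specific robots.txt rules from one script.
' Assumes IIS is set up to process robots.txt via asp.dll.
Response.ContentType = "text/plain"

If Request.ServerVariables("HTTPS") = "on" Then
    ' Secure (https) request: keep all robots out.
    Response.Write "User-agent: *" & vbCrLf
    Response.Write "Disallow: /"
Else
    ' Regular (http) request: allow everything.
    Response.Write "User-agent: *" & vbCrLf
    Response.Write "Allow: /"
End If
%>

Googlebot should then see the Disallow rules only when it fetches the https URLs, while the http site stays fully crawlable.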