Forum Moderators: goodroi
I just inherited this huge site... The previous SEO person has an entry in the robots.txt to disallow /https*, which I am 99.9999999% sure will not work. I am guessing this only disallows www.mydomain.com/https* and mydomain.com/https* type URLs. I don't think you can disallow protocols or domains there.
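For the record, the entry looks something like the following, and as far as I know a Disallow rule only ever matches URL paths on the host the file was fetched from, not a protocol:

```
User-agent: *
Disallow: /https*
```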
One would think this would be a common request for which there would be a simple solution.
Suggestions?
PS: I'm running on a Microsoft platform: Windows Server 2003, IIS 6.0...
If yes, then your basic options are:
1. Move the transactional sections to a subdomain, e.g. secure.example.com, then use a robots.txt under that subdomain which excludes the bots.
2. Cloak the robots.txt file to serve a different file when the request is made via https (if you can rewrite the robots.txt file to a dynamic file this can work well; there's a sketch after this list).
3. In your application, add the appropriate code to include a meta robots noindex element on all pages served over https.
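To make option 2 concrete: the decision is just "if the request came in over https, serve a robots.txt that blocks everything; otherwise serve the normal one." On IIS 6 you would implement this as an ASP/ASPX script mapped to /robots.txt; the sketch below shows the same logic as a minimal Python WSGI app purely for illustration, and the rule contents are assumptions:

```python
# Dynamic robots.txt sketch: block crawlers on https, allow them on http.
# Illustrative Python/WSGI only -- on IIS 6 this would be an ASP/ASPX
# script mapped to /robots.txt. The rule contents here are assumptions.

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # normal file, served over http
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # blocking file, served over https

def robots_txt_app(environ, start_response):
    # wsgi.url_scheme is "https" when the request arrived over SSL
    body = BLOCK_ALL if environ.get("wsgi.url_scheme") == "https" else ALLOW_ALL
    data = body.encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(data)))])
    return [data]
```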
The last option (the meta noindex element) is often the easiest to set up; a sketch follows below.
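A minimal sketch of the noindex approach, assuming you can detect the scheme wherever your page head gets rendered. The helper name is made up; in an ASP.NET app the equivalent check would live in a master page or a base page class:

```python
# Emit a robots noindex tag only for pages served over https.
# The helper is hypothetical; it stands in for whatever renders <head>.

def robots_meta(is_https: bool) -> str:
    """Return the tag to embed in <head> for https responses, else nothing."""
    return '<meta name="robots" content="noindex">' if is_https else ""

print(robots_meta(True))   # -> <meta name="robots" content="noindex">
print(robots_meta(False))  # -> (empty string)
```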
If the crawler is indexing other pages on your site over HTTPS as well, you may need to permanently (301) redirect just the crawler back to HTTP to stop this, along the lines of the sketch below.
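A sketch of that crawler-only redirect, again as illustrative Python rather than the ASP you would actually use on IIS 6. The user-agent substrings are assumptions, and redirecting only bots is a form of cloaking, so treat it as a last resort:

```python
# Crawler-only 301 from https back to the http version of the same URL.
# Illustrative Python/WSGI sketch; the user-agent substrings are
# assumptions and the bot list is not complete.

BOT_TOKENS = ("googlebot", "slurp", "msnbot")  # hypothetical UA fragments

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    is_bot = any(token in ua for token in BOT_TOKENS)
    if environ.get("wsgi.url_scheme") == "https" and is_bot:
        # Rebuild the same path on plain http (query string omitted for brevity)
        location = ("http://" + environ.get("HTTP_HOST", "www.example.com")
                    + environ.get("PATH_INFO", "/"))
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    # ...otherwise serve the page as normal...
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>normal page</html>"]
```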