Forum Moderators: Robert Charlton & goodroi
I have been unable to find ANY info about how search engines treat HTTPS versions of sites. The HTTP version is the "main" version of my site.
I have now put a robots "noindex" meta element on all HTTPS pages. I fear that this will be misinterpreted and somehow hurt my SEO...
What do you say? What should I do? Google Sitemaps has no info on this and Google Help has no info on this at all.
I have asked tons of people but nobody has got a clue.
[edited by: tedster at 12:48 am (utc) on July 12, 2008]
Each port must have its own robots.txt file. In particular, if you serve
content via both http and https, you'll need a separate robots.txt file for each
of these protocols. For example, to allow Googlebot to index all http pages
but no https pages, you'd use the robots.txt files below.

For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
Technical details are available around the forums and depend on which server you are using, usually Apache (http://www.webmasterworld.com/forum92/) or Windows IIS [webmasterworld.com]. For instance, there's good information in this thread [webmasterworld.com], or you can find more via Site Search [webmasterworld.com].
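If it helps to see the idea independent of Apache or IIS, here is a rough, server-agnostic sketch in Python of "one robots.txt per protocol". It is only an illustration, not how any particular server does it: the names robots_body and app are made up for this example, and it assumes a WSGI setup where wsgi.url_scheme correctly reports "https" (behind a TLS-terminating proxy it may not).

```python
# Robots.txt bodies taken from the quoted Google example above.
ROBOTS_HTTP = "User-agent: *\nAllow: /\n"
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"


def robots_body(scheme):
    """Return the robots.txt text for the given URL scheme."""
    return ROBOTS_HTTPS if scheme == "https" else ROBOTS_HTTP


def app(environ, start_response):
    """Hypothetical WSGI app: answer /robots.txt with a scheme-specific file."""
    if environ.get("PATH_INFO") == "/robots.txt":
        # Assumption: the server sets wsgi.url_scheme to "https" for secure requests.
        scheme = environ.get("wsgi.url_scheme", "http")
        body = robots_body(scheme).encode("utf-8")
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    # Everything else is out of scope for this sketch.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

On Apache or IIS you would normally get the same effect with a rewrite rule or separate document roots rather than application code.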
Here's another approach that also works: Serve secure versions of your pages only from a dedicated subdomain, such as secure.example.com. Then use robots.txt to disallow spidering of that subdomain.
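To sanity-check the subdomain approach, you can simulate what a well-behaved crawler would conclude from each host's robots.txt. The sketch below uses Python's standard urllib.robotparser; the host www.example.com, the robots.txt bodies, and the helper name may_crawl are assumptions for illustration only.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by host name.
ROBOTS_BY_HOST = {
    "www.example.com": "User-agent: *\nDisallow:\n",       # nothing blocked
    "secure.example.com": "User-agent: *\nDisallow: /\n",  # everything blocked
}


def may_crawl(url, agent="Googlebot"):
    """Apply the robots.txt of the URL's own host, as a crawler would."""
    host = urlsplit(url).hostname
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(agent, url)


main_ok = may_crawl("http://www.example.com/products.html")
secure_ok = may_crawl("https://secure.example.com/checkout")
```

Because crawlers look up robots.txt per host, the rules on secure.example.com never affect the main site.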
"User-agent: *
Allow: /"
Are you sure that there is an "Allow" directive? You probably know best, but I was under the impression that you had to use "Disallow: " to "allow any"...
The Allow extension
Googlebot recognises an extension to the robots.txt standard called Allow. This extension may not be recognized by all other search engine bots, so check with other search engines in which you are interested to find out. The Allow line works exactly like the Disallow line. Simply list a directory or page that you want to allow. [google.com...]
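For what it's worth, both spellings can be compared with a parser that understands Allow. The sketch below uses Python's standard urllib.robotparser, so it shows that parser's behaviour only and says nothing certain about any particular search engine; the helper name allows is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://yourserver.com/page.html"

# "Allow: /" (the extension) and an empty "Disallow:" (the original standard)
# are both read here as "nothing is blocked".
with_allow = allows("User-agent: *\nAllow: /", url)
with_empty_disallow = allows("User-agent: *\nDisallow:", url)

# "Disallow: /" blocks the whole site.
blocked = allows("User-agent: *\nDisallow: /", url)
```

If you care about bots that may not support Allow, the empty "Disallow:" form is the more conservative way to say "allow everything".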