Forum Moderators: Robert Charlton & goodroi
After reading through the sources, I found that the way to exclude https:// is to add a robots.txt file. But what if both http:// and https:// are served from the same site, with one robots.txt common to both?
I assumed that blocking the https:// folder would keep those pages out of the index, but they got indexed anyway.
Now, to deindex the https:// pages, I have added the lines below to the common robots.txt:
Disallow: https://www.example.com/secure/a.asp
......
and I also added <meta name="robots" content="noindex,nofollow"> on all the pages that point [....]
Can anyone here tell me whether the method I have implemented to exclude https:// is right?
[edited by: tedster at 3:27 pm (utc) on Aug. 2, 2007]
[edit reason] moved from another location [/edit]
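One thing worth checking here: a Disallow value is matched against the URL path only, so a full URL like the one in the robots.txt above doesn't behave like a path rule. A quick sketch with Python's standard-library robotparser (using the example.com URL from the post) shows the difference:

```python
from urllib.robotparser import RobotFileParser

# Rule written as a full URL, as in the robots.txt quoted above
rp_url = RobotFileParser()
rp_url.parse([
    "User-agent: *",
    "Disallow: https://www.example.com/secure/a.asp",
])

# The same rule written as a path prefix
rp_path = RobotFileParser()
rp_path.parse([
    "User-agent: *",
    "Disallow: /secure/",
])

page = "https://www.example.com/secure/a.asp"
print(rp_url.can_fetch("*", page))   # True  - the full-URL rule does NOT block the page
print(rp_path.can_fetch("*", page))  # False - the path rule does
```

Note that a path rule can't distinguish http from https on the same hostname anyway, which is why the replies below serve a different robots.txt per scheme.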
The best practice is to install the secure certificate on a dedicated subdomain, such as secure.example.com. This also avoids having all your regular URLs resolve as https; historically that has caused duplicate URL problems in Google.
I wish I had known what you just said about using a subdomain, but these sites were set up years ago and it was just too much work to correct the problem.
I use a nofollow tag on every link I have to the secure pages, and I also use a rewrite rule so that any search-engine request over https gets a disallow-all robots.txt.
My robots.txt file is actually an aspx file, but it sends a .txt response to the engines. It has worked so far.
<% If Request.ServerVariables("HTTPS") = "off" Then 'not secure: serve the normal rules %>User-agent: *
Disallow: /admin/
Disallow: /bin/
Disallow: /class/
Disallow: /contentTemplates/
Disallow: /db/
Disallow: /panels/
Disallow: /poll/
Disallow: /articles/files
Disallow: /articles/
<%
Else 'secure: disallow everything
%>User-agent: *
Disallow: /
<%
End If
%>
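For readers not on ASP, the logic above can be sketched in Python as a plain function (a hypothetical helper, not the poster's actual code): serve the normal rules over http, and a disallow-all file over https.

```python
# Sketch of the ASP logic above (hypothetical helper, not the poster's
# actual code): normal rules over http, block everything over https.

NORMAL_RULES = """User-agent: *
Disallow: /admin/
Disallow: /bin/
Disallow: /class/
Disallow: /contentTemplates/
Disallow: /db/
Disallow: /panels/
Disallow: /poll/
Disallow: /articles/files
Disallow: /articles/
"""

BLOCK_ALL = """User-agent: *
Disallow: /
"""

def robots_txt(is_https: bool) -> str:
    """Return the robots.txt body appropriate for this request's scheme."""
    return BLOCK_ALL if is_https else NORMAL_RULES

print(robots_txt(False))  # normal rules
print(robots_txt(True))   # disallow everything
```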
As for <meta name="robots" content="noindex,nofollow"> — I don't think that's correct. Just use a nofollow tag and take out the other stuff; it's not needed, and I don't think it will be readable by a bot.
So now, if you ask for robots.txt over https, there's an alias (I remember seeing it in the Apache config, but I'm not sure) and it returns a different robots.txt than over http. (When I edit them, I have robots.txt and robots-secure.txt.)
But no sub-domain for the secure server; that was enough.
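If it helps anyone, that per-scheme split can be done in Apache with an Alias in the https virtual host. A rough sketch, assuming the robots-secure.txt filename mentioned above and hypothetical paths:

```apache
# In the https (port 443) virtual host only: requests for /robots.txt
# get the secure variant. Paths and ServerName are hypothetical.
<VirtualHost *:443>
    ServerName www.example.com
    DocumentRoot /var/www/html
    # SSLEngine, certificate directives, etc. omitted

    Alias /robots.txt /var/www/html/robots-secure.txt
</VirtualHost>
```

The plain http virtual host keeps serving the normal robots.txt from the document root, so the two schemes hand the crawlers different rules.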