Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

How to disallow https urls

5+ Year Member

Msg#: 3699828 posted 1:39 pm on Jul 16, 2008 (gmt 0)

I just realized that Google has indexed HTTPS URLs for the homepage of my site.

I am on an IIS server. Please give me proper robots.txt code that I can add to the existing robots.txt file.

I need to block bots from indexing the https URL of the homepage, and also redirect all aliases to http://www.example.com




WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Msg#: 3699828 posted 12:27 pm on Jul 17, 2008 (gmt 0)

Robots.txt might not be the solution here. I don't know your site configuration, but in the past the people who have asked this question were mirroring their robots.txt file on https & http. In that situation, blocking anything would block both. You also mention that you are interested in redirecting; you should look into using ISAPI Rewrite. That could resolve both issues for you.
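A sketch of what the ISAPI Rewrite rules might look like (ISAPI_Rewrite 3 uses mod_rewrite-compatible syntax on IIS; www.example.com stands in for your canonical host, and the exact variable names depend on your ISAPI_Rewrite version):

```apache
RewriteEngine on

# Redirect any https request to the http version of the same URL
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Redirect non-canonical host aliases to the canonical host
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

This addresses both of the original poster's issues at once: the https homepage and the host aliases each get a 301 to the canonical http URL.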

good luck


5+ Year Member

Msg#: 3699828 posted 3:16 pm on Aug 5, 2008 (gmt 0)

The best thing is to use a redirect. Use a 301 redirect to move https to http.
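On Apache, that 301 could be sketched like this in .htaccess (assuming www.example.com is the canonical host; note this disables https for the whole site, so only use it if you never serve secure pages):

```apache
RewriteEngine on

# Permanently redirect all https requests to the http version of the same URL
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```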



5+ Year Member

Msg#: 3699828 posted 7:40 am on Aug 8, 2008 (gmt 0)

What you need to do is serve up two different versions of robots.txt.

Create a second robots.txt file, robots_ssl.txt, and add entries to it to block all content.
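The blocking entries in robots_ssl.txt would be the standard block-everything pair:

```
User-agent: *
Disallow: /
```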

Then add the following lines to your .htaccess file (in the root of your webhosting). Note the dot in the filename is escaped so the pattern matches only the literal robots.txt:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

If you don't have an .htaccess file, create a new one (or use ISAPI_Rewrite on Windows, since .htaccess is for Apache, so be sure which server you are on). Be sure to put these two lines at the top of it:

Options +FollowSymLinks
RewriteEngine on
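Putting the pieces above together, the complete .htaccess sketch for Apache would be:

```apache
Options +FollowSymLinks
RewriteEngine on

# When the request arrives on the SSL port, serve the blocking file instead
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
```

With this in place, https://www.example.com/robots.txt silently serves the contents of robots_ssl.txt, while the http version is untouched.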


5+ Year Member

Msg#: 3699828 posted 1:18 pm on Aug 9, 2008 (gmt 0)

Yes, on Apache servers it is best to use .htaccess, but you cannot restrict every URL that way; a robots.txt file may be the best option for restricting particular URLs. I totally agree with raheel that you should check which server you are on before implementation.




5+ Year Member

Msg#: 3699828 posted 8:39 pm on Aug 12, 2008 (gmt 0)

I don't see any problem with Googlebot accessing your site via https. You probably have external or internal links to your homepage with an 'https' prefix. Having a second robots_ssl.txt looks like the best option if you really need to prevent https crawling... Don't forget Mediapartners-Google. And don't assume that robots.txt access rules are cleanly separated between http and https; Google might have bugs such that it won't crawl http if you restricted https...

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved