


How to disallow https urls

1:39 pm on Jul 16, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 18, 2007
posts:133
votes: 0


I just realized that Google has indexed https URLs for the homepage of my site.

[example.com...]
[example.com...]
http://www.example.com
http://www.example.com/index.aspx

I am on an IIS server. Please give me proper robots.txt code that I can add to the existing robots.txt file.

I need to block bots from indexing the https URL of the homepage and also redirect all aliases to http://www.example.com.

Thanks

12:27 pm on July 17, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month

joined:June 21, 2004
posts:3080
votes: 67


Robots.txt might not be the solution here. I don't know your site configuration, but in the past the people who have asked this question were serving the same robots.txt file on both https and http; in that situation, blocking anything would block it on both. You also mention that you are interested in redirecting; you should look into using ISAPI_Rewrite. That could resolve both issues for you.

good luck
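For reference, a minimal sketch of the kind of ISAPI_Rewrite rule being suggested, assuming ISAPI_Rewrite 3 (which accepts Apache mod_rewrite syntax in an .htaccess or httpd.conf file); www.example.com stands in for the real host:

RewriteEngine on
# any request that arrived over SSL...
RewriteCond %{HTTPS} on
# ...gets a permanent (301) redirect to the same path on plain http
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]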

3:16 pm on Aug 5, 2008 (gmt 0)

Full Member

5+ Year Member

joined:Sept 11, 2007
posts: 303
votes: 0


The best thing is to use a redirect. Use a 301 redirect to move https to http.

bilalseo
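If the https side is served by Apache instead of IIS, a sketch of the same 301, assuming a separate SSL virtual host for the site (keep the existing certificate directives; Redirect permanent preserves the requested path):

<VirtualHost *:443>
    ServerName www.example.com
    SSLEngine on
    # ... existing SSL certificate directives stay here ...
    # send every https request to the plain-http site with a 301
    Redirect permanent / http://www.example.com/
</VirtualHost>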

7:40 am on Aug 8, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 24, 2008
posts:54
votes: 0


What you need to do is serve two different versions of robots.txt.

Create a second robots.txt file, robots_ssl.txt, and add entries to it that block all content.

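A block-everything robots_ssl.txt can be as small as this (a sketch; loosen it if some https crawling should stay allowed):

# robots_ssl.txt - served in place of robots.txt on the https site
User-agent: *
Disallow: /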
Then add the following lines to your .htaccess file (in the root of your webhosting):
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

If you don't have an .htaccess file, create a new one (or use ISAPI_Rewrite on Windows; .htaccess is for Apache servers, so be sure which server you are on). Be sure to put these two lines at the top of it:

Options +FollowSymLinks
RewriteEngine on
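Since the original poster is on IIS, the same idea as a sketch for ISAPI_Rewrite 3, which accepts this Apache-style syntax in its own .htaccess/httpd.conf:

RewriteEngine on
# when robots.txt is requested on the SSL port, quietly serve robots_ssl.txt instead
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [NC,L]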

1:18 pm on Aug 9, 2008 (gmt 0)

Full Member

5+ Year Member

joined:Sept 11, 2007
posts:303
votes: 0


Yes, on Apache servers it is best to use .htaccess, but you cannot restrict every URL that way; a robots.txt file may be the best option for restricting particular URLs. I totally agree with raheel that you should check the server before implementation.

thanks,

bilal
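As a sketch of restricting only particular URLs in robots.txt (these paths are hypothetical, not from this thread):

User-agent: *
# only these specific paths are disallowed; everything else stays crawlable
Disallow: /checkout.aspx
Disallow: /private/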

8:39 pm on Aug 12, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 11, 2008
posts:50
votes: 0


I don't see any problem with Googlebot accessing your site via https. You probably have external or internal links to your homepage with an 'https' prefix. A second robots_ssl.txt looks like the best option if you really need to keep the https version out... Don't forget Mediapartners-Google. And don't assume that 'access rights' (robots.txt) are really kept separate between http and https; Google might have bugs such that it won't crawl http if you restricted https...
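To illustrate the Mediapartners-Google point: a block-everything robots_ssl.txt can still leave the AdSense crawler unrestricted by giving it its own record (an empty Disallow means "allow everything"):

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /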