
Forum Moderators: goodroi

How to disallow https urls

1:39 pm on Jul 16, 2008 (gmt 0)

5+ Year Member



I just realized that Google has indexed HTTPS URLs for the homepage of my site.

[example.com...]
[example.com...]
http://www.example.com
http://www.example.com/index.aspx

I am on an IIS server. Please give me proper robots.txt code that I can add to the existing robots.txt file.

I need to block bots from indexing the HTTPS URL of the homepage, and also redirect all aliases to http://www.example.com.

Thanks

12:27 pm on Jul 17, 2008 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Robots.txt might not be the solution here. I don't know your site configuration, but in the past the people who asked this question were mirroring their robots.txt file on https and http. In that situation, blocking anything would block both. You also mention that you are interested in redirecting; you should look into using ISAPI Rewrite. That could resolve both issues for you.

good luck

3:16 pm on Aug 5, 2008 (gmt 0)

5+ Year Member



The best thing is to use a redirect. Use a 301 redirect to move https to http.
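
A minimal sketch of such a redirect in Apache mod_rewrite syntax (ISAPI_Rewrite 3 on IIS is advertised as accepting the same directive style), with www.example.com standing in for the real host as in the original post:

RewriteEngine On
# send any request that arrives on the SSL port to the plain-http host with a permanent redirect
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Note that this redirects robots.txt as well; if you combine it with the separate robots_ssl.txt trick described in the next post, keep that robots.txt rule above this one so it is applied first.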

bilalseo

7:40 am on Aug 8, 2008 (gmt 0)

5+ Year Member



What you need to do is serve two different versions of robots.txt.

Create a second robots.txt file, robots_ssl.txt, and add entries to it that block all content.
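
A minimal robots_ssl.txt that blocks all compliant crawlers is just the standard two-line wildcard block:

User-agent: *
Disallow: /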

Then add the following lines to your .htaccess file (in the root of your web hosting):

# serve the blocking file when the request comes in on the SSL port
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

If you don't have an .htaccess file, create a new one (or use ISAPI_Rewrite on Windows; .htaccess is for Apache servers, so be sure which server you are on). Be sure to put these two lines at the top of it:

Options +FollowSymLinks
RewriteEngine on

1:18 pm on Aug 9, 2008 (gmt 0)

5+ Year Member



Yes, on Apache servers it is best to use .htaccess, but you cannot restrict every URL that way; a robots.txt file may be the best option to restrict particular URLs. I totally agree with raheel that you should check the server before implementation.
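
For instance, assuming you only wanted to keep a particular page such as /index.aspx (mentioned in the original post) out of the index, the robots.txt rule would be the usual Disallow line; keep in mind goodroi's caveat that if the file is mirrored, the rule applies on both http and https:

User-agent: *
Disallow: /index.aspx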

thanks,

bilal

8:39 pm on Aug 12, 2008 (gmt 0)

5+ Year Member



I don't see any problem with Googlebot accessing your site via https. You probably have external or internal links to your homepage with an 'https' prefix. Having a second robots_ssl.txt looks like the best option if you really need to prevent https crawling... Don't forget Mediapartners-Google. And don't assume that 'access rights' (robots.txt) are cleanly separated between http and https; Google might have bugs such that it won't crawl http if you restrict https...
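
If the site serves AdSense, the blocking robots_ssl.txt sketched above can carve out an exception for Mediapartners-Google; an empty Disallow for that agent permits everything, while all other bots stay blocked:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /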