
Sitemaps, Meta Data, and robots.txt Forum

    
How to disallow https urls
spiritualseo
1:39 pm on Jul 16, 2008 (gmt 0)

I just realized that Google has indexed https URLs for the homepage of my site.

https://www.example.com
https://www.example.com/index.aspx
http://www.example.com
http://www.example.com/index.aspx

I am on an IIS server. Please give me the proper robots.txt code to add to my existing robots.txt file.

I need to block bots from indexing the https URL of the homepage, and also redirect all aliases to http://www.example.com

Thanks

 

goodroi
12:27 pm on Jul 17, 2008 (gmt 0)

Robots.txt might not be the solution here. I don't know your site configuration, but in the past the people who have asked this question were mirroring their robots.txt file on https and http. In that situation, blocking anything would block it on both. You also mention that you are interested in redirecting; you should look into ISAPI rewrite, which could resolve both issues for you.
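
A rough sketch of the kind of rule set being described here, assuming ISAPI_Rewrite 3 (which accepts Apache mod_rewrite-style syntax) and using www.example.com as a stand-in for the real host:

RewriteEngine on
# Anything arriving on the SSL port gets a permanent redirect to the plain-http host
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Canonicalize host aliases as well, e.g. example.com -> www.example.com
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]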

good luck

bilalseo
3:16 pm on Aug 5, 2008 (gmt 0)

The best thing is to use a redirect: a 301 redirect to move https to http.

bilalseo

Raheel
7:40 am on Aug 8, 2008 (gmt 0)

What you need to do is serve two different versions of robots.txt.

Create a second robots.txt file, robots_ssl.txt, and add entries to it that block all content.

Then add the following lines to your .htaccess file (in the root of your web hosting):
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt

If you don't have an .htaccess file, create a new one (or use ISAPI_Rewrite on Windows, since .htaccess is for Apache servers, so be sure which server you are on). Be sure to put these two lines at the top of it:

Options +FollowSymLinks
RewriteEngine on
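
For reference, the blocking file itself can be as short as this (a sketch; robots_ssl.txt is just the filename used above):

# robots_ssl.txt -- served in place of robots.txt for requests on port 443
User-agent: *
Disallow: /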

bilalseo
1:18 pm on Aug 9, 2008 (gmt 0)

Yes, on Apache servers it is best to use .htaccess, but if you do not want to restrict all URLs that way, a robots.txt file may be the best option for restricting particular URLs. I totally agree with Raheel that you should check which server you are on before implementation.
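
As a minimal illustration of blocking only particular URLs with robots.txt (the /index.aspx path is taken from the original post; remember that each protocol/host combination serves its own robots.txt, which is the crux of this thread):

# block specific paths only, not the whole site
User-agent: *
Disallow: /index.aspx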

thanks,

bilal

Funtick
8:39 pm on Aug 12, 2008 (gmt 0)

I don't see any problem with Googlebot accessing your site via https. You probably have external or internal links to your homepage with an 'https' prefix. Having a second robots_ssl.txt looks like the best option if you really need to prevent https crawling... Don't forget Mediapartners-Google. And don't assume that 'access rights' (robots.txt) are really tied to http vs. https; Google might have bugs such that it won't crawl http if you restrict https...
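
Following up on the Mediapartners-Google point, a sketch of a robots_ssl.txt that blocks everything except the AdSense crawler (whether to let it crawl the https copy at all is a judgment call):

# robots_ssl.txt: allow the AdSense crawler, block everyone else
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /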
