Forum Moderators: goodroi


robots.txt and https


AzDude

3:43 am on May 15, 2006 (gmt 0)

10+ Year Member



I have an order page on my site that uses https://. The rest of my pages are http://.
I hardcoded the link to my order page [orderpage.com...] so when users click on the link they go to the secure page.

Google is following those links and then crawling back to my other pages, so when I do site:www.mypage.com I am getting duplicate pages, one http and one https version.

Can I fix this using robots.txt and tell Google to just not go to that order page, or should I just hardcode the http:// into all the links from the order page to the other pages?

If I use robots.txt to disallow, is this the correct syntax?
User-agent: *
Disallow: /orderpage.com

Thanks in advance for your help.

Dijkgraaf

8:56 pm on May 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, the disallow syntax for the order page is correct.
Except it is now too late, as the bots already know about the https:// URLs.
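
For reference, the usual form looks like this (assuming your order page lives at a path like /orderpage.html; substitute your real path):

User-agent: *
Disallow: /orderpage.html

Note that Disallow matches against the URL path on your server, not a domain name, so make sure the value is the path of the order page.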

What you may have to do is have a dynamic robots.txt (i.e. get your web server to run requests for robots.txt through your scripting language, or use a URL rewrite), and when robots.txt is requested over https://, serve a version that says:

User-agent: *
Disallow: /
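
Something along these lines in Python would do it (a minimal sketch, not your actual setup: it assumes a CGI environment where the server sets the HTTPS environment variable for secure requests, a rewrite rule that maps /robots.txt to this script, and /orderpage.html as a placeholder path):

#!/usr/bin/env python
# Sketch of a dynamic robots.txt served via CGI.
import os

# robots.txt must be served as plain text
print("Content-Type: text/plain")
print()

if os.environ.get("HTTPS", "off").lower() == "on":
    # Request came in over https:// -- block the whole secure mirror
    print("User-agent: *")
    print("Disallow: /")
else:
    # Normal rules for the http:// site; the path is a placeholder
    print("User-agent: *")
    print("Disallow: /orderpage.html")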

The other option is to reconfigure your web server so that the https:// site points to a separate folder, and to put your order page in there with its own robots.txt.
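
A rough sketch of that idea in Apache configuration (assuming Apache with mod_ssl; the folder and certificate paths are placeholders):

<VirtualHost *:443>
    ServerName www.mypage.com
    DocumentRoot /var/www/https-root
    SSLEngine on
    SSLCertificateFile /path/to/cert.pem
    SSLCertificateKeyFile /path/to/key.pem
</VirtualHost>

Then /var/www/https-root holds only the order page plus a robots.txt containing "User-agent: *" and "Disallow: /", so nothing on the https:// host gets crawled.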

But yes, do change the links on your order page as well so they use the fully qualified URL with [....]
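
For example (the domain and page name here are placeholders for your real ones):

<a href="http://www.example.com/products.html">Continue shopping</a>

That way a bot that reaches the https:// order page follows absolute http:// links back out, instead of staying on the https:// host and picking up duplicate URLs.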