| 1:32 pm on Nov 21, 2009 (gmt 0)|
Usually, http and https are set up as two different 'servers' or 'accounts' in your web hosting. If this is the case, then put a robots.txt file into your https/SSL server with:
# Disallow all robots from fetching all resources
User-agent: *
Disallow: /
Another option is to detect unwanted requests for https and redirect them using a 301-Moved Permanently redirect to the canonical URL using the http protocol instead.
How you'd do that depends on your server type and version, your coding skills, and your preferences.
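As a rough, server-agnostic illustration of the redirect idea, here is a minimal Python sketch that decides whether an https request should get a 301 to its http equivalent. The function name `canonical_redirect`, the `secure_prefixes` list, and the example.com URLs are all hypothetical, and real setups would more likely do this in the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, secure_prefixes=("/checkout", "/account")):
    """Return (301, headers) pointing at the http URL, or None if no redirect applies.

    secure_prefixes is an assumed list of paths that must stay on https.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already on the canonical protocol
    if parts.path.startswith(tuple(secure_prefixes)):
        return None  # genuinely secure page: leave it on https
    netloc = parts.netloc
    if netloc.endswith(":443"):
        netloc = netloc[:-4]  # drop the default https port
    target = urlunsplit(("http", netloc, parts.path, parts.query, parts.fragment))
    return 301, [("Location", target)]
```

Pages that really need SSL are excluded here so that customers are never bounced off a secure page.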
Why did Google index those URLs? Because you allowed it to do so, and someone, somewhere linked to the https 'version' of your site -- either accidentally or maliciously.
| 9:40 pm on Nov 21, 2009 (gmt 0)|
I force all my mixed SSL/non-SSL sites to non-SSL with the Canonical tag. Seems to work.
I also detect the user-agent and if it's not obviously a browser I kill SSL links. As far as I'm concerned my sites' SSL pages should not be visible until the ordering process begins, so they are of no concern to SEs.
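For anyone wondering what forcing non-SSL with the canonical tag might look like when pages are generated by a script, a minimal Python sketch follows. The helper name `canonical_link_tag` and the example URL are made up for illustration; this is not the poster's actual code.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url):
    """Build a <link rel="canonical"> tag that always points at the http URL."""
    parts = urlsplit(requested_url)
    # Keep host, path and query; force the scheme to http and drop any fragment.
    canonical = urlunsplit(("http", parts.netloc, parts.path, parts.query, ""))
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)
```

The tag goes in the page's head, so the same page served over https still names its http address as the preferred one.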
| 4:14 pm on Nov 23, 2009 (gmt 0)|
Thanks, guys, for the great and useful replies. :) Thanks a lot.
| 12:43 pm on Nov 30, 2009 (gmt 0)|
I don't like the idea of forcing redirects from SSL to non-SSL, even for spiders, because they will then index the page with http, while secure scripts have to run in https and should have specific code to verify this.
In other words, if I run a store, I don't want a customer to reach the create-account page over http. One way around it is to use the noindex/nofollow meta tags and to add the rel=nofollow attribute to the links that point to the various SSL pages I don't want SEs to index. Or use plain text instead of links so spiders cannot follow them.
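The markup involved in that approach could be generated along these lines. This is only a sketch in Python; the constant `ROBOTS_META`, the helper `nofollow_link`, and the example URL are hypothetical names chosen for illustration.

```python
from html import escape

# Meta tag for the <head> of SSL pages that should stay out of the index.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'


def nofollow_link(href, text):
    """Render a link that asks spiders not to follow it."""
    return '<a href="%s" rel="nofollow">%s</a>' % (
        escape(href, quote=True),
        escape(text),
    )
```

For example, `nofollow_link("https://www.example.com/create-account", "Create account")` gives a link customers can still click while asking spiders to leave it alone.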
| 6:20 pm on Nov 30, 2009 (gmt 0)|
Thanks, guys. The problem is fixed by adding two different robots.txt files. I used one robots.txt file with the allow option, and the other tells spiders not to index pages with https://. This can also be resolved with a single robots.txt file, but to avoid the bother I just used two different robots.txt files. :)
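Where http and https share one document root, the same two-file result can be approximated by choosing the robots.txt body per request. The Python below is a sketch only; `robots_txt_for` and the two constants are hypothetical names, and it assumes the application can see which scheme the request arrived on.

```python
# Empty Disallow: everything may be crawled (the "allow" file for http).
ROBOTS_HTTP = "User-agent: *\nDisallow:\n"

# Disallow: / blocks the whole https side.
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"


def robots_txt_for(scheme):
    """Pick the robots.txt body for the scheme the request arrived on."""
    return ROBOTS_HTTPS if scheme.lower() == "https" else ROBOTS_HTTP
```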
| 10:01 pm on Nov 30, 2009 (gmt 0)|
Enigma1: Note my second paragraph.
Basically, on my sites only sensitive pages (eg payment) are fed via SSL. Any other page (eg product, "about") is publicly visible and fed via non-SSL and available to visitors and SE bots.
Exceptions are T&C, AUP and suchlike, which I feel are seldom any business of SEs. Hence I remove links to those pages entirely, and to (eg) carts and payment pages, if I sniff an SE bot IP - in fact anything that does not seem to be a browser.
Customers are switched between SSL/non-SSL according to page sensitivity so should never see (eg) a standard product page via SSL and vice versa.
The canonical is to ensure that if some toolbar (whatever) feeds an SSL page to an SE and it tries to follow it with a bot then the page is correctly reassigned to non-SSL. If the bot tries to read a cart or payment page it gets fed a 405 or similar. Ditto for most contact forms.
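A simplified Python sketch of the "feed bots a 405 on sensitive pages" rule is shown below. It checks the user-agent string rather than the IP sniffing described above, and the token lists, path prefixes, and function names are assumptions for illustration. User-agent strings are easy to fake, so treat this as a first filter only.

```python
BROWSER_TOKENS = ("mozilla", "opera")
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")
SENSITIVE_PREFIXES = ("/cart", "/payment", "/contact")  # assumed site layout


def looks_like_browser(user_agent):
    """Very rough check: a browser-style token and no bot-style token."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BOT_TOKENS):
        return False
    return any(token in ua for token in BROWSER_TOKENS)


def status_for(path, user_agent):
    """Return 405 for non-browsers on sensitive pages, otherwise 200."""
    if path.startswith(SENSITIVE_PREFIXES) and not looks_like_browser(user_agent):
        return 405
    return 200
```

Ordinary product pages stay readable to everyone, so only cart, payment and contact pages are closed to anything that does not look like a browser.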
You cannot bet on SEs getting it right. Some of them (alright, ALL of them) are sometimes very invasive and one has to resort to "firewall" and other methods of rebuttal.