Http: vs Https:

What are the best ways to stop crawlers?

         

thaedge

4:12 pm on Feb 16, 2006 (gmt 0)

10+ Year Member



I have been digging around and have seen some answers, but nothing that convinced me of the best way to approach this.

Question: How best to stop crawlers from following Https: and getting dupe content?

We are currently on IIS 5 and Windows 2000 and are moving to Apache, but I wanted to see what everyone thinks the best solution is for each. Is a mod_rewrite rule what works best, or is there a better way?

Thanks to all who offer ideas for my wandering train of thought.

webdoctor

6:37 am on Feb 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Question: How best to stop crawlers from following Https: and getting dupe content?

This probably isn't the answer you're looking for, but most sites achieve this by simply not offering the same content through both http and https.

Do you have a need to offer the same content both ways? Or is your server simply configured to do so?

incrediBILL

7:54 am on Feb 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My question would be: how are they getting to your https server in the first place?

All the ecommerce sites I've ever built only engage https after the "proceed to checkout" button on the cart page is clicked, and the cart page plus all subsequent pages are blocked in robots.txt. So, theoretically, your scenario should never happen unless you have https links elsewhere.
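To make the robots.txt part concrete, here is a minimal sketch that checks which URLs a well-behaved crawler would be allowed to fetch. The rules, the /cart/ and /checkout/ paths, and the www.example.com host are all assumptions for illustration, not anyone's actual setup. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the /cart/ and /checkout/ paths are
# assumptions chosen for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""


def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/cart/view",
        "https://www.example.com/checkout/payment",
        "http://www.example.com/catalog/widget",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

With rules like these, the cart and checkout pages are off limits to compliant crawlers, while the catalog pages stay crawlable. That only helps if nothing else links to the catalog over https.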

The other way to 'fix' the problem is to install a redirect in the catalog pages that checks whether they are being requested via https and, if so, redirects to the exact same page over http.
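A rough sketch of that redirect check, written in Python only to show the logic. On Apache the same idea would normally be a rewrite rule rather than application code. The function name, the SECURE_PREFIXES list, and the example URLs are hypothetical, and the sketch assumes the site runs on the default ports.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical path prefixes that must stay on https (cart and checkout).
SECURE_PREFIXES = ("/cart/", "/checkout/")


def http_redirect_target(url):
    """Return the http URL to redirect to, or None if no redirect is needed.

    Assumes default ports, so the host part is carried over unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already plain http, nothing to do
    if parts.path.startswith(SECURE_PREFIXES):
        return None  # cart and checkout pages legitimately use https
    # Same host, path, and query string; only the scheme changes.
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(http_redirect_target("https://www.example.com/catalog/widget?id=7"))
    print(http_redirect_target("https://www.example.com/checkout/payment"))
    print(http_redirect_target("http://www.example.com/catalog/widget"))
```

When the function returns a URL, the page would answer with a redirect to it, most likely a permanent (301) one so the crawler keeps only the http version. When it returns None, the page is served as usual.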