Forum Moderators: goodroi
Question: How best to stop crawlers from following HTTPS links and picking up duplicate content?
We are currently on IIS 5 and Windows 2000 and are moving to Apache, but I wanted to see what everyone thinks the best solution is for each. Is a mod_rewrite rule what works best, or is there a better way?
Thanks to all those that provide ideas for my wandering train of thoughts and ideas.
This probably isn't the answer you're looking for, but most sites avoid the problem by simply not offering the same content over both HTTP and HTTPS.
Do you have a need to offer the same content both ways? Or is your server simply configured to do so?
All the ecommerce sites I've ever built only switch to HTTPS after the "proceed to checkout" button on the cart page is clicked, and the cart page plus all subsequent pages are blocked in robots.txt. So in theory your scenario should never happen unless you have HTTPS links to the catalog pages somewhere else.
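To make the robots.txt part concrete, here is a rough sketch of what such rules might look like and a quick way to check them with Python's standard-library robots parser. The /cart and /checkout paths are made up for illustration; substitute whatever your cart and checkout URLs actually are.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: the /cart and /checkout paths are
# assumptions for this example, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With these rules, is_crawlable("/catalog/widget") is allowed while is_crawlable("/cart/view") is not. Keep in mind that robots.txt only asks well-behaved crawlers to stay away; it does not stop anyone from requesting the pages.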
The other way to 'fix' the problem is to install a redirect in the catalog pages that checks whether they are being requested via HTTPS and, if so, redirects to the exact same page over HTTP.
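The redirect check could look something like the sketch below. It is only the decision logic, written in Python for illustration: the function name and the SECURE_PREFIXES list are made up, and it assumes the site runs on the default ports. You would still need to wire it into whatever your pages or server actually use and send the result as a permanent (301) redirect.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical paths that are supposed to stay on HTTPS (cart, checkout).
SECURE_PREFIXES = ("/cart", "/checkout")


def http_redirect_target(url):
    """Return the HTTP URL to redirect to, or None if no redirect is needed.

    Assumes default ports; an explicit ":443" in the URL would be carried
    over unchanged, so handle that separately if your site uses it.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already on HTTP, nothing to do
    if parts.path.startswith(SECURE_PREFIXES):
        return None  # checkout flow must remain on HTTPS
    # Same host, path, and query string; only the scheme changes.
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))
```

For example, a request for https://example.com/catalog/widget?id=7 would be sent to http://example.com/catalog/widget?id=7, while anything under /cart or /checkout is left alone.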