Does anyone know whether MSN and Yahoo will spider a secure page (https://)?
Yes they will.
The question now becomes: why? There are additional processes involved with https compared to http. There are also many other challenges in promoting https links; for one, they are not natural.
We typically recommend that you serve a robots.txt file for all https requests that Disallows the entire site. You really don't want anything that requires a secure protocol getting indexed. At least that is how we look at it. Login pages serve no purpose in the equation, and neither do most other pages that qualify for https.
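To make that recommendation concrete, here is a minimal sketch in Python of what a "Disallow everything" robots.txt looks like and how a well-behaved crawler would read it. The file contents and the example.com URLs are assumptions for illustration, and the standard library's urllib.robotparser is only standing in for whatever parser MSN or Yahoo actually use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of the robots.txt that is served only over https.
HTTPS_ROBOTS = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Every path is off limits under the https version of the file.
    print(allowed(HTTPS_ROBOTS, "https://www.example.com/login"))
```

The same helper can be pointed at the http version of the file to confirm that the regular site stays crawlable.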
Talk to us. Why are you redirecting to https?
We typically recommend that you serve a robots.txt file for all https that Disallows the entire site.
How would a bot know to retrieve the https version of robots.txt?
Since the contents of the robots.txt file make no distinction between the protocols used, doesn't it seem rather dangerous to offer a version of robots.txt that makes your entire site off limits?
These are the pages we're redirecting to (from our homepage) using the 301 mechanism.
However, timing-wise, the canonical tag was agreed on shortly after this. It was the ultimate solution to the problem, since the tag allows you to use an absolute URL. We loaded that in, and now ALL the products are getting moved over to [....] My suggestion: use the new canonical tag, as Google says it will work like a redirect, plus it protects you from duplicate content issues.
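For anyone who has not seen the tag yet, here is a rough Python sketch of generating it. The helper name canonical_link and the example.com URL are made up for illustration; the only point is that the href carries an absolute URL, protocol included, so the http and https copies of a page can both point at one preferred version.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link(url: str) -> str:
    """Build a rel="canonical" link element for an absolute URL."""
    parts = urlsplit(url)
    # Insist on scheme and host so the tag names one specific protocol version.
    if not (parts.scheme and parts.netloc):
        raise ValueError("canonical URL should be absolute")
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical product page; the element belongs in the page's <head>.
    print(canonical_link("http://www.example.com/product?id=1&c=2"))
```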
How would a bot know to retrieve the https version of robots.txt?
Through a little bit of scripting that looks like this for us folks on Windows...
RewriteCond %HTTPS ^on$
RewriteRule /robots.txt /robots.https.txt [I,O,L]
That's about as far as I can take it. I only know enough to cut and paste and double check to be sure everything is doing what it should be doing. And it seems to be working just fine. ;)
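On the question of how a bot finds the https version: crawlers request robots.txt separately for each protocol and host, so a bot about to fetch an https page asks for https://yourhost/robots.txt, and the rule above simply answers that request with a different file. Here is a rough Python model of both halves. The function names are hypothetical, and the exact-match, case-insensitive comparison is an assumption about how the rule behaves, not a port of the rewrite engine.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for `page_url`."""
    parts = urlsplit(page_url)
    # Same scheme and host as the page; path fixed to /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_file(scheme: str, path: str) -> str:
    """Model the rewrite: swap in /robots.https.txt for https requests only."""
    if scheme.lower() == "https" and path.lower() == "/robots.txt":
        return "/robots.https.txt"
    return path  # http requests and every other path are left alone


if __name__ == "__main__":
    url = robots_url("https://www.example.com/cart/checkout?step=2")
    print(url)
    print(robots_file(urlsplit(url).scheme, urlsplit(url).path))
```

Because the http request for /robots.txt is never rewritten, the Disallow-all file is only ever seen by bots crawling https URLs, which is why the regular site does not go off limits.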