Forum Moderators: open
I have a site with some CGI scripts that use https. Yahoo has picked up a link from somewhere and thinks I have two sub-domains, http://www.example.com and [example.com....]
When I try to delete [example.com...] in SiteExplorer, it then wants to delete all of my site, i.e. all of the pages that use http as well as the [example.com...] index.
As a result, Yahoo seems to be applying a duplicate-content penalty, and my site, which used to be at #1, is now around #50 for the main target keyword term.
Is there a simple 301 redirect solution that will sort this out without harming my ranking elsewhere (I'm #2 on Google for this term), or am I better off letting the Yahoo listing wither on the vine?
Many thanks for any pointers.
Cheers
Sid
I've had this exact same problem for over a year. I've also got a robots.txt file on my https site that denies indexing, yet Yahoo still indexes the https pages as a sub-domain.
As for a redirect, I have had a server-wide 301 redirect from non-www to www for years now, and Yahoo still indexes some of my URLs with the non-www prefix. Yahoo's handling of 301 redirects is very poor in my experience; they seem to display URLs taken from links rather than what the server is telling the bot to do.
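For reference, a server-wide non-www to www redirect like the one described is often done in Apache along these lines (just a sketch; example.com stands in for the real domain):

```
RewriteEngine On
# Redirect any request for the bare domain to the www host with a 301
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```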
The only solution I can suggest is to try to locate the offending page that links in to the https site.
I'd greatly appreciate any help if anyone else has detailed information on how to remove these, as Yahoo still has a strong following in some parts of the world.
Vimes.
Best practices to avoid this problem:
1) When linking from a secure page to a non-secure page, use a canonical URL, e.g. <a href="http://example.com/non-secure-page">, rather than page-relative or server-relative links (don't use <a href="non-secure-page">, <a href="../non-secure-page"> or <a href="/pages/non-secure-page"> )
2) When linking from a non-secure page to a secure page, again use a canonical URL, e.g. <a href="https://example.com/secure-page">
3) Server-side, detect non-secure (i.e. HTTP) access requests to secure pages, and redirect them to change the protocol to HTTPS.
4) Server-side, detect secure (i.e. HTTPS) access requests to non-secure pages, and redirect them to change the protocol to HTTP.
5) If your secure and non-secure pages are actually stored on the server in different filespaces, then use robots.txt files to tell search engines to stay away from https unless it's appropriate.
Points 3 and 4 are easy if you tag your secure and non-secure page URLs in some way, for example, making them appear to be in separate directories under your site root. This makes determining which protocol should be used to access them very easy using Apache mod_rewrite or ISAPI Rewrite on IIS.
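Points 3 and 4 might be sketched in Apache mod_rewrite along these lines (an illustration only: www.example.com and the /secure/ directory prefix are assumptions, to be replaced with your own host and path scheme):

```
RewriteEngine On
# Point 3: plain-HTTP request for a secure page -> redirect to HTTPS
RewriteCond %{HTTPS} !=on
RewriteRule ^/secure/(.*)$ https://www.example.com/secure/$1 [R=301,L]
# Point 4: HTTPS request for a non-secure page -> redirect to HTTP
RewriteCond %{HTTPS} =on
RewriteCond %{REQUEST_URI} !^/secure/
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```

The key design choice is the one Jim describes: once the URL itself tells you which protocol a page belongs under, the rewrite conditions reduce to a simple prefix test.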
Jim
Vimes, I've been through with a fine-tooth comb and did find a couple of relative links that were in effect pointing to [....] I've cleaned these out.
JD, I hoped that you might pick up on this thread. From what you have said, I think my best solution might be the server-side detection that you detail in point 4. I have a snippet included in my Perl CGI files like so:
##################################
# Redirect plain-HTTP requests for this script to its HTTPS equivalent
my $inboundurl = $ENV{'SCRIPT_URI'};
if ($inboundurl =~ m/^http:/) {
    $inboundurl =~ s/^http:/https:/;
    print "Status: 301 Moved Permanently\n";
    print "Location: $inboundurl\n\n";
    exit;
}
##################################
This is easy in the Perl scripts, but until recently it would have been impossible (I think) on what were static HTML files in the rest of the site. I have recently started to change my static pages over, using the instructions that you gave elsewhere to have .html files parsed as PHP, mainly to do simple includes. I've not changed all of my pages yet but will be doing so in the next couple of weeks, so I am now in a position to include some code to detect the request protocol in my pages.
I hope that you don't mind me asking a couple of cheeky questions. Please feel free to tell me to find out for myself, but if you have answers to hand they would be very much appreciated.
Can you suggest some PHP to do the opposite of what I am doing above?
Do you think that Yahoo might misinterpret this as cloaking?
Many thanks
Sid
PS I've pointed out to Yahoo that http and https are just different protocols. They suggested that this was a bug in SiteExplorer.
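Picking up Sid's first question: here is a minimal PHP sketch of the opposite check, i.e. point 4, redirecting an HTTPS request for a non-secure page back to HTTP. The host name www.example.com is an assumption (substitute your own), and this assumes the server sets $_SERVER['HTTPS'] to 'on' for secure requests, as Apache does:

```php
<?php
// Sketch only: decide whether an HTTPS request for a non-secure page
// should be bounced back to plain HTTP. Host name is a placeholder.
function http_target($https, $uri, $host = 'www.example.com') {
    // $https mirrors $_SERVER['HTTPS'] ('on' when secure, empty otherwise)
    if ($https === 'on') {
        return 'http://' . $host . $uri;
    }
    return null; // already plain HTTP, nothing to do
}

// At the top of a non-secure page:
$https  = isset($_SERVER['HTTPS']) ? $_SERVER['HTTPS'] : '';
$uri    = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';
$target = http_target($https, $uri);
if ($target !== null) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $target);
    exit;
}
```

On the cloaking worry: a plain 301 based only on the request protocol serves the same content to bots and users alike, so it should not read as cloaking to the engines.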
[webmasterworld.com...]
RewriteCond %HTTPS ^on$
RewriteRule /robots.txt /robots.https.txt [I,O,L]

The robots.https.txt file:

User-agent: *
Disallow: /

All of the SEs have had challenges in this area. And so have webmasters. It's an ongoing challenge, as many are just becoming aware of these types of issues. I remember doing site: searches in Google and finding home pages indexed under the https protocol.
Many thanks to all that contributed to this thread.
I've been away for a couple of days but have now implemented the second robots.txt file and the rewrite-rule solution as suggested. This seems like a very simple and elegant fix. Paranoia makes me hope that it doesn't have any unexpected results.
Thanks again
Sid
PS I'll report back in a couple of weeks.
Hopefully the penny will fully drop with Yahoo in due course.
Cheers
Phil