Cross-site canonical meta tag questions
DannyS1951 · msg:4539415 · 2:34 am on Jan 26, 2013 (gmt 0)

Two questions on the use of the canonical tag in the page head across multiple URLs and sites. Here's the problem: we lost badly in a Google update at the end of September last year on an old, established website, a dozen-plus years old. I'm leaning towards Google perceiving a duplicate content problem.

First question: It's a shopping cart site on shared hosting with a www oursitename dot com URL. We also use the shared SSL certificate, which has an https oursitename dot hostingname dot com URL. This creates an alias duplicate of our site in both secure and non-secure URLs. So is it OK to have a canonical tag in the page head pointing from the alias URL to the equivalent page on the www oursitename dot com URL?
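
(The tag I have in mind would look like this, with example.com standing in for our real domain and widgets.html as a made-up page, placed in the head of the alias copy of the page:)

<link rel="canonical" href="http://www.example.com/widgets.html">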

Second question: We have an even older website with a country-specific URL, i.e. dot co dot countrycode, aimed at our home non-US base. It contains pretty much a duplicate of what's on the dot com. The reason they are separate websites is complicated but has nothing to do with SEO, either black or white. For these dozen or so years Google seems to have been happy with this, although we always push the dot com site to be the lead site with search engines by supplying sitemaps, feeds and such. Anyway, would it be acceptable practice, or acceptable to Google, to put a canonical link in the country-specific website pointing to the dot com one?

The shame is we lost 80% of our Google traffic last September, but prior to this Google seemed to be happy to use us almost as an authority site. Now we have lost all our long-tail searches to websites that really don't serve the niche we supply. These are the searches that make our money.

 

goodroi · msg:4539645 · 2:49 pm on Jan 27, 2013 (gmt 0)

Before you introduce a new variable (the canonical tag) to your website, I would suggest you take a moment to review your site and its SEO plan.

You said:
push the dot com site to be the lead site with search engines by supplying sitemaps, feeds and such
Simply supplying sitemaps is not sufficient SEO. I didn't hear you mention anything about content or links. Sitemaps and feeds can help you get indexed but often have little impact on rankings. If the content and links are not being taken care of, then changing the canonical situation may not solve your ranking issues.

Are you publishing identical or near-identical content on multiple URLs?
What about your content makes it significantly more valuable than all other competitors?
Are you developing backlinks for all sites or just one?
Are your different sites interlinked?

DannyS1951 · msg:4539650 · 3:54 pm on Jan 27, 2013 (gmt 0)

Thanks Goodroi

We have always built the links and such on the dot com site. Until last September it was usually in the top 3 for the long-tail searches we depend on. To change over to the local site would be to waste too many years of work. The pages between the sites are pretty well duplicated. What I really need to know is: does putting a canonical link from the local site to the dot com break any Google rules?

goodroi · msg:4539675 · 8:22 pm on Jan 27, 2013 (gmt 0)

Ideally Google would like webmasters not to have any duplication. If you have it, then a 301 redirect is preferred over a canonical tag. If you can't redirect, then apply canonical tags even if the content is hosted on a different domain (assuming you are connecting identical or near-identical content).

The general idea of cross-site canonical tags is allowed by Google. Be careful that you apply it correctly or you'll end up with a big headache.
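
(As a sketch with placeholder domains: the head of each page on the country-specific site would declare its dot com twin as the canonical version.)

<!-- in the head of http://www.example.co.uk/some-product.html (hypothetical URLs) -->
<link rel="canonical" href="http://www.example.com/some-product.html">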

PS: Hopefully I am wrong, but I have a feeling that your issues are larger than just canonicals.

ZydoSEO · msg:4539687 · 11:49 pm on Jan 27, 2013 (gmt 0)

First Question:

Regardless of whether you're sharing your host's SSL certificate or have your own, you should use 301 redirects to enforce that pages which do NOT require SSL (non-cart pages, I'm assuming) can ONLY be accessed over HTTP, and that pages which do require SSL (your cart, I'm assuming) can ONLY be accessed over HTTPS.

If you're hosted on Linux/Apache, you should be able to do this very easily with mod_rewrite, assuming you know exactly which URLs require HTTPS. Ideally the cart would be isolated to a single folder, and you could implement essentially two rules in your .htaccess.

Using mod_rewrite to look at each page request before the server renders the URL, you can interrogate the %{HTTPS} variable to see whether the URL is being requested over HTTP or HTTPS, and the %{HTTP_HOST} variable to see whether it was requested for www dot oursitename dot com or oursitename dot hostname dot com. Using these two values you should be able to implement all of the redirects needed to eliminate ALL non-canonical URLs resulting from having a secure and a non-secure version of the site.
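
(A minimal sketch of the host half, assuming Apache with mod_rewrite enabled and example.com / example.hostingname.com standing in for the real hostnames; illustrative only, not drop-in code:)

# .htaccess: 301 any request that arrives on the shared-SSL alias host
# back to the same path on the primary host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.hostingname\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]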

Second Question:

As goodroi said, the canonical link element should only be used as a last resort, if you're unable to solve duplicate content issues with 301 redirects. Pretty much every search engine on the planet understands 301 redirects and supports them as the preferred method of fixing canonicalization issues. Not all engines support the canonical link element (especially the cross-domain canonical link element), and the ones that do likely don't interpret it the same way.

I'm not sure that having duplicate content on two domains that are targeting two different countries is much of an issue. For example, if you have a .com domain targeting Google dot com (US or global audience) and a separate ccTLD domain like a .co.uk domain targeting Google dot co dot uk, in Webmaster Tools you should be able to set the geotargeting settings appropriately for each domain so that Google knows you're targeting two completely different audiences with those sites. Though the sites might render much of the same content (both are English), I'm not sure Google will treat this situation the same from a duplicate content perspective as it would two sites both trying to rank in the same view of Google's index (Google dot co dot uk, for instance) with the same content.

A lot happened in September of 2012 from Google... two Panda updates (a refresh and an algo update) and the EMD update. Did you isolate your issues to a particular update? Did you lose 80% of your traffic on a specific date? Which site lost traffic, the .com or the ccTLD domain? Were there any major changes to the site that lost traffic in the two months leading up to the loss? Have you determined which keyword phrases and/or URLs lost traffic? Was it site-wide, with most URLs losing traffic? Were most of the keyword phrases previously sending traffic affected? How many thin pages does your site have? Is any of the content on your site duplicated on sites other than your two sites?

There are a LOT of things I would look at (as goodroi alluded to) before I would jump to the conclusion that the problem is the duplication of content between your two sites.

buckworks · msg:4539708 · 2:52 am on Jan 28, 2013 (gmt 0)

A year or two ago one site I work with had problems with https duplicates getting indexed. The origin of the problem was that legitimate https pages in the shopping cart were using the same templates as the rest of the site, which mostly used relative URLs for navigation.

The relative URLs meant that https pages were effectively linking to other pages as https too, so they'd get spidered as https.

When a page whose URL has unintentionally become "https-ified" gets spidered, any of its relative links get picked up as https too. That's how the cancer spreads and the duplicate problems grow.

We did three things to address the http/https duplication problem:

1) The web team did some rewriting or redirecting voodoo (sorry, don't know the details) to force inappropriate https URLs to the http version.

2) We used rel="canonical" throughout the site to specify the URL which should be considered canonical (usually the http version).

3) We tweaked the templates for legitimate https pages so their navigation used absolute http URLs rather than relative.

http://example.com/ rather than just "/"
http://example.com/category/ instead of /category/

Those steps achieved the result we needed. The unintended https pages disappeared from Google within a few weeks, and the problem has not recurred.

DannyS1951 · msg:4539712 · 3:03 am on Jan 28, 2013 (gmt 0)

Thanks Goodroi - with this problem a simple 301 would not be an easy answer.

g1smd · msg:4539713 · 3:03 am on Jan 28, 2013 (gmt 0)

Yes, that sounds like a very robust set of changes. Fixing the problem properly needs several of them; there's no single simple fix.

The "redirect voodoo" :) has been discussed many times in the WebmasterWorld Apache forum, and there are several hundred threads on the subject with varying amounts of example code.

DannyS1951 · msg:4539716 · 3:27 am on Jan 28, 2013 (gmt 0)

Thanks ZydoSEO - I can see what you are saying regarding mod_rewrite. Would another way be to put a noindex tag in the head whenever we go SSL or the alias URL is in use? I think I could do that in PHP fairly easily.
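
(Something like this minimal PHP sketch is what I mean, with example.hostingname.com standing in for the alias hostname; the variable names are made up:)

<?php
# Flag requests that arrive over SSL or via the shared-SSL alias host.
$is_alias = (stripos($_SERVER['HTTP_HOST'], 'example.hostingname.com') !== FALSE);
$is_ssl = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off');

# Ask engines not to index these duplicate views of the page.
if ($is_alias || $is_ssl) {
    echo '<meta name="robots" content="noindex, follow">';
}
?>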

I will hold off using a canonical aimed at the dot com site from the local site for now. The problem for us started with a Panda update on the 27/28th of September. The traffic loss was instantaneous. It's not an EMD URL, so it shouldn't be connected to that. The site is a dozen years old and was probably the first site serving our niche in English. We didn't do much to improve SEO because it had almost been grandfathered in and we were lazy. In the old days the PR got as high as 7, but now it's a pretty consistent 5.

In the couple of months before this I decided to expand the database with a bit more product information, both to make it more useful outside of the site itself and so I could look forward to building in schema dot org microdata. I found an immediate use for this extra data by building country-specific feeds into Google Books, although Google itself seemed no longer to be interested in this product of theirs.

Of course the law of unintended consequences reared its ugly head, or at least I suspect it did. Because the country-specific feeds needed the currency to be correct for that country, we fed the URLs with a currency code tacked on, i.e. &currency=UK and so on. After the drop in traffic I looked at Google Webmaster Tools, not something I had kept an eye on previously. It warned that it was seeing duplicate titles and such. I did program in a canonical link to get back to the 'pure' URL.

We are not recovering on our long-tail searches where people are looking for a specific product. These are where we did very well in the past, usually being in the top 3, competing with Amazon for the specific titles. I know that in our niche there are a lot of common (duplicate) descriptions out there. It goes with the product range, and everyone does it, including Amazon. Usually we take what the publisher sends us as the description, but to write a new description for 1500 titles would be a big job, and to be honest would be a bit of a fake, since it would basically just be rewording what we already have.

DannyS1951 · msg:4539717 · 3:31 am on Jan 28, 2013 (gmt 0)

Thanks Buckworks - I'm not that great a programmer, but I know I could build a canonical link into the alias URL pages in both their http and https guises. The problem is I would also build every session ID and such into the canonical URL (there's a lot of 'such', as it's an old osCommerce site), as I'm not sure I could program well enough to strip them off.

TheMadScientist · msg:4539735 · 5:07 am on Jan 28, 2013 (gmt 0)

The problem is I would also build every session ID and such into the canonical URL (there's a lot of 'such', as it's an old osCommerce site), as I'm not sure I could program well enough to strip them off.

<?php
# Build the canonical URL on the primary host by stripping
# the query string (if any) from the requested URI.
$canonical = '';
if (strpos($_SERVER['REQUEST_URI'], '?') !== FALSE) {
    # There is a query string: keep only the part before the '?'.
    $find_canonical = explode('?', $_SERVER['REQUEST_URI']);
    $canonical = 'http://www.example.com' . $find_canonical[0];
}
else {
    # No query string: use the request URI as-is.
    $canonical = 'http://www.example.com' . $_SERVER['REQUEST_URI'];
}
echo '<link rel="canonical" href="' . $canonical . '">';
?>

DannyS1951 · msg:4539744 · 6:15 am on Jan 28, 2013 (gmt 0)

Thanks MadScientist. I would have to strip off some bits but not others ;-)

TheMadScientist · msg:4539745 · 6:25 am on Jan 28, 2013 (gmt 0)

Ah, well, it gets a bit more complicated then...

preg_match is probably the solution, but I really can't help without an example URL ... wish it were easier, but there's likely a distinct pattern you'll have to follow to get it right.

DannyS1951 · msg:4539750 · 6:47 am on Jan 28, 2013 (gmt 0)

I guess the easy one to strip off would be the session ID. That's the worst one. If it were preceded by a "?" and had no "&" following it, I could take off the whole thing. If it were preceded by an "&", I could strip it off along with the "&".

TheMadScientist · msg:4539755 · 7:22 am on Jan 28, 2013 (gmt 0)

<?php
# Strip the trailing session parameter, per the rule described above:
# if it follows an '&', drop it and the '&'; if it is the only
# parameter (follows the '?'), drop the whole query string.
# Assumption: the session ID is always the LAST parameter in the URL.
$canonical = '';
$uri = $_SERVER['REQUEST_URI'];

if (strpos($uri, '&') !== FALSE) {
    # Remove the last '&' and anything after it.
    $canonical = 'http://www.example.com' . substr($uri, 0, strrpos($uri, '&'));
}
elseif (strpos($uri, '?') !== FALSE) {
    # Only one parameter: remove the '?' and everything after it.
    $canonical = 'http://www.example.com' . substr($uri, 0, strpos($uri, '?'));
}
else {
    # No query string at all: keep the whole URI.
    $canonical = 'http://www.example.com' . $uri;
}
echo '<link rel="canonical" href="' . $canonical . '">';
?>

ZydoSEO · msg:4539829 · 1:06 pm on Jan 28, 2013 (gmt 0)

The mod_rewrite/.htaccess rules to fix this are relatively simple to code. If we had more information about things like whether the cart pages are all isolated to a single folder, we could tailor examples. There are lots of ways to fix this within mod_rewrite.
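
(Purely as an illustration of the two-rule idea, assuming the cart lives under a /cart/ folder and example.com stands in for the real domain; both assumptions would need adjusting to the actual site:)

# .htaccess: force HTTPS for the cart folder...
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^cart/ https://www.example.com%{REQUEST_URI} [R=301,L]

# ...and force HTTP everywhere else
RewriteCond %{HTTPS} =on
RewriteRule !^cart/ http://www.example.com%{REQUEST_URI} [R=301,L]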

As Buckworks said, having URLs get indexed with both HTTPS and HTTP is typically the result of using relative URLs in the links on your pages. If every link on your site used fully qualified, absolute URLs, then only individual HTTP pages mistakenly linked from external sites with HTTPS (and vice versa) could get indexed with the wrong protocol. The 301 redirects I mentioned would then take care of forcing such improper links back to their proper protocol.

So combating the issue is going to require a multi-pronged approach, as g1smd said. There is no quick fix. I would stop looking for one and do it the right way (though that might require extra work, or learning something you might not be familiar with, like mod_rewrite).

IMHO mod_rewrite is probably the single most powerful tool any SEO/webmaster can have in their tool belt. It is well worth the time investment required to learn at least the basics. I use it for all sorts of things. I can guarantee that for every hour you spend learning it, you will save ten or more in the future.

DannyS1951 · msg:4539843 · 2:09 pm on Jan 28, 2013 (gmt 0)

Thanks ZydoSEO and MadScientist. I will mull over which way to approach this particular problem, keeping in mind that I don't know for sure it will help with the loss of our Google traffic.
