Forum Moderators: Robert Charlton & goodroi


Canonical Tag - Double Checking Before Implementation

         

Shark27

10:08 am on Apr 19, 2011 (gmt 0)

10+ Year Member



1. I've recently discovered that a site I'm working on has LOTS of duplicate content. The dupe content is showing up in the SERPs and outranking the original pages which have the most links thrown at them. It's obviously not optimal to have different URL structures for the same pages and having them fight each other in Google.

The owner switched which URL he wanted to have and had no idea about the harmful SEO effects. As a result, there are many indexed pages of duplicate content due to the URL structure switch.

There are about 4 different versions of each URL that have been indexed.

Example:

example.com/product/kites/273
example.com/products/kites/273
example.com/product/kite-gear/273
example.com/store/kites/273

All of these are 100% identical pages. Each version of these pages contains links to dozens of other inner, deeper page links, causing even more duplicate content. Most of them aren't indexed, however.

Example:

example.com/product/kites/273/kite-strings
example.com/products/kites/273/kite-strings
example.com/product/kite-gear/273/kite-strings
example.com/store/kites/273/kite-strings

To properly implement a canonical tag, do I need to go to each of the pages using the structure I want to be the original/SERP result and insert the tag?

2. I have also just realized that the owner of the site has several clones on different domains which display everything the original has. The only difference between ANY of the sites is the domain name. The link structure, the page URLs, etc. everything is the exact same. Any update on the original gets immediately updated by the others and sometimes indexed before the original.

Will 301 redirecting all of these (between 4-8) to the original at once cause any SEO problems?

Thanks all. I greatly appreciate the help. I have read a lot about canonical tags and such, but most of the information I have found is from 2009 when the solution first came to light. I'd rather discuss it with knowledgeable SEOs before tinkering around on my own. Thanks again!

-Shark

tedster

9:13 pm on Apr 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This sounds like the exact reason that the canonical tag was created. Pick the preferred URL and use that for all the other versions. Google will even acknowledge cross-domain canonical link tags.

A 301 redirect on your server pointing from each secondary URL to your chosen primary URL is also the very best thing to do, and the canonical link would then be a back-up or failsafe action.
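
As a rough illustration of the canonical link element mentioned above, here is a minimal Python sketch that builds the tag for a chosen preferred URL. The function name `canonical_link_tag` and the preferred URL are assumptions for illustration only; the resulting element would go in the head section of each duplicate page.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Return a <link rel="canonical"> element pointing at the preferred URL."""
    # quote=True also escapes double quotes, so the href attribute stays well-formed.
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


# Hypothetical preferred URL; every duplicate variant would carry this same tag.
PREFERRED = "http://www.example.com/store/kites/273"
print(canonical_link_tag(PREFERRED))
```

The same tag is emitted regardless of which duplicate URL was requested, which is the whole point: all variants name one preferred address.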

----

Cloned domains are still relatively common on the web - some registrars even promote this option - but they can create lots of issues for all search engines including Google. This is especially true as the clones accumulate backlinks. I would strongly advise using a domain level 301 as soon as you can so that NO URLs on the alternate domain respond with a 200 OK status.

Given this complex of factors (and I suspect there may be more) it may be difficult to ensure every 301 redirect goes to the canonical address in just one step - however, that is the best situation. At the beginning, you can at least make sure that the non-canonical URLs that have significant backlinks redirect in one step. Then you could do clean up of all the other details.

g1smd

10:37 pm on Apr 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Before you make any changes at all, run Xenu LinkSleuth over the non-www and www versions of several of the sites and store the data and the final report for each for later use.

You will need a list of all URLs on the site so that you can pick some after making changes and check that the fixes you have installed are correct and work as expected.

Tedster has the "what" and the "why" nailed. Here are a few more words on the "how".

One point is that if you redirect all non-canonical URLs, then there is no need for the rel="canonical" tag at all. That tag goes only on content pages where the content is displayed at the "wrong" URL. When a redirect is installed, there are no longer any "wrong" content pages; all those requests are instead redirected to the correct URL.

The point about single step redirects is so important that I am going to repeat it.

You have these URLs:
example.com/greenwidgets
example.com/green-widgets
www.example.com/greenwidgets
www.example.com/green-widgets (canonical)

You need to install two redirects:
one from
(www.)example.com/greenwidgets
to
www.example.com/green-widgets
and another
from
example.com/green-widgets
to
www.example.com/green-widgets


If you get the rule order incorrect, you might have a redirect from
example.com/greenwidgets
to
www.example.com/greenwidgets
and then another to
www.example.com/green-widgets
and this is exactly what you must avoid.
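
The single-step idea can be sketched outside of any server configuration. The following Python fragment is only a model, not a server implementation: it assumes a hypothetical lookup table `PATH_FIXES` of known wrong paths, and it computes one redirect target that corrects the hostname and the path together.

```python
from typing import Optional

CANONICAL_HOST = "www.example.com"
# Hypothetical map of known wrong paths to their correct form.
PATH_FIXES = {"/greenwidgets": "/green-widgets"}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the single 301 target for a request, or None if it is already canonical."""
    fixed_path = PATH_FIXES.get(path, path)
    if host == CANONICAL_HOST and fixed_path == path:
        return None  # already canonical: serve the page, no redirect
    # Host and path are corrected together, so a second hop is never needed.
    return "http://{}{}".format(CANONICAL_HOST, fixed_path)


for host in ("example.com", "www.example.com"):
    for path in ("/greenwidgets", "/green-widgets"):
        print(host + path, "->", redirect_target(host, path))
```

Every non-canonical combination resolves straight to the final address, and the canonical URL itself yields no redirect at all.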

Shark27

4:47 am on Apr 21, 2011 (gmt 0)

10+ Year Member



Thanks for the replies guys.

Tedster,

So I should 301 every duplicate page to the finalized version, as well as add the canonical tags to the duplicates just to be on the safe side?

Are there any drawbacks to these 301's and canonicals launching all at once on a site of this size? We'll be redirecting thousands and thousands of pages.

WRT the cloned domains:

If I Google info:clonedsite.com it shows the URL of the original site, not the clone. Is this expected and still cause for a full 301 redirect to the original? I'd just 301 every page to the equivalent on the other site, which is only different in domain name, right?

Again, there won't be any negative SEO effects to sending all of these to the original, definitive site?

I agree with your advice on specifically tackling the duplicate pages which have links before doing a one size fits all approach to fixing all of them.

g1smd,

I downloaded LinkSleuth and am running reports on the www and non-www versions of the main site. It will take a while.

So a redirect takes out the need for canonical tags. Should I only do a 301 for these pages and not the other, or is doing both optimal in a "better to be safe than sorry" way?

As far as your steps, is this correct?

example.com/product/kites/273
example.com/products/kites/273
example.com/product/kite-gear/273

Each of these goes to the canonical version: www.example.com/store/kites/273

And then the following do as well:

www.example.com/product/kites/273
www.example.com/products/kites/273
www.example.com/product/kite-gear/273

Is that right?

Thanks a LOT for the replies. I really appreciate it.

tedster

6:09 am on Apr 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it is a case of "better to be safe than sorry." When we're dealing with technology, unintended errors have a nasty way of creeping in.

The 301 redirect is the primary action because this is completely under your control. Google decides how to handle the canonical link, and that's out of your control.

The redirects you listed look right. You can execute those types of canonical redirects with a few short lines in .htaccess if you are on an Apache server [webmasterworld.com], or with a few basic steps if you are on an IIS server [webmasterworld.com].

g1smd

7:09 am on Apr 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the redirects are correct, but you need to be very careful to get the rules in the right order, so that for any non-canonical request there is only a single step redirect to the correct URL. No request should ever result in a multiple-step redirection chain.

Shark27

10:16 pm on Apr 21, 2011 (gmt 0)

10+ Year Member



Thanks guys.

g1,

So I just really need to make sure that both versions, www and no-www, go to the final canonical link and nothing in between, correct?

One of our larger keywords we had a good ranking for was swapped out with a clone page I have never seen before. We've never built links to it, it just updates with our on-site changes. Pretty bad. :-/

g1smd

10:25 pm on Apr 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, start at any "wrong" URL and receive a single redirect to the "right" URL.

Get the rule ordering wrong and you'll be bounced to the right URL in two or three steps or more via an unwanted redirection chain.

Shark27

10:38 pm on Apr 21, 2011 (gmt 0)

10+ Year Member



Okay, thanks. Right now non-www.domain.com doesn't redirect to www version. I specified www version in Webmaster Tools. Should this be my first order of business?

Shark27

10:57 pm on Apr 21, 2011 (gmt 0)

10+ Year Member



We now have the cloned versions going straight to the exact page on the canonical domain. Now we're switching the URL variations on the canonical domain to go to the finalized canonical version. It should be done soon.

Shark27

2:10 am on Apr 22, 2011 (gmt 0)

10+ Year Member



All of the previous versions now 301 directly to the canonical URL. Now we'll just wait for Google to act in accordance and update these new links in the SERPs.

tedster

2:39 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sounds like you've got it. I'm assuming that Google was mostly sending traffic to the "with-www" version all along, with maybe an occasional "no-www" once in a while - is that correct?

g1smd

2:39 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pick a URL that has multiple problems, fire up the Live HTTP Headers extension for Firefox, and make a request to your server. Examine the Headers report to see the entire HTTP transaction and check it very carefully for signs of any multiple step redirection.
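
For anyone who would rather script this check than read a headers log, here is a minimal Python sketch of the same idea using only the standard library. The names `http_head`, `trace_redirects` and `count_hops` are made up for this example, and it assumes the server answers HEAD requests the same way it answers GET, which is not guaranteed.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def http_head(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch: Fetch = http_head, max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow redirects by hand and return every (url, status) pair seen."""
    chain = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return chain


def count_hops(chain: List[Tuple[str, int]]) -> int:
    """Number of redirect responses in the chain; 1 is the goal for a wrong URL."""
    return sum(1 for _, status in chain if status in REDIRECT_CODES)


# Example against a live site (left commented out so the sketch runs offline):
# chain = trace_redirects("http://example.com/greenwidgets")
# print(chain, count_hops(chain))
```

The `fetch` parameter exists so the tracing logic can be exercised with canned responses before pointing it at a real server.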

Shark27

5:44 am on Apr 22, 2011 (gmt 0)

10+ Year Member



tedster,

Yeah, but the main problem was the alternate URL's which we now have redirecting to the final canonical structure. We also never had non-www redirecting to www which is something we should have taken care of long ago. That relatively competitive keyword we were #3 for that swapped with our clone version of the page sucked too, but we have that properly 301'd as well.

g1,

I'm downloading that extension now and I'll post back with the results. Thanks again for your help guys. I really do appreciate it.

Shark27

6:03 am on Apr 22, 2011 (gmt 0)

10+ Year Member



g1,

That update isn't allowed for FF 4.0. I tried uninstalling 4.0 and getting 3.6 again but it's not letting me. Is there a Chrome extension I can use?

Shark27

6:14 am on Apr 22, 2011 (gmt 0)

10+ Year Member



Sorry for the multiple replies. I can't see an "Edit Post" button on my post. I downloaded an older FF that supports the plugin. I ran the plugin while I went from the old URL to the canonical version it automatically redirected to.

I don't know exactly how to read this, but it doesn't look like it goes to any other pages on our domain other than the original duplicate and to the canonical final version.

It does show something about classic-web.archive.org that I'm not sure about. There are only 2 URLs that belong to our domain, however, and they're the original duplicate and the final canonical that it lands and stays on.

What else should I be looking for?

Shark27

6:29 am on Apr 22, 2011 (gmt 0)

10+ Year Member



Crap. I did another one and it looks like it is going to an additional page and then another in between. I think it's because of the way the coder rewrote the redirects.

It's showing 2 301's in there and there's definitely one of the other URLs in there before it gets to the canonical. How do we solve this other than manually performing every redirect from straight URL to URL?

Right now I believe the coder made it so certain words in place automatically go to the proper ones, but it goes in a chain which is what we don't want.

g1smd

7:52 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have to make sure that every rule includes the domain name in the redirect so that www is fixed at the same time the path is corrected if there is a non-www request for a URL that also has the wrong path.


The rule order is also important. Consider these two rules:

# Externally redirect to canonicalize the domain name if a non-canonical
# hostname is requested, in order to prevent duplicate-content problems
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
#
# Externally redirect only direct client requests for /index.php
# and /index.html and /index.htm to URL ending with slash.
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]


The above rules look good, but consider what happens when
example.com/index.html
is requested.

First ruleset:
example.com/index.html
-->
www.example.com/index.html


Second ruleset:
www.example.com/index.html
-->
www.example.com/


You have introduced an unwanted two-step redirection chain. PageRank does not flow through this type of redirect.


Now consider the same two rules but posted in a different order:

# Externally redirect only direct client requests for /index.php
# and /index.html and /index.htm to URL ending with slash.
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to canonicalize the domain name if a non-canonical
# hostname is requested, in order to prevent duplicate-content problems
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


First ruleset:
example.com/index.html
-->
www.example.com/
= Job Done!

Second ruleset: runs only for non-www requests NOT containing "index.html" or "index.php".

Now only one rule runs for any non-canonical request. If the non-canonical request is non-canonical in two different ways (e.g. "non-www" and "index.html") both problems are fixed at the SAME TIME.
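
The effect of rule order can be replayed in a few lines of Python. This is a simplified model of the two rulesets above, not of mod_rewrite itself: it assumes the first matching rule issues the redirect (as with the [L] flag), that each redirect arrives as a fresh request, and it ignores the THE_REQUEST condition. Paths are written without a leading slash, as they appear to rules in .htaccess. The function names are invented for the example.

```python
import re
from typing import Callable, List, Optional, Tuple

CANONICAL_HOST = "www.example.com"
Rule = Callable[[str, str], Optional[Tuple[str, str]]]

# Same path pattern as the index RewriteRule above.
INDEX_RE = re.compile(r"^(([^/]+/)*)index\.(html?|php)$")


def host_rule(host: str, path: str) -> Optional[Tuple[str, str]]:
    """Redirect any non-canonical hostname, keeping the path unchanged."""
    if host and host != CANONICAL_HOST:
        return CANONICAL_HOST, path
    return None


def index_rule(host: str, path: str) -> Optional[Tuple[str, str]]:
    """Redirect index.html/.htm/.php to the folder URL on the canonical host."""
    match = INDEX_RE.match(path)
    if match:
        return CANONICAL_HOST, match.group(1)
    return None


def count_redirects(rules: List[Rule], host: str, path: str) -> int:
    """Replay requests until no rule fires; the first matching rule wins."""
    hops = 0
    while hops < 10:  # guard against accidental redirect loops
        for rule in rules:
            target = rule(host, path)
            if target is not None:
                host, path = target
                hops += 1
                break
        else:
            return hops  # no rule matched, so the page is served
    return hops


print(count_redirects([host_rule, index_rule], "example.com", "index.html"))  # 2
print(count_redirects([index_rule, host_rule], "example.com", "index.html"))  # 1
```

With the hostname rule first the request takes two hops; with the more specific index rule first it takes one, matching the walkthrough above.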

The "edit post" button is just below your user name on every post.

Shark27

8:26 am on Apr 22, 2011 (gmt 0)

10+ Year Member



Okay, thanks. I'm going to forward that to the coder because I don't understand much of it. :-/

I do understand the point you're making about the order being very important. I'll have him look it over so we can figure this out. Thanks again g1.

Shark27

4:48 am on Apr 24, 2011 (gmt 0)

10+ Year Member



It looks like we have every URL variation going straight to the final canonical version right now. It only shows a single 301 redirect in the headers log through that FF plugin. After that it shows certain things the page is loading, Analytics, web-archive and some other Java stuff. The last line of code is a paragraph which starts off with 304 Not Modified and that's on the canonical version.

g1,

Does the aforementioned description sound right?

g1smd

7:28 am on Apr 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes it does, but test it with a URL that has as many "errors" as possible.

Canonical:
www.example.com/green-widgets/acme-widget
- the final URL.

Test URL:
example.com/greenwidget/acmewidget.html?randomjunk
- for example this URL has 6 errors.

If requesting that test URL "redirects in one" then you're done.

At this point you can unlock a powerful feature of XenuLinkSleuth. In a text editor, make a list of URLs that you want to test the responses for. List URLs with a variety of different problems and as many combinations of those problems as possible. Import the list into Xenu and run a report for that list. Check it carefully for any unexpected responses.
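
Building that list by hand gets tedious once several kinds of error are combined. As a sketch only, the following Python snippet generates every combination of a handful of hypothetical problems (missing www, missing hyphens, a stray ".html", a junk query string) for one canonical URL. The variant lists are assumptions and should be replaced with the errors that actually occur on the site; the exact import format Xenu expects is not confirmed here.

```python
from itertools import product

CANONICAL = "http://www.example.com/green-widgets/acme-widget"


def build_test_urls():
    """Return every combination of the listed variants except the canonical URL."""
    hosts = ["www.example.com", "example.com"]
    folders = ["green-widgets", "greenwidgets"]
    pages = ["acme-widget", "acmewidget"]
    suffixes = ["", ".html"]
    queries = ["", "?randomjunk"]
    urls = []
    for host, folder, page, suffix, query in product(hosts, folders, pages, suffixes, queries):
        url = "http://{}/{}/{}{}{}".format(host, folder, page, suffix, query)
        if url != CANONICAL:
            urls.append(url)
    return urls


# One URL per line, ready to paste into a text file.
print("\n".join(build_test_urls()))
```

Five two-way choices give 32 combinations, so the list holds 31 non-canonical URLs, each of which should answer with a single 301 to the canonical address.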

Shark27

9:51 am on Apr 24, 2011 (gmt 0)

10+ Year Member



g1,

I've tested it with every variation I can think of, even ones which I know aren't indexed or active, just to make sure. They're all redirecting in one single bound. I've gone through all possible URL variations that I know for sure were active, indexed, and had links pointing at them and they're all showing a single 301 to the canonical, including www and non-www.

The only problem is that the coder hasn't activated the same rules for the deeper links, such as the folders within these pages, but I figure he'll have those done by Monday since it's the same type of formula.

I'll try and do that in LinkSleuth right now. The initial report I tried exporting caused an error and didn't save. :(

Thanks. I'll update when I'm done with Xenu.

Edit:

I ran the URL variations through Xenu. They all say OK. Only 3 of them list "redir", but that's under Title, which I believe is for the meta-title and thus isn't important. The meta-title of the canonical version is listed properly here so I think that's all that matters.

Is there anything specific I should be looking for other than an OK status? Thanks again.

g1smd

10:40 am on Apr 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are two places to look:
- the initial list of URLs that fly up the screen as Xenu does its work, and
- the separate HTML report page that is generated at the end and which opens in your browser; especially look at the "Redirects" section of this report.

Shark27

11:24 am on Apr 24, 2011 (gmt 0)

10+ Year Member



I saved both the initial Xenu links in the program itself as well as the final HTML report. What should I look for specifically in the latter report? The status is OK for all of the keyword variations, as well as the canonical.

The former shows that the main URLs are all redirecting once to the canonical. Again, the only problem is that the inner pages are a bit messed up due to the current values we're using for the redirects. It's altering the URL and redirecting to a new page that's being created dynamically. We should have this fixed by Monday and long before they're indexed by Google.

Does it look like this problem is fixed for the main URLs at least? I should just wash, rinse and repeat the double checking methods for the new inner folder links once the new redirection values go live for them?

Thanks again g1!

g1smd

11:32 am on Apr 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, and re-check all the URLs at all levels every time you change any of the rules at any level.

It's easy for a new rule to mess up some or all of the rules that were previously working.

Shark27

11:38 am on Apr 24, 2011 (gmt 0)

10+ Year Member



Yup. That's what's happening with the sub-folders right now. It's adding a keyword we needed to wind up somewhere else in the structure to an unnecessary place, thus creating a new URL that we don't need. I'll be sure to double check everything the next time we update it.

Thanks a LOT for the help man. I really appreciate it. Both you and Ted have been huge helps to me!

Shark27

5:00 pm on Apr 25, 2011 (gmt 0)

10+ Year Member



So the new structure that was created accidentally wound up being indexed and is serving as the go to version in certain Google searches. We put a stop to the links redirecting to this, but I've been told by the coder that trying to redirect the new URL to the canonical would create too many problems. Fortunately it only got SOME pages in a single folder indexed and we ended the wrongful redirect before others got changed.

So since we can't 301 it properly, adding the canonical is likely our best bet, right? I meant to add it before, but I guess it didn't happen. Is this all we can do?

g1smd

6:43 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The coding to add a canonical tag is likely more complex than adding mod_rewrite rules to .htaccess.

The chance of getting the canonical coding wrong is very much higher, with less chance to test it and confirm it as working - because you can never know exactly what type of "wrong" URL might be requested.

You cannot be 100% sure that the canonical tag will be added to such a page, nor that the value of the href attribute will point to the right URL for that content.

Shark27

8:11 pm on Apr 25, 2011 (gmt 0)

10+ Year Member



I just got back and the coder has the canonicals working properly unless I'm mistaken. There is only one version of duplicate pages that can't be 301'd and all of them have the canonical tag in place to the proper URL.

Is there a way to verify and double check this?

g1smd

9:07 pm on Apr 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Request non-canonical URLs of various types. Use a selection with one, two, three, four, and more errors in the URL, and then select "View Source" for that page.
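
Reading the source by eye works for a handful of pages. For a longer list, a small script can pull the canonical link out of saved page source. The sketch below uses Python's standard `html.parser`; the class name `CanonicalFinder`, the helper `find_canonicals` and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values and any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonicals(html_text: str) -> List[str]:
    """Return all canonical hrefs found in the given page source."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.hrefs


# Hypothetical page source for one of the duplicate URLs.
SAMPLE = """<html><head>
<title>Kites</title>
<link rel="stylesheet" href="/style.css">
<link rel="canonical" href="http://www.example.com/store/kites/273" />
</head><body></body></html>"""

print(find_canonicals(SAMPLE))
```

A page that is set up correctly should yield exactly one href, and it should be the final canonical URL; an empty list or more than one entry is worth a closer look.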