Search Engines Agree on "Canonical tag"

Forum Moderators: Robert Charlton & goodroi

youfoundjake

3:44 am on Feb 13, 2009 (gmt 0)

Not sure where to put this, but since Google search is moderated, a mod will put it where necessary.

Today it was announced that the 3 big search engines have come up with a new tag to help with canonical issues.
Announcements:
[googlewebmastercentral.blogspot.com]
[ysearchblog.com]
[blogs.msdn.com]

Using the new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
That’s it!
You can only use the tag on pages within a single site (subdomains and subfolders are fine).
You can use relative or absolute links, but the search engines recommend absolute links.
This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.
Links to all URLs will be consolidated to the one specified as canonical.
Search engines will consider this URL a “strong hint” as to the one to crawl and index.

< See also Canonical Tag Results: Share the stories - Positive / Negative / No Impact [webmasterworld.com] >

[edited by: tedster at 6:08 pm (utc) on April 2, 2009]

mcavic

5:04 pm on Feb 13, 2009 (gmt 0)

Do we all agree that fixing canonical issues "for real" is better than using this tag?

Yes and no. Personally, I've fixed my issues with a combination of 301's and robots.txt. But I'm all for giving people an easy way to eliminate dup content.

httpwebwitch

5:09 pm on Feb 13, 2009 (gmt 0)

@Tonearm, I agree with the facts though not necessarily with the sentiment :)

Just as there are many ways that canonicalization can be done poorly, there are as many ways in which this new tag can be used improperly. I think those that don't "get" canonicalization are going to be just as clueless when it comes to this new tag, so I don't foresee any major competitive disadvantages emerging.

This is more interesting than I thought

say I have a list of items, at items.php
I can sort those items, ala items.php?sortby=name
I can page those items too, ala items.php?sortby=name&page=2
I can filter those items, ala items.php?sortby=name&page=2&query=foo

Let's take the third example, items.php?sortby=name&page=2&query=foo

That's a unique representation of the data. Never mind that the items in the list are repeated in other views with different sorting and filtering - it's still unique and in my books, it's canonical.

But what happens when the URL (link, bookmark, what have you) includes extraneous QS vars?

items.php?sortby=name&page=2&query=foo&sessid=12341234&referer=mypage.htm&trackingCode=a1a1a1

Those other variables are important to the script; they probably do hidden things like analytics and affiliate attribution and whatnot. But they do not change the content of the page - that's the important thing. This shows that in a URL, some parts (variables) influence the content of the page, and others do not.

There are three kinds of variables that appear in a URL.
1) ones that affect the content of a page. like: page, query, sortby.
2) ones that do not affect the content of a page. like: sessid, referer, affid
3) ones that are just meaningless, added by the user, typos

How I've dealt with these situations is the "scrub and redirect" technique. You identify the variables that are known, but don't affect the content of the page (type 2). Put them in a SESSION. Discard any unknown variables (type 3). 301 redirect the user to the canonical page, with only the canonical variables present in the URL (type 1). The result is that the user always ends up on a canonical URL, the important extra bits are stored in the SESSION, and all the noise is eliminated.
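A minimal sketch of the scrub step described above, assuming Python on the server side. The variable lists and the function names `scrub` and `needs_redirect` are hypothetical, chosen only to mirror the items.php example; a real script would store the returned session data in its own SESSION mechanism and send the 301 itself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical classification of query-string variables for items.php.
CONTENT_VARS = ("sortby", "page", "query")  # type 1: change the page content
SESSION_VARS = ("sessid", "referer", "trackingCode", "affid")  # type 2: known, content-neutral


def scrub(url):
    """Return (canonical_url, session_data) for a requested URL.

    Type 1 variables stay in the URL in a fixed order, type 2 variables are
    moved into session_data, and anything else (type 3) is dropped.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    kept = [(name, params[name]) for name in CONTENT_VARS if name in params]
    session_data = {name: params[name] for name in SESSION_VARS if name in params}
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return canonical, session_data


def needs_redirect(url):
    """True when the requested URL differs from its canonical form (send a 301)."""
    return scrub(url)[0] != url
```

Emitting the type 1 variables in a fixed order means that two URLs with the same variables in a different order also collapse to one canonical URL.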

This tag is useful in just those situations.
I'm not going to stop using my scrub-and-redirect technique. It works extremely well. However, now instead of just sending vague instructions to the client in the HTTP header, I can do both: send a 301 redirection AND send - in the document - a semantic message describing the canonicalization status.

So now your client will know semantically why they are getting a Location redirection command in the header. It's not just a "Moved Permanently", it's a "Canonicalization Correction". Semantically they are different reasons for redirecting, and in some situations, the difference makes a difference.

Since on every page load I'm already coming up with the canonical URL to which the user is (possibly) going to be redirected, I can very easily plug that URL into a new tag in the <head>.
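As a rough illustration of plugging an already computed canonical URL into the head, here is a small Python sketch. The helper name `canonical_link_tag` and its `xhtml` switch are assumptions made for this example, not part of any announced API.

```python
from html import escape


def canonical_link_tag(canonical_url, xhtml=False):
    """Build the <link rel="canonical"> element for a page's <head>.

    The URL is HTML-escaped so that "&" in a query string becomes "&amp;".
    Set xhtml=True to emit the self-closing form used in XHTML documents.
    """
    closing = " />" if xhtml else ">"
    return '<link rel="canonical" href="%s"%s' % (escape(canonical_url, quote=True), closing)
```

The template that renders the page would print the returned string inside the head section, using whichever closing form matches the page's doctype.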

Not all webmasters are running their servers like programmable page factories. Many sites are built with Dreamweaver and a "Publish" button, or Notepad and Filezilla, and not all webmasters have the programming chops or Execute permissions to do server-side shenanigans. For webmasters who do not have a canonicalization infrastructure managing their URLs, this tag is a fantastic utility.

nealrodriguez

5:43 pm on Feb 13, 2009 (gmt 0)

nice! now to get it embedded on over 20k pages of dupe content;

Import Export

5:59 pm on Feb 13, 2009 (gmt 0)

There may be some bigger behind the scenes issues that caused them to take this much time to implement. With that said, the amount of time that this has been going on has been far too long. Unbelievable.

*Agree with your thoughts Wheel*

compose

6:39 pm on Feb 13, 2009 (gmt 0)

I have a little question about this new tag.

Usually on e-commerce sites, a few parts of the content are repeated on every product detail page, such as
"Size information", "Shipping details", "General FAQs for product", etc. (and they are necessary too).

Mostly this content is presented using tabs (hidden divs).

So my question is: will such product pages count as duplicate content? (Apart from these hidden divs, which become visible on a particular selection, everything else is fresh for each product page, i.e. the product description.) And if yes,

can this problem be sorted out using the canonical tag?

garlicjr

6:59 pm on Feb 13, 2009 (gmt 0)

I refuse to do anything specifically for the SE's benefit. I don't dance to their tune, they dance to mine.

I'm sorry, but if you're a serious SEO you're Google's bitch. White/black hat it doesn't matter.

A client comes to you. They're running IIS 6 shared hosting asp.net. They're not ranking well because www.site.com, www.site.com/default.aspx, site.com/default.aspx, www.site.com/Default.aspx are duplicate. What do you do?

Bow down before the one you serve because this canonical link is the only solution.

nealrodriguez

7:07 pm on Feb 13, 2009 (gmt 0)

won't IIS admin do the trick?

[webmasterworld.com...]

tedster

7:17 pm on Feb 13, 2009 (gmt 0)

I'd say that's a pretty harsh point-of-view, garlicjr. All the search engines have this technical challenge that they did not create. It's the result of MS servers not adhering to internet standards in the first place, plus CMS makers and hosting providers being relatively clueless.

I'm delighted with this solution and the fact that it's accepted by all the majors. We are all servants of digital technology, and she is the dominant mistress here. Finally there's a way for mom and pop on shared Windows hosting to defend against all kinds of crazy disruption.

Boulder90

7:19 pm on Feb 13, 2009 (gmt 0)

Wait a second...this tag has to go on every single page in your site?

smallcompany

7:21 pm on Feb 13, 2009 (gmt 0)

Wait a second...this tag has to go on every single page in your site?

Yes, or at least on each page that experiences this issue (i.e. because of trailing).

This is where other ways come in handy, like via .htaccess on Apache.

[edited by: smallcompany at 7:23 pm (utc) on Feb. 13, 2009]

pageoneresults

7:23 pm on Feb 13, 2009 (gmt 0)

I'm sorry, but if you're a serious SEO you're Google's female dog. White/black hat it doesn't matter.

For those of you on Windows, let that be a message.

A client comes to you. They're running IIS 6 shared hosting asp.net. They're not ranking well because www.example.com, www.example.com/default.aspx, example.com/default.aspx, www.example.com/Default.aspx are duplicate. What do you do? Bow down before the one you serve because this canonical link is the only solution.

For those of you on Windows, let that be another message! :)

You could also go to your host and point them to topics such as this and maybe help them understand the value of accommodating their hosting clients. This is not rocket science. It is not expensive and it takes a whole 5-10 minutes to purchase, install and be on your merry way. It is a per server configuration and any sites hosted on that server can now take advantage of the 2.0 method using httpd.ini or the 3.0 method using .htaccess. Here are the very simple rules to achieve this.

2.0 Rule

RewriteCond Host: ^example\.com
RewriteRule (.*) http\://www\.example\.com$1 [I,RP]

3.0 .htaccess Rule

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
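To make the intent of the two rules above concrete, here is a rough Python sketch of the decision they encode: a request for the bare host is answered with a 301 to the same path on the www host, and anything else is served normally. The function name `redirect_target` is hypothetical, and this only illustrates the logic; it is not a replacement for the server configuration.

```python
import re

# Matches the bare hostname only, case-insensitively (like the [NC] / [I] flags).
BARE_HOST = re.compile(r"^example\.com$", re.IGNORECASE)


def redirect_target(host, path):
    """Return the URL to 301-redirect to, or None if the host is already canonical.

    `path` is the request path including its leading slash, e.g. "/widgets.html".
    """
    if BARE_HOST.match(host):
        return "http://www.example.com" + path
    return None
```

A request for example.com/widgets.html would be sent to the www version of the same path, while a request that already uses www.example.com gets no redirect.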

Man, all of this to accommodate something so simple. I know, I've been there. I remember the days of not knowing what the heck was going on with this whole canonical thing. Even the pronunciation of that word is challenging for many let alone dealing with the technology to fix it.

Boulder90

7:27 pm on Feb 13, 2009 (gmt 0)

Who has time to go back and put this on each one of their pages?

This seems like a giant waste of time for those who have already implemented 301's. A "solution" would be a single file in the main root of every site with simple instructions to the search engines. Having to tag every single page is laughably ridiculous.

I guess I can see how it makes sense putting this tag on new content you are about to publish, but what if you make a mistake and want to change the canonicalization?

That would suck. I'm not even going to bother using it on new content.

jeffposaka

8:26 pm on Feb 13, 2009 (gmt 0)

I think this will be very helpful with campaign tagging.

Robert Charlton

8:26 pm on Feb 13, 2009 (gmt 0)

Who has time to go back and put this on each one of their pages?
This seems like a giant waste of time for those who have already implemented 301's.

As I read it, this is an optional fix for those who don't have the knowledge, resources, or hosting setup to implement 301s. It's great news for moms and pops, and I think they should be grateful.

For those who have implemented 301s... or for those who haven't yet but can get the resources, I'd recommend sticking with 301s. I don't see anywhere that the tag is necessary, or that it gains you anything if you have properly set up 301s in place.

If your .htaccess hasn't addressed all of the possible canonical duplication issues, this tag in addition to your .htaccess might be helpful.

...and this means I can drop a part of my htaccess that might otherwise have used up resources, or complicated other htaccess rules.

If you've got .htaccess set up properly, I feel that it would be wise to stick with it.

For a more complete guide to using .htaccess for fixing dupe issues, I suggest checking out Jim Morgan's extraordinary Jan 2007 thread in the Apache forum...

A guide to fixing duplicate content & URL issues on Apache
How to canonicalize all of your URLs with a single redirect
[webmasterworld.com...]

I feel that this is still the way to go, particularly if you have a large site.

Oliver Henniges

8:29 pm on Feb 13, 2009 (gmt 0)

When I registered my commercial domain eight years ago, I had some alias domains, all residing on the same IP. Let's call them

www.mybranch1-mycompanyname.com
www.mybranch2-mycompanyname.com

In the first years I had an interesting number of visitors searching for "mybranch2" as a keyword. Due to the canonical issue I stuck to redirecting everything to mybranch1. Many of my pages follow a scheme like

www.mybranch1-mycompanyname.com/mybranch1/some/more/directories/widgets.html
www.mybranch1-mycompanyname.com/mybranch2/some/more/directories/widgets.html

The duplicate content issue had done so much damage to websites that I did not want to risk anything. Do you think the time is now ripe to switch back to

www.mybranch2-mycompanyname.com/mybranch2/some/more/directories/widgets.html

using the header-syntax above? Considering how important "keyword in domain-name" seems as a ranking-factor, currently, I might gain back an interesting bunch of orders.

But how will google view such broad changes?

Ocean10000

9:16 pm on Feb 13, 2009 (gmt 0)

The one thing I can think of that no one else here has mentioned is that this might slow down some scrapers who scrape your content verbatim. With this tag being so new, the scrapers will not know to remove it, so the search engines will not give them credit for the page when it gets crawled.

steveb

9:22 pm on Feb 13, 2009 (gmt 0)

"Doesn't it depend on your Doctype? no slashie for HTML 4.0, slashie for XHTML 1.0."

Which is why I asked. The engines only mention using the slash, which makes the code invalid HTML, but works for XHTML. It would be nice if the engines would be freaking clear for one time and say a way to do something that is valid code.

This thing is no replacement for good site structure and 301s, but I'm wondering if it may be helpful for things like photo galleries, where Google has been not indexing all pages despite them having unique titles and alt text on the images. This is a way of saying photo3 is not a duplicate of photo16.

londrum

9:28 pm on Feb 13, 2009 (gmt 0)

i think it's handy just for blocking all the hundreds of extra query strings that people could easily stick on, and link to your site with, trying to lumber you with hundreds of duplicates.

i don't have access to .htaccess on my server, and i couldn't just block query strings outright because i needed them, so i struggled to find a way to block them.

but i finally found a way to do it... and then this easy way comes along to replace it.

g1smd

10:27 pm on Feb 13, 2009 (gmt 0)

This tag will do virtually nothing to help you with crawl budget issues, as Google will still have to read the content of each non-canonical URL they find on your site. With a proper redirect, there is no content for non-canonical URLs, just a status code and a new URL to be accessed, very much less data to return to the bot.

I am also of the opinion that many of those people who haven't understood enough about canonicalisation to have already properly implemented redirects, 404s, and robots exclusion to fix the problem will botch their usage of this new tag too.

If two URLs have very different content but both have the same tag, maybe that sends a very clear "clueless webmaster" signal back to Google. What they will do with that signal I have no idea; but I guess it won't be long before we find out.

docbird

1:51 am on Feb 14, 2009 (gmt 0)

I've lately figured that Google, say, could itself be part of the problem.
This after - with Drupal sites - I initially didn't bother with htaccess code to ensure that pages were at www.example.com rather than example.com, as I didn't have any links to example.com.
Later, I was surprised to find some example.com/... pages indexed by google; and did implement the small code lines in htaccess.

Seemed to me, then, that Google had done something like, "Hey, let's try this URL, see if I can find any pages there."
I've similar problems with a photo gallery, partly from having multisite Drupal install.
gallery should be at (say) www.domain1.com/photogallery
- but google also reporting pages at www.domain2.com/photogallery [no images show up]
- this code could fix the latter, if I could figure how to implement in cms [menalto] gallery.

I might also add that I don't think good search rankings should be just for folk with advanced technical skills; when I search, I'm looking for good content, not who has the hottest coding. Tho I am among folk who make some attempts to get things right, and hugely appreciate advice from webmasterworld experts.

Robert Charlton

6:09 am on Feb 14, 2009 (gmt 0)

This tag will do virtually nothing to help you with crawl budget issues....

I feel that if you have a site that's large enough to be concerned with crawl budget issues, you really shouldn't be relying on this tag... you should fix it in .htaccess.

I've lately figured that Google, say, could itself be part of the problem.

docbird - I would not blame the underlying problems on Google by any means. In this case, we're dealing with internet and server protocols, and an addressing system that existed well before Google came into being.

As pageoneresults said earlier, if anyone deserves blame for bad implementation, it's the hosting companies. His comments bear repeating....

...I say we put the onus on those providing the hosting services. This is something that should be part of a basic package these days but yet many hosts are freakin clueless, especially Windows hosting providers. Bunch of lazy cheap individuals who don't want to take the little bit of time that is involved to protect their network of hosting clients. Dingbats!

To his list of those who deserve some blame, I'd also add corporate IT departments, manufacturers of some visitor tracking software, designers of CMS systems, shopping cart manufacturers, and Microsoft.

phranque

10:19 am on Feb 14, 2009 (gmt 0)

you can put lipstick on a pig but that won't make me pucker up.

hint that we honor strongly... take your preference into account, in conjunction with other signals...

sounds like "probably, eventually, but maybe not".

golocal

12:55 pm on Feb 14, 2009 (gmt 0)

May I ask a few questions as a novice?
I have a 301 redirect in .htaccess. Is this tag necessary?
If the tag is recommended, does it only go on the Index.html in the root, or should I put it on every Index.html of every subdirectory within the root?
And should it go in Index.php also?

g1smd

12:57 pm on Feb 14, 2009 (gmt 0)

This tag goes on every page of the site, but the value will change for each page of content. It is used to state the "correct" URL for a page of content when that exact same page of content can be accessed via a different URL. It is a way of telling Google that a page is exactly duplicated and which URL you would prefer to use.

Samizdata

2:15 pm on Feb 14, 2009 (gmt 0)

I have a 301 redirect in .htaccess. Is this tag necessary?

If your .htaccess is done right then the tag is unnecessary.

It is designed for people who don't understand such things (or have no access).

Configuring the server to do it right is surely the better option.

...

[edited by: Samizdata at 2:48 pm (utc) on Feb. 14, 2009]

Murdoch

3:12 pm on Feb 14, 2009 (gmt 0)

So does this mean that Google is cool with affiliate links now? I had always thought they considered them to be gaming the system.

signor_john

4:34 pm on Feb 14, 2009 (gmt 0)

So does this mean that Google is cool with affiliate links now? I had always thought they considered them to be gaming the system

I've never seen any evidence to suggest that Google doesn't like affiliate links. "Thin affiliate" sites or pages are a different story.

As for the canonical tag, it sounds like a great idea: You can use it or not, according to your preference and needs, and it's honored by all three of the leading search engines. Why would anyone object to that? And why complain now (as some have done) just because you would have liked having it earlier? Should the search engines not implement or recognize a canonical tag in 2009 just because they didn't do it two or three years ago?

compose

5:22 pm on Feb 14, 2009 (gmt 0)

Oops, my question got lost between the posts of all the tech gurus, so sorry for pasting it here again for your consideration. Can anyone guide me regarding this?

I have a little question about this new tag.
Usually on e-commerce sites, a few parts of the content are repeated on every product detail page, such as
"Size information", "Shipping details", "General FAQs for product", etc. (and they are necessary too).
Mostly this content is presented using tabs (hidden divs).
So my question is: will such product pages count as duplicate content? (Apart from these hidden divs, which become visible on a particular selection, everything else is fresh for each product page, i.e. the product description.) And if yes,
can this problem be sorted out using the canonical tag?

Thanks :)

tedster

5:39 pm on Feb 14, 2009 (gmt 0)

It "might" cause some kind of near-duplicate filtering, depending on how much unique text is on the pages involved. I wouldn't use hidden divs with identical content on every product page - find another solution that doesn't put the same text in every page's source code. Informational pop-ups, iframed pages, something along that line.

No, this canonical probably isn't going to be the tool for your needs.

compose

6:28 pm on Feb 14, 2009 (gmt 0)

Thanks tedster for the quick reply and help. Then ajax or iframe content will be a good option.

Actually, a while ago Google gave my site a PageRank of 1, but it has now been more than 10 months and the PageRank still has not been updated. So I thought it might be an issue with this tab-based duplicate content.
