Forum Moderators: Robert Charlton & goodroi
Today it was announced that the 3 big search engines have come up with a new tag to help with canonical issues.
Announcements:
[googlewebmastercentral.blogspot.com ]
[ysearchblog.com ]
[blogs.msdn.com ]
Using the new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
That’s it!
You can only use the tag on pages within a single site (subdomains and subfolders are fine).
You can use relative or absolute links, but the search engines recommend absolute links.
This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.
Links to all URLs will be consolidated to the one specified as canonical.
Search engines will consider this URL a “strong hint” as to the one to crawl and index.
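As a rough illustration of the "absolute links" recommendation, here is a minimal Python sketch that resolves a relative or absolute href against the URL the page was requested at and prints the tag in the form shown above. The function name canonical_link_tag and the extra sort parameter in the sample URL are made up for the example; they are not part of any announcement.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link_tag(page_url, canonical_href):
    """Return a <link rel="canonical"> element with an absolute href.

    page_url is the URL the page was requested at; canonical_href may be
    relative or absolute. Both names are illustrative only.
    """
    # urljoin leaves an absolute href alone and resolves a relative one
    # against the requested URL.
    absolute = urljoin(page_url, canonical_href)
    # Escape the value so characters such as & are safe inside the attribute.
    return '<link rel="canonical" href="%s"/>' % escape(absolute, quote=True)


if __name__ == "__main__":
    print(canonical_link_tag(
        "http://www.example.com/product.php?item=swedish-fish&sort=price",
        "product.php?item=swedish-fish",
    ))
```

Whatever variant of the URL the visitor arrived at, the tag printed into the head section carries the one full URL you want indexed.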
< See also Canonical Tag Results: Share the stories - Positive / Negative / No Impact [webmasterworld.com] >
[edited by: tedster at 6:08 pm (utc) on April 2, 2009]
Just as there are many ways that canonicalization can be done poorly, there are as many ways in which this new tag can be used improperly. I think those that don't "get" canonicalization are going to be just as clueless when it comes to this new tag, so I don't foresee any major competitive disadvantages emerging.
This is more interesting than I thought.
Say I have a list of items, at items.php
I can sort those items, à la items.php?sortby=name
I can page those items too, à la items.php?sortby=name&page=2
I can filter those items, à la items.php?sortby=name&page=2&query=foo
Let's take the third example, items.php?sortby=name&page=2&query=foo
That's a unique representation of the data. Never mind that the items in the list are repeated in other views with different sorting and filtering - it's still unique and in my book, it's canonical.
But what happens when the URL (link, bookmark, what have you) includes extraneous QS vars?
items.php?sortby=name&page=2&query=foo&sessid=12341234&referer=mypage.htm&trackingCode=a1a1a1
Those other variables are important to the script; they probably do hidden things like analytics and affiliate attribution and whatnot. But they do not change the content of the page - that's the important thing. This shows that in a URL, some parts (variables) influence the content of the page, and others do not.
There are three kinds of variables that appear in a URL.
1) ones that affect the content of a page. like: page, query, sortby.
2) ones that do not affect the content of a page. like: sessid, referer, affid
3) ones that are just meaningless, added by the user, typos
How I've dealt with these situations is the "scrub and redirect" technique. You identify the variables that are known, but don't affect the content of the page (type 2). Put them in a SESSION. Discard any unknown variables (type 3). 301 redirect the user to the canonical page, with only the canonical variables present in the URL (type 1). The result is that the user always ends up on a canonical URL, the important extra bits are stored in the SESSION, and all the noise is eliminated.
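Here is a minimal Python sketch of the scrub step, not the poster's actual script. The two variable lists are assumptions taken from the examples in this post, and the function name scrub is made up. It returns the canonical URL, the values you would stash in the SESSION, and whether a 301 is needed at all.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed classification, taken from the examples in the post.
CONTENT_VARS = ("sortby", "page", "query")                      # type 1
TRACKING_VARS = ("sessid", "referer", "trackingCode", "affid")  # type 2


def scrub(url):
    """Split a URL into (canonical_url, session_values, needs_redirect)."""
    parts = urlsplit(url)
    # If a variable is repeated, the last value wins.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Keep type 1 variables, in a fixed order so equivalent URLs compare equal.
    kept = [(k, params[k]) for k in CONTENT_VARS if k in params]
    # Type 2 variables go to the session; anything else (type 3) is dropped.
    session_values = {k: params[k] for k in TRACKING_VARS if k in params}
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    return canonical, session_values, canonical != url


if __name__ == "__main__":
    canonical, session_values, needs_redirect = scrub(
        "http://www.example.com/items.php?sortby=name&page=2&query=foo"
        "&sessid=12341234&referer=mypage.htm&trackingCode=a1a1a1"
    )
    print(canonical)
    print(session_values)
    print(needs_redirect)
```

When needs_redirect is true, the server would store session_values and answer with a 301 whose Location is the canonical URL. When it is false, the request is already canonical and the page is served as usual.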
This tag is useful in just those situations.
I'm not going to stop using my scrub-and-redirect technique. It works extremely well. However, now instead of just sending vague instructions to the client in the HTTP header, I can do both: send a 301 redirection AND send - in the document - a semantic message describing the canonicalization status.
So now your client will know semantically why they are getting a Location redirection command in the header. It's not just a "Moved Permanently", it's a "Canonicalization Correction". Semantically they are different reasons for redirecting, and in some situations, the difference makes a difference.
Since on every page load I'm already coming up with the canonical URL to which the user is (possibly) going to be redirected, I can very easily plug that URL into a new tag in the <head>.
Not all webmasters are running their servers like programmable page factories. Many sites are built with Dreamweaver and a "Publish" button, or Notepad and FileZilla, and not all webmasters have the programming chops or Execute permissions to do server-side shenanigans. For webmasters who do not have a canonicalization infrastructure managing their URLs, this tag is a fantastic utility.
Usually on e-commerce sites a few parts of the content are repeated on each product detail page, like
"Size information", "Shipping details", "General FAQs for product" etc. (that is necessary too).
Mostly this content is presented using tabs (hidden divs).
So my question is, will such product pages count as duplicate content? (Except for these hidden divs, which become visible on a particular selection, everything else is fresh for each product page, i.e. the product description.) And if yes, then
can this problem be sorted out using the canonical tag?
I refuse to do anything specifically for the SE's benefit. I don't dance to their tune, they dance to mine.
I'm sorry, but if you're a serious SEO you're Google's bitch. White/black hat it doesn't matter.
A client comes to you. They're running IIS 6 shared hosting asp.net. They're not ranking well because www.site.com, www.site.com/default.aspx, site.com/default.aspx, www.site.com/Default.aspx are duplicate. What do you do?
Bow down before the one you serve because this canonical link is the only solution.
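To show what the tag would point to in that situation, here is a small Python sketch that maps the four duplicate home page variants onto one URL. The preferred host and the default document name are assumptions for the example, and the host is replaced blindly, so this only makes sense for URLs on your own site.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"   # assumed preferred host
DEFAULT_DOCUMENT = "default.aspx"    # assumed default document name


def canonical_home_url(url):
    """Map host, case, and default-document variants onto one URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # IIS serves paths case-insensitively, so compare in lower case.
    if path.lower().endswith("/" + DEFAULT_DOCUMENT):
        path = path[: -len(DEFAULT_DOCUMENT)]
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))


if __name__ == "__main__":
    for variant in (
        "http://www.example.com",
        "http://www.example.com/default.aspx",
        "http://example.com/default.aspx",
        "http://www.example.com/Default.aspx",
    ):
        print(variant, "->", canonical_home_url(variant))
```

The result is the value that would go into the href of the canonical link on the default page, so all four variants name the same preferred URL.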
[webmasterworld.com...]
I'm delighted with this solution and the fact that it's accepted by all the majors. We are all servants of digital technology, and she is the dominant mistress here. Finally there's a way for mom and pop on shared Windows hosting to defend against all kinds of crazy disruption.
I'm sorry, but if you're a serious SEO you're Google's female dog. White/black hat it doesn't matter.
For those of you on Windows, let that be a message.
A client comes to you. They're running IIS 6 shared hosting asp.net. They're not ranking well because www.example.com, www.example.com/default.aspx, example.com/default.aspx, www.example.com/Default.aspx are duplicate. What do you do? Bow down before the one you serve because this canonical link is the only solution.
For those of you on Windows, let that be another message! :)
You could also go to your host and point them to topics such as this and maybe help them understand the value of accommodating their hosting clients. This is not rocket science. It is not expensive and it takes a whole 5-10 minutes to purchase, install and be on your merry way. It is a per server configuration and any sites hosted on that server can now take advantage of the 2.0 method using httpd.ini or the 3.0 method using .htaccess. Here are the very simple rules to achieve this.
2.0 Rule
RewriteCond Host: ^example\.com
RewriteRule (.*) http\://www\.example\.com$1 [I,RP]
3.0 .htaccess Rule
rewriteCond %{HTTP_HOST} ^example.com [NC]
rewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
Man, all of this to accommodate something so simple. I know, I've been there. I remember the days of not knowing what the heck was going on with this whole canonical thing. Even the pronunciation of that word is challenging for many, let alone dealing with the technology to fix it.
This seems like a giant waste of time for those who have already implemented 301's. A "solution" would be a single file in the main root of every site with simple instructions to the search engines. Having to tag every single page is laughably ridiculous.
I guess I can see how it makes sense putting this tag on new content you are about to publish, but what if you make a mistake and want to change the canonicalization?
That would suck. I'm not even going to bother using it on new content.
Who has time to go back and put this on each one of their pages?
This seems like a giant waste of time for those who have already implemented 301's.
As I read it, this is an optional fix for those who don't have the knowledge, resources, or hosting setup to implement 301s. It's great news for moms and pops, and I think they should be grateful.
For those who have implemented 301s... or for those who haven't yet but can get the resources, I'd recommend sticking with 301s. I don't see anywhere that the tag is necessary, or that it gains you anything if you have properly set up 301s in place.
If your .htaccess hasn't addressed all of the possible canonical duplication issues, this tag in addition to your .htaccess might be helpful.
...and this means I can drop a part of my htaccess that might otherwise have used up resources, or complicated other htaccess rules.
If you've got .htaccess set up properly, I feel that it would be wise to stick with it.
For a more complete guide to using .htaccess for fixing dupe issues, I suggest checking out Jim Morgan's extraordinary Jan 2007 thread in the Apache forum...
A guide to fixing duplicate content & URL issues on Apache
How to canonicalize all of your URLs with a single redirect
[webmasterworld.com...]
I feel that this is still the way to go, particularly if you have a large site.
www.mybranch1-mycompanyname.com
www.mybranch2-mycompanyname.com
In the first years I had an interesting number of visitors searching for "mybranch2" as a keyword. Due to the canonical issue I stuck to redirecting everything to mybranch1. Many of my pages follow a scheme like
www.mybranch1-mycompanyname.com/mybranch1/some/more/directories/widgets.html
www.mybranch1-mycompanyname.com/mybranch2/some/more/directories/widgets.html
The duplicate content issue had done so much damage to websites that I did not want to risk anything. Do you think the time is now ripe to switch back to
www.mybranch2-mycompanyname.com/mybranch2/some/more/directories/widgets.html
using the header-syntax above? Considering how important "keyword in domain-name" seems as a ranking-factor, currently, I might gain back an interesting bunch of orders.
But how will Google view such broad changes?
Which is why I asked. The engines only mention using the slash, which makes the code invalid HTML, but works for XHTML. It would be nice if the engines would be freaking clear for once and give a way to do this that is valid code.
This thing is no replacement for good site structure and 301s, but I'm wondering if it may be helpful for things like photo galleries, where Google has not been indexing all pages despite them having unique titles and alt text on the images. This is a way of saying photo3 is not a duplicate of photo16.
I don't have access to .htaccess on my server, and I couldn't just block query strings outright because I needed them, so I struggled to find a way to block them.
But I finally found a way to do it... and then this easy way comes along to replace it.
I am also of the opinion that many of the people who haven't understood canonicalisation well enough to have already fixed the problem with properly implemented redirects, 404s, and robots exclusion will botch their usage of this new tag too.
If two URLs have very different content but both have the same tag, maybe that sends a very clear "clueless webmaster" signal back to Google. What they will do with that signal I have no idea; but I guess it won't be long before we find out.
Seemed to me, then, that Google had done something like, "Hey, let's try this URL, see if I can find any pages there."
I have similar problems with a photo gallery, partly from having a multisite Drupal install.
The gallery should be at (say) www.domain1.com/photogallery
- but Google is also reporting pages at www.domain2.com/photogallery [no images show up]
- this code could fix the latter, if I could figure out how to implement it in the CMS [Menalto Gallery].
I might also add that I don't think good search rankings should be just for folk with advanced technical skills; when I search, I'm looking for good content, not for who has the hottest coding. Though I am among the folk who make some attempts to get things right, and I hugely appreciate advice from WebmasterWorld experts.
This tag will do virtually nothing to help you with crawl budget issues....
I feel that if you have a site that's large enough to be concerned with crawl budget issues, you really shouldn't be relying on this tag... you should fix it in .htaccess.
I've lately figured that Google, say, could itself be part of the problem.
docbird - I would not blame the underlying problems on Google by any means. In this case, we're dealing with internet and server protocols, and an addressing system that existed well before Google came into being.
As pageone results said earlier, if anyone deserves blame for bad implementation, it's the hosting companies. His comments bear repeating....
...I say we put the onus on those providing the hosting services. This is something that should be part of a basic package these days but yet many hosts are freakin clueless, especially Windows hosting providers. Bunch of lazy cheap individuals who don't want to take the little bit of time that is involved to protect their network of hosting clients. Dingbats!
To his list of those who deserve some blame, I'd also add corporate IT departments, manufacturers of some visitor tracking software, designers of CMS systems, shopping cart manufacturers, and Microsoft.
I have a 301 redirect in .htaccess. Is this tag necessary?
If your .htaccess is done right then the tag is unnecessary.
It is designed for people who don't understand such things (or have no access).
Configuring the server to do it right is surely the better option.
...
[edited by: Samizdata at 2:48 pm (utc) on Feb. 14, 2009]
So does this mean that Google is cool with affiliate links now? I had always thought they considered them to be gaming the system.
I've never seen any evidence to suggest that Google doesn't like affiliate links. "Thin affiliate" sites or pages are a different story.
As for the canonical tag, it sounds like a great idea: You can use it or not, according to your preference and needs, and it's honored by all three of the leading search engines. Why would anyone object to that? And why complain now (as some have done) just because you would have liked having it earlier? Should the search engines not implement or recognize a canonical tag in 2009 just because they didn't do it two or three years ago?
I have a little question about this new tag.
Usually on e-commerce sites a few parts of the content are repeated on each product detail page, like
"Size information", "Shipping details", "General FAQs for product" etc. (that is necessary too).
Mostly this content is presented using tabs (hidden divs).
So my question is, will such product pages count as duplicate content? (Except for these hidden divs, which become visible on a particular selection, everything else is fresh for each product page, i.e. the product description.) And if yes, then
can this problem be sorted out using the canonical tag?
Thanks :)
No, this canonical probably isn't going to be the tool for your needs.