Forum Moderators: Robert Charlton & goodroi
Today it was announced that the three big search engines have come up with a new tag to help with canonical issues.
Announcements:
[googlewebmastercentral.blogspot.com ]
[ysearchblog.com ]
[blogs.msdn.com ]
Using the new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
That’s it!
You can only use the tag on pages within a single site (subdomains and subfolders are fine).
You can use relative or absolute links, but the search engines recommend absolute links.
This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag. Links to all URLs will be consolidated to the one specified as canonical.
Search engines will consider this URL a “strong hint” as to the one to crawl and index.
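For anyone generating the tag from a template, here is a minimal Python sketch of the "prefer absolute links" advice above. The function name canonical_link and the extra sort=price parameter in the usage line are made up for illustration; they are not part of the announcement.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link(page_url: str, canonical_href: str) -> str:
    """Return a <link rel="canonical"> element with an absolute href.

    page_url is the URL the page was requested under; canonical_href may be
    relative or absolute. urljoin resolves a relative href against page_url
    and leaves an absolute href unchanged.
    """
    absolute = urljoin(page_url, canonical_href)
    # Escape the URL so it is safe inside an HTML attribute value.
    return f'<link rel="canonical" href="{escape(absolute, quote=True)}"/>'


# Hypothetical duplicate URL (sort=price is an assumed extra parameter).
print(canonical_link(
    "http://www.example.com/product.php?item=swedish-fish&sort=price",
    "/product.php?item=swedish-fish",
))
```

The output goes into the head section of every URL variant that shows the same product page.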
< See also Canonical Tag Results: Share the stories - Positive / Negative / No Impact [webmasterworld.com] >
[edited by: tedster at 6:08 pm (utc) on April 2, 2009]
This is going to go right over the head of 99% of webmasters.
So should I remove the robots.txt rule and add this new tag?
I would add the new tag as a backup (it will help in other types of canonical issues, too) but definitely would NOT remove the existing robots.txt rule or .htaccess rules that are already in place.
For one thing, you will save Google some spidering cycles. For another, the canonical tag is taken as a "hint" and not as an ironclad rule. Your own server configuration is still a more solid solution than this new tag - especially because it is still new and a bit of an unknown quantity in practice.
What's so hard about it? You have multiple URLs showing similar content. You want the users to have access to the multiple URLs, but you want search engines only to index one of them. So you stick a tag on each page pointing the SE's to the official URL.
If we can get everyone participating who has questions to understand that simplistic explanation, maybe we can cut the questions from 50 to 10 or so? :)
Let's see: The big three search engines came up with this together. Dang! I've figured it out. They're planning a merger! Quick, everybody, to the GOOG, Microsoft, and Yahoo corporate forums! :-)
[edited by: signor_john at 8:48 pm (utc) on Feb. 17, 2009]
http://www.example.com/
http://www.example.com/Default.aspx
which have identical content, and I want http://www.example.com/ to be the page chosen by the search engines, would I include the forward slash at the end of the URL in the new tag? I.e., would the new tag which I would put in the head section of the http://www.example.com/Default.aspx page be:
a) <link rel="canonical" href="http://www.example.com/" />
or
b) <link rel="canonical" href="http://www.example.com" />
Thanks.
I've seen poorly executed (keyword fluffing) url rewrite schemes that can also benefit from this. For instance, the server might deliver the same content for these two urls:
example.com/21/categoryname/location/page
example.com/21/anyoldstring/location/page
Also some rewrites are not particular about the order of the virtual folder names, and now they won't need to be:
example.com/21/categoryname/location/page
example.com/21/location/categoryname/page
But what about my first question: how can I implement this code in my dynamic site?
Let's say my site URLs are like this:
www.example.com/category/post.php?post_id=1&cat=1
www.example.com/category/post.php?post_id=2&cat=1
www.example.com/category/post.php?post_id=1&cat=2
www.example.com/category/post.php?post_id=2&cat=2
www.example.com/category/post.php?post_id=1&cat=100
and so on
www.example.com/category/post.php?category.php?cat_id=1
www.example.com/category/post.php?category.php?cat_id=2
and so on
Well, my site also generates dynamic links. For a post, for example, it generates links like this:
www.example.com/category/post.php?post_id=2&cat=1&mostviews
www.example.com/category/post.php?post_id=2&cat=1&random
www.example.com/category/post.php?post_id=2&cat=1&mostemailed
and same for categories
So will you please help me add this code to the header?
Should I add code like this:
<link rel="canonical" href="http://www.example.com/category" />
Or how? Will you please post the exact code I can add to my site header?
<link rel="canonical" href="http://www.example.com/category/post.php?post_id=2&cat=1">
... the href attribute's value begins with the front part of url but the unwanted parameter is dropped. I say "sounds like" because I'm naturally not familiar with your site, so I can't be sure that these extra parameters are actually duplicating the same content - that would be your job or your team's job to ensure.
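As a rough Python sketch of "keep the front part of the url, drop the unwanted parameter": this assumes post_id and cat are the only parameters that change what the page shows. That is a guess based on the example URLs in this thread and would need checking against the real site. The names CONTENT_PARAMS and canonical_url are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters select different content.
# Everything else (mostviews, random, mostemailed, ...) is dropped.
CONTENT_PARAMS = ("post_id", "cat")


def canonical_url(url: str) -> str:
    """Return url with only the content-selecting parameters, in a fixed order."""
    parts = urlsplit(url)
    # keep_blank_values=True so bare flags such as "&mostviews" are parsed
    # (as empty values) and then filtered out below.
    values = dict(parse_qsl(parts.query, keep_blank_values=True))
    kept = [(name, values[name]) for name in CONTENT_PARAMS if name in values]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url(
    "http://www.example.com/category/post.php?post_id=2&cat=1&mostviews"
))
```

Emitting the parameters in one fixed order means ?cat=1&post_id=2 and ?post_id=2&cat=1 both end up with the same canonical href.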
By the way, I thought you were blocking those urls in robots.txt anyway. If so, any problem should already be handled and googlebot won't be requesting those urls anyway.
Do we all agree that fixing canonical issues "for real" is better than using this tag? I think this is bad news for anyone who has fixed all canonical issues on their site. More SE exposure for the lazy competition.
@Tonearm, @pageoneresults, @wheel: You can never "fix" canonical for real for a site. Take any site URL, and add
http://www.example.com/?utm_source=whatever&ovcpn=whatever
And it delivers the same content as
http://www.example.com/
Many, many PPC and SEO tracking solutions use this type of tracking, and the search engines *may* treat those duplicate inbound links as duplicate content, since the extra parameters don't perturb the page (in most cases).
The technical solution that works is the one the three engines came up with: put on the page only the "parameters" that the page actually pays attention to, i.e. the ones that produce different content (e.g. ?article=54).
The solution is simple and easy to implement.
The only way to truly "fix canonical issues" for real (as you like) is to check all incoming query string parameters, and if any invalid ones are found, issue a 404, or a 3xx and redirect to the correct page.
However, that would make most of my search marketing and SEO customers cringe as it kills most of how they track things, including inbound referrers (depending on the 3xx code.)
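To make the stricter approach concrete, here is a minimal Python sketch of "check all incoming query string parameters and redirect". It only decides what the server should answer; wiring it into a real web server is left out. ALLOWED_PARAMS and check_request are hypothetical names, and treating article as the only valid parameter is an assumption taken from the ?article=54 example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "article" is the only parameter the page actually uses.
ALLOWED_PARAMS = {"article"}


def check_request(url: str):
    """Return (status, location) for an incoming URL.

    (200, None) when every query parameter is allowed; otherwise
    (301, clean_url) where clean_url has the unknown parameters removed.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in ALLOWED_PARAMS]
    if len(kept) == len(pairs):
        return 200, None
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return 301, clean


print(check_request("http://www.example.com/?utm_source=whatever&ovcpn=whatever"))
print(check_request("http://www.example.com/?article=54"))
```

Note that the redirect throws away utm_source and friends before the page renders, which is exactly the tracking loss described above.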
Cheers.
Edited: Reply to page one
@wheel: You can never "fix" canonical for real for a site.
Really, the entire issue is a non-problem. Hardly anyone has these kinds of problems that can't be fixed through either rearranging your site or using robots.txt or .htaccess files. And most sites don't even need any of that. Anyone outside that box, well, we're looking at one in a million. It's a hammer looking for a nail.
Nevertheless, I'm not disputing the technical aspect of it, any more than I dispute the technical aspect of the nofollow tag. I am arguing that this is not what webmasters want to be doing long term. It's good for the SE's at the potential long term expense of us. We (well, not me - it's youse guys) are building the next MS monopoly with this kind of behavior.
All that being said, I somehow doubt that this is actually going to help anyone's ranking noticeably, or make enough of a difference that couldn't be 'fixed' by some more backlinks. There's no way this is a panacea for helping people to rank. And if it's not, what the heck, you need more work to do or something :)?
Haven't we been down this road already?
You can never "fix" canonical for real for a site. Take any site URL, and add...http://www.example.com/?utm_source=whatever&ovcpn=whatever
Wouldn't you just do...
<link rel="canonical" href="http://www.example.com/" />
Ya, there are all sorts of neat things you can add onto a URI string and still have it resolve to the destination page. I've seen all sorts of tricks in this area over the years. ;)
I'll keep my eye on its usage and see how others are doing in the process. It's still way too early for me to jump on the bandwagon. We typically manage ours at the server level so it is not relevant for us. Although I may find myself looking at its use when I need to do something quickly while we get the proper solution in place. :)
Canonical Name
The actual name of a resource.
I don't have any canonical problems I need to fix. The search engines may have problems.
And again, from a technical standpoint, the engines can use cryptographic checksums (e.g. md5, sha1) on page content to automatically deduplicate content.
Also, they should be "smart" enough to know that:
http://www.example.com/Default.asp =
http://www.example.com/default.asp =
http://www.example.com/ =
http://www.example.com/DeFaUlT.AsP =
I mean, what does 6 billion in profit get you these days?
Of course, if you have dates/times, or random images, it gets trickier.
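Here is a toy Python sketch of the checksum idea, just to show the mechanics. It is not a claim about how any search engine actually deduplicates. The function names and the sample pages are made up, and the only normalization done is collapsing whitespace.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """SHA-1 of the page content after collapsing runs of whitespace."""
    normalized = " ".join(html.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose content has the same fingerprint."""
    groups: dict[str, list[str]] = {}
    for url, html in pages.items():
        groups.setdefault(content_fingerprint(html), []).append(url)
    # Only groups with more than one URL are duplicates.
    return [urls for urls in groups.values() if len(urls) > 1]


# Made-up sample data: two URLs serving the same markup, one distinct page.
pages = {
    "http://www.example.com/": "<p>Hello</p>",
    "http://www.example.com/Default.asp": "<p>Hello</p>\n",
    "http://www.example.com/about": "<p>About</p>",
}
print(group_duplicates(pages))
```

An exact checksum like this breaks as soon as a single byte differs, which is why a date stamp or a random image makes the problem trickier.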
I mean, what does 6 billion in profit get you these days?
It buys you sophisticated systems for filtering duplicates out of results.
If you serve the same content via multiple URLs, search engines will choose which one to display in results. Sometimes, this results in a choice that webmasters don't like - particularly if search engines don't consolidate things like link popularity to a single URL. So, a site owner can end up with several weak URLs instead of one strong one - so-called duplicate content problems.
Search engines don't really care - it's very rare to see duplicate content in a single result set these days. This option (like redirects) is a way for you to choose a preferred URL, as opposed to letting search engines do it for you - sometimes to your disadvantage. This isn't you helping them out, but the reverse.
Whether this element actually works or is a good implementation is a different question, but this isn't like nofollow.
but this isn't like nofollow.
I appreciate I sound like tfh guy and appreciate there's some validity to that perception. But I disagree that this tag can't be distorted exactly like nofollow was. The initial characteristics are identical.
In any event, I seem to be able to rank without using any of this stuff, even on sites that have all sorts of potential canonical issues and duplicate content.
It's exactly like nofollow
I understand where you're coming from. I have a distaste for any of the proprietary elements introduced by search engines. And for sure, SEO-types will try to figure out a way to (ab)use any element that impacts on search engines.
But for me, this is intended to fix a webmaster's problem - whereas nofollow is to fix a search engine's problem.
The only way to truly "fix canonical issues" for real (as you like) is to check all incoming query string parameters, and if any invalid ones are found, issue a 404, or a 3xx and redirect to the correct page. However, that would make most of my search marketing and SEO customers cringe as it kills most of how they track things, including inbound referrers (depending on the 3xx code.)
Doing so makes tracking more effective too - no bookmarked tracking URLs to skew statistics. One item of content per URL is a good model for reasons other than just search engines ;)