Today it was announced that the three big search engines have come up with a new tag to help with canonical issues.
Using the new canonical tag
Specify the canonical version using a tag in the head section of the page as follows:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>
You can only use the tag on pages within a single site (subdomains and subfolders are fine).
You can use relative or absolute links, but the search engines recommend absolute links.
This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.
Links to all URLs will be consolidated to the one specified as canonical.
Search engines will consider this URL a “strong hint” as to the one to crawl and index.
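For anyone who wants to check what a page actually declares, here is a rough sketch using only the Python standard library. The names CanonicalFinder and find_canonical are made up for illustration, and this is only one way to read the tag, not how any search engine does it. Relative hrefs are resolved against the page's own URL, which is part of why absolute links are the safer choice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html, page_url):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve a relative href against the URL the page was fetched from.
    return urljoin(page_url, finder.canonical)
```

Calling find_canonical(page_source, page_url) on each URL variant of a product page should give the same answer every time if the tag has been set up consistently.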
< See also Canonical Tag Results: Share the stories - Positive / Negative / No Impact [webmasterworld.com] >
[edited by: tedster at 6:08 pm (utc) on April 2, 2009]
Some specific points addressed in the video:
1. The canonical tag will work across the https: and http: protocols
2. The canonical tag will work across hostnames (subdomains).
3. Avoid creating infinite loops where urlA says urlB is the canonical and then urlB says urlA is the canonical version.
4. Avoid a canonical url that returns a 404 or 410 status.
5. Avoid canonical chains, just as you should avoid 301 redirect chains.
6. The canonical url you declare does not need to get an exact content match from the server, but its content should be very similar. This means you can apply it to various "sort" versions of the same material.
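Points 3 and 5 can be checked mechanically once you have a list of which page declares which canonical. A minimal sketch, assuming you have already collected that mapping yourself (the function name resolve_canonical, the canonical_of dictionary and the max_hops limit are all hypothetical):

```python
def resolve_canonical(url, canonical_of, max_hops=10):
    """Follow canonical declarations starting from url.

    canonical_of maps a page URL to the canonical URL that page declares.
    Returns (final_url, hops, looped).
    """
    seen = {url}
    hops = 0
    current = url
    while hops < max_hops:
        target = canonical_of.get(current)
        # No declaration, or a page naming itself, ends the walk.
        if target is None or target == current:
            return current, hops, False
        # Arriving at a URL we have already visited means a loop.
        if target in seen:
            return target, hops + 1, True
        seen.add(target)
        current = target
        hops += 1
    return current, hops, False
```

A result with looped set to True is the urlA/urlB situation from point 3, and a hop count above 1 is a chain as in point 5. Point 4 (404 or 410 targets) needs a live request and is not covered here.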
*** The canonical url you declare does not need to get an exact content match from the server, but its content should be very similar. This means you can apply it to various "sort" versions of the same material. ***
This is where the utility of the tag differs from having redirections in place. In other cases I agree, fix them in the beginning where possible.
Having different "versions of the same material" seems to be bending the meaning of canonical.
The customer can access the product page with various links:
The simple product page:
The product page in the category: gummy candy
I'll come up with a third:
The product page in the category: gelatine free sweets
And a fourth:
The product page in the category: salty sweets
Note that in the example Matt Cutts gives, the category and product name are displayed in big bold letters, so the visitor can immediately see two pieces of information:
1. In which category he is on the website
2. Which products are displayed.
Now Matt Cutts happily tells us: We can resolve this duplicate content issue by picking one of those product pages and making it our "canonical page" with the new "canonical tag".
So here is my problem: I do not think those pages have a duplicate content issue. I believe those four pages are entirely different. For some users the essential information lies in the 1 percent that differs between the four examples, and for me as a shop owner that information can decide whether I make the sale or not.
What if a user searches for: "Gelatine free swedish sweets"?
Will my product page even show up now in the search results if I use No 1 as my canonical URL?
And even if it does - won't my potential customer be confused if he specifically searched for "gelatine free swedish fish" and this important information does not show up in a prominent place because I directed him to my "canonical URL"? What about the user who wanted "swedish salty sweets"? What about the user who wants "swedish gummy candy"?
What if I made the URL "salty sweets" my canonical. Won't a visitor be confused if he does a search for "gelatine free sweets" and then ends up with "salty sweets"? That wasn't the information he was looking for.
If I understand the implications of the canonical tag correctly, I might lose 75% of my sales on swedish fish if I use this tag.
I checked my own online shop. Tried searches like
product_type + part of product name
brand + part of product name
For each search Google showed me a different URL, but in both cases it was the correct product at its correct location in my website for that specific search: one in the "brand" category, one in the "product type" category. That is how it should be, and what a visitor would expect.
I would be nuts to implement this tag. It's fixing something that is not broken. Or did Matt Cutts only use a bad example?
*** In which category he is on the website ***
This is always going to be problematical.
If the user browsed from a category page, fine: showing the category, or the path through the category tree, is a good idea, be it as a breadcrumb or a heading.
If however the user gets there by other means, say direct from search engine, and products can be under multiple categories, showing a page suggesting that product belongs to a specific category is wrong, particularly if no category is suggested by the search terms.
Showing all categories the product belongs to is probably the best way to go.
Knowledge of how the user got to the page (search terms or referer) could be used to modify the page using a script, if this is felt to be needed.
Be it a good idea or otherwise, this tag allows "where from" information to be encoded in the url, with the canonical tag telling the search engines to ignore it.
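To make that concrete, here is a small sketch of how a shop might work out the href for its canonical tag by dropping the "where from" parameters. The parameter names category and ref are purely hypothetical, as is the function name; your own URLs will differ.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that only record how the visitor arrived.
WHERE_FROM_PARAMS = {"category", "ref"}


def canonical_url(url):
    """Return url with the 'where from' query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in WHERE_FROM_PARAMS
    ]
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Every category variant of the product URL then maps to the same href, while the page itself can still read the original parameter to show the breadcrumb the visitor expects.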
I duplicated a record in my database and created a new page with an SEO url where the only difference was the use of an "_" between 2 words instead of a "-". We'll call them test_page.html (the new page I am trying to make canonical) and test-page.html (the existing page). Both pages were otherwise identical in content and structure.
My initial test was to see if the canonical tag would work if the canonical url (test_page.html) was an orphan page not linked to anywhere else on the site. After setting the canonical tag on test-page.html I waited 10 days for any results.
As Google hinted with their "other factors" comment, they rightly ignored the new (unlinked to) page and continued to index the old page (test-page.html).
My second test was to link to both pages in an identical place (my html sitemap). Would the canonical tag page now take precedence as intended?
3 days after adding a link to the new page (test_page.html) the canonical tag was recognised BUT the initial page was dropped from the index and the new page WAS NOT added in its place! Err.. oh.. where has all my traffic gone?!
A further 5 days after this the canonical url has now been indexed in place of the old url in exactly the same ranking position the old page had been. (phew)
My test presented Google with the problem of a brand new url as the canonical tag target. This was as brutal a use of this tag as I could think of and probably not what its use would be in most circumstances.
However I didn't think the loss of rank (and therefore 4 days of traffic) was ideal, and it got me thinking that there must be two independent processes at work with this tag, which can cause this kind of ranking issue.
Process one drops the old url from the index as it is non-canonical, and the second process picks up the new one (probably on Googlebot's next visit).
This makes the canonical tag a much less efficient and more dangerous technique than other common ways to repair canonical issues (such as traditional 301s).
<link rel="canonical" href="http://www.example.com">
<link rel="canonical" href="http://www.example.com/">
<link rel="canonical" href="http://www.example.com/directory">
<link rel="canonical" href="http://www.example.com/directory/">
But it is quite common practice to include the slash after /page/ as well, especially when a dynamic site does not have a native directory structure and rewrites to search-friendly urls. Google clearly can deal with that practice.
Most of all, be consistent in your canonical tagging and your internal linking.
[edited by: tedster at 6:24 pm (utc) on April 2, 2009]
For a plain address, add the slash (http://www.example.com/)
For a directory, add the slash (http://www.example.com/path/)
When in doubt, add the slash. Why?
Watch what most browsers do to the address when you enter it.
Click on http://www.example.com results in http://www.example.com/ in the browser address bar.
A plain address gets a slash added after it. That is actually the standard, although every browser does the right thing there anyway.
As for the paths, if you DON'T put a slash for a directory, many web servers (IIS and Apache, I believe) actually send a 301/302 redirect to the slashed directory.
The only time you shouldn't use a slash after a directory is if you have a site using mod_rewrite or Ruby on Rails, where every URL is passed through a handler which does the normalization and page generation. In that case, the trailing slash is dependent on the app.
However, when in doubt, use the slash. If you really don't want to, use an HTTP header sniffer and see what the web server returns for the non-slashed version. If it gives a 200 then you are OK not using it.
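The plain-address case is easy to automate. A minimal sketch (add_root_slash is a made-up name) that only touches a URL with no path at all and leaves directory URLs alone, since whether those take a slash depends on the server or app as described above:

```python
from urllib.parse import urlsplit, urlunsplit


def add_root_slash(url):
    """Give a bare-hostname URL its root path "/"; leave other URLs unchanged."""
    parts = urlsplit(url)
    # An empty path after a hostname is the "plain address" case.
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)
```

Running every href through something like this before it goes into a canonical tag or an internal link helps keep the two consistent.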
*** Showing all categories the product belongs to is probably the best way to go. ***
Indeed, you should have search links on this product page pointing to "search for more salty sweets" and "search for more gelatine-free sweets" too.
*** http://www.example.com/%20-%2020k ***
That suggests you omitted the trailing " quote mark on the URL itself.
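Decoding the percent-escapes makes it easier to see what got swallowed into the URL. A quick sketch with the Python standard library:

```python
from urllib.parse import unquote

# %20 is an encoded space, so the odd-looking tail is really plain text.
decoded = unquote("http://www.example.com/%20-%2020k")
print(decoded)  # http://www.example.com/ - 20k
```

The " - 20k" tail is text that followed the address and was read as part of it, which fits an unclosed quote mark.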