Maybe I'm missing something, but wouldn't it have been simpler to accomplish this by using <base href="http://www.example.com" />?
Collectively, how many millions of hours have been spent by developers trying to get around this problem? This is a trivial solution that they could have put in place a long time ago.
Note to self - I wish someone would write a book about how much economic damage is done by dumb ass programming and lame software.
This solution addresses all kinds of canonical problems [webmasterworld.com], not just with-www and no-www. It even lets the search engines know not to index certain parameters that might be added to the URL, such as session IDs.
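For anyone generating the tag from a script, here is a rough Python sketch of the idea: drop the parameters that don't change the page, force the preferred host, and emit the tag. The parameter names in IGNORED_PARAMS and the helper names canonical_url and canonical_tag are made up for illustration; adjust them to whatever your own site actually appends.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "sid", "ref"}


def canonical_url(url, host="www.example.com"):
    """Return url on the preferred host, without the ignored parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # The fragment is dropped because it never reaches the server anyway.
    return urlunsplit((parts.scheme, host, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the <link> element for the page requested as url."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


print(canonical_tag("http://example.com/products?sid=123&sort=price"))
```

Whether a parameter like sort belongs in the ignore list is a judgment call for each site; the sketch keeps it.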
From the Yahoo article:
|<link rel="canonical" href="http://www.example.com/products" /> |
The above tag indicates to the crawler that the URL it is present on should be represented canonically as http://www.example.com/products. This would eliminate the following duplicates:
Thanks for setting me straight Ted, I should read before posting :)
good to see some action
Regarding the <base> link, the Google Webmaster Central Blog article discusses how the <base> link can be used together with the rel="canonical" if you choose to use relative rather than absolute paths....
|...relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL. |
That said, absolute paths are recommended for the canonical tag.
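To see what "resolve according to the base URL" means in practice, here is a small Python sketch using only the standard library. It reads a page, picks up the first <base href> and the first rel="canonical" link, and resolves the canonical against the base (or against the page URL if there is no base). The class and function names are invented for this example, and it only illustrates standard URL resolution, not how any search engine actually processes the tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the first <base href> and the first rel="canonical" link."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]
        elif tag == "link" and self.canonical_href is None:
            rels = (attrs.get("rel") or "").lower().split()
            if "canonical" in rels and attrs.get("href"):
                self.canonical_href = attrs["href"]


def resolve_canonical(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical_href is None:
        return None
    # A <base href> replaces the page URL as the reference for relative paths.
    base = urljoin(page_url, finder.base_href) if finder.base_href else page_url
    return urljoin(base, finder.canonical_href)


page = '<head><base href="http://www.example.com/shop/"><link rel="canonical" href="products"></head>'
print(resolve_canonical("http://example.com/shop/products?sid=1", page))
```

With an absolute href in the canonical link, the base and the page URL stop mattering, which is one practical reason for the recommendation above.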
I think it will address the https issue as well, which we had discussed a lot.
Good to see some action from GYM.
So, this is not "will" but can be used as of now...
If this is okay
<link rel="canonical" href="http://www.example.com/page.html" />
is this okay?
<link rel="canonical" href="http://www.example.com/page.html">
Removing the xml trailing slash?
Not just okay, the slash is forbidden...
I got an invalid warning when I tried to validate the version with a slash. The version without a slash validated.
|I got an invalid warning when I tried to validate the version with a slash. The version without a slash validated. |
Doesn't it depend on your Doctype? no slashie for HTML 4.0, slashie for XHTML 1.0.
Interesting, can I ask what may be a silly question though? I've often considered developing a more mobile friendly version of my site but been worried about duplicate content issues (among other things). Would this be an ideal use for the new canonical tag?
An important development from the search engines. I was wondering: will this eliminate/fix the duplicate title and meta tag data reported in Google Webmaster Tools?
Am I right in thinking therefore that a link such as
will now be viewed by Google as
(assuming the tag is used of course)
If so, this will have major positive implications for merchants....
Bah. Another proprietary element to fix an underlying technical problem.
That said, it's in the toolbox. I'm curious as to whether this would work for pagination, but the FAQ implies otherwise:
|Is it okay if the canonical is not an exact duplicate of the content? |
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.
Page 2 is obviously going to be substantially different from page one, but this element would be ideal for that scenario.
>>Another proprietary element to fix an underlying technical problem
Well said - it's like robots.txt, nofollow, sitemap.xml, etc. The search engines own the Internet now, so why not redecorate to suit their taste.
I get what this means, but I wonder what it means, right? I guess it's a lazy way to imply a 301 redirection without actually redirecting, for content publishers who don't have access to modify the HTTP headers. I'm going to spend the rest of the week imagining ways this could be abused.
OMG, life can be so simple! :)
A handy tool for webmasters without the means to rewrite. Might free-up some of jdMorgan's time too. ;)
Right... and this means I can drop a part of my htaccess that might otherwise have used up resources, or complicated other htaccess rules.
[edited by: Asia_Expat at 1:48 pm (utc) on Feb. 13, 2009]
|Is rel="canonical" a hint or a directive? |
It's a hint that we honor strongly. We'll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.
... honor it... take your preference into account...
This will be very useful all around. Thanks for passing it along. The full url is important for sites that allow a page to be requested with multiple urls, which some do-- and it would be nice to add a canonical in these cases.
I wonder if this would be useful for tagging a print version to reference its screen version for indexing too.
This is not a solution
There, I got your attention. Using this tag is merely fighting symptoms, instead of properly addressing the underlying problem. The problem in this case being poor site structure and/or server configuration. I'm already appalled by the large lazy bunch that will see this tag as an excuse to ignore the task of properly structuring their content and correctly configuring their server. The canonical tag should, in my opinion, be viewed as a supplementary tool to prevent any unwanted duplication in cases like campaign tagging.
Poor structure is not only confusing to search engines, but also to users (and linkerati)
[edited by: johnnie at 3:42 pm (utc) on Feb. 13, 2009]
So, suppose I put this tag on every page of my site because our extensive partner network is a source of major duplicate content. Would the 3 search engines see this as abuse?
Will Google handle this the same way it does with other blocked pages and sometimes add the physical URL in the SERPS?
What is the difference between this and the robots="NOINDEX" tag?
How is the link juice distributed via the canonical tag? Would these links be devalued or not counted at all because of the canonical tag?
I'm very hesitant to release the hounds at the moment.
Ok, maybe someone can help me with my issue. (Keep in mind I started here only 5 months ago, so the site framework was in place before me.)
The site is built around ASP and .NET.
A lot of pages within the site are linked as "/goodtimes/bettertimes.asp?id="
We use a CMS, so there are several templates that users can grab to create a page, and they all have <base target="_top"></base>, which is for frames, right?
Can I use the base tag to implement this new tag?
Essentially, I am finding that in my Google Webmaster Tools they see:
as two separate pages on various pages throughout my site... which I would like to avoid.
How should I implement this new tag? Or should I at all?
The one thing I can't do with robots.txt wildcards is prevent blank query strings WITHOUT affecting my forum software. The only way would be to list every other directory as a 'disallow' in the robots file, or to use a potentially resource-intensive htaccess rule.
I've not looked at the details yet, but I think this is a well thought out addition to our toolkit (ignoring underlying technological problems of course).
[edited by: Asia_Expat at 4:12 pm (utc) on Feb. 13, 2009]
Do we all agree that fixing canonical issues "for real" is better than using this tag? I think this is bad news for anyone who has fixed all canonical issues on their site. More SE exposure for the lazy competition.
|Do we all agree that fixing canonical issues "for real" is better than using this tag? |
You've got my vote and I say we put the onus on those providing the hosting services. This is something that should be part of a basic package these days but yet many hosts are freakin clueless, especially Windows hosting providers. Bunch of lazy cheap individuals who don't want to take the little bit of time that is involved to protect their network of hosting clients. Dingbats!
Okay, so this covers the BIG 3, what about 4, 5, 6, 7, 8, 9 and 10? Did we forget about those? What about any other crawler/bot that is indexing? How come Ask wasn't included in this? They are in a close race with that #3 position. ;)
I sit here and try to imagine what happens in the overall scheme of things. The BIG 3 adhere to these standards. The others couldn't care less. Now you've got all these other indexing entities doing who knows what. How does that affect the BIG 3's view of the target domains?
Ya, a band-aid for sure but it does help quite a few with the BIG 3 only. One of these days Webmasters will realize the importance of addressing issues such as this at the root of the cause. Not 2, 3, or 4 steps into the process. I have a feeling that this leaves the door open for a bit of confusion when looking at the global aspects of it.
More like Canonical Tag Soup. ;)
|Do we all agree that fixing canonical issues "for real" is better than using this tag? I think this is bad news for anyone who has fixed all canonical issues on their site. More SE exposure for the lazy competition. |
+1 to that.
Fix the issues yourself. That helps you first and foremost without leaving any footprints.
I refuse to do anything specifically for the SEs' benefit. I don't dance to their tune, they dance to mine. And specialty tags or attributes that are search engine defined fall into the camp of dancing to their tune.
Make it part of something like the W3 standard and maybe I'd think about it. But why would standards' bodies do something specifically for three commercial companies? They won't of course.