|Correct Canonical URL Format and PDFs|
| 1:22 am on Jul 6, 2011 (gmt 0)|
I have 2 questions:
1. From reading other forum posts, it's my understanding that the correct canonical URL format depends on your doctype. Before I change 10,000+ pages, can someone confirm that this format is correct for my doctype below?
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<link rel="canonical" href="http://www.blah.com/">
<link rel="canonical" href="http://www.blah.com/dir/">
<link rel="canonical" href="http://www.blah.com/page.html">
Note: My .htaccess file already redirects non-www urls to www urls and adds the trailing slash for directories. (The examples without slashes or extra slashes after the end quotes in some of the forum posts were confusing me.)
2. My second question is in regard to PDFs. If I have a pdf that is a printable version of my html page, do I put the following code in the header of my html page? How does that help if someone chooses to repost your pdf elsewhere?
Link: <http://www.blah.com/page.pdf>; rel="canonical"
| 1:37 am on Jul 6, 2011 (gmt 0)|
Those URLs look fine as canonical links. That has nothing to do with your DTD, however. The DTD is essentially a set of directions to the browser's rendering engine, and it doesn't have any effect on URL resolution.
As for PDF files, there is no way to place a canonical link in the html - because there is no html.
Last month Google announced canonical support in the http header [webmasterworld.com] and specifically mentioned PDF files.
However, that's not going to help with any copies of your PDF files that appear on other people's servers. I usually embed the original URL in the document itself, and also in its metadata.
| 2:02 am on Jul 6, 2011 (gmt 0)|
Thanks for the quick reply Tedster. I found an older forum post [webmasterworld.com...] that discussed different doctypes with the canonical URL and wasn't sure if it was outdated.
I also optimize PDFs with title, metadata, copyright info and an embedded link.
I've read through the "canonical support in the http header" thread several times. I understand that there's no way to place a canonical link IN the pdf. Without giving you a major headache, how do you "send" the canonical URL in the http header? I thought I finally had it figured out by placing the canonical link element TO the PDF document IN the html document (sample given above).
p.s. Not related - but amazing...while trying to find the thread reference using Google site search on Webmasterworld, this thread is already indexed in Google! Less than 20 minutes...amazing...
| 3:30 am on Jul 6, 2011 (gmt 0)|
That earlier post is about how the html for the canonical link element needs to vary by doctype - but the canonical URL itself is not affected by your doctype, only the mark-up.
The http header is the information that your server exchanges with the browser or other user-agent BEFORE it actually sends the document. So it's a server configuration and not something you place within any of your web documents. Depending on your hosting situation, you may not have the ability to address the http headers directly.
You can read up on http header information at the W3C website: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html [w3.org]
And the specific link header that Google is now supporting is explained on the IETF website: http://tools.ietf.org/html/rfc5988#section-5 [tools.ietf.org]
| 6:42 pm on Jul 6, 2011 (gmt 0)|
The canonical LINK element syntax varies minutely with DOCTYPE: HTML doctypes take the first form below, and XHTML doctypes need the self-closing second form.
<link rel="canonical" href="http://www.example.com/">
<link rel="canonical" href="http://www.example.com/" />
Always use "example.com" in the forum to stop the auto-linking function.
If you use Apache hosting you can quite easily amend the HTTP headers using .htaccess or PHP.
One thought: if you add canonical information to the HTTP headers sent with the PDF file, in this case pointing to the URL for the HTML version, then when Google finds another copy of that exact same PDF document elsewhere on the web, it might also associate that copy with the URL for your HTML page.
WebmasterWorld threads are often indexed in less than 60 seconds. :)
| 4:03 am on Jul 7, 2011 (gmt 0)|
Thanks for the extra info!
Could you give an example of what the code might look like in a .htaccess file for:
http://www.example.com/page.pdf (whose canonical URL should be http://www.example.com/page.html)?
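A minimal sketch of what that might look like, assuming Apache with mod_headers enabled and a .htaccess file in the same directory as page.pdf (neither is confirmed in the thread). It sends the Link header format quoted earlier in the thread, with page.html as the canonical target:

```apache
# Assumption: mod_headers is available; the IfModule wrapper simply skips
# the block (rather than causing a server error) if it is not.
<IfModule mod_headers.c>
  # Apply only to the printable PDF version of the page
  <Files "page.pdf">
    # Sends:  Link: <http://www.example.com/page.html>; rel="canonical"
    Header add Link "<http://www.example.com/page.html>; rel=\"canonical\""
  </Files>
</IfModule>
```

Requesting the PDF and inspecting the response headers should then show the Link header. The HTML page itself needs no change, and hosts that do not allow header overrides in .htaccess would need this set in the main server configuration instead.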