
canonical tag on PDF

   
3:22 pm on Nov 2, 2012 (gmt 0)



Hi,

I know there is a post on the Google webmaster forums about adding canonical tags to HTTP headers for PDFs, but I have been told this only works when the download is a copy of something that is also on the site.

I want to include some downloadable PDF white papers on our site that have already been published on the Prof's own personal website.

I don't want to get hit with duplicate content issues.

Can anyone suggest anything?
3:51 pm on Nov 2, 2012 (gmt 0)

WebmasterWorld Administrator ergophobe



I think the header solution can be used in your situation - for example, Google says one use case is when you are serving the same content from a CDN and the hosts are different. Headers should work for you.
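For example, on Apache with mod_headers you could send the canonical from .htaccess - a sketch only, with hypothetical file and domain names, pointing the canonical at the Prof's original copy:

    <Files "whitepaper.pdf">
        Header add Link "<http://prof.example.com/whitepaper.pdf>; rel=\"canonical\""
    </Files>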

You could also have the PDF listing page be no-index and exclude the PDFs with robots.txt.
1:42 am on Nov 3, 2012 (gmt 0)

WebmasterWorld Administrator phranque



you can use the link rel="canonical" HTTP header for PDFs, and the canonical can point cross-domain.
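the response for one of the PDFs would then carry something like this (hypothetical URLs, with the canonical pointing cross-domain at the Prof's original):

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    Link: <http://prof.example.com/papers/whitepaper.pdf>; rel="canonical"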

Official Google Webmaster Central Blog: Supporting rel="canonical" HTTP Headers:
http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

Official Google Webmaster Central Blog: Handling legitimate cross-domain content duplication:
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html


1:43 am on Nov 3, 2012 (gmt 0)

WebmasterWorld Administrator phranque



no-index and exclude the PDFs with robots.txt

the robots exclusion prevents the noindex from being seen - if a URL is disallowed in robots.txt, the crawler never fetches it, so it never sees the noindex directive.
7:18 pm on Nov 4, 2012 (gmt 0)

WebmasterWorld Administrator ergophobe



the robots exclusion prevents the noindex from being seen.


Read my comment again. The suggestion was to:

- no-index the PDF *index* pages (i.e. the lists of PDFs or whatever s/he has). This is to keep titles/snippets out of the SERPs, but the pages can still get crawled.

- put robots.txt exclusions on the PDFs themselves so they don't get crawled at all.
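
A minimal sketch of that setup, assuming the listing page is an HTML page and the PDFs live in their own directory (the paths here are hypothetical):

    <!-- on the PDF listing page, e.g. /whitepapers/index.html -->
    <meta name="robots" content="noindex">

    # in robots.txt - block crawling of the PDF files only
    User-agent: *
    Disallow: /whitepapers/pdfs/

The listing page stays crawlable, so the noindex can be seen, while the PDFs themselves are never fetched.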
10:45 pm on Nov 4, 2012 (gmt 0)

WebmasterWorld Administrator phranque



my bad - i read that too fast - missed a couple words.
12:03 pm on Nov 5, 2012 (gmt 0)



Thanks :)
4:19 pm on Nov 5, 2012 (gmt 0)

WebmasterWorld Senior Member jimbeetle



I want to include some downloadable PDF white papers on our site that have already been published on the Prof's own personal website.

Why do you see a need to do this? Why not just link to the professor's website?