
canonical tag on PDF

   
3:22 pm on Nov 2, 2012 (gmt 0)



Hi,

I know there is a post on the Google webmaster forums about adding canonical tags to HTTP headers for PDFs, but I have been told this only works when the download is a copy of something that is also on the site.

I want to include some downloadable PDF white papers on our site that have already been published on the Prof's own personal website.

I don't want to get hit with duplicate content issues.

Can anyone suggest anything?
3:51 pm on Nov 2, 2012 (gmt 0)

WebmasterWorld Administrator ergophobe



I think the header solution can be used in your situation - for example, Google says one use case is when you are serving the same content from a CDN and the hosts are different. Headers should work for you.
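For example, on Apache with mod_headers you could send the canonical from .htaccess - a sketch only, with hypothetical file and domain names, pointing the canonical at the Prof's original copy:

    <Files "whitepaper.pdf">
        Header add Link "<http://prof.example.com/whitepaper.pdf>; rel=\"canonical\""
    </Files>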

You could also have the PDF listing page be no-index and exclude the PDFs with robots.txt.
1:42 am on Nov 3, 2012 (gmt 0)

WebmasterWorld Administrator phranque



you can use the link rel="canonical" HTTP header for PDFs, and the canonical can point cross-domain.
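the response for one of the PDFs would then carry something like this (hypothetical URLs, with the canonical pointing cross-domain at the Prof's original):

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    Link: <http://prof.example.com/papers/whitepaper.pdf>; rel="canonical"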

Official Google Webmaster Central Blog: Supporting rel="canonical" HTTP Headers:
http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

Official Google Webmaster Central Blog: Handling legitimate cross-domain content duplication:
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html


1:43 am on Nov 3, 2012 (gmt 0)

WebmasterWorld Administrator phranque



no-index and exclude the PDFs with robots.txt

the robots exclusion prevents the noindex from being seen - if a URL is disallowed in robots.txt, the crawler never fetches it, so it never sees the noindex directive.
7:18 pm on Nov 4, 2012 (gmt 0)

WebmasterWorld Administrator ergophobe



the robots exclusion prevents the noindex from being seen.


Read my comment again. The suggestion was to:

- no-index the PDF *index* pages (i.e. the lists of PDFs or whatever s/he has). This is to keep titles/snippets out of the SERPs, but the pages can still get crawled.

- put robots.txt exclusions on the PDFs themselves so they don't get crawled at all.
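
A minimal sketch of that setup, assuming the listing page is an HTML page and the PDFs live in their own directory (the paths here are hypothetical):

    <!-- on the PDF listing page, e.g. /whitepapers/index.html -->
    <meta name="robots" content="noindex">

    # in robots.txt - block crawling of the PDF files only
    User-agent: *
    Disallow: /whitepapers/pdfs/

The listing page stays crawlable, so the noindex can be seen, while the PDFs themselves are never fetched.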
10:45 pm on Nov 4, 2012 (gmt 0)

WebmasterWorld Administrator phranque



my bad - i read that too fast - missed a couple words.
12:03 pm on Nov 5, 2012 (gmt 0)



Thanks :)
4:19 pm on Nov 5, 2012 (gmt 0)

WebmasterWorld Senior Member jimbeetle



I want to include some downloadable PDF white papers on our site that have already been published on the Prof's own personal website.

Why do you see a need to do this? Why not just link to the professor's website?