


canonical tag on PDF

3:22 pm on Nov 2, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 10, 2010
posts: 45
votes: 0


Hi,

I know there is a post on the Google webmaster forums about adding canonical tags to HTTP headers for PDFs, but I have been told this only works when the download duplicates content that is also on your own site.

I want to include some downloadable PDF white papers on our site that have already been published on the professor's own personal website.

I don't want to get hit with duplicate content issues.

Can anyone suggest anything?
3:51 pm on Nov 2, 2012 (gmt 0)

Moderator This Forum

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8139
votes: 103


I think the header solution can be used in your situation - for example, Google says one use case is when you are using a CDN and the hosts are different. Headers should work for you.
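Roughly, the response for each PDF would carry a Link header like this (the URL is just a placeholder for wherever the canonical copy lives):

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.example.com/papers/white-paper.pdf>; rel="canonical"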

You could also set the PDF listing page to noindex and exclude the PDFs with robots.txt
1:42 am on Nov 3, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


you can use the rel="canonical" link header for pdfs, and the canonical can point cross-domain.

Official Google Webmaster Central Blog: Supporting rel="canonical" HTTP Headers:
http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

Official Google Webmaster Central Blog: Handling legitimate cross-domain content duplication:
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html
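for example, on apache with mod_headers enabled, something like this in .htaccess would attach the cross-domain canonical to the pdf response - the filename and domain are placeholders, not real addresses:

# assumes mod_headers is enabled; www.example.com stands in
# for the professor's site, where the original PDF lives
<Files "white-paper.pdf">
    Header add Link "<http://www.example.com/papers/white-paper.pdf>; rel=\"canonical\""
</Files>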


1:43 am on Nov 3, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


noindex and exclude the PDFs with robots.txt

the robots exclusion prevents the noindex from being seen.
7:18 pm on Nov 4, 2012 (gmt 0)

Moderator This Forum

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8139
votes: 103


the robots exclusion prevents the noindex from being seen.


Read my comment again. The suggestion was to:

- noindex the PDF *index* pages (i.e. the lists of PDFs or whatever s/he has). This keeps the titles/snippets out of the SERPs, but the page can still get crawled.

- robots.txt the PDFs themselves so they don't get crawled at all.
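With placeholder paths, the two pieces look something like this. In robots.txt:

# keep all crawlers away from the PDF files themselves
User-agent: *
Disallow: /whitepapers/pdfs/

And in the <head> of the listing page:

<meta name="robots" content="noindex, follow">

The noindex keeps the listing page out of the results while still letting it be crawled; the Disallow means the PDFs behind its links never get fetched at all.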
10:45 pm on Nov 4, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


my bad - i read that too fast - missed a couple words.
12:03 pm on Nov 5, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 10, 2010
posts: 45
votes: 0


Thanks :)
4:19 pm on Nov 5, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 26, 2002
posts:3292
votes: 6


I want to include some downloadable PDF white papers on our site that have already been published on the professor's own personal website.

Why do you see a need to do this? Why not just link to the professor's website?