Canonical issues re dupe content in PDFs?

     
2:02 pm on Oct 14, 2010 (gmt 0)

New User

5+ Year Member

joined:Aug 4, 2010
posts:34
votes: 0


I'm currently working on a site that has a landing page for each article. Each landing page contains an abstract and a link to the full article, which is a PDF uploaded to the site. The abstracts are generally the first paragraphs of the full article. Since Google indexes PDFs, I assumed that duplicate content issues are as relevant for them as for any other web page. Is this incorrect? I'm thinking of implementing a canonical link element on the articles themselves. What are your thoughts on this?

Thanks, all!
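
For reference, a PDF has no HTML head to hold a canonical link element, so the usual way to declare a canonical for one is an HTTP `Link` response header with `rel="canonical"`. Below is a minimal sketch of building that header; the function name and the example.com URL are hypothetical, and how the header gets attached to the response depends on the server in use.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair of an HTTP Link header that
    declares canonical_url as the canonical version of a resource."""
    # The target URL goes inside angle brackets, followed by the rel parameter.
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical landing page URL, used only for illustration.
name, value = canonical_link_header("https://www.example.com/articles/sample.html")
print(f"{name}: {value}")
```

The server would send this header along with the PDF response, pointing at whichever URL is meant to be treated as the preferred one.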
9:18 pm on Oct 14, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


To be clear, do the HTML pages contain only an abstract and not the full article?
9:03 am on Oct 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Aug 4, 2010
posts:34
votes: 0


Exactly.
10:03 am on Oct 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I don't think it is a major concern.

However, I don't like visitors arriving directly at a PDF because there is no obvious navigation back to the rest of the site. They view one file and leave.

I would disallow the PDF version URLs in robots.txt (all the PDFs would be in one folder, or folder tree), post all the text as an HTML page, and prominently link to the PDF version from there.
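
A rough sketch of what that robots.txt rule might look like, checked with Python's standard `urllib.robotparser`. The `/pdf/` folder name and the sample paths are assumptions for illustration only; the thread does not say where the PDFs actually live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the /pdf/ folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(url_path: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow user_agent to fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)


# Anything under the PDF folder is blocked; the HTML version stays crawlable.
print(is_crawlable("/pdf/sample-article.pdf"))
print(is_crawlable("/articles/sample-article.html"))
```

Keeping every PDF under one folder means a single `Disallow` line covers the whole set, which is the point of the folder layout described above.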