
    
canonical tag on PDF
emmab21
3:22 pm on Nov 2, 2012 (gmt 0)

Hi,

I know there is a post on the Google webmaster forums about adding canonical tags to HTTP headers for PDFs, but I have been told this only works when the download duplicates content that is also on the site.

I want to include some downloadable PDF white papers on our site that have already been published on the professor's own personal website.

I don't want to get hit with duplicate content issues.

Can anyone suggest anything?

 

ergophobe
3:51 pm on Nov 2, 2012 (gmt 0)

I think the header solution can be used in your situation - for example, Google says one use case is when you are serving files from a CDN and the hosts are different. Headers should work for you.
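For example, on Apache with mod_headers you could send the header per file - rough, untested sketch only, with a made-up filename and canonical URL:

# .htaccess sketch - one rule per PDF, since each file needs its own canonical target
<Files "whitepaper1.pdf">
    Header add Link "<http://www.prof-site.example.com/papers/whitepaper1.pdf>; rel=\"canonical\""
</Files>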

You could also make the PDF listing page noindex and exclude the PDFs themselves with robots.txt.

phranque
1:42 am on Nov 3, 2012 (gmt 0)

you can use the rel="canonical" link header for pdfs, and the canonical can point cross-domain.

Official Google Webmaster Central Blog: Supporting rel="canonical" HTTP Headers:
http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

Official Google Webmaster Central Blog: Handling legitimate cross-domain content duplication:
http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html
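the pdf response would then carry a header along these lines (urls are just placeholders):

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.prof-site.example.com/papers/whitepaper1.pdf>; rel="canonical"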


phranque
1:43 am on Nov 3, 2012 (gmt 0)

no-index and exclude the PDFs with robots.txt

the robots exclusion prevents the noindex from being seen.

ergophobe
7:18 pm on Nov 4, 2012 (gmt 0)

the robots exclusion prevents the noindex from being seen.


Read my comment again. The suggestion was to:

- noindex the PDF *index* pages (i.e. the lists of PDFs or whatever s/he has). This keeps titles/snippets out of the SERPs, but the pages can still get crawled.

- robots.txt the PDFs themselves so they don't get crawled at all (rough example below).
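Something like this, assuming the PDFs sit under a /whitepapers/pdf/ folder (paths are placeholders):

# robots.txt - block crawling of the PDF files themselves
User-agent: *
Disallow: /whitepapers/pdf/

<!-- on the listing page (e.g. /whitepapers/), keep it out of the index but let links be followed -->
<meta name="robots" content="noindex, follow">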

phranque
10:45 pm on Nov 4, 2012 (gmt 0)

my bad - i read that too fast and missed a couple of words.

emmab21
12:03 pm on Nov 5, 2012 (gmt 0)

Thanks :)

jimbeetle
4:19 pm on Nov 5, 2012 (gmt 0)

I want to include some downloadable PDF white papers on our site that have already been published on the professor's own personal website.

Why do you see a need to do this? Why not just link to the professor's website?
