
Google SEO News and Discussion Forum

    
Can You "Reclaim" Text Distributed In PDF Files As Your Own?
Planet13




msg:4608144
 7:47 pm on Sep 7, 2013 (gmt 0)

First, this is a RANKING question, NOT a copyright question or a technical question.

If a site owner created quality content in PDF ebooks or PDF whitepapers, would it hurt their chances of ranking well if they used that content on their site?

The site is for a city-based service. Most of the content on their site is poorly written.

They have a bunch of PDF ebooks and whitepapers floating around the web on free ebook distribution sites which have BETTER content than appears on their own site. They've been out for about four years now.

Would they be inviting a Panda slap if they took their own content and put it on their site? Much of it is unique RESEARCH that they conducted themselves.

 

Shepherd




msg:4608163
 10:21 pm on Sep 7, 2013 (gmt 0)

Google has been getting pretty good at indexing PDFs, so I would have to think that what you're talking about is no different than if the content were on HTML pages across the web.

One caveat: this only applies if the PDFs you're talking about are text and not images.

jimbeetle




msg:4608166
 10:25 pm on Sep 7, 2013 (gmt 0)

"Would they be inviting a Panda slap if they took their own content and put it on their site?"

Just curious, why a Panda slap?

Planet13




msg:4608227
 6:49 am on Sep 8, 2013 (gmt 0)

@ Shepherd

"One caveat being if the pdf's you're talking about are images and not text."

These are text-based PDFs.

Any images in them are supplemental to the text content - and probably really ugly to boot.

@ jimbeetle

"Just curious, why a Panda slap?"

My thinking is that Google would recognize this content in various places around the internet, and then, when it saw it on their site, it would regard it as non-original / copied content.

aakk9999




msg:4608287
 12:47 pm on Sep 8, 2013 (gmt 0)

If you block PDFs from crawling via robots.txt (or return a noindex X-Robots-Tag response header for PDFs) then yes, you could re-use PDF text on website pages with no problems.
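
For PDFs you do control, either approach can be sanity-checked with a few lines of Python. This is only a minimal sketch: the /pdfs/ directory, the www.example.com URLs, the robots.txt text and the sample headers are all made up for illustration, and the helper names are hypothetical. Python's standard robots.txt parser does simple prefix matching, so it will not reproduce Google's wildcard handling, and the header check ignores agent-specific forms of X-Robots-Tag. Also keep in mind the two options are alternatives: a crawler that is blocked by robots.txt never fetches the file, so it never sees a noindex header on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /pdfs/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawl_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


def has_noindex_header(headers: dict) -> bool:
    """Return True if an X-Robots-Tag response header contains 'noindex'."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


# Sample checks against the made-up data above.
pdf_blocked = is_crawl_blocked(ROBOTS_TXT, "http://www.example.com/pdfs/whitepaper.pdf")
page_blocked = is_crawl_blocked(ROBOTS_TXT, "http://www.example.com/research.html")
pdf_noindexed = has_noindex_header(
    {"Content-Type": "application/pdf", "X-Robots-Tag": "noindex"}
)
```

In practice the headers would come from a real HTTP response for the PDF rather than a hand-written dictionary.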

Planet13




msg:4608395
 2:13 am on Sep 9, 2013 (gmt 0)

"If you block PDFs from crawling via robots.txt (or return a noindex X-Robots-Tag response header for PDFs) then yes, you could re-use PDF text on website pages with no problems."


The problem as I see it is that we CAN'T block the PDFs from being crawled because they have been distributed to various sites around the web.

I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge.

JD_Toims




msg:4608398
 2:29 am on Sep 9, 2013 (gmt 0)

"I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge."

Almost as good as the preceding, in my opinion, would be to ask the sites using the PDF content to please give attribution and recognition to the source for search purposes by simply adding the following to their .htaccess:

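# Goes in the .htaccess of the directory that holds the PDF; requires mod_headers.
# <Files> matches on the file name only, not on a path.
<Files "to-the-pdf-file.pdf">
Header set Link "<http://www.example.com/the-original-pdf-file/file-name.pdf>; rel=\"canonical\""
</Files>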

Added Note: When I make a request like the preceding, I fill in all the necessary information for people so it's a simple copy/paste task and nothing more. If I were making this request, I would find the PDF file on the site using it, reference that file in the <Files ...> container, and set http://www.example.com/the-original-pdf-file/file-name.pdf to the actual location on the originating website. That way, adding the canonical location to the .htaccess and giving credit is a copy/paste no-brainer for anyone to do.
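
If there are a lot of hosting sites to contact, the fill-in-the-blanks step could be scripted. Below is a rough Python sketch, not a tested tool: the function name build_canonical_htaccess is hypothetical, the file name and URL are the same placeholders used above, and it assumes the header value is written in the quoted form with a space after the semicolon. It rejects anything containing a slash because Apache's <Files> section matches a file name rather than a path.

```python
def build_canonical_htaccess(pdf_file_name: str, canonical_url: str) -> str:
    """Return an .htaccess snippet that sends a rel="canonical" Link header.

    pdf_file_name is the PDF's file name on the hosting site (no directories);
    canonical_url is the location of the original on the originating site.
    """
    if "/" in pdf_file_name:
        raise ValueError("<Files> matches a file name, not a path")
    return (
        f'<Files "{pdf_file_name}">\n'
        f'Header set Link "<{canonical_url}>; rel=\\"canonical\\""\n'
        "</Files>\n"
    )


# Placeholder values; swap in the real file name and the real original URL.
snippet = build_canonical_htaccess(
    "to-the-pdf-file.pdf",
    "http://www.example.com/the-original-pdf-file/file-name.pdf",
)
print(snippet)
```

The printed text can be pasted straight into the email to the site owner, who then only has to drop it into the .htaccess in the directory where their copy of the PDF lives.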

tangor




msg:4608501
 3:56 pm on Sep 9, 2013 (gmt 0)

Will the "reclaimed" content have significant changes or updates, so that the result is not "duplicate"? I've had good success presenting new or revised versions of earlier PDF texts as HTML on some sites.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved