Can You "Reclaim" Text Distributed In PDF Files As Your Own?

Forum Moderators: Robert Charlton & goodroi



Planet13

7:47 pm on Sep 7, 2013 (gmt 0)

Firstly, this is a RANKING question, and is NOT a copyright question nor a technical issue question.

If a site owner created quality content in pdf ebooks or pdf whitepapers, would it hurt their chance at ranking well if they used that content on their site?

The site is for a city-based service. Most of the content on their site is poorly written.

They have a bunch of PDF ebooks and whitepapers floating around the web on free ebook distribution sites which have BETTER content than appears on their own site. They've been out for about four years now.

Would they be inviting a Panda slap if they took their own content and put it on their site? Much of it is unique RESEARCH that they conducted themselves.

Shepherd

10:21 pm on Sep 7, 2013 (gmt 0)

Google has been getting pretty good at indexing PDFs, so I would have to think that what you're talking about is no different than if the content was on HTML pages across the web.

One caveat: this assumes the PDFs you're talking about are text and not images.

jimbeetle

10:25 pm on Sep 7, 2013 (gmt 0)

"Would they be inviting a Panda slap if they took their own content and put it on their site?"

Just curious, why a Panda slap?

Planet13

6:49 am on Sep 8, 2013 (gmt 0)

@ Shepherd

"One caveat being if the pdf's you're talking about are images and not text."

These are text-based PDFs.

Any images in them are supplemental to the text content, and probably really ugly to boot.

@ jimbeetle

"Just curious, why a Panda slap?"

My thinking is that Google would recognize this content in various places around the internet, and then when it saw it on their site, it would regard it as non-original / copied content.

aakk9999

12:47 pm on Sep 8, 2013 (gmt 0)

If you block PDFs from crawling via robots.txt (or return noindex response header for PDFs) then yes, you could re-use PDF text on website pages with no problems.
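
A minimal sketch of the response-header variant, assuming the PDFs are served by Apache with mod_headers enabled (both are assumptions; the thread does not say which server is involved). The file pattern is illustrative only:

```apache
# Assumes Apache with mod_headers; goes in the .htaccess of the site serving the PDFs.
# Sends a noindex directive with every response for a file ending in .pdf.
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
```

The two options are alternatives rather than a pair: if robots.txt blocks the PDFs from being crawled, the crawler never requests them and so never sees the noindex header.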

Planet13

2:13 am on Sep 9, 2013 (gmt 0)

"If you block PDFs from crawling via robots.txt (or return noindex response header for PDFs) then yes, you could re-use PDF text on website pages with no problems."

The problem as I see it is that we CAN'T block the PDFs from being crawled because they have been distributed on to different sites around the web.

I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge.

JD_Toims

2:29 am on Sep 9, 2013 (gmt 0)

"I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge."

Almost as good, in my opinion, would be to ask the sites hosting the PDF to give attribution and recognition to the source for search purposes by adding the following to their .htaccess:

# Requires mod_headers. <Files> matches a file name, not a path, so this
# goes in the .htaccess of the directory that holds the PDF.
<Files "to-the-pdf-file.pdf">
Header set Link "<http://www.example.com/the-original-pdf-file/file-name.pdf>; rel=\"canonical\""
</Files>

Added Note: When I make a request like this, I fill in all the necessary information so it's a simple copy/paste task and nothing more. If I were making the request, I would find the PDF file on the site using it, put that location in the <Files ...> container, and set http://www.example.com/the-original-pdf-file/file-name.pdf to the actual location on the originating website. That way, adding the canonical location to the .htaccess and giving credit is a copy/paste no-brainer for anyone.

tangor

3:56 pm on Sep 9, 2013 (gmt 0)

Will the "reclaimed" content have significant changes or updates, so that the result is no longer "duplicate"? I've had good success on some sites presenting new or revised versions of earlier PDF texts as HTML.