Can You "Reclaim" Text Distributed In PDF Files As Your Own?

     
7:47 pm on Sep 7, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 16, 2010
posts: 3796
votes: 28


First, this is a RANKING question, NOT a copyright question or a technical question.

If a site owner created quality content in PDF ebooks or PDF whitepapers, would it hurt their chances of ranking well if they used that content on their site?

The site is for a city-based service. Most of the content on their site is poorly written.

They have a bunch of PDF ebooks and whitepapers floating around the web on free ebook distribution sites which have BETTER content than appears on their own site. They've been out for about four years now.

Would they be inviting a Panda slap if they took their own content and put it on their site? Much of it is unique RESEARCH that they conducted themselves.
10:21 pm on Sept 7, 2013 (gmt 0)

Preferred Member from US 

Top Contributors Of The Month

joined:Oct 5, 2012
posts:643
votes: 34


Google has been getting pretty good at indexing PDFs, so I would have to think that what you're talking about is no different than if the content was on HTML pages across the web.

One caveat would be if the PDFs you're talking about are images and not text.
10:25 pm on Sept 7, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 26, 2002
posts:3292
votes: 6


"Would they be inviting a Panda slap if they took their own content and put it on their site?"

Just curious, why a Panda slap?
6:49 am on Sept 8, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 16, 2010
posts: 3796
votes: 28


@ Shepherd

"One caveat being if the pdf's you're talking about are images and not text."

These are text-based PDFs.

Any images in them are supplemental to the text content, and probably really ugly to boot.

@ jimbeetle

"Just curious, why a Panda slap?"

My thinking is that Google would recognize this content in various places around the internet, and then, when it saw it on their site, it would regard it as non-original / copied content.
12:47 pm on Sept 8, 2013 (gmt 0)

Moderator This Forum from GB 

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2511
votes: 142


If you block the PDFs from being crawled via robots.txt (or return a noindex response header for the PDFs), then yes, you could re-use the PDF text on website pages with no problems.
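
For anyone wanting to try the response-header route, here is a minimal sketch for an Apache .htaccess. It assumes mod_headers is enabled and that the PDFs sit on a server you control, so treat it as an illustration rather than a tested config for your setup.

```apache
# Sketch only: ask search engines not to index any PDF served from this directory tree.
# Assumes mod_headers is available; the IfModule guard skips the rule if it is not.
<IfModule mod_headers.c>
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
</IfModule>
```

Use one method or the other, not both: if robots.txt blocks the PDFs from being crawled, the crawler never requests them and so never sees the noindex header.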
2:13 am on Sept 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 16, 2010
posts: 3796
votes: 28


"If you block PDFs from crawling via robots.txt (or return noindex response header for PDFs) then yes, you could re-use PDF text on website pages with no problems."


The problem as I see it is that we CAN'T block the PDFs from being crawled, because they have been distributed to different sites around the web.

I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge.
2:29 am on Sept 9, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:July 19, 2013
posts:1097
votes: 0


"I'll see if there is some viable way for those PDFs to be taken down from the various sites that host them, but I think it might be a pretty big challenge."

Almost as good as getting them taken down, in my opinion, would be to ask the sites using the PDF content to give attribution and recognition to the source for search purposes by simply adding the following to their .htaccess:

# Put this in the .htaccess of the directory that holds the PDF.
# <Files> matches the file name only, not a path.
<Files "to-the-pdf-file.pdf">
Header set Link "<http://www.example.com/the-original-pdf-file/file-name.pdf>; rel=\"canonical\""
</Files>

Added note: when I make a request like this, I fill in all the necessary information for people, so it's a simple copy/paste task and nothing more. In other words, if I were making the request, I would find the PDF file on the site using it, put its location in the <Files ...> container, and set http://www.example.com/the-original-pdf-file/file-name.pdf to the actual location on the originating website. That way, adding the canonical location to the .htaccess and giving credit is a copy/paste no-brainer for anyone.
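
A related variation, in case the text ends up republished as a regular page on the original site: the same header can point at that HTML page instead of the original PDF. This is only a sketch. The file name and the page URL below are made-up placeholders, mod_headers is assumed to be enabled, and search engines treat rel="canonical" as a hint rather than a directive.

```apache
# Sketch only: both the file name and the target URL are hypothetical placeholders.
# <Files> matches the file name in the directory this .htaccess lives in.
<IfModule mod_headers.c>
<Files "whitepaper-copy.pdf">
Header set Link "<http://www.example.com/whitepaper-as-a-web-page.html>; rel=\"canonical\""
</Files>
</IfModule>
```

As with the request itself, filling in the real file name and URL before sending it keeps this a copy/paste job for the site hosting the copy.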
3:56 pm on Sept 9, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6153
votes: 284


Will the "reclaimed" content have significant changes or updates, so that the result is not "duplicate"? I've had good success presenting new or revised versions of earlier PDF texts as HTML on some sites.