Forum Moderators: not2easy
Then, I got to thinking, why not make all of my content available as a PDF? In other words, they would read the article in HTML, and then if they wanted to save the article or print it out, they could download the PDF version.
Since Google (and I presume others) can parse PDF files, will having the same content as both HTML and PDF be considered duplicate content, and thus lower my rankings?
And, if so, I presume I can just add an entry to my robots.txt file to prevent Google from parsing PDFs?
Eventually I gave up doing this as it was more work and the PDFs have a 'finality' about them that I don't like. It is easy to edit some HTML when an article needs a change but it takes more time to create a new PDF.
The PDF files should have little impact on your rankings, and I've got no conclusive proof that adding them would be a negative step. In all my tests, generating articles as PDF files has turned out to be a positive step.
Indeed, you can block indexing easily with robots.txt, and if it is duplicate content, I would consider that a good idea in this instance.
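As a rough sketch of the robots.txt approach, the snippet below checks a rule set with Python's standard-library parser before you deploy it. It assumes, purely for illustration, that all PDF copies live under a /pdf/ directory and that the site is example.com; neither detail comes from this thread. Google also documents wildcard patterns such as /*.pdf$, but the standard-library parser used here does not interpret wildcards, so the sketch sticks to a directory prefix. Keep in mind that robots.txt stops compliant crawlers from fetching the files, which is not quite the same as a noindex instruction.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout (not from the thread): PDF copies sit under /pdf/,
# while the HTML articles live elsewhere on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The HTML article stays crawlable; the PDF copy does not.
    print(is_crawlable(ROBOTS_TXT, "https://example.com/articles/widgets.html"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/pdf/widgets.pdf"))
```

Running a check like this against your real URLs is a cheap way to confirm that the rule blocks only the PDF copies and not the HTML pages you want ranked.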
If anything, the only issue is that if you allow the SE (Google) to index them all, you may find that (for example) the PPT version is shown in the SE, but the PDF version is hidden unless you expand the results. Visitors may click through to the wrong context (e.g. they get a PDF file with no surrounding navigation, whereas you would prefer to have them enter through the HTML pages). The end result is that your site is inconsistently represented in the SE. That may not be what you want.
I have since deprecated all of my content other than PDF and HTML. I did have the problem just mentioned (PPT v PDF).
Maybe it would be a better tactic to allow the SE to pick up only one version and mark the others noindex. In a better world, your site would be content neutral: only one format would be exposed to the SEs, and when a visitor asks for a per-page conversion (the equivalent of a "printable version"), your server's engine would translate the page in real time (or from a cache ...) and serve up the alternate format. Commercial services do this (Westlaw for one: it can serve up text, RTF, Word and PDF on demand, although arguably the content is not media rich and is largely plain text anyway).
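To make the "one indexed format, others generated on demand" idea a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the ARTICLES store, the html_to_pdf placeholder (which does not produce a real PDF) and the respond function are stand-ins, not anyone's actual setup. The one real-world piece is the X-Robots-Tag: noindex response header, which is the usual way to mark a non-HTML file such as a PDF as noindex, since a PDF cannot carry a meta robots tag.

```python
from functools import lru_cache

# Hypothetical in-memory article store; a real site would read from its CMS.
ARTICLES = {"widgets": "<h1>Widgets</h1><p>All about widgets.</p>"}

CONTENT_TYPES = {
    "html": "text/html; charset=utf-8",
    "pdf": "application/pdf",
}


def html_to_pdf(html: str) -> bytes:
    """Placeholder converter (assumption): swap in a real HTML-to-PDF tool."""
    return b"%PDF-placeholder " + html.encode("utf-8")


@lru_cache(maxsize=128)
def render(slug: str, fmt: str) -> bytes:
    """Build the requested format once and cache it for later requests."""
    html = ARTICLES[slug]
    if fmt == "html":
        return html.encode("utf-8")
    if fmt == "pdf":
        return html_to_pdf(html)
    raise ValueError(f"unsupported format: {fmt}")


def respond(slug: str, fmt: str) -> tuple[dict[str, str], bytes]:
    """Return (headers, body); only the HTML version is left indexable."""
    body = render(slug, fmt)
    headers = {"Content-Type": CONTENT_TYPES[fmt]}
    if fmt != "html":
        # Ask search engines not to index the alternate format.
        headers["X-Robots-Tag"] = "noindex"
    return headers, body


if __name__ == "__main__":
    pdf_headers, _ = respond("widgets", "pdf")
    html_headers, _ = respond("widgets", "html")
    print(pdf_headers)
    print(html_headers)
```

Because the HTML is the single source and the PDF is derived from it, editing an article only means changing the HTML and clearing the cache (render.cache_clear() in this sketch), which also gets around the "finality" complaint about hand-made PDFs. Note that a crawler has to be able to fetch the PDF to see the noindex header, so this approach is an alternative to a robots.txt block rather than something to combine with it.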