My company has invested quite a bit in white papers. We allowed our partners to post them on their sites, and this may have been a factor in our being hit with the -30 penalty (no way of knowing that for sure, of course).
Adam Lasnik has written about duplicate content. Go to the 'Official Google Webmaster Central' blog and you'll find his comments. Summary: Google does not like duplicate content. Because white papers are generally content-heavy, my guess is that duplicating them is a very bad thing.
Our response was to have them on our site only, and to get our partners to link to us but with the PDF loading into another browser window - so that the prospect stayed on their site.
If they insist on hosting them (and they might) then put them into a separate directory on their servers, and use robots.txt to exclude spiders from that directory - just like you would 'printer friendly' pages.
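To check that such an exclusion actually covers the copies, you could run the partner's robots.txt through Python's standard `urllib.robotparser`. This is a minimal sketch: the `/whitepapers/` directory, the `partner.example.com` host and the `is_crawlable` helper are hypothetical, and the standard-library parser does simple prefix matching, so it may not mirror Googlebot's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a partner site that keeps the white papers
# in a dedicated directory (the /whitepapers/ path is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /whitepapers/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://partner.example.com/whitepapers/our-paper.pdf",  # should be blocked
        "https://partner.example.com/products.html",              # should be allowed
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

Only the PDFs in the excluded directory are blocked; the rest of the partner's site stays crawlable.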
Google is very clear that, while it penalizes duplicate content, it will never penalize the original site.
In fact, PDF distribution is a good method of content syndication and a good viral marketing technique too. Make sure to put a link in the PDF that takes users back to your site.
Once again, you can share your PDF with as many partners as you want if the PDF has been on your site for some time (i.e., Google has already crawled it there).
One more argument to support this: if Google penalized sites in this case, people could easily download PDFs and publish them on their own sites to get competitors into trouble with Google.
But the sites hosting the PDFs may get indexed for the content instead of you, even if you have had the content for a while - especially if they have much higher PageRank.
If you don't mind that, then it's not a problem.
Robots exclusion, if it is an option, would solve all of these problems, though.
1. If the content is already cached on your site, there is no way Google will think it belongs to the other sites you submit it to, no matter what their PR is.
2. When you can get a backlink from the documents hosted on these sites, why would you disallow search engines in robots.txt?
Has anyone heard of content syndication? I know almost all SEOs submit articles already published on their own sites to other resources to get backlinks and, in turn, higher PageRank.
Let me know if the fundamentals have changed.