|Help, my content's been stolen even before I've used it!|
Will I get a dup penalty if I let Google spider this content now?
A site I recently acquired has a lot of good content that's been online for years. However, a lot of the pages are only accessible by clicking a form button. It seems Google wasn't following those links.
There are now a lot of sites using that (stolen) content. There are too many to chase down individually. If I change the linking so that Googlebot can access these few hundred content pages will Google see my content as duplicate content?
Wayback has archived some of this "hidden" content (about 10%) and I can therefore prove originality for some of the pages. But searching Google for certain text strings from these pages suggests that Google hasn't indexed any of these "hidden" pages, though it does seem to like the pages it can see.
The site is a very whitehat site, a PR6, with thousands of inbound links, and highly respected in its industry.
How can I get Google to see that the content originated here? Do I even make this content accessible to Googlebot now? Will it mess up the rankings for all the other pages that are doing well in the SERPs?
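For anyone wanting to check which of the hidden pages Wayback has archived without clicking through them by hand, here is a minimal sketch. It assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and its usual JSON shape with an `archived_snapshots.closest` entry; the function names and the early `timestamp` used to ask for the oldest capture are my own choices, not anything from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's availability API.
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp="19960101"):
    """Build the query URL. An early timestamp asks for the capture
    closest to that date, which in practice is the oldest one."""
    return API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(response):
    """Return (timestamp, snapshot_url) from a parsed API response,
    or None when no capture is reported as available."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def check(page_url):
    """Query the API for one page (makes a network request)."""
    with urlopen(availability_url(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `check()` on each hidden page's URL would give a list of the pages with an archived date, which is the evidence of originality mentioned above. Pages that return `None` simply have no capture to point to.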
I doubt very much you will get "penalized" (I don't believe that theory), but you may not rank with it. I would open it up to be crawled. I'm sure the others realized it wasn't being cached and that's one of the reasons they decided to steal it.
|A site I recently acquired has a lot of good content that's been online for years. |
Are you sure the person you acquired it from didn't pilfer any of it?
If the original author is willing to sign an affidavit that they wrote it, that it was original content, and that you now own it, you can start sending DMCA letters to the search engines and ISPs to shut them down. True, there may be a lot of them, but you can start with the ones that rank at the top. However, if you don't think the content has that much value for your purposes based on your situation, then I'd drop it and move on.
Make sure that you really received the copyright for everything and that no rights were granted by the previous owner of the site.
Unless copyrights were specifically included in the sales contract, there is a good chance that you did not get them, but just the right to use them.
If you are the clear owner of the copyright, and you are certain that no rights were granted to anyone else, then you should start picking off those other companies a few at a time.
Trying to get everyone will drive you nuts. Just go after those that are beating you in the SERPs you want to rank well in.
The content is his all right. I have no doubt about it. Apart from the fact that some of it is in the Wayback archive, the content matches his style, his odd expressions, his very rare punctuation mistakes, the works. It's unmistakably his. I've compared it with work I know he's done. Also, the sites that are using this content are sites that are using other "borrowed" content and hardly the type of sites to have come up with the articles themselves. He was very, very meticulous with the handing over and I have copyright over all the content with nobody else having any permission to use any of it. (I even have every single email he's ever sent or received on that domain including the ones where he did give some educational institutions permission to use some of his images.)
To trace who is using each of the several hundred articles - then pursue what may be thousands of thieves - may not be economical; the site earns about $50-$60 per day, not millions.
My concern is not with getting them shut down but the questions I had in my original post.
Macro, if I understand the way the duplicate content filter on Google works, the other pages will certainly not be penalized.
As for the ones that have been duplicated, it seems likely to me that the copy will be treated as the original by Google because it seems to be older, and your ACTUAL original will be excluded from the results. (But I wonder, if the pages have been online for years, unmodified, with an old file creation date...).
But even if Google excludes them, it seems to me likely to be worthwhile to bring them out into the open. People will find them more easily when on the site, and gradually they'll get linked to, indexed in other search engines, etc. And if you put AdSense or whatever on them, they'll start making money for you.
I agree with the others, don't let this go. Go after the most egregious offenders--the ones with the most pages stolen--or the ones that are most prominent--and start picking them off.
You could also try altering the text a bit and completely replacing the text in the navigation (if any of the scraped pages used the same navigation text). You might also consider splitting up long pages or merging related content into one page.
Any alterations will tend to reduce the odds of dupe filtering, at least somewhat.
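To get a rough feel for how much an altered page still overlaps with a scraped copy, one generic approach is to compare word shingles with Jaccard similarity. This is only a sketch of a common near-duplicate measure; nobody outside Google knows what its filter actually computes, and the function names and the shingle size `k` here are my own assumptions.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences (shingles) in the text,
    ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Jaccard similarity of the two shingle sets:
    1.0 means identical shingles, 0.0 means nothing in common."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

For example, `jaccard(my_page_text, scraped_page_text)` before and after a rewrite shows whether the changes moved the number much. A lower score only means the texts share fewer word sequences; it says nothing certain about how a search engine will treat the page.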
|the site earns about $50-$60 per day, not millions. |
OK, imagine that a couple of the top thieves also earn $50/day on your content, you bump them off, and suddenly you're in 3 digits a day. Whack 20 more and you're doing double that. I would take a shot at killing the high-ranking offenders and see what impact it has on your revenue.
It might be the easiest money you ever made.
Thanks for the replies. Does anybody else care to comment on the duplicate issue as far as the SERPs are concerned?
I think we more or less did...
You're trying to rank against your own stolen content, where the thieves have a more authoritative history with the search engine than your own pages do. We all pretty much stated that you need to go after the highest-ranking thieves and shut them down, and I would do it before exposing your version to the search engines.
You can try exposing the content without going after the thieves. Maybe if you have higher PR you will prevail regardless; there's no way to truly know until you do it.