Forum Moderators: Robert Charlton & goodroi
No wonder my traffic from Google has tanked. I have invested hundreds of hours researching and writing these definitions and over 10 sites that have DIRECTLY COPIED my content rank higher than mine. And my website seems to be penalized somehow and filtered to the bottom. You can imagine that I am a little frustrated.
Any ideas on how I can remedy the situation?
Fortunately, many webmasters have complied and most add links back to my website. If this is the case, shouldn't my website rank first? I remember Matt Cutts saying that if multiple pages have similar content, the site with incoming links from the similar pages should rank highest. That is certainly not the case in my situation, as my site is ranked last out of a dozen or so sites and only shows up when the "Show omitted results" link is clicked.
I've got the same issue. The copied content actually has a link back to the original content, and the original content isn't in Google's index anymore. It dropped out after the copied content showed up. In addition, the original content has so many back links that it used to be the highest PR page on my site. Now the only hits it gets are from all the back links... and other search engines, of course.
Something is definitely wrong with Google's duplicate content filter if it can't even see a link to the original content and use that as definitive proof of authorship. Duh! Call me crazy, but I still have faith that they'll figure it out and fix it.
Any ideas on how I can remedy the situation?
You should've been to this session at PubCon [pubcon.com] where we discussed some preemptive measures to thwart this in the first place.
Use the DMCA to have the ISP shut these sites down, and send copies to all the SE's to get that content knocked offline.
Then start installing some security measures to stop scrapers from hitting your site in the first place, at least the automated ones; there's no reason to make it easy for them.
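To make the idea of throttling automated scrapers concrete, here is a minimal sketch (my own illustration, not anything discussed at PubCon) of a sliding-window rate limiter you could call from your request-handling code. The window size and threshold are placeholder values you would tune for your own traffic:

```python
import time
from collections import defaultdict, deque

# Illustrative limits, not recommendations: at most 30 requests
# per client IP in any 60-second window.
WINDOW_SECONDS = 60
MAX_REQUESTS = 30

_hits = defaultdict(deque)  # client IP -> timestamps of recent requests

def allow_request(client_ip, now=None):
    """Return True if this client is under the rate limit, else False."""
    now = time.time() if now is None else now
    window = _hits[client_ip]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # likely an automated scraper; deny or serve a challenge
    window.append(now)
    return True
```

A human browsing normally stays far under a threshold like this, while a scraper pulling every page in sequence trips it within a minute.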
Scrapers = the bottom rung of people who copy the content you worked hard at. Solutions below...
Solutions = first, you must file the DMCA by mail with the search engines; upon confirmation from each search engine, file again via e-mail and postal mail with the web host and their upstream provider (most likely the host is a reseller).
Now, since you most likely don't have a federal copyright registration, you cannot sue for damages; if you are like me, you will get one.
Now also make sure that every search engine and PPC engine knows that there is a DMCA complaint against this site.
At this point (about 1 month later) they will surely be regretting that they contacted you.
Also note:
Before you do anything, have everything (all legal mail, domain name addresses, and so on) routed to your post office box. Why? Some scrapers will come knocking at your door, and, unlike most folks, I live in a guarded gated community with a high level of security. And if they try something at the post office, it's a federal offense.
Small sites simply don't have the manpower to issue DMCAs to every site.
That's not entirely true. I've sent a bunch of them; it depends on your motivation, and I'm very motivated to keep bogus competitors away from my money. However, it pays to be selective: only send notices to the sites directly endangering your position that have cracked the top 100 results, and ignore those that aren't visible in the SERPs until they bubble up, if ever.
The best part about the amateur scrapers is the churn rate as many sites vanish within a couple of months before they get any serious traction.
Not as easy as it sounds. Sometimes clone sites get visited more frequently by Googlebot because they have better rankings with Google than the originating sites. This is a problem I have had with copying where the copying sites are big media outlets with good PR, so my pages end up as supplemental results.
“Oh, come on now! Upon what kind of factual basis are you basing that assumption?”
I manage 35 of my own sites. Triple that at work. I've never had a site drop in rankings just because some other site copied the content.
I've tried and tried to find this as a reason, only time and time again to find that the more likely reason is a technical problem within my own site(s): accidental duplicate content within a site, bad HTML (e.g. somehow deleting the <body> tag), bad 404 pages, etc.
Until then, 7 years of SEO tells me that I've never once seen copied content appear above original content, unless:
(1) Copied content is improved and site has more links.
(2) Original site has problems.
(3) Original content "improved" or changed, and thus no longer duplicate.
Yes, I agree that if content is stolen and rewritten you have a problem, but it should not cause your page to be removed; that happens for another reason.
[edited by: tedster at 5:56 pm (utc) on Mar. 5, 2007]
Did you check whether the plagiarists get spidered more often than your own site, this way making your own content look old? Some quality sites with rather stable content get a spider visit only every month or so. Others that make sure they get spidered every day might be in the index weeks before you and "stake the claim".
You'll find a "few" hints here on WW about growing G's appetite for more pages.
nerd
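One way to check spider frequency yourself is to count Googlebot hits per day in your server logs. A rough sketch, assuming an Apache-style combined log format (the regex and log layout here are assumptions; adjust them for your own server):

```python
import re
from collections import Counter

# Matches the date portion of an Apache-style timestamp, e.g. "[05/Mar/2007:10:00:00"
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

def googlebot_visits_per_day(log_lines):
    """Return a Counter mapping 'DD/Mon/YYYY' to Googlebot hit counts."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Run it over both a recent log slice and an older one; if the per-day counts are falling off, that's a hint your pages may be going stale in the index relative to faster-spidered copies.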
I removed hundreds of sites over a month-long period, and got our ranking from the 500-900 range back onto the first page of Google's SERPs, where we had once been. Learn quickly how to accomplish this by reading the 2 topics I posted my info in:
[webmasterworld.com...]
This topic here is where I outline your step by step recipe for dealing with the scraper sites and obliterating them:
[webmasterworld.com...]
The basic steps are:
1) Get the offending site shut down with a DMCA notice to the webmaster
2) Once the site is down and displaying 404 Page Not Found, immediately submit that URL to Google's urgent URL removal tool; 2 days later the scraper site is out of Google's index for a minimum of 6 months. The beauty of this trick is that even if the scraper re-submits his URL to Google, it is kept out for at least 6 months.
You must wait until the offending site returns a 404 before submitting to the URL removal tool; otherwise Google will not remove the URL from their index. Do not wait too long, though, or the offender may pop the site back up on a new server. Unless his server is responding with a 404, the removal of his site from Google's index will not occur.
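The "wait until it's a true 404" check in the steps above can be automated with a small script. This is only a sketch of the idea; the function names are mine and nothing here is part of Google's actual removal tool:

```python
import urllib.request
import urllib.error

def removal_ready(status_code):
    """The removal tool (as described above) needs a genuine 404,
    not a redirect or a 200 'not found' page."""
    return status_code == 404

def fetch_status(url):
    """Return the HTTP status code the server answers for url."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

You could poll `fetch_status()` on the scraper's URL once an hour and only submit to the removal tool once `removal_ready()` comes back True, so you neither jump the gun nor leave the window open for the site to reappear on a new server.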
Hope this helps!
[edited by: JeffOstroff at 4:43 am (utc) on Mar. 10, 2007]
Reply to Marcia
“Oh, come on now! Upon what kind of factual basis are you basing that assumption?”
I manage 35 of my own sites. Triple that at work. I've never had a site drop in rankings just because some other site copied the content.
I've tried and tried to find this as a reason, only time and time again to find that the more likely reason is a technical problem within my own site(s): accidental duplicate content within a site, bad HTML (e.g. somehow deleting the <body> tag), bad 404 pages, etc.
Until then, 7 years of SEO tells me that I've never once seen copied content appear above original content, unless:
(1) Copied content is improved and site has more links.
(2) Original site has problems.
(3) Original content "improved" or changed, and thus no longer duplicate.
(1) The copied content was not improved one iota (one was an identical copy of a page that the site owner PAID someone for, for "outsourced content development"), and no, they do not have more links, they have far fewer - nor do they have an ODP listing.
(2) The original site has no technical problems at all. Didn't, and still doesn't.
(3) No improvement or change - it's an identical character string of 6 words that was an exact duplicate on both, which was what was used to find them.
Yes, I agree if content is stolen, and rewritten, you have a problem, but should not cause your page to be removed, that happens for another reason.
The other sites WERE ranking above the original for the specific 6-word test search string in quotes that was affected.
Maybe you have never seen stolen content from your site on a scraper site. Maybe your site was not one that hundreds of scraper sites took content from.
I think when you have a big enough attack of content scraped from your site, then it becomes a duplicate content issue.
Just yesterday we sent Google a DMCA asking them to remove a mini net we found of 860 cookie cutter Made For Adsense pages (all the exact same layout) that stole our description tag last year, and are now showing up in Google with it. Tell me that's not a problem.
If you say duplicate content is a problem from page to page on your site, then it surely makes sense that it would be a problem across the Google index from other URLs as well.
So unless you are a big site like CNN or on Google's white list, then duplicate content from multiple scraper sites surely is a problem for you. I've seen it, I've been the victim of it, I've engineered my own successful tools to combat it, I've emerged victorious in the past, I've published my reports on how to do it, and I'm living proof that scraped content from your site is a duplicate issue for your site.
[edited by: JeffOstroff at 5:39 am (utc) on Mar. 10, 2007]
It almost seems like nowadays 2 sites should be put up: one for the other engines, for which ranking at Google or not won't matter, and one for Google, excluding the bots from the other engines so they won't be found and scraped. It's very tempting to give it a try to test, as a matter of fact.
[edited by: Marcia at 5:45 am (utc) on Mar. 10, 2007]
Most of the time the content is being stolen and auto-generated on the para-site, and most of the time the links to you are picked up. I have found that much of the time, when there are links to you in a copied article, it cues Google to understand that the article is yours, as any copies are all linking to you.
This doesn't help, of course, when someone manually copies and pastes your content, but that happens a lot less, IMHO.
It almost seems like nowadays 2 sites should be put up: one for the other engines, for which ranking at Google or not won't matter, and one for Google, excluding the bots from the other engines so they won't be found and scraped. It's very tempting to give it a try to test, as a matter of fact.
While I accept that scrapers can be a worry, I'd never see SEO suicide as a viable solution. Deal with the thieves; don't give up!
How can Google possibly know who owns the copyright?
The real pain is when a better site than yours steals your content, and while neither are 'dropped', the thief ranks higher than you.
A formal complaint is the only answer. But it is a wake-up call; YOU know the site stole your stuff, therefore you came first - and yet they ranked better. So you clearly have a design / link / other SEO problem that needs sorting ... the thief may have done you a favour!
[edited by: Quadrille at 6:30 pm (utc) on Mar. 10, 2007]
Information retrieval based on historical data [appft1.uspto.gov]
There's also another about freshness.
How can Google possibly know who owns the copyright?
Clearly Google can't. However, determining the original site for a line, paragraph, or page of text is totally different, and at its most basic is the first cached date. It's a fundamental point really, since Google wants to encourage unique content. If it gets to the point where it is widespread for original content sites to rank below their own scraped content, it's contrary to everything Google is apparently striving to achieve.
For example, I published most of my stuff on 'free sites' before 1999; even later than that, much of my stuff has moved, as I've spread out onto more domains. Someone who copied my stuff in 1998 could all too easily, on web dates, claim to have got there first. I suspect others have similar stories, though not all may have so obsessively kept records and dated backups.
Upload file dates, cache dates, and stated dates are not a reliable guide to creation dates, and Google would be in greater trouble if they tried to stick to a date; currently, they ignore dates, which, legally, is the only safe route for them.
Copyright is important, but it is the copyright owners' job, under the law, not Google's, to protect it.
It is legally and physically impossible for Google to do it; and it's a little worrying to hear calls for Google to be the net police - usually people are saying "Hands Off Google".
[edited by: Quadrille at 11:02 pm (utc) on Mar. 10, 2007]
If you think a SERP where writing original content gives that page a boost over a page scraping it for that content is 'worrying', then I am somewhat perplexed.
If there's anything at all to "historical data", then yes, first cache should be an indicator of some sort, though some opt not to allow caching, which is perfectly legitimate in many, many cases and for very good reasons.
I do not think - and did not say - that a SERP where writing original content gives that page a boost over a page scraping it for that content is 'worrying'. I said your call for Google to be the net police was worrying <snip>
I don't like scrapers any more than you, I just believe there are ways to deal with it - as many have outlined above. But you seem to want it done for you - by Google. That ain't about to happen. Can you imagine the screams here if Google suddenly claimed to have the right to police these things?
[edited by: trillianjedi at 11:41 am (utc) on Mar. 11, 2007]
[edit reason] See Sticky [/edit]
Having such a link in the body copy seems like one good practice to fight content theft, especially theft of the automated kind. That's not something I tend to do, but I'm starting to adapt. Maybe it's time to place a "permalink" on every page, whether it's a blog or not!
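As a sketch of that permalink idea: a small helper (names and markup are mine, purely illustrative) that appends an attribution link to each article's body copy, so wholesale scrapes carry a link back to the original:

```python
from html import escape

def with_permalink(article_html, permalink_url, title):
    """Append a 'permalink' attribution line to an article's HTML body."""
    footer = (
        '<p class="permalink">Original article: '
        f'<a href="{escape(permalink_url, quote=True)}">{escape(title)}</a></p>'
    )
    return article_html + "\n" + footer
```

If the scraper republishes the page untouched, the copy then links back to you, which is exactly the authorship signal discussed earlier in this thread.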