Forum Moderators: Robert Charlton & goodroi
No wonder my traffic from Google has tanked. I have invested hundreds of hours researching and writing these definitions and over 10 sites that have DIRECTLY COPIED my content rank higher than mine. And my website seems to be penalized somehow and filtered to the bottom. You can imagine that I am a little frustrated.
Any ideas on how I can remedy the situation?
Fortunately, many webmasters have complied and most add links back to my website. If this is the case, shouldn't my website rank first? I remember Matt Cutts saying that if multiple pages have similar content, the site with incoming links from the similar pages should rank highest. That is certainly not the case in my situation, as my site is ranked last out of a dozen or so sites and only shows up when the "Show omitted results" link is clicked.
I've got the same issue. The copied content actually has a link back to the original content, and the original content isn't in Google's index anymore. It dropped out after the copied content showed up. In addition, the original content has so many back links that it used to be the highest PR page on my site. Now the only hits it gets are from all the back links... and other search engines, of course.
Something is definitely wrong with Google's duplicate content filter if it can't even see a link to the original content and use that as definitive proof of authorship. Duh! Call me crazy, but I still have faith that they'll figure it out and fix it.
Any ideas on how I can remedy the situation?
You should've been to this session at PubCon [pubcon.com] where we discussed some preemptive measures to thwart this in the first place.
Use the DMCA to have the ISP shut these sites down, and send copies to all the SE's to get that content knocked offline.
Then start installing some security measures to stop scrapers from hitting your site in the first place, at least the automated ones; there's no reason to make it easy for them.
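To make the idea of throttling automated scrapers concrete, here is a minimal sketch (my own illustration, not anything discussed at PubCon) of a sliding-window rate limiter you could call from your request-handling code. The window size and threshold are placeholder values you would tune for your own traffic:

```python
import time
from collections import defaultdict, deque

# Illustrative limits, not recommendations: at most 30 requests
# per client IP in any 60-second window.
WINDOW_SECONDS = 60
MAX_REQUESTS = 30

_hits = defaultdict(deque)  # client IP -> timestamps of recent requests

def allow_request(client_ip, now=None):
    """Return True if this client is under the rate limit, else False."""
    now = time.time() if now is None else now
    window = _hits[client_ip]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # likely an automated scraper; deny or serve a challenge
    window.append(now)
    return True
```

A human browsing normally stays far under a threshold like this, while a scraper pulling every page in sequence trips it within a minute.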
Scrapers = the bottom rung of people who copy the content you worked hard at. Solutions below...
Solutions = first, you must file the DMCA by mail with the search engines; upon confirmation from each search engine, file again via e-mail and postal mail with the web host and their upstream provider (most likely the host is a reseller).
Now, since you most likely don't have a federal copyright registration, you cannot sue for damages; if you are like me, you will get one.
Now also make sure that every search engine and PPC engine knows that there is a DMCA complaint against this site.
At this point (about 1 month later) they will surely be regretting that they contacted you.
Also note:
Before you do anything, have everything (all legal mail, domain name addresses, and so on) routed to your post office box. Why? Some scrapers will come knocking at your door, and, unlike most folks, I live in a guarded gated community with a high level of security. And if they try something at the post office, it's a federal offense.
Small sites simply don't have the manpower to issue DMCAs to every site.
That's not entirely true. I've sent a bunch of them; it depends on your motivation, and I'm very motivated to keep bogus competitors away from my money. However, it pays to be selective: only send notices to the sites directly endangering your position that have cracked the top 100 results, and ignore those that aren't visible in the SERPs until they bubble up, if ever.
The best part about the amateur scrapers is the churn rate as many sites vanish within a couple of months before they get any serious traction.
Not as easy as it sounds. Sometimes clone sites get visited more frequently by Googlebot because they have better rankings with Google than the originating sites. This is a problem I have had with copying where the copying sites are big media outlets with good PR, so my pages end up as supplemental results.
“Oh, come on now! Upon what kind of factual basis are you basing that assumption?”
I manage 35 of my own sites. Triple that at work. I've never had a site drop in rankings just because some other site copied the content.
I've tried and tried to find this as a reason, only time and time again to find that the more likely reason is a technical problem within my own site(s): accidental duplicate content within a site, bad HTML (e.g. somehow deleting the <body> tag), bad 404 pages, etc.
Until then, 7 years of SEO tells me that I've never once seen copied content appear above original content, unless:
(1) Copied content is improved and site has more links.
(2) Original site has problems.
(3) Original content "improved" or changed, and thus no longer duplicate.
Yes, I agree that if content is stolen and rewritten you have a problem, but it should not cause your page to be removed; that happens for another reason.
[edited by: tedster at 5:56 pm (utc) on Mar. 5, 2007]
Did you check whether the plagiarists get spidered more often than your own site, this way making your own content look old? Some quality sites with rather stable content get a spider visit only every month or so. Others that make sure they get spidered every day might be in the index weeks before you and "stake the claim".
You'll find a "few" hints here on WW about growing G's appetite for more pages.
nerd
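One way to check spider frequency yourself is to count Googlebot hits per day in your server logs. A rough sketch, assuming an Apache-style combined log format (the regex and log layout here are assumptions; adjust them for your own server):

```python
import re
from collections import Counter

# Matches the date portion of an Apache-style timestamp, e.g. "[05/Mar/2007:10:00:00"
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

def googlebot_visits_per_day(log_lines):
    """Return a Counter mapping 'DD/Mon/YYYY' to Googlebot hit counts."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        m = LINE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Run it over both a recent log slice and an older one; if the per-day counts are falling off, that's a hint your pages may be going stale in the index relative to faster-spidered copies.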
I removed hundreds of sites over a month-long period, and got our ranking from the 500-900 range back onto the first page of Google's SERPs, where we had once been. Learn quickly how to accomplish this by reading the 2 topics I posted my info in:
[webmasterworld.com...]
This topic here is where I outline your step by step recipe for dealing with the scraper sites and obliterating them:
[webmasterworld.com...]
The basic steps are:
1) Get the offending site shut down with a DMCA notice to the webmaster
2) Once the site is down and displaying 404 Page Not Found, immediately submit that URL to Google's urgent URL removal tool; 2 days later the scraper site is out of Google's index for a minimum of 6 months. The beauty of this trick is that even if the scraper re-submits his URL to Google, it is kept out for at least 6 months.
You must wait until the offending site returns a 404 before submitting to the URL removal tool; otherwise Google will not remove the URL from their index. Do not wait too long, though, or the offender may pop the site back up on a new server. Unless his server is responding with a 404, the removal of his site from Google's index will not occur.
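The "wait until it's a true 404" check in the steps above can be automated with a small script. This is only a sketch of the idea; the function names are mine and nothing here is part of Google's actual removal tool:

```python
import urllib.request
import urllib.error

def removal_ready(status_code):
    """The removal tool (as described above) needs a genuine 404,
    not a redirect or a 200 'not found' page."""
    return status_code == 404

def fetch_status(url):
    """Return the HTTP status code the server answers for url."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

You could poll `fetch_status()` on the scraper's URL once an hour and only submit to the removal tool once `removal_ready()` comes back True, so you neither jump the gun nor leave the window open for the site to reappear on a new server.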
Hope this helps!
[edited by: JeffOstroff at 4:43 am (utc) on Mar. 10, 2007]
Reply to Marcia
“Oh, come on now! Upon what kind of factual basis are you basing that assumption?”
I manage 35 of my own sites. Triple that at work. I've never had a site drop in rankings just because some other site copied the content.
I've tried and tried to find this as a reason, only time and time again to find that the more likely reason is a technical problem within my own site(s): accidental duplicate content within a site, bad HTML (e.g. somehow deleting the <body> tag), bad 404 pages, etc.
Until then, 7 years of SEO tells me that I've never once seen copied content appear above original content, unless:
(1) Copied content is improved and site has more links.
(2) Original site has problems.
(3) Original content "improved" or changed, and thus no longer duplicate.
(1) The copied content was not improved one iota (one was an identical copy of a page that the site owner PAID someone for, for "outsourced content development"), and no, they do not have more links, they have far fewer - nor do they have an ODP listing.
(2) The original site has no technical problems at all. Didn't, and still doesn't.
(3) No improvement or change - it's an identical character string of 6 words that was an exact duplicate on both, which was what was used to find them.
Yes, I agree if content is stolen, and rewritten, you have a problem, but should not cause your page to be removed, that happens for another reason.
The other sites WERE ranking above the original for the specific 6-word test search string in quotes that was affected.
Maybe you have never seen stolen content from your site on a scraper site. Maybe your site was not one that hundreds of scraper sites took content from.
I think when you have a big enough attack of content scraped from your site, then it becomes a duplicate content issue.
Just yesterday we sent Google a DMCA asking them to remove a mini net we found of 860 cookie cutter Made For Adsense pages (all the exact same layout) that stole our description tag last year, and are now showing up in Google with it. Tell me that's not a problem.
If you say duplicate content is a problem from page to page on your site, then it surely makes sense that it would be a problem across the Google index from other URLs as well.
So unless you are a big site like CNN or on Google's white list, then duplicate content from multiple scraper sites surely is a problem for you. I've seen it, I've been the victim of it, I've engineered my own successful tools to combat it, I've emerged victorious in the past, I've published my reports on how to do it, and I'm living proof that scraped content from your site is a duplicate issue for your site.
[edited by: JeffOstroff at 5:39 am (utc) on Mar. 10, 2007]
It almost seems like nowadays 2 sites should be put up: one for the other engines, for which ranking at Google or not won't matter, and one for Google, excluding the bots from the other engines so they won't be found and scraped. It's very tempting to give it a try to test, as a matter of fact.
[edited by: Marcia at 5:45 am (utc) on Mar. 10, 2007]
Most of the time the content is being stolen and auto-generated on the para-site, and most of the time the links to you are picked up. I have found that much of the time, when there are links to you in a copied article, it cues Google to understand that the article is yours, as any copies are all linking to you.
This doesn't help, of course, when someone manually copies and pastes your content, but that happens a lot less, IMHO.
It almost seems like nowadays 2 sites should be put up: one for the other engines, for which ranking at Google or not won't matter, and one for Google, excluding the bots from the other engines so they won't be found and scraped. It's very tempting to give it a try to test, as a matter of fact.
While I accept that scrapers can be a worry, I'd never see SEO suicide as a viable solution. Deal with the thieves; don't give up!
How can Google possibly know who owns the copyright?
The real pain is when a better site than yours steals your content, and while neither are 'dropped', the thief ranks higher than you.
A formal complaint is the only answer. But it is a wake-up call; YOU know the site stole your stuff, therefore you came first - and yet they ranked better. So you clearly have a design / link / other SEO problem that needs sorting ... the thief may have done you a favour!
[edited by: Quadrille at 6:30 pm (utc) on Mar. 10, 2007]
Information retrieval based on historical data [appft1.uspto.gov]
There's also another about freshness.
How can Google possibly know who owns the copyright?
Clearly Google can't. However, determining the original site for a line, paragraph, or page of text is totally different, and at its most basic is the first cached date. It's a fundamental point really, since Google wants to encourage unique content. If it gets to the point where it is widespread for original content sites to rank below their own scraped content, it's contrary to everything Google is apparently striving to achieve.
For example, I published most of my stuff on 'free sites' before 1999; even later than that, much of my stuff has moved, as I've spread out onto more domains. Someone who copied my stuff in 1998 could all too easily, on web dates, claim to have got there first. I suspect others have similar stories, though not all may have so obsessively kept records and dated backups.
Upload file dates, cache dates, and stated dates are not a reliable guide to creation dates, and Google would be in greater trouble if they tried to stick to a date; currently, they ignore dates, which, legally, is the only safe route for them.
Copyright is important, but it is the copyright owners' job, under the law, not Google's, to protect it.
It is legally and physically impossible for Google to do it; and it's a little worrying to hear calls for Google to be the net police - usually people are saying "Hands Off Google".
[edited by: Quadrille at 11:02 pm (utc) on Mar. 10, 2007]
If you think a SERP where writing original content gives that page a boost over a page scraping it for that content is 'worrying', then I am somewhat perplexed.
If there's anything at all to "historical data", then yes, first cache should be an indicator of some sort, though some opt not to allow caching, which is perfectly legitimate in many, many cases and for very good reasons.
I do not think - and did not say - that a SERP where writing original content gives that page a boost over a page scraping it for that content is 'worrying'. I said your call for Google to be the net police was worrying <snip>
I don't like scrapers any more than you, I just believe there are ways to deal with it - as many have outlined above. But you seem to want it done for you - by Google. That ain't about to happen. Can you imagine the screams here if Google suddenly claimed to have the right to police these things?
[edited by: trillianjedi at 11:41 am (utc) on Mar. 11, 2007]
[edit reason] See Sticky [/edit]
Having such a link in the body copy seems like one good practice to fight content theft, especially theft of the automated kind. That's not something I tend to do, but I'm starting to adapt. Maybe it's time to place a "permalink" on every page, whether it's a blog or not!
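As a sketch of that permalink idea: a small helper (names and markup are mine, purely illustrative) that appends an attribution link to each article's body copy, so wholesale scrapes carry a link back to the original:

```python
from html import escape

def with_permalink(article_html, permalink_url, title):
    """Append a 'permalink' attribution line to an article's HTML body."""
    footer = (
        '<p class="permalink">Original article: '
        f'<a href="{escape(permalink_url, quote=True)}">{escape(title)}</a></p>'
    )
    return article_html + "\n" + footer
```

If the scraper republishes the page untouched, the copy then links back to you, which is exactly the authorship signal discussed earlier in this thread.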