Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

How to deal with a scraper that outranks us since Panda update?

 3:16 pm on Mar 7, 2011 (gmt 0)

My apologies in advance, since English is not my first language…

OK, I don't post much here, but this last update has been very discouraging for us. We own an 11-year-old website; all the content has been written by us or by members of our small community (some are PhDs), and we have always been white hat. We have around 5,000 pages of original content developed over these years. We learned that a lot of sites copy our content word for word, but Google always knew we had posted it first and ranked us higher than the copycats.

Unfortunately the Panda update has cost us 20%-30% of our traffic, and after a little investigation we found that a scraper site is now above us in almost every search we used to be on top of.

As I said, over these years we learned to trust Google's algo to keep scrapers of our content down in the search results. Around a year ago we noticed a website that had been scraping ALL our content (and content from thousands of other sites) and posting it as if it were the original author's. We didn't care that much because they always ranked lower than us.

With Panda this scraper now RANKS HIGHER than us in nearly every search. It doesn't matter that our original content was posted 5 or 6 years before the scraper's. They take three, four, or five portions of an article from us and label each with a different phrase from the text.
So a search for "red and blue widgets" that used to return:

1. Our content
2. Others' content
3. Others' content

now returns:

1. Scraper's version 1
2. Scraper's version 2
3. Scraper's version 3
4. Our content

To be fair, the only thing we can spot is that the scraper's layout is "maybe nicer" than ours, but we never thought this was a beauty contest, just a relevancy and usability contest.

I have filed four spam reports through Webmaster Tools over the last few days. But these reports only work on a page-by-page basis, and the scraper has around 1,500,000 scraped pages from hundreds of legitimate websites.

This small rant is to ask: am I alone in this? How effective is a spam report to Google? What would you do in a similar situation? Is somebody at Google reading this post?

Any input would be appreciated…

Thank you



 3:22 pm on Mar 7, 2011 (gmt 0)

If they've really copied the whole page (or most of it), I would file a DMCA complaint with the site and with Google ... it carries far more weight than a spam report.


 3:46 pm on Mar 7, 2011 (gmt 0)

What happens with a DMCA? Do they take down the page or the entire domain? I'm gonna snark here and say that I found spinners and scrapers who stole content from me that are now on article directories. Would a DMCA against them bring them down?


 6:09 pm on Mar 7, 2011 (gmt 0)

You'll have to look into the 'what happens when' details yourself (how far it goes and how much it applies to), but I do know that if you file a DMCA complaint with Google, they'll remove the page(s) from the rankings. If the site is at a regular hosting company, the host will usually take the site or pages down too, afaik.

Make sure you know you're covered, though; in other words, speaking to an attorney is advised, because it's a legal notice and you can run into problems if you wrongly file one.


 6:37 pm on Mar 7, 2011 (gmt 0)

I should add: you might ask around, or have a look for more DMCA information, in the Content, Writing & Copyright [webmasterworld.com] forum. They can't give legal advice, but you may find people who can speak from experience and give you a better idea of what to do and how things worked out for them.


 7:09 pm on Mar 7, 2011 (gmt 0)

Thanks for the answers.

The majority of our content is submitted by authors who give us permission to publish it (no money involved), so we are not entitled to act on behalf of the authors.

That is why I guess I can't file a DMCA complaint.
Also, as I said, the scraper takes a page and generates around 50 to 100 different versions by just showing excerpts in each one, sadly ranking higher than us. So we would have to file at least 50,000 DMCA complaints.
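To triage which of the scraper's pages actually contain our text, before deciding where to spend DMCA effort, one rough idea is a word-shingle overlap check. Here is a minimal sketch (the 5-word shingle size and the scoring are arbitrary illustration choices on my part, not anything Google is known to use):

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(original, candidate, k=5):
    """Fraction of the original's shingles that reappear in the candidate.
    Near 1.0 means a verbatim copy; excerpt-spun pages score lower,
    but still well above unrelated text."""
    a, b = shingles(original, k), shingles(candidate, k)
    return len(a & b) / len(a) if a else 0.0
```

Running each suspect URL's extracted text against the matching article and sorting by score would surface the pages most worth a complaint first.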

I was hoping that Google could look into this the way they have in past years: determine who published the content first and rank accordingly.


 7:40 pm on Mar 7, 2011 (gmt 0)

The majority of our content is submitted by authors who give us permission to publish it (no money involved), so we are not entitled to act on behalf of the authors.

That is why I guess I can't file a DMCA complaint.

If the authors have given you sole permission to publish their work, then you can act as an agent for the author. I file DMCAs all the time, and some are like this.

If the site has AdSense, then click the bottom right-hand corner of an ad unit (where it says "Ads by Google") and report the site via that. Go for the jugular.


 2:55 pm on Mar 9, 2011 (gmt 0)

It's never been about who published it first, but about how strong your site is. If you can buy a trusted domain or do really good link building, you can easily rank copied content above the original no matter how old it is.

It sounds to me like Google just didn't like your site based on the farmer update. The scraper sites just didn't take a hit so now they're the ones ranking.

Send a C&D to the web host. That's about all you can do. Google doesn't make it their business to police copyright stuff. It's just not their responsibility, but it is the responsibility of the ISP hosting the copyrighted content to remove it when alerted.


 7:45 pm on Mar 9, 2011 (gmt 0)

The last two updates were aimed at scrapers (duplicators) and low quality (including copied content), so it may be that they historically weren't concerned about it, but that's not the case any longer ... They are going after the copiers and regurgitators; it may not be as easy to just copy what someone else has today as it once was, and my guess is it will continue to get tougher moving forward.


 7:53 pm on Mar 9, 2011 (gmt 0)

They're trying, but the farmer update didn't really put a dent in scrapers. Content farms, yes. They want to, but it's just really hard to do algorithmically.


 8:18 pm on Mar 9, 2011 (gmt 0)

I have issues with scrapers too, although my scrapers are usually "legitimate" sites such as newspaper and television websites. (And unlike most people, my policy is that you can take anything you want off the site as long as you attribute it to me.)

You simply can't rely on any programming or algorithm to deal with the problem, you have to take it upon yourself.

So my first step in this situation would be to try to figure out how they're scraping the site, and whether there's anything I can do to make it more difficult. If they're just coming along and screenshotting pages, there's not a lot you can do about that. But on the plus side, that method doesn't scale very well.

Me, I'm a log file/analytics junkie. I'm always looking for anomalies and unusual behaviors.
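For anyone who wants to try the log-watching approach, here's a minimal sketch that counts requests per IP in a combined-format (Apache/Nginx style) access log and flags the heavy hitters. The 500-request threshold is an arbitrary starting point; tune it to your own traffic:

```python
import re
from collections import Counter

# Matches the leading fields of a combined-format access log line:
# client IP, identd, user, [timestamp], "request"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)"')

def top_clients(log_lines, threshold=500):
    """Count requests per client IP and return those at or above
    `threshold`, busiest first. A single client pulling thousands of
    pages in a short window is a classic scraper signature worth
    a closer look (or a block)."""
    hits = Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            hits[m.group(1)] += 1
    return [(ip, n) for ip, n in hits.most_common() if n >= threshold]
```

Feed it an open log file (`top_clients(open("access.log"))`) and eyeball the result next to your known crawlers; anything unfamiliar near the top of the list is an anomaly in netmeg's sense.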


 12:59 am on Mar 11, 2011 (gmt 0)

Netmeg has a point. Looking for 'anomalies' in the logs catches a lot of the scrapers. DMCA complaints only work for a small-scale problem; as you said, you can't file 50,000 of them. Only G can effectively solve this issue, but unfortunately they haven't been very concerned with it lately...


 1:02 am on Mar 11, 2011 (gmt 0)

I concentrate on the copies that outrank my pages. After filing a DMCA or getting the infringer to take it down through direct e-mail to the owner, use the public URL removal tool to tell Google that it's gone.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved