Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
What options do you have to address 1,000s of scrapers?
Lenny2




msg:4396091
 6:08 pm on Dec 9, 2011 (gmt 0)

You are Pandalized because large numbers of your product pages are being scraped... your unique content is all over the web. What options do you have?

Asking every webmaster to take down the content isn't an option; we're talking about 1,000s of pages.

What do you do?

 

nektotigra




msg:4396151
 9:28 pm on Dec 9, 2011 (gmt 0)

File DMCA requests with Google.

proboscis




msg:4396157
 9:34 pm on Dec 9, 2011 (gmt 0)

I just recently started filing DMCA requests again; it's so much easier than it used to be.

If you have multiple copies of one page, you can list all the URLs that copy that page in one box, and you can do that ten times with each request.
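The batching described above can be planned ahead of time. What follows is only a rough sketch: it assumes you already have a list of (original URL, copied URL) pairs, and the function name `plan_dmca_requests` and the limit of ten boxes per request (taken from this post, not from any official documentation) are illustrative.

```python
from collections import defaultdict

# Ten boxes per request is the figure mentioned in the post; treat it as an assumption.
BOXES_PER_REQUEST = 10


def plan_dmca_requests(copies, boxes_per_request=BOXES_PER_REQUEST):
    """Group (original_url, infringing_url) pairs into requests.

    Each "box" holds one original page plus every URL that copies it.
    Each request holds at most `boxes_per_request` boxes.
    """
    by_original = defaultdict(list)
    for original, infringing in copies:
        # Skip duplicate reports of the same copy.
        if infringing not in by_original[original]:
            by_original[original].append(infringing)

    boxes = list(by_original.items())
    return [
        boxes[i:i + boxes_per_request]
        for i in range(0, len(boxes), boxes_per_request)
    ]
```

Each inner list is one request's worth of boxes, so a full day of filing can be worked through one list at a time.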

Spend a full day doing that...hopefully someone else can come up with something better but sometimes there isn't an easier way...

dstiles




msg:4396184
 11:37 pm on Dec 9, 2011 (gmt 0)

Stop the scrapers before you do anything else! Otherwise they will all come back again.

Lenny2




msg:4396224
 2:49 am on Dec 10, 2011 (gmt 0)

What about scrapers that are providing links back to the content? DMCA request for that too? Or let that one go?

incrediBILL




msg:4396230
 3:07 am on Dec 10, 2011 (gmt 0)

Scrapers are binary: they are either BLOCKED or NOT; there is no middle ground.

Besides, why would you want bad neighborhoods linking to you?

If Google thinks they're shady, your site could be guilty by association. BLOCK!

tangor




msg:4396231
 3:08 am on Dec 10, 2011 (gmt 0)

What do you do?


One of the great questions, and one which has many answers, few of which seem to actually work.

One of my sites that is of interest to scrapers is generally left alone... I have it set up on old style FRAMES (not IFRAMES) and that's just a bit more work for the suck it quick scrapers. I do not recommend anyone change their site, though I will say that both Bing and G can read FRAMES just fine. That site averages about 1M real human hits a year... might not be enough for an eCommerce site, but does okay for that niche.

Meanwhile, set aside an hour each day to DMCA... as high up the chain as possible to reduce the number of steps toward satisfaction.

Also, spend a little time in the Spider Identification forum [webmasterworld.com...] to determine where the bad actors reside and how to put the kibosh on them. There you can also learn about whitelisting in robots.txt to take some of the heavy burden off, and how to block countries by IP.
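As a rough illustration of blocking by IP range, here is a minimal sketch using Python's standard `ipaddress` module. The ranges listed are documentation-reserved placeholders, not real scraper or country ranges, and `is_blocked` is a hypothetical helper; in practice the ranges would come from your own log analysis, and the check would usually live in the web server or firewall rather than in application code.

```python
import ipaddress

# Placeholder ranges reserved for documentation; substitute ranges you have
# identified yourself (for example from the Spider Identification forum).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if the visitor's address falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parsable address; this sketch leaves it to other checks.
        return False
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to test against.
    return any(addr in network for network in networks)
```

A request handler could call `is_blocked(client_ip)` and answer with a 403 before serving any content.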

There will always be those who take what others create, which is one reason why all countries have some kind of Patent and Copyright laws; otherwise the creators would chuck it in, with good reason.

incrediBILL




msg:4396268
 5:07 am on Dec 10, 2011 (gmt 0)

I have it set up on old style FRAMES (not IFRAMES) and that's just a bit more work for the suck it quick scrapers.


I'm not sure why you think FRAMES are hard as I recently wrote a crawler and it rips through frames like warm butter. I never understood why SEs had a hard time with frames, it's quite silly really, and IMO just as easy to scrape as anything else.

Frames aren't an anti-scraper strategy at any level, there must be something else about your site they don't like ;)
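To show how little extra work frames add, here is a minimal sketch (not the crawler mentioned above; the class and function names are made up for illustration) that pulls every frame source out of a frameset page using only the standard library. A scraper would simply fetch each returned URL next.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src of every <frame> and <iframe> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the frameset's own URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def frame_urls(html, base_url):
    """Return absolute URLs of all framed documents found in `html`."""
    parser = FrameSrcParser(base_url)
    parser.feed(html)
    return parser.frame_urls
```

One extra parse and one extra fetch per frame is the whole cost, which fits the point that frames are not an anti-scraper strategy.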

tangor




msg:4396276
 6:44 am on Dec 10, 2011 (gmt 0)

Didn't say it was hard. Just a remark that it is different. Check all the historicals here at WW wherein Frames are indicated as anathema to search engines (suggesting scrapers don't like them, too). Just reporting that that particular site needs about 90% fewer DMCAs than all the others.

I don't think about WHY; I'm just reporting FACTS in this particular case. As for the scrapers, they are there and are dealt the occasional DMCA; it just happens to be a very small, narrow, generally useless sort of niche. :)

Planet13




msg:4396404
 5:32 pm on Dec 10, 2011 (gmt 0)

What about scrapers that are providing links back to the content? DMCA request for that too? Or let that one go?


I recently filed DMCA requests against 3 sites that had paragraphs of my content on them, yet were all linking to the same widgets.html page on my site.

One was a Yahoo Answers page, one was an amazon.com page, and the other was an ecommerce site in the UK (I am in the USA).

Whether related or not, that page increased in rankings and traffic within a week or so of the DMCAs being processed.

Of course, I have NO insight as to whether that was the cause for the rankings and traffic increase or not.

I hope this helps. I would definitely see if you are getting TRAFFIC from those pages that scraped your content and link back to you. If you get significant traffic, I MIGHT think about leaving them be...
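One way to check whether those pages send you traffic is to count referrers in your access log. The sketch below assumes the common Apache/nginx "combined" log format (your server's format may differ) and uses hypothetical names throughout.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the tail of a "combined" format line:
# "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referral_counts(log_lines, scraper_hosts):
    """Count hits whose referrer host is one of `scraper_hosts`."""
    wanted = {host.lower() for host in scraper_hosts}
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format
        try:
            host = urlsplit(match.group("referer")).hostname
        except ValueError:
            continue  # malformed referrer URL
        if host and host in wanted:
            counts[host] += 1
    return counts
```

If a scraper's host barely appears in the counts, the link back is probably not worth much as a traffic source.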

Elsmarc




msg:4396470
 10:41 pm on Dec 10, 2011 (gmt 0)

Typically, if the content "scraped" is less than 120 words and there's a link to the original source, I *think* it would be hard to get a DMCA. I might be wrong, and if I am I'm sure someone here will correct me. As an additional thought: personally, I would contact the site hosting the material and request removal before I would report it to Google. Stuff from my sites is scraped quite often, but personally I don't care. But that's just me. My sites do well, and I'm not interested in taking the time and effort to try to get scraped material removed from every site when I can be spending the time on my sites improving them. Scrapers eventually go away and die. My sites have been around for many years. My "main" site has been online for 15 years.

brinked




msg:4396523
 3:36 am on Dec 11, 2011 (gmt 0)

This happens all too frequently. People tend to think that they are being outranked by sites that scrape their content.

While I am the first to fault Google, they do tend to do a pretty good job at recognizing the content source. The reason the site is suffering is likely not the fact that its content is being copied by other sources, but an underlying issue that Google is seeing.

Putting the blame on sites copying your content is easy to do, but it is not always the reason why you're being outranked. Google may know that you are the source of the content, but maybe it sees your content as being hard to get to, or maybe it sees your site as a whole as spammy. Usually with issues just like yours, I notice the webmaster is under some sort of over-optimization penalty.

I do feel it is wrong for Google to rank scraper sites ahead of the source. Regardless of what penalty they feel the originator deserves, I think the original source should always be favored over the scrapers.

santapaws




msg:4396588
 9:37 am on Dec 11, 2011 (gmt 0)

There are too many examples of a page coming back right after the removal of the scraped material for this to be a coincidence.

Lenny2




msg:4396656
 5:00 pm on Dec 11, 2011 (gmt 0)

I'm in agreement with Santa here. In fact, I don't know why I didn't put 2 and 2 together before... but I complained to Google about a scraped page outranking my original content page, and a few months later that page is appearing in the SERPs (where it used to be pre-Panda).

enigma1




msg:4396672
 6:24 pm on Dec 11, 2011 (gmt 0)

You can add some code to your application to make sure the main spiders index the content before anybody else sees it. The problem is that scrapers can monitor and fetch your content way faster than spiders.

There are various ways of doing this without getting implicated in cloaking. For example, publish the page via the SEs' sitemap first; once the page is indexed, update the site navigation so the page can be found by everyone.

If somebody else fetches your content and publishes it first, before the SEs, you will unfortunately have a very hard time convincing anyone it's yours.
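A minimal sketch of the sitemap-first idea described above, under several assumptions: pages are kept as simple dictionaries, the `indexed` flag is set by you once you have confirmed the page appears in the search engine's index (how you confirm that is outside this sketch), and the function names are hypothetical. The XML follows the sitemaps.org `urlset` format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML listing every page, including unlinked new ones."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


def navigation_links(pages):
    """Return only the pages that should appear in the public navigation."""
    # New pages stay out of the menus until you mark them as indexed.
    return [page["url"] for page in pages if page["indexed"]]
```

The same page is served to everyone who requests it, spiders included, which is why this differs from cloaking; only the discoverability of the URL is delayed.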


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved