

Webmaster General Forum

    
Scrapers Just Want Content to Pad their Pages
aristotle
Msg#: 4399192 posted 5:45 pm on Dec 18, 2011 (gmt 0)

I've just been checking some of the backlinks that Google Webmaster Tools shows for one of my sites. You wouldn't think that scrapers of your content would link back to your site, but apparently some of them do. But mainly what I noticed is that they're just using my content to "pad" their pages to make them bigger. They combine it with content they've scraped from other sites, and insert a little original content of their own at the top. The result is a page full of jumbled rambling content of little value. I know scrapers have been doing this for a long time, but it seems to be getting worse.

 

DeeCee
Msg#: 4399192 posted 9:06 pm on Dec 18, 2011 (gmt 0)

I think it is the "new" way to avoid Google's Panda update.

By combining small chunks of content from many different sites, they get normal "English" text (as opposed to the machine-made dictionary nonsense we used to see). Often these are merely chunks lifted from published RSS summaries.

Google's Panda update was supposed to detect and punish duplicate-content scrapers, but it is likely easily evaded by taking small chunks from multiple sites and merging them, which makes it much harder for a machine to match the page against any one source across the net.

In my opinion, this is why after Panda we saw even more scraper sites ranking above the original content:

a) The scraper sites have a lot of new content every day (obviously stolen), and
b) By keyword density, it all looks non-duplicate (and ad-relevant) to a machine, even though to a human trying to read the overall "article" it is messy nonsense.
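The reason merging small chunks defeats page-level comparison can be sketched with a toy example. This is purely my own illustration using classic word-shingle similarity, not Google's actual duplicate-detection system; all names and sample text here are made up:

```python
# Toy sketch of shingle-based near-duplicate detection (an assumed, simplified
# technique -- not Google's real algorithm). A wholesale copy of a source page
# scores a perfect match, while a page padded with chunks from many sites
# shares only a few shingles with any single source, so it looks "original".

def shingles(text, k=4):
    """Return the set of overlapping k-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical source page and two kinds of scraper pages:
source = "the quick brown fox jumps over the lazy dog near the river bank"
full_copy = source  # a straight scrape of the whole page
padded = ("totally different opening words here "
          "the quick brown fox jumps "                      # one small stolen chunk
          "and unrelated filler scraped from elsewhere entirely today")

s = shingles(source)
print(jaccard(s, shingles(full_copy)))  # → 1.0 (trivially flagged as duplicate)
print(jaccard(s, shingles(padded)))     # much lower: only 2 shingles overlap
```

The padded page copied a real phrase, yet its similarity to the source collapses, which is exactly the evasion DeeCee describes: no single original owns enough of the page to trip a duplicate threshold.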

I mentioned the same in another post the other day, where content from this WebmasterWorld forum has been merged into a bogus SEO blog (domain very notable as 'cagey media'). :)
They also added a (no-followed) link to WebmasterWorld, probably in an attempt to keep it legal.

Or maybe it is just a way to taunt the real owners. Sort of: "Ha, we knew you would find us eventually, but we merely took a fair-use part, so we do not care." For some of them, the chunk-sized excerpts are likely an attempt to stay under "fair use" in US copyright law. A DMCA request immediately gets a site excluded from Google results while it awaits resolution. But when a page consists of many small chunks from many different sites, who reports the DMCA complaint (and would they bother), and how would Google and ChillingEffects resolve it?

timsoulo
Msg#: 4399192 posted 9:38 am on Dec 21, 2011 (gmt 0)

I think there's nothing we can do about it... I'm sure Google is aware of this issue and is working to fix it...

But once they do, spammers will find a new way to game the system... and the battle won't end till Google obtains AI :)
