
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Should Google Tank the Crowd Sourced Content Scrapers?

 2:20 pm on Sep 5, 2012 (gmt 0)

We know that Google looks unkindly on scrapers. This duplicated content competes against the individuals/groups creating this content, who often feature adverts from their AdSense network. Sinking these sites to the bottom of the SERPs helps everyone. It helps Google offer credible results, people to find the originators of the content, and keeps the parasites out of the game.

Should the newest parasites, crowd sourced content scrapers, be similarly halted in their tracks before it’s too late?

Crowdsourced content scrapers (Pinterest.com, Weheartit.com, Loveit.com, Ehow.com/spark) are experiencing a surge in popularity this year. Pinterest, in particular, is increasingly throwing its weight around in the SERPs.

The overwhelming majority of the content of these websites is an infringement on someone’s copyright; rare are the people posting original content on, say, Pinterest, where original content may not even amount to 1%, sitewide. For many, Pinterest results in the SERPs are a nuisance, a mere extra step to get to the source website; that is, if the source website is credited appropriately, and not mis-attributed to Tumblr, Yahoo Images, or Pinterest itself. Most people “googling” something want some text, not just pictures and a misleading link.

These crowdsourced content scrapers all have NOFOLLOW outbound links, except for Loveit.com, who might have to shut the door once spammers begin to exploit it. These links are of very little help to the authors whose content is scraped in the SERPs.

Typically, content is scraped via a button that the users install in their bookmark bar, making scraping third-party content a breezy, effortless, one-click affair.

Most of the scrapers will create a page with the URL template contentscraper.com/source/yourwebsite.com that often ranks quite highly for your keywords, and for your domain name in the search query. Some visitors may prefer to view the content on Pinterest, and not visit your link in the SERPs.

Early on, some webmasters were hyping miraculous referral traffic volume from these scrapers. Lately, there are reports indicating that rather than leaving the confines of the scraper to follow links to the source, scraper visitors tend to remain on the scraper. (http://adweek.com/news/technology/buzzfeed-report-publishing-partners-demonstrates-power-social-web-143194)

A minority of crowdsourced content scrapers offer unique, proprietary opt-out mechanisms.

<meta name="LoveIt" content="nolove">
<meta name="ehow" content="noclip" />
<meta name="pinterest" content="nopin" />

The proliferation of these tags forces content providers to stay constantly vigilant, monitoring new opt-out codes as they arise and updating their websites accordingly. Notably, these aren't sitewide htaccess commands; they need to be added to every single web page. Not everyone has dynamic content!
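For sites made of static HTML files, the per-page chore described above could be scripted. What follows is only a minimal sketch, assuming UTF-8 encoded `.html` files that each contain a `<head>` element; the function names `add_opt_out_tags` and `update_site` are hypothetical helpers made up for this illustration, and the tag list holds just the three tags quoted in this post.

```python
import re
from pathlib import Path

# The three opt-out tags quoted in the post; extend the list as new ones appear.
OPT_OUT_TAGS = [
    '<meta name="LoveIt" content="nolove">',
    '<meta name="ehow" content="noclip" />',
    '<meta name="pinterest" content="nopin" />',
]

# Matches the opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_opt_out_tags(html, tags=OPT_OUT_TAGS):
    """Return html with any missing opt-out tags inserted right after <head>."""
    match = HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head> found: leave the page untouched
    lowered = html.lower()
    missing = [tag for tag in tags if tag.lower() not in lowered]
    if not missing:
        return html  # already up to date, so repeated runs change nothing
    insert_at = match.end()
    return html[:insert_at] + "\n" + "\n".join(missing) + html[insert_at:]


def update_site(root):
    """Rewrite every .html file under root that lacks a tag; return changed paths."""
    changed = []
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_opt_out_tags(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Running `update_site` on a local copy of the site and re-uploading would cover existing pages; whenever a new scraper publishes its own tag, adding it to `OPT_OUT_TAGS` and running the script again should be enough. Back up the files first, since the script edits them in place.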

There are, of course, tricks to block these crowdsourced scrapers with htaccess, or to substitute a copyright warning for the scraped image, but Ehow's Spark grabs a screenshot of the browser display (stealing both images and text) and is the ultimate stealth scraper. The act of someone scraping your content with Ehow's bookmark tool is undetectable in web logs, and therefore unstoppable in htaccess.

DMCA takedown notices, which were once practical against conventional scrapers, are obsolete against the army of crowdsourced content scrapers, whose users scrape content feverishly, round the clock.
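The image-substitution trick mentioned above comes down to a check on the Referer header of each image request. Here is a rough sketch of that decision logic in Python rather than htaccess syntax. It assumes the scraper's pages display your image by hotlinking it, so that the visitor's browser sends a Referer naming the scraper. The host list, the warning image path, and both function names are hypothetical. As noted above, a screenshot tool never requests your image this way, so nothing of this sort can catch it.

```python
from urllib.parse import urlsplit

# Hypothetical block list, using domains named earlier in this post.
BLOCKED_HOSTS = {"pinterest.com", "weheartit.com", "loveit.com"}


def is_blocked_referer(referer, blocked_hosts=BLOCKED_HOSTS):
    """True if the Referer's host is a blocked domain or one of its subdomains."""
    if not referer:
        return False  # direct visits and privacy tools often send no Referer
    host = urlsplit(referer).hostname or ""
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


def image_to_serve(requested_path, referer,
                   warning_path="/images/copyright-warning.png"):
    """Pick the file to send: the requested image, or the warning image."""
    if is_blocked_referer(referer):
        return warning_path
    return requested_path
```

Matching on the parsed host name, rather than searching the whole Referer string, avoids false positives on unrelated URLs that merely contain a blocked name somewhere in their path.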

Should Google level the playing field and severely penalize these crowdsourced, copyright infringement and duplicated content machines?

Or should Google allow them to rise into greater prominence, as they might under current algos?



 5:50 pm on Sep 26, 2012 (gmt 0)

An article that might be of interest to this topic:


What I couldn’t help noticing as I worked my way through the results, my photo hosted on my site was the dead last that Google showed in its results.


 6:51 pm on Sep 26, 2012 (gmt 0)

An article that might be of interest to this topic:

I can agree 100% with the author of the article. Google should/could do better. I came across a website that copied more than 400 images from my main site. A domain name registered in June and ranking like crazy in image search ... with my images. Fresh content for Google and worth getting higher rankings than established 10-year-old websites? These days it seems extremely easy to rank with stolen content since it is "fresh" for Google, even when you can find the same thing somewhere else.

Also I can't understand how this can happen: Often I find hotlinked images in image search (my image on my server, embedded in an infringer's website). When I click on the thumbnail in image search I end up on the webpage of the hotlinker and not on my website/domain where the original picture is.


 7:17 pm on Sep 26, 2012 (gmt 0)

As I posted in another thread here [webmasterworld.com] Google has no trouble at all attributing images when they belong to Getty or one of the commercial image banks, their images don't find their way into Google image search..and they don't turn up as "found on the web" in Google's carousel..

That would get Google into lawsuits with the "big dogs"..they can correctly attribute images to their creators..or to the true copyright holders..they choose not to do so for those of us who, they know, do not have powerful copyright lawyers on retainer..


 10:59 pm on Sep 26, 2012 (gmt 0)

In a perfect world, Google would rank the sites that contribute the work to the crowdsourced content above, or close-to, the crowdsourced content. Unfortunately, that means that people would try and gain entry to the crowdsourced content to improve their rank.

Google needs to do better than to screw everyone in order to keep the bad guys out though. Being legitimately and organically cited via a crowdsourced scraper should be worth something to the source.


 11:16 pm on Sep 26, 2012 (gmt 0)

contribute ?
legitimately ?

Those words don't mean what you think they mean..

To "contribute" the originators would have to be willing, or to accept this crowd sourced scraping, most of them don't even know that they have been "human scraped"..and would not/ do not agree to it..

There is no way the "crowd sourcer", or the site that they are sourcing the human scraped content onto is acting "legitimately",( if they are not the creator or the copyright holder of the content ) they are both breaking many civil IP laws and in some countries IP abuse ( using copyright images without written permission from the copyright holder ) is also in breach of criminal laws..

Scrapers both manual crowd sourced ( ehow , pinterest etc ) and automatic scripted versions, should not be in SERPs, they should be removed manually and via algos, just like childpron etc..

They should not have adsense running on them, Google should not make deals with them..if Google cut the adsense from ehow it would fold in 6 months..If pinterest and the clones did not have lawyers on retainer to stall complaints in the courts and the DMCA ( "we took it down fast even though we knew it was copyright and should never have allowed it up , but that is our business model, crowd sourced IP abuse" ) to hide behind..they would not even have started..

None of them have started, in Countries whose legal systems take the IP rights of the little people as seriously as they do IP rights of the big corps , they have all started in the country that ignores the IP rights of the little people and thinks only of the profits of the mega corps and their VC backers..


 1:13 am on Sep 27, 2012 (gmt 0)

Being legitimately and organically cited via a crowdsourced scraper should be worth something to the source.

Except that crowdscrapers use nofollow links... so there's that.

And then, in my opinion, search engines need to have a list of the most popular crowdscrapers and let them rot in the SERPs' gutters.


 5:27 am on Sep 27, 2012 (gmt 0)

Scrapers both manual crowd sourced ( ehow , pinterest etc ) and automatic scripted versions, should not be in SERPs, they should be removed manually and via algos, just like childpron etc..

This is the best statement I have read on this forum in years.


 12:49 pm on Sep 27, 2012 (gmt 0)

My site is retail so our products end up all over these types of sites (pinterest, kaboodle, wanelo, etc.) only this year we've been noticing that these sites are outranking us for our own products and descriptions we wrote ourselves. Not sure if this is a widespread problem that everyone is seeing or if this is an indication of a problem with our site.

I just don't understand how another site can outrank us.

This may or may not be relevant to the conversation, as it is not necessarily about "scraped content". Rather, we've noticed a lot of sites popping up in the results this year that allow 'average Joe anybodies' to create products simply by uploading designs (could be T-shirts, hoodies, stickers, iPhone cases, tote bags, coffee mugs, general merchandise items). Some of these sites are big guys too, like Zazzle and CafePress, but there are new guys like RedBubble, Skreened, etc. These sites allow anything, including stolen licensed images. They are becoming so popular due to user engagement (because everyone wants to make $3 at a time for doing almost nothing and none of the legwork), and now they are outranking us for some of the licenses they are stealing from.

DMCA complaints don't do much because it's a drop in the bucket for the license holder, but a huge hit for a distributor like us.

Sorry if I got off topic, but we are being heavily affected in rankings by both of these contenders right now and they both seem highly illegitimate.


 1:24 pm on Sep 27, 2012 (gmt 0)

My site is retail so our products end up all over these types of sites (pinterest, kaboodle, wanelo, etc.) only this year we've been noticing that these sites are outranking us for our own products and descriptions we wrote ourselves. Not sure if this is a widespread problem that everyone is seeing or if this is an indication of a problem with our site.

I'm seeing this too. The problem is that it creates an extra step between you and your potential customer. Getting outranked with your own descriptions and creating an extra gap between you and customers will not improve business. More and more online retailers are posting their products on such sites but might be in for an unpleasant surprise when those pages outrank the original.


 3:03 pm on Sep 27, 2012 (gmt 0)


I am a member of both RedBubble and Zazzle. Yes, there is some copyright/trademark infringement by "designers" whose primary products are T-shirts, etc. However, those sites have a large number of photographers and artists who use only their original work. These artists are hurt even more than most by the unauthorized use of their images, which are their product.


 4:46 pm on Sep 27, 2012 (gmt 0)


I agree with you that photographers are also hurt by this. I don't think that these sites should cease to exist or even be penalized; I do, however, think they are being ranked very unfairly due to high user engagement. There is no reason in the world why an official license, hmm, let's say for argument's sake, Nike or Adidas, should be so prominent on one of these websites that they outrank Nike or Adidas or any authorized sellers of that merchandise. It's a bit out of control, I think.

These sites were intended for someone like a graphic designer or a photographer to have a platform to sell their work on general merchandise items without having to do the legwork. For that, they are doing a great job, but it shouldn't be a ranking machine for stolen work.

Regardless, I don't imagine Google is going to be taking down Zazzle or RedBubble any time soon.

I believe the focus should be on places like Pinterest, Wanelo, etc for outranking the shops they are taking the images, titles and descriptions from. I say let the Penalties and rank drops commence!


 5:11 pm on Oct 1, 2012 (gmt 0)

Great, another Pinterest board just jumped in front of me in the SERPS for an important search term. The joke is that this board is showing a number of my photos that were stolen from my site. Thus, Pinterest is linking to the website of the copyright violator and not to the website of the creator, me.

Will Google ever take care of this Pinterest mess?


 5:47 pm on Oct 2, 2012 (gmt 0)

chrisv1963>> are you seeing the Pinterest boards appearing ahead of your site EVEN when you type in your domain name as a part of your search query?

That's what we are seeing. Could be because the setup of our titles includes our company name, as in "{COMPANY NAME} - Product title"; therefore, everything on Pinterest from our site actually says our company name as well..

And yes.... WHEN will Google take care of this mess? Wanelo is another big offender for us as well, same type of site as Pinterest.. and believe me, there's more where these two came from..


 4:35 am on Oct 12, 2012 (gmt 0)

It's starting to happen to me as well.

I'm looking up things I know are on my website with my domain name in the query, and a very specific keyword.

The SERPs deliver 4-5 Pinterest pages before getting to my content.

They have become a genuine annoyance in searches. Unlike Wikipedia, which I often want to come up first, Pinterest results are completely useless images taken out of context. When would anyone want Pinterest pages to come up first? I suspect NEVER.

Highly-ranking Pinterest pages diminish the efficiency of search engines and increase user frustration.

Remember AltaVista, who died under the crushing weight of p0rn spam? Pinterest is starting to feel annoyingly heavy in the SERPs.


 1:59 pm on Oct 22, 2012 (gmt 0)


Today, I query Google for a particular URL string, like this example:


Notice how there is no mention of "Pinterest" in my query, at all.

The first 6 results are from Pinterest! I can't figure out for the life of me how I'm getting Pinterest pages in those SERPs.

Pinterest truly is to Google what pornospam was to AltaVista. I see growing evidence of this every day.


 2:39 pm on Oct 22, 2012 (gmt 0)

I believe this thread was intended to be about whether these sites should be RANKING, rather than whether what they do is legal/ethical/right. The ranking question is much less debatable: there's just no reason non-original content should ever outrank original content.

Also, the high rankings create that feedback loop: as long as you can get noticed or make big money by throwing together UGC/scraped stuff, of course people are going to do it. But if they didn't rank so well, these sites might rethink the need to bring some added value to that content, or find a new business model altogether.

Either way, if these sites didn't rank so well, it would be better for original content producers. Because even if you like the traffic Pinterest sends, for example, you don't want them outranking you on your own topics.

On a side note, I'd be interested in a thread advising people how to create watermarks. I know it sounds silly, but many of us got into content writing to content write, and only later realized we needed photographs to make our articles more visually exciting, and only later realized what a can of worms using photographs opened up, so we're not experts on Photoshop or whatever tools people use to create watermarks. :)


 5:30 pm on Oct 22, 2012 (gmt 0)

there's just no reason non-original content should ever outrank original content.

In most situations yes, but I'd argue not all.

For example, if a tabloid newspaper posted a story talking about an aeroplane discovered on the moon, would you believe it? Now if New Scientist Magazine posted the same article, would you be more likely to believe it? If you were to then go and report this on your website, you'd probably quote the latter news source rather than the tabloid for credibility.

My point being that trust (and by extension, authority) are central considerations and cloud the issue when it comes to prioritising similar content and there's a case for Google showing the article from the more trusted source at the top of the pile, irrespective of who came up with the original story.


 6:02 pm on Oct 22, 2012 (gmt 0)

Simsi you make a good point.

However, I would argue that in the case of easy-to-identify crowdscrapers like Pinterest, Pinterest should have zero authority.

The original websites usually have context around the picture that adds to their credibility. Pinterest usually has nothing.


 7:00 pm on Oct 22, 2012 (gmt 0)

Helleborine: agreed. And most crowd-sourced scrapers are no more deserving. I believe they should be devalued/removed but I also think it's not quite as easy as that, as *some* of them might actually have the trust of/value to a minority.


 4:08 pm on Oct 23, 2012 (gmt 0)

Yeah to what both of you said there. :)

But it is a good point, Simsi, and I'll keep it in mind when trying to figure out who got priority listing when there's not a crowdsourced scraper in the mix.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved