
Google SEO News and Discussion Forum

This 55 message thread spans 2 pages; this is page 2 of 2.
Matt Cutts Announces Scraper Site Reporting Tool
viralvideowall

Msg#: 4355698 posted 10:29 pm on Aug 26, 2011 (gmt 0)

Looks like Google may FINALLY be getting a clue that scraper sites have become a problem. I'm sure the idiots there will come up with a new algorithm that punishes the wrong people like Panda did.

[twitter.com...]

@mattcutts Matt Cutts
Scrapers getting you down? Tell us about blog scrapers you see: [goo.gl...] We need datapoints for testing.


https://docs.google.com/spreadsheet/viewform?formkey=dGM4TXhIOFd3c1hZR2NHUDN1NmllU0E6MQ&ndplr=1

Google is testing algorithmic changes for scraper sites (especially blog scrapers). We are asking for examples, and may use data you submit to test and improve our algorithms.

This form does not perform a spam report or notice of copyright infringement. Use https://www.google.com/webmasters/tools/spamreport?hl=en&pli=1 to report spam or [google.com...] to report copyright complaints.

Exact query that shows a scraping problem, such as a scraper outranking the original page:

[edited by: Brett_Tabke at 1:11 pm (utc) on Aug 27, 2011]

 

outland88

WebmasterWorld Senior Member 10+ Year Member

Msg#: 4355698 posted 12:45 am on Aug 28, 2011 (gmt 0)

As BW100 wisely ascertained, there are multiple problems with the described formula. I always go back to the law, which is about establishing "the intent to copy", and the gold standard is copying verbatim. In all fairness, though, I haven't looked at the whole set of facts, and it probably was spinning.

Interestingly, I wrote a well-detailed article that ranked very well for six months, then vanished with Panda. It was perplexing. Lo and behold, almost the exact article appeared in a major magazine and garnered the same ranking I once held. Its author had contacted me about three months earlier for references for the research. I didn't reply to the e-mail because I didn't have time to do his work and mine, and that's what the mail was essentially about. Google had listed my work as similar to theirs in the results, but I no longer had a page in the results.

The gist is that the larger brand won out, and with Panda it will likely stay that way. My article is number one in Bing, and the copy doesn't appear there. As I said back in Nov-Dec, Google is headed in the direction of professional journalism standards and brands. That likely takes precedence over indies and unknowns.

Sgt_Kickaxe

WebmasterWorld Senior Member & Top Contributor of All Time

Msg#: 4355698 posted 7:54 pm on Aug 28, 2011 (gmt 0)

Yay, another tool, but I'd fully expect that if you report a site Google will evaluate the site submitted as well as all of your sites at the same time, to gauge your intent if nothing else. My concern with that is that they, of course, get it wrong occasionally which may impact you unknowingly.

All of your data is belong to Google.

MediaGuy

Msg#: 4355698 posted 8:19 pm on Aug 28, 2011 (gmt 0)

You can't 'do' or penalize websites for scraping content - or you'd better sue every newspaper in the land, because that's what they do every day, e.g. print the same stories.

It's too big to police - the web is too big - and I suspect Google knows this. All this 'We'll fix the scrapers' talk is just for show, a giant publicity stunt.

buckworks

WebmasterWorld Administrator & Top Contributor of All Time, 10+ Year Member

Msg#: 4355698 posted 9:20 pm on Aug 28, 2011 (gmt 0)

Many newspapers print the same stories because they have syndication agreements with the news agencies (e.g. Reuters, Associated Press, Canadian Press).

That is not at all the same thing as scraping, which is to reproduce someone else's content without permission.

MediaGuy

Msg#: 4355698 posted 10:05 pm on Aug 28, 2011 (gmt 0)

Many newspapers print the same stories because they have syndication agreements with the news agencies


Actually, many don't. Most operate totally independently - e.g. the Express, BBC, Telegraph, the regionals, etc. - and have their own journalists who source stories sent to them directly by way of 'Send us your story' forms on their websites.

So they don't need such relationships. I know this because I send in stories myself.

Also, the way the media works is by journalists reading each other's stories; this is how the main stories spread between news sources.

jmccormac

WebmasterWorld Senior Member, 10+ Year Member, Top Contributor of the Month

Msg#: 4355698 posted 10:57 pm on Aug 28, 2011 (gmt 0)

@MediaGuy Perhaps that's what they teach in Media Studies but most newspapers depend on agency reporting as fillers. Articles being lifted by other journalists is quite a tradition and if you look at the Sunday newspapers today and some of the newspapers tomorrow, you will see some of the articles from the Sundays recycled. Real journalism is highly incestuous in terms of the way that stories will be recycled or lifted. Stories will often be reprinted with minimal changes - not unlike the "spinning" that happens with some websites. Unique content, just as with websites, is expensive and can take time to create. It can be a lot easier to use agency articles as fillers especially for foreign news.

Regards...jmcc
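The "stories reprinted with minimal changes" jmccormac describes are exactly what near-duplicate detection targets. A common textbook approach is word shingling plus Jaccard similarity; here is a minimal, illustrative Python sketch (the function names and example sentences are invented for this post, and this is not Google's actual method):

```python
# Near-duplicate text detection via word shingles + Jaccard similarity.
# Illustrative sketch only - not how Google actually detects scrapers.

def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "unique content is expensive and can take time to create"
spun     = "unique content is costly and can take time to create"

# A single swapped word still leaves many shingles in common,
# so the score lands strictly between 0 and 1.
score = jaccard(shingles(original), shingles(spun))
print(round(score, 2))
```

Identical texts score 1.0 and unrelated texts near 0.0, so a threshold on this score flags likely "spun" copies.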

danijelzi

Msg#: 4355698 posted 11:18 pm on Aug 28, 2011 (gmt 0)

@mattcutts Matt Cutts
Scrapers getting you down?


Does it mean that being a victim of scrapers is actually a cause for pandalization, not a consequence?

frontpage

WebmasterWorld Senior Member 10+ Year Member

Msg#: 4355698 posted 12:36 am on Aug 29, 2011 (gmt 0)

Are we allowed to report Google properties as well for scraping?

McMohan

WebmasterWorld Senior Member 10+ Year Member

Msg#: 4355698 posted 9:55 am on Aug 29, 2011 (gmt 0)

Could Autoblogs be considered as scraping content?

rlange

Msg#: 4355698 posted 1:49 pm on Aug 29, 2011 (gmt 0)

jmccormac wrote:
This is quite worrying. Google doesn't have the mindpower to deal effectively with scraping so now it is, in effect, socialising the problem by getting the public and users to submit the details of scrapers.

Google has never really had the "mindpower" to properly rank pages and sites on its own. Wasn't one of the things that made Google "better" back in the day the fact that it used backlinks as a major ranking factor? That's just another form of "socializing" a more general problem.

It is a positive development in that it will solve a percentage of the problem however until Google manages to automate the process of detection, analysis and removal, it is still going to have a massive problem.

That's the point of this form. They're not feeding the submissions into an algorithm. They're simply using them to build a large enough data set that they can analyze and then use the results of their analysis to modify the existing algorithm(s).

Edit: Actually, that's not quite correct. It's pretty clear (right there in the OP, heh) that they already have changes to the algorithm and they're looking for user-submitted examples to test those changes against. It's too early...

--
Ryan

[edited by: rlange at 2:36 pm (utc) on Aug 29, 2011]
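The backlink-based ranking rlange refers to is, in its simplest published form, PageRank: each page's score is fed by the pages linking to it. A toy power-iteration sketch (the graph and function are invented for illustration; real ranking involves far more signals):

```python
# Toy PageRank power iteration over a tiny link graph, showing how
# backlinks act as "votes". Illustrative only - not Google's code.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# "c" is linked from both "b" and "d", so it collects the most rank.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # prints "c"
```

The ranks always sum to 1; a page with no inbound links (like "d" here) bottoms out at (1 - damping) / n.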

mikko123

5+ Year Member

Msg#: 4355698 posted 2:21 pm on Aug 29, 2011 (gmt 0)

Does anyone know how this affects retail sites that sell products using product descriptions and specifications pulled directly from the manufacturer's site? An official reseller would be using that content with permission, and in some cases it's required to use the official product description. Would Google potentially penalize for this?

MediaGuy

Msg#: 4355698 posted 2:39 pm on Aug 29, 2011 (gmt 0)

Scraping is nonsense, Google IS the biggest scraper going - every day it sends out its bot and takes/collects extracts of web pages without permission, been doing it for years and nobody cares as they get free exposure.

Bethlk

5+ Year Member

Msg#: 4355698 posted 12:50 am on Aug 30, 2011 (gmt 0)

What about sites (such as mine, I admit) that have the original authors' permission to post their articles as long as credit is given? But those articles are also posted on other websites (including the writers' own), with and without the writers' permission.
Does Google take everyone down, even those like me who have permissions?

tedster

WebmasterWorld Senior Member & Top Contributor of All Time, 10+ Year Member

Msg#: 4355698 posted 2:48 am on Aug 30, 2011 (gmt 0)

Does Google take everyone down, even those like me who have permissions?

If the only thing you offer is content that was originally published on other sites, then you're not likely to rank very well at all, even with permission. But if those articles are only used to enhance the rest of your site, which offers unique and original value, you'll be fine.

koan

WebmasterWorld Senior Member 5+ Year Member

Msg#: 4355698 posted 3:33 am on Aug 30, 2011 (gmt 0)

The guy was just spinning other people's content and his whole network got nailed.


That really warms my heart. To think these people have the audacity to complain to Google once they get caught.

Scraping is nonsense, Google IS the biggest scraper going - every day it sends out its bot and takes/collects extracts of web pages without permission, been doing it for years and nobody cares as they get free exposure.


Scrapers don't send traffic to your site; they rob it from you.

CainIV

WebmasterWorld Senior Member 10+ Year Member

Msg#: 4355698 posted 5:21 am on Aug 30, 2011 (gmt 0)

Unfortunately this barely touches the tip of the iceberg. The problem is more widespread than a handful of network "bad guys" actively scraping pages. In the meantime, real revenues for real hardworking people are, and have been, in jeopardy.

This change is simply window dressing after the Panda update, since many webmasters rightfully complained about scraper sites outranking them.

Counter with a web form.

If Google wanted to get serious with its mammoth revenue - and if preventing scrapers somehow generated revenue for Google - a system would have been developed long ago.

The fact that webmasters are given a simple form, when the algorithm should long since have addressed content ownership, is rather deflating.

newsphinx

10+ Year Member

Msg#: 4355698 posted 7:31 am on Aug 30, 2011 (gmt 0)

chrisv1963
Maybe Google should use the DMCAs submitted to them as data for the algo. DMCAs are reviewed manually by Google people and should be trustworthy data.


I agree with chrisv1963. But does Google really prefer to hit scrapers? I think we are all wrong. They already have enough data from DMCAs, but now they are socializing the problem.

wheel

WebmasterWorld Senior Member & Top Contributor of All Time, 10+ Year Member

Msg#: 4355698 posted 12:29 pm on Aug 30, 2011 (gmt 0)

The fact that webmasters are given a simple form, when in fact the algorithm should have long addressed content ownership, is rather deflating.

An algo change to address this has perhaps the biggest potential for fallout of any algo change they've done. Winning or losing rankings is one thing; others ranking on your content as a deliberate and specific part of the algo is another. I would be curious about the legal implications.

Bethlk

5+ Year Member

Msg#: 4355698 posted 4:37 pm on Aug 30, 2011 (gmt 0)

Thank You Tedster - much appreciated :)

macas

Msg#: 4355698 posted 5:34 pm on Aug 30, 2011 (gmt 0)

@CainIV
You're definitely right.

Looks like we have yet another half-baked idea...

Since I track scraper sites that abuse Google Images, I would say I've spotted slight improvements with images and rankings, but there is still so much to do to stop manipulation of Google Images results.

These things should be fixed or worked on:

1. Devalue Blogger/BlogSpot rankings because of the outstanding abuse of this service.

2. Remove/ban foreign websites/forums/blog platforms that scrape top images and rankings from the main Google Images and then serve them on certain foreign-language Google Images engines (such as French, Latin American, etc.).

3. Remove/devalue appspot.com in the Google Images engines, because the service is full of hackers, hijackers and web scrapers.

4. Remove/ban this type of <free images mixed with stuff for sale> website for good <snip>
This is a typical example of a website that abuses Google Images thumbnails to outrank the real source.

[edited by: Robert_Charlton at 6:49 am (utc) on Aug 31, 2011]
[edit reason] removed specific [/edit]

viralvideowall

Msg#: 4355698 posted 4:26 pm on Aug 31, 2011 (gmt 0)

I'd love to see how big this scraper report file is... hopefully they will get a clue

loner

5+ Year Member

Msg#: 4355698 posted 1:28 am on Sep 1, 2011 (gmt 0)

If Goo were at all serious about this they would have quit scraping images from my site and using them out of context for their own profit from the get-go. Until they fix their attitude problem it's all self-centered idiocy, hypocrisy and money-sucking greed.

kalseo

Msg#: 4355698 posted 2:34 am on Sep 1, 2011 (gmt 0)

Well, finally. I don't like being cheated by lazy webmasters who scrape my content and benefit from it.

nomis5

WebmasterWorld Senior Member 5+ Year Member

Msg#: 4355698 posted 9:00 pm on Sep 2, 2011 (gmt 0)

What surprises me is the first question on the form, about what problem the scraper is causing, where they cite the example of the scraper ranking higher than the original site.

Over half my images have been scraped, some well into double figures. As far as I am aware, none of the copies rank higher than my original page, but that's still a problem for me. Why should they benefit at all from my pics? Even if they don't outrank me, they are still riding on my work for free.
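Scraped images like these can be spotted even after resizing or brightening using perceptual hashes. A dependency-free "difference hash" sketch on a tiny made-up pixel grid (illustrative only; real tools such as the Python ImageHash library apply the same idea to full decoded images):

```python
# Perceptual "difference hashing": 1 bit per adjacent pixel pair,
# recording whether the left pixel is brighter. Brightness shifts
# don't change the bits, so a brightened copy hashes identically.
# Sketch on a toy 2x3 grayscale grid - not a production detector.

def dhash(pixels):
    """Return a bit string: for each row, left pixel > right pixel?"""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append("1" if left > right else "0")
    return "".join(bits)

def hamming(h1, h2):
    """Number of differing bits; small distance = likely duplicate."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

original = [[10, 20, 30], [90, 80, 70]]
rebright = [[15, 25, 35], [95, 85, 75]]  # same image, brightened
other    = [[30, 20, 10], [70, 80, 90]]  # reversed gradients

print(hamming(dhash(original), dhash(rebright)))  # 0 - detected copy
print(hamming(dhash(original), dhash(other)))     # 4 - different image
```

A scraper-detection pipeline would index the hashes and flag any pair within a small Hamming distance.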

koan

WebmasterWorld Senior Member 5+ Year Member

Msg#: 4355698 posted 10:05 pm on Sep 2, 2011 (gmt 0)

What surprises me is the first question on the form about what problem the scraper is causing


The form is to help them improve their algorithm to rank the original first, a problem that got worse with Panda. But for everything else, you gotta go through the usual DMCA channel.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved