As BW100 wisely ascertained, there are multiple problems with the described formula. I always go back to the law, which establishes "the intent to copy", and the gold standard is copying verbatim. In all fairness, though, I haven't looked at the whole set of facts, and it probably was spinning.
Interestingly, I wrote a well-detailed article that ranked very well for six months, then vanished with Panda. It was perplexing. Lo and behold, almost the exact article appeared in a major magazine and garnered the same ranking I once held. Its author had contacted me about three months earlier for references for the research. I didn't reply to the e-mail because I didn't have time to do his work and mine, and that's what the mail was essentially about. Google had listed my work as similar to theirs in the results, but I no longer had a page in the results.
The gist is that the larger brand won out, and with Panda it will likely stay that way. My article is number one on Bing, and the copy doesn't appear there. As I said back in Nov-Dec, Google is headed in the direction of professional journalism standards and brands. Those likely take precedence over indies and unknowns.
Yay, another tool, but I'd fully expect that if you report a site Google will evaluate the site submitted as well as all of your sites at the same time, to gauge your intent if nothing else. My concern with that is that they, of course, get it wrong occasionally which may impact you unknowingly.
All of your data is belong to Google.
You can't 'do' anything to penalize websites for scraping content, or you'd better sue every newspaper in the land, because this is what they do every day, e.g. print the same stories.
It's too big to police; the web is too big, and I suspect Google knows this. All this 'We'll fix the scrapers' talk is just for show, a giant publicity stunt.
Many newspapers print the same stories because they have syndication agreements with the news agencies (e.g. Reuters, Associated Press, Canadian Press).
That is not at all the same thing as scraping, which is to reproduce someone else's content without permission.
|Many newspapers print the same stories because they have syndication agreements with the news agencies |
Actually, many don't. Most operate totally independently, e.g. the Express, BBC, Telegraph, the regionals, etc., and have their own journalists who source stories sent to them directly by way of 'Send us your story' forms on their websites,
so they don't need such relationships. I know this because I send in stories myself.
Also, the way the media works is by journalists reading each other's stories, and this is how the main stories spread between news sources.
@MediaGuy Perhaps that's what they teach in Media Studies, but most newspapers depend on agency reporting as filler. Articles being lifted by other journalists is quite a tradition, and if you look at the Sunday newspapers today and some of the newspapers tomorrow, you will see some of the articles from the Sundays recycled. Real journalism is highly incestuous in terms of the way stories get recycled or lifted. Stories will often be reprinted with minimal changes, not unlike the "spinning" that happens with some websites. Unique content, just as with websites, is expensive and takes time to create. It can be a lot easier to use agency articles as filler, especially for foreign news.
|@mattcutts Matt Cutts |
Scrapers getting you down?
Does it mean that being a victim of scrapers is actually a cause for pandalization, not a consequence?
Are we allowed to report Google properties as well for scraping?
Could Autoblogs be considered as scraping content?
|jmccormac wrote: |
This is quite worrying. Google doesn't have the mindpower to deal effectively with scraping so now it is, in effect, socialising the problem by getting the public and users to submit the details of scrapers.
Google's never really had the "mindpower" to properly rank pages and sites on its own. Wasn't one of the things that made Google "better" back in the day the fact that it used backlinks as a major ranking factor? That's just another form of "socializing" a more general problem.
|It is a positive development in that it will solve a percentage of the problem however until Google manages to automate the process of detection, analysis and removal, it is still going to have a massive problem. |
That's the point of this form. They're not feeding the submissions into an algorithm. They're simply using them to build a large enough data set that they can analyze and then use the results of their analysis to modify the existing algorithm(s).
Edit: Actually, that's not quite correct. It's pretty clear (right there in the OP, heh) that they already have changes to the algorithm and they're looking for user-submitted examples to test those changes against. It's too early...
[edited by: rlange at 2:36 pm (utc) on Aug 29, 2011]
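To make the "automate the process of detection" idea above concrete: one textbook way to flag scraped copies is to compare word shingles of two pages with Jaccard similarity. This is only a rough sketch of the general technique; the tokenization, shingle size, and any threshold are my own assumptions, not anything Google has disclosed about its algorithms.

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard_similarity(a, b, k=5):
    """Jaccard similarity of the shingle sets of two documents (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog near the river bank"
scraped  = "the quick brown fox jumps over the lazy dog near the old mill"

# A score near 1.0 flags the pair as likely duplicated content;
# an exact scrape of the original would score 1.0.
print(jaccard_similarity(original, scraped))
```

In practice a system at web scale would hash the shingles (e.g. MinHash/SimHash) rather than compare raw sets, but even this toy version shows why detection alone is not enough: it tells you two pages match, not which one is the original.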
Does anyone know how this affects retail sites that sell products using product descriptions and specifications pulled directly from the manufacturer's site? An official reseller would be using that content with permission, and in some cases it's required to use the official product description. Would Google potentially penalize for this?
Scraping is nonsense; Google IS the biggest scraper going. Every day it sends out its bot and takes/collects extracts of web pages without permission. It's been doing it for years, and nobody cares because sites get free exposure.
What about sites (such as mine, I admit) that have the original authors' permission to post their articles as long as credit is given? But their articles are also posted on other websites (including the writers' own), with and without the writers' permission.
Does Google take everyone down, even those like me who have permissions?
|Does Google take everyone down, even those like me who have permissions? |
If the only thing you offer is content that was originally published on other sites, then you're not likely to rank very well at all, even with permission. But if those articles are only used to enhance the rest of your site, which offers unique and original value, you'll be fine.
|The guy was just spinning other people's content and his whole network got nailed. |
That really warms my heart. To think these people have the audacity to complain to Google once they get caught.
|Scraping is nonsense; Google IS the biggest scraper going. Every day it sends out its bot and takes/collects extracts of web pages without permission. It's been doing it for years, and nobody cares because sites get free exposure. |
Scrapers don't send traffic to your site; they rob it from you.
Unfortunately this barely scratches the tip of the iceberg. The problem is more widespread than a handful of network "bad guys" actively scraping pages. In the meantime, real revenues for real hardworking people are, and have been, in jeopardy.
This change is simply window dressing after the Panda update since many webmasters rightfully complained about scraper sites outranking them.
Counter with a web form.
If Google wanted to get serious, with its mammoth revenue, and if preventing scrapers somehow generated revenue for Google, a system would have already been developed long ago.
The fact that webmasters are given a simple form, when in fact the algorithm should have long addressed content ownership, is rather deflating.
Maybe Google should use the DMCAs submitted to them as data for the algo. DMCAs are reviewed manually by Google people and should be trustworthy data.
Agree with Chrisv1963. But does Google really prefer to hit scrapers? I think we are all wrong. They have enough data from DMCAs, but now they are socializing the problem.
|The fact that webmasters are given a simple form, when in fact the algorithm should have long addressed content ownership, is rather deflating. |
An algo change to address this has perhaps the biggest opportunity for fallout of any algo change they've done. You win or lose rankings, that's one thing. If others are ranking on your content as deliberate and specific part of the algo, that's another. I would be curious about legal implications.
Thank You Tedster - much appreciated :)
You're definitely right.
Looks like we have AGAIN a half-baked idea...
Since I track scraper sites that abuse Google Images, I would say I've spotted slight improvements with images and rankings, but there is still so much to do to stop manipulation of the Google Images script.
This should be fixed / worked on:
1. Blogger/BlogSpot rankings need to be devalued because of the outstanding abuse of that service.
2. Remove/ban foreign websites/forums/blog platforms that scrape top images and rankings from the main Google Images and then serve them on certain foreign-language Google Images engines (such as French, Latin American, etc.).
3. Remove/devalue appspot.com from the Google Images engines, because the service is full of hackers, hijackers, and web scrapers.
4. Remove/ban this type of website (free images mixed with stuff for sale) for good <snip>
This is a typical example of the websites that abuse Google Images, using thumbnails to outrank the real source.
[edited by: Robert_Charlton at 6:49 am (utc) on Aug 31, 2011]
[edit reason] removed specific [/edit]
I'd love to see how big this scraper report file is... hopefully they will get a clue
If Goo were at all serious about this they would have quit scraping images from my site and using them out of context for their own profit from the get-go. Until they fix their attitude problem it's all self-centered idiocy, hypocrisy and money-sucking greed.
Well, finally. I don't like being cheated by lazy webmasters who scrape my content and benefit from it.
What surprises me is the first question on the form about what problem the scraper is causing; they cite the example of the scraper ranking higher than the original site.
Over half my images have been scraped well into double digits. As far as I am aware, none of the scrapers rank higher than my original page, but that's still a problem for me. Why should they benefit at all from my pics? Even if they don't outrank me, they are still riding on my work for free.
|What surprises me is the first question on the form about what problem the scraper is causing |
The form is to help them improve their algorithm so that the original ranks first, a problem that got worse with Panda. For everything else, you have to go through the usual DMCA channel.