|Can Scrapers Hurt Your Rankings?|
For the last few months we seem to be losing traffic to our main web site, and yesterday we took a drop of a further 20%. So I looked at my webmaster tools and found a site with 1,000's of links to us and began to investigate further. It turns out that they have 100,000's of pages of Bing Search Results including the link, hence all the new links to our site. However, what I began to notice was that the scraper site was ranking above us in many of the searches I did.
By having the Bing feed, they now have scraped content from everyone in our industry and almost nothing in terms of their own content. The link is still there to the page where Bing took the snippet, maybe they also now link to quality content in the eyes of Google. They also seem to frequently change the URL's of 10,000's of web pages. Maybe this gives the appearance of fresh and new content, who knows.
I searched for a page title on one of my pages, 100% unique title and in "quotes", and I now rank at number 11 for the search phrase, behind 10 pages of Bing Feeds. So my questions are these: Are we losing traffic because of people having these feeds on their pages? Is Google demoting my pages because of this? It certainly is very worrying.
These types of pages (generated from search results) are very common. My impression is that Google usually identifies them and keeps them from ranking. Maybe the scraper site you've discovered got into the rankings through a hidden "bug" in the algorithm. You might consider filing a spam report if it stays in the rankings.
That's the short answer. Makes no difference if the "find" comes from G or B or any other search engine. Copyright infringers don't care where they get their material.
|azn romeo 4u|
How come their AdSense account doesn't get banned?
Google deals with infractions of DMCA. They don't apply that against their bread and butter... ie. the site continues to exist since the OTHER pages don't have a DMCA against them. Lawyer parsing. :)
This brings up some interesting questions...
Firstly, if they have the right software to scrape the bing feeds, then they most certainly would have the software to strip the links (and any other html) IF they wanted.
So why would they leave the links in there? They must have known that webmasters would be able to track them down more easily by checking their backlinks.
|Maybe the scraper site you've discovered got into the rankings through a hidden "bug" in the algorithm. |
My guess is you're on to something, aristotle, and the outbound links are part of it.
I hate scrapers more than the next guy, but this is definitely a tough call.
Technically, it's not a scraper if they use a feed, they're an aggregator.
If they're coming directly to your site, or your SE cache, or the internet archives and actually scraping content then they're a scraper.
If they're using a published RSS feed, technically they aren't doing anything wrong, it's being published for their usage, they didn't scrape it, it was delivered as a feed.
IMO the issue is with Bing publishing the feed, but then again you agreed to let Bing take your content and do whatever they want with it to get traffic in return. If publishing your content in an RSS feed is part of what Bing does, and you agreed to allow Bing to use your content in any way they see fit, there's not much you can do really, unless the other webmaster using the RSS feed is violating the Bing terms of service for that feed.
This is a case where you agreed to play in their game by their rules and now the outcome isn't what you expected.
I would check the terms of service for using those feeds as a starting point.
Some very good points here, and hopefully we are slightly closer to finding out what is going on. I take on board that Bing Feeds are not technically scrapers; however, for them to rank above the actual site where the text came from can only be an error on Google's part, IMHO.
Firstly, the site we are having trouble with, the one outranking us, does leave the links in from its Bing Feed. There is another company in our industry that uses the same feed but removes all the links, and they have SUNK this year in terms of traffic. The site leaving the links in has grown month on month in traffic. Panda seems to love what they are doing. So maybe this is some kind of bug that they are exploiting. The site in question also changes URLs very often on 10,000's of pages, so you see very many 404s, yet it still seems to work a treat in terms of their traffic growing.
My original question was "Can Scrapers Hurt Your Rankings"? If we are appearing behind these Bing Feeds for our own work, stuff added only weeks ago, then are we carrying some kind of penalty here? I just cannot see how these Feeds are rating above us for exact quotes from our site.
My current approach to these kinds of ranking challenges is to reinforce every signal I can that "this is the original source of the content." That means using things like "fat pings" via PuSH, then delaying the RSS feed until the PuSH ping is received; authorship mark-up; short feeds (that include permalinks) rather than full content feeds, etc.
Google still can have challenges ranking the original on top in some cases, but it happens a lot less often.
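For anyone who wants to see what two of those signals look like in practice, here is a rough Python sketch. It assumes a PubSubHubbub-style hub that accepts a form-encoded publish ping (`hub.mode=publish` plus the feed URL), and it shows a feed item cut down to a short summary with its permalink. The hub address, feed URL, function names and the 40-word cutoff are all made up for illustration, and nothing here is sent over the network; the delayed-feed timing and authorship mark-up are not covered.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoints, for illustration only.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://www.example.com/feed.xml"


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build (but do not send) the POST that tells a hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def short_feed_item(title: str, permalink: str, text: str, max_words: int = 40) -> dict:
    """Return a feed entry holding a truncated summary and the permalink,
    instead of the full article text."""
    words = text.split()
    summary = " ".join(words[:max_words])
    if len(words) > max_words:
        summary += " ..."
    return {"title": title, "link": permalink, "description": summary}


if __name__ == "__main__":
    ping = build_publish_ping(HUB_URL, FEED_URL)
    print(ping.get_method(), ping.full_url, ping.data)
    print(short_feed_item("Widget images", "https://www.example.com/widgets.html",
                          "There is a long history of the use of widget images.", 5))
```

To actually send the ping you would pass the request to `urllib.request.urlopen`, ideally right when the new page goes live, so the hub hears about the original before anyone republishing it gets crawled.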
One other thing needs to be looked at here:
|So I looked at my webmaster tools and found a site with 1,000's of links to us... |
Could it be that those thousands of inbound links pointing to the Original Poster's site might be harming the Original Poster's site as well?
Maybe they are "de-legitimizing" the natural links the original poster's site might have.
I understand that google bowling is much harder to achieve nowadays, but if google is still giving so much love to the scraper site, then maybe there is another bug in the algo which is causing the original poster's site to be punished by the inbound links, too...
|That means using things like "fat pings" via PuSH, then delaying the RSS feed until the PuSH ping is received; authorship mark-up; short feeds (that include permalinks) rather than full content feeds, etc. |
Two quick questions:
1) What should static html sites (that don't publish RSS feeds) do if they find themselves in such a situation?
2) What form / standard of authorship mark-up should such static html pages use? Is there a (more-or-less) universal authorship mark-up language?
Thanks in advance.
This thread talks about Google's authorship mark-up [webmasterworld.com]. It's helping several sites that I know of.
If you don't use a feed, then you suffer from plain-old scraping, not "legal" syndication. DMCA is one appropriate tool - reported to the site itself, its ISP and Google. I don't usually bother reporting just any old scraper, but only if they begin to outrank the original source.
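On the static HTML question, a quick way to audit your own pages is to check whether each one carries an author link at all. The Python sketch below assumes the mark-up in question is a plain `rel="author"` attribute on an `<a>` or `<link>` element pointing at an author page; the exact requirements are not spelled out in this thread, so treat that as an assumption. The class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class AuthorLinkFinder(HTMLParser):
    """Collect href values of <a>/<link> tags whose rel attribute includes 'author'."""

    def __init__(self):
        super().__init__()
        self.author_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="author nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "author" in rel_tokens and attr.get("href"):
            self.author_links.append(attr["href"])


def find_author_links(html: str) -> list:
    """Return every author link found in the given HTML string."""
    finder = AuthorLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.author_links


# Made-up static page used only to demonstrate the check.
SAMPLE_PAGE = """
<html><body>
<p>Article text.</p>
<p>By <a rel="author" href="https://www.example.com/about-me.html">Jane Doe</a></p>
</body></html>
"""

if __name__ == "__main__":
    print(find_author_links(SAMPLE_PAGE))
```

Run it over your static files and any page that comes back with an empty list is one where the author link is missing.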
|I don't usually bother reporting just any old scraper, but only if they begin to outrank the original source. |
I'm glad you brought that up :)
Some of the scrapers that I have seen were NOT out ranking me for the desired keywords (widget images) for which I was trying to rank, but were outranking me for exact word-for-word phrases on my site that contained those keywords (or some closely related semantic variations of the desired keywords).
I wonder if that tells us a little more about the importance of anchor text in links - even internal links, since the internal links on my site used the desired keywords as anchor text?!?!?!?
Because if I out-ranked them for "widget images" why SHOULDN'T I have out-ranked them for the phrase "There is a long history of the use of widget images in the medieval Japanese royal courts."?!?!?!
Way Off Topic: Total seat-of-my-pants feeling that this is somehow related to the MayDay Update of 2010 and its effect on longtail ranking - and I have absolutely no proof whatsoever :)
@God: I've recently noticed that content that has been scraped from our pages is essentially ignored by Google. In other words, it's like: "we can't figure out who the author is... therefore let's ignore the content..."
I'd guess that the scraper in your situation has more overall content than you do for that page... including a well rounded link back to authority content, similar pages.
The other issue that comes up is if Google is ignoring "most" of your content... I think they start to look at your site as being shallow.
@Planet13 - it's always a judgment call, isn't it? Search volume of the respective phrases comes into play, as well as how much time/energy you can afford to give the project. No doubt Google still has a way to go to handle scrapers/mash-up sites/syndication in an optimal way.
I try to keep my eye fixed on traffic optimization most of all - I can't afford to play cop to every two-bit content thief.
|I try to keep my eye fixed on traffic optimization most of all - I can't afford to play cop to every two-bit content thief. |
Amen to that...
The content that we are being outranked for by the Bing Feeds was added to our site about 3 weeks ago. Google picked up the new content straight away, within 24 hours. Three weeks later, we are behind the Bing Feeds, on 2 different sites now. I have filed a spam report and a "scraper outranking original content" report, and posted on Google's webmaster forum.
Is there anything else I can do?
|Is there anything else I can do? |
Are they monetizing their site with AdSense?
If so, is there a way to notify AdSense so that their account will be blocked?
Matt tweeted out a call for examples of scrapers outranking original content a few months ago:
|Scrapers getting you down? Tell us about blog scrapers you see: [goo.gl...] We need datapoints for testing. |
This isn't a reporting mechanism, but a source of examples that the engineers can use to test against. IMO, it's usually good if the engineers use your site as an example.
I submitted all sorts of examples to their scraper detection trial a few months ago. Never heard back, of course:-)