Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 41 message thread spans 2 pages: 41 ( [1] 2 > >     
My SERP positions taken by scraper sites
danijelzi




msg:4298974
 3:54 pm on Apr 16, 2011 (gmt 0)

I don't know if it's due to Panda or not:

My site is 5 years old and has relevant inbound links, mostly pointing to originally written news post pages, and I don't think my site is a content farm.

Here's a pattern for the last two days:

- I publish an original news post and after an hour or more I get links from relevant sites.
- The related posts with backlinks to my post (on these relevant sites) get on the top of Google SERPS and my page is somewhere around #5.
- After a couple of hours, a couple of scraper sites take over my position and I'm on the 2nd or 3rd page, or simply nowhere.

I was curious and did the same check for my competitor's news; his site is similar to mine. The pattern in his case is less severe, but scrapers are above him anyway.

I've filed a spam report in GWT and am waiting for a resolution.

Does anyone else experience the same or similar thing?

 

tedster




msg:4298987
 4:14 pm on Apr 16, 2011 (gmt 0)

This has been one of the common complaints about the SERPs - that the scraper site problem got worse instead of better after Panda. Google claims that the update just before Panda was aimed directly at the problem, and then they mentioned copied content again in the interviews about Panda.

It's hard to make sense of Google's statements compared to the experience of many webmasters. One defensive tactic some sites are trying is delaying their RSS feed until googlebot spiders the first time.
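
A minimal sketch of the feed-delay idea, in Python. The `Post` class, the `googlebot_seen_at` field (which you would have to fill in from your own access logs) and the `max_hold` fallback are all hypothetical names for illustration, not part of any feed software.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Post:
    title: str
    published_at: datetime
    # Hypothetical field: when Googlebot first fetched the post, if ever.
    googlebot_seen_at: Optional[datetime] = None


def feed_items(posts, now, max_hold=timedelta(hours=6)):
    """Return the posts that may appear in the RSS feed.

    A post is released once Googlebot has fetched it, or once
    max_hold has passed since publication (so a post that is never
    crawled still reaches subscribers eventually).
    """
    released = []
    for post in posts:
        crawled = post.googlebot_seen_at is not None
        held_long_enough = now - post.published_at >= max_hold
        if crawled or held_long_enough:
            released.append(post)
    return released
```

The fallback is a design choice: without it, a post Googlebot never visits would stay out of the feed forever.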

Robert Charlton




msg:4299498
 7:59 pm on Apr 17, 2011 (gmt 0)

danijelzi - There's a recent Matt Cutts video on YouTube, which makes a valiant try at addressing the issue, including patterns similar to those you ask about, but it also suggests that there might be some problems "which don't happen that often". This may be true in terms of the web overall, but I think they are happening a lot.

How can I make sure that Google knows my content is original?
http://www.youtube.com/watch?v=4LsB19wTt0Q [youtube.com]

Clearly, Google is aware of the problems, and I think they know that their current approaches are inadequate in some areas, but I don't think they can say that publicly.

The latency and scaling issues in a database the size of Google's make this an extremely difficult issue to deal with, particularly if articles are scraped piece by piece. In some situations, I feel, the recent "scraper update" which preceded Panda made the problem worse.

Even if the original source is cited and rewritten, the societal implications regarding the internet distribution of digital content are complex... and, IMO, they go way beyond Google, and they have scarcely been addressed.

danijelzi




msg:4299527
 9:08 pm on Apr 17, 2011 (gmt 0)

Robert, thanks for the link to the video, it's very useful. I'll try to post tweets and links to my upcoming articles via social sites as soon as I publish the articles, so we will see if that helps Google recognize my site as the original source.

Meanwhile, I have checked positions of another two sites in my niche and scrapers are above them in SERPS also. These sites (blogs) don't belong to the top such as some magazines in my niche, but they write quality and original articles and have quality deep links from relevant sites.

I hope Google Search doesn't treat my site now as one that steals content (from scrapers).

danijelzi




msg:4299537
 9:52 pm on Apr 17, 2011 (gmt 0)

Update: I've published an article and posted the link on Twitter, Digg, and other social sites. The article appeared at #1 in SERPS after a minute or two when I searched for the whole article title (without quotes). I'll report what happens next in SERPs with my article.

danijelzi




msg:4299543
 10:09 pm on Apr 17, 2011 (gmt 0)

Update 2: It takes around 10 minutes for a scraper site to take my content and position itself on #1 in SERPs :(

Robert Charlton




msg:4299608
 2:01 am on Apr 18, 2011 (gmt 0)

danijelzi - Definitely try delaying your RSS feed as tedster suggests. That might help you, though I suspect if it's scraper activity that's knocking you down so quickly, it might be a signal that you need more high quality inbound links.

There's a chicken and egg aspect to your situation though... if you're not ranking for your own content, you're going to have a harder time being found to attract natural inbounds, which are the links that will do you the most good.

Planet13




msg:4299632
 2:57 am on Apr 18, 2011 (gmt 0)

Out of curiosity...

Have you analyzed the back links that are pointing to the scraper sites that are ranking higher than you for your own content?

In particular, do they have a lot of spammy inbound links?

danijelzi




msg:4299651
 3:58 am on Apr 18, 2011 (gmt 0)

Robert, I'll try the RSS delay, although I'm not sure it will help. I think I'm indexed in Google before the scrapers, because they appear in SERPS 10 minutes after me and then get above me. But I'll try anyway.

The most interesting thing for me is the backlinks part:
Two days ago I posted an article and got 125 inbound links to the story over those 2 days, from both others' front pages and post pages, according to Backlink Checker.
Around 20 of those links are from very high quality sites in my and similar niches, including one site that has PR 8.
I'm nowhere to be found in SERPS when I search for the query My Post Title.

At #1 in SERPs is a post on a social bookmarking site containing only a link to a scraper site with my article. The #1 has ZERO backlinks. #2 in SERPs is another scraper site with another copy of my article and ZERO backlinks to it, according to Backlink Checker.

What a mess...

It's best for me to stop posting, isn't it?

miozio




msg:4299704
 8:11 am on Apr 18, 2011 (gmt 0)

"Update 2: It takes around 10 minutes for a scraper site to take my content and position itself on #1 in SERPs :( "

I feel your pain! It's so unfair.. I have the same problem but with multiple sites, and I don't distribute RSS. Our content has been gradually stolen for years and our material is torn apart across many blogs, reputable sites and spam places... The thing that worries me is Facebook groups; a lot of Indian names post our articles to those pages.. If I report Facebook to Google I don't know what to expect... Some of our content is 5-6 years old and it has been replicated across the web since we published it.

Oh, and we are down 90% in Google referrals. Coping with it bit by bit with Bing and Yahoo.

danijelzi




msg:4299728
 9:16 am on Apr 18, 2011 (gmt 0)

I've already posted in another thread that it appears that only scraped pages of my site are affected by Panda.

First, my category pages: not scraped, have decent ranking.
Second, my older posts: not scraped, have decent ranking.

Their rankings dropped a bit, but it looks like it's only a side effect of the fall of pages that were scraped.

Also, checked my competition sites - same thing
scraped = drop
not scraped = ok.

topr8




msg:4299776
 12:13 pm on Apr 18, 2011 (gmt 0)

>>Update 2: It takes around 10 minutes for a scraper site to take my content and position itself on #1 in SERPs

use your logs to see what is scraping you, then block them (may not be entirely straightforward)
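
A rough sketch of that log check in Python, assuming the server writes the common/combined access log format (an assumption; adjust the pattern to your own logs). It counts which IP and user-agent pairs fetched one freshly published URL. The function name `top_clients` is made up for this example.

```python
import re
from collections import Counter

# Matches common/combined log format lines; the referer and
# user-agent fields are optional so plain "common" logs also parse.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def top_clients(lines, path, limit=10):
    """Count requests for `path` per (ip, user agent), busiest first."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            key = (match.group("ip"), match.group("agent") or "-")
            hits[key] += 1
    return hits.most_common(limit)
```

Run it over the lines logged in the first minutes after publishing: clients that fetch the new URL right away and are neither known crawlers nor ordinary browsers are the candidates to look at. User agents can be faked, so treat the result as a lead, not proof.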

superclown2




msg:4299822
 2:31 pm on Apr 18, 2011 (gmt 0)

Any reason why you can't block the scraper's IP address?

Planet13




msg:4299823
 2:33 pm on Apr 18, 2011 (gmt 0)

Another poster has suggested not allowing Google to cache your site if you can't identify the scraper through your logs, since sometimes scrapers will use the Google cache.

Does the scraped content still have your canonical tags and links to your original content intact? Do you have text in your pages that says it is copyrighted by you, and if so, does it show up on the scraper's site?

danijelzi




msg:4299827
 2:36 pm on Apr 18, 2011 (gmt 0)

Tried a 45-minute RSS delay, but it doesn't help; it only delays the scraping for a while and then the SERPs are spammed again.
I've started to block IPs of scraper sites that take content from my RSS. Other sites use some other way to scrape, since they have the whole article whereas my RSS offers excerpts only. As far as I understand, they can use cURL to scrape and I can hardly do anything about that.

Next I'll try to block some of them.
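
For the IP blocking itself, here is a small Python sketch using the standard `ipaddress` module. The helper names `build_blocklist` and `is_blocked` are invented for this example, and the addresses in the usage note are documentation ranges, not real scrapers.

```python
import ipaddress


def build_blocklist(entries):
    """Turn strings like '203.0.113.5' or '198.51.100.0/24' into networks."""
    # strict=False accepts entries whose host bits are set, e.g. '198.51.100.7/24'.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(client_ip, blocklist):
    """True if client_ip falls inside any blocked address or range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocklist)
```

For example, `is_blocked("198.51.100.77", build_blocklist(["198.51.100.0/24"]))` is true. Blocking whole ranges is risky on shared hosting, where one range can serve both a scraper and legitimate visitors.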

danijelzi




msg:4299845
 3:10 pm on Apr 18, 2011 (gmt 0)

Does the scraped content still have your canonical tags and links to your original content intact? Do you have text in your pages that says it is copyrighted by you, and if so, does it show up on the scraper's site?


They either:
- put nofollow on the source link to my site or
- put a link to another scraper site like it was the original source or
- remove link

Some of the sites have a copyright notice, like they were the original creator.

I didn't have a copyright notice, since I thought it didn't matter and that whatever I publish on my site is my property. But yesterday I put it on all pages, in the footer of the template.

Can you explain to me a bit how canonical tags can help with this? I've tried to search for that but with no success.

And here's another problem - Is it smart to block IPs of .blogspot.com? blogspot spam sites are all over the place, all I can do is to manually report spam to blogspot.

Edge




msg:4299879
 3:54 pm on Apr 18, 2011 (gmt 0)

Any reason why you can't block the scraper's IP address?


Better yet, limit access to the original work to selected crawlers - then turn it loose after the work has been crawled...
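
One way to decide who counts as a "selected crawler" is the reverse-then-forward DNS check that Google describes for verifying Googlebot. The Python below is only a sketch of that check; the function names are made up, and the resolvers are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-confirm it."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the same address.
        return ip in forward(host)[2]
    except OSError:
        # Lookup failures (socket.herror / socket.gaierror) mean "not verified".
        return False


def may_view_new_post(ip, post_is_released):
    """Serve a held-back post only to verified Googlebot until release."""
    return post_is_released or is_verified_googlebot(ip)
```

Note that `socket.gethostbyname_ex` handles IPv4 only, and DNS lookups on every request are slow, so a real setup would cache the results. Serving content to crawlers only for a while is also a judgment call with respect to cloaking rules.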

tenerifejim




msg:4299914
 4:40 pm on Apr 18, 2011 (gmt 0)

danijelzi - I understand that the scrapers are not using your RSS to take the whole of your content. But almost certainly they are using some form of monitoring (probably your RSS) to see when you update and then scrape.

I don't know how much this matters to you - but you should delay longer than 45 minutes.

For canonical references refer to [googlewebmastercentral.blogspot.com...] - it is you telling Google that you are the original. If the scrapers do not claim this - then you should be classified as the true (canonical) content.
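
To check whether a scraped copy still carries your canonical tag, something like this Python sketch could be used. It relies only on the standard `html.parser` module; `CanonicalFinder` and `find_canonical` are names invented here. Keep in mind that Google treats rel="canonical" as a hint rather than a guarantee.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Feed it the HTML of your own page and of the scraper's copy: if the copy returns your URL, the tag survived the scrape; if it returns `None` or another site's URL, it was stripped or replaced.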

walkman




msg:4299979
 6:14 pm on Apr 18, 2011 (gmt 0)

Two theories:
- Google has penalized your site so even a scraper has more juice /credibility /quality than your site.

- Google doesn't care /can't tell about who posted it first. Their site is more 'qualitative' so you are screwed.

I don't think any delay will do you any good long term to be honest, this has to do with Google.

danijelzi




msg:4300005
 6:59 pm on Apr 18, 2011 (gmt 0)

Walkman,
I'm not sure if my whole site is penalized, because non-scraped pages didn't drop a lot in SERPs after Panda; some of them rank very well even after the update.

Also, I've posted above results from Backlink Watch and the scraped articles on scraper sites don't have any backlinks, while I have a lot.

I really have no idea other than that Google actually sees me as someone who steals content from scraper sites. Before Panda they were probably deep below me in SERPS, and after Panda they got on top.

falsepositive




msg:4300009
 7:09 pm on Apr 18, 2011 (gmt 0)

@danijelzi, I've had your symptoms since February and I keep vacillating on whether I should blame scrapers or blame my site for this. What I mean is this: we only have a limited amount of time in the day and so we need to prioritize. Which ones here are red herrings? Is chasing a scraper down a red herring?

I can focus on improving my site in general or I can chase scrapers. After evaluating my site, which is heavily scraped, I also realized there are improvements I could make for the user. So I've wondered whether to spend my time improving the site the way I think will benefit the user more (remove bad pages, rewrite pages, improve community) or should I spend my time chasing down my hundreds of scrapers?

I've done both, but I can't shake the feeling that focusing on my site is just much more value added here. As Google strives to improve their scraper detection algorithm, I'm hoping that by doing the right adjustments and Google doing theirs, someday (soon hopefully), we will meet in a happy place.

OldIrish




msg:4300019
 7:21 pm on Apr 18, 2011 (gmt 0)

Google is too busy harassing its core user base to worry about content scrapers. When they're not putting hundreds of thousands of people out of business with ruthless search updates, they're shutting down hundreds of small commercial YouTube accounts over minor censorship related issues (this also means you lose your linked GMail account too).

When I hear the word "quality" in regards to this Panda update, I'm instantly reminded of George W. Bush's infamous "weapons of mass destruction". The next time Google's iron "quality" fist comes crashing down on your head, just remember that this is a company that doesn't even have a basic customer service department (Google groups and webmaster tools are NOT a customer service department).

CainIV




msg:4300031
 7:40 pm on Apr 18, 2011 (gmt 0)

The unfortunate part to me is that QA at Google seems to believe that these SERPs are an improvement across most verticals as it relates to scraped content. Nothing at this point could be further from the truth.

OldIrish




msg:4300038
 7:54 pm on Apr 18, 2011 (gmt 0)

Google Groups for webmasters is nothing but a bash and trash propaganda outpost. If you ever have the urge to have your professional website trashed by one of Google's useful idiots (many of whom run Made For AdSense websites), then by all means head on over to Google Groups and post a link to your website for review.

walkman




msg:4300043
 8:02 pm on Apr 18, 2011 (gmt 0)

The unfortunate part to me is that QA at Google seems to believe that these SERPs are an improvement across most verticals as it relates to scraped content. Nothing at this point could be further from the truth.

Even if it was a mess it's hard to admit failure, just imagine the headlines. In any case, declare victory and work behind the scenes.

What's a few million lives ruined for Google?

Google Groups for webmasters is nothing but a bash and trash propaganda outpost. If you ever have the urge to have your professional website trashed by one of Google's useful idiots (many of whom run Made For AdSense websites), then by all means head on over to Google Groups and post a link to your website for review.

Yep, and they got titles too. And they will flag you for spam (ban your gmail /adsense acct) if you disagree with them. My favorite was them bashing a site that was being outranked for its own articles. Their point? Your article is too short (300 or so words). Hmmmm...so how come it isn't too short for the scraper?

That's Google support though, a few deaf, dumb and blind losers hoping to get a Google job. Almost a $200 billion company that provides next to zero support.

OldIrish




msg:4300069
 9:06 pm on Apr 18, 2011 (gmt 0)

That's Google support though, a few deaf, dumb and blind losers hoping to get a Google job. Almost a $200 billion company that provides next to zero support.


Who needs customer service when you can just break out an enigma machine and tune into one of Matt Cutts' digital info streams. What's amazing is that half the webmaster community still treats this guy's greasy corporate propaganda like it would be a public service announcement from Mother Teresa.

chrisv1963




msg:4300091
 10:07 pm on Apr 18, 2011 (gmt 0)

It's hard to make sense of Google's statements compared to the experience of many webmasters.


That's because we are webmasters and not shareholders that believe Google's pep talk.

kd454




msg:4300100
 10:16 pm on Apr 18, 2011 (gmt 0)

That's Google support though, a few deaf, dumb and blind losers hoping to get a Google job. Almost a $200 billion company that provides next to zero support.


The only customer support I ever got from them was a conversation to put more Adsense on my sites!

When I was an ignorant newb I posted a website on the Google forum for review; I was only throwing the sharks some bait for a thrashing.

The bionic Google fanboys sure do seem to have EGO issues.

Sgt_Kickaxe




msg:4300160
 1:22 am on Apr 19, 2011 (gmt 0)

I have a competitor that scrapes himself with odd results.

From the footer of every page on his multiple sites he links to his "business" site, which is a single page in size. That page republishes the latest 5 posts from each of his sites. It's pagerank 6, apparently trusted, and draws a good deal of traffic to his "business" site, from which he sells network advertising.

It's apparently all about trust now. If you have enough of it you have been granted some room to abuse it it seems.

flashdash




msg:4301119
 7:53 am on Apr 20, 2011 (gmt 0)

Did you consider that your site may be under some sort of penalty? If the domain name is generic and non-brand, does it come up first when you google it?

Take 2 sentences from your home page and google them with quotes "". Do you come up first?
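
If you want to repeat that check for many sentences, a tiny Python helper can build the quoted-search URL for you. The function name and the default `base` URL are assumptions made for this sketch; you still open the result in a browser and look at who ranks first.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(sentence, base="https://www.google.com/search"):
    """Build a search URL for `sentence` wrapped in double quotes."""
    # Collapse runs of whitespace so the phrase matches the page text.
    phrase = " ".join(sentence.split())
    return base + "?" + urlencode({"q": f'"{phrase}"'})
```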

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved