Forum Moderators: Robert Charlton & goodroi


Epidemic of scrapers grabbing Google SERP snippets

I've been finding this problem worse than rogue robots

         

Clark

8:39 pm on Aug 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Following that thread on "Proxy Server URLs Can Hijack Your Google Ranking - how to defend?" [webmasterworld.com] I started tracking some user agents and bots. I didn't get as far as I wanted to, but I did take several steps:

1. Removed caching so bots can't grab cached text from the search engines.
2. Banned a few bots spoofing Google and Yahoo.
3. Put identifying info on all content so I can track the IPs etc. that are doing the scraping.
4. Added more unique content so I can find who has the content in the search engines.
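For anyone wanting to try step 3, here is a minimal sketch of one way to stamp each served page with a traceable token. This is an assumption about how it could be done, not a description of my actual setup: the function names, the `SECRET_KEY` value, and the token layout (IPv4 address plus Unix timestamp plus a short HMAC tag) are all hypothetical. Note the token is signed and obfuscated, not encrypted.

```python
import base64
import hashlib
import hmac
import ipaddress
import struct

SECRET_KEY = b"change-me"  # hypothetical key; keep the real one private


def make_tracking_token(ip: str, timestamp: int, key: bytes = SECRET_KEY) -> str:
    """Pack the requester's IPv4 address and request time into a short token."""
    payload = ipaddress.IPv4Address(ip).packed + struct.pack(">I", timestamp)
    # A 4-byte HMAC tag lets us reject tokens that were altered or made up.
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:4]
    return base64.b32encode(payload + tag).decode("ascii").rstrip("=").lower()


def read_tracking_token(token: str, key: bytes = SECRET_KEY):
    """Return (ip, timestamp) for a valid token, or None if it does not verify."""
    padded = token.upper() + "=" * (-len(token) % 8)
    try:
        raw = base64.b32decode(padded)
    except ValueError:
        return None
    if len(raw) != 12:
        return None
    payload, tag = raw[:8], raw[8:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:4]
    if not hmac.compare_digest(tag, expected):
        return None
    ip = str(ipaddress.IPv4Address(payload[:4]))
    (timestamp,) = struct.unpack(">I", payload[4:])
    return ip, timestamp
```

The idea is to print the token somewhere inside the page text on every request. When a scraped copy turns up, decoding the token shows which IP the copy was originally served to and when.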

From my cursory work so far, about 90% of the scraped content carries Googlebot's encrypted IP in the identifying info, meaning it was originally served to Googlebot. It turns out the bots are running searches on Google and using the snippets from the SERPs as their new content. They know it's relevant to the money keywords because those results are coming up on Google for the very searches they run.

So...
1. They don't even rely on caches.
2. Even if you ban them, they can still grab your content.

True, banning the few bots I did may have been effective enough that what I have left is mostly the bots using the snippet trick. Maybe other people will find snippets are 10% of their problem where they're 90% of mine...
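On banning bots that spoof Google: a common check is a reverse DNS lookup on the visiting IP, followed by a forward lookup to confirm the hostname maps back to the same IP. Below is a rough sketch of that check. The function name and the injectable lookup parameters are hypothetical (they are only there so the logic can be tried without real DNS), and the accepted hostname suffixes are an assumption you should confirm against Google's own guidance.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # Reverse DNS: IP -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward DNS: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """True only if the IP reverse-resolves to a Google host that resolves back to it."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must include the original IP, otherwise
        # anyone controlling their own reverse DNS could fake the hostname.
        return ip in forward_lookup(host)
    except (OSError, KeyError):
        # Lookup failures (or missing entries in a test table) count as "not verified".
        return False
```

A visitor whose user agent says Googlebot but fails this check is a candidate for a ban. As noted above, though, this does nothing against scrapers that lift the snippets straight from the SERPs.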

Feel free to share if you have stats on this.

P.S. Maybe they are already doing this, but Google should find some unique word or phrase on the page they are displaying the snippet for and make sure it's part of the snippet. Then, when they find a domain that consistently has too many unique words/phrases from other sites' snippets, ban it from the index and especially from AdSense etc...
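To make the P.S. concrete, here is a toy sketch of the idea. It is only an illustration of my suggestion, not anything Google is known to do. The `fingerprints` table (unique phrase mapped to the domain it came from), the function names, and the threshold of 3 are all hypothetical.

```python
def borrowed_sources(page_text, fingerprints, own_domain):
    """Return the set of other domains whose unique phrases appear in page_text.

    fingerprints maps a unique phrase to the domain it was originally found on.
    """
    text = page_text.lower()
    sources = set()
    for phrase, domain in fingerprints.items():
        # A site containing its own fingerprint phrase is expected; skip it.
        if domain != own_domain and phrase.lower() in text:
            sources.add(domain)
    return sources


def looks_like_snippet_scraper(page_text, fingerprints, own_domain, threshold=3):
    """Flag a page that carries fingerprint phrases from many different sites."""
    return len(borrowed_sources(page_text, fingerprints, own_domain)) >= threshold
```

A page quoting one other site is normal. A page carrying the unique phrases of many unrelated sites looks a lot more like a page stitched together from snippets, which is why the check counts distinct source domains instead of raw phrase matches.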

Robert Charlton

6:34 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Maybe they are already doing this, but Google should find some unique word or phrase on the page they are displaying the snippet for and make sure it's part of the snippet. Then, when they find a domain that consistently has too many unique words/phrases from other sites' snippets, ban it from the index and especially from AdSense etc...

There was a discussion a while back about a Google patent intended to identify pages that had too many target phrases, as pages composed of snippets might have. Not quite the same thing as what you're suggesting, but you might find the discussion interesting....

New Patent Application - Spam Detection Based on Phrase Indexing
[webmasterworld.com...]

Tonearm

8:40 pm on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It turns out the bots are running searches on Google and using the snippets from the SERPs as their new content.

I'm seeing a huge amount of this with my site. Basically pages that look like SERPs with a link back to my page. How could this hurt me? Duplicate content penalties?

TheSeoDude

8:58 pm on Aug 28, 2007 (gmt 0)



Dude, just thank the scrapers for relevant links. ;)

followgreg

2:38 am on Aug 29, 2007 (gmt 0)

10+ Year Member




A couple of our sites have seen a large increase in scraper backlinks recently.
I've even suspected competitors of trying to demote us.

Bottom line is that I suspect Google does not filter such spam very well.
Even if the scrapers' own sites don't rank, I think they have a bad influence on your link profile. That could be the reason our sites have had issues on some keywords recently (60% of our backlinks are scrapers... hate these things).

A while back it was easy to detect such spam, since it mostly came from .info TLDs or suspicious hosts. I think it's getting more difficult for Google now, as scraping is apparently a technique that evolves too.

TheSeoDude

11:16 am on Aug 29, 2007 (gmt 0)



... evolves apparently?

I have sleepless nights thinking about how to generate somewhat readable text from existing text, using Markov chains on top of some basic LSI algo, to create crappy sites which do work in Google. ;)

Trust me ... scraping will always evolve as long as Google hits the white-hats so hard these days that their hats jump off!

And, yes, there are casualties :(

followgreg

12:33 pm on Aug 29, 2007 (gmt 0)

10+ Year Member



That was fun theseodude :)

I admit that I used to think that Google would get rid of all spam, but recent SERPs, and actually most of 2007, made me change my mind.

Spam has evolved at least as fast as the Google guys have implemented changes (or tests) over the past 3 years.
It reminds us that Google is software (and a marketing company...). It's disappointing though; the world is not perfect. Google makes mistakes.

...casualties are not good. Dunno who mentioned "collateral damage" the first time on this forum but the guy was damn right. It never ends.

TheSeoDude

1:28 pm on Aug 29, 2007 (gmt 0)



As long as the link-juice is all that flows through Google's veins it will be like this.

PS: I was saying it in a funny way but I was not joking! ;)

Clark

7:24 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for that link Rob, was very interesting.
As for backlinks, the vast majority of those scrapers that I've seen have been considerate enough to only grab the content and remove the link ;)

TheSeoDude

7:55 pm on Aug 29, 2007 (gmt 0)



Actually, most I've seen do link back. Some believe good outgoing links help ranking, so they usually keep the links from the results they leech.

cabbie

9:48 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These scrapers using Google SERPs worry me.
One site I am involved with, which has been doing very well, has lost a lot of positions for most of its inner pages.
I say most because not all have been affected. Looking at the ones that have tanked, I see several of these SERP scraper sites, all identical except for their URLs, all with site-wide links to these inner pages.
I wonder if it's causing an 'anchor text' penalty or a 'too many links too fast' penalty for these pages.
The ones still doing great don't have scraper links at the moment.

ChicagoFan67

7:27 am on Aug 30, 2007 (gmt 0)

10+ Year Member



I'm new and I'm not sure if I'm posting in the right topic.

I have a site with lots of image galleries. Most of my gallery pages occupy a position within the top 5 for their specific keywords. Recently 8 of my pages disappeared from the top 10, and when I went looking for one of them, I couldn't find it at all. Instead I found, on the second-to-last page of results, a scraper site with the exact same title as my page. When I clicked on the Google cache, some kind of script was executed, my browser was hijacked, and I landed at a page telling me that I should install "privacy protector". (I now have the arduous task of removing malware ahead of me this evening).

I also happened to notice that keywords for two of my other missing pages appeared in Google's snippet of the cached text of this site. Some of my other missing pages are of a similar theme to the text that was in this snippet.

Is there any way of reporting such sites to Google?

Bewenched

12:24 pm on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I feel your pain, I've been fighting this sort of thing for a couple of years now. I ban one and another shows up.

Although many of them link to our site, most link to other scraped content on their own site. I've wondered if our "title" being on their pages and linking to their pages devalues our own page. I'm sure it has some negative effect within the SERPs.

Dupe content is dupe content, and in these cases it's our page title and brief description that are getting swiped.

tedster

1:29 pm on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there any way of reporting such sites to Google?

Welcome to the forum, ChicagoFan67.

Here are three ways to report this to Google -- if their own cache takes you to a malware page, I would have no hesitation in reporting it.

1) At the bottom of every search results page, there's a link that says "Dissatisfied? Help us improve"

2) Use Google's Spam Report form [google.com]

3) Log in to (or set up) your Webmaster Tools account and report the problem from the link on your Dashboard page that says "report spam in our index". Because using your GWT account authenticates you, Google tends to give it higher priority and credibility.

Be precise but brief in your note to Google. Report the actual search keywords and explain the problem - I would be sure to use the word "malware" in an early and prominent way.

Clark

4:22 pm on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does Google actually pay attention when you report sites? If yes, only malware, or scrapers too? Most of these scrapers have AdSense on them and they are displaying ads.

I know officially Google has decided that MFAs/scrapers are forbidden. But in practice, I haven't seen any indication they've banned many people. The MFAs/scrapers exist in large numbers because of AdSense... which in a funny way makes Google the biggest spammer of all, because they have closed their eyes to the MFAs from the beginning. Don't you think?

tedster

11:56 pm on Aug 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sure they pay attention. They might not take the exact action you're hoping for, or act as rapidly as you would like, but they definitely read this kind of input, log it, prioritize it, etc. And as with all things Google, they prefer to find an algorithmic fix rather than just making a manual fix. But when it comes to real trouble - such as malware or adult sites spamming keywords that kids might use - I've seen action within 24 hours of a report.

ChicagoFan67

4:14 pm on Aug 31, 2007 (gmt 0)

10+ Year Member



Thank you. I have reported the site via my Webmaster Tools account. Will report back if Google does anything about it.