Forum Moderators: Robert Charlton & goodroi
1. Removed caching so bots won't grab cached text from the SEs.
2. Banned a few bots spoofing Google and Yahoo.
3. Put identifying info on all content so I can track the IPs etc. that are doing the scraping.
4. Put more unique content so I can find who has the content in the SEs.
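For anyone wanting to try step 2, here is a rough Python sketch of one way to catch a bot spoofing Googlebot: reverse-resolve the visiting IP, check that the hostname sits under a Google crawler domain, then resolve that hostname forward and confirm it comes back to the same IP. The function name and the injectable lookup parameters are my own illustration, not anything from a library, and a Yahoo check would need its own hostname suffixes.

```python
import socket

# Hostname suffixes Google uses for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    # The lookups are injectable (hypothetical parameters) so the check
    # can be exercised without touching the network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the hostname must resolve back to the visiting IP.
        return ip in forward_lookup(host)
    except OSError:
        # No PTR record or a resolution failure: treat the visitor as unverified.
        return False
```

A user-agent string alone proves nothing, so the idea is to run this on any request claiming to be Googlebot before deciding whether to ban it.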
From my cursory work so far, 90% of the content grabbed has Googlebot's encrypted IP. It turns out the bots are running searches on Google and using the snippets from the SERPs as their new content. They know it's relevant to the money keywords because the results are coming up on Google for the very searches they do.
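In case it helps anyone set up the same kind of tracking, here is a minimal Python sketch of stamping each served page with a tag that carries the requesting IP. It is not my exact setup: the names make_tag, read_tag and SECRET are made up for illustration, and the tag is only encoded and signed, not truly encrypted (real encryption would need a cipher library).

```python
import base64
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical key; keep the real one private


def make_tag(ip, day):
    """Build a tracking tag that encodes the visitor IP and date, plus a signature."""
    payload = f"{ip}|{day}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()[:6]
    # Lowercase base32 without padding yields one plain alphanumeric "word".
    return base64.b32encode(payload + sig).decode().rstrip("=").lower()


def read_tag(tag):
    """Recover (ip, day) from a tag found on a scraper page, or None if it is not ours."""
    try:
        raw = base64.b32decode(tag.upper() + "=" * (-len(tag) % 8))
        payload, sig = raw[:-6], raw[-6:]
        expected = hmac.new(SECRET, payload, hashlib.sha256).digest()[:6]
        if not hmac.compare_digest(sig, expected):
            return None
        ip, day = payload.decode().split("|")
        return ip, day
    except ValueError:
        # Covers bad base32, undecodable bytes and a malformed payload.
        return None
```

When a tag turns up on a scraper's page, read_tag tells you which IP originally fetched that copy. If the IP is Googlebot's, the text reached the scraper by way of Google rather than from a direct hit on your server.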
So...
1. They don't even rely on caches.
2. Even if you ban them, they can still grab your content.
It's true, though, that banning the few bots I did may have been effective enough to shift the problems I have left onto the bots using the snippet idea. Maybe other people will find this is 10% of their problem, against my 90%...
Feel free to share if you have stats on this.
P.S. Maybe they are already doing this, but Google should find some unique word or phrase on the page they are displaying the snippet for and make sure it's part of the snippet. Then, when they find a domain that consistently has too many unique words/phrases from other sites' snippets, they should ban it from the index, and especially from AdSense etc...
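To make the idea concrete, here is a toy Python sketch of the check. It is purely hypothetical and says nothing about how Google actually works: the function name, the data shapes and the threshold are all my own assumptions.

```python
from collections import defaultdict


def flag_snippet_scrapers(markers, pages, threshold=3):
    """Flag domains carrying marker phrases that belong to many other sites.

    markers: {unique_phrase: owner_domain}
    pages:   iterable of (domain, page_text)
    """
    # For each domain, collect the distinct owners whose phrases it carries.
    foreign = defaultdict(set)
    for domain, text in pages:
        lowered = text.lower()
        for phrase, owner in markers.items():
            if owner != domain and phrase.lower() in lowered:
                foreign[domain].add(owner)
    # A domain is suspicious once it echoes phrases from enough different owners.
    return sorted(d for d, owners in foreign.items() if len(owners) >= threshold)
```

Counting distinct owners rather than raw phrase hits is a deliberate choice here: a page that quotes one site at length looks like a review, while a page echoing unique phrases from many unrelated sites looks like it was stitched together from snippets.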
There was a discussion a while back about a Google patent intended to identify pages that had too many target phrases, as pages composed of snippets might have. Not quite the same thing as what you're suggesting, but you might find the discussion interesting....
New Patent Application - Spam Detection Based on Phrase Indexing
[webmasterworld.com...]
Bottom line is that I suspect that Google does not filter such spam very well.
If scrapers don't rank their own sites, I think they still have a bad influence on your link profile. That could be why I've seen our sites have issues on some keywords recently (60% of backlinks are scrapers... hate these things).
A while back it was easy to detect such spam, which came mostly from .info TLDs or suspicious hosts. I think it's becoming more difficult for Google now, as scraping is also a technique that evolves, apparently.
... evolves apparently?
Trust me ... scraping will always evolve as long as Google hits the white-hats so bad these days that their hat jumps off!
And, yes, there are casualties :(
I admit that I used to think Google would get rid of all spam, but recent SERPs, and actually most of 2007, made me change my mind.
For the past 3 years, spam has evolved at least as fast as the Google guys have implemented changes (or tests).
It reminds us that Google is software (and a marketing company...). It's disappointing, though: the world is not perfect. Google makes mistakes.
...casualties are not good. Dunno who mentioned "collateral damage" the first time on this forum but the guy was damn right. It never ends.
PS: I was saying it in a funny way but I was not joking! ;)
I have a site with lots of image galleries. Most of my gallery pages occupy a position within the top 5 for their specific keywords. Recently 8 of my pages disappeared from the top 10, and when I went looking for one of them, I couldn't find it at all. Instead I found, on the second-to-last page of results, a scraper site with the exact same title as my page. When I clicked on the Google cache, some kind of script was executed, my browser was hijacked, and I landed at a page telling me that I should install "privacy protector". (I now have the arduous task of removing malware ahead of me this evening).
I also happened to notice that keywords for two of my other missing pages appeared in Google's snippet of the cached text of this site. Some of my other missing pages are similar in theme to the text that was in this snippet.
Is there any way of reporting such sites to Google?
Although many of them are links to our site, most of them link to other scraped content on their own site. I've wondered if our "title" being on their pages and linking to their pages devalues our own page. I'm sure it has some negative effect within the SERPs.
Dupe content is dupe content, and in these cases it's our page title and brief description that are getting swiped.
Is there any way of reporting such sites to Google?
Welcome to the forum, ChicagoFan67.
Here are three ways to report this to Google -- if their own cache takes you to a malware page, I would have no hesitation in reporting it.
1) At the bottom of every search results page, there's a link that says "Dissatisfied? Help us improve"
2) Use Google's Spam Report form [google.com]
3) Log in to (or set up) your Webmaster Tools account and report the problem from the link on your Dashboard page that says "report spam in our index". Because using your GWT account authenticates you, Google tends to give the report higher priority and credibility.
Be precise but brief in your note to Google. Give the actual search keywords and explain the problem - I would be sure to use the word "malware" in an early and prominent way.
I know Google has officially decided that MFAs/scrapers are forbidden. But in practice, I haven't seen any indication they've banned many people. The MFAs/scrapers exist in large numbers because of AdSense... which in a funny way makes Google the biggest spammer of all, because they have closed their eyes to the MFAs from the beginning. Don't you think?