I really wouldn't care, but for some keywords these sites even outrank my own site (which is the original!). Those spam pages are not even two months old, and I wonder how they did that.
I have lost at least half of my AdSense earnings in the last week, just because of lost traffic. That traffic is going to a spammer who copies content from my site and thousands of others, puts it on thousands of .info domains with thousands more subdomains, and interlinks them all.
Is there any advice you can give me?
It seems there is nothing I can do against this kind of spam :/
Google prefers developing scalable and automated solutions to problems, so we attempt to minimize hand-to-hand spam fighting. The spam reports we receive are used to create scalable algorithms that recognize and block future spam attempts.
And the main thing you can do is to continue to strengthen your own site, in all the many ways that we discuss here, both ensuring that it's technically sound and has great content.
As I have mentioned here many times before, don't waste your time with Google spam reports; they do not act on them individually. You must force Google's hand through legal means, using the DMCA:
Here is a thread from a month ago where I covered, step by step, how to get these scammers shut down:
Scrapers killing our seo rankings
In fact, I am living proof this works; I do it all the time. If you run this search on Google with quotes:
You'll see the SERP, where Google has put up a DMCA notice stating that it removed a couple of sites we complained about through Google's DMCA process. As you can see, I still have a few more to file there.
<Sorry, no specific search terms.
See Forum Charter [webmasterworld.com]>
[edited by: tedster at 6:53 pm (utc) on July 3, 2006]
@steveb: it's not only about stolen content. They have copied passages from articles on thousands of websites and thrown them all together; the result is text that makes no sense, with hundreds of phrases and keywords (and keyword variations), invisible text, hundreds of links on a single page, etc.
So it is definitely spam, and only God knows why a 100 billion dollar company can't recognize that, or at least react when someone reports it!
Almost 100% of the time, the scraper sites we find have indeed stolen content from our sites, and have indeed outranked us with it.
So all I'm saying is that Google does not act on complaints you submit through the "Report Spam" form.
However, if you are a member of the Google Sitemaps program, my understanding, according to the Google Sitemaps people, is that reports sent through the Spam Report tool built into your Sitemaps login page are read carefully. That's what they have told me.
The other trick we make great use of is to send a DMCA notice to the scammer's web host. Even if they are not in the U.S., most web hosts will shut down the offending scraper site anyway, since it violates their policies too.
Still with me? Now once their site goes 404 (Page Not Found), you submit it to Google's URGENT URL REMOVAL TOOL:
and two days later the site gets removed from Google. I have done this with over 100 sites since May.
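Before pasting a URL into the removal tool, it is worth confirming the page really is gone. The sketch below is a minimal illustration, assuming Python's standard `urllib` and assuming the tool only accepts pages that answer 404 or 410 (check the tool's current requirements); `current_status` and `ready_for_removal` are hypothetical helper names, not part of any Google API.

```python
import urllib.error
import urllib.request

# Assumption: the removal tool treats these statuses as "page is gone".
GONE_STATUSES = {404, 410}


def current_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url using a HEAD request.

    urllib follows redirects, so this is the status of the final page.
    Network failures (DNS errors, timeouts) raise urllib.error.URLError.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; their code is what we want.
        return err.code


def ready_for_removal(status: int) -> bool:
    """True when the status suggests the scraper page is really offline."""
    return status in GONE_STATUSES
```

For example, `ready_for_removal(current_status("http://scraper-example.info/page.html"))` (a hypothetical URL) returns `False` while the host still serves the page with a 200, including "soft 404" pages that say "not found" but return 200.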
This can also remove the duplicate content penalties you might suffer from them scraping your site.
> However, if you are a member of Google Sitemaps program, my understanding is the Spam Report tool built into your sitemaps login page is carefully read according to the Google Sitemaps people.
A good suggestion, and one that I can corroborate. If I were Google, I would definitely give more credence to a report from an authenticated site owner than a relatively anonymous report.
Type <one single keyword with over 600 million results> into Google, and within the top few results you will find a site with no content other than links. This site is nothing more than a cross between a link directory and an MFA (made-for-AdSense) site. The links it returns to its link partners are useless, and it has a high PR.
[edited by: tedster at 3:27 am (utc) on July 4, 2006]
> The other trick you can do that we make great use of, is to send a DMCA notice to the scammer's web host. Now even if they are not in the U.S., most web hosts will shut down the offending scraper site anyway, as it violates their policies too.
Ah yes, but what if the spammer turns around and cites the old 'fair use' argument? Soon enough you might find yourself in court arguing against it.
My site is scraped often. Occasionally, some scrapers can't help themselves and take chunks instead of lines. One such site copied about 300 words from my site, provided a couple of inconspicuous links back to my site (without credit or acknowledgment), and stuck a disclaimer at the bottom of its page citing "fair use".
Last I looked, this site was hundreds of pages strong and had scraped pages and AdSense from sites in every niche imaginable.
In our DMCA notice we also point the web host to the exact link on the Internet Archive's Wayback Machine, where they can see that our site has been cached but the scammer's site is nowhere to be found.
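A small helper can produce the archive links to paste into a DMCA notice. This is a sketch, assuming the Wayback Machine's public URL patterns `https://web.archive.org/web/*/<url>` (list of captures) and `https://web.archive.org/web/<timestamp>/<url>` (capture nearest a timestamp); the function names are hypothetical.

```python
WAYBACK_BASE = "https://web.archive.org/web/"


def wayback_listing_url(page_url: str) -> str:
    """Wayback Machine page listing every capture of page_url."""
    return WAYBACK_BASE + "*/" + page_url


def wayback_snapshot_url(page_url: str, timestamp: str) -> str:
    """Capture nearest to timestamp (YYYYMMDDhhmmss; shorter prefixes such as YYYYMMDD work too)."""
    return WAYBACK_BASE + timestamp + "/" + page_url


def evidence_lines(my_url: str, scraper_url: str) -> list:
    """Two lines for the notice: archive history of the original and of the copy."""
    return [
        f"Original page archive history: {wayback_listing_url(my_url)}",
        f"Infringing page archive history: {wayback_listing_url(scraper_url)}",
    ]
```

The point of listing both histories is the contrast the thread describes: captures of your page going back months, and none (or only recent ones) for the scraper's page.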
Also, only once in 4 years has someone tried to use "fair use", and they were quickly shut down anyway.
One other thing to keep in mind is that these scraper sites are often one of hundreds set up by the scammer, hiding behind Domains By Proxy-type registrations, so often even the web host can't contact them and simply shuts the site down. To the scammer it's just one net in a fleet of fishing nets; so what if they lose one site?
Bottom line is if you can show they stole content from your site to spamdex the search engine, their site will easily come down.
It's a powerful weapon in your arsenal. Now, I realize not all spamdexing is the result of content theft, but with two of our sites being well known, almost 100% of the time these scraper sites scrape a few paragraphs from us, a few from your site, and a few from someone else's site, and build a nice page to throw some AdSense ads on. They feed all the content from your page into Google's crawler, then bait and switch after they get ranked, knowing it will be months before Google comes back to re-index their page and adjust their rank.
Here are a few more strategies we use:
Sometimes these sites eventually go down because the domain was registered for only a year or two, and when it expires the site goes offline. When it does, submit it to Google’s Urgent URL Removal tool that I mentioned earlier in this thread.
Some of these spammer/scraper sites will outrank you, and you’ll see a link to your site on their page. Using header-checking tools, or just sifting through their source code, you can see they are doing (GASP!) a 302 redirect to your site to purposely trash your rankings. Well, I have a trick for that one too.
1) Put a noindex meta tag on the page of your site that the 302 redirect points to.
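To show a host that a page was stitched together from your paragraphs, you can check which of your paragraphs appear verbatim in the scraper's text. A minimal sketch, assuming you already have your paragraphs as a list and the scraper page as plain text (HTML stripped); `copied_paragraphs` and the `min_words` cutoff are hypothetical choices for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so reformatting doesn't hide a copy."""
    return re.sub(r"\s+", " ", text).strip().lower()


def copied_paragraphs(my_paragraphs, scraper_text: str, min_words: int = 8):
    """Return the paragraphs of yours that appear verbatim in scraper_text.

    Very short paragraphs are skipped because they can match by coincidence.
    """
    haystack = normalize(scraper_text)
    hits = []
    for para in my_paragraphs:
        norm = normalize(para)
        if len(norm.split()) >= min_words and norm in haystack:
            hits.append(para)
    return hits
```

Exact matching keeps the evidence easy to verify by hand; it will miss lightly reworded copies, which is a deliberate trade-off for a DMCA notice where the host needs to see obvious duplication.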
2) Copy the 302 redirect URL they are using to point to your page, and paste it into the “Remove a single page using meta tags” feature of Google’s Urgent URL Removal tool.
3) Immediately remove the meta tag from your page and re-upload it to the server.
4) 2 days later that 302 redirect will be removed from Google’s index.
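The header check mentioned before these steps can be scripted. A minimal sketch, assuming Python's standard `http.client` (which never follows redirects, so a 302 is reported as-is); `head_request`, `is_302_to`, and the host names in the usage note are hypothetical. Some servers answer HEAD differently from GET, so treat a negative result with some caution.

```python
import http.client
from urllib.parse import urlsplit


def head_request(url: str, timeout: float = 10.0):
    """Send a HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_302_to(status: int, location, my_host: str) -> bool:
    """True when the response is a 302 whose Location points at my_host or a subdomain."""
    if status != 302 or not location:
        return False
    host = urlsplit(location).hostname or ""  # relative Locations yield ""
    my_host = my_host.lower()
    return host == my_host or host.endswith("." + my_host)
```

Usage: `status, loc = head_request("http://scraper.info/go.php?id=123")` followed by `is_302_to(status, loc, "mysite.com")`. Only 302 is flagged here because that is the redirect type the steps above deal with.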
For the benefit of SteveB: what do you do when the offending site is just a spammer, with no copyright theft? Join Sitemaps and submit every one you find, explaining exactly what the spammer is doing and whether it is a duplicate site.
Sorry for the lengthy post, hope this helps some of you!
Thanks so much for taking the time to write those pointers; it is much appreciated, and I will certainly give them a try. I don't understand why the "noindex" tag goes up on my site for such a short time, though. Could it be to avoid having the Google crawler de-index my own page as well?
Going back to your opening line regarding screenshots and the Wayback Machine: this is something I have always done in the past before contacting the other party. On a couple of occasions (I'm not referring to scrapers here) when I had been successful in contacting the offending webmaster, I found that this strategy, accompanied by a C&D letter, worked well. In one case, however, I tried to contact a webmaster who had stolen images from my site and altered them slightly. This webmaster simply denied everything and even tried to accuse me of copying him. This is what I fear most: that my page hasn't been cached or archived before the offender's page, or that I updated my content after it was stolen but before it was cached. I also fear the ambiguity created by defences such as 'fair use' or 'altered' content.
I might be overcautious, but I tend to proceed very carefully, as I don't want the matter to get to a point where costly legal representation is required. I think my caution is mainly due to the fact that I've been there before when it comes to asserting my copyright with costly representation.
I think it is I who needs to apologise for my lengthy post. Cheers.
They cannot cite fair use when they have copied your content.
Depends on how much of your content they copy.
An entire page is blatant theft but a paragraph or a specific quote is probably fair use.
I'd be real careful because you can get sued if you damage someone that's working within the letter of the law and your use of the Google Emergency Removal tool against a site you don't own could get you in a heap of trouble.
I don't condone what these people do, but you need to be real careful and make sure you're interpreting what they did versus actual copyright law properly or you can end up on the short end of the stick.
Now if you must do some dirty deeds, at least use an elite proxy that completely masks the origin of the removal request ;)
> Depends on how much of your content they copy.
> An entire page is blatant theft but a paragraph or a specific quote is probably fair use.
All search engines have no choice but to act on your DMCA complaint, even if the material could be in the public domain. They are not really concerned about someone infringing on your rights so much as about protecting their own.
Search engines are in no position to prove or disprove your claim; they merely wish to protect themselves from being sued by you.
If I filed a DMCA claim with Google, Yahoo, and MSN and they ignored it, you can be certain my lawyer would be filing against them, and the punitive damages would likely be sizable.
Therefore, so long as you are certain that you personally wrote the text or created the picture, video, audio, etc., it is 100% worth your time to file, and they will remove it.
It is worth noting that the site owner can file a counter-notice, at which point you have roughly 10 to 14 business days to file a court action and notify the search engine, or it will reinstate the material in question.
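To know when a counter-notice deadline falls, you can count business days from the date the counter-notice was received. This sketch assumes the U.S. DMCA window of no fewer than 10 and no more than 14 business days (17 U.S.C. § 512(g)), treats Monday to Friday as business days, and ignores public holidays; the helper names are hypothetical, and none of this substitutes for advice from your own counsel.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance start by the given number of weekdays (Mon-Fri); holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def reinstatement_window(counter_notice_received: date):
    """Earliest and latest dates (10 and 14 business days later) for restoring the material."""
    return (add_business_days(counter_notice_received, 10),
            add_business_days(counter_notice_received, 14))
```

For a counter-notice received on Wednesday, July 5, 2006, this gives July 19 as the earliest and July 25 as the latest reinstatement date, so any court filing should be in place well before July 19.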
This is a fundamental problem and there are no real solutions I am afraid.
If it was as easy as a DMCA, why can't I just copy someone's website and start filing DMCAs against *them*?
Sure, there is the threat of perjury, but does Google really have the resources to validate these DMCA takedown notices?
Actually, the scammers we have filed DMCA reports against have NEVER filed responses challenging us. If the web host shuts down their site for stealing content, they couldn't care less; their automated software has set up a hundred other sites.
Remember that site last month with the 5 billion pages? Shut down one of his pages and he probably wouldn't even notice.
Also, incrediBILL, you expressed concern about us getting in trouble for yanking someone else's website out of the index over 302 redirects. We are not removing his website from the index; we are removing that "302 URL linking to our site" from the index. His site will remain intact in the index. We are just having Google remove a fraudulent 302 redirect to our site that, 99% of the time, was placed with malicious intent and without our permission.
In the case when a site goes down, there is no law against telling Google there is a dead link in its index. Google is the one removing the link, not you.
Lastly incrediBILL, you should go check the copyright office web site. Some of your statements are not correct related to fair use and they clear that up there.
Sites may NOT even paraphrase your content. We get interviewed by the press all the time, and when they rely on fair use, they must give credit to our site. That is the difference between legal use of our content and scraping it and putting it onto a spam page with your own copyright statement.
What these scraper sites do, in an attempt to appear legit, is send their robots out to visit everyone's site and scrape two or three sentences off each one. DMOZ gets scraped all the time. Then they build a fake directory or SERP page out of all these sentences from all these high-ranking sites.
We often nail them because they lift unique sentences off my site. So what you should do is grab a unique sentence from your site, search Google for it in quotes, and see what other websites show up. If these guys show up before you, there is a problem, because it means Google is hitting you with a duplicate content penalty.
Another trick is to use the link: command to see who is linking to you, and check every single one to make sure there are no 302 redirects.
Also, use the intitle:, inurl:, and inanchor: commands to catch sites that are trying to use your website's domain name in unethical ways to rank off your good name.
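The quoted-sentence check can be automated a little: pick a sentence that is unlikely to occur by chance and build the exact-phrase search URL. A minimal sketch, assuming the heuristic that the longest sentence on a page is the most distinctive and assuming the standard `https://www.google.com/search?q=` URL form; both helper names are hypothetical.

```python
import re
from urllib.parse import quote_plus


def pick_unique_sentence(page_text: str) -> str:
    """Heuristic: return the longest sentence (page_text must contain at least one)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    return max(sentences, key=len)


def quoted_search_url(sentence: str) -> str:
    """Google search URL for an exact-phrase (quoted) query."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')
```

Open the resulting URL in a browser and compare where your own page ranks against any copies; running a few different sentences per page gives a better picture than a single one.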
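When checking a linking page, the suspicious links are the ones that mention your site but actually point at the scraper's own host (for example a `go.php?url=...` script), since those are the ones that may answer with a 302. The sketch below assumes you have the linking page's HTML saved locally; Google has since retired the `link:` operator, so the list of linking pages may have to come from another backlink report. `LinkCollector` and `suspect_redirect_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute href values of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def suspect_redirect_links(page_html: str, page_url: str, my_host: str):
    """Links that mention my_host but are not hosted on it, so they need a header check."""
    my_host = my_host.lower()
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    suspects = []
    for link in collector.links:
        host = urlsplit(link).hostname or ""
        on_my_site = host == my_host or host.endswith("." + my_host)
        # unquote() catches targets hidden as url=http%3A%2F%2Fmysite.com
        if not on_my_site and my_host in unquote(link).lower():
            suspects.append(link)
    return suspects
```

Each suspect can then be fetched with any header-checking tool to see whether it answers with a 302 pointing at your page.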
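The operator searches can be generated as a batch so none gets skipped. A minimal sketch, assuming `-site:` is used to exclude your own pages and that `intitle:`, `inurl:`, and `inanchor:` still behave as described (Google's support for some operators has changed over the years); `name_abuse_queries` and `search_url` are hypothetical names.

```python
from urllib.parse import quote_plus

OPERATORS = ("intitle", "inurl", "inanchor")


def name_abuse_queries(domain: str, name: str):
    """Queries for other sites using `name` in their title, URL, or anchor text."""
    return [f"{op}:{name} -site:{domain}" for op in OPERATORS]


def search_url(query: str) -> str:
    """Google search URL for a raw query string."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

For example, `name_abuse_queries("mysite.com", "mysite")` yields `intitle:mysite -site:mysite.com`, `inurl:mysite -site:mysite.com`, and `inanchor:mysite -site:mysite.com`, each of which can be passed to `search_url`.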
> I have written spam reports to Google.com and Google.de
> about a spammer who is using some kind of 5 billion-pages-style spam,
This is not about "spamming" - this is page jacking. You are confusing a spammer with a page jacker - a big difference.
The spam part is Google's to deal with, and they probably already handle it in the algo. That part is not your concern; it is Google's. How they choose to deal with it, if at all, is up to them and them alone. They probably won't ever do anything.
Now, back to the issue here:
> copying thousands of articles from my website,
Have your legal counsel contact the site and take action.
If you are getting beat by automated spam - seek other employment opportunities...
[edited by: Brett_Tabke at 8:11 pm (utc) on July 5, 2006]
Every page jacker is a spammer!
So this thread has everything to do with Google, with spamming, with content theft, and with how to fight all of them.
Because they are creating unwanted garbage in the index using info stolen from other websites. So there are two different types of spam: spam built from non-stolen content and spam built from stolen content. It does not matter which type it is; it is all spam in the index, and people want it out. This thread dealt with how to get it out, and why the spam report forms don't seem to produce results for some people.
Your assessment that this has nothing to do with Google is wide right, my friend. It has everything to do with Google, and although you comment that it is not our business, Google has made it our business by giving us the Report Spam form. Furthermore, the Sitemaps team has asked us to submit spam reports of all kinds through the Sitemaps version of the form, which they said gets more attention.
The other poor folks here who are getting beaten by automated spamming tools would rather not run away as you suggest; they'd rather stay and fight the spammers at their own game. We just provided some of the ammo to help them.
Maybe if the same happened to your web site you would be of a different opinion.