| 8:15 am on Aug 3, 2013 (gmt 0)|
They must take down the links from search unless they get a counter-notification. Have they? Ask a lawyer, but I am pretty sure that if they just ignore a DMCA notice you can sue Google for infringement.
It is also possible that, with the massive volumes of DMCA notices they are now getting, they are giving priority to people who look likely to sue if ignored. Sue them just once and they may take you more seriously. And, as far as I know, if you have registered your copyrights in the US, then they have to pay your costs if you win.
As far as policing it goes, you could do what Viacom and Warner do, which is to use a service that sends out automated DMCA notices. Likely to be expensive, but an obvious time saver if there are thousands of infringing pages.
| 1:22 pm on Aug 3, 2013 (gmt 0)|
It sounds like you are being scraped at a rate that you probably will never be able to keep on top of. Have you considered putting your actual artwork behind a password, and using the "public" portion of your site to entice people to come in (with text and watermarked thumbnails, for example)?
The whole DMCA system fails where piracy is rampant, because it's reactive and not proactive. You will never get Google to *prevent* people from scraping your content and ranking for it unless you block it from Google altogether. So from a practical standpoint, unless you like spending your time this way, that may be the way to go.
(This may be a topic for the SEO forum as opposed to the AdSense forum, since it doesn't sound like it's AdSense publisher related)
| 5:08 pm on Aug 3, 2013 (gmt 0)|
Been there, it sucks. What sucks the most is when G doesn't act even though the piracy is evident.
You should... you must: watermark your pictures. As for the text there is little you can do unless they are copying it 100% as it is.
Post content constantly so Googlebot visits more often. I won't discuss the details, it just works that way: you will get indexed faster. BUT also use reporting tools to spread the word that you have new content, like RSS pings. Also try a PubSubHubbub hub.
The goal is to get YOUR content indexed FIRST, before anyone else. I don't know about your sites, so don't take it as "I'm saying you are there", but many sites have good content and fail to get indexed as fast as they should. That lets others monitor, copy, and win the race, appearing as if they published the content first.
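For anyone wondering what a hub ping looks like, here is a minimal sketch in Python. It only builds the "publish" notification and does not send it. The hub address and feed URL are placeholders I made up; use whatever hub your own feed declares in its `rel="hub"` link.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: replace with the hub URL your feed actually advertises.
HUB_URL = "https://hub.example.com/"


def build_publish_ping(feed_url, hub_url=HUB_URL):
    """Build (but do not send) a PubSubHubbub 'publish' notification."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical feed URL, for illustration only.
ping = build_publish_ping("https://example.com/feed.xml")
# To actually send it: urllib.request.urlopen(ping)
```

Firing this right after each new post is published is the point: the hub learns about the new content from you before a scraper's copy exists.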
In my case the text has a lot of value but the images are unique, so watermarking them has helped a lot to stop the piracy. In certain niches the content is worthless without pics: diagrams, for example, or medical images.
|(This may be a topic for the SEO forum as opposed to the AdSense forum, since it doesn't sound like it's AdSense publisher related) |
True. I believe that while they show Adsense on their site with your material, you could use the "report" tool on the ads, again and again, and other forms to deal with it inside Adsense.
| 7:40 pm on Aug 3, 2013 (gmt 0)|
What I have found is that I think there was a major issue with Google in 2009-2010. Every page we have submitted a DMCA for was published by us prior to that, and the scraper site posted it after. A lot of the scraper sites are hosted on Blogspot and Blogger. With that in mind, it takes a week or so for Google to respond, and you must make it clear what you are claiming is yours. I also posted last night 5 websites that rank in Google for scraping other people's sites. They are also claiming fair use. I'm going to contact a company that has been hit hard by the updates, which has the money and connections to sue them. All 5 are owned by the same company. I would just assume Google doesn't know about the scraper sites.
| 8:26 pm on Aug 3, 2013 (gmt 0)|
Are these sites using Adsense? Should be easy to report them to Adsense if they are.
| 8:32 pm on Aug 3, 2013 (gmt 0)|
Of course they are: 3 ads wrapped around other people's text. One is a top 20,000 site.
| 9:32 pm on Aug 3, 2013 (gmt 0)|
1. Stop scrapers by blocking certain countries. A lot of webmasters find China and the Ukraine are the worst offenders, YMMV.
2. Stop search engines keeping a cached copy of your content, likewise the Wayback Machine. This means scrapers will have to come direct to you to do the thieving.
3. Who is buying the Adsense ads on the infringing sites, and how do they feel about supporting plagiarism? It may not hurt to contact them directly and ask.
| 11:03 pm on Aug 3, 2013 (gmt 0)|
I agree with blocking them, but not with #2.
We are using the Wayback Machine to prove we had it up first, and it has been very helpful as a third party.
| 3:30 pm on Aug 5, 2013 (gmt 0)|
Good, but actually getting around country-blocking is very easy. The ideal thing is not to block access but to shut them down.
I'm not saying it works 100% of the time, but as I posted previously: first make sure your content gets indexed FIRST, and then perhaps you should let them enjoy it for a while. 1 infringement doesn't weigh as much as 15 solid proofs.
If you shoot too soon they will keep using your content by rewriting it.
| 7:50 pm on Aug 5, 2013 (gmt 0)|
|Since then I am ranked around 9 or 10 for my critical keywords & most of the top slots are filled by sites that are stealing the content from my website & pirating my original fonts & artwork. |
This brings up two glaring issues:
#1 - I can confirm that this is a growing problem in my niche. Using simple SEO tools, many of which are free on various traffic reporting sites, it's all too easy to find a Pandalized site and these are prime targets for scraping now. As you say it's too easy to be outranked with your own stuff when the scraper doesn't have the Panda on his/her back.
#2 - This actually suggests that you can escape a Panda penalty by taking the content and displaying it on a new domain, scraping yourself essentially. I'm not saying transfer the content to a new domain and shut down the old one here. I am saying fully duplicate the top pages of your site until a) Panda lets go of your old site or b) the new one stops ranking, in which case you can copy it again.
Google never should have confirmed Panda updates and I suspect they won't moving forward. This is a real mess, imo.
Cleaning up the mess - I don't think you can directly influence adsense since they do not deal with rankings. They won't have any ground to close someone's account until your DMCA sticks and/or you proceed with an attorney.
- Use hotlink protection. This will force search engines to display your images from their own cache. If scrapers want to take those they get a lower resolution copy and if they link to the source from their spam sites they link to Google, not you.
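Hotlink protection usually boils down to a check on the Referer header before an image is served. Here is a rough sketch in Python of that decision. The allowed host names are hypothetical, and letting empty referers through is my own assumption, not something from the post above.

```python
from urllib.parse import urlparse

# Hypothetical hosts that are allowed to embed your images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def allow_image_request(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request should be served.

    An empty or missing Referer is allowed, because many browsers and
    privacy tools strip it and blocking them would hurt real visitors.
    """
    if not referer:
        return True
    host = urlparse(referer).hostname
    return host in allowed_hosts
```

A request whose Referer points at another site gets refused (or can be given a watermarked placeholder instead), so a page that embeds your image URL directly ends up with nothing useful.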
|Both Google search & Adsense teams tend to reject or ignore about half of my requests to take action. |
Be very careful with the wording of your request and keep the following in mind: Google will never approve a DMCA for something Google themselves are doing. This includes snippets of text, displaying feeds and image hotlinking.
| 5:53 am on Aug 6, 2013 (gmt 0)|
If they are pulling the text verbatim, without proof-reading it, you can have a bit of fun at their expense with a rewrite rule that feeds something a little different into the content, like:
<h1>PLEASE CLICK MY ADSENSE ADS ON THE RIGHT SIDE OF THE PAGE</h1>
OVER HERE ------> ----------> -------->
Then report the adsense TOS violation.
| 6:47 am on Aug 6, 2013 (gmt 0)|
Thank you all so much for taking the time to reply to this topic. I see that it's being featured on the front page of the website now & I'm strangely honored. The environment here is very different from that of the Google webmaster forums. I have purchased a subscription here & I'd love to take a crack at making a new logo for this site.
There are a number of great suggestions in those replies many of which I am aware of & some of which are already underway. I will read through these responses to make sure I'm getting everything before I make a more detailed technical reply & I will surely be acting on some of this however...
The will to steal our material is a bit too strong to be thwarted by protecting images or churning out DMCAs. I am a self-employed artist. My resources are limited. I need to protect & watermark my images. Our whole site has been stolen a few times. Stolen images eat my traffic every day. With all that said...
Our worst problem is really with people stealing our actual fonts. It's easy for a site to get hold of one of my fonts and generate a web page from it complete with specimens, stats... all sorts of neat stuff, potentially even new & interesting stuff...
But the font is mine. It's nothing like any other font on earth. It's not supposed to be on the sites. It's not supposed to be available for free download. It's not supposed to be earning money for these useless compilers of other people's work & Google by proxy. The fonts are copyrighted, trademarked, and clearly labeled as such within the software itself.
I can see how this is difficult for Google to process, but are we supposed to believe that a company that can mechanically filter boobs out of our search results can't detect such behavior? Google opts out of dealing with piracy and makes a direct profit.
Why even have a copyright?
To paraphrase a wise man "all that can be copied will be copied"
| 7:38 am on Aug 6, 2013 (gmt 0)|
How do you identify pandalised sites? It's never really been clear to me.
| 11:45 am on Aug 6, 2013 (gmt 0)|
If DMCA requests are falling on deaf ears, get an attorney. Multiple parties could be named in a lawsuit including the scraper, Google and those who are displaying ads through Adwords. All of them are profiting from the fruits of your labor.
| 12:35 pm on Aug 6, 2013 (gmt 0)|
Get a lawyer. I suspect the lawyer will tell you to register your copyrights, and will send out another round of DMCA notices, and if that fails, sue.
Remember, you can get damages, so you can cash in by suing, not just remove the material.
|Why even have a copyright? |
Because it is very useful for people who can afford expensive lawyers and enforcement systems. They do rather well out of it. The rest of us get a few fortuitous crumbs.
| 1:53 pm on Aug 6, 2013 (gmt 0)|
You can do all that, but when you're a one man show, it gets expensive and time consuming really quickly.
The thing to do here is realize that you are not going to be able to rely on Google to protect your intellectual property, and the web being full of rampant theft, maybe you need to look at alternative ways of serving up your content. I know it sucks, but at some point you have to be practical.
| 3:36 pm on Aug 6, 2013 (gmt 0)|
The problem is nobody worries about piracy/plagiarism until it's already happened.
You have to be proactive and block bots so they don't scrape the site in the first place instead of wailing about it when your content is spread all over the web like cheese on a cracker.
| 5:00 pm on Aug 6, 2013 (gmt 0)|
|Remember, you can get damages, so you can cash in by suing, not just remove the material. |
Probably not if the scraper and his servers are in a foreign country that doesn't comply with your country's legal orders. Which is often the case, because they know the law, too.
Re: the fonts, I think maybe you're barking up the wrong tree here, but that might actually be good news. I have a feeling fonts are going to be treated like software downloads, not content. If someone resells your software or music creation download, they are "pirating" not plagiarizing.
So you may want to forget the DMCAs and trying to get Google to kill their Adsense accounts, and instead just report these sites to Google as piracy sites. The US Congress has made some noise about Google "allowing" piracy sites to rank well, and while Google correctly argued that they cannot be expected to prevent that from happening, they certainly have a duty to try to do something after they've been made aware it's happening. That MAY be all it takes to get these sites out of Google completely.
I suggest you do some research about both methods of protecting your downloads from being resold, and about what to do when that's already happened. Approach it as if you are selling software or MP3s, and you might get much more helpful answers on this.
| 7:18 pm on Aug 6, 2013 (gmt 0)|
|Probably not if the scraper and his servers are in a foreign country that doesn't comply with your country's legal orders. |
Except that Google are in the US, and if they have ignored a DMCA they are liable for infringement. And if you have registered the copyright, they even have to pay minimum statutory (not actual) damages and your costs.
| 1:10 am on Aug 7, 2013 (gmt 0)|
|You have to be proactive and block bots so they don't scrape the site in the first place instead of wailing about it when your content is spread all over the web like cheese on a cracker. |
1. How can you block all of the bots? None of them honor robots.txt.
2. It's impossible to block all IP addresses from scraping.
3. Not all scrapers use bots.
I start with a kind copyright violation warning that I email to the owner of the site. Most of the time, this works.
If that fails, I file a DMCA with anything I can find. It has to be really close to your content or an exact copy for many hosts and Google to consider a takedown.
The danger here is that if you go after some jerk in India or Nigeria, sometimes they have tons of time to make your life hell. I've had retaliation DMCAs filed on my Adsense account as well as to Google. There's little you can do unless you want to spend thousands of dollars battling somebody in the courts who is thousands of miles away.
If the site owner has attitude, I find content on their site which has been stolen from sites other than my own. I can often find content from big-name blogs like Mashable or Gigaom. I then shoot a message to those sites pointing out that their content has been stolen. These bigger sites usually take action immediately. Google listens to them too.
I took this approach with this person who was making hell for me. Eventually it forced him into switching hosts. He promptly removed the content he stole from me as well.
Another thing: block Nigeria and India too via htaccess. Doing so will block 80% of your scraping issues. It's not worth the traffic. It's true there is a way around it, but it makes it way too difficult for most content stealers to bother with your site when they have so many other targets.
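If you would rather do the blocking in application code than in htaccess, the same idea is a membership test against a list of network ranges. A minimal sketch in Python follows. The ranges shown are documentation-only placeholder blocks, not real country or hosting ranges; a real list would come from a GeoIP or hosting-provider database you trust. Refusing unparseable addresses is my own choice.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real country data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return True  # unparseable address: refuse rather than guess
    return any(addr in net for net in networks)
```

The same function works for a list of data-center ranges, which tends to catch scrapers that get around country blocks by renting servers elsewhere.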
| 1:50 am on Aug 7, 2013 (gmt 0)|
|How do you identify pandalised sites? It's never really been clear to me. |
a. Find a site that shows a reasonable estimated chart of historical organic traffic in a graph format. Semrush, Compete, Quantcast, etc.
b. Browse and/or specifically search the domain names of potential victims.
c. Look for sites with noticeable drops in traffic that correlate to a known Panda release date.
d. Confirm vulnerability by copying a single article and getting it indexed. Search google for the article's <title> in quotes. If your article ranks higher than theirs, you're off to the races.
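Step c is also something you can run against your own analytics to see whether a drop lines up with an update. Here is a rough sketch in Python, assuming you have daily visit counts keyed by date. The function name, the 14-day window and the idea of comparing simple averages are my own assumptions, and you have to supply the update date yourself.

```python
from datetime import timedelta


def drop_after(daily_visits, update_day, window=14):
    """Fractional drop in average daily visits after update_day.

    daily_visits maps datetime.date -> visit count. Returns None when
    either side of the window has no data or the baseline is zero.
    """
    span = timedelta(days=window)
    before = [v for d, v in daily_visits.items()
              if update_day - span <= d < update_day]
    after = [v for d, v in daily_visits.items()
             if update_day <= d < update_day + span]
    if not before or not after:
        return None
    avg_before = sum(before) / len(before)
    avg_after = sum(after) / len(after)
    if avg_before == 0:
        return None
    return (avg_before - avg_after) / avg_before
```

A result close to zero means nothing much changed around that date. A large positive value that shows up for one update date and not for random other dates is the kind of correlation the list above is describing.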
| 3:10 am on Aug 7, 2013 (gmt 0)|
Whether one man, a small business or large corporation, you can choose to be a victim or protect your work.
A couple portions from the DMCA:
|If the provider has the right and ability to control the infringing activity, it must not receive a financial benefit directly attributable to the infringing activity. |
Applies to Google owned properties and infringing websites monetized with Adsense that are ranking stolen content.
|The provider must comply with rules about “refreshing” material — replacing retained copies of material with material from the original location — when specified in accordance with a generally accepted industry standard data communication protocol. |
My interpretation of this, when applied to search engines, is that the original content should replace the stolen content in the serps. I don't see any exclusion to rank stolen content merely because Google has or may have penalized the original author's site with Panda, Penguin or any other demotion.
I'm not an attorney, but I'm not a fool either. Seeing so many scraper sites with Adsense outranking the original sites that are not monetized with Adsense was enough to raise my eyebrows.
| 3:57 pm on Aug 7, 2013 (gmt 0)|
I agree completely with Google's assertion that they cannot stop piracy. It's like asking a water company to stop water pollution. I believe their response was something like... "We will not be the world's piracy police".
Imagine carrying that over to the water analogy. No one is asking Google to police all the world's piracy. Many are however asking them to "police" their own system. I'm not asking the water company to end all pollution, I'm simply asking them to deliver clean water to my home.
As several people have pointed out here: you have to be proactive if you expect to make a living from intellectual property such as software or art or music or anything else that can be reduced to "data".
What this post really needs is a list of links to file dmca requests for google search, copyright complaints to Google Adsense, and to contact Google's legal team. Some links to articles or news showing the results of such attempts would shed some light.
As both a provider of organic search and a provider of paid advertising, Google seemed to have the "conflict of interests" issue well in hand until Panda. I have been a user, partner, fan & stockholder of Google for many years now. As such I gave them the full benefit of the doubt for some time on this issue. My experience over the last two years, however, has led me to believe that this philosophy of negligence is too beneficial to them to be a coincidence.
I'd like to hear any suggestions anyone might have as to how an independent software developer, author or artist of any kind could reach the masses without Google. It's obviously possible to reach "some" people but with Google's near monopoly on the guiding of traffic it's not really a fair game.
I get contract work through word of mouth. I expand the audience for my fonts with availability through major distributors... I am not entirely dependent on Google. I guess I made the mistake of thinking Google and I were partners with the common goal of bringing quality content to the world. I was under the mistaken impression that they needed people like me. All they really need is my material. As an actual human being I am just another mouth to feed.
I am dedicated entirely to my craft and to supporting my family. I will never stop. At some point though, my resources and my patience will run out. At this point I feel like I could make more money suing them than working with them. In reality though I have to continue my struggle with the websites. I will continue to catalog what I believe to be willful instances of infringement.
| 4:21 pm on Aug 7, 2013 (gmt 0)|
|I agree completely with Google's assertion that they cannot stop piracy |
I agree that perhaps they can't stop it, but they could certainly do more than they do now. For whatever reason, Bing is much better at identifying the original producer of content, and not letting copies outrank the original. I assume Google has the technical chops to at least match Bing?
Additionally, Google allows themselves some very odd exceptions for their own properties. One interesting example is their Appspot/Appengine product.
It's notorious for being used as a negative SEO vehicle. The scrapers find existing "proxy cache" applications on *.appspot.com domains, or create their own. Then the job of scraping is automated. All they need to do is build links to hxxp://someproxy.appspot.com/victim.com/path/to/article. The proxy then scrapes the article, replacing all internal and navigation links with links to itself... duplicating ALL the content on the victim's site.
Then, if you file a DMCA request to Google, showing the indexed appspot.com urls....they reject it, with the reasoning that "it's a proxy".
Adding salt to the wound, Google's own properties seem to have an inherent "boost" in terms of ranking.
So, yeah, perhaps they aren't on the hook for proactively stopping content theft, but they shouldn't be an active enabler either.
| 5:32 pm on Aug 7, 2013 (gmt 0)|
I don't expect Google to somehow detect piracy before it happens, or to always recognize it after the fact. But once NOTIFIED that Site A is pirating, I believe Google has a legal duty to drop them out of the SERPs and possibly cancel their Adsense account as well.
That's why I was suggesting you notify them that these sites are pirating your fonts, and include any documentation you can to show that you created the fonts and/or designed them first. (If you can't attach the documentation in your first contact with them, just describe what documentation you have and ask how to send it to them.)
I don't know if the following is typical, but I have one page scrapers just adore. I used to check the SERPs every couple of months and send Google DMCA notices about all the ones that had stolen my page. Now I get trackbacks from scrapers but can't find them in Google. At least in this case, it looks like once Google can "recognize" a page that's particularly loved by scrapers, they do get better about not indexing new scraped instances in the first place.
| 6:12 pm on Aug 7, 2013 (gmt 0)|
|I was under the mistaken impression that they needed people like me. All they really need is my material. As an actual human being I am just another mouth to feed. |
Not even that, actually. Individually, we don't even exist for Google. They are simply not interested in anything that doesn't scale, and individuals don't scale. Heck, they didn't even want to offer customer service because Larry Page felt it was an outdated model that didn't scale (and only reluctantly got dragged back into the modicum they now offer) (See Levy's book "In the Plex" - you probably won't like what you read, but you might better understand where Google is coming from)
I absolutely sympathize with your ongoing battle, but waiting for Google to fix it just isn't going to work. You can take them to court, and maybe you will win, but they have enough lawyers and devices to drag it out forever, so it might end up costing you more than you get. If you win.
Some business models just aren't made for search engines. I would venture to say that making art that people love to steal would be one of them. If it were I, I would get all those images behind a password, or at least the high quality versions of them.
You could take a look at how the stock photo companies deal with it.
I'm not sure what your exact business model is, but alternatives might include mining your past customers via email newsletters, offering only watermarked "tastes" of the art/fonts on your website, requiring people to create an account before viewing them full size, print advertising, subscription models, social media, affiliate programs, etc.
I dunno, maybe you can use a slideshow or video presentation that can't be captured at full high quality resolution.
Google is Godzilla, and we're the collateral damage getting smushed between its toes. It doesn't even notice.
| 9:53 pm on Aug 7, 2013 (gmt 0)|
Top list of things to do to a scraper if you are 100% sure it's your stuff.
1) Report their money accounts: Adsense, Yahoo, Amazon. (It's why they scrape.)
2) Insert a word or your domain name into the text. It's very hard for them to prove they own the article when it mentions your site in the text.
3) Add links to your inner pages in your content. Sloppy scrapers copy the links too.
4) File the DMCA correctly. I look to take a site out at the web host, rather than filing 3 DMCAs with the search engines.
5) If they are still scraping, bait them with an article and links to your site.
6) Lawyers are expensive and in it for the money... I find it hard to justify paying $300/hr for one unless you are losing thousands.
7) Look beyond the SERPs. It's become a traffic game: years ago, if you typed a domain into a search engine it would send you straight there; now they show you search listings.
8) Look to live without SERP traffic. Every link that points to your site is a potential pathway for a person to come to your site. Period.
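Points 2 and 3 are easy to automate on both ends. A small sketch in Python: one helper that appends an attribution line to an article before it is published, and one that checks a suspect page for your fingerprints (domain name, inner-page URLs). The function names and the wording of the attribution line are hypothetical, just for illustration.

```python
def add_attribution(article_text, site_name, url):
    """Append a plain-text attribution line that sloppy scrapers copy along."""
    return f"{article_text}\n\nOriginally published by {site_name} at {url}"


def fingerprints_found(page_text, fingerprints):
    """Return the fingerprints (domain names, internal URLs) present in page_text."""
    haystack = page_text.lower()
    return [f for f in fingerprints if f.lower() in haystack]
```

Run the second function over the text of a suspected copy. Any hit is the kind of evidence that makes a DMCA notice or a report to the web host much harder to dispute.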
| 11:46 pm on Aug 7, 2013 (gmt 0)|
|1. How can you block all of the bots? None of them honor robots.txt. |
2. It's impossible to block all IP addresses from scraping.
3. Not all scrapers use bots.
YES you can stop almost all of the bots and the following 5 steps will get rid of nearly all of the scraping. I've been doing it for years and although a little still slips thru the cracks, I can deal with one or two problems versus hundreds or thousands of incidents.
It's simple. 5 Steps and most scraping is all gone.
1. White list robots.txt to tell all the good ones OK, the ones you don't want that honor robots.txt all go away.
2. White list .htaccess with the same bot names allowed in robots.txt and include browser UAs like Firefox, MSIE, Opera, etc.
3. Install a bot blocker script to catch everything that slips thru the cracks.
4. Block all data centers
5. Put NOARCHIVE in all pages to stop scraping from cache
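Steps 1, 2 and 5 can be sketched in a few lines of Python. This is only an illustration of the whitelist idea, not the actual setup described above: the bot names are examples, the browser tokens are the ones named in step 2, and user agents are trivially faked, which is exactly why steps 3 and 4 exist.

```python
# Example crawlers you choose to allow. Adjust to taste.
ALLOWED_BOTS = ["Googlebot", "bingbot"]
# Browser tokens named in step 2; a real list would be longer.
ALLOWED_BROWSER_TOKENS = ["Firefox", "MSIE", "Opera"]

# Step 5: tag to place in the <head> of every page.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def build_robots_txt(allowed_bots=ALLOWED_BOTS):
    """Whitelist-style robots.txt: named bots may crawl, everyone else may not."""
    lines = []
    for bot in allowed_bots:
        lines += [f"User-agent: {bot}", "Disallow:", ""]
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


def user_agent_allowed(user_agent, tokens=ALLOWED_BOTS + ALLOWED_BROWSER_TOKENS):
    """Step 2: allow only user agents containing a whitelisted token."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)
```

Anything that fails `user_agent_allowed` can be refused outright. Anything that passes but behaves like a bot (too many pages too fast, coming from a hosting range) is what the blocker script and the data-center blocks are for.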
| 12:06 am on Aug 8, 2013 (gmt 0)|
Where would we find a bot blocking script? I do agree that if a bot is scraping your content it isn't going to listen to a robots file.