| 3:28 pm on Nov 21, 2012 (gmt 0)|
Yep, I'm seeing this exact same thing with my site.
I just checked one of my articles in Google using the " " (quoted phrase) method.
I found 3 pages of results with my copied content, all directly copied from my site, some using my images as well.
I have to click the "omitted results" to see me right at the end in last place and I'm the actual author of the work!
How many years has Google been working on their algorithm? In my opinion they are absolutely clueless.
[edited by: engine at 12:38 pm (utc) on Nov 29, 2012]
| 3:41 pm on Nov 21, 2012 (gmt 0)|
|I have to click the "omitted results" to see me right at the end in last place and I'm the actual author of the work! |
How many years has Google been working on their algorithm? In my opinion they are absolutely clueless.
Welcome to the club and I agree.
| 4:04 pm on Nov 21, 2012 (gmt 0)|
I have even noticed that sometimes a 301 redirect of your own site shows up as a duplicate. Proxies are still a problem too, and of course there are those whois scrapers, which sometimes also copy the full text of your front page.
I have noticed that a simple "forbid right click" script does a good job, because those scrapers are often lazy people.
| 4:22 pm on Nov 21, 2012 (gmt 0)|
@MrBreakEven, you have a situation like the one I have with one of my sites. First check whether the other search engine has the same trouble. Submit to the Google scraper outranking doc. Is it one site outranking you, and are they using a frame-based scrape? You can break that frame, though that is only a band-aid.
| 9:42 pm on Nov 21, 2012 (gmt 0)|
|Sadly, Google has a tendency to like freshly found content over anything else... not fresh as in new, but fresh as in they just found it. |
This summer, I posted a pic that I took of the World Trade Center nearly two decades before 9/11. Within a week, and to my utter surprise, out of some 200 million images it ranked 2 in a search on Goofy!
It is still in the top 10.
I have no idea why.
I would monetize my site, but now and then I use a "less than family friendly" word to characterize perverts and pedophiles.
| 10:52 pm on Nov 21, 2012 (gmt 0)|
I used to be plagued by content scrapers on a 12 year old site until I set up Google+ Authorship on all pages last year. I just checked several new pages and none of them have been scraped. So anyone plagued by scrapers may want to try it.
| 12:27 am on Nov 22, 2012 (gmt 0)|
I am stymied by what G considers copying and copyright infringement these days. A couple of days ago, using cop---ape on one of our previously top 5 SERP rank pages, I found a site that shows as "has 4,024 words matching 33% of the page". It's a 100K+ size page. It's obvious it was copied, as our seed data and entire paragraphs are letter for letter the same. The only reason it was not an EXACT match is that they obviously copied it about 2 years ago and, whereas we update the page weekly, they have not changed it since then. I submitted a copyright infringement claim via WMT explaining that it was a copy of the version which existed in 2010, which is still available in archive dot org, and it came back as rejected, URL not infringing!
Many of our prior top pages are updated daily, so by the time a copy shows up in the search results chances are SOMETHING is going to be different. How much does it take for a copyright violation? So we can just copy pages, change one line and avoid a copyright violation now?
So my main point is: if G does not consider an obvious 33% of a page being copied to be copyright infringement, shouldn't that mean it should NOT count it toward a duplication penalty either? I'm betting they do, leaving us with no option to get rid of the copies other than rewriting our own work.
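For anyone who wants a rough number for this kind of partial copy, here is a minimal sketch. It is not how cop---ape or Google measure anything; it simply assumes that overlapping 5-word runs ("shingles") are a fair proxy for copied text. The function names and the shingle size are my own choices for illustration.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def copied_fraction(original, suspect, size=5):
    """Fraction of the suspect page's shingles that also occur in the original."""
    suspect_set = shingles(suspect, size)
    if not suspect_set:
        return 0.0
    return len(suspect_set & shingles(original, size)) / len(suspect_set)
```

Feed it the plain text of your page (ideally the archived version from the time of the copy) and the plain text of the suspect page. A high fraction on a page that was copied years ago and never updated is the sort of evidence worth quoting in a complaint.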
| 1:21 am on Nov 22, 2012 (gmt 0)|
Google’s DMCA system has seemingly changed so that it now looks mainly for verbatim copying. In other words, Google forces you to appeal multiple times and deal with a parade of dummies saying “no” to everything.
| 6:04 am on Nov 22, 2012 (gmt 0)|
|until I set up Google+ Authorship on all pages last year |
How did you overcome the obstacle of needing to post your email address on the pages?
| 6:21 am on Nov 22, 2012 (gmt 0)|
|I used to be plagued by content scrapers on a 12 year old site until I set up Google+ Authorship on all pages last year. I just checked several new pages and none of them have been scraped. So anyone plagued by scrapers may want to try it. |
Lorel - Thanks for posting this. I've previously cited some threads and posts that discuss integrating Google+ authorship with PuSH feeds to notify Google before wider publication of the standard feed, and I'm reposting them in this thread....
Questioning the wisdom of using fat pings to deal with scrapers
Any further details you can provide would be helpful. Unfortunately, most here seem to be more interested in complaining about Google than in investigating what might be done to fix things.
| 7:10 am on Nov 22, 2012 (gmt 0)|
What if our writers do not want to use their own faces to anchor the article? Sometimes they write under an alias. From my understanding you don't use an avatar?
| 2:45 am on Nov 23, 2012 (gmt 0)|
10+ year old site. Very few problems until I lost most of my G traffic on 09-07-2012. I knew my content had been stolen over the years but always thought that G ignored those sites. After reading this thread I checked a long snippet of text from one of my pages that had lost page rank, and there were over 300 pages with that exact snippet that outrank me in the results.
Also, did anyone else lose a huge amount of traffic on 09-07-2012 exactly?
| 2:23 am on Nov 27, 2012 (gmt 0)|
|I used to be plagued by content scrapers on a 12 year old site until I set up Google+ Authorship on all pages last year. |
I've set that up as well, not sure if it will help since we're a company and not an individual, but it was worth a shot.
Speaking of G+ we're seeing less than stellar G+ activity. I think we've had 5 or so in 6 months time, but FB likes we get nearly that many a day.
| 2:30 am on Nov 27, 2012 (gmt 0)|
Here's a great and very powerful method that I have been using since 2005 to beat these scraper scammers at their own game. I never see people mention it, but this strategy involves not only shutting down the scraper sites, but also getting them booted out of Google. The problem is, folks, Google wants us webmasters to do their dirty work and scrub the toilet bowl for them.
Now I too have a high quality site, online since 1999, that constantly has 300 or more scrapers, almost exactly what Frost_Angel reported here. It's very simple, folks. Here's what you do, here’s the recipe for revenge:
1) Identify the scraper site
2) Perform a whois lookup on the site to get the abuse email of the web host
3) Send a properly formatted DMCA (Digital Millennium Copyright Act) Notice to web host with screen shot of offending data.
4) Web host shuts down the offending web site, leaves a 404 Error message in its place
Here's my extra spice to this recipe:
5) Submit the dead URL to Google’s URL Removal Tool immediately while dead scraper site is still 404!
6) By the next day check the status in the URL removal tool, it will either say the site was REMOVED or Denied. The tool keeps a history of all the sites you have submitted for you to see.
By performing these steps above, you really put the screws to the scammer, because not only did you shut down his MFA, you removed them from Google’s index. That’s the important part that people skip over.
Also, waste no time submitting the dead URLs to Google’s URL Removal Tool, as the scammers can pop back up tomorrow on another host.
So the very second I get a DMCA response from a web host, I instantly submit that dead URL to Google’s URL Removal tool. Done! It’s Miller time.
Another great benefit is that as you search Google for your stolen content, you are sure to find, as I do, many URLs that are already dead, simply because their domain expired or they got shut down for other reasons, but they still appear in Google. Feel free to submit those to the tool also; that saves you time.
Now it does take a couple of weeks for Google to re-index and flush them out, but it’s important to get these shuttered sites into the queue to be removed from Google.
If the webhost takes down the offending page, but it’s not returning a real 404 header error (maybe a splash page instead), Google’s tool will whine at you that it can still access the page. But, it offers to remove that page from Google’s cache, which to me is the next best thing.
But it asks you to enter a word that you know that was on the offending URL page before but is not there now. So you have to know a unique word from your content that was on the page but is no longer there. Enter that word into the form, and submit, check it next day. 75% of the time they remove the page from the cache. Other times it just says “DENIED”. I can’t tell why they deny some of them, which is why I specify to the web host to give me a 404 error page, because 100% of the time, with a 404 page, Google will remove it from the main index, and I would imagine the cache too.
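Since the whole trick hinges on whether the dead page returns a real 404 header or just a splash page, here is a minimal sketch for checking that before you submit. The function names are mine, and treating 410 the same as 404 is my assumption; it says nothing about how Google's tool decides internally.

```python
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the final HTTP status code the server sends for url."""
    # HEAD avoids downloading the body; a few servers reject it, in which
    # case retry with a normal GET. Redirects are followed automatically.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions

def removal_readiness(status):
    """Classify a status code for the purpose of a removal request."""
    if status in (404, 410):
        return "gone"        # a real "not found" header
    if 200 <= status < 300:
        return "still live"  # splash page or live copy: expect the cache-only route
    return "unclear"         # server errors etc.; check again later
```

Usage would be something like `removal_readiness(fetch_status("http://scraper.example.net/stolen-page"))` (a made-up URL). Note that an expired domain raises `urllib.error.URLError` instead of returning a code, so handle that case separately.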
I hope this helps many of you who were unfamiliar with this.
Happy shooting and looting!
| 4:12 am on Nov 27, 2012 (gmt 0)|
Thanks. You have obviously expended some time in compiling that helpful post.
| 7:40 am on Nov 27, 2012 (gmt 0)|
Hi, thanks for all the replies. I have a new problem:
Google won't take action against offending sites! :(
Some sites are copying the product descriptions from our ecommerce pages (word for word, around 100-200 words each) and pasting them onto their own sites. Just the text is copied; they are using their own images.
I submitted a DMCA request to Google explaining that exact phrases are copied. If you were to Control+F the phrases submitted, they would show up...
The exact request I sent in is as follows:
Our website, “domain.com”, is infringed by the text excerpted on the site, with the text below copied in it's entirety:
Guess what's the reply from Google? ---> "Not enough information provided."
1000s of our pages were copied and we spent hours and hours submitting the DMCA requests to Google but this is what we get. ---> "Not enough information provided."
Do we have a case and how do we get through to Google?
| 3:31 pm on Nov 27, 2012 (gmt 0)|
If Google is telling you “not enough information”, perhaps your DMCA notices to them are not in the right format. I have found different web hosts for example are much harder to work with to shut down a site, and require what they call a “properly formatted DMCA”, you'll find they have common guidelines printed on how they want the DMCA. My current DMCAs are structured to adhere to the strictest of these “properly formatted DMCA notices”.
You can't just send them a note saying these scammers stole our content, take it down.
Many web hosts want you to swear under penalty of perjury, they want electronic signature, tell them you are the copyright owner, etc. So they often ask us to include something like this:
"I have a good faith belief that use of the copyrighted materials described above as allegedly infringing is not authorized by the copyright owner, its agent, or the law. I swear, under penalty of perjury, that the information in the notification is accurate and that I am the copyright owner or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed."
I use that statement above. I also include my scanned-in signature, a typed digital signature with my name between slashes, and an email address and phone number where they can contact me.
I like to use screen shots side by side showing the offending site, with arrows drawn to my site, showing exactly what they stole, because many of these abuse desk people cannot see the forest for the trees, and can’t read hundreds of words to find a few copied sentences. You have to show them.
I give them the exact offending URL, and the exact URL on our site that they stole the content from.
You have to hand it to them on a platter.
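If you file a lot of these, the pieces described above can be assembled by a small script. This is only a sketch of the layout I described: the field names, headings, and ordering are illustrative assumptions, and each web host publishes its own requirements, so check those before sending.

```python
GOOD_FAITH_STATEMENT = (
    "I have a good faith belief that use of the copyrighted materials "
    "described above as allegedly infringing is not authorized by the "
    "copyright owner, its agent, or the law. I swear, under penalty of "
    "perjury, that the information in the notification is accurate and "
    "that I am the copyright owner or am authorized to act on behalf of "
    "the owner of an exclusive right that is allegedly infringed."
)

def build_notice(owner_name, email, phone, original_url, infringing_url):
    """Assemble the plain-text body of a takedown notice (hypothetical layout)."""
    lines = [
        "DMCA Notice of Copyright Infringement",
        "",
        f"Original work (our page): {original_url}",
        f"Infringing copy: {infringing_url}",
        "",
        GOOD_FAITH_STATEMENT,
        "",
        f"Signed: /{owner_name}/",  # typed signature, name between slashes
        f"Email: {email}",
        f"Phone: {phone}",
    ]
    return "\n".join(lines)
```

The side-by-side screen shots and the scanned signature still have to be attached by hand; the script only saves retyping the text part for every URL pair.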
I used to fax Google our DMCA notices, but quit doing it because they don't act on all of them, and it's double work doing a separate DMCA for Google. Remember, Google wants YOU and ME to do their dirty work for them. They don’t want to waste time all day long answering thousands of DMCAs from around the world.
Again, the solution that I pointed out above works best. Even Google suggests you "work with the webmaster to remove your copyright content", which is ridiculous. To me it's a lot easier and faster to just get the darned web site shut down, and the index will take care of itself.
I just send my DMCA to the scraper’s web host, get the site shut down, then submit the dead site to Google's URL Removal tool, and within 2 weeks the site is out of the index. Done deal, no extra DMCAs to waste Google’s time; use all the automation that you can. If a scraper site has been shut down, Google’s tool will never come back and say “not enough information”. If the site is down, Google will kick them out. It's that simple, folks, a simple matter of physics. If the site is gone, it can't be crawled and indexed. It's like their algorithm says "if it ain't there, it ain't staying in our index".
It's fast, easy, and effective. Try the method I have perfected over the last several years, and you can put the time you would have spent sending DMCAs to Google into getting more of your scraper sites shut down instead.
Sometimes I have gotten sites shut down within minutes of sending the DMCA to the abuse desk. Just had one of those yesterday. Makes your day go real well!
Happy shooting and looting!
| 3:51 pm on Nov 27, 2012 (gmt 0)|
If your content was on the web before the scraper snagged it, it may also help to provide Google with a link to an archive.org snapshot of your page(s) showing that you had the data online first. The problem is that archive.org is about a year behind in posting info.
| 4:06 pm on Nov 27, 2012 (gmt 0)|
Yup! We use archive.org all the time, because we often find scrapers who took our content 2 or 3 years ago, and we have to find the version of our site from the time they took our content and send the correct screen shot. Or sometimes you find a domain name registered say in 2007, and you know he stole your content to make his site, so you have to get a 2007 snapshot from archive.org, and it's easy to show the web host that the offending site has no archive.org content, which proves your content was up first.
With some of these web hosts, if you don't send them a 100% match, they might balk at it, or say something stupid like, "we don't see a cut and paste, we see something similar but no cut and paste."
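Hunting for the right year's snapshot by hand gets tedious, so here is a minimal sketch that asks archive.org's availability endpoint for the capture closest to a given year. The helper names are mine, and the response fields used (`archived_snapshots`, `closest`, `available`, `url`, `timestamp`) are what that endpoint returned when I last looked, so treat them as an assumption and check the current docs.

```python
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"

def availability_url(page_url, year):
    """Build the query URL asking for the snapshot closest to Jan 1 of year."""
    query = urllib.parse.urlencode({"url": page_url, "timestamp": f"{year}0101"})
    return f"{WAYBACK_API}?{query}"

def closest_snapshot(payload):
    """Pull (snapshot_url, timestamp) out of a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None

def lookup(page_url, year, timeout=10):
    """Fetch and decode the closest snapshot for page_url around year."""
    with urllib.request.urlopen(availability_url(page_url, year), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Run `lookup` once for your own page and once for the scraper's page. A 2007 capture for yours and `None` for theirs is the comparison described above, though a missing capture only shows that archive.org never crawled the page.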
| 4:19 pm on Nov 27, 2012 (gmt 0)|
spica42, I've never had that problem and I've done tons of DMCAs in similar circumstances. I think maybe you just left something out or filled the form in incorrectly. It's not the most user-intuitive form. In my experience, Google always removes the page from its index within a week or so, and sends me a notification they've done so.
|I would monetize my site, but now and then I use a "less than family friendly" word to characterize perverts and pedophiles. |
You can totally monetize that. If Adsense has an issue with it, there are plenty of other ad brokers who don't care. I know this from working with a lot of social activist webmasters who write about similar topics in similar language - no shortage of ad brokers who will work with them!
[edited by: Robert_Charlton at 8:32 pm (utc) on Nov 27, 2012]
| 1:49 am on Nov 28, 2012 (gmt 0)|
Spica is accurate, and I’ve had about a dozen people inform me of the issue in the past three months. I reported on it in the copyright threads. What the problem is at Google I do not know. I know most of the DMCAs are handled out of India and Ireland, which is a little ironic. Repeated appeals to higher-ups can get a reversal, but it seems Google has made it more difficult to access the current forms in the first place. However you interpret it, rejecting DMCAs reduces Google's workload and expense, since reports indicate the volume of notices is growing. Interestingly, nobody has ever mentioned that problem at Bing to me.
| 5:08 am on Nov 28, 2012 (gmt 0)|
Ok, I'll try to resubmit the DMCA. I had previously submitted over 70 pages, which took me 5 hours, so this time I'll submit just one page and see if it gets through.
I really need it to work as the #*$! that copied our content is ranked higher than us. :(
As for JeffOstroff's suggestion, I think sending a DMCA to the host may not work, as the host is located in China and I've tried sending DMCAs to hosts in China before but they get ignored. Nevertheless I'll try it if Google carries on ignoring me.
I will add in a link to archive.org and see if it works with Google. Previously, I have provided:
1) The entire passage of text to Google. (Around 50 words to 150 words each.) If they bother to copy and CONTROL+F the entire passage they will see the same text on the offending site. These PRC merchants are copying our descriptions because I think they are selling similar stuff but they can't write English.
2) A link to the page on my site where the text first appeared
3) A link to the offending sites. Usually it's a few with the exact same text.
I think the information I provided is pretty clear, so when Google asks for more information but does not provide more details on what they want, I really feel like 'flipping the table.' :P Will try to provide a link to archive.org and see if it works.
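With thousands of pages to report, the CONTROL+F check itself can be scripted so that only passages which really do appear word for word get submitted. A minimal sketch, assuming you already have the offending page's visible text as a string; the function names are hypothetical, and ignoring whitespace and case is my own choice.

```python
import re

def normalize(text):
    """Collapse runs of whitespace and lowercase, so line wraps don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def copied_passages(passages, page_text):
    """Return the passages that appear verbatim (after normalizing) in page_text."""
    haystack = normalize(page_text)
    return [p for p in passages if normalize(p) in haystack]
```

Anything this returns is a passage a reviewer can find with a plain text search, which is the strongest kind of match to put in the form.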
| 9:38 am on Nov 29, 2012 (gmt 0)|
"I like to use screen shots side by side showing the"
Is there a way to send screen captures to G through their copyright complaint form?
| 10:49 am on Nov 29, 2012 (gmt 0)|
Spica42: "Not enough information provided."
Whenever we get this, we always get an accompanying e-mail from "removals@G-----.com" with a cryptic subject like RE:[x-xxxxxxxxx] with a little more detail. In this case it said something to the effect that they did not have enough specifics about the source file (No idea how I could be more specific, but...)
So check your e-mail for more useful info perhaps.
| 11:06 am on Nov 29, 2012 (gmt 0)|
Nope, no email from anyone regarding this. Is there any way to email them to ask for a response?
| 7:47 am on Nov 30, 2012 (gmt 0)|
Hey, guess what, I received an email today. :P Maybe they are monitoring this thread and decided to drop me a msg.
Thanks for reaching out to us.
We have received your DMCA notice. It is unclear to us whether or not you
are the authorized copyright agent for the content in question. Only the
copyright owner or an authorized representative can file a DMCA
Infringement Notice on his/her behalf. Please note that you will be liable
for damages (including costs and attorneys' fees) if you materially
misrepresent that a product or activity is infringing your copyrights.
If you or your client is not the copyright owner for this content, we can
not process your notice. Please have the copyright owner file a DMCA notice
with us. If you or your client is the copyright owner, please provide more
detail explaining how this is the case.
Thank you for your understanding and cooperation.
The Google Team
Does anyone know what other information they need? The email address I'm using to file the DMCA is not on the same domain as the one I'm defending. If I create an email account on that domain, I presume that would be enough to satisfy the criteria?
| 5:38 pm on Dec 3, 2012 (gmt 0)|
I filed a DMCA takedown request in March this year to get a site removed that was using a 2 year old copy of my site. Over 1000 pages.
It was successfully removed.
Traffic is really low at present, so I started investigating, and 2 days ago I found the same site back in Google's results and ranking higher than my pages for "phrases".
I filed another DMCA and also contacted the hosting company, who have taken the offending site down.
Do DMCA takedowns expire after a few months?
| 12:53 am on Dec 4, 2012 (gmt 0)|
I'm not sure if DMCA requests expire or not.
I've only had to file a couple of them... until now.
Just found a site that's using an overseas bot to fetch images from our site and display them on theirs on the fly. Normally you could just block the ip and move on, but this site and the bot keep changing IP addresses. It's driving me wild to say the least.
Just when I think I've got them locked out, it comes back. Their proxy service won't help, and what I think is their host won't help either, since the logs only show the IP of the bot, but with them as the referrer.
What's in their whois record as a host is really just a proxy DNS server. Same thing with this "shopping bot"... more info on it in the search engine identification section.
Sadly I've finally resorted to watermarking the images on the fly with our site name... I really hate doing that.. I personally think it looks crappy on an ecomm site. The software I'm using also lets me edit the image meta data, so I've embedded our information into the image as well. Not sure if their bot or processing method wipes it out or not.
Wish I knew a good cheap attorney that would go after these guys and their sites for what they're doing to our good honest business.
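Since the logs show their site as the referrer even while the bot's IP keeps changing, one thing to try is deciding per request on the Referer header instead of the IP. A minimal sketch of that check; the host names are placeholders, the function name is made up, and a bot can stop sending the header at any time, so this is a speed bump and not a fix.

```python
from urllib.parse import urlparse

# Hypothetical list: the host names of your own shop.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """True when an image request claims to come from a page on someone else's site."""
    if not referer:
        # Many legitimate visitors send no Referer at all, so let those through.
        return False
    host = urlparse(referer).hostname
    # A Referer with no parseable host, or a foreign host, counts as a hotlink.
    return host is None or host not in allowed_hosts
```

Whatever serves the images can call this and send the watermarked version (or a 403) only when it returns True, which would keep the clean images for your own product pages.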
| 8:59 am on Dec 4, 2012 (gmt 0)|
Tell me why when you do everything right like the following:
1) Write totally original content and post to my website
2) Updated sitemap to show new content and submitted to WMT
3) Small snippet of new content posted to my blogspot blog, once again to direct the G bots to my new content
4) Pinged to all the popular networks, once again to show my new content
5) Article indexed and shows up in google
I've made it clear this is my content.
So why is it a few weeks/months later someone can come along and steal all my content and unrank me?
The stolen content leaves me the AUTHOR in last place relegated to the omitted results! Wonderful!
I realised a while back it's too stressful working online when someone else can come along and just steal all your hard work overnight and reap the benefits.
I'm going to build a business that's mine, controlled by me and not dependent on some quirky search engine that gives stolen content priority over the original author's work.
| 9:21 am on Dec 4, 2012 (gmt 0)|
Ok a little update. They approved the DMCA when I emailed them with an address that is associated with the domain.
Time to submit the other 1900 pages! :P
[edited by: spica42 at 9:59 am (utc) on Dec 4, 2012]
| 10:02 am on Dec 4, 2012 (gmt 0)|
MrBreakEven: What angers me is that the scrapers are not just ranking higher for entire passages of my text, they are ranked higher for three-word keywords too!
Imagine that. They rank on the 1st page for major keywords, but you, who wrote the original content, are nowhere to be found.