| 3:52 pm on Aug 5, 2010 (gmt 0)|
I have asked Google to reconsider my site through google.com/webmasters. I've also written to the offending site's web host but have yet to receive a reply. What else can I do?
I'm not sure if this is relevant, but I blocked my site in China using .htaccess more than a year ago. Accessing my site from China will show a 404 error. (To prevent copying, which clearly failed in this case. But at least they are not copying the new content uploaded recently.)
Do you guys think this is the reason why Google favors the scraper site and treats mine as duplicate content?
| 4:38 pm on Aug 5, 2010 (gmt 0)|
I suggest you read this: Scraped or Stolen Content: What To Do First [webmasterworld.com]. It's a discussion from 2003, but the situation has not changed since then. You can find this reference and many more in the Hot Topics area [webmasterworld.com], which is always pinned to the top of this forum's index page. For this particular thread, look for the section called DEFENDING YOUR RANKINGS.
|Accessing my site from China will show a 404 error. (To prevent copying, which clearly failed in this case. But at least they are not copying the new content uploaded recently.) |
Do you guys think this is the reason why Google favors the scraper site and treats mine as duplicate content?
Probably not - Googlebot spiders your site from a US-based IP address.
| 5:12 pm on Aug 5, 2010 (gmt 0)|
I've speculated elsewhere (including our Updates thread [webmasterworld.com]) that since the Mayday update, Google is getting original source attribution wrong more often than they were, ranking the scraped or mashed-up URL and filtering the original. My theory? It comes from Mayday giving good rankings to "sites" they feel are more popular - and therefore better overall destinations for the search user. The emphasis used to be more on the "page" rather than the "site".
Now this does not necessarily fit in with a pure scraper site - but the lines between a mash-up, an aggregator and a scraper are blurring quite a bit these days.
| 5:40 pm on Aug 5, 2010 (gmt 0)|
Thanks for your reply.
I have read the thread under the hot topics area and I have also done many of the things suggested in the 'Scraped or Stolen Content: What To Do First'.
Namely, I have:
1. Contacted the web host/ISP contact via email, but they have not replied to me. An automatic ticket has been created under their support area and it seems to be in their spam folders.
2. Submitted a request to Google via Google Webmaster Tools.
3. I'm not sure if I should contact the site owners, as they seem to have ripped off my site manually. I operate an ecommerce website and their entire site is a duplicate of mine except for their sidebar and their logo. They seem to be selling similar products and are using all my product descriptions and my pictures. Furthermore, they are in China and I don't think contacting them will help. (If anyone wants to see the sites, PM me.)
My site has been up for many years and I submitted a sitemap via Google Webmaster Tools one year ago. Most of my pictures are watermarked with my site name. The offending site seems to have only started in July / August 2010.
I apologize if I sound like I'm ranting, but we need this traffic to sell products, and having someone rip off our content and rank higher is really painful. Furthermore, we have employed a photographer and content writers to create this content. What else can I do? Can anyone tell me why Google is filtering our site instead? Their site design seems to be inferior to ours.
| 6:11 pm on Aug 5, 2010 (gmt 0)|
About 2 years ago I had a similar issue with a large Chinese website which had taken around 400 pages of content from my site. I threatened them with a DMCA which got a response from them, but they weren't being very quick about getting rid of the content. What really worked was when I actually telephoned their European office and spoke to someone there. She was as horrified at their theft as me! Turned out it was a rogue 'writer' who didn't like to write his own material.
Is there any way you can contact them directly? DMCA threats when you have solid proof you are the original source usually stop most sites from stealing content very quickly. Sadly they'll just move on and steal from someone else. I've found DMCA threats sent directly to the site owner to be very effective many times.
If you do send one to them don't rant at them - however hard it is to restrain yourself! It's much better to come off deadly serious and professional - prove to them you've got solid evidence, and explain that it could result in their entire site being punished.
| 6:20 pm on Aug 5, 2010 (gmt 0)|
I think they are a very small company located in China with no overseas office. If they take down my content there will be no more content on their site, so I'm not sure if they will do that. I'm trying very hard to reach their host, which is located in the USA, but no reply yet after half a day.
I'm also concerned that they'll just change hosts or the domain (I found out they have another domain with some of my content as well!), so I think the real solution is to know why Google filtered my site.
| 6:22 pm on Aug 5, 2010 (gmt 0)|
|I've speculated elsewhere (including our Updates thread [webmasterworld.com]) that since the Mayday update, Google is getting original source attribution wrong more often than they were, ranking the scraped or mashed-up URL and filtering the original. My theory? It comes from Mayday giving good rankings to "sites" they feel are more popular - and therefore better overall destinations for the search user. The emphasis used to be more on the "page" rather than the "site". |
After a rather stressful morning of finding my own content outranking me on scraper sites... I thought I would share that in my case, scraper sites outranked my own for 45 minutes to 3 hours, due to site caching. (The content was cached, the RSS was not.)
Immediately after I published a new article (WordPress) I started seeing the content show on scraper sites. I spotted the updated sitemap.xml, and I saw my new content in FeedBurner. I could not, however, find it on Google.
It was 60 minutes or so later that I was able to find it on Google, well... content that led to my domain, and not scrapers.
For myself, it was purely cache related... but made me wonder if Gorgle now treats the scraper site as the original publisher!?!
Several hours later (now) I find my own content leading the SERPs, but it took 3-4 hours to wash out. If someone like an eHow or Answers were to grab my RSS feed and do the same, I would be screwed.
| 6:42 pm on Aug 5, 2010 (gmt 0)|
It certainly doesn't hurt to threaten them with a DMCA. Tell them you either want the pages removed entirely or the content of the pages completely rewritten. Tell them you'll give them x number of days to do it or you'll pull out the big guns. Telephone the hosting company. I've done that in the past and they talked me through their procedure for removal - in the end that guy removed my stuff before I needed to go that far.
As regards Google giving priority to scraper and mashup sites recently, I agree completely. In fact, in an attempt to PROVE to Google I'm the first to write an article, every time I write an article I put a small link to the new page from several of my important well-ranked pages. I then give it a few days of sitting at the top in Google for those search terms, then I put a nice big prominent link on my homepage for all my visitors. Then once the scrapers pick up the story I've had a chance to get to the top in the SERPs. It doesn't always stay there, but it's usually in the top 3 results when the dust settles. Some of the sites which pick up the stories are good quality big sites too, so it seems to kinda work.
Of course if Google would get their act together and start growing some morals I wouldn't have to waste my time doing dumb stuff like that!
| 7:15 pm on Aug 5, 2010 (gmt 0)|
Does anyone know how long Google takes to respond to these kinds of things?
| 9:53 pm on Aug 5, 2010 (gmt 0)|
|Does anyone know how long Google takes to respond to these kinds of things? |
Google is not going to remove it from the index easily. If they did, what would prevent your competitors from just filing DMCAs against you?
Your best bet is to take this up very aggressively with the site's web host. Look up the whois, have your business attorney draft the paperwork for the DMCA, and submit it to the website hosting company. After you submit the paperwork they MUST comply or be held liable. (Make sure you tell them that also.)
ANY time I have had to work through DMCA issues, the best reply I get from Google is waiting 3-5 days for an email that has nothing but the link to the paperwork for properly filing a DMCA.
It's ALL about getting the legal paper trail started... email and electronic requests will only get you so far.
| 10:30 pm on Aug 5, 2010 (gmt 0)|
I'm currently testing a WordPress plugin called Feed Pauser, which delays the publication of a new post in the blog's RSS feeds by a configurable amount of time x.
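The idea of holding new posts out of the feed for a while can be sketched in a few lines of Python. This is not Feed Pauser's actual code; the post structure (a dict with a `published` timestamp) and the function name `delayed_feed` are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def delayed_feed(posts, delay, now=None):
    """Return only the posts that are old enough to appear in the RSS feed.

    posts: list of dicts with a timezone-aware 'published' datetime (assumed shape).
    delay: timedelta to hold a post back from the feed after publication.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - delay
    # A post is released to the feed once it has been live for at least `delay`.
    return [post for post in posts if post["published"] <= cutoff]
```

The page itself is still published immediately, so a search engine has the whole delay window to crawl the original before feed-based scrapers ever see it.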
| 10:32 pm on Aug 5, 2010 (gmt 0)|
I recently filed a DMCA complaint with Google AdSense over articles that were a rewrite of my own articles (in their own words).
The person contacted me within a couple of days saying yes, they'd used my articles, and they'd take down those rewrites and put up new original articles. I said great, and the next day, they were down and new articles were up. I then received a notice from Google AdSense with the counterclaim the person filed. The counterclaim summarized that we'd settled the matter by their putting up new content. Google's notice said I had ten days to file a court order; if no court order was filed, it would be dropped. I wrote back and said we'd settled it and all was well.
The whole thing took 5 days.
On the other hand, the first DMCA notice I ever filed (Edited to add: with Google AdSense) was over a scraped article, and I never received a response to my notice. I think it was because the scraper had my article on some sort of dynamic list that rotated, and the URL had changed by the time it was investigated - or because my own article was unpublished for a time due to events beyond my control.
I've filed other notices by directly contacting the website owners, and in all cases that I recall, the articles were removed within 24 hours.
| 2:03 am on Aug 6, 2010 (gmt 0)|
Thanks for all the help! I've managed to get the host to suspend one of the sites in question. For the other domain that has duplicate content, I've filed a DMCA notice and am waiting for the web host to respond.
One thing that really puzzles me is why this scraper site is ranked higher than us. According to Google, this should not happen. If it does, then there's something wrong with the original site. Our site is rather professionally done, and sitemaps have been submitted to Google for a long time. Our articles first appeared from our site on Google SERPs, but Google suddenly decided to rank theirs higher, which is really strange as Google must have records that we published this content first. The stolen articles are not stolen automatically; they are manually copied and pasted.
The scraper site looks ugly (as do all such sites) and the layout looks bad. So we really need to find out what's wrong with our site. If someone knows a reputable company that is able to assist us, do PM me!
| 2:55 am on Aug 6, 2010 (gmt 0)|
In my opinion, there's a simple explanation. The part of Google's algorithm that is supposed to locate the original and filter out the copies is currently broken. Make that more broken than it was before. It's a tough problem, I'll give them that, but it used to be better than it is.
| 5:51 am on Aug 6, 2010 (gmt 0)|
Unfortunately, the longer it stays 'more broken', the more the bad guys are going to try to take advantage of it.
| 2:44 pm on Aug 6, 2010 (gmt 0)|
|Google ranked their pages higher than my site for key words |
They wouldn't happen to have AdSense on them, would they? ;)
| 3:21 pm on Aug 6, 2010 (gmt 0)|
Nope, it's purely an ecommerce shop selling costumes. They must be selling similar items. They ripped off our photos (with real models) and all our product descriptions. They even ripped off our 'Add to Cart' button!
| 4:38 pm on Aug 6, 2010 (gmt 0)|
What program or sites are people using to discover if their content has been scraped / copied?
Thanks in advance.
| 5:06 pm on Aug 6, 2010 (gmt 0)|
I believe some people use Copyscape.
I used to use Google Alerts. That ended up being more work for me than it was worth, though. I ended up investigating every suspicious incident of copying, and most of them weren't worth bothering about.
The other "trick" is to search for an exact sentence from your content, in quotes (not the title, though).
Another trick some folks use is to plant a sentence like "as my sweet Aunt Addie May used to say," naturally in the text of each page, and do regular searches for that.
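For anyone who wants to semi-automate those two tricks, here is a minimal Python sketch. It only picks a few long sentences from a page and builds quoted (exact-match) search URLs to open by hand. The function names, the 8-word minimum, and the plain `google.com/search?q=` URL form are assumptions for illustration; it does not query any search engine itself.

```python
import re
from urllib.parse import quote_plus


def fingerprint_sentences(text, min_words=8, limit=3):
    """Pick up to `limit` sentences long enough to be distinctive."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    long_enough = [s.strip() for s in sentences if len(s.split()) >= min_words]
    return long_enough[:limit]


def exact_match_query_url(sentence):
    """Build a search URL with the sentence wrapped in quotes for exact match."""
    return "https://www.google.com/search?q=" + quote_plus('"' + sentence + '"')


if __name__ == "__main__":
    page_text = (
        "New in stock. As my sweet Aunt Addie May used to say, "
        "a good costume is worth the wait. Order today!"
    )
    for sentence in fingerprint_sentences(page_text):
        print(exact_match_query_url(sentence))
```

Any result for such a query that is not your own URL is worth a look; a planted phrase works the same way, you just search for that one sentence across all pages.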
Nowadays I just check out nearby competitors and keep a loose eye out, and maybe check if I'm wondering why something's falling down a bit.
| 5:18 pm on Aug 6, 2010 (gmt 0)|
|What program or sites are people using to discover if their content has been scraped / copied? |
I am a Google Alerts user, "wrapped with quotes" (for exact match) and sent to my email daily.
| 3:07 pm on Aug 7, 2010 (gmt 0)|
Hi, a quick update. Both hosts of the offending sites removed the sites in question 12 hours ago. However, when I checked one of the domains again just now, it was up again! I called the host that I lodged the complaint with and they claim that they are no longer hosting the site, so the owners must have moved it to another host! Damn!
However, when I do a who.is check it shows a host in China. I think I'm really in deep sh*t.
| 4:00 pm on Aug 7, 2010 (gmt 0)|
Hey there, spica42:
Out of curiosity, are those two sites in China trying to actually sell the same products as you through their sites?
Or are they basically using your content for adwords / adsense (or for linking back to other sites they own)?
After looking at your site (you posted the link on the google support forums), I know that there are many conventions and other activities that are related to your products. One thing I would suggest is to really see about getting inbound links from sites that would find your products of interest and related to their sites.
In fact, I would look into an affiliate program for those sites to make a commission. (I know this is off topic from Google and SEO.)
Also, do the manufacturers of your products have wholesale-only web sites (in English)? If so, then I think getting a link from a manufacturer's site saying where people could buy their products retail would also be great.
I didn't check out your site thoroughly but you could add value to your site by linking out to different sites for conventions and magazines and other resources that your customers might find helpful.
I hope this helps.
| 3:48 am on Aug 8, 2010 (gmt 0)|
Yes, I'm unsure but I think they are selling similar products.
I'll try your suggestions!
| 1:22 pm on Aug 8, 2010 (gmt 0)|
Personally, I would try to locate their crawler and find out what kind of IPs and agent string are being used, and the speed of the crawl. Then write a bit of code that knows when it's them and blocks it, or maybe sends a different view of the website, giving a negative description of the products! Hopefully they will update all their content without noticing! I always have a central include file for all pages, which is perfect for such a job.
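As a rough illustration of the detection step, here is a Python sketch that flags clients by crawl speed. It assumes you have already parsed your access log into `(timestamp_seconds, ip, user_agent)` tuples; the function name and the thresholds (more than 30 hits in any 60-second window) are made-up values to tune, not recommendations. It will not catch someone copying pages by hand.

```python
from collections import defaultdict


def find_fast_clients(requests, window_seconds=60, max_hits=30):
    """Return the (ip, user_agent) pairs that exceed max_hits in any window.

    requests: iterable of (timestamp_seconds, ip, user_agent) tuples (assumed shape).
    """
    hits_by_client = defaultdict(list)
    for timestamp, ip, agent in requests:
        hits_by_client[(ip, agent)].append(timestamp)

    flagged = set()
    for client, times in hits_by_client.items():
        times.sort()
        start = 0
        # Slide a window over the sorted timestamps and count hits inside it.
        for end in range(len(times)):
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 > max_hits:
                flagged.add(client)
                break
    return flagged
```

The flagged set could then feed whatever blocking you already do in a central include file or in .htaccess.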
| 2:23 pm on Aug 8, 2010 (gmt 0)|
Actually, it's a pitifully simple problem but Google are just too dumb or too stubborn to fix this problem properly.
|It's a tough problem, I'll give them that, but it used to be better than it is. |
The only way, and I really do mean the only way to distinguish between duplicate and original content algorithmically in the general case is by the age of the url. In a very few cases, when scrapers are faster than Googlebot, an automated test could be initiated by site owners such that the submitted page is scanned immediately and if it then appears in a scraper site, all duplicate content on that site is then deemed non-original.
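To make the "age of the URL" proposal concrete, here is a small Python sketch. It is not how Google works; it only illustrates the rule described above under strong assumptions: each page is a `(url, first_crawled, text)` tuple, copies are verbatim apart from case and whitespace, and the earliest crawl date wins. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not hide a copy."""
    return " ".join(text.lower().split())


def attribute_originals(pages):
    """Map each presumed original URL to the list of later verbatim copies.

    pages: iterable of (url, first_crawled, text); first_crawled must be sortable.
    """
    groups = defaultdict(list)
    for url, first_crawled, text in pages:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append((first_crawled, url))

    originals = {}
    for members in groups.values():
        members.sort()  # earliest first_crawled comes first
        originals[members[0][1]] = [url for _, url in members[1:]]
    return originals
```

The weak point is exactly the one mentioned above: if a scraper's copy is crawled before the original, the first-crawled date points at the wrong site.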
However, what you have to understand is that Google don't care. They have always demonstrated complete and total disregard for the property rights and privacy of others. At the very most they do only what is required by law and many people would argue that they don't even come close to that.
| 4:28 pm on Aug 8, 2010 (gmt 0)|
|However, what you have to understand is that Google don't care. |
I think I do... I write descriptions, the content starts to rank, here comes the scraper and rips off the content, and Gorg ranks the scraped content higher than or just below my URL, because there is AdSense on it (the scraper's site) and/or the original writer might be a good candidate for the AdWords program. If they implemented what we ask of them, they would lose 2 potentially lucrative revenue streams, as simple as that.
It happened just last week: I wrote an article on a technical how-to at a basic level (code), with a recommendation to contact me if the advanced method was needed. Boom, 90 minutes later, GBOT picked up the URL and started to rank it for the proper words. A couple of hours later Bing picked it up as well. Got 2 new clients the next day, one from G, one from Bing. Picked up a couple of inbounds from community members the next day as well.
3 Days later, someone copies the content onto a very popular freelance site asking to bid on a project for the Advanced stuff. That content in turn gets redistributed into several other properties owned by that site.
4 days later I am on page 2 for that Article in Google SERP, and #1 on Bing and Yahoo for the Proper Words.
WHY?, I understand 100%:
google_ad_client = "pub-22222222222222";
google_ad_slot = "1111111111111";
google_ad_width = 300;
google_ad_height = 250;
On Freelance Site my content is not even indexed by Bing or Slurp. MIA.
| 4:56 pm on Aug 8, 2010 (gmt 0)|
Well, here's what you do...
You start a website looking for webmasters willing to take part in a class action against Google charging them with something along the lines of conspiracy to defraud. Google could not argue that it's out of their control because I've just explained how to solve the problem.
It would never actually get to court because Google know full well they would lose (barring legal technicalities). They would have no choice but to implement the system I outlined.
Google care about their own IP rights (as demonstrated by all the patents they register) but the only way they will ever care about the IP rights of others is if they are forced to by legal action. Not even bad publicity will make a difference - they have such a dominant position they couldn't care less about bad publicity.
One other thing - You can be absolutely certain that Google are terrified of a court issuing any sort of order that requires them to change the algorithms because that would set a monumental precedent so they absolutely will fix this problem if their hand is forced.
| 6:10 pm on Aug 8, 2010 (gmt 0)|
There is a philosophy that underpins a lot of the technical world - not just Google - that the human race is entering a new age where intellectual property rights will just vanish into "the cloud". If you ever casually reused an image without being certain you had the right, or if you ever took a copyrighted song or movie from P2P or the torrents, you may be sharing in this same mindset.
Given the technical tools that people already have, a major change in the legal concept of IP rights is probably inevitable - if not its complete disappearance. Maybe we can blame the Grateful Dead ;) and the way they encouraged bootlegs.
Sometimes Google does things that are aligned to this emerging hive-mind-cloud of "no individuals" - and the way scrapers and mash-ups get ranked is a sign of it. And sometimes (YouTube and the recording industry, DMCA compliance, Google Books) they can be pushed into honoring the way things are today, rather than some futurist philosophy. But it does seem to take an outside push to make it happen - honoring IP rights is not a core purpose.
| 6:33 pm on Aug 8, 2010 (gmt 0)|
Humans taking an occasional file is unavoidable but automated scraping tools are easily stopped.
There's no excuse for any site to have a copy of someone else's site without express permission, it's easily stoppable.