How to remove Google's duplicate content penalty?
Google gave quick response, but still puzzled
My attorney is in the process of filing DMCA complaints against a competitor's website which has duplicated all our content. In terms of the DMCA stuff, I'm not worried since it should be a slam dunk case.
What I AM worried about is the fact that the competitor's website gets indexed before our site, and therefore it may appear to Google that WE are the website publishing duplicate content.
So I had my attorney write another letter to Google's help team (with some help from me), describing the situation. Google's help team was nice enough to look at my site and let me know that it was not receiving a penalty for having "duplicate" content.
It's not that I don't trust the Google team--after all, my site is a PR5 and gets decent traffic from Google--however, I'm just not sure why there wouldn't be a duplicate content penalty on us.
Surely if another website has duplicated our content (and gets indexed before us), there should be a penalty right? And given that the competitor's website has higher PR than our site, it stands to reason that the penalty is more likely to be on us than them.
Either Google's duplicate content penalty does not exist, or we are suffering from it. (Or I'm freaking paranoid, which is definitely a good answer as well.)
I don't want to bug the nice folks at Google anymore. They've been very helpful but I think they've reached the end of their patience with my questions. Is there anything else I can do to check whether my site has the duplicate content penalty?
<email quotes removed - see TOS [webmasterworld.com]>
[edited by: tedster at 3:34 pm (utc) on Dec. 30, 2005]
If your communications with Google focused on the duplicate content "penalty" then the employee who answered you will be someone whose job focuses on that area - penalties.
I'd say go ahead with the DMCA complaint with Google. That will be handled by a different area of Google and they should remove the offending page from their index. Then the algorithm will not have any duplicate content to cope with.
I think part of the issue here could be that you are using the term "duplicate content penalty". Your PAGE may be "filtered" by the algorithm on certain searches because of duplicate content without your SITE receiving a penalty.
You wrote "my site is a PR5 and gets decent traffic from Google". That doesn't sound like a penalty on your site, and the Google team has apparently confirmed this for you. Rest easy, file the DMCA, and all should be well soon.
I think that as far as Google is concerned there is no dup content "penalty". Penalty being defined as an intentional punishment imposed upon an offender.
Their algo is simply designed to reduce duplicate content by only providing a given content set from what is determined to be the most relevant source.
What you need to determine is why the other site has been determined to be a more relevant source of your information than you.
1) Do they have better incoming links?
2) Do they have more frequently updated content?
3) Do they have a better site structure?
4) Is their site more "compliant"?
5) Have they been around longer?
6) How does your domain name history compare?
7) Do they have complete contact info, TOS, etc. on their site and you don't?
All of these things (and many more) could tip the scales as to who the algo determines is more relevant.
[edited by: Frequent at 3:59 pm (utc) on Dec. 30, 2005]
Thank you Tedster, Frequent.
If both Google and a WebmasterWorld admin tell me to rest easy, I really should take that advice. =)
Frequent, excellent questions!
1) Do they have better incoming links? No. They have almost 0 incoming links. We're linked from Yahoo directory and Dmoz.
2) Do they have more frequently updated content? Yes. They duplicate content from several other websites and get indexed almost instantly.
3) Do they have a better site structure? Not really. It is just a glorified blog.
4) Is their site more "compliant"? Other than the duplicate content, it doesn't do anything weird. It does run Google ads (my site doesn't). I wonder if that makes a difference.
5) Have they been around longer? No.
6) How does your domain name history compare? Not sure what you mean. We have been registered since 2004 and they were just created six months ago.
7) Do they have complete contact info, TOS, etc. on their site and you don't? No. There is no contact info on their website. Their whois info is all fake.
I think the major reason their site is deemed better is that they have a lot more content (like ten times more) than my site.
Maybe they are habitual content thieves & are doing so to earn from adsense.
You can and must report them to adsense.
Read the part on 'Account Termination' ( It's towards the end)
Hmmm... I didn't know Adsense has a DMCA filing procedure separate from the regular Google DMCA filing. Thanks for the tip!
Don't spare them.
BTW, for the next time you're feeling paranoid ;-) a duplicate problem can often be spotted. If any of your previously indexed pages start going URL-only/supplemental, or if searching for a snippet of the duplicated content in quotes brings your page up as supplemental, you've probably got one.
Not definitive by any means, but......
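For anyone who wants to run the quoted-snippet check on several articles, here is a minimal Python sketch that builds the exact-phrase search URL. The function name and the 12-word limit are my own assumptions for illustration; it only builds the URL, you still open it in a browser and look at where your page lands.

```python
from urllib.parse import quote_plus


def quoted_snippet_query(snippet: str, max_words: int = 12) -> str:
    """Build a search URL for an exact-phrase (quoted) query.

    max_words is an arbitrary illustrative limit that keeps the phrase short.
    """
    # Drop any quotes inside the snippet so they don't break the phrase query.
    words = snippet.replace('"', " ").split()[:max_words]
    phrase = " ".join(words)
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


print(quoted_snippet_query("a distinctive sentence from your own article"))
```

Pick a sentence that is unlikely to appear anywhere else, so that the only results are your page and the copies.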
Well, if it's a "glorified blog" as you say, and they are taking advantage of RSS then that is a big part of why they are spidered faster.
As I (and others) have mentioned in prior posts, Google is a bit blog crazy lately.
Not a surprise considering how easy it is to publish an RSS feed and have it distributed by literally dozens/hundreds/thousands of blog networks, aggregators and feed-fed scraper sites.
That and the blog format is so search engine spider friendly that it's scary. Blogs are the fast food chains of the internet when it comes to feeding the spiders.
I don't think we show up as supplemental results. This is generally what happens:
1. We publish an article.
2. Within minutes our competitor copies the article.
3. Google indexes competitor site within 24 hours, picks up article.
4. Our article does not get indexed at all, or, if it does get indexed, it sometimes appears as the second result behind the competitor's page.
Frequent, that's an interesting point. Our site also publishes our articles via RSS feeds, but we only publish the first 2-3 lines of each article.
We are listed in some blog search engines, but we haven't really made a concerted effort to get our RSS feeds listed everywhere. Do you think we should?
"Do you think we should?"
Yes, by all means pursue marketing your blog and feed as you would your e-commerce site (blog directories are popping up everywhere).
>>Their whois info is all fake.
Report that as well, that's an ICANN violation.
Blimey, this is plain compulsive shop-lifting: Theft.
Don't waste time asking the infringer to remove it. In my experience they simply baulk, grumble, pull it, and repost it when you're not looking. (The stories I could tell you of the run-a-rounds I've had with infringers. And their excuses.)
Send G a DMCA first, list every infringing url carefully, and be precise. Format it like they ask. And be certain you own the copyright to the material.
If it's a forum post, look for print versions (ampersand variations etc) of your stolen material. Both on the thief's site, and in the SEs. (Don't forget to DMCA MSN, and Y! if they have it too.)
Wait patiently... once they have removed it (G is v slow: 6-9 weeks, Y! and MSN are fast), send another DMCA to the infringer's host (or if they run their own server, DMCA their pipe supplier).
The order is very important long-term.
SEs first: Thief's host later: Thief last.
Wiser Bods will know why, rash ones will pay a price.
In addition, try to stop the thief receiving your RSS feed. Spot their RSS bot IP, and 403 it. This sometimes needs a bit of detective work, but it's satisfying. Like squashing mosquitos mid-air.
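As a rough illustration of the "403 it" step, here is a minimal Python sketch written as WSGI middleware. Everything specific in it is an assumption: the feed path `/feed.xml`, the blocked address (taken from the reserved documentation range), and the idea that your site runs behind a WSGI app at all. On a typical Apache setup you would do the equivalent with a deny rule in the server config instead.

```python
# Hypothetical values for illustration only.
BLOCKED_IPS = {"203.0.113.7"}  # address from the reserved documentation range
FEED_PATH = "/feed.xml"        # assumed location of the RSS feed


def block_feed_scrapers(app, blocked_ips=BLOCKED_IPS, feed_path=FEED_PATH):
    """Wrap a WSGI app so blocked IPs get 403 when requesting the feed."""

    def middleware(environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "")
        if path == feed_path and ip in blocked_ips:
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else (and every other path) is served as usual.
        return app(environ, start_response)

    return middleware
```

Note that `REMOTE_ADDR` is the address of whatever connects to your server, so behind a proxy or load balancer it may not be the scraper's own address.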
Thanks ronin100, I will make that my top priority for 2006!
Marcia, I was considering doing this but wasn't sure if it would work. But if you say it is effective, I will give it a shot. Thanks.
Angonasec, good advice there, thanks. I think my lawyer was going to do the SE and hosts at the same time. What is the reason for doing the host second?
Also, what is the best way to spot their RSS bot IP? Several websites subscribe to our RSS and they bring in good traffic. I don't want to accidentally block the wrong folks.
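One possible starting point, sketched in Python: tally which IP and user-agent pairs fetch the feed, using the web server's access log. This assumes the log is in the common "combined" format and that the feed lives at `/feed.xml`; both are assumptions, as are the function and variable names. It only produces a list of candidates and does not tell you which one is the thief.

```python
import re
from collections import Counter

# Matches the "combined" access log format:
# ip ident user [time] "METHOD path protocol" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def feed_fetchers(log_lines, feed_path="/feed.xml"):
    """Count feed requests per (ip, user agent), most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == feed_path:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common()
```

Usage would be something like `feed_fetchers(open("access.log"))`. Aggregators that send you traffic often identify themselves in the user agent, so the entries worth a closer look are the ones you can't account for, especially any that fetch the feed shortly before a copied article appears.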
>>Their whois info is all fake.
Report that as well, that's an ICANN violation.
Unfortunately, though it is an ICANN violation, good luck on getting any mileage out of it. The registrar will simply allow the offender to "comply" and submit "updated" registrant information.
"My attorney is in the process of filing DMCA complaints against a competitor's website which has duplicated all our content. In terms of the DMCA stuff, I'm not worried since it should be a slam dunk case."
The only problem with filing a DMCA with Google is that they allow a counter notification to be filed, which means they throw the ball back in your court. If the infringer files the counter, he is basically checking you out to see if you are willing to pursue it in federal court (very very very expensive), or just bluffing. For this reason, it may be best to check out the registrar this thief used. See if you can file a DMCA with the registrar, because if you can, he'll lose his domain name, which would be the most effective way to flyswat him.
Sorry to hear about your dilemma. I've experienced the same issue before with a thief in the same genre as ours.
Make sure all duplicated content is removed from your index page and important pages. As someone alluded to earlier, a SITE rarely ever incurs a full penalty; usually only the page(s) that show duplicate content are affected, and it is a filter.
Only in extreme cases have I seen a site-wide penalty applied, and that is rare.
In terms of RSS, I only syndicate a teaser of the text. This helps attract the visitor to your site to finish the article, so you are on track there for sure.
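A minimal Python sketch of the teaser idea, assuming you generate the feed yourself and have each article's plain text on hand. The function name and the naive sentence splitting are illustrative assumptions; real feed software usually has its own "summary only" setting.

```python
import re


def teaser(text: str, max_sentences: int = 2) -> str:
    """Return the first few sentences of an article for the feed description."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


print(teaser("First point. Second point! The rest stays on the site."))
```

The splitting is deliberately simple and will trip on abbreviations like "e.g.", which is usually acceptable for a teaser.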
Go through Robin Good's list of the top 50 RSS directories. The list is actually a lot larger than that now. Take the time to go through and properly register at each. The return is well worth it.
Get a My Yahoo account and add your RSS feed to your My Yahoo page via the 'add content' button.
This all helps fuel your rss feeds bigtime. :)
flowermark, can you sticky me your URL and your competitor URL as well?
Thanks for all the support and information once again. I really appreciate knowing that other people have gone through this. =)
ownerrim, that sounds like a good idea. If he loses his domain name, at least it will take him a while to "age" his duplicate content somewhere else.
CainIV, I only syndicate a two-line teaser. Somehow he is able to get around that two-line limit and publish the whole thing, with pictures! That MyYahoo tip is a good one. I never really considered doing that before. Does that make Yahoo like you better?
moftary, thanks for offering to check out our problem. However, as we are engaged in possible legal actions against this infringer, our attorney has suggested that we do not share this information with others. Sorry, I know you mean to help, but you know those darn legal types. =(