If "hello" were not a stop word and you used it 10 times, that would not be duplicate content.
If you quote a sentence from another site, that is not duplicate content.
How about if you quote one page out of 100 pages? If you quote two paragraphs?
What if you quote a paragraph and your page is only a paragraph long?
What if you quote a paragraph and your page is 10,000 times as long?
I imagine you can see what I'm getting at.
The question is at what level Google applies its filter, and the answer is: I don't know!
Personally, I don't mind a copied sentence or two, as long as there is a valid link back.
Problems arise when scrapers take half a page of text from a hundred different sites.
Then they can claim that only a small percentage is taken from any given site,
even if they write virtually nothing original.
So far, the engines appear to have let this practice slide, and that's a shame. -Larry
The reason I'm asking is that my site has different versions of the same content; it just comes with the tools I use. The printer-friendly pages (which double as search-engine-friendly pages) are rarely used, and the actively used content is not that optimized for Google.
I've heard on a thread here, and have always surmised, that Google will use toolbar and user data to determine which pages are being visited and give them preference. With that thinking in mind, I would use robots.txt to disallow the SE-friendly pages. But that would be a shame for obvious reasons.
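For what it's worth, keeping crawlers out of them is only a couple of lines in robots.txt. A minimal sketch, assuming the printer/SE-friendly copies all live under a /print/ directory (your tools may put them somewhere else entirely):

User-agent: *
Disallow: /print/

The downside, as you say, is that those pages then can't rank at all, which is a shame if they're the cleaner versions.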
At the end of the day it just seems to me that Google is handing out penalties in an unfair way. Maybe they need to, I don't know, but it seems like they could just rank the answers as best they can and apply the penalty to a page within the site rather than penalizing the whole site. If that makes sense.
>Has anyone done some serious analysis into the extent of the Duplicate content filter?<
Clark - are you only referring to dup. content from other sites (scrapers, etc.) or from your own site? Google seems to think I have dup. content on my own site, but I don't think I do (non-www 301s in place and working).
Comments?
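As an aside, for anyone checking their own non-www 301s: on Apache with mod_rewrite enabled, the usual approach is a rule like the one below in .htaccess. This is only a sketch with example.com as a placeholder; your host's setup may differ.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]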
Interesting idea there. Do you have any evidence that Google recognizes dupe content that links to the original? It seems like lots of blogs quote and link to original blog posts with no problems.
The duplicate content whitepapers that have come from the Google camp over the last few years seem to indicate that it has a fairly high threshold of similarity before tripping a filter, i.e. 90% or more non-unique content, and that URLs/directory structures play a large role in this.
But of course the issue becomes the processing power needed to find and identify all the duplicates and the originals.
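Nobody outside Google knows the actual algorithm, but the standard way near-duplicate detection is described in the literature is word shingling: break each page into overlapping word n-grams and measure the overlap between pages. A toy Python sketch of the idea, purely illustrative and not Google's code; the page text and threshold are made up:

def shingles(text, n=5):
    # Break text into overlapping n-word chunks ("shingles")
    words = text.lower().split()
    return set(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def similarity(text_a, text_b, n=5):
    # Jaccard overlap of the two pages' shingle sets, from 0.0 to 1.0
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy usage with two made-up page bodies; a real filter would compare
# against some threshold, e.g. the 90% figure mentioned above
page_a = "welcome to my bread baking site we have some great ways to bake bread"
page_b = "welcome to my bread baking site here are some great ways to bake bread"
print(similarity(page_a, page_b, n=3))  # 0.5 for these two toy strings

Which also makes the point above: doing that across billions of page pairs is where the processing power goes.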
- We have a content site of about 20,000 articles.
- We have approximately 700 index pages which act as a site map to the articles. There are approximately 30 articles per page. The index pages consist of snippets (the first few lines of the article) and a link to the articles.
- The snippet used on the index page is also the snippet used for the description meta-tag in the article. Therefore, the snippet (the first few lines of the article) is used three times (a rough sketch of the markup is below):
* 1. On the index pages.
* 2. In the description meta-tag.
* 3. In the article itself.
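To make that concrete, here is roughly what it looks like in the markup. The URL, title, and snippet text are made up purely for illustration:

On one of the 700 index pages:
<p><a href="/articles/bread-101.html">Bread 101</a><br>
How to bake a basic loaf with just four ingredients and no machine...</p>

On the article page itself (/articles/bread-101.html):
<meta name="description" content="How to bake a basic loaf with just four ingredients and no machine...">
...
<p>How to bake a basic loaf with just four ingredients and no machine...</p>

So the same few lines appear verbatim in all three places.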
(PM me and I will send sample URLs showing how it is setup.)
Could this cause our site a duplicate content penalty?
We have been dead in the water since Feb. 2nd. I also spoke with a webmaster yesterday who has a similar structured site and is having problems.
Thanks.
The funny thing I have noticed is that we used to have many more of the articles in supplemental results. Over the last few weeks, things have been moving out. Is this a positive sign?
In Yahoo, we are doing great. This is the main reason I cannot go and make wild changes to the site. In Ask and MSN, we get a fair amount of traffic. Ask has picked up in the past few weeks.
Thanks.
"make sure the domain name isn't there. No links and boom, unique content."interesting idea there.
Um, I wasn't trying to give scrapers any ideas. Not that it's rocket science. They can also put a dictionary of words together and create random words on a page for unlimited content.
"Do you have any evidence that Google recognizes dupe content that links to the original? Seems like lots of blogs quote and link to original blog posts with no problems."
Evidence for a courtroom, no. Just used to notice anecdotally lots of scrapers with the same pattern. Title with a link to the original, although often a redirect in order not to pass pagerank. And lately I've noticed Google caught onto the pattern and stopped those sites. But now I've seen several where there was no link and Google did NOT catch that pattern.
Since I had an identical mission statement for every meta description and there was no quick way to rewrite them all, I added the page title to every meta description, making each one unique. I'll have to do a better job later, but with Google banning with a slash-and-burn mentality, quick and dirty solutions are a necessary expedient. Which will probably be called "gaming the system." You can't win; you can only make marginal gains this way.
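For anyone in the same boat, the quick-and-dirty fix really is just string concatenation. A rough Python sketch, with the mission statement, titles, and function name made up for the example; however your pages are generated, the idea is the same:

# Hypothetical: make every description unique by prefixing the page title
MISSION = "We help small bakeries sell their bread online."  # the old site-wide description

def unique_description(page_title, mission=MISSION, max_len=160):
    # Prefix the title so each page's description differs, then trim to a sane length
    return (page_title + " - " + mission)[:max_len]

print(unique_description("Bread 101: Baking a Basic Loaf"))

It's not pretty, but every page at least gets a different string until there's time to write proper descriptions.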
This is the first time I've heard that G uses meta descriptions but I noticed that a huge batch of very different pages were lumped together as "Supplemental Results," probably because of this incidental duplication.
I don't know about others but I do know Google is leaving me much less time to create original content because of all its secret rules about dup content.
I doubt this, but where did you hear it?
Several forum posts returned on searching "Supplemental Results." At least one was a WebmasterWorld post, though I doubt I could find it again.
I usually take things I read on forums with a grain of salt and require some other confirmation--which I seemed to get in this case.
On a search for site:mydomain.com I got three results and the message "In order to show you the most relevant results, we have omitted some entries very similar..."
Those three were one with my standard description and two others where I had departed from my standard and used a unique meta description.
I reason that if all the descriptions were unique, or at least began with unique text, more results would be shown. Well, I made that change, but it's not indexed yet, so time will tell.
I recently heard that identical meta descriptions across an entire site will be considered dup content.
When I first started my site, I did this, thinking that since the site is about "xyz", I didn't see the problem with the global meta description being "xyz". I started the site in July, ranked well in March, and got busted during the Bourbon update on May 20th. Although I did lose all traffic from Google, I was never in any supplemental results. As a precaution, I deleted those meta description tags. However, I still seem to be affected by Bourbon. Somehow I think it's over for that site.
In December, 2004 my site got hit hard by Google. At first, 90% drop in traffic, then it eased somewhat to just 75% drop. *SIGH*
People kept telling me to check for duplicate content, and that was all I could find. So, I deleted most of the description tags, and the ones I left I wrote new ones specifically describing that page. The result? My site has come back somewhat, traffic is still about 50% of what it was, but I changed nothing else.
This may have had nothing to do with it, but the SERPs were bad in Google until I made the change. Yahoo/MSN were unaffected (so far).
Also, how sad that we can't set up printer-friendly pages for people who like to just print out the article.
As far as a penalty for anything as small as a paragraph goes, that seems strange, as it is legal to quote a paragraph as long as you reference the source. Sure, I get tired of scammers doing it, but I appreciate it when an academic site does it.
The only time I've been hit by a dupe content was during Bourbon and that was a small site of mine that was 301 hijacked. In that case the whole site was brought down but is fine now. Is this always the case or sometimes are individual pages penalized without the penalty affecting the whole site?
I think that it's actually a common sense issue, and a lot of people, while they are worrying about it, could be getting on with writing original content. Not a criticism, just a suggestion.
Why would anyone want to use unoriginal content? Do what hacks do: summarise someone else's work, source it, then add your own comments. Bingo, a well-sourced original page!
Content, content, content... if your site is a bread recipe site, how many original ways are there to bake bread?
Even if I have a totally new way to bake bread, I might start my site with:
"Welcome to my bread baking site, we have some great and original ways to bake bread we're going to share with you"
How many sites might start out that way? Unless I search and read thousands of sites, I have no way of knowing whether I have duplicated or nearly duplicated someone else's opening paragraph. Should I be penalized for that?
Of course, they also use a hidden link setup for their supposed link out for their competitors and hidden keyword text. I've reported them for spamming twice. No action in 4 months. Even used the "gilligan keyword". go figure. I don't report anything anymore. Google doesn't really care. They have their favorite target algos.