I am generating a number of cross-reference pages from a 'product' database, and this results in pages that are almost identical, because the products themselves are already very similar.
Maybe GoogleGuy can help
If, for example, only a single word is changed, the page is no longer identical, just very, very similar, but I bet Google still treats it as identical. How much needs to change before Google treats it as a page in its own right and not a duplicate?
Between identical and similar there is a grey area to cross, but who knows how wide that grey area is?
So similarity (about 80%) works for me, but if you duplicate 1:1, you could be in trouble....
I seem to recall reading that these do not draw a penalty simply because they're wrapped in different designs and navigation schemes.
It would certainly help if we could gain greater clarity concerning what is and what isn't regarded as page duplication.
It sometimes seems that, in our obsession with Google's rules, we get too caught up in the theoretical and ignore what we actually see in front of us every day.
Most times when I search on Google I find duplicate content. And, depending on the topic, it can be duplicated on quite highly-placed pages. Whether reprints of articles, extracts of dissertations, or historical documents, it's all there plain to see. The snippets are often much the same.
Just did a search for a popular market newsletter by a well-known web positioning company (this ain't no plug). Most of the pages returned are identical or real close to it. And I found the first PR0 result somewhere around page 40 of the SERPs.
Now, I'm not saying that Google doesn't penalize for duplicate content, but they don't appear to penalize ALL duplicate content.
Don't know what the threshold is: almost-identical doorway pages on the same site; identical pages on related sites (interlinked? same server?); one identical page on multiple domains might be okay, but two aren't? Who the heck knows?
I'm in the process of designing a site optimized for Google and -- no matter what I say above -- though the underlying content is the same, it won't read the same or look the same.
Jim
Nope.. the thread was posted on another site entirely....
"Soapystar so tell us how good we are, were the answers the same on the other forum?"
this forum kinda beats the other thread out of sight.... ;)
BTW, someone did post some info about search engines looking for a minimum of 8-13% difference between pages...
I am guessing that Google must be a little bit smarter than just counting word frequencies, and also takes into account structural similarities. There is info available on the web about page-similarity algorithms, and I was hoping someone would have converted one of them into a tool for all of us to use.
But since we don't know which algorithm Google uses, nor the magic similarity percentage that triggers a penalty, such a tool would be of limited use anyway.
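For anyone curious what such a check might look like, here is a rough sketch of one well-known approach from the literature: w-shingling compared with the Jaccard coefficient. To be clear, this is just an illustration of the general technique, not Google's actual algorithm (nobody outside Google knows what they use), and the 0.80 threshold is only the ~80% figure floated earlier in this thread:

    import re

    def shingles(text, w=4):
        # Normalise: lowercase and keep only word characters.
        words = re.findall(r"[a-z0-9]+", text.lower())
        # Each run of w consecutive words is one shingle.
        return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

    def similarity(text_a, text_b, w=4):
        # Jaccard coefficient: shared shingles / total distinct shingles.
        a, b = shingles(text_a, w), shingles(text_b, w)
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    page_a = "The red widget is a sturdy widget, ideal for home use."
    page_b = "The blue widget is a sturdy widget, ideal for home use."
    score = similarity(page_a, page_b)
    print(score, "near-duplicate" if score > 0.80 else "distinct")

One practical note: on a real site you would want to strip the shared template first, otherwise the common navigation and boilerplate would dominate the score.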
Near duplication can cause problems.
A year ago, Google definitely rolled out an overly aggressive duplication filter, but I haven't seen any signs over the last 8 or 9 months that that filter is still being used.
There is simply too much naturally occurring duplicate content on the web.
However, going after near duplication is a different story. A large group of pages that are very close to being identical probably look that way because someone has intentionally altered them so that they wouldn't be exact duplicates. And those are the kinds of actions that generally contribute to a poor search experience.
So, you should make sure that your pages are either exact duplicates or significantly different.
Also, did GoogleGuy mention any percentages when talking about 'slightly' or 'significantly' different?
Thanks again,
S.
That being the case, it is quite common for Googlebot to stumble across some of these additional domains. That causes them to reindex a site that they've already crawled under a different domain.
>>Also, did GoogleGuy mention any percentages when talking about 'slightly' or 'significantly' different?
Of course not. :)
I think that what Google and other SEs would be looking for is not duplicate pages between or among websites; as WebGuerrilla states, much of that occurs naturally (syndicated reports, affiliate product descriptions, press releases, etc.).
What they are most probably after are the "doorway" pages that are essentially the same with just a tweak here or there to appeal to different SEs or target slightly different keyword phrases. There was a huge push on this a few years back when SEO wannabes were pumping out 10s & 100s of these pages per site.
Look at your pages. If they look, feel or smell spammy, well, then they might pose a problem. If they serve a legitimate purpose you're probably okay.
And again, just in your normal searches of Google, notice how many dupe pages are indexed and rank quite highly.
Jim
Let's say I have some pages, <page.html>, <page1.html>, etc., where people register for my services. The trouble is that lots of people find registration confusing. They lose track of where they are.
So I am trying to create an alternative experience for my visitors whereby they can choose to be led through a series of interconnected pages with special navigation guides. The material on these interconnected pages is identical to <page.html>, <page1.html>, except for the navigation aids. (They might be named <pagea.html>, <pagea1.html>, etc.)
These pages would all exist on the same server, in the same root directory. Is this the sort of "duplication" that I could be penalized for?