Google SEO News and Discussion Forum

Adam Lasnik on Duplicate Content
tedster

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 6:06 am on Dec 19, 2006 (gmt 0)

Google's Adam Lasnik has made a clarifying post about duplicate content on the official Google Webmaster blog [googlewebmastercentral.blogspot.com].

He zeroes in on a few specific areas that may be very helpful for those who suspect they have muddied the waters a bit for Google. Two of them caught my eye as being more clearly expressed than I'd ever seen in a Google communication before: boilerplate repetition, and stubs.

Minimize boilerplate repetition:
For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.

If you think about this a bit, you may find it applies to other areas of your site well beyond copyright notices: legal disclaimers, taglines, standard size/color/etc. information about many products, and so on. I can see how "boilerplate repetition" might easily soften the kind of sharp, distinct relevance signals that you'd prefer each URL to show.

Avoid publishing stubs:
Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.

This is the bane of the large dynamic site, especially one that has frequent updates. I know that as a user, I hate it when I click through to find one of these stub pages. Some cases might take a bit more work than others to fix, but a fix usually can be scripted. The extra work will not only help you show good things to Google, it will also make the web a better place altogether.
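As a rough illustration of the kind of scripted fix tedster mentions, here is a minimal Python sketch: only publish (and allow indexing of) a generated listings page when it actually has content. The function and argument names are hypothetical placeholders, not part of any real CMS.

```python
# Sketch only: skip or noindex generated pages that have no real content.
# get_listings and render_page stand in for whatever your own system provides.

def build_city_page(city, get_listings, render_page):
    listings = get_listings(city)

    if not listings:
        # Zero listings: don't publish the URL at all (or publish it with a
        # robots noindex) so users and bots never land on a stub.
        return None

    # Real content exists: render and allow indexing as normal.
    return render_page(city, listings, robots="index,follow")
```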

[edited by: tedster at 9:12 am (utc) on Dec. 19, 2006]

 

CainIV

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 6:30 am on Dec 19, 2006 (gmt 0)

Interesting, especially the boilerplate part. All of my article pages of 100% unique content contain a snippet of text about me - about 40 words in total.

There are about 3-4 different variations of the text.

I wonder what the threshold is in terms of dupe content for that resource area.

BeeDeeDubbleU

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 9:09 am on Dec 19, 2006 (gmt 0)

I know that as a user, I hate it when I click through to find one of these stub pages.

Me too! Google would be doing us all a favour if they filtered these sites out.

walkman



 
Msg#: 3192967 posted 9:34 am on Dec 19, 2006 (gmt 0)

would including, let's say, a menu as javascript work?

idolw

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3192967 posted 9:48 am on Dec 19, 2006 (gmt 0)

Use TLDs: To help us serve the most appropriate version of a document, use top level domains whenever possible to handle country-specific content. We're more likely to know that .de indicates Germany-focused content, for instance, than /de or de.example.com.

Does that mean I should make a German version of my main site on www.mysite.de?

Adam_Lasnik

5+ Year Member



 
Msg#: 3192967 posted 9:55 am on Dec 19, 2006 (gmt 0)

Interesting, especially the boilerplate part. All of my article pages of 100% unique content contain a snippet of text about me - about 40 words in total.

I wouldn't worry about a 40 word snippet of that sort, unless it's the primary content on many of your pages.

would including, let's say, a menu as javascript work?

Walkman, I'm not quite sure I'm grokking what you're asking. Could you please clarify? ah, heck, I'll take a related stab anyway:
"javascript menus"... if this means site navigation that's broken without javascript, well, that indicates a potentially significant user experience problem, and that should generally trump any SEO-related concerns!

Does that mean I should make a German version of my main site on www.mysite.de?

If you already have German language content that's indexed / ranked decently in search engines, then I'd hesitate starting over, but otherwise yeah, I think putting German language content (or, more specifically, Germany-audience-targeted content) on a .de domain is a good idea.

On that note... ack! I gotta eventually reset this nightowl schedule! I'll stop back here again tomorrow, er, later today :-)

Whitey

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3192967 posted 10:05 am on Dec 19, 2006 (gmt 0)

Great clarification Adam .... thanks.

I presume this applies to page templates with a substantial amount of similar content and functionality [e.g. search boxes with lots of similar menus] as well, where the ratio of this to unique text is high.

[edited by: Whitey at 10:06 am (utc) on Dec. 19, 2006]

glengara

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 10:12 am on Dec 19, 2006 (gmt 0)

Could be bad news for sites that use manufacturer/product-type drop-down options; these lists can turn up in the text-only cache, giving a large chunk of "boilerplate" content.

Whitey

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3192967 posted 10:18 am on Dec 19, 2006 (gmt 0)

Adam ... I look forward to waking up in our time zone to these comments! This could mean a quick fix is needed for tens of thousands of sites.

Crush

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 10:19 am on Dec 19, 2006 (gmt 0)

Adam is the new GoogleGuy. Post more stuff like this, Adam.

fjpapaleo

10+ Year Member



 
Msg#: 3192967 posted 10:26 am on Dec 19, 2006 (gmt 0)

Finally, something of substance from Google. Thank you Adam.

asher02

5+ Year Member



 
Msg#: 3192967 posted 10:41 am on Dec 19, 2006 (gmt 0)

Hi Adam,

What is the best practice for an ecommerce web site that has similar products that differ only in color, for example?

The best user experience is to assign a page to each, with its own product photo; the descriptions, however, will look similar to Google.

How can we avoid duplicate filters in this scenario?

Receptional

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 10:45 am on Dec 19, 2006 (gmt 0)

However, we prefer to focus on filtering rather than ranking adjustments ... so in the vast majority of cases, the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index.

I have heard Matt also say this before, and Paul Ahaare (spelling?) and I disagree. Or rather, I think you say this from your perspective, but that perspective is not the way we'd see it.

I think that if you filter one dupe URL and include the other, the damage is significantly more than showing the wrong URL, because humans will be linking to both versions of the content and will therefore be splitting the content's reputation between the two URLs. Regardless of whether you filter one out or not, the point is that the content - duped or not - is not getting its "fair share" of reputation, because there is no attempt to pass the dropped reputation to the URL perceived as genuine.

The biggest example of this is, of course, www vs non-www URLs, but those can easily be fixed by the webmaster. Presumably, through good design, most other problems can also be mitigated through 301s. But don't tell me that
the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index.
unless my assumption here is wrong. I don't think it is, do you?
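To make the www vs non-www case concrete, here is a minimal sketch of the 301 consolidation Receptional alludes to, written as plain Python WSGI middleware. The host name and the middleware itself are illustrative assumptions, not anyone's actual setup.

```python
# Sketch only: answer on one canonical host and 301 everything else to it,
# so links (and their reputation) accumulate on a single URL.

CANONICAL_HOST = "www.example.com"  # illustrative hostname

def canonical_host_middleware(app):
    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host and host.lower() != CANONICAL_HOST:
            location = "http://" + CANONICAL_HOST + environ.get("PATH_INFO", "/")
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            # Permanent redirect: tells crawlers the canonical URL.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)
    return wrapper
```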

Crush

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 10:50 am on Dec 19, 2006 (gmt 0)

OK here is one for you Adam.

Years ago we had site A for city A, site B for city B, and so on.

We linked them together and got a crosslinking penalty.

Now we have one mega site in loads of languages on one URL. The URL does not look good in all languages, and we would have preferred to cross-link

site.com, site.de, site.fr, site.es, etc., all with different content. Are you saying a crosslinking penalty would no longer apply?

Leosghost

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 10:55 am on Dec 19, 2006 (gmt 0)

If you already have German language content that's indexed / ranked decently in search engines, then I'd hesitate starting over, but otherwise yeah, I think putting German language content (or, more specifically, Germany-audience-targeted content) on a .de domain is a good idea.

Another G policy which favours only the big guys. How is a mom-and-pop site or a small business supposed to put its content aimed at, for example, French or Australian audiences on those TLDs? Both of those require you to be either a citizen of those countries or to have a physical business presence (in France that will cost you around $7,000.00 to $10,000.00 per year for the physical presence, or $2,500.00 followed by $7,000.00 per year in charges and fees to have a French lawyer do it for you, even if you make not one red centime of turnover).

So in G's eyes these two countries, at least, should have closed internet economies where (due to their TLD requirements) only businesses and citizens of said countries should expect to be found in their SERPs.

You, G (with your 85% market share here), don't want the rest of the world's websites to be visible from here? You don't want us to buy from elsewhere, or get our information in a language other than French?

This kind of heavy-handed geotargeting goes against the free flow of goods, services and information that the internet was supposed to be promoting. Google does not know best what the surfer wants. If we don't choose regional results, then don't force them on us, either directly via the SERPs or by weighting TLDs towards languages.

Already the results from "the rest of the world" are heavily skewed in France to send me back predominantly French-language results based on the IP number allocated by my French ISP. If I want French-language results I'll say so by using the radio buttons to choose them (respect that choice and stop filtering even "web" to weight it towards France). Quit trying to second-guess me and the other million or so English mother-tongue speakers, and the other million or so anglophones here, who would like to be able to choose results from the world web and actually get just that.

Oliver Henniges

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 11:33 am on Dec 19, 2006 (gmt 0)

What is the best practice for an ecommerce web site that has similar products that differ only in color, for example?

The best user experience is to assign a page to each, with its own product photo; the descriptions, however, will look similar to Google.

I doubt that, and I believe it has been Google policy for years now to derank sites with thousands of single product pages that differ only in minor respects.

As a visitor, I would very much appreciate it if the site owner grouped those products, so that I can either see all the different colors of those otherwise identical products on one page, or see all the different products of one specific color on one page (the latter a strategy I have used quite successfully for years now), or - as a third alternative - let me as a user choose which categorization to apply.

It is a completely different matter for products which require a technically detailed description, e.g. electronics, cars, or machines: here, as a user, I would expect the "one product, one page" scenario (though again with concise grouping at folder level). But for shops with thousands of similar products I would expect the site owner to help me a bit with his specific product knowledge and pre-group the mass for me.
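A rough sketch of the grouping Oliver describes: collapse variants that differ only by colour onto one page per base product, instead of one thin page per colour. The product data below is invented purely for illustration.

```python
# Sketch only: group colour-only variants under a single base-product page.
from collections import defaultdict

products = [
    {"name": "wooden widget", "colour": "red"},
    {"name": "wooden widget", "colour": "blue"},
    {"name": "steel widget", "colour": "blue"},
]

pages = defaultdict(list)
for product in products:
    pages[product["name"]].append(product["colour"])

for name, colours in pages.items():
    # One URL per base product, listing every available colour on that page.
    print(name, "->", ", ".join(colours))
```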

This is simply Google's law: concentrate on the user, and all else will follow.

It is so easy nowadays to import thousands of products via an affiliate CSV into a database and let your scripts generate thousands of pages. But you can't seriously expect any visitor to type "blue round cylindrical wooden widget" into the search box and be happy that you are the one who designed a special page for that item.

This is a complete misunderstanding of the long-tail-concept.

Particularly if dozens of other affiliate partners use the same CSV data.

From the very beginning, I have put considerable effort into structuring my product pages in a reasonable manner. This virtual counterpart of tidying up my b&m store took a lot of my time. Many others, who cared much more about backlinks and big databases, sometimes performed better for a short while, but so far I have survived every shift of the algos and my site performs better than ever. Knocking on wood.

For me as a searcher it is only logical that pages differing only in "red" and "blue" with several kilobytes of otherwise identical content (like links or other boilerplate) will not rank well in Google. I like that. Thanks for picking up this topic, tedster.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 11:45 am on Dec 19, 2006 (gmt 0)

For me as a searcher it is only logical that pages differing only in "red" and "blue" with several kilobytes of otherwise identical content (like links or other boilerplate) will not rank well in Google. I like that. Thanks for picking up this topic, tedster.

Hmmm, those sure sound like WPG Gateway Pages, don't they? ;)


leadegroot

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 11:47 am on Dec 19, 2006 (gmt 0)

Crush:
No, he was talking about a translation of a page not being a duplicate of the original.
He didn't say it was ok to crosslink those multiple sites once you've created them...
Different question.

Myself, I am curious whether they recognise the HTTP language-negotiation convention - you know, where you create index.en.html and index.de.html (IIRC) and browsers that have a preferred language set should grab the appropriate version.
That's a neat, underused solution, and I wonder if Google honours it too!
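What leadegroot is describing is HTTP language negotiation - the convention Apache's MultiViews uses with index.en.html / index.de.html and the browser's Accept-Language header. Below is a deliberately simplified Python sketch of the server-side choice; it ignores q-value ordering and the file names are just examples.

```python
# Sketch only: pick a language variant of a page from Accept-Language.
def pick_language_variant(accept_language, available=("en", "de")):
    # e.g. accept_language = "de-DE,de;q=0.9,en;q=0.7"
    for part in accept_language.split(","):
        lang = part.split(";")[0].strip().lower()[:2]
        if lang in available:
            return "index.%s.html" % lang
    # Fall back to the first available language if nothing matches.
    return "index.%s.html" % available[0]

print(pick_language_variant("de-DE,de;q=0.9,en;q=0.7"))  # index.de.html
```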

asher02

5+ Year Member



 
Msg#: 3192967 posted 11:59 am on Dec 19, 2006 (gmt 0)

"For me as a searcher it is only logical that pages differing only in "red" and "blue" with several kByte of otherwise identical content (like links or other boiler plates) will not rank well in google."

I think you are so wrong.

I'm in the gift industry and in order to close a sale I need to show my customers the actual product with a very good picture.

So I have my category page with thumbnails of the products, and a unique page for each product.

In my industry, "hey, I have this one also in red" does not work; you have to show the customer the exact product they want to buy. The only other option is to put all the similar products on one page and end up with a 120k page that no one will wait for.

The other problem is similar products with a small difference in design: one will be a "dove widget" and the other a "Jerusalem widget". Both use the same materials and the same artist, just a different inner design, so basically the only difference in the description is one sentence about the motif, but the products are not the same.

I'll go even further: we offer jewelry with the same motif but different designs, and we have over 30 of these. I'll give an example from my site:

product 1 description: "This pretty pendant is handcrafted from 925 sterling silver and opal stone. The opal is set into the middle of the silver to create the shape of a Magen David. This beautiful pendant makes an amazing gift for your friends and family. "

Product 2:"This attractive handcrafted Magen David pendant is made of opal stone and 925 sterling silver. The opal stone is fitted in the middle of this silver Magen David shaped pendant. This beautiful pendant makes an amazing gift for your friends and family.

These 2 products do not look the same at all, each one of them is unique and deserve a page of its own. However the description uses the same keywords. We changed it a bit to avoid duplicate filter not because it has to be this way.

The second products is now supplemental as well as many others from the same kind despite the fact that each one of them is a unique product.

I can go on & on but I guess you get the picture.

Now, if it is just a duplicate filter then I don't mind; my concern is a duplicate penalty that affects the other portions of my site.

swa66

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3192967 posted 12:16 pm on Dec 19, 2006 (gmt 0)

FWIW: those under the impression that a language maps to a country (and vice versa) are very wrong.

E.g. German is spoken in Germany (.de), Austria (.at), Switzerland (.ch), Belgium (.be).

Countries like Belgium and Switzerland have multiple official languages in active use.

I hope Google realizes that asking people to add language specific information on their own ccTLDs will increase duplication of content, not decrease it.

Getting hold of *all* ccTLDs is not easy at all, and getting the last one might be a costly affair if you run into a ccTLD that doesn't have an easy way to deal with a domain squatter sitting on your domain name.

There is one case where a ccTLD makes sense:
- if you target specific products to specific countries.

But in general as a user I hate it that the same shop is there a handful of times, all willing to ship to me, all carrying the same products.

I'm fine with each ccTLD version of Google searching by preference within its own ccTLD. But I hate the geotargeting used on google.com, based on visitor IP, to force users onto a ccTLD version of the search engine.

Worst of them all is MSN, though: they use geotargeting of the website's IP address to demote a site about 40 positions in the SERPs if it's hosted outside the US. As long as Google stays away from that, I'll play with them.

Edwin

10+ Year Member



 
Msg#: 3192967 posted 12:20 pm on Dec 19, 2006 (gmt 0)

For instances where a detailed page is needed for every product, wouldn't a "noindex" tag on each near-duplicate product page help matters?

That way, instead of seeing perhaps half a dozen or more "identical" pages, the search engine would see one category page with several on-topic phrases on it (the links, and perhaps associated descriptions, for each of the products that differ only by colour, or size, or whatever).

Sure, that means fewer pages indexed overall, but that might increase the average perceived quality of the site too?
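A small sketch of Edwin's noindex idea: keep the detailed per-variant pages for users, but emit a robots noindex on the near-duplicates so only the category page (or one representative page) gets indexed. The is_colour_variant flag is a hypothetical field in your own catalogue data.

```python
# Sketch only: choose the robots meta tag per product page.
def robots_meta(product):
    if product.get("is_colour_variant"):
        # Near-duplicate variant: keep it out of the index but let bots follow links.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'

print(robots_meta({"sku": "SWTR-GRN", "is_colour_variant": True}))
```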

idolw

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3192967 posted 1:04 pm on Dec 19, 2006 (gmt 0)

If you already have German language content that's indexed / ranked decently in search engines, then I'd hesitate starting over, but otherwise yeah, I think putting German language content (or, more specifically, Germany-audience-targeted content) on a .de domain is a good idea.

Hmm, so with each translation I should start my work all over again? If I put a new language on an old domain it should rank immediately; with a new domain I need to wait a year to get past the sandbox effect.
I have heard of sites disappearing from Google SERPs after putting a structural copy of their English site, in another language, on another domain.
Can you confirm this has now been solved and won't happen again, even if the sites are heavily cross-linked? (Yes, let's recognise that pages in different languages should be cross-linked with their other-language versions to give the USER maximum satisfaction. In fact, in most cases it should happen on each and every page of the site.)

smells so good

5+ Year Member



 
Msg#: 3192967 posted 1:18 pm on Dec 19, 2006 (gmt 0)

For instances where a detailed page is needed for every product, wouldn't a "noindex" tag on each near-duplicate product page help matters?

This was my solution. With a single widget in literally tens of thousands of varieties, I have one page that lists all of the basic widget versions, each listing linked to a boilerplate page that offers all of the possible combinations. Initially I set all of them up to be indexed. After some time I began to see those pages in Supplemental results so I made a change to the boilerplate. Now, only the top 50 widgets are indexed, the remaining are noindex. I keep track of the widget popularity in a database, based on actual sales, and the database determines what pages are indexed.

The bottom line for me is that I want my customers to "see" the widget of interest. It's important to my customers, so it's important to me. A B&M analogy might run like this: a customer is looking into my storefront window (G Search) and can see the top widgets. Coming into the store, they can inquire about similar widgets (my list page), and I can go into the back room and pull something off the shelf if someone wants to see it (my "noindex" boilerplate page). Obviously there is too much clutter in my storefront window if I display all of the widgets (Google's quality concerns), so I'll only show the best items there (my SERPs).

I have done this same thing with several products that I sell, but not all of them. I know what sells, I know what ads people click, and so I can presume to know that I want those items indexed. How can any search engine, without intimate knowledge of my business, presume to know better?
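A tiny sketch of the sales-driven selection smells so good describes: actual sales data decides which widget pages carry an index tag and which are noindexed. The sales mapping and the cutoff (50 in the post, 2 here for the toy data) are placeholders for whatever the real database holds.

```python
# Sketch only: pick the top-selling SKUs to keep in the index.
def indexable_skus(sales_by_sku, top_n=50):
    ranked = sorted(sales_by_sku, key=sales_by_sku.get, reverse=True)
    return set(ranked[:top_n])

sales_by_sku = {"W-001": 930, "W-002": 4, "W-003": 212}
allowed = indexable_skus(sales_by_sku, top_n=2)
print(allowed)  # these SKUs get index,follow; everything else gets noindex
```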

contentwithcontent

5+ Year Member



 
Msg#: 3192967 posted 1:49 pm on Dec 19, 2006 (gmt 0)

smells so good,

That is a wonderful analogy.

puchscooter

5+ Year Member



 
Msg#: 3192967 posted 1:58 pm on Dec 19, 2006 (gmt 0)

I presume this applies to page templates with a substantial amount of similar content and functionality [e.g. search boxes with lots of similar menus] as well, where the ratio of this to unique text is high.

Great thread, and thanks for the post, Adam. This is something I have been seeing for a while, and it relates to two topics being discussed here. I have a lot of pages dealing with cities within a country, each with search boxes containing lots of repeated city names.

First off, G indexes the pages as being in a foreign language, though all the city name spellings are in English, and they rank well for my main site, with no drops over the past 8 months (though the same re-indexing as a foreign language on a smaller site with fewer inbound links almost dropped it out of the SERPs entirely).

The second issue is that some of these lists take up half the page or more (and the greater the saturation in the dropdowns, the more pages they are repeated on). I haven't seen any movement for quite some time, and I have been debating whether or not to redesign the menus onto another, noindexed page. This info may be the catalyst to get it done, though I am wondering how much weight is really going to be put on English rankings for sites indexed as being in French, German, or Italian. Anyone else have the same thing going on?

MrSpeed

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3192967 posted 2:21 pm on Dec 19, 2006 (gmt 0)

However, we prefer to focus on filtering rather than ranking adjustments

What the heck does this mean? Is a ranking adjustment a penalty? Is this the -30 penalty?

And I hope that no one thinks that "Minimize boilerplate repetition" refers to HTML, navigation, etc. ... so don't even go there.

It is easier than ever for people to create content on the web. The fact is, if you don't want to create different 300-word descriptions for white Irish knit sweaters versus green Irish knit sweaters, then someone else will.

[edited by: MrSpeed at 2:24 pm (utc) on Dec. 19, 2006]

calicochris

5+ Year Member



 
Msg#: 3192967 posted 2:55 pm on Dec 19, 2006 (gmt 0)

I'm confused! One of my sites is simple .html with .css for style and an extensive menu structure - blocks and blocks of duplicated content in this menu, repeated on every one of the 150 or so pages.

Is this menu, repeated on every page, duplicate content? All I'm doing is giving site visitors a simple, consistent structure for navigating the site.

mattg3

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3192967 posted 2:55 pm on Dec 19, 2006 (gmt 0)

Does Google just ignore stubs, or is there a penalty? A wiki without stubs is no wiki. :\ Ignoring them is no problem, but a penalty would, I think, be counterproductive.

bwnbwn

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3192967 posted 2:56 pm on Dec 19, 2006 (gmt 0)

I agree with this statement, but I have a serious issue with the way duplicate content is affecting ecommerce sites that have a product with 10 different flavors.

Say Muscle Milk by CytoSport has 18 flavors. OK, so I put one product page up and add the other flavors; this seems to be the approach Google favors.

OK, so I buy chocolate, but the ingredients listed on the page are for vanilla. I get the product and find I can't use it due to something different in the chocolate or another flavor.

I want my money back, or worse yet, I take it and get really sick due to an ingredient my site doesn't show. I am liable, plain and simple.

So I have a product page for each flavor.

Here is my question: Google has pretty much whacked all the product pages for duplicate content, since there isn't a lot of difference between the flavors, just a few extra ingredients. So what would you do?

1 - Take the product description off the flavor pages, since they are mostly the same; keep one description and put just the active ingredients on the other pages.

2 - Do as all the other sites do: one description and one set of supplement facts for all flavors, showing the supplement facts for vanilla only.

It isn't right that this is considered duplicate content, as this information should be part of any site.

If I go to a store to buy something, I check what is in it, as I am sure you do as well. Why should the internet be any different, or why should search engines push for this to be dropped from the search results?
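One possible reading of option 1, sketched below: keep the shared marketing description short, and make each flavor's own supplement facts the main unique text on its page. All data here is invented for illustration.

```python
# Sketch only: per-flavor pages where the ingredient list is the unique content.
shared_blurb = "Short shared blurb, linking to the full product copy."
flavors = {
    "vanilla": ["whey protein", "vanilla extract"],
    "chocolate": ["whey protein", "cocoa", "soy lecithin"],
}

for flavor, ingredients in flavors.items():
    # Each flavor page leads with its own ingredient list - the part that
    # genuinely differs and that buyers (and liability) depend on.
    print(flavor, "page:", ", ".join(ingredients), "|", shared_blurb)
```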

europeforvisitors



 
Msg#: 3192967 posted 3:16 pm on Dec 19, 2006 (gmt 0)

Does Google just ignore stubs, or is there a penalty? A wiki without stubs is no wiki. :\ Ignoring them is no problem, but a penalty would, I think, be counterproductive.

I disagree. In two of the sectors that I'm familiar with (travel and electronics), there are a number of huge players that have thousands (millions?) of keyword-driven, computer-generated stub pages containing nothing but "Add a review," some ads, and maybe some price-comparison links. If such pages are merely ignored by Google and other SEs, the sites have no incentive to refrain from spitting out stub pages for every conceivable keyword. What's more, the success of their spit-out-a-million-stubs approach just encourages every Tom, Dick, and Harry to try the same strategy. A "quality score" that took the ratio of real pages to stub pages into account would discourage such "If it's Tuesday, let's spit out a million empty pages about Belgium" shenanigans.
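This is not how Google scores anything, but a site owner could audit their own ratio along the lines europeforvisitors suggests. The page_item_counts mapping below is a hypothetical URL-to-listing-count table, purely for illustration.

```python
# Sketch only: what fraction of your own generated pages are empty stubs?
def stub_ratio(page_item_counts):
    if not page_item_counts:
        return 0.0
    stubs = sum(1 for count in page_item_counts.values() if count == 0)
    return stubs / len(page_item_counts)

page_item_counts = {"/brussels": 12, "/antwerp": 0, "/ghent": 3, "/bruges": 0}
print("stub ratio:", stub_ratio(page_item_counts))  # 0.5
```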
