Forum Moderators: Robert Charlton & goodroi
He zeroes in on a few specific areas that may be very helpful for those who suspect they have muddied the waters a bit for Google. Two of them caught my eye as being more clearly expressed than I'd ever seen in a Google communication before: boilerplate repetition, and stubs.
Minimize boilerplate repetition:
For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
If you think about this a bit, you may find that it applies to other areas of your site well beyond copyright notices. How about legal disclaimers, taglines, standard size/color/etc. information about many products, and so on? I can see how "boilerplate repetition" might easily soften the kind of sharp, distinct relevance signals that you might prefer to show about different URLs.
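As a rough illustration only (nothing from Adam's post, and the file name is my own invention), here's the kind of template helper that keeps the on-page legal text to one line and points to a single details page:

def render_footer(year=2006, owner="Example Widgets Inc."):
    # Hypothetical sketch: one short sentence on every page,
    # with the full copyright/legal text living on a single page.
    full_legal_url = "/legal.html"  # assumed location of the complete notice
    return ('<p class="footer">&copy; %d %s. '
            '<a href="%s">Copyright and legal details</a></p>'
            % (year, owner, full_legal_url))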
Avoid publishing stubs:
Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
This is the bane of the large dynamic site, especially one that has frequent updates. I know that as a user, I hate it when I click through to find one of these stub pages. Some cases might take a bit more work than others to fix, but a fix usually can be scripted. The extra work will not only help you show good things to Google, it will also make the web a better place altogether.
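For what it's worth, here is one hypothetical shape such a script could take (the field names and the noindex fallback are my own assumptions, not anything Adam recommended): only publish an indexable city page when there are real listings behind it.

def build_city_page(city, listings):
    # Hypothetical sketch of "scripting the fix" for stubs:
    # pages with zero listings either aren't generated at all,
    # or go out with a noindex so bots never see an empty template.
    if not listings:
        return render_page(city, ["No listings yet - check back soon."],
                           robots="noindex,follow")
    return render_page(city, listings, robots="index,follow")

def render_page(city, items, robots):
    lis = "".join("<li>%s</li>" % item for item in items)
    return ('<meta name="robots" content="%s">'
            '<h1>Rental opportunities in %s</h1><ul>%s</ul>'
            % (robots, city, lis))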
[edited by: tedster at 9:12 am (utc) on Dec. 19, 2006]
I disagree.
I was talking about a Wiki, not a review site. I think most of it is taken care of, at least in MediaWiki, since those links lead to a script. Though some users might type the URL directly, and a sitemap cron job would pick these up. I have never seen, for example, Wikipedia history pages or index.php?bla files in Google, so I hope the topic is moot.
But again we see that life is more complex and opinions differ.
I'm confused! One of my sites is simple .html and .css for style, with an extensive menu structure. Blocks and blocks of duplicate content in this menu structure, repeated on every page of the 150 or so pages. Is this menu, repeating on every page, duplicate content? All I'm doing is giving the site visitors a simple, repeating structure to navigate the site.
That's a very good point Calicochris.
I am no authority on this, but I think duplicate CONTENT relates only to the stuff outside <> tags.
In other words, structural templates are probably not penalized/filtered, even if they represent the majority of the total "content".
However I would like Adam to confirm this.
[edited by: activeco at 4:03 pm (utc) on Dec. 19, 2006]
I was talking about a Wiki not a review site.
And I was talking about "stubs" in the broader context of this discussion (as the term was used by Adam Lasnik), so maybe we're discussing apples and oranges. Still, in an era when Wikis have been acquired by corporations and turned into profit-making enterprises, the temptation to create Wiki stubs for "long tail" search referrals may be too great to resist--which wouldn't be good news for users or for search engines.
And I was talking about "stubs" in the broader context of this discussion (as the term was used by Adam Lasnik), so maybe we're discussing apples and oranges.
When you quote me directly and say I disagree, then I assume you disagree. ;)
Wikis, forums and so on are widely used software. Surely it would all be easier with basic HTML, but technology has moved on beyond the abysmal FrontPage kind of site. There are a gazillion kinds of technologies out there that make publishing easier and more accessible. The potential for abuse does not mean abuse. A situation where only the search engines are allowed to use technology and the rest of the world has to carve their letters into stone is undesirable.
It's the old argument: shoot everyone to prevent crime, or live with reality. Treating everyone as a suspect until proven innocent is a prehistoric strategy.
Adam, I have also placed a noindex on over half of my pages; many were relatively thin, but I did the same to many others just in case. Those pages with noindex, although they still reside on my server, do not count as far as Google is concerned, correct?
Thanks,
I'm debating whether to send a reinclusion request or wait for Googlebot to sort it out....
Adam,
Since Google has made this a public FYI now, why not build something like a "Duplicate Content" threshold meter into your webmaster toolkit? That would at least automate things like identifying what is boilerplate and what is not, and eliminate the "what if" questions and all the millions of potential scenarios that webmasters are now scratching their heads about.
This needs to be read again by "the powers that be" that lurk these forums.
Google, if you are TRULY interested in "communicating" with webmasters...not the public relations scare tactics you currently call communications,
APPLYING THE ABOVE RECOMMENDATION IS HOW YOU COMMUNICATE IN A USEFUL WAY THAT MAKES YOUR JOBS AND OUR JOBS EASIER.
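Until or unless Google ever builds such a meter, a webmaster can approximate one. Purely as a hypothetical sketch (the shingle size and the "appears on most other pages" rule are arbitrary choices of mine, not anything Google has published), this estimates what share of a page's text is shared with the rest of the site:

import re

def shingles(text, n=8):
    # Break text into overlapping n-word chunks for comparison.
    words = re.findall(r"\w+", text.lower())
    return set(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def boilerplate_ratio(page_text, other_pages, n=8):
    # Fraction of this page's shingles that also appear on most other pages.
    page = shingles(page_text, n)
    if not page:
        return 0.0
    threshold = len(other_pages) // 2 + 1
    counts = dict((s, 0) for s in page)
    for other in other_pages:
        for s in page & shingles(other, n):
            counts[s] += 1
    shared = sum(1 for c in counts.values() if c >= threshold)
    return float(shared) / len(page)

What ratio is "safe" is anyone's guess; treat the output as a rough warning light, not a Google metric.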
Avoid publishing stubs:
Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
[edited by: RonnieG at 7:06 pm (utc) on Dec. 19, 2006]
Thus, without writing the same description in lots of different ways for each product (which isn't practical or even sensible), you are left with "boilerplate" content. At least, that's how it sounds like Google will see it, although the user is likely to benefit from having the products organised clearly into categories where relevant.
The idea of duplicate content on the same website causing a penalty to rankings doesn't sound fair to implement until the algo can automatically work out how the data on the site is categorised. Not easy I'm sure, but...
Cheers
Simsi
[edited by: Simsi at 7:30 pm (utc) on Dec. 19, 2006]
I always see someone posting about the magic cure "noindex" tag.
The problem is that it kills lots of long-tail searches. For example, if you have a blue widget page that is similar to a red widget page and you use the noindex tag on the red widget page, you will not be found in the SERPs for "red widget", even though people do search with those long-tail keywords.
So using noindex on pages that deserve to be indexed, just to make Google happy, looks like a bad practice to me, unless you are willing to accept just a small portion of the traffic you deserve.
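One compromise some people try (just an illustration; the 50-word cut-off is a number I made up, not any Google rule) is to apply noindex only to genuinely thin pages, so a red widget page with its own description or reviews stays indexable for those long-tail searches:

MIN_UNIQUE_WORDS = 50  # arbitrary cut-off chosen for this example

def robots_meta(unique_text, review_count=0):
    # noindex only when the page has essentially nothing of its own.
    has_substance = (len(unique_text.split()) >= MIN_UNIQUE_WORDS
                     or review_count > 0)
    content = "index,follow" if has_substance else "noindex,follow"
    return '<meta name="robots" content="%s">' % content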
I have yet to run into issues such as these, and I'm going to chalk it up to using as many variables as I can from the database to make the page unique. Changing a word here and there isn't going to work. The meaning of that page needs to change. That means a top-to-bottom use of variables that break the Boilerplate mold.
I truly believe that the structure of the page is a determining factor in the Boilerplate discussion.
We ran pages with 3-4 navigation alternatives per page because we thought the user would like the various options presented this way, in drop-down menus and on-page link navigation.
The result is 3-4 times the quantity of similar content. This has gotta be a red hot issue.
The question is: is it? [ 99% likely IMO ] And if so, how do we structure it appropriately? Maybe it's time for REM scripts to take the repeated info off the page.
When you add in stubs, then not only is there potential for a page filter, there is also the high risk of a site-wide filter tipping out one's entire site, or allowing sporadic results to appear on fewer pages.
We are seeing this. On the site: tool, Google in effect says "we see all your pages, but only these are worth listing", and even then they are filtered out of the way. If you have a high PR you might be less affected, but ours are PR 5 and 6 and still having issues. Best to get it right in the first place.
Good points, pageoneresults - but I sense we know that 80% "boilerplate" or something similar on stubs will cause Google to throw its hands up and say "too similar". We wouldn't repeat visible content like this, would we?
Try 15% or less IMO
A search analysis of the top 5 results on key terms revealed that not one of our competitors had boilerplate pages. They restricted themselves to one drop-down [ or none ], which varied on every page. All of their menu-driven pages [ elsewhere ] were geared for SEM, not SEO.
How on earth could we miss something so obvious!
[edited by: Whitey at 8:48 pm (utc) on Dec. 19, 2006]
On the other hand: you gave an example with text snippets of - I guess - less than 300 characters per product. Do you really think it necessary to design a separate page for each product with just these 300 characters as unique content? An alternative would be (as I said) to group these products together and present a "view large image" link. The long-tail argument, as you use it, doesn't work: if your customers are searching for those unique phrases, the phrases will also occur on pages with several products, and Google will index them there too. Even better: the key phrases will automatically be repeated, which a search engine would expect if they also occur in the meta tags.
I admit: if I intended to buy some jewelry online, I'd probably expect the site owner to put some care into his presentation. Maybe twelve watches or rings, each worth several k, might look a bit strange grouped on one page. But then I thought: if you stick to the "one product, one page" concept, why not add some poetry to each page? Unique content, the product deserves it, flattered customers, and there's probably quite a number of poor poets out there looking for a job.
As Saint-Exupéry's "little prince" once put it: "It's the amount of time you spent with your rose which makes this rose so important." I guess the same holds true for HTML pages in the eyes of a search engine.
Lawman might chime in, but if I recall correctly there have been cases where a disclaimer in a single location was not sufficient, no matter how the links were constructed.
So, the question is - do I reduce risk, but damage opportunity, or vice versa?
It would be great if we had something like this...
<noindex></noindex> Those who are really good at this type of stuff are serving one page to the visitor and another to the bot so this issue of boilerplate is a moot point for them. ;)
Why? Google won't tell you, anyway.
> I truly believe that the structure of the page is a determining factor in the Boilerplate discussion.
I believe that structural diversification of the SITE is the best antidote against tanking. And probably Google is looking at this from a site perspective: if you have the same - let's say - link footer on every page, this may be viewed as boilerplate. But if your whole site comprises three, four or even more completely different structures, each of which deserves a different footer, the percentage of duplication is automatically diminished sitewide.
Maybe Google has the means to find out how many scripts you probably wrote to generate your x thousand pages. Each script generates a different structure. In Google's eyes, the importance of your site, the care you took for your visitors, is only partially defined by the number of pages (i.e. the complexity of your database), but also - if not mainly - by the number and complexity of the scripts you wrote. And the amount of time you spent on (writing, not tinkering with) these scripts.
On the other hand, I would expect Google to be able to identify this "boilerplate" on a site basis and just discount it from the page while still seeing the unique content. Technically this would not be too hard (and I expect that they are doing it this way).
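Nobody outside Google knows whether or how they actually do this, but as a back-of-the-envelope illustration of the idea, detecting site-level boilerplate can be as simple as dropping any block of text that shows up on more than half of the pages before judging what remains:

def strip_boilerplate(pages):
    # pages: list of pages, each a list of text blocks (menus, footers, paragraphs).
    # Any block appearing on more than half the pages is treated as boilerplate.
    counts = {}
    for page in pages:
        for block in set(page):
            counts[block] = counts.get(block, 0) + 1
    cutoff = len(pages) / 2.0
    boilerplate = set(b for b, c in counts.items() if c > cutoff)
    return [[b for b in page if b not in boilerplate] for page in pages]

site = [
    ["Shared footer text", "Unique article about blue widgets"],
    ["Shared footer text", "Unique article about red widgets"],
    ["Shared footer text", "Unique article about green widgets"],
]
print(strip_boilerplate(site))  # footers dropped, unique paragraphs kept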
I have some concerns about this and am very curious about the SEO and duplicate content filter impact on this design decision.
Should I be worried?!?!?
Every page on my site offers a static menu at the top and generic site links in the footer. That's nearly 1500 pages of duplicate content? Nope, not at all. I use separate pages to elaborate on some basic information - my About page, contact info, site policies. To my knowledge I have not triggered any dup content filters, except where I mentioned earlier about some pages going supplemental. My site certainly is not penalized in any way, as I'm still at the top of the SERPs. It would be foolish for Google to consider such things as dup content, UNLESS that's all I have on page after page (can you say stub?). Take a look at this site (WebmasterWorld) - each page is built around a template with identical info in the same place on each page. Just like mine, or vice versa.
A few other things may help to reduce the chance of pages looking alike to G-bot. Good page design includes proper use of meta tags. Keywords, descriptions and page titles should reflect the page content. This thread offers a few other really good gems about how to reduce the duplication while still displaying the content on the page. (I'm almost ready to hire myself a poet and put a couple of hundred pages back in the SERPs.)
It takes a little imagination to describe different products that are essentially the same. How many ways can you say a bottle is about 10 inches tall and made from plastic? I personally think a lot of duplicate content on the web is intentional, created by lazy people with no imagination or ambition other than to create more useless content. I know that my own boilerplate pages suffer from a dearth of content, and so they probably look very similar to an algo, even though they are uniquely different to a set of human eyes.
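Picking up the earlier point about pulling as many variables as possible from the database, here's a loose illustration (the field names are invented for the example) of assembling each description from several attributes so the pages differ in substance, not just a word here and there:

def describe(product):
    # Build the visible description from several database fields
    # instead of one canned sentence repeated across the catalog.
    parts = [
        "%s in %s" % (product["name"], product["color"]),
        "about %s inches tall" % product["height_in"],
        "made from %s" % product["material"],
    ]
    if product.get("use"):
        parts.append("designed for %s" % product["use"])
    return ", ".join(parts) + "."

print(describe({"name": "Squeeze bottle", "color": "blue", "height_in": 10,
                "material": "plastic", "use": "condiments"}))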
"What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries."
We syndicate a version of a forum we own for mobile users. It is identical content on two separate domain names (one is a .mobi), but the purpose is clearly not to generate better search placement (i.e. not malicious); it is to serve users interested in accessing our content through their phones. The way I read the statement above, it doesn't seem that this would be a problem, but it is definitely not totally clear. Anyone have any thoughts? Any help would be appreciated.
Some of us here know how to cloak. Some of us here know how to make appropriate iframes, meta tags and such.
The majority of website owners do not know how to do these things. That majority installs a default package, makes minor look-and-feel changes, and starts adding content. The menus, disclaimers, headers and footers are the same across all these sites.
At what percentage does Google begin to punish such duplicate content? (Rhetorical question - I know they will not answer.)
I think this hit me massively once, when I had a huge related-articles include which displayed the same text and links on a hundred pages of my site. I cut it short, to just ten links on ten pages only, and the rankings came back.
Now was that a duplicate content penalty too? I think so, but would like to know your opinion.
Another site boasts of being the best travel forum for this place: it has zero posts, so quite how this makes it the best, I dunno.
Another offers travel deals there, yet there are no hotels, etc., for miles.
Bah! Humbug!
At least, as far as I could be bothered checking the search results, I'm not seeing a page for a flower shop there (even though there are no such shops for miles). I have seen them for some other small places.
I've emailed Google about pages like these; as this thread shows, they remain commonplace - and Google is even encouraging their creation.
Not so much a personal gripe - I have a page at the top of the results (it's a really, really small place!) - but there are pages on this place, with info and photos, yet they are jumbled in with the stubs, so Google's results are not a boon to users.
We see Google advising webmasters about making sites that work for users.
Google could likewise better help users.
First Vanessa gave Rand a video interview on the topic, and now, soon after that, Adam has given us a more detailed blog post. He's even sharing some useful vocabulary to help further our discussion and comprehension.
You can tell where at least part of the search quality emphasis is right now at Google. So this current focus might also be a bit of a storm warning for the wise. It's happened before. The way I see it, public statements don't just emerge from a vacuum.