|Google Boilerplate Patent and Page Title Keyword Repetition|
Patent applied for in March 2004 and published in 2008:
Systems and methods for analyzing boilerplate [appft1.uspto.gov]
The first time I noticed "boilerplate" issues, though I didn't know what to call them then, was during the horrendous Florida update of November 2003. At that time I suspected it with on-page text, and it was the first time I suspected that excessive use of keywords in internal anchor text could be a problem. After all the years since then, and now that others have been seeing the effect of navigation repetition issues, it kind of figures, since this patent was applied for soon after that.
But now I'm watching something happening on a particular site that has two related main 2-word terms for the homepage and what looks like a keyword-specific penalty, and the repetition is in the page titles:
Both terms are yo-yo'ing between the 20s or 30s and 350+, 450+ or 550+ (or disappearing altogether), each individually on a rotating basis.
mainkeyword1-synonym1 - #30
mainkeyword1-synonym2 - #452
mainkeyword1-synonym1 - #360
mainkeyword1-synonym2 - #29
Like that, or term-combo 1 gets totally trashed:
mainkeyword1-synonym1 - gone from index altogether, unless "clicking for similar results" and then the homepage shows up indented at the end.
mainkeyword1-synonym2 - #450+
What's an outstanding factor is that mainkeyword1-synonym1 is repeated in the page title of close to 400 pages on the site, which has about 1,500 pages total.
Also, mainkeyword1-synonym2 is repeated in the page title of about 30-40 pages; but keep in mind that mainkeyword1 is repeated in ALL, and is also included in the domain name.
Mind you, only a few of those pages turn up before clicking for more similar results using this search:
site:example.com intitle:"mainkeyword1 synonym1" or
site:example.com intitle:"mainkeyword1 synonym2"
That's a very handy diagnostic tool for finding "boilerplate" elements, wherever they may be.
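If you have a crawl of your own site, the same check can be run offline. Here's a rough sketch in Python, assuming you already have a list of page titles from somewhere. The `titles` list and the `title_phrase_share` helper are made up for illustration, and this only counts repetition; it says nothing about how Google actually measures it.

```python
def title_phrase_share(titles, phrase):
    """Return (hits, share) for titles containing the phrase.

    Matching is case-insensitive and ignores extra whitespace.
    """
    needle = " ".join(phrase.lower().split())
    hits = sum(1 for t in titles if needle in " ".join(t.lower().split()))
    share = hits / len(titles) if titles else 0.0
    return hits, share


# Hypothetical titles standing in for a real crawl.
titles = [
    "Blue Widgets - Mainkeyword1 Synonym1",
    "Mainkeyword1 Synonym1: Red Widgets",
    "Contact Us",
    "Shipping Policy",
]

hits, share = title_phrase_share(titles, "mainkeyword1 synonym1")
print(f"{hits} of {len(titles)} titles ({share:.0%}) contain the phrase")
```

For the site described above, close to 400 titles out of about 1,500 works out to roughly 27% of all page titles carrying the same phrase.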
I've seen what seem to be problems with boilerplate use of keywords in page text before, but this is specifically related to the page title element in a site that was just recently hit, and this appears to be a major factor, although there probably are other contributing factors.
Has anyone else checked for boilerplate issues in a recently penalized site, and/or is anyone else seeing a problem with excessive repetition (cannibalization) of a penalized phrase or word for a page in multiple page titles?
First, to note, I haven't had a chance to look at the patent yet.
But, a quick top-of-my-head comment on boilerplate text in titles... I've worked on a number of very large geo-targeted sites where the titles were basically all templated, with only the placename being changed, and I've seen no apparent problems. I should add that I don't know whether this would have been the case had the sites not been well linked. Sufficient inbound linking, in my experience with Google, seems to trump onsite duplication caused by templating. (The type of site might also be a factor.)
In the example you post, something else jumps out at me, though, and that's that you have a keyword followed by a synonym. I don't know whether in the example you cite it was a natural phrase, but the keyword-synonym syntax doesn't sound natural. I've seen what I felt was the co-occurrence of too many synonyms in onpage body text apparently lead to a -950 situation.
This is conjecture, though, and I've not read the patent, so I can only throw this out as an additional thought about your question.
|I've seen what I felt was the co-occurrence of too many synonyms in onpage body text apparently lead to a -950 situation. |
Sadly, I keep coming back to that one too for some ranking issues on some sites. But then I think, "surely not".
The existence of an Over-Optimisation-Filter -- like so many bad SEOs keep banging on about to explain their failings (i.e. "we were too good at our job, sorry!") -- would mean that the most comprehensive, most organised and most explanatory pages would not rank properly. That seems insane.
It is only logical to have synonyms on a page with good, extensive content. Penalising a page for having too many synonyms would mean sites like WebmasterWorld wouldn't rank well for their pages ... wouldn't it?
Now, percentage of synonyms ... that's a whole different thing that I keep my tin-foil hat on for ... ;)
|Now, percentage of synonyms ... |
In the situation I'm referring to, I should in fact have said too large a "percentage" of synonyms.
Getting back to titles, I don't know whether Google is looking separately at co-occurrence in titles and in other individual page elements (and in offpage elements... eg, in anchor text). I think it's likely that they are. If so, co-occurrence weighting in these elements would most likely be different than in body text.
The point I'm raising with regard to the OP's question is whether the problem is boilerplate text repetition, or whether it's keyword-synonym co-occurrence in the title. Or, it might be something else entirely. In this case, that something else might be inbound linking weakness, perhaps in conjunction with too much other similarity among the pages.
Not yet having read the patent, I'm not sure what points it makes to suggest that boilerplate in a title by itself might be problematic.
When there are so many penalties reported by people that are keyword (or keyphrase) specific, there's no choice but to seriously think about keyword co-occurrence [webmasterworld.com], since it's mentioned so many times in those phrase based indexing patent applications.
I'd normally never use data or information from one site as an example - except that this site is the most blatantly profuse example of repetition and boilerplate text you could ever imagine - all over the place.
The page titles aren't the only problem relevant to this issue. There's also use of a "tag line" repeated across the site - and in fact, if you do an exact match search "any phrase in quotes" from the homepage - that homepage is completely and totally filtered out of the results for every single string on the homepage, on up to 9 or 10 words in sequence in quotes.
The whole "investigation" of this issue that I'm onto started out because of the filtering for the tagline - which repeats a couple of the main keyword phrases for the site. I tested it specifically 2 years ago by substituting the text with graphics on 2 pages - and those 2 pages popped out of the "excluded" results one by one, as the graphic was substituted.
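For anyone wanting to hunt down sitewide taglines like that in their own crawl, one way is to count word sequences that show up on a large share of pages. This is only a sketch under my own assumptions: the `pages` dict (URL to visible text), the `shingles` and `sitewide_phrases` helpers, and the 4-word / 60% settings are all invented for illustration, and it is not the method described in the patent.

```python
from collections import Counter


def shingles(text, n):
    """Return the set of n-word sequences found in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def sitewide_phrases(pages, n=5, min_share=0.5):
    """Return n-word phrases that appear on at least min_share of the pages.

    pages maps URL -> visible page text. Each page counts a phrase once,
    no matter how often the phrase is repeated on that page.
    """
    counts = Counter()
    for text in pages.values():
        counts.update(shingles(text, n))
    cutoff = min_share * len(pages)
    return sorted(p for p, c in counts.items() if c >= cutoff)


# Hypothetical page texts; the tagline sits on two of the three pages.
pages = {
    "/": "mainkeyword1 synonym1 for every home welcome to our shop",
    "/widgets": "blue widgets mainkeyword1 synonym1 for every home",
    "/about": "who we are and how to reach us",
}

for phrase in sitewide_phrases(pages, n=4, min_share=0.6):
    print(phrase)
```

Anything that comes back from a run like this is a candidate for the kind of exact-match quoted search described above.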
There's also an issue with text from the page being used as title and description for just about every link the site has (including some purchased ones, sad to say).
There's been discussion about page text and navigation (where I spotted 16.8% repetition as the lethal threshold when the 950 penalty first hit); however, there hasn't been much about "boilerplate" in the page title element, other than one discussion I recall about pagination.
The synonym matter isn't really an issue - the excessive overuse and lack of supporting related phrases are an issue.
It's like (totally unrelated niche, loose example) informal dinnerware and informal dishes - kind of synonymous. For instance, instead of plastic plates the title is plastic informal dinnerware plates. Informal dinnerware: blue dinnerware dishes, on and on.
See? Sticking informal dinnerware in over and over (close to 400 times) - including a pagination issue in the shopping cart for pages (1), (2), (3), and (4). And titles repeated verbatim in the H1 of shopping cart pages, too.
I really believe that for sites with otherwise seemingly unexplained keyword-specific penalties, a site:example.com search followed by an advanced operator that includes the keyword can provide some telling clues in some cases.
Another observation: When checking the site: and intitle: search again, I see that most all the "similar pages" are showing the same alt= text back to the homepage, which includes the penalized phrase.
Doing this search:
site:example.com inanchor:"keyword1 keyword2"
There are 1,050 instances returned - just about all are that alt= anchor text from the heading graphic, within the site itself.
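The same kind of tally can be made from saved copies of your own pages. Below is a minimal sketch using Python's standard `html.parser`; the `LinkedImageAlts` class, the `alt_anchor_counts` helper and the sample markup are hypothetical, and it only counts alt text on images that sit inside links, which is my assumption about what ends up being treated as anchor text.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkedImageAlts(HTMLParser):
    """Collect alt text of <img> tags that are nested inside <a> links."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag == "img" and self.link_depth > 0:
            alt = dict(attrs).get("alt")
            if alt:
                # Normalize case and whitespace so variants count together.
                self.alts.append(" ".join(alt.lower().split()))

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1


def alt_anchor_counts(html_pages):
    """Count linked-image alt text across an iterable of HTML strings."""
    counts = Counter()
    for html in html_pages:
        parser = LinkedImageAlts()
        parser.feed(html)
        parser.close()
        counts.update(parser.alts)
    return counts


# Hypothetical markup: the heading graphic links back to the homepage.
sample_pages = [
    '<a href="/"><img src="head.gif" alt="Keyword1 Keyword2"></a>'
    '<img src="deco.gif" alt="decoration">',
    '<a href="/"><img src="head.gif" alt="keyword1  keyword2" /></a>',
]

for alt, count in alt_anchor_counts(sample_pages).most_common():
    print(count, alt)
```

If one alt string accounts for nearly every internal image link, as with the heading graphic here, it is at least worth a closer look.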
Recent analysis I've seen also shows an uptick in the importance (and touchiness) of keywords in the image alt text. I'm not sure how this would relate to the idea of "boilerplate" - you wouldn't boilerplate the same alt text into many image tags, would you? But there is definitely an algo factor at work here. It used to be less sensitive than H tags - but at least for now, H tags are tuned way down.
This is hard to explain in words without "looking" but here goes. This morning I did a search using
site:example.com intitle:"otherword1 otherword2 keyword1 keyword2"
where "keyword1 keyword2" is a penalized main site phrase. What came up for that 4 word exact match page title are paginated shopping cart pages - there are 8 of them actually, all with the repeated page title, image alt= text for the heading graphic and simple text from a drop-down box that's JS so it doesn't get indexed as links.
Only 2 pages show for that search, one having no cache; then, when clicking for similar results, "51 results" (there are actually only 8) turn up, most of which have no cache either. What shows for the snippet is all the same - the alt= text and text from the drop-down.
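Paginated pages sharing one title are easy to list from a crawl. A small sketch, assuming a dict of URL to title text; the `duplicate_titles` helper and the cart URLs are hypothetical and only there to show the idea.

```python
from collections import defaultdict


def duplicate_titles(page_titles):
    """Group URLs by normalized title and keep titles used more than once.

    page_titles maps URL -> title text. Titles are compared
    case-insensitively with whitespace collapsed.
    """
    groups = defaultdict(list)
    for url, title in page_titles.items():
        groups[" ".join(title.lower().split())].append(url)
    return {t: sorted(urls) for t, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl data for a paginated shopping cart category.
page_titles = {
    "/cart/list?page=1": "Otherword1 Otherword2 Keyword1 Keyword2",
    "/cart/list?page=2": "Otherword1 Otherword2 Keyword1 Keyword2",
    "/cart/list?page=3": "otherword1 otherword2  keyword1 keyword2",
    "/contact": "Contact Us",
}

for title, urls in duplicate_titles(page_titles).items():
    print(f"{len(urls)} pages share the title: {title}")
```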
Back in this thread from 2008:
Pagination - KW cannibalization and duplication [webmasterworld.com]
|My belief is that pagination of results is THE killer for the last 2 years or so. |
I'd have to agree with that as part of the problem, though I've seen sites with limited pagination that did have their pages properly indexed and ranking fine - but those sites don't have the fierce KW duplication sitewide in titles, and are NOT totally filtered out for a boilerplate tagline on pages. IMHO it's a combination of factors, and passing over a certain threshold, that results in a loud death rattle for a site.
|Recent analysis I've seen also shows an uptick in the importance (and touchiness) of keywords in the image alt text. |
This looks exactly like what was happening with duplicate titles and meta descriptions, except that this time it's the image alt= as the first trigger rather than the meta description.
|I'm not sure how this would relate to the idea of "boilerplate" - you wouldn't boilerplate the same alt text into many image tags, would you? |
I sure wouldn't, but some people are into a lot of keyword stuffing, sitewide taglines and repetition.
I see it as related to boilerplate because it's happening on a site that has its homepage filtered out for any "grouping of text" and has multiple instances of duplication throughout - just like boilerplate, which is repetition that, according to the patent, can get ignored for many common words. It may have expanded further than just "contact" or "home." IMHO it's closely related to duplicate or near-duplicate content filtering.
All in all, it's a combination of (potentially lethal) factors, with each element needing checking out in the event of an obvious penalty.
Here's the Google patent that relates to "similarity filtering":
Methods and apparatus for estimating similarity [patft.uspto.gov]
[edited by: Robert_Charlton at 11:07 pm (utc) on Aug. 7, 2009]
[edit reason] fixed typo [/edit]