If I am remembering correctly, one thing pointed out in the many hundreds of comments on this topic is that there are more than likely different variations of this penalty. Some see it BEFORE the filters and others AFTER the filters. If this is correct, it also seems that the AFTER-the-filters version tends to be more about over-optimization. The before-the-filters version seems to be something else, found during (re-)indexing (trust, PR, incoming links, dupe content, etc.). It also seems to be site-wide rather than tied to specific terms. This is just theory, though, based on experiences shared in this thread.
As for excessive navigation and keyword/phrase stuffing... If you read back, I can show you many low-popularity, low-PR, weakly linked sites that stuff away and remain where they are. If you are worried about duplicate blocks of text, also keep in mind that you will find tons of blogs and sites with duplicated article summaries, again with low PR, few links, etc.
What I have and haven't done so far (on site stuff):
1) Page titles using a different keyphrase order than the links to the page (the whole phrase can be found in both title and link, but in a different order). Page titles that omit parts of the phrases used in the links. Page titles using similar but different words than the links.
2) Navigation is now as simple as possible. There is no stuffing here and never really has been. No cross linking into unrelated sections. Also tried cross linking sections that are related in the same category - saw no difference.
3) Meta keywords...Existed and now removed. No difference. Tired of messing with them anyway.
4) Meta description...All unique (Webmaster Tools reports one short description and no duplicates). The descriptions do have the main phrase. Tried reordering and reducing occurrences in relation to what is found in title and links. I am holding off removing or rewriting them completely across the site. A few thousand descriptions is just a bit much right now.
5) Title of page matched the h1. I have tested using differing h1s and titles, again using the same phrase words as well as mixing them or keeping them completely different.
6) I do have a related article section below the articles. I haven't removed it, as it is keeping us alive with traffic flow and visitor retention. This related article section of course uses the title of each article, which is in the h1 tag of that article. Tried different variations, mixes, similar but not exact words/phrases.
7) I have tested sections with article summaries, links only, and unique descriptions. No go.
8) I have links at the bottom of my pages to TOS, Privacy, and About Us, all nofollowed and blocked by robots.txt. I don't use any other footer links...never have.
9) H1 tags are used for titles of each page. Used various tags, h2, h3, and the like but no go. I have yet to remove the tag completely.
10) Duplicate content. Never really had any complete duplicates. My PHP scripts are set up so that only one URL variation is accepted and will generate a page. Query strings aren't allowed; a 404 is sent automatically. Yes, this works properly. Took down and still fight those darn content and site thieves. Never-ending battle.
11) Duplicate content again. We do have re-published content as well as our own. Gonna work the re-published stuff out on our next step. Our own content is -950'd. Whether it be fresh or old it just gets -950'd. There is interlinking between re-published and unique content through the related articles section. Also replaced re-published with unique that was on theme using the same url (links established to the url). No go. We tried new urls for new content and established links over time. -950'd at the get go.
12) The site is done in CSS and HTML 4.01 Strict, both of which validate. With CSS I don't get tricky or use anything that would be considered spam, cloaking, black hat, or unethical. With CSS turned off, the page reads like a book with the main navigation at the bottom. This navigation may appear as footer links. Another thing I could test.
13) I do not use any cookies or sessions except in one area, a form, that is blocked by robots.txt.
14) Have tried static html pages rather than php generated pages (.html extensions). No go.
15) I know there are more things but can't think of any right now.
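For anyone wanting to try the "only one variation is accepted" rule from item 10, here is a minimal sketch of the idea in Python rather than PHP. It is not the actual script running on the site: the function name, the example paths, and the exact matching rule (case-sensitive path, any "?" rejected) are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical list of the only paths the site is willing to serve.
CANONICAL_PATHS = {"/", "/articles/blue-widgets.html"}


def status_for(url: str) -> int:
    """Return 200 only for the single accepted form of a URL, else 404."""
    # Any query string (even an empty one after "?") is refused outright.
    if "?" in url:
        return 404
    # Paths must match exactly, including letter case.
    if urlsplit(url).path not in CANONICAL_PATHS:
        return 404
    return 200
```

With this rule, /articles/blue-widgets.html is served, while the same path with ?ref=1 tacked on, or with different capitalization, gets a 404 instead of a second copy of the page.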
Anyway, this has been done over the course of 2 years. Test...wait...Test...wait...Test...wait...Test...wait. Insanity is setting in :)
Next to work on is off site stuff.
[edited by: tedster at 2:20 am (utc) on July 6, 2008]
Should I try to reduce keywords on a couple of pages and see if these pages return at all, or would I have to reduce the keywords on all pages to get out of this penalty or filter? Seeing how both of my sites were hit the same way just a year apart, I'm thinking it is a penalty.
Content is king. Added content with a good, natural navigational structure, and didn't even get any new IBLs (that I know of). I was tempted to add a link to it from another of my sites to help it out, but resisted the temptation.
On other sites my longer pages are doing really well, too.
I think you can be penalized one or two places (maybe more) for the same type of sins that get the 950 penalty. After deoptimizing my anchor text, I'm still beating one competitor who had a lock on #1 for years. (He didn't deoptimize.)
So I'm going to experiment with one old site and see if I can bump it up the rankings with deoptimized anchor text (keyword stripping). Don't need "widgets" in every text link!
There are still a ton of 950d sites and/or pages, some of which the webmasters appear to have abandoned (because they couldn't fix the problem). Some of them used the wrong software to build their sites.
Apparently it automatically creates a link from the page title, and obviously a few titles on the same topic create very spammy anchor text (a series of major keyword repeats) on the same page. It also has "tags" in addition to the links, one after the other, which just exacerbates the problem, doubling, tripling (or worse) the number of text links with the same keyword on the same page.
Even an online company owned by the NYT has many pages locked in 950 hell and once again, it has the same keywords in successive anchor text links. The NYT isn't doing so well these days, so it should put some editors to work on its internet property.
I finally ended 2008 cleaning out the "stables" on my main site after seven years to see if I could resolve the remaining 950 problem (a few single-word keywords). I got rid of as much crap as possible.
I tried to keep Google from having any excuse to penalize the site. And in the process I rebounded for three single-word keywords, with the main one going from SERP #50-something to #25 now. (It was at about #12, but now there's more competition.)
Some of these changes may have helped individually, or the sheer number of them collectively could have made the difference:
* Removed all dead links (careless editing unchecked). (Xenu used to crash on the site due to its size and I had stopped checking.)
* Replaced all 301'd links with the new target URLs.
* Removed nested/hidden empty links which Dreamweaver for some reason puts into your code without asking. (Didn't know if Google saw it as a trick to get an extra link in a page.) I think it happens, for example, when you delete a thumbnail (which is linked to a page), and then add a new thumbnail with a new target URL; Dreamweaver tacks the old URL onto the end of the code for the new link!
* CSS'd as much of the pages as I easily could. (I would have done more, and may still do more, but it means a lot of individual page editing instead of Find and Replace sitewide.)
* used background images in css as much as possible.
* Standardized all <b> to <strong> sitewide.
* Standardized all <i> to <em> sitewide.
* Removed all <strong> tags around images (incl. thumbnails).
* Removed <strong> from text on pages that could look spammy with it (esp. shorter pages), e.g., where another part of the page already had the same word tagged with <strong>. I think some webmasters would be surprised how <strong> can tip the balance on Google's view of "spam" or "not spam."
* Deleted all orphan files on computer AND server (which I'd forgotten about) that could have been infected with the 950 "virus" (old spammy text links). Note: Google keeps checking old pages long after you've forgotten about them and stopped linking to them, and may therefore consider their content in awarding penalties.
* Deleted/merged very thin pages.
* Moved virtual "orphan" pages out of thin directories to thick ones (and 301'd old pages).
* Checked and rechecked many pages for dupe anchor text and removed it if found.
* Updated all pages with new Google Analytics code.
(Left all page titles alone.)
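Checking pages for dupe anchor text by eye gets old fast on a big site. Below is a rough sketch of how that check could be scripted with nothing but Python's standard library. The class and function names are made up for this example, and treating anchors as duplicates when they match after lowercasing and whitespace cleanup is my own assumption, not anything Google has published.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the visible text of every <a href=...> link on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._buf = None  # text pieces of the link currently being read

    def handle_starttag(self, tag, attrs):
        # Only real links count; named anchors without href are skipped.
        if tag == "a" and dict(attrs).get("href") is not None:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buf is not None:
            # Normalize case and whitespace so near-identical anchors match.
            text = " ".join("".join(self._buf).split()).lower()
            if text:
                self.anchors.append(text)
            self._buf = None


def duplicate_anchor_text(html):
    """Return {anchor text: count} for anchors used more than once."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(parser.anchors)
    return {text: n for text, n in counts.items() if n > 1}
```

Feed it the HTML of one page at a time; anything it returns is a candidate for rewording or removal. Image-only links have no text and are ignored by this sketch.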
I don't know how much all that helps ranking immediately. I know it does help users somewhat. I think long term it could be very good for stability in rankings (incl. algo updates).
I love Dreamweaver for moving files--the way it automatically updates the links on every page. But it ignored some files that I'd linked using the full URL (http://www.example.com/page.html) instead of the local URL (../page.html). Xenu caught those.
In other words, if you have a page about widgets and there are 20 links about subtopics whose anchor texts include the word "widget" (such as "widget x"), is the current page you are viewing penalized for overuse of the term "widget", or does Google filter the pages you are linking to because the term "widget x" is overused in the anchor texts pointing to them?
On Jan 30th, I noticed one directory and its subpages, and only this one section, suddenly wasn't ranking. Previously almost every one of these pages would be in the top 5 of the results for its keyword. Out of nowhere, these pages are suddenly no longer being shown in the SERPS. When I do a search using inurl: the page exists in the index, but does not show organically with phrases you would think would be associated with that page. Sometimes, if I go down deep enough in the SERPS I can find the article, but most of the time I cannot.
Other things that I have found:
1. When I search for the keyword, other pages on my site, that are not as relevant as the penalized content of my site, may show up in the index and appear to be ranking properly.
2. From my experience, when I enter a specific URL as a search query in Google, that URL should appear first in the SERPS, followed by any pages that contain that URL in their text. For most of my penalized pages, the behaviour is strange. Sometimes other pages from my site that are not relevant appear, but the actual page does not appear in the SERPS. Other times, it will show pages on my site that contain the keywords from the URL I am searching for.
On Feb 25/26th, all of a sudden all my rankings were back. New content was indexed and I was the 1st search result for that keyword. Then today, I noticed all of those pages were gone again. I checked my access_logs, and from around 12 am I was no longer receiving Google referrers from search.
Does this sound like the -950 penalty?
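For anyone who wants to pin down the hour when search referrals stop, here is a hedged sketch of the access_log check. It assumes the log is in the common Apache "combined" format (referrer as the second-to-last quoted field); the function name and the rule for spotting a Google host are my own choices, and the hour is whatever timezone the server writes into the log.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout: ... [25/Feb/2009:23:59:01 -0500] "GET / HTTP/1.1" 200 512 "referrer" "agent"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def google_referrals_per_hour(lines):
    """Count hits per (day, hour) whose referrer host looks like Google."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        host = urlsplit(match.group("referrer")).hostname or ""
        # Crude test: "google" must be a whole label of the host name,
        # so www.google.com and www.google.co.uk match, notgoogle.com does not.
        if "google" in host.split("."):
            counts[(match.group("day"), match.group("hour"))] += 1
    return counts
```

Run it over the lines of the log file and look for the first hour where the count falls to zero; that gives a timestamp to compare against the moment the pages left the SERPS.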
People mention that repetitive keywords in anchor text could be a problem. In this section of the site, I have a little box showing the new articles that are posted. This box appears on every page in that section, so essentially every page links to the latest 10 articles, including the current page. In other words, the current page will contain a link back to itself. Is this what is meant by repetitive keywords in anchor text?
[edited by: Skalek at 4:42 pm (utc) on Mar. 2, 2009]
To clarify the idea of repeated keywords in anchor text causing a problem - that means the keyword anchor text is frequently repeated on the same page, and not just in various places across the entire website. It is very natural for a major keyword to appear on many pages of a website, even in anchor text.
So for example, if your menu (or even your New Articles box) includes "keyword-A" in a high percentage of the links, that sometimes seems to cause the -950 problem.
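To put a number on "a high percentage of the links", something like the following sketch could be run against the anchor texts of a single page. The function name is hypothetical, and nobody outside Google knows what percentage (if any) trips the filter, so treat the result as a way to compare your own pages with each other, not as a pass/fail test.

```python
import re


def keyword_link_share(anchor_texts, keyword):
    """Fraction of a page's anchor texts that contain the keyword.

    Matches whole words only, case-insensitively, and also accepts a
    simple trailing "s" plural (an assumption made for this sketch).
    """
    if not anchor_texts:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(keyword) + r"s?\b", re.IGNORECASE)
    hits = sum(1 for text in anchor_texts if pattern.search(text))
    return hits / len(anchor_texts)


# Example: a small menu where three of the four links repeat the keyword.
menu = ["Widget X", "Widget Y", "Blue widgets", "About us"]
share = keyword_link_share(menu, "widget")  # 0.75
```

A menu or New Articles box that pushes this share up on every page of a section is the pattern being described above.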
Matt Cutts identified the -950 as a kind of "over optimization" action in their algo. That's about all he said on it officially, though. The rest is the experiences and ideas that we are sharing.
One thing I forgot to mention with my problem is that there are some pages in this section that appear to be unaffected and rank #1 in the serps. These particular pages are ones that have been linked to by many sites, including highly authoritative ones.
The only reason why I do not think it is the yo-yo effect is because it only came back that once. Throughout the month that I have been watching this I never saw my pages going up and down in the serps. They just remained out of the serps altogether. Makes me inclined to think they are in the -950 group?
What is to stop someone from just recreating the section of their site under a new directory to get out of this penalty? Streamline it so there are not a lot of duplicate keywords, links on the same page, etc., and just make a new section and use the other as an archive? The content in this section has a shelf-life, so it would not be the end of the world if it never came back, but I could continue working under a different subdir.
We'll see where this goes.
32. The system of claim 31, wherein the means for determining a freshness associated with each of the links includes: means for determining the freshness associated with one of the links based, at least in part, on at least one of a date of appearance of the one of the links, a date of a change associated with the one of the links, a date of appearance of anchor text associated with the one of the links, a date of a change associated with the anchor text, a date of appearance of a linking document containing the one of the links, or a date of a change associated with the linking document.
Unique Words, Bigrams, Phrases in Anchor Text
 According to an implementation consistent with the principles of the invention, information regarding unique words, bigrams, and phrases in anchor text may be used to generate (or alter) a score associated with a document. For example, search engine 125 may monitor web (or link) graphs and their behavior over time and use this information for scoring, spam detection, or other purposes. Naturally developed web graphs typically involve independent decisions. Synthetically generated web graphs, which are usually indicative of an intent to spam, are based on coordinated decisions, causing the profile of growth in anchor words/bigrams/phrases to likely be relatively spiky.
 One reason for such spikiness may be the addition of a large number of identical anchors from many documents. Another possibility may be the addition of deliberately different anchors from a lot of documents. Search engine 125 may monitor the anchors and factor them into scoring a document to which their associated links point. For example, search engine 125 may cap the impact of suspect anchors on the score of the associated document. Alternatively, search engine 125 may use a continuous scale for the likelihood of synthetic generation and derive a multiplicative factor to scale the score for the document.
 In summary, search engine 125 may generate (or alter) a score associated with a document based, at least in part, on information regarding unique words, bigrams, and phrases in anchor text associated with one or more links pointing to the document.
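The patent text gives no formulas for the two options it names (capping the impact of suspect anchors, or scaling the score by a continuous likelihood of synthetic generation). Purely as an illustration of the difference between the two, here is a toy sketch. Every name, the additive boost, and the "1 minus likelihood" factor are assumptions invented for this example and say nothing about how Google actually scores anything.

```python
def capped_score(base_score, anchor_boost, cap):
    """Option 1: suspect anchors may add to the score, but only up to a cap."""
    return base_score + min(anchor_boost, cap)


def scaled_score(score, synthetic_likelihood):
    """Option 2: scale the whole score by a multiplicative factor.

    synthetic_likelihood is a value from 0.0 (looks natural) to
    1.0 (looks fully synthetic); the factor used here is 1 - likelihood.
    """
    if not 0.0 <= synthetic_likelihood <= 1.0:
        raise ValueError("synthetic_likelihood must be between 0 and 1")
    return score * (1.0 - synthetic_likelihood)
```

The practical difference: with a cap, extra identical anchors simply stop helping past a point, while with a multiplicative factor a suspicious link profile can drag down the whole score, which looks a lot more like what people in this thread describe.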
Another thing to keep in mind is anchor text on other sites pointing to you if you change titles on your pages.
 According to an implementation consistent with the principles of the invention, information relating to a manner in which anchor text changes over time may be used to generate (or alter) a score associated with a document. For example, changes over time in anchor text associated with links to a document may be used as an indication that there has been an update or even a change of focus in the document.
 Alternatively, if the content of a document changes such that it differs significantly from the anchor text associated with its back links, then the domain associated with the document may have changed significantly (completely) from a previous incarnation. This may occur when a domain expires and a different party purchases the domain. Because anchor text is often considered to be part of the document to which its associated link points, the domain may show up in search results for queries that are no longer on topic. This is an undesirable result.
 One way to address this problem is to estimate the date that a domain changed its focus. This may be done by determining a date when the text of a document changes significantly or when the text of the anchor text changes significantly. All links and/or anchor text prior to that date may then be ignored or discounted.
 The freshness of anchor text may also be used as a factor in scoring documents. The freshness of an anchor text may be determined, for example, by the date of appearance/change of the anchor text, the date of appearance/change of the link associated with the anchor text, and/or the date of appearance/change of the document to which the associated link points. The date of appearance/change of the document pointed to by the link may be a good indicator of the freshness of the anchor text based on the theory that good anchor text may go unchanged when a document gets updated if it is still relevant and good. In order to not update an anchor text's freshness from a minor edit of a tiny unrelated part of a document, each updated document may be tested for significant changes (e.g., changes to a large portion of the document or changes to many different portions of the document) and an anchor text's freshness may be updated (or not updated) accordingly.
 In summary, search engine 125 may generate (or alter) a score associated with a document based, at least in part, on information relating to a manner in which anchor text changes over time.
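The part about not updating an anchor text's freshness after a minor edit can be sketched as follows. The patent does not say how a "significant change" is measured, so the word-level diff from Python's difflib and the 20% threshold below are assumptions chosen only to make the idea concrete; the function names are hypothetical too.

```python
from datetime import date
from difflib import SequenceMatcher

# Assumed threshold: at least 20% of the words must differ to count.
SIGNIFICANT_CHANGE = 0.2


def changed_fraction(old_text, new_text):
    """Rough share of the document that changed, compared word by word."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def updated_freshness(current, old_text, new_text, crawl_date):
    """Return the new freshness date for anchor text pointing at a document.

    The date moves forward only if the target document changed
    significantly; a small unrelated edit leaves it where it was.
    """
    if changed_fraction(old_text, new_text) >= SIGNIFICANT_CHANGE:
        return crawl_date
    return current
```

So fixing a typo on a page would leave the freshness of its inbound anchor text alone, while a rewrite of most of the page would reset it to the crawl date.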
I feel this means that the spam penalties are less likely to create collateral damage. And for the webmaster, the good news is that we can relax any lock-step on keyword repetition in anchor text - the kind of thing that could get you into the penalty in the first place. Google is better at picking up relevance, even when the exact keywords are not hammered home.