Something is now becoming clear, and I think it's time we put away the name "950 Penalty". The first people to notice this were the heaviest hit, and 950 was a descriptive name for those instances.
But thanks to the community here, the many examples shared in our monthly "Google SERP Changes" threads and in the "950 Penalty" threads themselves, we now can see a clearer pattern. The demotion can be by almost any amount, small or large -- or it even might mean removal from the SERP altogether.
It's not exactly an "OOP" and it's not the "End of Results" penalty. From the examples I've seen, it's definitely not an "MSSA Penalty" -- as humorous as that idea is. (Please use Google to find that acronym's definition.)
It's also not just a Local Rank phenomenon, although there are definitely some similarities. What it seems to be is some kind of "Phrase-Based Reranking" - possibly related (we're still probing here) to the Spam Detection Patent [webmasterworld.com] invented by Googler Anna Lynn Patterson.
So let's continue scrutinizing this new critter - we may not yet have it nailed, but I'm pretty sure we're closer. The discussion continues:
[edited by: tedster at 9:18 pm (utc) on Feb. 27, 2008]
Sometimes the pages can do better; for a search on the eight words in the title, a page might turn up in the 400s.
The penalty seems most often directory-wide. I had one directory recover yesterday, 54 pages now all back to number one or two, while another directory gets hit, with 152 pages dropping from the top ten to 950. Heh, unfortunately I saw the good news first, then checked and saw the bad news a minute later...
You're pointing here to a directory-wide phenomenon that hasn't been touched on all that much. Do you have any more observations on that? Might these cases be related to phrases in the menu labels within the directory, do you think?
Trustworthiness of the site's navigation is, imho, the key. Many factors can trip that analysis, which is probably made offline and then applied according to a search phrase at run time.
In one case I know of, the signs of this problem disappeared with one solid new inbound link from a very different domain, with the problematic phrase used as the anchor text. By "very different" I mean the linking domain was not in the 1,000 results for the given search.
Multiple index based information retrieval system [appft1.uspto.gov]
Down in part b) about ranking documents based on anchor phrases (which is part of it, not all), this relates to the possibility of the IBL helping to pull a site out:
 The product value here is a score of how topical anchor phrase Q is to document D. This score is here called the "inbound score component." This product effectively weights the current document D's related bit vector by the related bit vectors of anchor phrases in the referencing document R. If the referencing documents R themselves are related to the query phrase Q (and thus, have a higher valued related phrase bit vector), then this increases the significance of the current document D score. The body hit score and the anchor hit score are then combined to create the document score, as described above.
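To make the quoted passage a bit more concrete, here is a toy Python sketch. Everything in it is an assumption for illustration only: the bit vectors are plain integers, the names `inbound_score_component` and `document_score` are made up, and adding the body and anchor scores together is a guess, since the quote only says the two are "combined".

```python
def inbound_score_component(doc_related_bits: int, ref_related_bits: int) -> int:
    """Toy version of the 'product' in the quote: document D's related-phrase
    bit vector, weighted by referencing document R's related-phrase bit vector."""
    return doc_related_bits * ref_related_bits


def document_score(body_hit_score: int, doc_related_bits: int, referrer_bits: list[int]) -> int:
    # Anchor hit score: sum of the inbound components over all referencing docs.
    anchor_hit_score = sum(
        inbound_score_component(doc_related_bits, r) for r in referrer_bits
    )
    # Assumption: the two scores are combined by simple addition.
    return body_hit_score + anchor_hit_score


doc_bits = 0b1010                                       # hypothetical vector for D
weak_referrer = document_score(5, doc_bits, [0b0001])    # R barely related to Q
strong_referrer = document_score(5, doc_bits, [0b1100])  # R strongly related to Q
print(weak_referrer, strong_referrer)
```

The only point of the sketch is the direction of the effect: a link from a referrer that is itself strongly related to the phrase lifts the score far more than a link from an unrelated one.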
Thinking beyond phrases, in terms of taxonomies, the terms ranked for by the linking out site would be a few levels up in the ODP category tree. That makes it "related" enough for me, even though not specifically.
Plus, there are IBLs to this (linking out to D) site from other sites, on the specific topic of the one in question, that DO rank well for the specific topic and those have been ranking well for related keywords for several years. Remember the two_hops_ back link thing we were discussing a few years ago? I don't think it's gone away, and what it does is establish a "chain of relevancy."
This was an experiment, and at this point I'm kind of curious what would happen if that link were taken down. That would make it an even more interesting experiment. ;)
I really need to add something. The linked_to site recovered 2-3 days after the link was put up. Coincidence? Maybe. But at that point, the cache date for my page - and it showed that link - was the date for the day before the other site popped out. The cache date then reverted back to the older date and showed the older cached page.
[edited by: Marcia at 11:11 am (utc) on Feb. 8, 2007]
Then start another thread, and let people wanting to talk about your thing talk about one thing, even if you have a different opinion.
"Might these cases be related to phrases in the menu labels within the directory, do you think?"
No. It's the best example of why the penalty isn't strictly word-based, as the pages below it in a directory can be about anything and even can be tested by creating totally innocent pages titled as Pink Leftist Zebras in Nebraska that will likely be close to the last result.
MHes makes a good point. It makes perfect sense for there to be collateral damage if secondary pages get a lot of their link power from a page in the 950 dumpster. The secondary pages might see drops, or even themselves get a related ranking drop (like losing 12 scoring points for having five links or more from -950 pages).
Sure it can be a phrase-based penalty and drop pages down to 900+ at query time. Theoretically anyway, unless the papers explaining it are incorrect.
I have 2 sites in the same niche; both do well on Google local. The main one does extremely well; it has highly optimised pages (title, H1, internal linking etc etc) and definitely copped a phrase penalty in late December, for quite a few uncompetitive keyword strings.
The main site has not been changed in over 8 months. The home page is ranking top 5 on keyword keyword but the actual informational pages for many phrases ... gone, & unoptimised terms are still ranking really well on the same pages. So not a page penalty.
The funny thing is that the other site, which is not highly optimised on page or off (well, it has had the basics done) but has no external link work done on it, is now doing really well on the phrases the other lost.
My guess is over-optimised internal links, combined with on-page (now) over-optimisation; throw in some on-theme deep external recips and there you have the recipe for "Phrase Based Re-ranking".
I have changed the site nav and some on page factors to do a test today. I think that 2 months is long enough to think that this could stick.
Of course open to revision at any time and all feedback appreciated.
[edited by: Interent_Yogi at 12:45 pm (utc) on Feb. 8, 2007]
The spam recognition theory where the SERPs are recalculated seems to make sense to me, I just don't think it's "spam" in the normal definition of spam, since I'm not finding spammy words or phrases on the pages that were hit. It's an overoptimization of some kind, but I honestly believe it isn't necessarily on the pages impacted, I think it's other factors on the site that cause some pages to drop.
I base this on the fact that I really didn't change all that much on the actual pages I've been watching, as they just didn't seem to be overoptimized. I took the search terms out of two outbound links on one page, and used a slang term for the keywords a couple of times to lower the number of times those terms appeared, but that was about it. The text on the page read fine. I did make changes to other pages higher up that were in a position to pass on more or less PR to the pages hit. I've also had two other people look at some of the pages that were hit, and they saw no evidence of overoptimization on those pages, but made some recommendations on pages that were higher up the chain.
If we're relating all this to the patent, I personally feel that one of the key phrases of note in there is "predictive phrase":
Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon. For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs" or "top of the morning," "out of the blue," since idioms and colloquisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power).
A lot of scraped content out there is basically incomplete, i.e. jumbled-together snippets centred around the keyword/phrase being targeted.
So whilst a snippet might contain a targeted phrase, it doesn't (or only rarely does) also contain another "expected" phrase in "support" of it.
This may also affect any sites (or pages) with RSS feeds or (merchant et al.) data-feeds, which tend to be incomplete.
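For anyone who wants to play with the "predictive phrase" idea, here is a rough Python sketch. It is my own simplification and not the patent's actual method: it just compares how often two phrases really appear together with how often they would if they were unrelated. The function name `cooccurrence_ratio` and the six-document toy corpus are made up for illustration.

```python
def cooccurrence_ratio(docs: list[str], phrase: str, other: str) -> float:
    """Observed co-occurrence rate of two phrases divided by the rate
    expected if they appeared independently. Well above 1 suggests that
    `phrase` 'predicts' `other`; at or below 1 suggests it does not."""
    n = len(docs)
    has_phrase = [phrase in d.lower() for d in docs]
    has_other = [other in d.lower() for d in docs]
    expected = (sum(has_phrase) / n) * (sum(has_other) / n)
    if expected == 0:
        return 0.0
    observed = sum(a and b for a, b in zip(has_phrase, has_other)) / n
    return observed / expected


# Hypothetical toy corpus, lower-cased at match time.
docs = [
    "The President of the United States, George Bush, spoke today.",
    "Bill Clinton was President of the United States.",
    "He fell down the stairs out of the blue.",
    "George Bush met Bill Clinton.",
    "Out of the blue, a recipe for soup.",
    "Soup recipe ideas.",
]

predictive = cooccurrence_ratio(docs, "president of the united states", "george bush")
idiom = cooccurrence_ratio(docs, "out of the blue", "george bush")
print(predictive, idiom)
```

On a corpus this small the numbers mean very little, but a scraped snippet that carries the target phrase without any of its "expected" companions would show up in the same way: lots of hits for the phrase, none for the phrases it should predict.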
PS... That patent has got to be the worst one I've had the unfortunate pleasure to have read - the spelling mistakes, grammar and sentence structure are appalling for a patent. I'm almost inclined to believe it's deliberate for the purposes of obfuscation. They should be denied the patent on that basis IMO.
possibly related (we're still probing here) to the Spam Detection Patent invented by Googler Anna Lynn Patterson.
If this concept is the root cause of the present results I'm seeing, it's certainly a very big departure from Google's "signals of quality" as we have historically come to know them.
Bottom line, over the past 60 days the quality of the results is not as good as it used to be. Lots of good, useful sites (unique and older) have gone deep, and lots of weaker sites (scraped content, poor fundamental building blocks and design) have risen.
Perhaps there's some refinement coming.
1. Phrase-based or directory-based with some other trigger, or ...?
2. End of results in every case or not always?
Since WebmasterWorld's name for this thing is spreading around the larger SEO community, I want to be sure we don't create another frustrating label, like "sandbox" or "nofollow" or "reinclusion request" ;)
At some point, about 5 weeks ago, the page was ranking about #6 for its primary phrase. Then it disappeared. About 2 weeks later we discovered that a developer had accidentally closed off the </body> and </html> tags before the navigation area (it was last in code order). So we fixed that immediately, and about 2-3 days later the ranking came back. We assumed that it came back because it now had internal linking power... perhaps wrong. Perhaps, in retrospect, it was ranking because the internal anchor had been removed, since after we fixed the navigation it went back into oblivion (about 6-7 days later).
I've just removed it again (not through that method obviously). I'll post back if anything interesting happens.
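If anyone wants to check their own templates for the same slip, here is a minimal Python sketch using only the standard library's `html.parser`. It flags links that appear after a closing </body> or </html> tag. The class and function names and the sample markup are made up for illustration, and whether Google really treats such links differently is only a guess; the sketch just detects the markup mistake.

```python
from html.parser import HTMLParser


class LinksAfterBodyClose(HTMLParser):
    """Collects hrefs of <a> tags that appear after </body> or </html>."""

    def __init__(self):
        super().__init__()
        self.closed = False
        self.found = []

    def handle_endtag(self, tag):
        # HTMLParser reports tag names in lower case.
        if tag in ("body", "html"):
            self.closed = True

    def handle_starttag(self, tag, attrs):
        if self.closed and tag == "a":
            self.found.append(dict(attrs).get("href"))


def find_stray_links(html: str) -> list:
    parser = LinksAfterBodyClose()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical pages: navigation placed after vs. before the closing tags.
broken = ("<html><body><p>Blue widgets</p></body></html>"
          "<div><a href='/blue-widgets.html'>blue widgets</a></div>")
fixed = ("<html><body><p>Blue widgets</p>"
         "<div><a href='/blue-widgets.html'>blue widgets</a></div></body></html>")

print(find_stray_links(broken))
print(find_stray_links(fixed))
```

Browsers quietly repair this kind of markup, which is probably why nobody noticed for two weeks.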
page - www.sitename.com/blue-widgets.html
inbound link with anchor text "blue widgets"
A search for blue widgets will put us in the top 20 results. A search for blue widgets sitename will put us around 90.
Our sitename is unique, so there is definitely some kind of penalty applied to this page. IBLs will help, but only for the exact anchor text.
Our penalty is also 100% directory based. Directories may jump in and out of this penalty randomly on any given data refresh. When the penalty is lifted - a search for blue widgets will put us in the top 5 results. A search for blue widgets sitename will put us #1, a search for blue will put us top 10.
Another interesting coincidence is that all our problems started on June 27, 2006, one day before the "Detecting spam documents in a phrase based information retrieval system" patent was filed.
Is it a "950 Penalty"? or is it Phrase Based Re-ranking?
Reminder: There are over 100 factors in the algo (algos, actually - plural). For as far back as I can remember, with every single major Google update or upheaval I've seen for years, people have been looking for that "thing" that's going on - and it never has been just one thing, not ever. Not this, not the "sandbox effect," not Dominic, and not Florida.
Brett found and bumped this old thread [webmasterworld.com] shortly after the Florida debacle:
Stemming and keyword "families" [webmasterworld.com]
hmmm... wonder why. When Brett says "ahem" the right response is "wassup?"
The engines have been striving for contextual relevancy for years, and the efforts evolve and get more sophisticated as time goes on. But that still isn't the one thing, as there never is. Still there's no denying that context is being looked at, with more efforts in that direction all the time. Related phrases are part of keyword families and the whole idea is to extract concepts programmatically to present more relevant results. Concepts and themes are a lot harder to fake than simple on-page factors and massive link acquisition.
Brett also appeared a bit agitated after a long thread about "Hilltop" being that "thing" (which it wasn't, but it made very good link-bait) - and in that thread posted a link to Jon Kleinberg's classic paper. Still a valid paper, after 9 years.
Nothing new is ever brand new in itself, but adds on to what's gone before, which doesn't all somehow magically go away. It never has been just one thing and still isn't, so there's really no room for disagreement if people are seeing different things. There are different things, as always.
I don't have the problem you mention with keyword + sitename. My problems started on the 20th of December. Saying that, traffic is actually rising and AdSense income as well, so I don't actually know anymore what I am complaining about.
I definitely don't have the full 950 penalty but the fluctuating 150-200 penalty, which could be a result, as some have mentioned, of inbound links from other sites to my own getting the 950 demotion.
1. Not all pages are affected.
2. pages are not "over-optimised" whatever that is given we don't really have a true yardstick of the boundaries.
3. some pages had dup content which is being fixed, the titles and descs were quite similar and these have all been fixed.
4. internal pages with anchor text "anchor anchor1 anchor2" rank better for the search "anchor anchor1 anchor2" than the actual page that should.
5. 3 sites on same server are mildly interlinked where appropriate Topic A Site 1 linked to Topic A site 2
6. .htaccess redirects http://example.com to http://www.example.com
7. Site over 5 years old - niche content
8. Reciprocal links pages are present - and why not?
9. URIs are descriptive but not overly - ie no kw-kw1-kw2-kw3.htm
10. Same with folders
11. Some affiliate links.
12. Affiliate content not present, rewritten in my style!
13. Absolute URIs used in links
and the list goes on...
[edited by: tedster at 7:11 pm (utc) on Feb. 8, 2007]
[edit reason] use example.com [/edit]
1. Phrase-based or directory-based with some other trigger, or ...?
ANSWER: On my site, I had pages of similar subject matter, in the same directory, ranking at #1 and at 950 (or 755 out of 790 results, this is a niche subject, so not always highly competitive, which also rules out highly competitive terms)
I could use a 2 or 3 word search term and the page was at the bottom of the results, but if I added another word it would bounce to #1!
2. End of results in every case or not always?
ANSWER: In my case, the impacted pages were always within the bottom 20 results, and they did fluctuate within that small range, so they didn't just get dumped there and forgotten, there was obviously some calculation happening.
I also saw other sites that regularly show up in the Top 10 for that search term move in and out of the bottom of the results as well.
Our issue is with 2 and 3 word search phrases.
We were on the first page of serps, usually in position 1-3. Now we're all over the place #133, #86, #151.
Our site is PR5, about 4 years old.
Our home page which ranked about #7-#9 on our most popular 3 word key phrase for the last two years is now at #131. Content is all original and not over SEOed. I have reduced keyword density to see if that makes any difference.
Similar situation happened on 12/17/06, but bounced back within 2 days.
This current drop was noticed on 2/3/07.
Now, we all know when Google serves up results that Google first determines the language, country and the browser type from the searcher. That is evident via the referral typically:
Notice the US, en and the Firefox in the referral. Now, the majority of the sites that have been hit with the -950, -500, etc. penalty do not display properly on each type of browser.
So instead of Google keeping one index for Firefox and one for Internet Explorer, Google is filtering out sites with very poor HTML and hitting them with some sort of penalty.
That is where being W3C compliant or close to being compliant can pull you out of the penalty. I myself have witnessed two sites pull out of the -950 by becoming W3C compliant. The only changes to their sites were becoming compliant. They came out of the penalty in less than a week.
This is only one reason for a penalty, but it applies to a lot of sites. Tedster, bad coding on a site that is penalized falls under the MSSA penalty even though you think it does not exist. ;-)
[edited by: trinorthlighting at 8:23 pm (utc) on Feb. 8, 2007]
Same with my site, hit Jan. 20th, completely URL based. Was around the 30 - 40 mark for 1 week, now down to the 70s.
I think what most likely caused this penalty was a combination of problems with the site (anchor text, over opt.). When it first hit in Nov and December it was directory related but all pages returned to #1 and #2 within 1 week or so (some pages I de-optimized). But now it seems more site-wide... and not moving other than down.
So maybe a group of pages triggered a major flag and this time it penalized the entire site.
Supplier told us to drop the site and move on to one that is not penalized! Just hate giving up 6 years worth of work.