| This 175 message thread spans 6 pages: < < 175 ( 1  3 4 5 6 ) > > || |
|Google's 950 Penalty (part 4) - or is it Phrase Based Re-ranking?|
< continued from [webmasterworld.com...] >
< related threads: -950 Quick Summary [webmasterworld.com] -- -950 Part One [webmasterworld.com] >
Something is now becoming clear, and I think it's time we put away the name "950 Penalty". The first people to notice this were the heaviest hit, and 950 was a descriptive name for those instances.
But thanks to the community here, the many examples shared in our monthly "Google SERP Changes" threads and in the "950 Penalty" threads themselves, we now can see a clearer pattern. The demotion can be by almost any amount, small or large -- or it even might mean removal from the SERP altogether.
It's not exactly an "OOP" and it's not the "End of Results" penalty. From the examples I've seen, it's definitely not an "MSSA Penalty" -- as humorous as that idea is. (Please use Google to find that acronym's definition.)
It's also not just a Local Rank phenomenon, although there are definitely some similarities. What it seems to be is some kind of "Phrase-Based Reranking" - possibly related (we're still probing here) to the Spam Detection Patent [webmasterworld.com] invented by Googler Anna Lynn Patterson.
So let's continue scrutinizing this new critter - we may not yet have it nailed, but I'm pretty sure we're closer. The discussion continues:
[edited by: tedster at 9:18 pm (utc) on Feb. 27, 2008]
steveb, I just re-read one of your posts about this from last month --
|Sometimes the pages can do better, like for searches for the eight words in the title a page might turn up in the 400s. |
The penalty seems most often directory-wide. I had one directory recover yesterday, 54 pages now all back to number one or two, while another directory gets hit, with 152 pages dropping from the top ten to 950. Heh, unfortunately I saw the good news first, then checked and saw the bad news a minute later...
You're pointing here to a directory-wide phenomenon that hasn't been touched on all that much. Do you have any more observations on that? Might these cases be related to phrases in the menu labels within the directory, do you think?
I think a page can drop to 120 because of another page which was helping it rank being hit by the 950 or being thrown out of the results completely. Thus a 120 can be related to the 950 penalty.
Trustworthiness of the site's navigation is, imho, the key. Many factors can trip that analysis, which is probably made offline and then applied according to a search phrase at run time.
|In one case I know of, the signs of this problem disappeared with one solid new inbound link from a very different domain, with the problematic phrase used as the anchor text. By "very different" I mean the linking domain was not in the 1,000 for the given search. |
Ted, the Patterson patent about a multiple index-based system also gets into "related phrase bit vectors" and some of it is along the same line of reasoning.
Multiple index based information retrieval system [appft1.uspto.gov]
Down in part b) about ranking documents based on anchor phrases (which is part of it, not all), this relates to the possibility of the IBL helping to pull a site out:
| The product value here is a score of how topical anchor phrase Q is to document D. This score is here called the "inbound score component." This product effectively weights the current document D's related bit vector by the related bit vectors of anchor phrases in the referencing document R. If the referencing documents R themselves are related to the query phrase Q (and thus, have a higher valued related phrase bit vector), then this increases the significance of the current document D score. The body hit score and the anchor hit score are then combined to create the document score, as described above. |
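To make the quoted passage a bit more concrete, here is a rough Python sketch of how an "inbound score component" could be computed. It assumes the related phrase bit vectors can be read as plain integers (more bits set for related phrases means a larger value). The function names and the 0.3/0.7 weights are hypothetical, chosen only for illustration, and nothing here is confirmed as what Google actually runs.

```python
def inbound_score(doc_bits: int, ref_bits: int) -> int:
    """Product of document D's related bit vector and the related bit
    vector of the anchor phrase in referencing document R, both read
    as integers (an assumption made for this sketch)."""
    return doc_bits * ref_bits


def anchor_hit_score(doc_bits: int, referencing_bits: list[int]) -> int:
    """Sum the inbound score components over all referencing documents."""
    return sum(inbound_score(doc_bits, r) for r in referencing_bits)


def document_score(body_hit: float, anchor_hit: float,
                   body_weight: float = 0.3,
                   anchor_weight: float = 0.7) -> float:
    """Combine body and anchor hit scores; the weights are hypothetical."""
    return body_weight * body_hit + anchor_weight * anchor_hit


# D has two related-phrase bits set for the query phrase Q.
d_bits = 0b1100
# One referencing page is strongly related to Q, the other barely.
on_topic_ref = 0b1110
off_topic_ref = 0b0001

print(inbound_score(d_bits, on_topic_ref))   # larger: R is related to Q
print(inbound_score(d_bits, off_topic_ref))  # smaller: R is mostly unrelated
```

The point of the sketch is only that a link from a page that is itself related to the query phrase multiplies up the linked page's score much more than a link from an unrelated page, which would fit the single-IBL recovery described above.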
In my experience with this, the "referencing page" is right on target and has been linking out to other topical sites for years, so it's historically relevant as well. No, the page doesn't rank for the topic but the site itself ranks #1 and #2 for the primary and secondary *main* two word phrases for the parent category for the topic - which would actually be "parent" terms for the broader category.
Thinking beyond phrases, in terms of taxonomies, the terms ranked for by the linking out site would be a few levels up in the ODP category tree. That makes it "related" enough for me, even though not specifically.
Plus, there are IBLs to this (linking out to D) site from other sites, on the specific topic of the one in question, that DO rank well for the specific topic and those have been ranking well for related keywords for several years. Remember the two_hops_ back link thing we were discussing a few years ago? I don't think it's gone away, and what it does is establish a "chain of relevancy."
This was an experiment, and at this point I'm kind of curious what would happen if that link were taken down. That would make it an even more interesting experiment. ;)
I really need to add something. The linked_to site recovered 2-3 days after the link was put up. Coincidence? Maybe. But at that point, the cache date for my page - and it showed that link - was the date for the day before the other site popped out. The cache date then reverted back to the older date and showed the older cached page.
[edited by: Marcia at 11:11 am (utc) on Feb. 8, 2007]
"But I don't see things that way anymore, steveb,"
Then start another thread, and let people wanting to talk about you thing talk about one thing, even if you have a different opinion.
"Might these cases be related to phrases in the menu labels within the directory, do you think?"
No. It's the best example of why the penalty isn't strictly word-based, as the pages below it in a directory can be about anything and even can be tested by creating totally innocent pages titled as Pink Leftist Zebras in Nebraska that will likely be close to the last result.
MHes makes a good point. It makes perfect sense for there to be collateral damage if secondary pages get a lot of their link power from a page in the 950 dumpster. The secondary pages might see drops, or even themselves get a related ranking drop (like lost 12 scoring points for having five links or more from -950 pages).
There can be pages ranking for certain phrases in the top ten, while other pages in the very same directory are 950+ for other search phrases.
Sure it can be a phrase-based penalty and drop pages down to 900+ at query time. Theoretically anyway, unless the papers explaining it are incorrect.
Marcia - "and at this point I'm kind of curious what would happen if that link were taken down. That would make it an even more interesting experiment"
Recently I believe we have caused that effect between two heavily interlinked sites.
steveb is right. The problem I was facing was not just one phrased based, it was completely URL based. One URL would not rank for ANY term regardless of what it was. It's been a long process for these pages, a few months back they had even disappeared from site: results. It was very bizarre indeed.
Yeah I have been scratching my head over this one.
I have 2 sites in the same niche; both do well on Google local. The main one does extremely well; it has highly optimised pages (title, H1, internal linking etc etc) and definitely copped a phrase penalty in late December for quite a few uncompetitive keyword strings.
The main site has not been changed in over 8 months. The home page is ranking top 5 on keyword keyword but the actual informational pages for many phrases ... gone & unoptimised terms still ranking really well on the same pages. So not a page penalty.
The funny thing is that the other site, which is not highly optimised on page or off (well, it has had the basics done) but has no external link work done on it, is now doing really well on the phrases the other lost.
My guess is over optimised internal links, combined with on page (now) overoptimisation, throw in some on theme deep external recips and there you have the recipe for "Phrase Based Re-ranking"
I have changed the site nav and some on page factors to do a test today. I think that 2 months is long enough to think that this could stick.
Of course open to revision at any time and all feedback appreciated.
[edited by: Interent_Yogi at 12:45 pm (utc) on Feb. 8, 2007]
Based on what NickOr said, we're talking about two different things here. He said he thought it was URL based, because the page didn't come up for any search term. Yet I found adding an additional word to the search term would bring my page up from #755 to #1. Yet, if you took any combination of the original two or three word search terms it was buried.
The spam recognition theory where the SERPs are recalculated seems to make sense to me, I just don't think it's "spam" in the normal definition of spam, since I'm not finding spammy words or phrases on the pages that were hit. It's an overoptimization of some kind, but I honestly believe it isn't necessarily on the pages impacted, I think it's other factors on the site that cause some pages to drop.
I base this on the fact that I really didn't change all that much on the actual pages I've been watching, as they just didn't seem to be overoptimized. I took the search terms out of two outbound links on one page, and used a slang term for the keywords a couple of times to lower the number of times those terms appeared, but that was about it. The text on the page read fine. I did make changes to other pages higher up that were in a position to pass on more or less PR to the pages hit. I've also had two other people look at some of the pages that were hit, and they saw no evidence of overoptimization on those pages, but made some recommendations on pages that were higher up the chain.
I can get behind this theory to some extent. In November, all of my rankings disappeared for all phrases (for one site). Since then, many have been improving at a steady rate, but two phrases remain in the 300-600 range. This has been feeling like a phrase-based problem to me for a while now. Now, instead of just a feeling, I'm seeing some actual thoughts by others on it. I still don't have proof, but it really "feels right", so I'm going to work actively on implementing some things discussed here to see if I can prove it to myself (and hopefully recover at the same time). So, thanks.
It's necessarily (I suppose) difficult to string together what everyone is witnessing into a coherent theory because it requires every witness to admit to actually scraping/spamming - Without that differentiation, the picture will be blurred.
If we're relating all this to the patent, I personally feel that one of the key-phrases of note in there is; "predictive phrase":
|Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon. For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs" or "top of the morning," "out of the blue," since idioms and colloquisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power). |
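One way to picture "predictive" is as a co-occurrence ratio: how often two phrases actually appear together, divided by how often they would appear together by chance. The sketch below assumes that simple actual-over-expected ratio and an illustrative threshold of 1.5. The tiny corpus, the function names and the threshold value are all made up for the example, so treat it as a reading aid rather than the patent's exact method.

```python
def information_gain(docs: list[set[str]], phrase_a: str, phrase_b: str) -> float:
    """Actual co-occurrence rate of two phrases divided by the rate
    expected if they were independent. Returns 0.0 if either is absent."""
    n = len(docs)
    has_a = sum(1 for d in docs if phrase_a in d)
    has_b = sum(1 for d in docs if phrase_b in d)
    both = sum(1 for d in docs if phrase_a in d and phrase_b in d)
    expected = (has_a / n) * (has_b / n)
    if expected == 0:
        return 0.0
    return (both / n) / expected


def is_predictive_of(docs, phrase_a, phrase_b, threshold=1.5):
    """Hypothetical cut-off: phrase_a 'predicts' phrase_b above threshold."""
    return information_gain(docs, phrase_a, phrase_b) > threshold


# Toy corpus: each document is just the set of phrases found in it.
corpus = [
    {"president of the united states", "george bush"},
    {"president of the united states", "george bush"},
    {"president of the united states", "bill clinton", "out of the blue"},
    {"out of the blue", "red widgets"},
    {"out of the blue", "cake recipe"},
    {"red widgets", "cake recipe"},
]

print(is_predictive_of(corpus, "president of the united states", "george bush"))
print(is_predictive_of(corpus, "out of the blue", "red widgets"))
```

In this toy corpus the idiom turns up alongside all sorts of unrelated phrases, so its ratio with any one of them stays near chance level, while the "President" phrase co-occurs with "George Bush" well above chance.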
A lot of scraped content out there is basically incomplete, i.e. jumbled-together snippets centred around the keyword/phrase being targeted.
So whilst a snippet might contain a targeted phrase, it doesn't (or only rarely does) also contain another "expected" phrase in "support" of it.
This may also affect any sites (or pages) with RSS feeds or (merchant et al.) data-feeds, which tend to be incomplete.
PS... That patent has got to be the worst one I've had the unfortunate pleasure to have read - the spelling mistakes, grammar and sentence structure are appalling for a patent. I'm almost inclined to believe it's deliberate for the purposes of obfuscation. They should be denied the patent on that basis IMO
Our site ranks for "quoted phrases" but is way down when searched without quotes. Is this phrase-based re-ranking / filter not applied to quoted phrase searches?
Anyone here noticed ranking top for "red widgets" and not for red widgets?
does anyone have several pages affected and not have widespread problems of those pages being scraped widely, normally by sites listed above them?
could someone please summarise in "easy english"?
|possibly related (we're still probing here) to the Spam Detection Patent invented by Googler Anna Lynn Patterson. |
If this concept is the root cause of the present results I'm seeing, it's certainly a very big departure from Google's "signals of quality" as we have historically come to know them.
Bottom line, over the past 60 days the quality of the results is not as good as it used to be. Lots of good, useful sites (unique and older) have gone deep, and lots of weaker sites (scraped content, poor fundamental building blocks and design) have risen.
Perhaps there's some refinement coming.
I've changed the subject line of this thread to reflect that this is an ongoing discussion, and not a conclusion -- sorry for jumping the gun. I would very much appreciate input on both sides:
1. Phrase-based or directory-based with some other trigger, or ...?
2. End of results in every case or not always?
Since WebmasterWorld's name for this thing is spreading around the larger SEO community, I want to be sure we don't create another frustrating label, like "sandbox" or "nofollow" or "reinclusion request" ;)
no...end of results is just one symptom of the same problem (which itself is a mix of elements)....and if you follow how it's gone, sites have fallen further and further back with the same pages....
Here's an interesting note. One site where this has been applied to just one directory has a certain phrase as navigation for that directory. Now, all the other sitewide linked directories perform perfectly well but are older than this one (this was created 2 months ago).
At some point, about 5 weeks ago the page was ranking about #6 for its primary phrase. Then, it disappeared. About 2 weeks later we discovered that a developer had accidentally closed off the </body> and </html> tags before the navigation area (it was last in code order). So, we fixed that immediately and about 2-3 days later the ranking came back. We assumed that it came back because it now had internal linking power... perhaps wrong. Perhaps, in retrospect, it was ranking based on the internal anchor being removed because after we fixed the navigation it went back into oblivion (about 6-7 days later)
I've just removed it again (not through that method obviously). I'll post back if anything interesting happens.
What we're experiencing is definitely not phrase-based, and also not necessarily end of results. The page itself is buried in the results regardless of what phrase you put in. Any inbound links we receive for the page will help, but only for the anchor text of the inbound link.
page - www.sitename.com/blue-widgets.html
inbound link with anchor text "blue widgets"
A search for blue widgets will put us in the top 20 results. A search for blue widgets sitename will put us around 90.
Our sitename is very unique, so there is definitely some kind of penalty applied to this page. IBLs will help, but only for the exact anchor text.
Our penalty is also 100% directory based. Directories may jump in and out of this penalty randomly on any given data refresh. When the penalty is lifted - a search for blue widgets will put us in the top 5 results. A search for blue widgets sitename will put us #1, a search for blue will put us top 10.
Another interesting coincidence is that all our problems started on June 27, 2006, one day before the "Detecting spam documents in a phrase based information retrieval system" patent was filed.
|Is it a "950 Penalty"? or is it Phrase Based Re-ranking? |
How about: Is it a "950 penalty" and Phrase Based Re-ranking, etc., etc., etc.
Reminder: There are over 100 factors in the algo (algos, actually - plural). For as far back as I can remember, with every single major Google update or upheaval I've seen for years, people have been looking for that "thing" that's going on - and it never has been just one thing, not ever. Not this, not the "sandbox effect," not Dominic, and not Florida.
Brett found and bumped this old thread [webmasterworld.com] shortly after the Florida debacle:
Stemming and keyword "families" [webmasterworld.com]
hmmm... wonder why. When Brett says "ahem" the right response is "wassup?"
The engines have been striving for contextual relevancy for years, and the efforts evolve and get more sophisticated as time goes on. But that still isn't the one thing, as there never is. Still there's no denying that context is being looked at, with more efforts in that direction all the time. Related phrases are part of keyword families and the whole idea is to extract concepts programmatically to present more relevant results. Concepts and themes are a lot harder to fake than simple on-page factors and massive link acquisition.
Brett also appeared a bit agitated after a long thread about "Hilltop" being that "thing" (which it wasn't, but it made very good link-bait) - and in that thread posted a link to Jon Kleinberg's classic paper. Still a valid paper, after 9 years.
Nothing new is ever brand new in itself, but adds on to what's gone before, which doesn't all somehow magically go away. It never has been just one thing and still isn't, so there's really no room for disagreement if people are seeing different things. There are different things, as always.
vividseats, that's also my birthday so put it in your diary please.
I don't have the problem you mention with keyword + sitename. My problems started on the 20th of December. Saying that, traffic is actually rising and AdSense income as well so I don't actually know anymore what I am complaining about.
I definitely don't have the full 950 penalty but the fluctuating 150-200 penalty which could be a result, as some have mentioned, of inbound links from other sites to my own getting the 950 demotion.
1. Not all pages are affected.
2. pages are not "over-optimised" whatever that is given we don't really have a true yardstick of the boundaries.
3. some pages had dup content which is being fixed, the titles and descs were quite similar and these have all been fixed.
4. internal pages with anchor text "anchor anchor1 anchor2" rank better for search "anchor anchor1 anchor2" than the actual page that should.
5. 3 sites on same server are mildly interlinked where appropriate Topic A Site 1 linked to Topic A site 2
6. .htaccess redirects http://example.com to http://www.example.com
7. Site over 5 years old - niche content
8. Reciprocal links pages are present - and why not?
9. URIs are descriptive but not overly - ie no kw-kw1-kw2-kw3.htm
10. Same with folders
11. Some affiliate links.
12. Affiliate content not present, rewritten my style!
13. Absolute URIs used in links
and the list goes on...
[edited by: tedster at 7:11 pm (utc) on Feb. 8, 2007]
[edit reason] use example.com [/edit]
1. Phrase-based or directory-based with some other trigger, or ...?
ANSWER: On my site, I had pages of similar subject matter, in the same directory, ranking at #1 and at 950 (or 755 out of 790 results, this is a niche subject, so not always highly competitive, which also rules out highly competitive terms)
I could use a 2 or 3 word search term and the page was at the bottom of the results, but if I added another word it would bounce to #1!
2. End of results in every case or not always?
ANSWER: In my case, the impacted pages were always within the bottom 20 results, and they did fluctuate within that small range, so they didn't just get dumped there and forgotten, there was obviously some calculation happening.
I also saw other sites that regularly show up in the Top 10 for that search term move in and out of the bottom of the results as well.
Mine is definitely not directory based. And it's also not just "dumped to the bottom". For what that's worth.
Our situation is not directory based.
Our issue is with 2 and 3 word search phrases.
We were on the first page of serps, usually in position 1-3. Now we're all over the place #133, #86, #151.
Our site is PR5, about 4 years old.
Our home page which ranked about #7-#9 on our most popular 3 word key phrase for the last two years is now at #131. Content is all original and not over SEOed. I have reduced keyword density to see if that makes any difference.
Similar situation happened on 12/17/06, but bounced back within 2 days.
This current drop was noticed on 2/3/07.
When talking of "phrase searches", what exactly is everyone using as a search query?
["Quoted search phrase"]?
Ours are just: red widget widget
no quotes or hyphens
I have looked at a lot of sites that are under this penalty. Most of them have major HTML errors that cause the sites to not display properly on Internet Explorer, Firefox and other browsers.
Now, we all know when Google serves up results that Google first determines the language, country and the browser type from the searcher. That is evident via the referral typically:
Notice the US, en and the Firefox in the referral. Now, the majority of the sites that have been hit with the -950, -500, etc. penalty do not display properly on each type of browser.
So instead of Google keeping an index for Firefox and one for Internet Explorer, Google is filtering out sites with very poor HTML and hitting them with some sort of penalty.
That is where being W3C compliant or close to being compliant can pull you out of the penalty. I myself have witnessed two sites pull out of the -950 by becoming W3C compliant. The only changes to their sites were becoming compliant. They came out of the penalty in less than a week.
This is only one reason for a penalty, but it applies to a lot of sites. Tedster, bad coding on a site that is penalized falls under the MSSA penalty even though you think it does not exist. ;-)
[edited by: trinorthlighting at 8:23 pm (utc) on Feb. 8, 2007]
"NickOr The problem I was facing was not just one phrased based, it was completely URL based."
Same with my site, hit Jan. 20th, completely URL based. Was around the 30 - 40 mark for 1 week, now down to the 70's.
I think what most likely caused this penalty was a combination of problems with the site (anchor text, over opt.). When it first hit in Nov and December it was directory related but all pages returned to #1 and #2 within 1 week or so (some pages I de-optimized). But now it seems more Site Wide......and not moving other than down.
So maybe a group of pages triggered a major flag and this time it penalized the entire site.
Supplier told us to drop the site and move on to one that is not penalized! Just hate giving up 6 years worth of work.
For those who feel their situation is url based, does this mean that you've lost strong rankings for more than one phrase on that url?
Poor HTML on a page may be a factor, but why would that same page show up at #1 just by adding an additional word to the search term? That wouldn't have any impact on the fact that the page is poorly coded.
Looks like we're talking about a bunch of different issues here. Here's what we're experiencing:
- Directory wide penalty with individual directories cycling in and out of the penalty. Cycling only happens during data refreshes. We've had the same directory in and out of penalty 4 times in the last 3 months.
- Page-based penalty, not phrase based. Definitely seems like Google is assessing the penalty after the URLs are retrieved, like the spam patent says.
- Directory pages under penalty always rank from 50-end of results. When the penalty is lifted all pages are top 10.
- Inbound links definitely help, but never to the extent when the directory is out of penalty.
- Searches for anchor text always rank higher than searches for 8-10 word phrases unique to that page.
- The fewer inbound links, the harsher the penalty is - end of results occurs when there are no inbound links to a page
- End of results also occurs if the search term isn't an exact match of any inbound link anchor text. i.e. if inbound link anchor text is red widgets, a search for red widgets would not be end of results, but a search for red or red widgets cheap would be.
To sum up here's what I think is happening. A user enters a search term: red widgets. The algo returns a list of urls internally where our site is top 10. Our page is put through the spam filter, is labeled as spam, and pushed to end of results. Then the algo looks at inbound links and elevates it by some amount based on anchor text only and the number of inbound links for that anchor text. After the spam filter our site gets hit so hard that it's almost impossible to rank well, and when ranked it's ONLY for exact anchor text.
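The sequence guessed at above can be sketched in a few lines of Python. This is purely a model of the theory in this post, not anything known about Google: the function name, the data shapes and the "10 positions per exact-match link" boost are all invented for illustration.

```python
def rerank(results, flagged, anchor_counts, query, boost_per_link=10):
    """Model of the guessed pipeline.

    results:        URLs in their initial rank order (best first)
    flagged:        set of URLs the hypothetical spam filter has labeled
    anchor_counts:  {url: {anchor_text: number_of_inbound_links}}
    query:          the exact search term entered
    boost_per_link: hypothetical positions regained per exact-match link
    """
    # Step 1: flagged pages are pushed to the end of the results.
    clean = [u for u in results if u not in flagged]
    demoted = [u for u in results if u in flagged]
    order = clean + demoted

    # Step 2: a demoted page is lifted only when the query is an exact
    # match for the anchor text of one or more inbound links.
    for url in demoted:
        links = anchor_counts.get(url, {}).get(query, 0)
        if links:
            pos = order.index(url)
            new_pos = max(0, pos - links * boost_per_link)
            order.insert(new_pos, order.pop(pos))
    return order


results = [f"site{i}.example/page" for i in range(100)]
ours = results[4]                      # top 10 before any filtering
anchors = {ours: {"red widgets": 3}}   # three IBLs with that exact anchor

exact = rerank(results, {ours}, anchors, "red widgets")
longer = rerank(results, {ours}, anchors, "red widgets cheap")
print(exact.index(ours) + 1, longer.index(ours) + 1)
```

With these made-up numbers the page comes partway back for the exact anchor text but stays at the very end for any variation of it, which is the pattern described in the list above.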
We have our site navigation which is pretty much duplicated on each page. Sitemaps shows our home page being linked to from pretty much every page on our site. I doubt this is causing the problem, but if it is - doesn't it seem a little absurd to remove our site navigation?
[edited by: JerryRB at 8:49 pm (utc) on Feb. 8, 2007]