This 175 message thread spans 6 pages.
|Google's 950 Penalty (part 4) - or is it Phrase Based Re-ranking?|
< continued from [webmasterworld.com...] >
< related threads: -950 Quick Summary [webmasterworld.com] -- -950 Part One [webmasterworld.com] >
Something is now becoming clear, and I think it's time we put away the name "950 Penalty". The first people to notice this were the heaviest hit, and 950 was a descriptive name for those instances.
But thanks to the community here, the many examples shared in our monthly "Google SERP Changes" threads and in the "950 Penalty" threads themselves, we can now see a clearer pattern. The demotion can be by almost any amount, small or large -- or it might even mean removal from the SERP altogether.
It's not exactly an "OOP" and it's not the "End of Results" penalty. From the examples I've seen, it's definitely not an "MSSA Penalty" -- as humorous as that idea is. (Please use Google to find that acronym's definition.)
It's also not just a Local Rank phenomenon, although there are definitely some similarities. What it seems to be is some kind of "Phrase-Based Reranking" - possibly related (we're still probing here) to the Spam Detection Patent [webmasterworld.com] invented by Googler Anna Lynn Patterson.
So let's continue scrutinizing this new critter - we may not yet have it nailed, but I'm pretty sure we're closer. The discussion continues:
[edited by: tedster at 9:18 pm (utc) on Feb. 27, 2008]
Looks like we're talking about a bunch of different issues here. Here's what we're experiencing:
- Directory wide penalty with individual directories cycling in and out of the penalty. Cycling only happens during data refreshes. We've had the same directory in and out of penalty 4 times in the last 3 months.
- Page-based penalty, not phrase based. Definitely seems like Google is assessing the penalty after the URLs are retrieved, like the spam patent says.
- Directory pages under penalty always rank from 50 to the end of results. When the penalty is lifted, all pages are top 10.
- Inbound links definitely help, but never to the extent seen when the directory is out of penalty.
- Searches for anchor text always rank higher than searches for 8-10 word phrases unique to that page.
- The fewer inbound links, the harsher the penalty - end of results occurs when there are no inbound links to a page.
- End of results also occurs if the search term isn't an exact match of any inbound link anchor text. i.e. if inbound link anchor text is red widgets, a search for red widgets would not be end of results, but a search for red or red widgets cheap would be.
To sum up, here's what I think is happening. A user enters a search term: red widgets. The algo internally returns a list of URLs where our site is top 10. Our page is put through the spam filter, is labeled as spam, and pushed to end of results. Then the algo looks at inbound links and elevates it by some amount based only on anchor text and the number of inbound links for that anchor text. After the spam filter our site gets hit so hard that it's almost impossible to rank well, and when it does rank, it's ONLY for exact anchor text.
We have our site navigation which is pretty much duplicated on each page. Sitemaps shows our home page being linked to from pretty much every page on our site. I doubt this is causing the problem, but if it is - doesn't it seem a little absurd to remove our site navigation?
[edited by: JerryRB at 8:49 pm (utc) on Feb. 8, 2007]
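The two-stage process hypothesized above can be sketched in code. Everything here is invented for illustration (field names, the one-position-per-link boost, the `looks_spammy` stand-in); it is a toy model of the poster's theory, not Google's actual algorithm.

```python
def looks_spammy(page):
    # Stand-in for whatever phrase-based classifier flags the page.
    return page.get("flagged", False)

def rank(query, pages, inbound_links):
    """Order pages by the hypothesized two-stage process."""
    # Stage 1: normal retrieval produces an initial ordering.
    initial = sorted(pages, key=lambda p: p["relevance"], reverse=True)

    scored = []
    for pos, page in enumerate(initial):
        if looks_spammy(page):
            # Flagged pages drop to the end of results...
            new_pos = len(pages)
            # ...then climb back only via links whose anchor text
            # exactly matches the query (one position per link here).
            matching = [l for l in inbound_links.get(page["url"], [])
                        if l["anchor"] == query]
            new_pos -= len(matching)
            scored.append((new_pos, page))
        else:
            scored.append((pos, page))
    return [p for _, p in sorted(scored, key=lambda t: t[0])]
```

In this toy model a flagged page with no exactly-matching anchors sits at the very end no matter how relevant it was in stage 1, which matches the "end of results when there are no inbound links" observation.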
|For those who feel their situation is url based, does this mean that you've lost strong rankings for more than one phrase on that url? |
Yes, we have been hit on a couple of sites, for the index page, for multiple terms we were ranking strongly for. Interestingly on one site it dropped to # 130 for its best term, then to # 990 for its second and third best.
Boy the folks at the Plex must really be getting a laugh out of this one. Makes you wonder if they just toss in a few random variables to keep us from jelling on what exactly happened.
No kidding. You can bet they have someone following this thread too.
Kinda sad really. No official comment on what's going on or why they slapped all these sites. If they send people emails about using hidden text, you'd think they could keep us somewhat informed.
Whatever it is, it's a big change. Once upon a time they ran around ranking sites based upon what they themselves called "Signals of Quality", and the negative stuff more or less didn't help, but didn't really hurt that much either. Now it appears there is a concerted part of the algorithm that's purposefully looking for "Negative Signals of Quality" and then re-ranking everything based on that input.
If you ask me it's not working very well, but perhaps things will improve over time. I know others disagree, but IMHO (and man, right now it's gotten pretty humble) the past 60 days have brought the biggest changes to how Google ranks sites that we have seen in 2 or 3 years.
I'm torn between the anti-spam "phrasing patent" theory and a last-minute re-ranking based upon how many sites that initially ranked in the top 1,000 for that particular keyword link to you. Links (always almighty) + anchor text (the "penalized" phrases) + trusted sites (what are your links really worth if you're honest enough?) = you're left alone. It's sort of like they're saying: "How can you be optimizing like crazy for that term? What quality links do you have supporting that effort? Give us a break; end of the line!"
Perhaps there's hope, seeing some sites jump back and forth between the 90's and the 950's. But you have to believe that even at the 90's they're still affected somehow.
One positive; its really gotten me polishing my ppc'ing game.
|One positive; its really gotten me polishing my ppc'ing game. |
A positive for Google shareholders too. The one theory that actually makes sense: follow the money. The AdWords algo definitely has no problem with my site. :)
[edited by: JerryRB at 9:47 pm (utc) on Feb. 8, 2007]
After reading through this whole thread here are some thoughts.
|Another site had a page drop, with all the other important rankings unchanged, then we removed it from the navigation menu and it came back almost overnight. Put it back in navigation and it dropped out almost overnight as well. |
So it could have been a given phrase repeated through all those navigation menus. I think we are working with a moving target. Google is making changes and so are we so it's hard to tell which does what.
|which have an excessive number of related phrases present in the document |
I've been looking for phrases related to the topic but now I'm wondering if the second phrase could be a general phrase often found in spammy pages. I need to be more open about what the "predictive phrase" might be.
|the inbound anchor text scoring described in the patent could boost the document out of the danger zone if even a single new IBL shows up |
That's been my experience in one situation. So possibly the semantically related phrases score is combined somehow with a page strength score based on inbound links.
|There can be pages ranking for certain phrases in the top ten, while other pages in the very same directory are 950+ for other search phrases. |
This is what boggles my mind. In the case of my war related page, if I search for 'whatever war widgets', number two shows two articles (one indented) from another site of mine on the war topic. Number 4 shows two pages from the very same site as the missing page. This means, including indented pages, my pages take up 4 of the top 6 results. Yet the best, most complete and scholarly page is still down at something like 996! This sort of thing is happening with several of my missing pages. I might mention some of these war related pages aren't linked to or from the missing page and others are. So it must not be just related to links.
|Anyone here noticed ranking top for "red widgets" and not for red widgets? |
|does anyone have several pages affected and not have widespread problems of those pages being scraped widely, normally by sites listed above them? |
Some are scraped more than others. I don't think it's the scrapers alone but if scrapers are also linking with the same phrase they just help the problem add up to the breaking point.
|Any inbound links we receive for the page will help, but only for the anchor text of the inbound link. |
It could just be part of this fine line I keep talking about. Any little thing can put it over or under that line. In my case the page comes back without the exact phrase of the new inbound link. But it could be related to other things in this delicate balancing act.
|When talking of "phrase searches", what exactly is everyone using as a search query? |
I mean no quotes or dashes or anything. These pages used to be in first page results on varied search phrases related to the article topic with no quotes but more like most searchers search, just words that fit the topic.
The 950 penalty has nothing to do with phrases; it has everything to do with single URLs and their on-page factors! People are overcomplicating this penalty too much. Get out of the SEO mindset, quit chasing links, quit thinking of keyword density and START to think of your users!
There are a few of us on WebmasterWorld who always preach the following, which is beneficial for end users:
Unique Content - Something interesting to read, not repetitive BS
Text outweighs HTML - Who wants to see a page full of links or other HTML?
W3C compliance or very close to it - Every browser can view it! Also it's 100% crawlable by every spider or search engine.
The few of us who always preach this never complain about our SERPs and never lose ground when search algos update.
|Looks like we're talking about a bunch of different issues here. |
Unfortunately, I think so. Here's the way I'm hoping to clarify this discussion.
1. URL based penalty --
Every time a certain url is returned for any search term at all, it will automatically be penalized. That is, the url carries a penalty wherever it shows up.
2. Re-ranking --
An initial set of results is generated for some phrase. Then a new rule-set is applied to every url in that initial set. Every url in the initial result set can now, at least potentially, be re-ranked -- depending on whether the second rule-set "hits" it or not.
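The distinction between the two models can be made concrete. This is purely illustrative (the URL set and the demote rule are made up): a URL-based penalty travels with the page regardless of query, while a re-ranking penalty is decided per (query, url) pair after retrieval.

```python
PENALIZED_URLS = {"example.com/red-widgets"}  # hypothetical

def url_penalty(url):
    # Model 1: the penalty travels with the URL, whatever the query.
    return url in PENALIZED_URLS

def rerank(query, initial_results, demote):
    """Model 2: demote(query, url) is the second rule-set; it only
    fires for particular (query, url) pairs."""
    kept = [u for u in initial_results if not demote(query, u)]
    hit = [u for u in initial_results if demote(query, u)]
    return kept + hit  # demoted urls fall to the end for THIS query only
```

Under model 2, the same URL can rank top ten for one phrase and 950+ for another, which would explain the reports upthread of mixed results within a single directory.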
trinorthlighting, you have a great list of things that everyone should strive for. But there are those of us who do all that and are still losing pages. There really is something awry here.
Just made a new discovery. The contents page for the subdirectory that had all pages gone is back. You will never believe why. I'd totally given up and decided to reorganize the whole section of my site. So that page now has links to the pages I got rid of. There is nothing else on the page now, not even a title for the page or anything. It just links to empty pages with a word or two. It's just there as a way to be sure Google really deletes the pages that I've dropped.
This got me curious so I looked at the backlinks. Lo and behold all but a couple of links were from scrapers. And all the scrapers were on the topic of (writings about what a person thinks of those things you sit down, open up and read).
So of course that bare little page that is left of the original never mentions these. So I think I've found another phrase that may be a problem.
I looked over the scrapers and noticed what anchor text they had used. Then I redid my internal links, page title etc to make sure I never mention this phrase. Hopefully I can get the article pages from the section pages back this way. Plus I don't want to lose the new contents page so I removed the phrase from that as well.
It took a little creativity to figure out how to do a whole section indicating what it was about without ever using the most natural phrase for it. This is getting ridiculous!
"we're talking about two different things here"
More like ten or eleven things. Unfortunately the thread has been ruined by the renaming and wildly unrelated subjects mixing together, so it should be closed and new threads started for the different issues.
This is not an "or" issue. There are multiple problems, with multiple symptoms that are vaguely similar. Talking about something dropping thirty places in the context of the 950 penalty, though, is silly (except in the secondary way MHes mentioned).
"The few of us who always preach this never complain about our serps and never lose ground when search algo's update."
" The product value here is a score of how topical anchor phrase Q is to document D. This score is here called the "inbound score component." This product effectively weights the current document D's related bit vector by the related bit vectors of anchor phrases in the referencing document R. If the referencing documents R themselves are related to the query phrase Q (and thus, have a higher valued related phrase bit vector), then this increases the significance of the current document D score. The body hit score and the anchor hit score are then combined to create the document score, as described above."
Makes you wonder if article writing for submission is not such a bad thing.
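Read literally, the patent excerpt above suggests something like the following shape. The bit vectors and the product/sum used here are simplified guesses for illustration, not the patent's exact math: each document carries a "related phrase" bit vector, and the anchor phrases of referencing documents weight that vector.

```python
def popcount(bits):
    """Number of set bits in an integer bit vector."""
    return bin(bits).count("1")

def inbound_score(doc_bits, anchor_bits_list):
    # Weight document D's related-phrase bit vector by the related
    # bit vectors of anchor phrases in each referencing document R.
    # Referencing pages that are themselves more related to the query
    # phrase (more bits set) contribute more to D's score.
    return sum(popcount(doc_bits & a) for a in anchor_bits_list)

def document_score(body_hit_score, doc_bits, anchor_bits_list):
    # "The body hit score and the anchor hit score are then combined."
    return body_hit_score + inbound_score(doc_bits, anchor_bits_list)
```

If the real scoring looks anything like this, it would fit the observation upthread that even a single new topical IBL can lift a page out of the danger zone.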
|An initial set of results are generated for some phrase. Then a new rule-set is then applied to every url in that initial set. Every url in the initial result set can now, at least potentially, be re-ranked -- depending on whether the second rule-set "hits" it or not. |
If I'm understanding you correctly, that is similar to how the patent describes the process (only with more complexity), before it presents the resultset to the user... but I don't see how we could witness that from the frontend.
You're talking on-the-fly, I presume? As in: A resultset matching the query is plucked from a "99% ready-made" pool of documents, whereupon another "refining" algo is applied just before presenting the final SERP?
As much as I respect the restrictions on link-dropping, here at WebmasterWorld, this is one of those anomalies that could probably be deciphered more quickly and efficiently if we could see, and post solid examples. It's like asking a group of people to describe, in 3 words, the same room.
I applaud Google for defining its algo such that it mostly sits outside of WebmasterWorld's TOS ;)
I might just add though, if you believe information architecture (IA) helps define your topic, vertical, whatever you want to call it... then to my mind, KWs, related KWs, phrases, related phrases, predictive phrases etc... all form part of the parent directory "funnel", if you see what I mean.
In other words, if you dig deep into a directory, using only the top-level words to guide you at each stage (as you would using a filing cabinet), and continue all the way to the right piece of content (on the page); each level of that journey should tell a story that confirms you're on the right track. As you get closer, eventually you have to start reading page content from a cluster of others which are loosely related (Compression: MP3? or Muscle?).
Muscle might not be mentioned on the page but sinew is. You hold that aside and refine it again by looking for another phrase you'd expect to see, and so on. The point is, that one word might be enough to throw the "meaning" of that directory out of the resultset for that SERP (and possibly even show-up in the context of another (unintended) search-result instead).
It might be extra difficult to see because of, as we know, the multitude of other factors that go into also defining that page.
I would suggest playing around by picking your favourite KW phrase and play with:
[~KW] (For each single KW in your chosen phrase)
["KW phrase"] (Phrase in quotes)
[KW-phrase] (Hyphenated phrase)
... and see if that tells you a story
I said to meself; short post, soz.
[edited by: TheWhippinpost at 12:51 am (utc) on Feb. 9, 2007]
|It took a little creativity to figure out how to do a whole section indicating what it was about without ever using the most natural phrase for it. This is getting ridiculous! |
|The 950 penalty has nothing to do with phrases; it has everything to do with single URLs and their on page factors! |
The second supposition would preclude the possibility that an IBL could affect the page with a penalty for a phrase, which isn't so. It can be disproven not only by evidence actually seen that an IBL can and does have an effect - which is not on-page optimization at all - but by reference in the paper.
The topic of this thread is:
|Is it a "950 Penalty"? or is it Phrase Based Re-ranking |
And some of us have good reason to believe that it has a lot to do with phrases. It's very clear in the paper that there can indeed be a phrase-based penalty. It doesn't say how much of a penalty, but that's irrelevant.
So if we're talking about phrases, we're talking about keywords, keyword phrases, and keyword co-occurrence (not KWD, by the way). Then we can add a little about IDF, which is also mentioned in the paper.
If we delve a bit further into keyword phrase construction and relationships, we may get closer to finding some answers to the actual topic that this thread is asking about. Some of us are not concerned in this thread with the many other topics being introduced - the topic of the thread is what we're acutely interested in right here.
Not 10, not 20, not 30 and not 90 or 150 or 300 and/or other miscellany - 950 and how it may be related to keyword usage, as it appears to some of us that it indeed does.
So if anyone disagrees, they'll just have to agree to disagree with many of us, and those who are interested can carry on with our analysis of the possibilities.
I don't know about everybody else, but trinorthlightning couldn't be more wrong in my case. Unique Content is not an issue, the text to html ratio is fine, and there are no major W3C violations. This is not the issue.
You have two pages. Red Widgets and Green Widgets:
1) Both pages have identical site design
2) Both have approximately the same amount of appropriate and unique text describing each widget
3) Both have an identical number of internal links with appropriate anchor text ("Red Widgets" and "Green Widgets" and that's it)
4) Both have the same PR and are linked to from the same internal pages
5) For YEARS both rank in the top 10 for their respective keyword combo
Now, the Red Widgets page has been filtered/penalized/whatever and is now somewhere hundreds of positions down for the search "Red Widgets".
The Green Widgets page is just fine in the top for "Green Widgets" like it always has been.
What's the difference? Nothing internally has changed. Both look virtually identical and have practically the same kw density and text/html ratios. So what's the cause?
Maybe it's the way people search for widgets.
Have you run Google Trends on all the keywords to see what people are actually looking for?
Maybe 10,000 people a month search for red widget, but green widget is only searched by 10 people a month.
If your only issue is the color of the widget, that sounds more like a duplicate issue since the pages are basically the same....
Try one page with green and red widgets and keep the url that is getting the traffic.
Use Google Trends, it's very helpful! Also, keep in mind Google builds its SERPs around users and the keywords searchers actually use.
Not that it's any help right now, because I don't have any answers. But we have the same thing going on. Several products, unique copy, same PR, same internal links. A few products rank, the others don't. All have been on page 1 for years. Now they are all over the place: #60, #131, #86, etc.
Our other problem is that our homepage has dropped 100+ for its number one search phrase. It was on the 1st page for the last several years.
There needs to be another thread started for those who want to explore other possibilities than the phrases aspect. I think the phrase patent and looking at it in relation to missing pages is complicated enough that this thread needs to completely concentrate on it.
It's not actually different color widgets, they're entirely different (poor example on my part). "JonesCo Widgets" and "SmithInc Gizmos" might be more accurate. There's definitely differences in the number of searches. Some of these terms I'm talking about only drew a couple of hits a day at most even at #1, whereas others drew hundreds at #10. In my case these aren't competitive terms, they're very niche and very specific.
The thing you mentioned about how people search is right on, however. You'll notice that many people are saying something to the effect of adding another keyword brings you back up to the top, and this is my experience as well. For example:
"JonesCo Widgets" ranks #484
"SmithInc Gizmos" ranks #689
"JonesCo Widgets Purchase" ranks #6
"Large SmithInc Widgets" ranks #1
Adding another keyword brings everything back to the top. It's only the 2 word phrases that are penalized/filtered. These phrases are the anchor text in the internal links and very likely the external IBLs as well. It's because of still ranking well for the additional keywords that we're continuing to receive traffic at all.
I didn't think of the 950 penalty when it happened to me; it felt more like a new-site sandbox. I had plenty of listings, some good, then overall my Google traffic dropped to domain searches only. First thing I did was start looking for errors.
I cleaned up my sites more, dropped more HTML, brought in more CSS and text. Sitemaps reported a couple of dead pages, and checking my own listings I found 404s and bad pages within Google. It took about a week and I was back in action.
A friend had the same issue, he had listings, they went poof, we added the site-map, corrected the errors it returned, and within 24 hours he was getting fresh keyword traffic. We both had different types of errors causing the problem.
I’m sticking to my guns that most people have this penalty because of general mistakes/errors.
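For anyone wanting to rule out the plain-errors explanation first, a crude crawl-error pass is easy to script. The URLs in the usage comment are placeholders; the status buckets are just a convenient report format, not anything Google documents.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_status(url, timeout=10):
    """Return the HTTP status code for url, or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code              # e.g. 404 for a dead page
    except URLError:
        return None                # DNS failure, refused connection, etc.

def classify(status):
    """Bucket a status code for a quick error report."""
    if status is None:
        return "unreachable"
    if status >= 500:
        return "server error"
    if status >= 400:
        return "broken"
    if status >= 300:
        return "redirect"
    return "ok"

# Usage (network access required):
#   for url in ["https://example.com/", "https://example.com/robots.txt"]:
#       print(url, classify(check_status(url)))
```

Running a pass like this over every indexed URL at least removes 404s and missing robots.txt files from the list of suspects before blaming a phrase filter.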
|I’m sticking to my guns that most people have this penalty because of general mistakes/errors. |
Based on what?
What were your rankings when they dropped? Were they 900+?
[edited by: Marcia at 4:37 am (utc) on Feb. 9, 2007]
If that's the case, then let's narrow it down. What's the "error" that's causing the penalty?
And, just for fun: if every page has identical HTML (they're all database generated after all) why are some pages penalized for the error and others not penalized? If there's a mistake or error then they either all have it, or none have it. And yet some pages rank well, and others don't. So far as I can tell it's completely random.
|And yet some pages rank well, and others don't. So far as I can tell it's completely random |
AND do the pages have the 900+ penalty? That's what we are trying to discuss here, in fact practically begging to. That and issues related to phrases, which happens to be the topic of this thread and a very relevant topic right now.
Not ranking well is not the same as the 900+ penalty, it's something else. So - do the pages have the 900+ penalty or did they, or don't they?
If what's the case? Who says that's the case, and based on what? Is it related to the 900+ penalty or not? Or is it something else?
Based on what is that case, and what is it the case for?
[edited by: Marcia at 4:52 am (utc) on Feb. 9, 2007]
|There needs to be another thread started for those who want to explore other possibilities than the phrases aspect. I think the phrase patent and looking at it in relation to missing pages is complicated enough that this thread needs to completely concentrate on it. |
For how long do the people who want to discuss the topic of the thread have to keep begging for it not to be hijacked off to other topics?
|most people have this penalty because of general mistakes/errors |
Of course it makes sense to check for HTML and other errors first. But after that I think the possibility of some sort of phrase penalty or filter seems quite likely. It's well worth looking into.
|And yet some pages rank well, and others don't. So far as I can tell it's completely random. |
It's not random though, it just seems that way as we have no way of knowing all the many things Google is taking into account.
In a very simplified form right now it looks to me that:
Google has some way of isolating certain phrases that they deem typical of spam sites.
Some of us have pages that have too many of these flag phrases.
But the number of phrases that can cause the penalty depends on other factors.
It appears that these problem phrases do more damage if they are in the page title and possibly the H1 tags.
These phrases are more damaging if they are in inbound or internal linking anchor text.
Strength of inbound links is factored in, so the level of problem phrases that hurts a page might differ depending on inbounds.
Google is constantly adjusting this filter so some pages are going in and out of it.
As a result:
Some of us have had pages come back after getting good inbound links.
Some are finding pages come back when no changes to the page have been made at all.
Has anyone had good luck with changing phrases in anchor text or on the page? My changes are too new to see results.
What can we do:
More people can read the phrases patent and other related patents. We all notice different things, so let's pool our observations.
We can look at the internal and inbound backlinks of the pages involved to see what anchor text is used.
We can explore our pages looking for possible phrases.
We can work on getting more inbound links on these pages.
Let's see suggestions as to what else we can do.
added >> Also looking at which pages from your site are clustered around your 900+ results might be helpful. Do the extended search to see this.
[edited by: annej at 5:38 am (utc) on Feb. 9, 2007]
|AND do the pages have the 900+ penalty? That's what we are trying to discuss here, in fact practically begging to. That and issues related to phrases, which happens to be the topic of this thread and a very relevant topic right now. |
|Not ranking well is not the same as the 900+ penalty, it's something else. So - do the pages have the 900+ penalty or did they, or don't they? |
I thought we'd already determined that the -950 penalty is the same thing as the 300-950 penalty? Which is why tedster wanted to change the name from "The -950 Penalty" to something more appropriate. The "bottom of the results" penalty seems to be the same thing; there are just fewer results, so it appears to go to the end. Is -950 acting like -30? In other words: is anybody who was previously #5 in the SERPs now exactly #955 out of 50,000 results? Not so far as I can tell, but maybe I'm wrong.
|If what's the case? Says who that that's the case? and based on what? Is it related to the 900+ penalty or not? Or is it something else? |
|Based on what is that case, and what is it the case for? |
Sorry, I was replying to TheDocBlog's post and didn't quote him. You got in under me by 2 minutes, so it looks like I'm replying to nobody. The "case" that he proposed was that there's some simple HTML error causing all these problems.
My group of sites, and an associate I helped with this, all had listings; some sites had good keywords, others didn't, but either way they had Google listings. Then some sandbox took place and the only Google traffic coming in was from searches on the direct domain names.
Since these sites are affiliate-driven, non-blog sites with rankings, we didn't use sitemaps. Once we added them we found a mixture of stupid mistakes like missing robots.txt files, 404 pages and some other rather bad mistakes. We also found common HTML/CSS errors, and Google had listed some of the bad pages. Also some of the backlink / affiliate websites had lost rankings too. The errors seemed to have caused a chain reaction…
Now, this could be one type or group of penalties that is being weighted more; it may be a starting point for people to look at. Before reading this post I thought maybe the Supplemental Results feature was playing with backlink counts within Google's algo.
If Google is detecting keyword spam phrases, then the system is broken. Plenty of clear-cut spam sites are still listed, and plenty of others that clearly do not target popular keywords/phrases or spam were hit with a penalty. To me, Google would use this to detect highly targeted keywords for the spam, not so much at a global keyword level, at this point.
|If Google is detecting keyword spam phrases, then the system is broken. |
No, it does not mean Google is broken. If that is what they're doing - no problem, it's thoroughly covered in a couple of patents, and if they're indeed doing it (which is very likely), it's deliberate and very well thought out.
There are a minimum of five people here (aside from no doubt many lurkers) who want to have a serious discussion, specifically about the very serious topics of this thread. Please notice that the Moderator himself started this thread for the express purpose of discussing these specific issues - because they are IMPORTANT.
But we are not being allowed to. We keep trying and trying, and begging to be able to do so - but we are not being allowed to do that because of people "walking on the thread": interfering with the flow of on-topic discussion by interrupting with *other* topics that they want to discuss, in total disregard of the actual thread topic, and quite frankly, being quite inconsiderate of other members.
Those other topics need to be discussed in this other, general thread
and if necessary, by starting a new thread of their own instead of constant intrusions with 30 other issues that are keeping this thread from staying on topic. That is not fair.
What should we do, give up on this thread altogether?
I am also seeing multiple versions of the same page at various rankings at that IP - 45, 260, etc.
Exact same page, cache date, title... for what it's worth.