This 175 message thread spans 6 pages.
|Google's 950 Penalty (part 4) - or is it Phrase Based Re-ranking?|
< continued from [webmasterworld.com...] >
< related threads: -950 Quick Summary [webmasterworld.com] -- -950 Part One [webmasterworld.com] >
Something is now becoming clear, and I think it's time we put away the name "950 Penalty". The first people to notice this were the heaviest hit, and 950 was a descriptive name for those instances.
But thanks to the community here, the many examples shared in our monthly "Google SERP Changes" threads and in the "950 Penalty" threads themselves, we now can see a clearer pattern. The demotion can be by almost any amount, small or large -- or it even might mean removal from the SERP altogether.
It's not exactly an "OOP" and it's not the "End of Results" penalty. From the examples I've seen, it's definitely not an "MSSA Penalty" -- as humorous as that idea is. (Please use Google to find that acronym's definition.)
It's also not just a Local Rank phenomenon, although there are definitely some similarities. What it seems to be is some kind of "Phrase-Based Reranking" - possibly related (we're still probing here) to the Spam Detection Patent [webmasterworld.com] invented by Googler Anna Lynn Patterson.
So let's continue scrutinizing this new critter - we may not yet have it nailed, but I'm pretty sure we're closer. The discussion continues:
[edited by: tedster at 9:18 pm (utc) on Feb. 27, 2008]
The site I've been working with that was hit in Dec (for certain phrases only), was a -950. No changes were made to the pages, or backlinks, or anything. Rankings for those particular phrases recovered somewhat in January and now fluctuate between #60-700.
Can't speak for any others, but in this case it seems the phrase based penalty and -950 are the same thing...
ALbino, that's the point. A discussion about the 950 penalty was hijacked, and a confusing and destructive discussion ensued mixing comments about completely different topics. The 950 penalty is an end of results penalty, not some armwaving "it can be any drop in results".
Some people insist on blaming Nixon for everything. There is not just one boogeyman responsible for all Google's ills. They do lots of things right and wrong, but it is a bad idea to pretend everything is the same, when it is plainly obvious that some ways pages are affected are completely different than others.
Another way to say it is that Google could pin a page at #950 for any of a hundred reasons. It's foolish to think that EVERY result pinned around #950 is there for the exact same reason.
(Page 1 post one of the thread is here [webmasterworld.com...] )
Something I've noticed is that on 18.104.22.168 I'm seeing sites I know to be hit with the -950 penalty appearing at various positions such as 185, 290, 487, etc.
Steveb, I was just re-reading your comments in that first thread...
|Yeah it doesn't have to be 950, although that is the easiest way to describe it... a few dozen spots up from the last result displayed. Sometimes the pages can do better, like for searches for the eight words in the title a page might turn up in the 400s. |
The penalty seems most often directory-wide. I had one directory recover yesterday, 54 pages now all back to number one or two, while another directory gets hit, with 152 pages dropping from the top ten to 950. Heh, unfortunately I saw the good news first, then checked and saw the bad news a minute later...
|Right now it is even weirder than usual. A LOT of pages are penalized for specific keywords, but rank fine for others. It doesn't seem to reflect anything I can make sense for -- like penalized for "word1 word2", ranking in top five for "word2 word1", when previously top five for both. |
Here it sounds like you've noticed that the effect is attached to the search phrase itself, and does not occur for every search phrase that returns a given url, or any url from some given directory. If I've misunderstood, please clarify.
|a confusing and destructive discussion ensued mixing comments about completely different topics. |
Diverging reports began even in that very first thread, which was weeks before I split this one off. In fact, it was the combined strength of those somewhat similar (but still divergent) reports that moved me to look harder at this thing, and eventually to change the subject line.
You remember the "Sandbox" debacle? The name itself obscured useful discussion and understanding for a long time. My concern is that we are doing the same thing: that by using a name like "950 penalty" we make it sound like we know there is such a thing and we know exactly what's going on, when we don't -- we just created a name.
We already have a situation with Google today where the combined action of various filters gets called a penalty, when it isn't one at all. So people now get very confused by imprecise language -- "Google tells me I have no penalty, so why did my rankings drop so much?" ...or... "Do I have a 'penalty' for duplicate content" etc.
Gratefully, I have no client site and no personal site (for now) that is showing these signs. That's why I worked to clarify the reports beginning with very first thread. But after a while, I had seen enough examples to have the lightbulb go off -- this could all be explained without any resort to the "penalty" model. After all, Google has been saying more and more that true penalties are becoming rarer. They are moving to a different approach.
Wherever this discussion goes from here, to be maximally useful (which is my goal) it needs to be rigorous:
1. Gather enough information to create a relatively complete description of the effect.
The community here is an awesome asset for this step.
2. Generate some theories that can account for what we see.
This step might involve separating out some reports as describing a different effect. But it also might involve discovering that a certain theory might also explain other puzzling effects.
3. No theory is any good at all until it is predictive - and allows us to change the effects we are seeing by taking new actions.
My concern is that by adapting a given label, we may tend to avoid the required rigor - and give the appearance that we've got something nailed down when we really don't. We can mislead others this way, and we can even mislead ourselves.
Google today is definitely complex - to a big degree it is an impenetrable "blackbox" effect that we struggle to comprehend. But they do throw us some light. In the case of these previously top rankings that now are extremely depressed, I found that the recent patents shed some real light, and that light even went into the predictive stage - some rankings were actively being recovered through certain actions.
That's really worth a hard look, IMO. And these patents are hard work to comprehend -- even in their friendlier language form. But 5 patents are pretty hard to dismiss as mere FUD.
We've all noticed that Google became a lot harder to read. I think it's time for the serious SEO to learn how phrase-based reordering can affect a site's ranking and traffic. I am certain it's a better mental model for many things we notice than the penalty model of an older day.
None of our sites were hit at all either. I have a feeling it's all due to various filters that are out there, or a combination of two or three filters.
It would be very interesting if one of the webmasters under this penalty pulled some webstats for the urls on their sites getting hit over the past few months to see how traffic was, bounce rates and if the urls were even being viewed by people.
"If I've misunderstood, please clarify"
It happened for a couple hours. Nothing to notice about that really except that the data refresh seems to have rolled out gradually at least that day.
"Diverging reports began even in that very first thread, which was weeks before I split this one off."
Exactly. There are different things going on here. Ignoring that is a very bad idea.
"we make it sound like we know there is such a thing and we know exactly what's going on"
Well we obviously know there is such a thing, and just as obviously nobody knows exactly why. Which again is why hijacking the thread into some general philosophizing about algo influences is not helpful.
"So people now get very confused by imprecise language..."
So why did you muck it all up? We were talking about something precise. Turning it into armwaving is the wrong direction. Don't kill specific threads to talk about generalities that impact different things many different ways. That can be talked about too, but this thread has people saying "it" affects URLs, while others say "it" affects phrases, when "it" is not the same topic.
"Gather enough information to create a relatively complete description of the effect."
No, the description of the effect is obvious so please don't try and again muck it up... last page of the results, penalized niche authority sites grouped with penalized (high algo scoring) spam.
You are instead looking at the various branches, and trying to build a tree when the tree is obvious. You just have to go to the last 100 results to see it.
"But 5 patents are pretty hard to dismiss as mere FUD."
Of course they aren't FUD, and they can be discussed, but its silly to say they are the only thing to discuss and that they are responsible for everything in the exact same way.
"I think it's time for the serious SEO to learn how phrase-based reordering can affect a site's ranking and traffic."
So start a thread on that. Don't hijack threads on other valid topics.
"I am certain it's a better mental model for many things we notice than the penalty model of an older day."
Well, you can be certain of whatever you want, but the 950 effect is an obvious penalty. Disputing that is absurd, and I think leads to the mangled discussion here.
Most things people talk about as penalties are not. This though plainly is one. High scoring pages (authority and spam) are not filtered, they are not given various scoring demerits, they are all penalized to a *precise* place in the results.
So again, discussions of reranking have their place, but so do ones on a clear penalty. Webmasterworld can have more than one thread.
Call it whatever you want but it doesn't make any sense.
Last month our site dropped to the end of results, then we did... NOTHING.
About 4 days later everything was fine again, until today when we dropped again.
So I have 2 theories:
- Google thinks that our site sucks, but just sometimes, not all the time
- Google is just rotten
What will we do? Nothing but curse G.
As far as I'm concerned, this thread focuses on "one aspect" of what's being called the 950+ (sure there could be a better name, let's hear one), though it's an important one. But that is not in any way related to sites being #8 or #9 and then going to #40 and then down to #22. So? What's new about everflux [webmasterworld.com], as MC calls it, except that now it's constant?
Furthermore, if anyone says that a filter can't knock a page clear out to the outer edge of the ballpark, then that party needs to start a new thread and fully explain the difference to everyone between penalties and filters, and provide some fairly authoritative background to support their allegations.
It's my contention that if a dup or near-dup filter can knock pages clear out of the ballpark at query time, then it's safe to assume that a phrase based filter can do likewise.
Easy, just take a millisecond to check the appropriate "blacklist" for the presence of DocID at query time, and if a page is found referenced for the phrase, it's:
You! Go to the back of the SERPs. NO SOUP FOR YOU!
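That "check a blacklist at query time" idea can be sketched in a few lines. To be clear, this is purely illustrative Python: the function, the data structures, and the send-to-the-back behavior are assumptions drawn from the discussion above, not anything Google has documented.

```python
# Hypothetical sketch of a query-time phrase "blacklist" check.
# All names and structures here are illustrative guesses.

def rerank(results, query_phrase, phrase_blacklist):
    """Move any doc flagged for this exact phrase to the back of the SERPs."""
    flagged_ids = phrase_blacklist.get(query_phrase, set())
    kept = [r for r in results if r["doc_id"] not in flagged_ids]
    demoted = [r for r in results if r["doc_id"] in flagged_ids]
    # Demoted docs are appended near the end rather than removed outright,
    # matching reports of pages reappearing around position 950.
    return kept + demoted

blacklist = {"blue widgets": {"doc42"}}
results = [{"doc_id": "doc42"}, {"doc_id": "doc7"}]
print(rerank(results, "blue widgets", blacklist))
# doc42 moves behind doc7
```

A lookup like this costs almost nothing per query, which is consistent with the "takes a millisecond" framing above, and it would also explain why the same page ranks fine for other phrases.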
Just so I'm clear Marcia, are you of the opinion that if Page A goes from #4 to #578 that it's a different phenomenon (filter/penalty/whatever) than if Page B goes from #4 to #999 (AKA "near the bottom of the results")? In other words, are there two penalties:
1) 200-949 penalty
2) 950+ penalty
I'm not trying to get into a semantics argument or anything, just trying to determine if these are actually different problems, and #2 isn't just an extreme version of #1.
ALbino - If I may step in here.... If page A gets sent to the 950 position, this may have a knock-on effect on page B, because page A was needed to help the ranking of page B. The effect could cause page B to rank 50 (or whatever) positions lower without the, for example, linking value of page A being fully counted.
Interesting. For a highly competitive phrase, here's a group of 7 of my pages clustered one after the other from 910 to 916, all from the same directory. But that's only 7 out of 50+ in that directory.
MHes, I definitely see where you're going with that, and it may be true in some cases. In my case the pages are all on the same tier or depth on the site, and don't interlink with each other. I don't know what others experience is, however, so I can't speak for them.
|"I think it's time for the serious SEO to learn how phrase-based reordering can affect a site's ranking and traffic." |
I think this phrase-based craze came into being because many (if not all) recent drops in ranking are due to a reduction in PR of the homepage. However, if a page ranks No 1 on a certain phrase then it will in general retain that ranking due to factors that override PR in the alg, such as user-behavior. If a page is No 1 on a key-phrase, then due to the high volume of traffic, Google has enough data to determine a page's actual popularity on that key-phrase and that's a factor that nullifies the influence of PR.
If a page was No 1 and dropped nevertheless, then its No 1 position would in part have been due to PR.
If internal pages sort of maintain their position (such as dropping a few places, but not out of the top 10), then they will probably have their own direct incoming links.
So yes, the current drops are "phrase-related", but not phrase-based; it's just a reduction in PR that affects some key-phrases, but not all, for rather obvious reasons.
I don't want to hijack this thread, I just think barking up the wrong tree doesn't do this forum justice.
Isn't anybody interested in the question why some sites' PR has been reduced? Reading what Adam Lasnik has to say about it, it seems to me that it's not a penalty. Google certainly hasn't become ultra-strict all of a sudden. "Don't go hog-wild" was Adam's message.
So what's really happening here? I think it has been happening for many years, around Dec, Jan, but never so extreme and never for so long. But having said that, the phenomenon itself is not new and SEO specialists should take into account this longer-term phenomenon. Most of the sites that get hit rebound, and better than before.
In fact, this thread isn't going anywhere, AT ALL.
[edited by: Martin40 at 5:28 pm (utc) on Feb. 10, 2007]
I think this phrase-based craze came into being
I didn't know it was a craze? I have been writing about Phrase Based Indexing and Retrieval methods and Google's interest therein, since last fall.
I do agree that more 'serious' SEOs need to start looking into it more though.
Wow. Some of us have experienced highly ranked pages dropping to the end, or very deep in serps, for specific phrase searches - yet the pages still rank high for other phrases.
No change in PR (none have mentioned it), or bad html, could explain the very specific nature of what many here have seen and reported. It does exist.
Others want to discuss a penalty that seems to be site wide and sends pages to the end of serps. I personally think they are related and can offer some more observations to support that, but really don't care. I just want to know where we can discuss symptoms and possible causes of massive drops in serps for specific phrases. Is this the place, or do we have to use another thread?
|Others want to discuss a penalty that seems to be site wide and sends pages to the end of serps. |
That wouldn't be the phrase related penalty as it seems to be more page directed. One page may drop to the bottom while other seemingly similar pages on the site still do fine.
|I just want to know where we can discuss symptoms and possible causes of massive drops in serps for specific phrases. Is this the place, or do we have to use another thread? |
Have you gone through all the 950 threads? It's a pain but there are gems in there.
Meanwhile there are things you can do that may or may not be related but will strengthen your site.
1 -Try to get good inbound links to deep pages especially the missing ones. Not that easy I know. I tell someone about a page and they link to the site's homepage.
2- Do what you can to decrease word density. This will make the page look less spammy to Google and may eliminate some problem phrases in the process. I write very specific articles and in writing I've repeated the key words frequently. I've gone back and tried to use more general terms like 'they' or 'it' instead of the actual object. The problem is in doing that and still having the article make sense. But I have been able to eliminate some.
3- Look at your internal link anchor text. Are you linking to a given page a great many times with the same anchor text? You can't do much about inbound links but you can control your internal links. You may be linking with a problem phrase or it may be just too much repetition of key words in anchor text
4- Read the phrases patent [appft1.uspto.gov] so you can see for yourself if your problems might be related to it.
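For anyone wanting to act on points 2 and 3 above, here's a rough Python sketch for measuring the two things suggested: keyword-phrase density on a page, and repetition of anchor text pointing at a page. What threshold (if any) Google uses is unknown; the code only makes the measurements so you can compare problem pages against healthy ones.

```python
# Illustrative measurements for points 2 and 3 above: phrase density
# and anchor-text repetition. No claim about what thresholds matter.
from collections import Counter
import re

def keyword_density(text, phrase):
    """Fraction of the page's words consumed by exact occurrences of phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == phrase_words)
    return hits * n / max(len(words), 1)

def anchor_repetition(anchors):
    """Most common anchor text for a page and its share of all anchors."""
    counts = Counter(a.lower().strip() for a in anchors)
    most, freq = counts.most_common(1)[0]
    return most, freq / len(anchors)

text = "blue widgets are great. buy blue widgets here. blue widgets rock."
print(round(keyword_density(text, "blue widgets"), 2))
print(anchor_repetition(["Blue Widgets", "blue widgets", "cheap widgets"]))
```

Running this over a hit page and a comparable unaffected page is one way to test whether density or anchor repetition actually differs between them.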
Seems this thread has now also splintered into a debate over what the thread is actually about!
I fear this thread is losing its momentum now but I thought I'd just ask/suggest if anyone affected has investigated what clues Google Suggest might proffer?
I just tried Google Suggest and it suggested 4 other related pages on my site but not the most relevant and complete article on it which is the missing page. The one that has been in the top 5 for years.
Can you explain what I'm supposed to be learning from that?
|I just tried Google Suggest and it suggested 4 other related pages on my site but not the most relevant and complete article on it which is the missing page. The one that has been in the top 5 for years. |
|Can you explain what I'm supposed to be learning from that? |
I couldn't possibly infer anything from what you wrote, and I know you would break TOS if you elaborated.
I offered the suggestion as a "debugging" and/or discovery tool. It's prolly the closest thing G gives us for the topic being discussed (phrase).
I'm not affected by this but I think I would be looking at the subset of phrases it suggests as you type, finding where you rank (or not) for each of those suggestions, and looking at the logs to see if any of them provide traffic together with the page content.
As I said a page or two ago, it could be, for whatever reason, that the page is suddenly finding itself more relevant for a lower "trafficked" phrase, which you just haven't seen yet (in the serps).
... rather like when pushing your finger into a balloon, you're not sure where the balloon will protrude in response, if you see what I mean.
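The log check suggested above can be scripted. This sketch pulls out the Google search phrases that actually sent traffic, from a list of referrer URLs, so they can be compared against what Google Suggest offers as you type. It assumes the referrer carries the query in the q= parameter (as Google search referrers did at the time); the sample data is made up.

```python
# Extract Google search phrases from referrer URLs, so they can be
# compared against Google Suggest output. Sample referrers are invented.
from collections import Counter
from urllib.parse import urlparse, parse_qs

def phrases_from_referrers(referrer_urls):
    """Count how often each Google query phrase appears as a referrer."""
    counts = Counter()
    for ref in referrer_urls:
        parsed = urlparse(ref)
        if "google." in parsed.netloc:
            q = parse_qs(parsed.query).get("q", [])
            if q:
                counts[q[0].lower()] += 1
    return counts

refs = [
    "http://www.google.com/search?q=blue+widgets",
    "http://www.google.com/search?q=blue+widgets",
    "http://www.google.com/search?q=widget+repair",
    "http://example.com/page",
]
print(phrases_from_referrers(refs))
```

Comparing the resulting phrase counts before and after the drop would show whether traffic merely shifted to lower-volume phrases, as suggested, or vanished outright.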
|I didn't know it was a craze? I have been writing about Phrase Based Indexing and Retrieval methods and Google's interest therein, since last fall. |
It became a craze when some people started to use it to explain this winter's "update", or whatever it is.
|No change in PR (none have mentioned it), |
I don't assume you think it would be visible in the toolbar.
|No change in PR (none have mentioned it), or bad html, could explain the very specifc nature of what many here have seen and reported. |
If PR were reduced to zero, wouldn't that send pages to 950?
|Others want to discuss a penalty that seems to be site wide and sends pages to the end of serps. I personally think they are related and can offer some more observations to suport that, but really don't care. I just want to know where we can discuss symptoms and possible causes of massive drops in serps for specific phrases. Is this the place, or do we have to use another thread? |
I don't want to spoil the party, if there is one. Maybe I'm the one that has to look for another thread. A thread about the truth about Google's winter updates. Isn't anyone here interested in that?
A page can rank equally for two different but related keyword phrases, one being a bit more competitive than the other. In fact said page can even move up in spite of a reduction in its PageRank.
A change in PageRank may cause some minor shuffling around, which happens all the time for any number of reasons, but there is no way a page gets sent to the very end of the results set simply because of having a "normally" lowered PR.
Ya, if *all* the IBLs are from purchased links and sitewides, and those suddenly get de-valued and the site/page loses all its PR, or maybe goes internally (within Google's real PR, not TBPR) from PR5 to PR1 or PR0, then that's a different issue entirely and isn't related to the kind of massive drops being reported.
Thinking that PR is the be-all and end-all is very 2002 thinking - this is 2007.
This seems to be unrelated to PR. In fact hundreds of pages with less and even 0 PR are ahead of my missing pages.
I see what you mean TWP. I was looking at the results not the suggestions. I tried it again and checked out some of the suggested searches. They tend to bring up related pages on my site, ones that are doing fine.
But looking at the results, especially the number of results for a given phrase could give us an idea of what phrases might be the problem.
I know some folks think there must be something spammy about these sites but the pages I lost have ranked well for years. The articles are well researched and I have references to books and papers at the end of each article. This stuff is not spam.
It seems to me the re-ranking or filtering is really only affecting the sites from page one or two of the serps. This seems to lead to the strange and seemingly common factor of your page dropping miles behind the pages that are scraping and using your exact text. Also as you delve into pages 3, 4, 5 and onwards, the results are so full of spam and scrapers you have to assume filtering is not occurring at this level. I cannot otherwise understand how it is becoming the norm to rank behind your own scraped content.
Here is a theory I have. Scrapers seem to scrape from the top pages on any given search. Scrapers usually use the title of pages in their links. If Google is looking at inbound anchor text there could be an increase of problem phrases. This would just be one factor of many as I found a missing page that was so obscure it was not highly scraped. Also I have many pages doing fine that are highly scraped. Perhaps they don't have any of these problem phrases in them.
> I cannot otherwise understand how it is becoming the norm to rank behind your own scraped content.
Perhaps the filter is only applied to the top 20 sites. Why waste cpu on results that are hardly looked at.
We've been around for 6 years and have thousands of scraped links as you describe. I don't think they matter, and when we were 'out' they appeared to have no effect on some of our pages that ranked well for a competitive phrase but 950+ for another phrase (often less competitive and unrelated to the scraper link text etc.). As a general rule, I don't think links pointing to your site can harm you and especially not in a 950+ way. I really believe this is a different issue.
|Perhaps the filter is only applied to the top 20 sites. Why waste cpu on results that are hardly looked at. |
Thank you, that was what I suggested awhile back as a possibility and no one jumped in with any observation. I know my situation doesn't apply to all (or even most), but I was hit with the filter *after* I hit the top 20 in my topic.
Was it coincidence, possibly?
That's why I was curious if anyone else had a similar observation - that only sites/pages with top-ranked keywords were hit?
When I say scraper I mean copy and paste affiliate type sites too, not just MFA sites. If you have written your own widget pages you will find your text used over and over on other widget sites who are churning out volumes of sites. They don't link back to you but to other sites in their large networks or to an affiliate program. These sites are packed into the 30-100 ranges way ahead of the sites who actually wrote the content.
|As a general rule, I don't think links pointing to your site can harm you and especially not in a 950+ way. I really believe this is a different issue. |
I hope you are right because if my theory is correct then so much is out of our control. I like to feel like there is something I can do to correct the problem.
|that was what I suggested awhile back as a possibility and no one jumped in with any observation |
Because this thread has been so busy I think people have missed a lot of good suggestions. It's worth repeating them.
Mine are all pages that were in the top 20 but my article pages are never more than 3 deep so they are generally up there in the serps.
We really need to hear from people if they have lost pages that were not ranked in the top 20. That would be good information to have. We'll see if the bold gets attention. ;)
I just noticed something with regard to the cache that google has of a particular page.
Say it was crawled on the 7th of Feb and added to the Google cache. Say you added a picture on the 10th of February.
Now, how could a pic added on the 10th of Feb be part of the cache of a page done on the 7th of Feb?
Basically, the Google cache is not a cache guys, it's a live download of your page and popped into the browser. It means bugger all so please stop talking about the google cache when analysing changes.
Right, one less thing to worry about or should I renew my domain name for the coming 30 years?
There's a lot to be said about the scraper theory. I manage a fairly significant number of sites for my company and our partners (mostly in hypercompetitive industries); as such I have the misfortune of encountering a lot of the funky penalties that crop up.
I've been lurking on the 950 penalty threads because I wasn't entirely sure what was going on, given that multiple factors do seem to be in play, but I have found some commonality with my own experiences.
1. Phrase-based penalties & URL-based penalties; I'm seeing both.
2. On phrase-based penalties, I can look at the allinanchor: for that KW phrase, find several *.blogspot.com sites, run a copyscape on the site with the phrase-based penalty, and will see these same *.blogspot.com sites listed...scraping my and some of my competitors' content.
3. On URL-based penalties allinanchor: is useless because it seems to practically dump the entire site down to the dregs of the SERPs. Copyscape will still show a large amount of *.blogspot.com scraping though.
Getting rid of scrapers is a thousand page thread in and of itself, but what I've been doing so far is a mixture of modifying titles, slightly modifying on-page text, getting some new links that match the new title, and where possible, turning in the *.blogspot.com junk as spam on both the blogger and G spam report side.
Normally scrapers wouldn't be a huge problem, but with Google continually tweaking their authority knob, those *.blogspot.com sites are becoming instant authorities, which is really, really bad. That needed to stop as of last year. I don't have an answer as to why sometimes the penalty is phrase-based and why it is sometimes URL based, but I can say that I've seen them alternate on the same domain, I've seen just the phrase-based issue occur and resolve itself, and I've seen the URL-based issue occur and resolve itself.
Confusing isn't it?
So that's my vote...false authority scrapers that are causing temporary filtering as Google attempts to determine which is the more valid source, rectified by modification of both on-page and off-page tactics.