I've been so quiet because MHes has said most of the things I would have said anyway.
Mind you, part of this is theory. I see instances, experience certain patterns and behaviours, then analyze what's in front of me. And the end result is what I call SEO. For the rest of the day at least.
Some points to remember...
As Martin mentioned, the anchor text of links from trusted/locally trusted sites is what decides 98% of what's in the SERPs. Title and body text are criteria for being relevant or filtered, and are thus binary factors. If they are present and match the incoming anchor, or even the theme of the anchor, the page will rank. Meta is optional.
Title and the content text have two characteristics that are connected to this problem.
One being that every single word and monitored phrase gets a score. 7 word phrases are not monitored. Monitoring is probably decided based on search volume and advertiser competition, ie. MONEY. So there's no infinite number of them.
Second is, should the page gather enough votes from inbounds / trust or localrank through its navigation for any single word/watched phrase, it passes a threshold that will decide the broad relevance of the page. The page could be relevant for more than one theme. It could be relevant for "Blue Cheese" and "Blue Widgets" if it gets inbounds for both themes. ( Note I'm oversimplifying things, relevance is calculated long before that. ) If it's relevant for "Cheese" Google knows it's probably about "food".
The theme of the page now will make it rank better for certain queries. These aren't necessarily semantically related. A site that ranks #1 for "Blue Cheese" may rank relatively better for "Azure Cheese" than before, even though this phrase is nowhere in the anchors or titles, and only appears in parts of the content.
If you cross a certain line of on-page factors, another theme might be evident to you, based on the title/content. But if the page does not have any support for that theme in the incoming anchor text, this may be viewed as trying to game the system if Google doesn't understand the relation. "Blue Cheese" IS relevant to "Kitchen Equipment" to some degree. Google might not know this.
Another, blunt example is mixing up "thematic relevancy" with "semantic relevancy", when your "Blue Cheese" page starts to have an excessive number of instances of blue things, like "Blue Widgets", "Blue Hotels". Google will think that this is because you have noticed you can rank well for Blue, and tried to add a couple of money terms that are semantically relevant. But what AdWords, Overture or Trends, or in fact Google Search does not show... is that the algo now knows these things are not related.
Question is... to what degree is this filter programmed.
1. If you have N number of kinds of phrases on a page that are only semantically relevant ( ie. as "blue cheese" is relevant to "blue widget" ), and you don't have support for both, your site gets busted. If popular phrases that you know to be thematically relevant to your page aren't in the Google database as such, you're busted. Based on the previously mentioned problem, if you have a website that's relevant for modeling, and add internal links with names of wars all over, Google may not find the connection.
2. If you do a search on AdWords for "Blue", you'll get a mostly semantically relevant list of keyphrases that include/are synonyms/include synonyms/related to "blue". A human can identify the "sets" within these phrases and subdivide the list into themes. Spam does not do this, or so Google engineers thought.
3. So there are subsets in the hands of Google that further specify which word is related to which. These are themes. You'll see sites rank for synonyms within these sets if they're strong enough on a theme, even without anchor text strengthening the relevance. A site that's #1 for "Blue" might rank #9 for "Azure" without even trying too hard.
4. If you have a site about "Cheese", you can have "Blue Cheese" and even "Blue Cheddar" in the navigation, titles, text, for they are included in the same subset. You can't have "Blue Widgets" on the "Blue Cheese" page.
5. What constitutes these sets? Who decides on themes and based on what? What is the N number of "mistakes", how well determined are these?
But then, so are the SERPs right now. I've seen at least 4 different kinds of ranking in the past 3 days.
So far I've only seen instances of filtered pages when 5 to 6 themes collided all at once. Quite easy to do by chance if you have a completely legit "partners" or "portfolio" page with descriptions, and/or outbound text links. But only a single theme that's supported with the navigation/inbounds, and only if there is a decided theme for the page. If there's no theme ( navigation and - lack of - inbounds doesn't strengthen either ) I'd say Google passes on penalizing.
As for the themes, I was thinking perhaps Google went back to the good old directory age, and started from there. Remember how you started with the broad relevancy, then narrowed it down to a theme, then an even closer match? With cross references where applicable.
This isn't new. Penalties that are based on it are.
If there is such a penalty, it is along these lines.
[edited by: tedster at 9:16 pm (utc) on Feb. 27, 2008]
From the algorithmic point of view, the pages which contain laundry lists of "related phrases" (like "city-1 business, city-2 business, city-3 business", where "business" is, say, "hotels" or "rentals" or "real estate" or "florists" or ... you know the bunch) are prime candidates for extreme measures. And so aggressively targeting two-word phrases where the set of first words is unrelated is the best possible way of improving search results overall.
Under any scheme that accomplishes that goal, the niche sites will rise to the top (on their own specific topics); the giant shotgun-keywords-with-ignorant-or-plagiarized-blather sites (whether on one domain or on interlinked domains) will plummet. And the sites that are big enough to have their own content, in depth globally (like hotelnow.com itself), will survive the filter.
I didn't mean to indicate it was a part of the 950 problem. My theory is too nebulous to say anything like that.
In my theory I'm thinking of sites on different domains and sometimes on different servers. I think Google can see linking patterns that indicate sites might be related, in the sense of one person owning them all, or interlinking among friends, or perhaps a few site owners deciding to help each other out. In some cases the sites may not really be related in this way, as in niche topics there can be a lot of natural linking between sites. I'm just suggesting Google may be aware of it in terms of seeing patterns like this. It would be a huge jump to say it was treating all these sites as one in terms of the phrase algos.
This is very speculative so take it with a grain of salt. I don't want to find I've started a new Google myth! ;)
If you cross a certain line of on-page factors, another theme might be evident to you, based on the title/content. But if the page does not have any support for that theme in the incoming anchor text, this may be viewed as trying to game the system if Google doesn't understand the relation.
Now I can see a possible reason I've been losing pages on my hobby topic related to specific wars. Google understandably doesn't see a relationship between the two.
(edit)... I just realised that this would be the exact opposite explanation to Miamacs' theory: he's saying "too much on-page optimisation" compared to inbounds, I'm wondering if it's "not enough on-page optimisation" compared to inbounds.
Last week, after a month, we were back. Yesterday was the first Monday, a day that used to bring more traffic.
Since our return, Google visits are still 1/4 below what they used to be, while on the other hand direct visits keep growing, so we really don't suck (the reason someone posted elsewhere to explain this issue).
At the end I do not complain, we know it could be much worse, but if you're back on the SERPS you should be able to get at least the old numbers/visits... unless something else is going on.
[edited by: Biggus_D at 5:04 pm (utc) on Mar. 13, 2007]
BTW - I've also noticed a crescendo effect. Traffic returns, gets much better, to near record traffic, and the next day plummets.
Seriously, #*$! google. There's some whack A/B testing going on for sure, and it's affecting all of our businesses.
We've yet to try degrading the internal navigation anchor text to rock-bottom but I suppose that's the next thing on the agenda :-(
Maybe this is what Big G wants, so that pages can ONLY be found via search, and not through a site's internal linkages. Certainly, a novel way of getting rid of spam and not without imperialistic overtones (reminds me somewhat of the notion of bringing "democracy" to the world).
What I find galling is the amount of crap that now has top rankings. It looks like the owners of these sites have found a way to game the system already. Quite ironic, really.
Nice post and I've been thinking about what you said for a few days.
If I'm reading you right, you suggest conflicting themes are a big problem, especially if one of them has no supporting links etc. I don't think conflicting themes is a problem. A page could have several high N values for unrelated themes and be perfectly legitimate. An N value is only referred to if one of the themes is seen in a search phrase. If so, the supporting links need to be seen for that N value phrase for the page to rank. If the search phrase has two potential N related phrases it could look up 2 N values and want to see links for both.... but this is where the system could easily break down. A page about dolphins and their ability to play chess may have high N values for the chess theme and animal theme, but the link to the page may be just 'Dolphin'. Are you saying that this page won't rank? That would be a bit harsh and I think there would be so many examples of pages discussing unrelated themes that the 950 would have had a far bigger impact if this was the case. However, there is a neat solution to this... If ANY N value triggered by a search phrase is redeemed, the page ranks as normal. Therefore a page about Blue cheese that also has Blue hotels mentioned won't rank for the search "blue hotels" but will for "blue cheese" because the blue cheese theme is redeemed by links and the blue hotel theme isn't. BUT, a search for "blue cheese and blue hotels" will also be redeemed because of the blue cheese N value being redeemed.
Thus I think conflicting themes is not an issue. The N value for a theme is taken in isolation as per the search phrase.
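To make the "any redeemed N value" idea concrete, here is a minimal Python sketch. It only illustrates the rule as I've described it in this post, not anything Google is known to do. The function name and the idea of treating themes as plain sets are assumptions made up for the example.

```python
def is_filtered(query_themes, page_themes, linked_themes):
    """Hypothetical sketch of the 'any redeemed theme' rule from this thread.

    query_themes:  themes seen in the search phrase
    page_themes:   themes the page scores highly for on-page (its "N values")
    linked_themes: themes supported by inbound/navigation anchor text
    """
    # An N value is only looked up if its theme appears in the search phrase.
    triggered = set(query_themes) & set(page_themes)
    if not triggered:
        return False  # nothing triggered, so this check does not apply
    # The page escapes the filter if ANY triggered theme is redeemed by links.
    return not any(theme in linked_themes for theme in triggered)


page = {"blue cheese", "blue hotels"}
links = {"blue cheese"}
print(is_filtered({"blue hotels"}, page, links))                 # True
print(is_filtered({"blue cheese"}, page, links))                 # False
print(is_filtered({"blue cheese", "blue hotels"}, page, links))  # False
```

Each theme is checked only against the search phrase, so two unrelated themes on one page never interact directly.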
I agree with you, I think we're just using different terms.
I'd say Google doesn't monitor everything to the same extent. In other words, if a page ranks well for something in the travel sector, that takes 10-50x as much trust and link power as it would for some generic food phrase.
And once you try to jump from the top of a high pole to another, so that you could stand on two feet, the other will sink as soon as you set your toe on it.
There're quite accurate indications on what phrases are monitored closely, for these you have unusually high trust thresholds to clear before your pages show in the index. ( you've seen it before... showing results 1-97 of 98.000.000 )
That's why I said, this affects authority sites the most, although I have no data on non-authority sites. An authority site would have signals for its theme that are very, very strong.
I've recently done... no, make that I'm currently doing some tests on to what extent anchor text decides the filtering of pages that steer away from what they're trusted for. So far I've added a single link with a less generic anchor to the homepage of a quite old site. This move made the homepage jump for the phrase of course, but it also pulled out the subpages from a long time penalty. These subpages have no links to them apart from each other and the homepage, which distributes all incoming parameters.
Seems that while the homepage had very... very high scores for a single theme, the old-timer navigation saying "news" kept that page in the penalty box. The old relevance of the homepage could not be combined with this word. But now, the homepage got a link for something it already had in its title and body, which could be combined with it. And out the page came, with some phrases ranking number one. Nothing competitive though.
I'll be interested in how this turns out all in all, it's interesting to see the algo in motion.
My question to all those affected, has anyone actually managed to escape this penalty? And by escape I mean at least 2 weeks with relatively stable results across all pages?
The number of tests I have run across my site is mind-numbing, and no matter what I have tried, nothing at all makes a difference. If it was really an on page factor, I really believe that many of us would have recovered. I am really beginning to believe that this penalty has absolutely nothing to do with on page factors. My theory may seem outrageous and is not as technical as others, so forgive me ahead of time.
What if Google is simply trying to regulate the amount of traffic they are directing to a site? If they feel a site is receiving too much traffic (let's say because of a recent data refresh) they will simply penalize a directory to reduce the traffic supplied. Once the traffic to a site is reduced by enough (perhaps below their "natural" threshold), they may let a full directory or two back to the top. This cycle continues to repeat until the algorithm can find a happy medium with what the predefined "natural" growth should be.
I believe that Google is collecting very detailed data of each and every referral they send to our site. If all of a sudden a site in a very competitive field receives more traffic than the standard deviation of what the referral growth should be (for a site with similar PR, inbound links, size, maybe even theme), and this growth is not substantiated by other factors, then it's pretty safe for the algo to assume a mistake was made or the site is spamming the algo.
It's pretty well known that there's a penalty for acquiring too many inbound links too quickly, why not a penalty for acquiring Google referrals too quickly?
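For anyone who wants to test their own logs against this idea, here is a rough Python sketch of the kind of check I have in mind. It is pure guesswork about the mechanism. The function name, the threshold `k`, and the use of the site's own referral history (rather than a peer group of similar sites) are all assumptions of mine.

```python
from statistics import mean, stdev


def referral_spike(history, today, k=2.0):
    """Flag a day whose Google referrals exceed mean + k standard deviations.

    history: past daily referral counts (assumed baseline, hypothetical)
    today:   the referral count being checked
    k:       how many standard deviations count as "unnatural" (a guess)
    """
    if len(history) < 2:
        return False  # not enough data to estimate a deviation
    threshold = mean(history) + k * stdev(history)
    return today > threshold
```

Run it over your own daily referral counts, day by day, and see whether the flagged days line up with the days directories dropped.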
Like any filter, I think there are many exceptions to the rule. Non-competitive keywords and phrases may be exempt. Pages which receive an increased number of natural links from trusted sites can convince the algo that this "unnatural" referral growth is actually natural.
I think this filter is still in its early stages of development. Eventually I think that instead of seeing your pages jump from top 10 to bottom 50, the jumps will be less and less dramatic with every iteration (similar to the process of guessing a number between 1 and 100). The current version of the filter affects entire directories, I think eventually the algo should be able to limit this to individual pages. Major changes to your site (especially rate of inbound links) will aggravate this filter and may send you into limbo again, while minor ones are natural and accepted.
[edited by: JerryRB at 5:20 am (utc) on Mar. 15, 2007]
Yes, it is a nasty thing, and hard to cure. I've been working on a customer's site since December. Several posts by Mhes, AnneJ and Miamacs made a lot of sense with what I've been seeing, and I feel I have a fair idea of what they are trying to accomplish and what "should" help, but no joy yet.
Have ripped it apart and stripped it waaayy down and still haven't cured it. Several phrases are up from the 950 position and rank 50-100, but no competitive phrases have made it back to the top. It's like a glass ceiling somewhere around #50. Anyone else experience that?
We were hit in December and jumped in and out on roughly a 3 day cycle for 6 weeks. The cycles almost became predictable. We have now been OK for 18 days in a row with traffic at normal levels. This could be coincidence, but it was possibly due to a eureka moment after 3 weeks of head scratching when we realised how we may be being affected. The unknown is when Google does its offline analysis, but I now hope that has happened and we have escaped. There are two fixes you can do, one to rank well if you are being filtered by 950 or one to escape being filtered in the first place. For us, and if what we have done is having an effect, it is all about phrases and links.
I think we were lucky because we had pages that always survived with which we could test things. We actually managed to get it penalised to test the theory.
>What if Google is simply trying to regulate the amount of traffic they are directing to a site?
Well they are.... but not the way you think. I don't believe they are monitoring hits over a time period and then deciding that irrespective of relevancy, they will stop some traffic. They are identifying pages that could qualify for a lot of searches and increasing the filter to confirm 'relevancy'. In a way, they are saying that if a page targets many potential phrases, they treat the page with suspicion, because there comes a point when, if a page covers many 'topics', its focus is inherently compromised and it begins to fit the profile of a 'honey pot' page.
My question to all those affected, has anyone actually managed to escape this penalty? And by escape I mean at least 2 weeks with relatively stable results across all pages?
Yes, I have several pages back and they have stayed back.
I can't be certain of what brought them back. I know one contents page was brought back by a good inbound link. But the others I used kind of a scatter gun approach. I tried several things at once like decreasing word/phrase density, changing the page title, and changing or cutting back on links with the related anchor text.
Since then I've added some things back. I don't think it was the titles that were causing the problem. But it appears that cutting back on internal linking did help. So it's worth trying that.
I should say though, I think I'm one of those people who are kind of on the edge of the filter. I lost two directories and a few other scattered pages. I didn't lose most or all of my site like some people did.
It's like a glass ceiling somewhere around #50. Anyone else experience that?
If I get that kind of improvement, I think I might stop making changes for 4 weeks or so. Google's historical tracking of the site may also be filtering the full improvement and only allowing it to return to its full glory by degrees. Further tinkering could actually slow recovery, if done too soon.
I think for any member the way forward is to find searches that still rank top and start from there, even if they are not competitive searches. The reasons they rank at all will be inversely related to the reasons other searches don't rank for that page. Your analysis has to be extremely detailed and assume a new level of semantic ability.
There is a danger that trying potential fixes that other people mention and have had success with ( I think annej will agree with this) may produce no effect or even a negative effect. This is because the algo is concerned with phrases related to other phrases and it all becomes very specific to your site and its structure.
The other approach to finding a cure is to anticipate what Google wants to find, rather than trying to understand how Google works. People say "...write for humans not spiders", which would be nice if it worked, but spiders will read the page first and then decide if they want to show it to a human! Therefore to some degree you have to write for both. Google has recently employed a lot of 'librarian' thinking people. Ask a librarian to find a piece of information by giving them a few keywords and, without understanding the topic, they should be able to nail down the information by using 'filing cabinet' navigation techniques. The more folders and sub folders, the more focused the information becomes. I think spiders work this way as well. They want a trail that consistently confirms they are on the right path, with each folder clearly qualifying and leading to the next folder. Specific phrases will increase, related phrases will decrease as you go deeper into the filing cabinet. If on that path they find a folder that gives them illogical options.... many new 'phrases' or 'unrelated phrases', then they don't know where to go. As you go deeper you should have fewer folders to choose from, but very specific ones.... we're talking good breadcrumb navigation here. So navigate your site as if you don't know the meaning of the words and want a specific piece of information. By removing possible confusion for a 'librarian' you could by default remove your 950 penalty.
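As a small illustration of the 'filing cabinet' trail, here is a Python sketch that turns a URL path into a breadcrumb list, one crumb per folder. The function name and the example path are made up for illustration. It assumes your folder names already describe the topic at each level.

```python
def breadcrumbs(path):
    """Build a breadcrumb trail [(label, href), ...] from a URL path.

    Hypothetical helper: each folder becomes one crumb, and each crumb
    links to the folder that contains everything below it.
    """
    parts = [p for p in path.split("/") if p]
    trail = []
    href = "/"
    for part in parts:
        href += part + "/"
        trail.append((part, href))
    return trail


for label, href in breadcrumbs("/food/cheese/blue-cheese/"):
    print(label, href)
```

Each step down the trail narrows the topic, which is the consistent path I'm suggesting a spider (or librarian) wants to follow.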
I think this penalty is an on/off switch. I would guess you have escaped the filter.... but trashed your ranking ability in the process. If you have gone the route of diluting normal seo for a specific phrase, then this would be the inevitable result.
From the recent posts by MHes & co, I can see how you can diagnose the problems that the specific pages are having, by running numerous 'test searches' to pinpoint exactly what keywords out of many phrases are tripping the penalty.
What made me really think they were somehow factoring in referrals is that about two weeks ago Google took a directory that had never ranked well, and all of a sudden thrust it and all of its pages to the top of the SERPs. This directory is the largest one on our site, comprising nearly 500 pages. Prior to this the directory was ranking relatively normally: some pages in the top 10, others in the top 20, and others in the top 50-100.
Up until that point the 950 penalty for us was incredibly predictable, like clockwork. I knew that when I awoke on any given Monday morning, I could find at least one or two directories of my site trashed, along with one or two rescued. When a directory gets rescued it usually remains stable for 2 weeks, some 1 week, and others 4-5 weeks, the length seemingly random.
Turning any other directory in our site on/off in the Google results is fairly predictable, but not this one. Our Google referrals shot through the roof, literally increasing by more than 300% in a single day. Our traffic hit records that we had never seen even before we fell victim to this penalty.
Like I said before, it's been very customary for our site (I know other sites have a different pattern) with this penalty that Google leave results stable for at least a week, and the average 2 weeks. Google pulled this directory, along with our next highest referral producing directory 3 days later. Coincidence? Maybe, but never has this happened so quickly before, so it got me thinking.
I decided to study a graph of Google referrals over time and manually plotted the 950 penalty events we experienced on that graph. There seems to be a very clear correlation between the number of Google referrals and the problems we are experiencing. Anytime our referrals would spike, directories would be filtered out; anytime our referrals would experience a trough, directories would be filtered in. And those points where referral traffic was steady experienced the least number of 950 changes.
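If anyone wants to put a number on this instead of eyeballing a graph, a plain Pearson correlation between daily referrals and the net number of directories filtered out would do. Here is a minimal Python sketch. The function is standard textbook math, but the choice of pairing daily referral counts with filtered-directory counts is my own assumption, and I'm not including my actual figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)
```

Feed it one list of daily Google referrals and one list of directories filtered out on the same (or the following) day. A value near +1 would back up what the graph seems to show, a value near 0 would not.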
On another note, why not a single comment from any of the Google people about this? I have been following this forum for a long time now and when so many sites fall victim to a filter that no one really understands, usually somebody from Google would chime in. So any idea why they are remaining so silent, or did I miss something?
why not a single comment from any of the Google people about this?
This is a real change from what Google has done in the past. No Google Guy, no Adam Lasnik, and Matt Cutts doesn't seem to be addressing this in his blog.
we're talking good breadcrumb navigation here
I'm wondering if I should switch to breadcrumb navigation. Right now I usually put a link to each of the other articles in a given section. This may be as many as 15. I assumed that people visiting one page would likely be interested in the other pages in a given subtopic. With breadcrumb they have to go back to the contents page and I don't think as many people would do that. But if they see something that interests them right in the navigation they will click and visit the page. It's back to that conflict between usability for visitors or keeping Google rankings. The problem is if people can't find the articles usability doesn't matter.
If Miamacs' theory is correct, wouldn't Google's page analysis statistics be a good tool to check your site? If the words used in the content of your site match the links to your site (i.e. compare left with right column in Webmaster tools page analysis stats), then you should be ok (on the other hand, the tool says nothing about links being weighted).
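The left/right column comparison could be done by hand or with a few lines of Python. This is only a sketch. It assumes you have copied the two term lists out of the page analysis stats yourself, and the function name is made up.

```python
def unsupported_terms(content_terms, anchor_terms):
    """Return content terms that never appear in inbound anchor text.

    content_terms: terms from the "in your site's content" column (assumed input)
    anchor_terms:  terms from the "in external links" column (assumed input)
    """
    content = {t.lower() for t in content_terms}
    anchors = {t.lower() for t in anchor_terms}
    return sorted(content - anchors)


print(unsupported_terms(["Blue", "Cheese", "Hotels"], ["blue", "cheese"]))
# ['hotels']
```

Terms left over are themes the page talks about without link support, which is where Miamacs' theory says the risk would be.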
Mhes may be on to something with the librarian/'filing cabinet' navigation techniques argument. I've noticed that trust/relevance passes "vertically" easier than "horizontally". Put a new page in the same folder as a trusted page and it may go supplemental. If you put it in a folder under the trusted page (with the same linking), then it's less likely to go supplemental.
No Google Guy, no Adam Lasnik, and Matt Cutts doesn't seem to be addressing this in his blog.
I tend to think that Google's webmaster groups were created to:
1. relieve MC of the stress of stardom
2. undercut WebmasterWorld
MC, for one, doesn't seem to be very happy with WebmasterWorld at the moment.
If the Google SERPs are as bad as we say they are, would Google admit they are testing or having problems? Would that be like saying: use Live.com until we're done?
[edited by: Martin40 at 6:28 pm (utc) on Mar. 15, 2007]
it is all about phrases and links
I was thinking along those lines, but am leaning towards offsite factors at the moment.
For us many sites have been unaffected. The one site where we don't really try to influence ranking, as it's always seemed to rank based on its content, got whacked to 950ish for one phrase. It's an active community site: people post ads, write stories and there's an active forum, so why it got whacked back so far for this particular phrase is a real mystery.
I wasn't going to post the details here as there are way too many 'my whitehat site with unique content dropped so many places today' posts in here, but I did notice that most talk about total sites, or pages, this isn't the case with this site so I think that the details may help those that look for patterns and maybe some kind soul may be able to offer some helpful advice based upon our experience.
Site established 2002
No active link gathering or exchanges
100k+ uniques per day, mostly regulars
domain name term1term2 dot co dot uk
title: term1 term2 > Word Word term1(ers instead of ing) Info
The site is about term1. Funny thing is, it still appears on the first page for term1(ers variant) but not term1(ing variant). Also it ranks #1 for term1 term2, with 4 links to main parts of the site below, and uses the meta description.
I suspect that others would link back to the site using term1 term2 in the anchor text, as that is the name of the site. What baffles me is the fact that we get bounced back to the end of the SERPs for term1 but not the term1(ing variant), and also that the site name continues to rank at #1. Why?
[edited by: Symbios at 10:12 pm (utc) on Mar. 15, 2007]
I refuse to believe that including the word "news" will get your page penalized. It's impossible to write an article using only related keywords. Now, that would be keyword spamming.
Sorry if I didn't make myself clear. The word "news" by itself would not have penalized the page. This word however was the link to this page, one of the main navigation links, present on every other part of the site, with but the single word "News" in its anchor. And the page had no inbounds to it. The homepage which distributes all parameters ( all inbounds point to that URL, and none had "News" in the anchor ) was relevant for a theme in which the combination with "News" is seen as an entirely different "cabinet". ( As in, the emphasis is on "News", thus a site that's relevant for XYZ and tries for XYZ News becomes suspicious ).
I'm not sure at this point whether it was the links with the word "News" being discounted, or the target page being penalized, but I'd assume it's the links. A single added inbound with which Google could calculate an additional theme for the homepage ( and then the anchor and the "News" page ) was added, and it pushed the site onto a new playground.
But this test isn't over yet, so... I'll keep you posted.
If Miamacs' theory is correct, wouldn't Google's page analysis statistics be a good tool to check your site?
It is, actually, I've been using it for some time.