Forum Moderators: Robert Charlton & goodroi
I agree, and I'd like to collect people's observations about what "signs of fundamental change" they may have observed in recent months.
One big one for me is that no matter how common the search term -- it could generate billions of results -- you always seem to bang into an "omitted results" link before you reach #1,000. In fact, I just checked a search on the word "the", which Google says generates 5,300,000,000 results. And even on this monster, #928 is the "omitted results" link. Hmmmm....
Now 5,300,000,000 also seems like a low number to me - unless it does not include any Supplemental Results. So my current assumption is that by fattening up the Supplemental Index, Google has pared down the main index to somewhere in the vicinity of 5-6 billion urls.
A related sign of fundamental change, I feel, is the problems Google currently has generating understandable results for the site: operator or the Webmaster Tools reports. It looks to me like the total web data they've collected is now broken up into far-flung areas of their huge server farm -- making it very difficult to pull comprehensive site-wide information together again.
In the old days, they would rank a page by age, incoming links and all of that kind of stuff -- stuff that stays the same from week to week (okay, the incoming links would go up and down, but the big sites would always have more incoming links than the smaller sites, so relatively speaking it would stay the same). This meant the algo was relatively stable.
But now they are including a lot more things that change from week to week, even day to day -- gleaned from their millions and millions of cookies and tracking programs. This means that a site with a big advertising budget can affect its position in the rankings a lot quicker than a small site can - it can grab a lot of people in off the street, so to speak -- something that a smaller, academic site cannot do. And Google will track all these new visits and assume the site is popular.
[edit - I've just read Bentler's post from the previous page, and he said pretty much the same thing. Didn't see it before!]
[edited by: londrum at 9:15 pm (utc) on April 5, 2007]
so it would be best to have the greatest possible negatives and hope that the equation turns out positive? AND THEN some dampening factor is employed?
i.e. -100 * -100 * -100 * -100 = +100,000,000 x 0.5 (dampening) = +50,000,000
vs.
+10 * +10 * +10 * +10 = +10,000 x 1.0 (no dampening) = +10,000
--------
I just pop into these threads to inject some logic and prevent the chicken bone throwing that runs rampant.
No, I don't believe some fundamental change is occurring. "Fundamental" implies a root or basic scoring change.
If that was happening, these boards would be filled with the usual bloody murder posts... on the contrary, it's relatively quiet.
No, I don't believe some fundamental change is occurring.
All my sites, my clients' sites, and my friends' clients' sites are ranking exactly the same way and with the same predictability as the past 2+ years.
No, none of my sites, my clients' sites, my friends' clients' sites, or the 1000s of sites I keep track of have encountered any "penalties".
Yes, G (as always) tweaks the importance of several minor scoring factors but NOTHING fundamental has changed.
-------
If one is going to imply any type of "negative" factoring, I would like to see the MATH on that, because it makes little sense for G to employ such a scoring system considering the inherent laws of mathematics. Especially when there are INFINITELY easier ways to form sets and subset equations that don't involve negatives, zeros, etc. (all math "no-nos").
The theme so far is about trying to analyse the how/what/when that create negative outcomes for certain sites... the logic being that if those factors are recognised and understood, then that knowledge can be applied to remedial action.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites? When we see sites that demonstrate, and hold, big SERP improvements, are we putting enough time into understanding what is different about those sites that causes Google to favour them? I'm talking about in depth review, not just labelling them as spam because they outrank you.
As a group we are very focussed on perceived filters and penalties but I see very little sustained focus on what creates reward and improvement. Some will say that when we know the reasons for the filters, then improvement will follow automatically.... that would be a big assumption.
Google's interpretation of a good site is well documented and has been spelled out in their guidelines since day 1. The algo(s) that rank sites are far too complex for us to know with any certainty what they are doing at any given point in time... algo interpretation is always going to be guesswork.
But looking at sites that show good rankings gain would seem to be a controlled process that can be measured against known, published guidelines. In other words, you start measuring the reasons for success instead of trying to guess the reasons for failure.
Perhaps rather a simplistic view, but the KISS principle has worked for me for more years than I care to recall.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites? When we see sites that demonstrate, and hold, big SERP improvements, are we putting enough time into understanding what is different about those sites that causes Google to favour them? I'm talking about in depth review, not just labelling them as spam because they outrank you.
Exactly. I can never repeat this enough.
So I give my usual piece of advice.
Don't visit "SEO" boards for 3 months and do nothing but study the ranking sites of 50+ terms.
SEO (and esp. the "mysteries" of G) become incredibly easy when one does this.
Original Score = OP + BL + H + PR
...and a kind of SERP could be generated by sorting all the original URLs according to these total scores. However, that step isn't necessary if a re-ranking is going to be applied.
So next, re-rank by measuring various factors over just the preliminary set of URLs rather than using the entire web -- that saves lots of computing cycles, because the more intense "dials" are only applied to a very small sub-set of URLs rather than the entire web. By these tests, generate multipliers (m1, m2, m3, m4) for each component of the original score.
For instance, backlink influence could be modified by what percent of those backlinks come from within the original set (a LocalRank calculation). Or on-page scoring might be modified by discovering what percent of related words and phrases occur (phrase-based re-ranking). And so on.
So then you get this, very roughly:
Re-ranking Score = (m1*OP) + (m2*BL) + (m3*H) + (m4*PR)
...and now the final SERP is generated by sorting all the original urls according to these total scores.
The key here is that only the original set of results gets re-ranked. If you make it into the top 890, or whatever size the preliminary set is, then you will not drop out completely. But if you cross the threshold for one of the second-step tests, then you could fall dramatically within that set.
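To make that two-pass idea concrete, here's a rough sketch in Python of the kind of math I'm describing -- the factor names come from my pseudo-equations, but the cutoff size, the multiplier tests and every number in it are just my own guesses for illustration, not Google's actual algorithm:

PRELIM_SET_SIZE = 890  # hypothetical size of the preliminary set

def original_score(doc):
    # First pass: cheap factors simply summed together
    return doc["OP"] + doc["BL"] + doc["H"] + doc["PR"]

def rerank_multipliers(doc, prelim_ids):
    # Second pass: the more intense "dials", applied only to the preliminary set.
    # m2 example: a LocalRank-style test -- what fraction of the doc's backlinks
    # come from inside the preliminary set itself.
    local = sum(1 for src in doc["backlink_sources"] if src in prelim_ids)
    m2 = 0.5 + 0.5 * (local / max(len(doc["backlink_sources"]), 1))
    # m1 example: a phrase-based test -- share of related phrases found on-page.
    m1 = 0.5 + 0.5 * doc["related_phrase_share"]
    m3, m4 = 1.0, 1.0  # other dials left neutral in this sketch
    return m1, m2, m3, m4

def rank(docs):
    # Pass 1: keep only the top N by original score
    prelim = sorted(docs, key=original_score, reverse=True)[:PRELIM_SET_SIZE]
    prelim_ids = {d["id"] for d in prelim}
    # Pass 2: re-rank inside that set only -- a doc can fall to the bottom
    # of the set, but it never drops out of it entirely.
    def rerank_score(doc):
        m1, m2, m3, m4 = rerank_multipliers(doc, prelim_ids)
        return m1 * doc["OP"] + m2 * doc["BL"] + m3 * doc["H"] + m4 * doc["PR"]
    return sorted(prelim, key=rerank_score, reverse=True)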
Now my pseudo-equations above are very, very rough - grossly simplified to illustrate the kind of math that I think I see shining through the current fog. It is the kind of math that more than a few Information Retrieval patents point to. For example:
[0223] If the document is included in the SPAM_TABLE, then the document's relevance score is down weighted by predetermined factor... [0224] The search result set is then resorted by relevance score and provided back to the client.
Google's recent 'Detecting Spam Documents' patent [webmasterworld.com]
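That patent paragraph maps straight onto the same down-weight-and-resort template -- something like this, where the down-weight value is just my placeholder, since the patent only says "predetermined factor":

SPAM_DOWNWEIGHT = 0.1  # placeholder for the patent's "predetermined factor"

def apply_spam_table(results, spam_table):
    # [0223] down-weight the relevance score of any document in SPAM_TABLE
    for doc in results:
        if doc["id"] in spam_table:
            doc["relevance"] *= SPAM_DOWNWEIGHT
    # [0224] resort by relevance score and return to the client
    return sorted(results, key=lambda d: d["relevance"], reverse=True)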
So I see Google wrestling with many issues here - spam detection, discerning the end-user's intent, better scalability for faster results, an infrastructure that permits continual updating -- on and on. Some kind of re-ranking seems like a tool that could take care of many factors at once, and it could account for several more fundamental changes we've noticed in the past 6-12 months - including things "breaking" that were not broken before.
Such an approach also would install many tweakable "dials", ranging from the re-ranking tests, to the weighting of the multipliers. But, (sigh) since this is all my guesswork, the real situation could be quite different. But this is the best I have for now.
[edited by: tedster at 3:41 am (utc) on April 6, 2007]
Such an approach would make it easy to grant relative importance to different factors - quality inbound links might be given a big maximum number (say 20), run-of-the-mill links a lower number (5), keywords something in between, etc, etc. If each factor is calculated separately, it would be easy to change its importance over time.
With that type of math, penalties (anything under 1) can lower the product really fast. That might explain lots of pages going supplemental. Those types of functions are highly sensitive to the initial assumptions used to create penalties or maximum value, and it would take some tinkering to get things right.
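A quick worked example of that sensitivity -- the factor values here are invented purely to show the arithmetic:

# Hypothetical factor values -- the point is the arithmetic, not the numbers.
healthy = [20, 5, 12, 8]        # quality links, ordinary links, keywords, etc.
penalized = [20, 5, 12, 0.05]   # same page, but one factor scored under 1

def product(factors):
    score = 1.0
    for f in factors:
        score *= f
    return score

print(product(healthy))    # 9600.0
print(product(penalized))  # 60.0 -- one small factor collapses the whole score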
There is an infinite quantity of values between ZERO and ONE.
1 = the top
0.5 = the middle
0 = the bottom
And there are plenty more decimal places to be had.
e.g. - 0.499999 is one millionth less than 0.5, and it's still between ZERO and ONE.
Why bother with negative numbers?
The algorithm is a living, breathing entity.
In just the past couple of days, I've seen a fair amount of bouncing around on some of our more competitive target phrases. Today, I saw something really interesting - for a phrase that is particularly competitive there were several changes - a different URL on our site, and a change of page 1 SERP results going from a mix of informational and promotional/sales related sites to the first 16 results being purely informational. After that, there are one or two promotional sites, but again a much higher proportion of informational sites.
Anyone else seeing this type of change? I did see where one person posted that they saw the opposite.
There's a HUGE difference between a "penalty", i.e. a negative coefficient, and a fraction.
Don't simply discount the words and their meanings.
It's PRECISELY why people are having difficulties understanding why their sites have tanked.
If you stop thinking in terms of a "penalty" and start thinking in terms of multipliers between 0.000001 and 1, you start realizing what needs to be "enhanced" on your site... instead of what needs to be 'changed', fiddled around with, etc. to "remove a penalty".
When you start with the right fundamentals (pun intended) it's incredibly easy to figure out what's going on with G and your rankings.
Using Tedster's simple but effective equation, one could look at the algo as using a standard or modified bell curve for its rankings.
The top 1% reflect the top 1000 SERPs... start drifting into the >2% arena and suddenly you find yourself with a -950 "penalty".
Depending on which of the factors that had your site ranking at all gets given a different weight, you also find yourself asking if G has changed their algo - when in fact your site was only ever ranking mainly due to 1 or 2 factors. Figure out the 5, 6, 7 other factors that your site's weak on, and then you're not dependent on G engineers tweaking any 1 or 2 ranking coefficients.
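One way to picture that: a site leaning on one or two factors gets knocked around by a single coefficient tweak, while a balanced site barely moves. Toy numbers, invented just for illustration:

# Toy illustration: two sites with the same total, one lopsided, one balanced.
weights = {"links": 1.0, "content": 1.0, "age": 1.0, "phrases": 1.0}

lopsided = {"links": 90, "content": 4, "age": 3, "phrases": 3}
balanced = {"links": 25, "content": 25, "age": 25, "phrases": 25}

def total(site, w):
    return sum(w[k] * site[k] for k in site)

print(total(lopsided, weights), total(balanced, weights))   # 100 100

# Now the engineers nudge one dial down by 20%...
weights["links"] = 0.8
print(total(lopsided, weights), total(balanced, weights))   # 82.0 95.0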
Why bother with negative numbers?
[webmasterworld.com...]
I admit I haven't read all the relevant threads on 950 effect, etc. Has anyone tried a reverse-engineering perspective in their thinking; i.e., don't ask "Why was my legit site knocked way down," but instead ask, "If I want to filter splogs from my database, how would I go about it?" (Signals of supposed spam, as opposed to signals of quality.)
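In other words, score for spam signals instead of quality signals. Something like this -- every signal and threshold below is purely hypothetical; the exercise is the mindset, not a claim about what Google actually measures:

# Purely hypothetical splog signals -- thinking like the filter, not like the victim.
def splog_score(page):
    score = 0.0
    if page["duplicate_content_ratio"] > 0.8:
        score += 1.0                       # mostly scraped/duplicated text
    if page["outbound_links_per_100_words"] > 5:
        score += 1.0                       # link-farm density
    if page["anchor_text_repetition"] > 0.9:
        score += 1.0                       # every anchor is the same money phrase
    return score

def filter_splogs(pages, threshold=2.0):
    return [p for p in pages if splog_score(p) < threshold]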
For our main keyword, a site has been riding on top for ages with even 2 URLs, each featuring around 50 validation errors, including:
This page is not Valid (no Doctype found)!
In addition, that site has a lower PR than we have, and it has never seen any update, whilst we update almost daily.
Also, that site's content is almost a perfect contradiction to the initial search term "Free .... ", as it only sells some software.
PR, regular updates, validation errors, relevance and other elementary criteria do not seem to have much of a meaning for Google search engineers these days.
I would love to have a site to look at that is -950 and completely validates or has less than 10 html errors. I doubt anyone who is in this penalty actually has one.
Please, pulleeze - please stop trying to level accusations of spamming against all who are suffering from this phenomenon on their pages - because they are NOT spammers, and there is absolutely no evidence to back that up.
Pot. Kettle. RTFM
There are 5 patent apps out there indicative of what's been going on. Read them.
Added:
Glass houses.
[edited by: Marcia at 9:43 am (utc) on April 6, 2007]
If you have a site that has not broken the guidelines and is -950, sticky it to me to prove me wrong.
Tedster is right on the money: do one little thing wrong - trust rank, page rank, etc. - and it will blow the algo up and make you spiral downward.
We know that all human beings are inherently flawed (some more than others), ergo all algorithms are inherently flawed. It simply isn't possible to return "perfect" search results.
Having said that, and provided we can all agree that any and all algorithms are flawed from the get go ... take a look at Wikipedia rankings and try to figure out what it is that allows them to rise to the top almost all the time?
1) The site is incredibly massive.
2) Different authors, different writing styles, different keyword densities.
3) Interlinking all over the bloody place! (Anchor text - very important)
4) Outbound links on just about every page if not every page.
5) It's considered an "authority site" by the mindless masses who link to it.
In the end and no matter how you cut it, inbound links still count a great deal in the Google algo. I am guessing that even Google is somewhat disgusted that Wikipedia ranks as well as it does ... but since their algo is still heavily reliant on "votes", there is little they can do.
Enter "trust rank". How is "trust" determined? My guess is:
1) Any site which does not overtly try to manipulate Google's search results.
2) Inbound links.
I'm sure it may be more complicated than that, but not much more when all is said and done!
So ... if we use Wikipedia as a standard by which we may measure success, then my guess is that if I just keep building my site content without concern for the usual SEO practices, I should be just fine!
For those who find themselves in the -950 category, I can only advise you to:
I started out slowly with just two links to competitors. When I found that those two pages actually jumped up a couple of spots in the rankings, I linked out to a couple more. Yes, they jumped up the rankings as well, but so what? They were still sufficiently below me that I never perceived them as any more of a threat than they always were.
Unless you are number one and your competitor is number two, I wouldn't worry at all about linking to them. In fact, linking to certain competitors can actually help your business if you think about it! (It's who you don't link to that could make all the difference in the world.) ;)
I haven't investigated the -950 thing because I haven't had time or the inclination. Yes, a few of my pages have fallen into the trap ... I think. But I am not worried about it. At some point, I may have to start worrying, but so far, it has not had any (noticeable) impact on my overall business. However, I can tell you that if and when it does impact my business, I will likely just build more new pages of interesting content and forget about those which didn't make the grade. The content is still good and my clients appreciate it, so why worry or change anything?
I also agree with Whitenight that this is not a "fundamental" change at all. It's just another algo tweak. Google will change their algo again and again ... and who knows, maybe those pages will suddenly become stars! Oddly enough, some of the pages which I think have fallen to the -950 trap are doing very well on Yahoo. Page one in all cases. :)
What you lose on the swings, you pick up in the roundabouts! I don't mean to sound flippant to those of you who are suffering from this newest penalty, but over the years, I have discovered that trying to manipulate the algos is just a huge waste of time and energy. Just provide good content and good things will follow.
I agree they are easy to spot; it's also easy to spot why they are there, though.
I have read through countless hours of patents and as google/yahoo/msn have evolved, we all know the algo gets larger overall. When multiplication is involved, a larger algo means even more room to go up or to go down.
If you're in the regular index, your scoring will put you at -950; if you're in the supplemental index, the page will more than likely show minus a hundred thousand.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites?
This problem has opened my eyes to the need to improve the navigation and the organization of a site that has been evolving online for 11 years. I feel like my site is better for visitors now and hopefully for Google as well. But it's taken more individual tweaking to get my 950ed pages back and a few are still out.
Wikipedia is a great example of what is doing well. The pages are moving up like crazy, even ones with just a paragraph or so that are marked as stubs. I don't know if Google is giving less weight to sites that address one topic in depth, or if Wikipedia just gets up there on other factors. About is another one that gets up there based on being a massive site.
In terms of validated code, that couldn't have been a factor in my case, as I'd lost just a small fraction of a few hundred pages. The coding is the same on all of them.
I think our time is better spent studying the phrase based patents. Start here with
Detecting spam documents in a phrase based information retrieval system [appft1.uspto.gov]
Example use: allinanchor:blue widgets
(no need for quotes, and put the keyword phrase immediately after the colon with no space)
You'll see only a few variations from these results compared with those in the actual SERPs.
I have also experienced a multi hundred page drop before in Google SERPs for many phrases where we'd been at or very near the top.
A very kind lady at G instructed me to remove multiple keyword variations from nav menu links, and the site bounced right back to the top within a week.
To me, it was a strong indication of an LSI filter being triggered. Nav menu had been in place for years and did not repeat same keyword phrases, but definitely had a lot of LSI overlap as most internal section links were using closely related keyword phrases.
Now same site has much higher rankings (above the fold) for multiple highly competitive single word keywords that it didn't have before.
BTW, keyword phrase density was almost always below 2% before the drop and in some cases well under that.
Bottom line: Fear the LSI filter!
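If you want to eyeball that kind of overlap in your own nav menu, a crude check is just to measure how many terms your internal anchors share -- a rough sketch of my own, nothing like however Google actually computes anything LSI-related:

from itertools import combinations

# Crude overlap check for nav-menu anchor texts -- illustrative only.
anchors = ["cheap blue widgets", "blue widget deals", "discount blue widgets"]

def term_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)   # Jaccard similarity of the anchor terms

for a, b in combinations(anchors, 2):
    print(f"{a!r} vs {b!r}: {term_overlap(a, b):.2f}")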
Best secondary advice: Get hundreds more one-way links with plentiful variations on your keywords.
How? Press releases, article submission, and plentiful submissions to SEO friendly directories. Offshore manual services will do 500 submissions with five variations for $85 - it's money well spent!
A very kind lady at G instructed me to remove multiple keyword variations from nav menu links, and the site bounced right back to the top within a week.
Thanks for that, hitsusa. I've helped a few people with that approach but it's the first bare whisper I've heard of the advice coming from a Google source.
Sorry for being dense.. Can someone explain this a little more? I suspect it might be a problem in some of my sites.
Are we talking about something where I have a widget site and on each page have a nav bar that reads
big widgets | little widgets | ugly widgets | green widgets
or even
Widgets:
Big | little | ugly | green
especially if you just have a picture of a widget on each page and there isn't much differential text
If it isn't what I'm thinking, I'd appreciate any enlightenment.
(added)
if I am correct above, then I assume that it would be better to have a main page listing all the widget variations and just have a 'back to widgets' link on each page instead of a navbar listing all the widget types. (end added)
thanks
cg
Despite being directly opposite what the "kindly lady" told you?
Google still likes one way blog spam, but varying anchor text seems at the top of the risk list right now (and follows the years of WebmasterWorld "vary your anchor text" threads).