Forum Moderators: Robert Charlton & goodroi
I agree, and I'd like to collect people's observations about what "signs of fundamental change" they may have observed in recent months.
One big one for me is that no matter how common the search term -- it could generate billions of results -- you always seem to bang into an "omitted results" link before you reach #1,000. In fact, I just checked a search on the word "the", which Google says generates 5,300,000,000 results. And even on this monster, #928 is the "omitted results" link. Hmmmm....
Now 5,300,000,000 also seems like a low number to me - unless it does not include any Supplemental Results. So my current assumption is that by fattening up the Supplemental Index, Google has pared down the main index to somewhere in the vicinity of 5-6 billion urls.
A related sign of fundamental change, I feel, is the problems Google currently has generating understandable results for the site: operator or the Webmaster Tools reports. It looks to me like the total web data they've collected is now broken up into far-flung areas of their huge server farm -- making it very difficult to pull comprehensive site-wide information together again.
In the old days, they would rank a page by age, incoming links and all of that kind of stuff -- stuff that stays the same from week to week (okay, the incoming links would go up and down, but the big sites would always have more incoming links than the smaller sites, so relatively speaking it would stay the same). This meant the algo was relatively stable.
But now they are including a lot more things that change from week to week, even day to day -- gleaned from their millions and millions of cookies and tracking programs. This means that a site with a big advertising budget can affect its position in the rankings a lot quicker than a small site can - it can grab a lot of people in off the street, so to speak -- something that a smaller, academic site cannot do. And Google will track all these new visits and assume the site is popular.
[edit - I've just read Bentler's post from the previous page, and he said pretty much the same thing. Didn't see it before!]
[edited by: londrum at 9:15 pm (utc) on April 5, 2007]
so it would be best to have the greatest possible negatives and hope that the equation turns out positive? AND THEN some dampening factor is employed?
i.e. -100 * -100 * -100 * -100 = +100,000,000 x 0.5 (dampening) = +50,000,000
vs.
+10 * +10 * +10 * +10 = +10,000 x 1.0 (no dampening) = +10,000
--------
I just pop into these threads to inject some logic and prevent the chicken bone throwing that runs rampant.
No, I don't believe some fundamental change is occurring. "Fundamental" implies a root or basic scoring change.
If that was happening, these boards would be filled with the usual bloody murder posts... on the contrary, it's relatively quiet.
No, I don't believe some fundamental change is occurring.
All my sites, my clients' sites, and my friends' clients' sites are ranking exactly the same way and with the same predictability as the past 2+ years.
No, none of my sites, my clients' sites, my friends' clients' sites, or the 1000s of sites I keep track of have encountered any "penalties".
Yes, G (as always) tweaks the importance of several minor scoring factors but NOTHING fundamental has changed.
-------
If one is going to imply any type of "negative" factoring, I would like to see the MATH on that, because it makes little sense for G to employ such a scoring system considering the inherent laws of mathematics. Especially when there are INFINITELY easier ways to form sets and subset equations that don't involve negatives, zeros, etc. (all math "no-nos").
The theme so far is about trying to analyse the how/what/when that create negative outcomes for certain sites... the logic being that if those factors are recognised and understood, then that knowledge can be applied to remedial action.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites? When we see sites that demonstrate, and hold, big SERP improvements, are we putting enough time into understanding what is different about those sites that causes Google to favour them? I'm talking about in depth review, not just labelling them as spam because they outrank you.
As a group we are very focussed on perceived filters and penalties but I see very little sustained focus on what creates reward and improvement. Some will say that when we know the reasons for the filters, then improvement will follow automatically.... that would be a big assumption.
Google's interpretation of a good site is well documented and has been spelled out in their guidelines since day 1. The algo(s) that rank sites are far too complex for us to know with any certainty what they are doing at any given point in time... algo interpretation is always going to be guesswork.
But looking at sites that show good rankings gain would seem to be a controlled process that can be measured against known, published guidelines. In other words, you start measuring the reasons for success instead of trying to guess the reasons for failure.
Perhaps rather a simplistic view, but the KISS principle has worked for me for more years than I care to recall.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites? When we see sites that demonstrate, and hold, big SERP improvements, are we putting enough time into understanding what is different about those sites that causes Google to favour them? I'm talking about in depth review, not just labelling them as spam because they outrank you.
Exactly. I can never repeat this enough.
So I give my usual piece of advice.
Don't visit "SEO" boards for 3 months and do nothing but study the ranking sites of 50+ terms.
SEO (and esp. the "mysteries" of G) become incredibly easy when one does this.
Original Score = OP + BL + H + PR
...and a kind of SERP could be generated by sorting all the original URLs according to these total scores. However, that step isn't necessary if a re-ranking is going to be applied.
So next, re-rank by measuring various factors over just the preliminary set of URLs rather than using the entire web -- that saves lots of computing cycles, because the more intense "dials" are only applied to a very small sub-set of URLs rather than the entire web. By these tests, generate multipliers (m1, m2, m3, m4) for each component of the original score.
For instance, backlink influence could be modified by what percent of those backlinks come from within the original set (a LocalRank calculation). Or on-page scoring might be modified by discovering what percent of related words and phrases occur (phrase-based re-ranking). And so on.
So then you get this, very roughly:
Re-ranking Score = (m1*OP) + (m2*BL) + (m3*H) + (m4*PR)
...and now the final SERP is generated by sorting all the original urls according to these total scores.
The key here is that only the original set of results gets re-ranked. If you make it into the top 890, or whatever size the preliminary set is, then you will not drop out completely. But if you cross the threshold for one of the second-step tests, then you could fall dramatically within that set.
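To make that two-pass idea concrete, here's a rough sketch in Python of the kind of math I'm describing -- the factor names come from my pseudo-equations, but the cutoff size, the multiplier tests and every number in it are just my own guesses for illustration, not Google's actual algorithm:

PRELIM_SET_SIZE = 890  # hypothetical size of the preliminary set

def original_score(doc):
    # First pass: cheap factors simply summed together
    return doc["OP"] + doc["BL"] + doc["H"] + doc["PR"]

def rerank_multipliers(doc, prelim_ids):
    # Second pass: the more intense "dials", applied only to the preliminary set.
    # m2 example: a LocalRank-style test -- what fraction of the doc's backlinks
    # come from inside the preliminary set itself.
    local = sum(1 for src in doc["backlink_sources"] if src in prelim_ids)
    m2 = 0.5 + 0.5 * (local / max(len(doc["backlink_sources"]), 1))
    # m1 example: a phrase-based test -- share of related phrases found on-page.
    m1 = 0.5 + 0.5 * doc["related_phrase_share"]
    m3, m4 = 1.0, 1.0  # other dials left neutral in this sketch
    return m1, m2, m3, m4

def rank(docs):
    # Pass 1: keep only the top N by original score
    prelim = sorted(docs, key=original_score, reverse=True)[:PRELIM_SET_SIZE]
    prelim_ids = {d["id"] for d in prelim}
    # Pass 2: re-rank inside that set only -- a doc can fall to the bottom
    # of the set, but it never drops out of it entirely.
    def rerank_score(doc):
        m1, m2, m3, m4 = rerank_multipliers(doc, prelim_ids)
        return m1 * doc["OP"] + m2 * doc["BL"] + m3 * doc["H"] + m4 * doc["PR"]
    return sorted(prelim, key=rerank_score, reverse=True)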
Now my pseudo-equations above are very, very rough - grossly simplified to illustrate the kind of math that I think I see shining through the current fog. It is the kind of math that more than a few Information Retrieval patents point to. For example:
[0223] If the document is included in the SPAM_TABLE, then the document's relevance score is down weighted by predetermined factor... [0224] The search result set is then resorted by relevance score and provided back to the client.
Google's recent 'Detecting Spam Documents' patent [webmasterworld.com]
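That patent paragraph maps straight onto the same down-weight-and-resort template -- something like this, where the down-weight value is just my placeholder, since the patent only says "predetermined factor":

SPAM_DOWNWEIGHT = 0.1  # placeholder for the patent's "predetermined factor"

def apply_spam_table(results, spam_table):
    # [0223] down-weight the relevance score of any document in SPAM_TABLE
    for doc in results:
        if doc["id"] in spam_table:
            doc["relevance"] *= SPAM_DOWNWEIGHT
    # [0224] resort by relevance score and return to the client
    return sorted(results, key=lambda d: d["relevance"], reverse=True)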
So I see Google wrestling with many issues here - spam detection, discerning the end-user's intent, better scalability for faster results, an infrastructure that permits continual updating -- on and on. Some kind of re-ranking seems like a tool that could take care of many factors at once, and it could account for several more fundamental changes we've noticed in the past 6-12 months - including things "breaking" that were not broken before.
Such an approach also would install many tweakable "dials", ranging from the re-ranking tests, to the weighting of the multipliers. But, (sigh) since this is all my guesswork, the real situation could be quite different. But this is the best I have for now.
[edited by: tedster at 3:41 am (utc) on April 6, 2007]
Such an approach would make it easy to grant relative importance to different factors - quality inbound links might be given a big maximum number (say 20), run-of-the-mill links a lower number (5), keywords something in between, etc, etc. If each factor is calculated separately, it would be easy to change its importance over time.
With that type of math, penalties (anything under 1) can lower the product really fast. That might explain lots of pages going supplemental. Those types of functions are highly sensitive to the initial assumptions used to create penalties or maximum value, and it would take some tinkering to get things right.
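A quick worked example of that sensitivity -- the factor values here are invented purely to show the arithmetic:

# Hypothetical factor values -- the point is the arithmetic, not the numbers.
healthy = [20, 5, 12, 8]        # quality links, ordinary links, keywords, etc.
penalized = [20, 5, 12, 0.05]   # same page, but one factor scored under 1

def product(factors):
    score = 1.0
    for f in factors:
        score *= f
    return score

print(product(healthy))    # 9600.0
print(product(penalized))  # 60.0 -- one small factor collapses the whole score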
There is an infinite quantity of values between ZERO and ONE.
1 = the top
0.5 = the middle
0 = the bottom
And there are plenty more decimal places to be had.
e.g. - 0.499999 is one millionth less than 0.5, and it's still between ZERO and ONE.
Why bother with negative numbers?
The algorithm is a living, breathing entity.
In just the past couple of days, I've seen a fair amount of bouncing around on some of our more competitive target phrases. Today, I saw something really interesting - for a phrase that is particularly competitive there were several changes - a different URL on our site, and a change of page 1 SERP results going from a mix of informational and promotional/sales related sites to the first 16 results being purely informational. After that, there are one or two promotional sites, but again a much higher proportion of informational sites.
Anyone else seeing this type of change? I did see where one person posted that they saw the opposite.
There's a HUGE difference between a "penalty", i.e. a negative coefficient, and a fraction.
Don't simply discount the words and their meanings.
It's PRECISELY why people are having difficulties understanding why their sites have tanked.
If you stop thinking in terms of a "penalty" and start thinking in terms of multipliers between 0.000001 and 1, you start realizing what needs to be "enhanced" on your site... instead of what needs to be 'changed', fiddled around with, etc. to "remove a penalty".
When you start with the right fundamentals (pun intended) it's incredibly easy to figure out what's going on with G and your rankings.
Using Tedster's simple but effective equation, one could look at the algo as using a standard or modified bell curve for its rankings.
The top 1% reflect the top 1000 SERPs... start drifting into the >2% arena and suddenly you find yourself with a -950 "penalty".
Depending on which of the factors that had your site ranking at all gets given a different weight, you also find yourself asking if G has changed their algo - when in fact your site was only ever ranking mainly due to 1 or 2 factors. Figure out the 5, 6, 7 other factors that your site's weak on, and then you're not dependent on G engineers tweaking any 1 or 2 ranking coefficients.
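One way to picture that: a site leaning on one or two factors gets knocked around by a single coefficient tweak, while a balanced site barely moves. Toy numbers, invented just for illustration:

# Toy illustration: two sites with the same total, one lopsided, one balanced.
weights = {"links": 1.0, "content": 1.0, "age": 1.0, "phrases": 1.0}

lopsided = {"links": 90, "content": 4, "age": 3, "phrases": 3}
balanced = {"links": 25, "content": 25, "age": 25, "phrases": 25}

def total(site, w):
    return sum(w[k] * site[k] for k in site)

print(total(lopsided, weights), total(balanced, weights))   # 100 100

# Now the engineers nudge one dial down by 20%...
weights["links"] = 0.8
print(total(lopsided, weights), total(balanced, weights))   # 82.0 95.0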
Why bother with negative numbers?
[webmasterworld.com...]
I admit I haven't read all the relevant threads on 950 effect, etc. Has anyone tried a reverse-engineering perspective in their thinking; i.e., don't ask "Why was my legit site knocked way down," but instead ask, "If I want to filter splogs from my database, how would I go about it?" (Signals of supposed spam, as opposed to signals of quality.)
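In other words, score for spam signals instead of quality signals. Something like this -- every signal and threshold below is purely hypothetical; the exercise is the mindset, not a claim about what Google actually measures:

# Purely hypothetical splog signals -- thinking like the filter, not like the victim.
def splog_score(page):
    score = 0.0
    if page["duplicate_content_ratio"] > 0.8:
        score += 1.0                       # mostly scraped/duplicated text
    if page["outbound_links_per_100_words"] > 5:
        score += 1.0                       # link-farm density
    if page["anchor_text_repetition"] > 0.9:
        score += 1.0                       # every anchor is the same money phrase
    return score

def filter_splogs(pages, threshold=2.0):
    return [p for p in pages if splog_score(p) < threshold]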
For our main keyword, a site has been riding on top for ages with even 2 URLs, each featuring around 50 validation errors, including:
This page is not Valid (no Doctype found)!
In addition, that site has a lower PR than we have, and it has never seen any update, whilst we update almost daily.
Also, that site's content is almost a perfect contradiction to the initial search term "Free .... ", as it only sells some software.
PR, regular updates, validation errors, relevance and other elementary criteria do not seem to have much of a meaning for Google search engineers these days.
I would love to have a site to look at that is -950 and completely validates or has less than 10 html errors. I doubt anyone who is in this penalty actually has one.
Please, pulleeze - please stop trying to level accusations of spamming against all who are suffering from this phenomenon on their pages - because they are NOT spammers, and there is absolutely no evidence to back that up.
Pot. Kettle. RTFM
There are 5 patent apps out there indicative of what's been going on. Read them.
Added:
Glass houses.
[edited by: Marcia at 9:43 am (utc) on April 6, 2007]
If you have a site that has not broken the guidelines and is -950, sticky it to me to prove me wrong.
Tedster is right on the money: do one little thing wrong - trust rank, page rank, etc. - and it will blow the algo up and make you spiral downward.
We know that all human beings are inherently flawed (some more than others), ergo all algorithms are inherently flawed. It simply isn't possible to return "perfect" search results.
Having said that, and provided we can all agree that any and all algorithms are flawed from the get go ... take a look at Wikipedia rankings and try to figure out what it is that allows them to rise to the top almost all the time?
1) The site is incredibly massive.
2) Different authors, different writing styles, different keyword densities.
3) Interlinking all over the bloody place! (Anchor text - very important)
4) Outbound links on just about every page if not every page.
5) It's considered an "authority site" by the mindless masses who link to it.
In the end and no matter how you cut it, inbound links still count a great deal in the Google algo. I am guessing that even Google is somewhat disgusted that Wikipedia ranks as well as it does ... but since their algo is still heavily reliant on "votes", there is little they can do.
Enter "trust rank". How is "trust" determined? My guess is:
1) Any site which does not overtly try to manipulate Google's search results.
2) Inbound links.
I'm sure it may be more complicated than that, but not much more when all is said and done!
So ... if we use Wikipedia as a standard by which we may measure success, then my guess is that if I just keep building my site content without concern for the usual SEO practices, I should be just fine!
For those who find themselves in the -950 category, I can only advise you to:
I started out slowly with just two links to competitors. When I found that those two pages actually jumped up a couple of spots in the rankings, I linked out to a couple more. Yes, they jumped up the rankings as well, but so what? They were still sufficiently below me that I never perceived them as any more of a threat than they always were.
Unless you are number one and your competitor is number two, I wouldn't worry at all about linking to them. In fact, linking to certain competitors can actually help your business if you think about it! (It's who you don't link to that could make all the difference in the world.) ;)
I haven't investigated the -950 thing because I haven't had time or the inclination. Yes, a few of my pages have fallen into the trap ... I think. But I am not worried about it. At some point, I may have to start worrying, but so far, it has not had any (noticeable) impact on my overall business. However, I can tell you that if and when it does impact my business, I will likely just build more new pages of interesting content and forget about those which didn't make the grade. The content is still good and my clients appreciate it, so why worry or change anything?
I also agree with Whitenight that this is not a "fundamental" change at all. It's just another algo tweak. Google will change their algo again and again ... and who knows, maybe those pages will suddenly become stars! Oddly enough, some of the pages which I think have fallen to the -950 trap are doing very well on Yahoo. Page one in all cases. :)
What you lose on the swings, you pick up in the roundabouts! I don't mean to sound flippant to those of you who are suffering from this newest penalty, but over the years, I have discovered that trying to manipulate the algos is just a huge waste of time and energy. Just provide good content and good things will follow.
I agree they are easy to spot; it's also easy to spot why they are there, though.
I have read through countless hours of patents and as google/yahoo/msn have evolved, we all know the algo gets larger overall. When multiplication is involved, a larger algo means even more room to go up or to go down.
If you're in the regular index, your scoring will put you at -950; if you're in the supplemental index, the page will more than likely show minus a hundred thousand.
Are we perhaps not placing enough attention on the opposite... ie... the how/what/when that create positive outcomes for certain sites?
This problem has opened my eyes to the need to improve the navigation and the organization of a site that has been evolving online for 11 years. I feel like my site is better for visitors now and hopefully for Google as well. But it's taken more individual tweaking to get my 950ed pages back and a few are still out.
Wikipedia is a great example of what is doing well. The pages are moving up like crazy, even ones with just a paragraph or so that are marked as stubs. I don't know if Google is giving less weight to sites that address one topic in depth, or if Wikipedia just gets up there on other factors. About is another one that gets up there based on being a massive site.
In terms of validated code, that couldn't have been a factor in my case, as I'd lost just a small fraction of a few hundred pages. The coding is the same on all of them.
I think our time is better spent studying the phrase based patents. Start here with
Detecting spam documents in a phrase based information retrieval system [appft1.uspto.gov]
Example use: allinanchor:blue widgets
(no need for quotes, and put the keyword phrase immediately after the colon with no space)
You'll see only a few variations from these results compared with those in the actual SERPs.
I have also experienced a multi hundred page drop before in Google SERPs for many phrases where we'd been at or very near the top.
A very kind lady at G instructed me to remove multiple keyword variations from nav menu links, and the site bounced right back to the top within a week.
To me, it was a strong indication of an LSI filter being triggered. Nav menu had been in place for years and did not repeat same keyword phrases, but definitely had a lot of LSI overlap as most internal section links were using closely related keyword phrases.
Now same site has much higher rankings (above the fold) for multiple highly competitive single word keywords that it didn't have before.
BTW, keyword phrase density was almost always below 2% before the drop and in some cases well under that.
Bottom line: Fear the LSI filter!
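If you want to eyeball that kind of overlap in your own nav menu, a crude check is just to measure how many terms your internal anchors share -- a rough sketch of my own, nothing like however Google actually computes anything LSI-related:

from itertools import combinations

# Crude overlap check for nav-menu anchor texts -- illustrative only.
anchors = ["cheap blue widgets", "blue widget deals", "discount blue widgets"]

def term_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)   # Jaccard similarity of the anchor terms

for a, b in combinations(anchors, 2):
    print(f"{a!r} vs {b!r}: {term_overlap(a, b):.2f}")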
Best secondary advice: Get hundreds more one-way links with plentiful variations on your keywords.
How? Press releases, article submission, and plentiful submissions to SEO friendly directories. Offshore manual services will do 500 submissions with five variations for $85 - it's money well spent!
A very kind lady at G instructed me to remove multiple keyword variations from nav menu links, and the site bounced right back to the top within a week.
Thanks for that, hitsusa. I've helped a few people with that approach but it's the first bare whisper I've heard of the advice coming from a Google source.
Sorry for being dense.. Can someone explain this a little more? I suspect it might be a problem in some of my sites.
Are we talking about something where I have a widget site and on each page have a nav bar that reads
big widgets | little widgets | ugly widgets | green widgets
or even
Widgets:
Big | little | ugly | green
especially if you just have a picture of a widget on each page and there isn't much differential text
If it isn't what I'm thinking, I'd appreciate any enlightenment.
(added)
if I am correct above, then I assume that it would be better to have a main page listing all the widget variations and just have a 'back to widgets' link on each page instead of a navbar listing all the widget types. (end added)
thanks
cg
Despite being directly opposite what the "kindly lady" told you?
Google still likes one way blog spam, but varying anchor text seems at the top of the risk list right now (and follows the years of WebmasterWorld "vary your anchor text" threads).