Forum Moderators: Robert Charlton & goodroi


New Models for Thinking about the Google Algorithm

         

tedster

9:31 pm on Feb 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We don't KNOW how Google works internally. We try to understand it as a "black box" - that is, we look at what kinds of data are going in and then we look at the SERPs that come out. Then we build our own theories, our own mental models of what "might" be going on inside that black box to make such-and-such an input give us back such-and-such an output.

The more diverse our data samples, the more we stand a chance of catching some fringe behaviors that require us to revise our mental model. This can be one advantage of a forum community, as well as many kinds of networking.

The challenge here can be that our entire tool-kit for building our black box models may be missing some important, or even essential, elements.

For example, many people seem to think of Google as a rather linear score-keeper. The central model here is that Google takes a url, runs it through different tests, and each test then gives that url either some plus points towards its scoring, or some minuses of some kind -- then we add up all the points and the url gets a final score to use in ranking.
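That "linear score-keeper" mental model is easy to sketch. Purely as an illustration of the model being described -- every factor name and weight below is invented, and this is emphatically not Google's actual code:

```python
# A toy sketch of the "linear score-keeper" mental model described above.
# Each test independently adds or subtracts points, and the final score is
# the simple sum. Every factor and weight here is invented for illustration.

def linear_score(page):
    score = 0.0
    if page.get("keyword_in_title"):
        score += 1.0                          # plus points for a passed test
    if page.get("keyword_density", 0) > 0.10:
        score -= 2.0                          # minus points for a failed test
    score += 0.5 * page.get("inbound_links", 0)
    return score                              # one summed number decides rank

pages = [
    {"keyword_in_title": True, "inbound_links": 4},
    {"keyword_in_title": False, "inbound_links": 8, "keyword_density": 0.12},
]
ranked = sorted(pages, key=linear_score, reverse=True)
```

The weakness falls out immediately: every factor contributes the same fixed amount no matter what the rest of the page looks like, which is exactly the assumption the posts below argue against.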

This is not a very strong or predictive model these days. In fact, I think Google has moved pretty far away from any internal methods that could be well-modeled by this particular kind of thinking.

It may have been functional for pre-Google search engines -- and even for early Google, perhaps. But today, this kind of thinking departs more and more from the effects we see in the real SERPs.

The algo elements now are quite diverse and complex when compared to simple text-matching and yes/no scoring. Have you noticed that even as Google tells us they are moving away from using many penalties, we seem to need MORE "penalties" to describe what we notice?

My own cure for the "common model" involves study of the academic papers and patents, plus close listening to statements from Google reps -- not so much what they say as the word choices they make, the "how they say it". Whatever they are doing (and precise reverse-engineering of the algo is pretty hopeless today), it will color the style of communication; the internally used jargon and technology inevitably leave tell-tale signs.

My main point is that comprehending today's Google, at any but the most superficial level, requires some fresh tools and some re-programming of analysis habits that are now many years behind Mountain View.

How are you all coping with the "new Google"? What kind of thinking have you needed to put on the scrap heap?
Have you found something new that really helps?

martinibuster

11:20 pm on Feb 12, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>>Have you found something new that really helps?

Articles about brain research. I read something in the Economist, I think, about short-term memory loss as you get older, compensated for by the ability to spot patterns and make decisions based on that.

Younger people can hold more in short-term memory but can't apply a matrix of collected experiences to the decision-making process, or something similar. The upshot is that the ability to retain so much detail was data gathering - a data push, if you will - moving toward the moment when the brain matured, switched gears, and was able to use that data, while discarding the details, to make pattern-based decisions.

Maybe it's a coincidence, but some of what I read about brains almost seems to apply to the way Google handles data.

Marcia

4:49 am on Feb 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



short-term memory loss as you get older, compensated for by the ability to spot patterns and make decisions based on that

Absolutely, I can witness to that!

Spotting patterns is basically right-brain processing of left-brain data, which is largely on an intuitive level. I'm finding that as more brain-rot sets in, intuitive sensitivity and awareness become stronger and quicker.

vincevincevince

5:49 am on Feb 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My suspicion is that the model is not a single algorithm but a network of nonlinear modifiers to an initial score, which may or may not be pagerank.

For example:
---
Site has PR3.5: Score 3.5
SERP Clickthrough shows 10% below average performance, lose 0.3: Score 3.2
Link-footprint analysis shows similarities with 31 other sets of backlinks, lose 0.2: Score 3.0
Hidden text in a small amount, lose 0.4: Score 2.6
3.4 points rank from manually reviewed Authority sites, gain 0.5: Score 3.1
---
Final Score: 3.1

The amount which is gained or lost for a given factor is probably nonlinear, in that it will affect those who already have a high score differently than those who start with a low score. Likewise, some things could be negative for small sites but positive for larger ones.

The reason why I think this is probably the case is the immense difficulty of making a single-stage algorithm to handle everything. That makes it much more likely that the new algorithm is always the old algorithm plus adjustments based upon a new factor. That way Google staff members can show their boss a preview of the new results easily, and only one new calculation needs to be undertaken per site to move to the new index.

In terms of analysis, with this model we will observe effects similar to synergy or inhibition when we combine more than one factor, as the score modification from one factor will lead to a change in the way the other factor is interpreted.
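The staged-modifier model can be sketched in a few lines. The factors and magnitudes mirror the worked example above; the modifier functions themselves are hypothetical:

```python
# Sketch of the staged model above: an initial score (PR-like) is passed
# through a chain of modifiers, and an adjustment may depend on the current
# score -- which is where the nonlinearity and the synergy/inhibition
# effects come from. All functions and magnitudes are hypothetical; the
# numbers mirror the worked example in the post.

def ctr_modifier(score):
    # A below-average clickthrough costs more when the score is already high.
    return score - 0.3 * (score / 3.5)

def link_footprint_modifier(score):
    return score - 0.2      # similar backlink sets: flat penalty

def hidden_text_modifier(score):
    return score - 0.4      # small amount of hidden text

def authority_modifier(score):
    return score + 0.5      # links from manually reviewed authority sites

MODIFIER_CHAIN = [ctr_modifier, link_footprint_modifier,
                  hidden_text_modifier, authority_modifier]

def final_score(initial):
    score = initial
    for modify in MODIFIER_CHAIN:
        score = modify(score)   # each stage sees the previous stage's output
    return score

print(round(final_score(3.5), 1))   # the worked example: 3.5 -> 3.1
```

Because a stage can read the running score, the order of the chain matters - applying the authority bonus before the clickthrough penalty gives a different result - which matches the synergy/inhibition observation.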

tedster

6:03 am on Feb 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>Have you found something new that really helps?

Some months back now, I was struggling to get a client's site to rank decently on a pretty competitive 2-word phrase. I thought, from previous work with their business, that we had all the right ingredients in place. But still they were ranked down on page 3 or 4 for their trophy phrase.

Finally in frustration I tried an experiment along the line of my recent studies. I took the text from the top ranked sites and analyzed it for 2-word and 3-word phrases.

There were a number of phrases that just jumped out of the analysis at me, because they appeared on so many well ranked pages. None of these phrases used the words in the trophy phrase, but they were all clearly related and "on theme". Some of these phrases even made decent anchor text.

So I worked these new phrases into the index page's content in various ways. I did not remove even one occurrence of the target keywords, so even though the keyword density went down (we're talking 4% to 3%), the actual number of occurrences of the phrase stayed the same.
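For anyone who wants to automate the eyeball step, counting 2-word and 3-word phrases across competitor pages is straightforward. A minimal sketch, assuming you've already fetched and cleaned the text of the top-ranked pages:

```python
# Count 2-word and 3-word phrases across several competitor pages and keep
# the ones that appear on many of them. A minimal sketch; real use would
# add stop-word filtering and smarter tokenizing.
import re
from collections import Counter

def ngrams(text, n):
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def common_phrases(pages, min_pages=2):
    counts = Counter()
    for text in pages:                      # one string per ranked page
        for n in (2, 3):
            counts.update(ngrams(text, n))  # a set: count pages, not repeats
    return [(p, c) for p, c in counts.most_common() if c >= min_pages]

pages = [
    "blue widget repair and widget maintenance tips",
    "widget maintenance guide for blue widget owners",
    "how to repair a blue widget at home",
]
for phrase, n_pages in common_phrases(pages, min_pages=3):
    print(phrase, n_pages)    # prints: blue widget 3
```

The phrases that "jump out" are simply those with the highest page counts; here only "blue widget" appears on all three sample pages.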

In about three days, the page jumped from #36 into the top 5, and it has stayed there - sometimes even at #1. That experience is part of the reason I started this thread.

Oliver Henniges

12:16 pm on Feb 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thx for sharing this experience, tedster. That points towards the patent on phrase-based indexing.

Did you cross-check the phrases with the data provided by google's keyword suggestion tool?

If there was any relationship, I might add a similar story: One of the major kws I was targeting got stuck on page 3 to 5. In order to help it climb, I designed a "dictionary" around this kw, covering about 30-40 pages, the page names and page titles of which were taken from that google tool and other similar tools available on the net. Not all phrases were picked up, but most, and the most relevant ones. The major kw itself did not really climb at first: It seems to me that there was a filter at work, because I had additionally added internal links with "widget" in the anchor text at the bottom of a few hundred related product pages. After two years I took these links away, and now the kw is well placed in the serps.

What struck me most is the fact that all these "dictionary" pages ranked far better and attracted many more visitors than I had expected, although their content was relatively thin. For instance, they covered widgets from different manufacturers which we actually don't sell, and we had a number of requests from customers concerning these. I viewed it as interesting information regarding customer needs and the long tail, but it turned out that many of these products don't fit in with the rest of our shop. So finally, I put google adsense on some of those pages, so that my customers at least may find a way to a relevant resource.

To sum up: It is indeed an essential ranking factor that the target phrase is well embedded into a cloud of semantically related phrases - on the same page, and on the surrounding internal pages. I'd second that this is probably one of the strongest factors of all at present.

What, exactly, is a phrase in the eyes of a search engine? Is it just a sequence of ascii codes, or are syntactic transformations recognized as well?

What tools do you use to identify (relevant) phrases?

trinorthlighting

3:15 pm on Feb 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to really look at how google groups similar keywords, use the adwords tool. It's very good and even tells you when keywords overlap a bit. Also, it gives you a small chart on search volume as well.

tedster

3:07 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What, exactly, is a phrase in the eyes of a search engine?

In the Phrase patents [webmasterworld.com], they are looking for relevance prediction in the phrases.

What tools do you use to identify (relevant) phrases?

Nothing special so far - mostly my eyeballs and note-taking. I've only used this approach 4 or 5 times. Since I'm looking for phrases that are really obvious and would be predictive of the search term involved, it doesn't take much. I'm considering having a special tool programmed to make it easy to include more pages in the analysis, but up to now I don't feel a strong need.

use the adwords tool

That's a good idea, I think. I have used it for other purposes and this would be a natural.

annej

3:32 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "Detecting spam documents in a phrase based information retrieval system" patent at one point refers to phrases that are known to be of interest to advertisers.

One very simple thing that I've done is search on Google for the key phrase for my missing pages and then notice the Google ads on the right. In one case, a phrase that you wouldn't think would be of that much interest is in every ad. The ads are mostly for MFAs. Ebay, chain stores and such do this too. They must have a huge list of words and phrases that will bring up their ads.

You can experiment around and see if it is just the name of the widget or if it has to be kind-of plus widget. It may not be the phrase you expect, so try different things from the page.

Marcia

3:48 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It may not be the phrase you expect so try different things from the page.

Exactly. For example, I've looked through a number of pages that come up for a search for which one page recovered, and without fail there were other phrases on the pages that you wouldn't think of as being related to the "main theme" or topic of the "main" keywords involved. It was the added word in the phrase in question that they were looking at for "information gain."

But to stay on topic about a model for thinking, I think it also pays off to look beyond the obvious, and really, really read what things are saying - and what they're not saying.

Added: Correction, there is one exception, in that case another factor outweighed the "related phrase" element. imho it's a good model for thinking to not get painted into a corner about just one thing.

[edited by: Marcia at 3:51 pm (utc) on Feb. 14, 2007]

thegypsy

3:55 pm on Feb 14, 2007 (gmt 0)

10+ Year Member



You know, it is strange.. I take nearly the opposite approach...

I study it - absorb it into pre-existing knowledge and continue on with the everyday work of SEO. I don't dig deeper... I put it to use and let the resulting data speak to me. That's what brought me to PaIR in the first place. Real world experience.

I studied martial arts for more than 20 years. You practice and train so that when it is time, it all happens naturally. This is the same. I study and absorb, knowing it will come out in my daily activities of Search Marketing....

It becomes pervasive in one's thinking... just be 'at one with the concept' and continue on.. he he.... a little SEO Karma for U...

kevsh

5:16 pm on Feb 14, 2007 (gmt 0)

10+ Year Member



The AdWords tip is a good one and has been used by many for this purpose. However, there seems to be some evidence (from actual experience) that Google tracks the results of the keyword tool to help build related terms for its marketers.

The result can be harmful for your own PPC campaign, as phrases you had on the cheap all of a sudden become expensive. Why? Google has now determined that broad matches that turned up in your keyword tool search apply to others listed by competing advertisers - where they didn't exist before. Shortly thereafter, doing a search for that term in G will turn up 10 results where there were none, or only a few, before.

Whether Google uses this KT data to help build its phrase-matching system (in the organic database) is another question entirely. If it does ... well, I'll leave it up to the brilliant minds here to work out how resourceful that could be. (Actually, anyone who has read this far has already thought of it.)

tedster

5:22 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm thinking the data might flow the other way -- do the phrase analysis over all sites and then migrate that information into the keyword tool.

jimbeetle

6:21 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How 'bout mix 'n match?

  • Web Pages
  • User searches and refinements (and behaviour?)
  • Advertiser Keyword lists
  • Keyword tool

    <aside>Hmmm, advertiser bids on [this keyphrase]. G's AdWords quality score algo says [this or that predicted phrase] should also appear on the landing page. Neither does, so quality score drops, min bid increases.</aside>

Marcia

6:22 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm thinking the data might flow the other way -- do the phrase analysis over all sites and then migrate that information into the keyword tool.

I'm thinking the same thing, based on how the taxonomy is constructed and the way that words traditionally make their way into the dictionary, by reason of usage.

People have different ways of learning, but I find that digging - exegesis [google.com], if you will - is what opens up understanding (and relating to historical factors).

It's digging until hearing clearly what's being said and, as in this case, what isn't being said. Like the sound of one hand clapping. ;)

Oliver Henniges

6:45 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tedster, you left out the important second part of my question:
> Is it just a sequence of ascii codes, or are syntactic transformations recognized as well?

To give a quick and dirty example: Would google recognize

A) the President of the United States
B) the President of our beloved United States

as basically the same key-phrase? To my understanding of the cited patent, it would not. However, it should be clear that such slight deviations from standard phrases, easily (?) detectable after some sort of natural-language parsing, would be an even far stronger indication of a naturally written page. Particularly in a few months or years, when exploiting the keyphrase-suggestion tool will have become a standard SEO technique for spammy sites;)

Maybe I'm just splitting hairs, but considering that google presents us this translation tool for dozens of different languages, I believe that it has much more sophisticated means for a thorough linguistic analysis of pages, in addition to mere statistical methods.
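One way to frame the A/B distinction: an exact token-sequence match fails on version B, while a match that ignores stop words and tolerates a small gap accepts both. A hypothetical sketch of that difference - not a description of what Google actually does:

```python
# Exact phrase matching vs. matching that ignores stop words and allows a
# small gap, illustrating why "the President of our beloved United States"
# fails a literal token-sequence match for the phrase while a looser match
# accepts it. Purely hypothetical.

STOP = {"the", "of", "our", "a", "an"}

def content_tokens(text):
    return [w for w in text.lower().split() if w not in STOP]

def exact_match(phrase, text):
    # Literal contiguous token-sequence match, stop words included.
    p, t = phrase.lower().split(), text.lower().split()
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

def gapped_match(phrase, text, max_gap=2):
    # Match the phrase's content words in order, allowing up to max_gap
    # positions between consecutive matches.
    p, t = content_tokens(phrase), content_tokens(text)
    def rest(pi, ti):
        if pi == len(p):
            return True
        for j in range(ti, min(ti + max_gap + 1, len(t))):
            if t[j] == p[pi] and rest(pi + 1, j + 1):
                return True
        return False
    return any(t[i] == p[0] and rest(1, i + 1) for i in range(len(t)))

phrase = "president of the united states"
a = "the president of the united states"
b = "the president of our beloved united states"
print(exact_match(phrase, a), exact_match(phrase, b))    # True False
print(gapped_match(phrase, a), gapped_match(phrase, b))  # True True
```

The backtracking in `gapped_match` is fine for short phrases; the point is only that recognizing the "syntactic transformation" in B requires something beyond raw ascii comparison.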

> I don't dig deeper...

That makes sense from a certain level onwards, but if we all did so, there'd be no one left to absorb from. So personally, I feel the duty to contribute as much as I can and give a little bit back from what I learned here at ww. Reading those patents is no fun for anyone, but some of us have to;)

I did some research in LSI recently and was quite surprised that some very large databases for word-vector-space analysis were available on some university sites. I'd speculate that google draws the relevant data from its own databases, but maybe this phrase-based indexing has become a general field of study in semantic analysis meanwhile, and maybe some such databases also exist. To study these might help us a lot. Any idea how to search for them?

tedster

10:37 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right now at least, I'd say it is straight ascii phrases -- at least that has been my approach, and it worked for me in several cases last fall. But certainly this area is bound to grow more sophisticated. If anyone has had their content writers under a tight, keyword-driven rein, I think it's time to set them free.

annej

11:03 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To give a quick and dirty example: Would google recognize

A) the President of the United States
B) the President of our beloved United States

I've wondered the same thing. I'm not sure the patents tell it all, and it seems like they could look for the phrase words within a longer phrase.

Marcia

11:18 pm on Feb 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A) the President of the United States
B) the President of our beloved United States

It could also be two phrases:

C) the President AND United States

Any number of topics can relate to each of the two phrases individually:

President of the United States
President of Argentina
President of Microsoft
President of the Alpha Phi Fraternity
President of the Elbonian Chamber of Commerce

Even without the stop word "the," President can be a one-word phrase. It's the presence of the two phrases together that indicates a possible contextual setting. And then it can go on from there to use different factors for scoring relevancy for the particular document, or how it rates for those factors compared to other documents.
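The two-phrases-together idea can be sketched as a simple co-occurrence check. The phrase sets and context labels below are invented purely for illustration:

```python
# Sketch of the idea above: individual phrases are ambiguous, but their
# co-occurrence in one document suggests a contextual setting. The phrase
# lists and context names are invented for illustration.

CONTEXTS = {
    "US politics": {"president", "united states"},
    "tech business": {"president", "microsoft"},
}

def phrases_in(doc, phrases):
    text = doc.lower()
    return {p for p in phrases if p in text}

def likely_contexts(doc):
    found = []
    for name, required in CONTEXTS.items():
        if phrases_in(doc, required) == required:
            found.append(name)   # all of the context's phrases co-occur
    return found

doc = "The President of our beloved United States spoke today."
print(likely_contexts(doc))   # ['US politics']
```

"President" alone matches every context; it is only the second phrase that disambiguates, after which per-context relevancy scoring could take over.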
