The more diverse our data samples, the better our chances of catching the fringe behaviors that force us to revise our mental model. This can be one advantage of a forum community, as well as of many kinds of networking.
The challenge here can be that our entire tool-kit for building our black box models may be missing some important, or even essential, elements.
For example, many people seem to think of Google as a rather linear score-keeper. The central model here is that Google takes a URL and runs it through a series of tests, and each test gives that URL either some plus points toward its score or some minuses of some kind -- then all the points are added up and the URL gets a final score to use in ranking.
This is not a very strong or predictive model these days. In fact, I think Google has moved pretty far away from any internal methods that could be well-modeled by this particular kind of thinking.
It may have been functional for pre-Google search engines -- and even for early Google, perhaps. But today, this kind of thinking departs more and more from the effects we see in the real SERPs.
The algo elements now are quite diverse and complex when compared to simple text-matching and yes/no scoring. Have you noticed that even as Google tells us they are moving away from using many penalties, we seem to need MORE "penalties" to describe what we notice?
My own cure for the "common model" involves study of the academic papers and patents, plus close listening to statements from Google reps -- not so much what they say as the word choices they make, the "how they say it". Whatever they are doing (and precise reverse-engineering of the algo is pretty hopeless today), it will color the style of communication; the internally used jargon and technology inevitably leave their tell-tale signs.
My main point is that comprehending today's Google, beyond the most superficial level, requires some fresh tools and some re-programming of analysis habits that are now many years behind Mountain View.
How are you all coping with the "new Google"? What kind of thinking have you needed to put on the scrap heap?
Have you found something new that really helps?
Articles about brain research. I read something in the Economist, I think, about short-term memory loss as you get older, compensated by the ability to spot patterns and make decisions based on that.
Younger people have a greater ability to remember things short term, but can't apply a matrix of collected experiences to the decision-making process, or something similar. The upshot is that the ability to retain so much detail was data gathering -- a data push, if you will -- moving forward to the moment when the brain matured, switched gears, and was able to use that data, while discarding the details, to make pattern-based decisions.
Maybe it's a coincidence, but some of what I read about brains almost seems to apply to the way Google handles data.
short-term memory loss as you get older, compensated by the ability to spot patterns and make decisions based on that
Spotting patterns is basically right-brain processing of left-brain data, which happens largely on an intuitive level. I'm finding that as more brain-rot sets in, intuitive sensitivity and awareness become stronger and quicker.
For example:
---
Site has PR3.5: Score 3.5
SERP Clickthrough shows 10% below average performance, lose 0.3: Score 3.2
Link-footprint analysis shows similarities with 31 other sets of backlinks, lose 0.2: Score 3.0
Hidden text in a small amount, lose 0.4: Score 2.6
Links from manually reviewed Authority sites (3.4 rank points), gain 0.5: Score 3.1
---
Final Score: 3.1
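To make the mechanics of that example concrete, here is a minimal sketch of the additive model in Python. Every factor name and point value is hypothetical, lifted straight from the example above -- none of it comes from anything Google has published.
---
# Minimal sketch of the "linear score-keeper" model described above.
# All factors and point values are hypothetical, copied from the
# worked example -- not from any published Google source.

def additive_score(base_score, adjustments):
    """Apply each factor's plus/minus points to a starting score."""
    score = base_score
    for factor, delta in adjustments:
        score += delta
        print(f"{factor}: {delta:+.1f} -> score {score:.1f}")
    return score

adjustments = [
    ("SERP clickthrough 10% below average", -0.3),
    ("Link footprint similar to 31 other backlink sets", -0.2),
    ("Small amount of hidden text", -0.4),
    ("Links from manually reviewed Authority sites", +0.5),
]

final = additive_score(3.5, adjustments)  # ends at 3.1, as above
---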
The amount which is gained or lost for a given factor is probably nonlinear, in that it will affect those who already have a high score differently than those who start with a low score. Likewise, some things could be negative for small sites but positive for larger ones.
The reason why I think this is probably the case is the immense difficulty of making a single-stage algorithm to handle everything. That makes it much more likely that the new algorithm is always the old algorithm plus adjustments based upon a new factor. That way Google staff members can show their boss a preview of the new results easily, and only one new calculation needs to be undertaken per site to move to the new index.
In terms of analysis, with this model we will observe effects similar to synergy or inhibition when we combine more than one factor, as the score modification from one factor will lead to a change in the way the other factor is interpreted.
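As a thought experiment, here is what such a staged model might look like in Python. Each stage is layered on top of the old score, and the size of its adjustment depends on the score coming in -- which is exactly what produces the synergy and inhibition effects just described. All the stage logic is invented for illustration.
---
# Hypothetical sketch of a staged, nonlinear model: each new factor is
# a stage bolted onto the old algorithm's output, and its effect
# depends on the score it receives. Stage logic is invented.

def hidden_text_stage(score, amount):
    # The same infraction costs a low-scoring site more than a
    # high-scoring one -- the point is the nonlinearity, not the numbers.
    penalty = 0.4 * (1.0 if score < 3.0 else 0.5) * amount
    return score - penalty

def authority_links_stage(score, link_strength):
    # Gains taper off as the score rises, so an earlier stage's
    # penalty changes how much a later bonus is worth.
    return score + link_strength * (5.0 - score) / 5.0

score = 3.5
score = hidden_text_stage(score, amount=1.0)
score = authority_links_stage(score, link_strength=0.5)
print(f"final score: {score:.2f}")
---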
Some months back now, I was struggling to get a client's site to rank decently on a pretty competitive 2-word phrase. I thought, from previous work with their business, that we had all the right ingredients in place. But still they were ranked down on page 3 or 4 for their trophy phrase.
Finally, in frustration, I tried an experiment along the lines of my recent studies. I took the text from the top-ranked sites and analyzed it for 2-word and 3-word phrases.
There were a number of phrases that just jumped out of the analysis at me, because they appeared on so many well ranked pages. None of these phrases used the words in the trophy phrase, but they were all clearly related and "on theme". Some of these phrases even made decent anchor text.
So I worked these new phrases into the index page's content in various ways. I did not remove even one occurrence of the target keywords, so even though the keyword density went down (we're talking 4% to 3%), the actual number of occurrences of the phrase stayed the same.
In about three days, the page jumped from #36 into the top 5, and it has stayed there - sometimes even at #1. That experience is part of the reason I started this thread.
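For anyone who wants to automate that kind of check, here's a rough Python sketch of the analysis I described: pull the text of several top-ranked pages and count the 2- and 3-word phrases that recur across them. How you fetch the page text is up to you; the page strings and thresholds below are just placeholders.
---
# Rough sketch of the phrase analysis described above: count the
# 2- and 3-word phrases that recur across several top-ranked pages.
# The page texts here are placeholders; fetch them however you like.
import re
from collections import Counter

def ngrams(text, n):
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def recurring_phrases(page_texts, min_pages=3):
    """Return phrases seen on at least min_pages different pages."""
    counts = Counter()
    for text in page_texts:
        seen = set(ngrams(text, 2)) | set(ngrams(text, 3))
        counts.update(seen)  # count each phrase once per page
    return [(p, c) for p, c in counts.most_common() if c >= min_pages]

pages = ["text of ranked page one", "text of ranked page two",
         "text of ranked page three"]  # placeholders
for phrase, page_count in recurring_phrases(pages, min_pages=2):
    print(f"{page_count} pages: {phrase}")
---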
Did you cross-check the phrases with the data provided by google's keyword suggestion tool?
If there was any relationship, I might add a similar story: one of the major keywords I was targeting got stuck on pages 3 to 5. To help it climb, I designed a "dictionary" around this keyword, covering about 30-40 pages, whose page names and page titles were taken from that Google tool and other similar tools available on the net. Not all phrases were picked up, but most were, including the most relevant ones. The major keyword itself did not really climb at first: it seems to me that a filter was at work, because I had additionally added internal links with "widget" in the anchor text at the bottom of a few hundred related product pages. After two years I took these links away, and now the keyword is well placed in the SERPs.
What struck me most is the fact that all these "dictionary" pages ranked far better and attracted many more visitors than I had expected, although their content was relatively thin. For instance, they covered widgets from different manufacturers that we don't actually sell, and we had a number of requests from customers about these. I viewed it as interesting information about customer needs and the long tail, but it turned out that many of these products don't fit the rest of our shop. So finally I put Google AdSense on some of those pages, so that my customers at least may find a way to a relevant resource.
To sum up: it is indeed an essential ranking factor that the target phrase is well embedded in a cloud of semantically related phrases -- on the same page, and on the surrounding internal pages. I'd second that this is probably one of the strongest factors of all at present.
What, exactly, is a phrase in the eyes of a search engine? Is it just a sequence of ASCII codes, or are syntactic transformations recognized as well?
What tools do you use to identify (relevant) phrases?
What, exactly, is a phrase in the eyes of a search engine?
In the Phrase patents [webmasterworld.com], they are looking for relevance prediction in the phrases.
What tools do you use to identify (relevant) phrases?
Nothing special so far -- mostly my eyeballs and note-taking. I've only used this approach 4 or 5 times. Since I'm looking for phrases that are really obvious and would be predictive of the search term involved, it doesn't take much. I'm considering having a special tool programmed to make it easy to include more pages in the analysis, but up to now I don't feel a strong need.
Use the AdWords tool.
That's a good idea, I think. I have used it for other purposes and this would be a natural.
One very simple thing that I've done is search Google for the key phrase for my missing pages. Then I notice the Google ads on the right. In one case, a phrase that you wouldn't think would be of that much interest appears in every ad. The ads are mostly for MFAs. eBay, chain stores and such do this too. They must have a huge list of words and phrases that will bring up their ads.
You can experiment around and see if it is just the name of the widget, or if it has to be the kind of widget. It may not be the phrase you expect, so try different things from the page.
It may not be the phrase you expect, so try different things from the page.
But to stay on topic about a model for thinking, I think it also pays off to look beyond the obvious and really, really read what things are saying -- and what they're not saying.
Added: Correction -- there is one exception; in that case another factor outweighed the "related phrase" element. IMHO it's a good model for thinking not to get painted into a corner about just one thing.
I study it - absorb it into pre-existing knowledge and continue on with the everyday work of SEO. I don't dig deeper... I put it to use and let the resulting data speak to me. That's what brought me to PaIR in the first place. Real world experience.
I studied martial arts for more than 20 years. You practice and train so that when it is time, it all happens naturally. This is the same. I study and absorb, knowing it will come out in my daily activities of Search Marketing....
It becomes pervasive in one's thinking... just be 'at one with the concept' and continue on... he he... a little SEO Karma for U...
The result can be harmful to your own PPC campaign, as phrases you had on the cheap all of a sudden become expensive. Why? Google has now determined that broad matches that turned up in your keyword tool search apply to others listed by competing advertisers -- where they didn't before. Shortly thereafter, doing a search for that term in G will show 10 results where there were none, or only a few, before.
Now, whether Google uses this KT data to help build its phrase-matching system (in the organic database) is another question entirely. If it does... well, I'll leave it up to the brilliant minds here to scheme how resourceful that could be. (Actually, anyone who has read this far has already thought of it.)
<aside>Hmmm, advertiser bids on [this keyphrase]. G's AdWords quality score algo says [this or that predicted phrase] should also appear on the landing page. Neither does, so quality score drops, min bid increases.</aside>
I'm thinking the data might flow the other way -- do the phrase analysis over all sites and then migrate that information into the keyword tool.
People have different ways of learning, but I find that digging -- exegesis [google.com], if you will -- is what opens up understanding (and relating it to historical factors).
It's digging until hearing clearly what's being said, and like in this case, what isn't being said. Like the sound of one hand clapping. ;)
To give a quick and dirty example: Would google recognize
A) the President of the United States
B) the President of our beloved United States
as basically the same key-phrase? To my understanding of the cited patent, it would not. However, it should be clear that such slight deviations from standard phrases, easily (?) detectable after some sort of natural-language parsing, would be an even far stronger indication of a naturally written page. Particularly in a few months or years, when exploiting the keyphrase-suggestion tool will have become a standard SEO technique for spammy sites ;)
Maybe I'm just splitting hairs, but considering that Google presents us with this translation tool for dozens of different languages, I believe it has much more sophisticated means for a thorough linguistic analysis of pages, in addition to mere statistical methods.
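To make the A/B question above concrete, here is a toy Python sketch of the difference. Treated as raw strings, the two phrases don't match; a crude stopword-stripping subsequence check -- my own stand-in for real linguistic parsing, not anything the patent describes -- would connect them. The stopword list is invented for the example.
---
# Toy illustration of the A/B question above. As raw strings the two
# phrases differ; a crude check on their non-stopword tokens (a
# stand-in for real parsing, not anything from the patent) links them.

STOP = {"the", "of", "our", "a", "an", "and"}

def content_words(phrase):
    return [w for w in phrase.lower().split() if w not in STOP]

def same_core_phrase(p1, p2):
    """True if p1's content words appear in p2 in order (gaps allowed)."""
    words = iter(content_words(p2))
    return all(w in words for w in content_words(p1))

a = "the President of the United States"
b = "the President of our beloved United States"

print(a == b)                  # False: as raw strings they don't match
print(same_core_phrase(a, b))  # True: president, united, states in order
---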
> I don't dig deeper...
That makes sense from a certain level onwards, but if we all did so, there'd be no one left to absorb from. So personally, I feel a duty to contribute as much as I can and give a little bit back from what I've learned here at WW. Reading those patents is no fun for anyone, but some of us have to ;)
I did some research on LSI recently and was quite surprised that some very large databases for word-vector-space analysis were available on some university sites. I'd speculate that Google draws the relevant data from its own databases, but maybe this phrase-based indexing has meanwhile become a general field of study in semantic analysis, and maybe some such databases exist too. Studying these might help us a lot. Any idea how to search for them?
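For anyone who wants to poke at LSI themselves, here is a bare-bones sketch using scikit-learn (assuming you have it installed): TF-IDF vectors reduced with truncated SVD, then cosine similarity between documents in the reduced "concept" space. The sample documents are invented, and of course nothing here claims to match Google's internals.
---
# Bare-bones LSI sketch: a TF-IDF term-document matrix reduced with
# truncated SVD, then cosine similarity in the reduced concept space.
# Purely illustrative -- no claim about how Google does it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "widgets from the leading widget manufacturers",
    "our widget catalog lists every model we sell",
    "brain research on memory and pattern recognition",
]

tfidf = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)

print(cosine_similarity(lsi))  # the two widget pages should pair off
---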
A) the President of the United States
B) the President of our beloved United States
C) the President AND United States
Any number of topics can relate to each of the two phrases individually:
President of the United States
President of Argentina
President of Microsoft
President of the Alpha Phi Fraternity
President of the Elbonian Chamber of Commerce
Even without the stop word "the," President can be a one-word phrase. It's the presence of the two phrases together that indicates a possible contextual setting. And then it can go on from there to use different factors for scoring relevancy for the particular document, or how it rates on those factors compared to other documents.
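A small sketch of that co-occurrence idea, to close: neither phrase alone fixes the context, but the pair together does. The phrase pairs and the scoring below are invented for illustration.
---
# Small sketch of the co-occurrence point above: neither phrase alone
# fixes the context, but the pair together does. The pairs and the
# scoring are invented for illustration.

RELATED_PAIRS = {
    ("president", "united states"),
    ("president", "white house"),
    ("fraternity", "alpha phi"),
}

def context_score(text):
    """Count related phrase pairs that co-occur in the document."""
    t = text.lower()
    return sum(1 for a, b in RELATED_PAIRS if a in t and b in t)

doc = "The President of our beloved United States spoke today."
print(context_score(doc))  # 1: "president" + "united states" co-occur
---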