Brett_Tabke - 6:46 am on Oct 5, 2002 (gmt 0)
It's all about context (or Topic Distillation) and how easily a search engine could identify an appropriate set of keywords for your site.

The original hypothesis was that it would all be done "on site". They'd take your entire site, index it as one giant page, run a keyword density analysis on it, rank the keywords found, and create a core group of keywords appropriate for your site. You would then only be found in the search engine results for those words. Back then, that would have addressed much of the problem they were having with bait and switch, and with the early days of cloaking with inappropriate content.

Then came all the link and off-page criteria theories, and the growing realization that external data could define a site as much as on-the-page words. Directory listings and inbound links are the main off-page data that could be used.

There are also the simple semantic relationships between words that can be used to define a site. Google's "Sets" in the labs.google.com utilities is a prime example of how keywords can be related to one another. It is a working example of what was a very hot topic three years ago on WebmasterWorld: the infamous Term Vectors [www9.org], the ability to make numeric associations between words. Remember those fancy IQ tests many of us took in high school? Spot the odd man out: Car is to truck, is to motor, is to battery, is to trees, is to leaves, is to plants, is to ecosphere, is to pollution, is to greenhouse gases, is to muffler, is to truck. As Google Sets shows, all those associations can be given a numeric score and then either included in a list or excluded from it. I can't think of anything else Google has done that tips its hand about its future direction more than that utility. The only thing better would be if Google printed the actual numeric score between the words (that, and validated the HTML on the Sets results).
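To make the two ideas above concrete, here is a minimal Python sketch, assuming nothing about how Infoseek or Google actually did it. `site_keywords` treats a whole site as one giant page and ranks terms by density; `term_vectors`, `cosine`, and `odd_one_out` give each word a numeric vector (its counts per document in a small corpus) and score word associations with cosine similarity, which is enough to spot the odd man out. This is a generic term-document illustration, not the method from the cited paper or from Google Sets; all function names, the stop-word list, and the mini-corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real engine would use far more.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "with"}


def tokenize(text):
    """Lower-case text, split it into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def site_keywords(pages, k=3):
    """Index the whole site as one giant page and return its k densest terms.

    Density here is a term's share of all non-stop-word tokens on the site.
    """
    counts = Counter()
    for page in pages:
        counts.update(tokenize(page))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, n / total) for term, n in counts.most_common(k)]


def term_vectors(docs):
    """Map each word to its counts per document (rows of a term-document matrix)."""
    vectors = {}
    for doc_id, doc in enumerate(docs):
        for word in tokenize(doc):
            vectors.setdefault(word, Counter())[doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[i] * v[i] for i in u if i in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def odd_one_out(words, vectors):
    """Return the word with the lowest average similarity to the others (needs 2+ words)."""
    def avg_sim(w):
        others = [o for o in words if o != w]
        return sum(cosine(vectors.get(w, Counter()), vectors.get(o, Counter())) for o in others) / len(others)
    return min(words, key=avg_sim)


# Hypothetical mini-corpus standing in for "the web".
docs = [
    "car truck motor battery muffler",
    "truck motor muffler car",
    "trees leaves plants green",
    "car battery motor",
]
vectors = term_vectors(docs)
print(odd_one_out(["car", "truck", "motor", "leaves"], vectors))  # prints: leaves
print(site_keywords(["printers and ink", "laser printers", "printer ink cartridges"], k=2))
```

Cosine similarity is used because it ignores how long a document is and only compares which documents words share; "leaves" never appears alongside the car words, so its average similarity to them is zero.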
So how is Google using all that contextual data to rank your site? The details are unknown at this point, of course, but a few techs at recent conferences have indicated that contextual data, such as the page titles of linking pages, may be in use. Context in its various forms could be very powerful in finally rooting out the dreaded "off-context" results that plague other search engines. We've all seen some wildly inappropriate listings in the middle of a results page on other search engines. By using various forms of context to make sure that the query terms are an appropriate match for a page, search engines can eliminate that occasional bad result.

It's hard to overstate how much that one bad listing can cost a search engine. If you are searching for "printers" and run into a page about "vacations in California" in the results because it happens to mention "printers", what do you do? How many of us do something different at that point? We change the search, hit the back button, or just go to another search engine. That one bad listing poisons the whole page. I still think this is the primary reason other search engines have not been successful: people's patience and attention span on the web are very short.

I think it is a no-brainer that context will play a greater and greater role with all the search engines. Every scrap of data they can get their hands on to help define your site will be used. The core contextual items available to search engines are page titles, inbound link text, directory listings, domain names, site directory names, DNS information, WHOIS information, toolbar data, voting data, referral strings, click-through data, and proxy cache data. After that, we get into some of the real guru stuff: query relationships, search refinement relationships, predictive search terms, personalized search histories, follow-up query prediction, and community identification.
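The "printers" versus "vacations in California" case can be sketched as a simple context check. This is a minimal Python illustration under stated assumptions, not how any search engine is known to work: a page only survives if its body matches the query and enough of the query terms also appear in its context, taken here to be just the page title and inbound link text (two of the signals listed above). The `Page` class, `context_score`, `filter_off_context`, the 0.5 threshold, and the sample pages are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    """A result candidate plus some contextual data an engine might hold about it."""
    title: str
    body: str
    inbound_anchors: list = field(default_factory=list)  # link text pointing at the page


def words(text):
    """Lower-cased set of alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def context_score(query, page):
    """Fraction of query terms found in the page's title or inbound link text."""
    terms = words(query)
    if not terms:
        return 0.0
    context = words(page.title)
    for anchor in page.inbound_anchors:
        context |= words(anchor)
    return len(terms & context) / len(terms)


def filter_off_context(query, pages, threshold=0.5):
    """Keep pages whose body mentions the query AND whose context backs that up."""
    terms = words(query)
    return [
        p for p in pages
        if terms & words(p.body) and context_score(query, p) >= threshold
    ]


printer_page = Page("Laser printers compared", "Reviews of printers and toner.", ["best printers"])
vacation_page = Page(
    "Vacations in California",
    "Our hotel has printers in the business center.",
    ["california vacations"],
)
for page in filter_off_context("printers", [printer_page, vacation_page]):
    print(page.title)  # prints: Laser printers compared
```

Both pages mention "printers" in the body, so a pure on-page match would return both; only the context check drops the vacation page.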
Some of that has already come to pass, such as the predictive search terms we can see in the automatic spelling correction and, again, the query relationships in Sets. The real challenge is going to be synthesizing all that data into a usable tool. If you've ever worked with huge data sets, you know they can be either poetry or chaos. It takes serious, slow, long-term testing to synthesize a googol [webmasterworld.com] of data.

If you look at a few of the smaller moves Google has made over the last year, such as the purchase of Outride and the "labs" stuff, I think they point to a major overhaul of Google that is in the works. All the little refinements to Google we've seen over the last year are evolutionary steps toward a complete overhaul of the ranking systems. As each of the data sets mentioned above is implemented, adjusted, or injected into the mix, small sets of results will change radically. You'll see things like we saw this month, where a wide swath was cut through a group of similar sites while other sites saw increases. Watching those changes and trying to come to terms with them is nearly impossible. Just because you can identify something doesn't mean you will be able to adjust anything on your site to benefit from it.

That's where the whole theme concept comes into play. It's about staying on topic and on mission in everything you do for your site. That translates into two parts to themeing a site: it is partly a philosophy that content is king and partly pragmatic in the way you arrange your site. It's the realization that everything you do online with regard to your site can potentially affect its ranking in the future. Google said it [google.com] best: It's coming to terms with the fact that you can only temporarily drive the search engines, and the only successful optimization is to let them come to you. That is done by building an excellent site that serves your visitors over the long term.
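Why can injecting one new data set cut a swath through a group of similar sites? A toy model helps, assuming (purely for illustration; nobody outside Google knows the real formula) that ranking is a weighted sum of per-site signal scores. Adding a weight for a signal the engine previously ignored reorders sites that were otherwise alike. The `rank` function, the signal names, and the scores are hypothetical.

```python
def rank(sites, weights):
    """Order site names by a weighted sum of their signal scores (unweighted signals count as 0)."""
    def score(name):
        return sum(weights.get(signal, 0.0) * value for signal, value in sites[name].items())
    return sorted(sites, key=score, reverse=True)


# Hypothetical signal scores in [0, 1] for three similar sites.
sites = {
    "site-a": {"links": 0.9, "on_topic": 0.2},
    "site-b": {"links": 0.8, "on_topic": 0.1},
    "site-c": {"links": 0.5, "on_topic": 0.9},
}

before = rank(sites, {"links": 1.0})
after = rank(sites, {"links": 1.0, "on_topic": 1.0})  # a new data set is injected
print(before)  # prints: ['site-a', 'site-b', 'site-c']
print(after)   # prints: ['site-c', 'site-a', 'site-b']
```

Nothing on any of the three sites changed between the two calls; only the weights did, which is why tracking a shift back to a specific on-site cause is so hard.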
Focus on the visitors, and the search engines will eventually follow. In conclusion, although some of the specifics of the themeing theory, such as whole-site indexing, never came to pass, the contextual heart of the theory is stronger than ever.
Themeing is all about context. I wrote the original article in late '97 after several long discussions with a programmer at Infoseek about the directions they were going to take after the switch to Go.com. Those plans never materialized because of Go.com's faltering. I updated the article in late '98 and early '99, after Google came on the scene and seemed to confirm many of the propositions about themeing. "#2: It's best to do one thing really, really well."