| This 45 message thread spans 2 pages: 45 (  2 ) > > || |
|Theming: Is it a buzz word or is it real?|
What is it?
Is there any evidence that Google (or any other search engine) actually uses themes? I like the concept, but would like to know its practical implications, if any.
I am not an expert but here's what I believe:
Google does everything based on themes. Themes equate to what category you are in for DMOZ and/or Yahoo.
For a given keyword search, Google first tries to figure out what theme the searcher is chasing after. That involves looking for text matches in the DMOZ/Yahoo directory names as well as text matches in the DMOZ/Yahoo listings. Or maybe it's just a search on the keyword in all of its sites - and then based on results, there's an attempt to find the right theme. In any case, there's gotta be an algorithm to get the best theme match.
Based on that, Google will prioritize search results such that theme-specific sites (i.e. sites in the right DMOZ/Yahoo directory) come up on top.
PageRank appears to also be allocated by theme. In other words, you can easily get a PageRank of 8 in an obscure category where there are not many competitors (e.g. Civil War Memorabilia) but it's much harder to get that PageRank in a very intensely competitive category (e.g. marketing companies). To my mind, it's very likely that PageRank calculations are performed theme by theme, not performed across all the sites in the Google index. That's why links from sites outside your theme might hurt you more than help you.
Mind you, this is all conjecture, but if you were going to create Google, isn't that what you'd do to get the most relevant results?
Yes - Teoma was almost certainly built around theming.
See the following discussions
|To my mind, it's very likely that PageRank calculations are performed theme by theme, not performed across all the sites in the Google index. That's why links from sites outside your theme might hurt you more than help you. |
I agree with some of your "conjecture," but I don't agree with any of this part at all, and see no theming component at Google or specifically in PageRank.
Upon re-reading your post, Whoa, I think you may be using the term "theme" differently than it's come to be commonly used in SEO.
A couple of threads worth reading, to be sure we're talking about the same thing:
[edited by: JayC at 5:33 pm (utc) on Sep. 30, 2002]
I'm almost 100% sure AltaVista uses theme ranking. My pages are built around a good theme and I rank perfectly in AltaVista, and I mean perfectly.
Pagerank is 1 dimensional.
Bring 2 themes into play and you have 2 dimensions (i.e. an x and a y axis); 3 themes and 3 dimensions (x, y, z), etc.
To develop real themed pagerank you would need thousands of dimensions. I wouldn't like to be the programmer who has to implement that!
It's much more likely that they will continue with 1 dimensional pagerank and use keywords and links to track down the theme. Of course it means Google will never really offer true theming.
Another way to do theming would be to use the library numbering system. This might have more potential than themed pagerank.
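For anyone who wants to play with the "one dimension per theme" idea, here is a minimal sketch along the lines of the published topic-sensitive PageRank idea: run the ordinary calculation once per theme, each time biasing the random jump toward a set of seed pages for that theme. This is not Google's implementation; the toy link graph, the page names, the seed sets and the `pagerank` function are all made up for illustration.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    teleport is the distribution the random surfer jumps to. A uniform
    distribution gives ordinary PageRank; weighting it toward the seed
    pages of one theme gives a score biased toward that theme.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    if teleport is None:
        teleport = {p: 1.0 for p in pages}
    total = sum(teleport.get(p, 0.0) for p in pages)
    jump = {p: teleport.get(p, 0.0) / total for p in pages}

    rank = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page: hand its rank back through the jump vector.
                for target in pages:
                    new[target] += damping * rank[p] * jump[target]
        rank = new
    return rank


# Hypothetical five-page web: two small themed clusters and one shared page.
links = {
    "cars-home": ["cars-parts", "news"],
    "cars-parts": ["cars-home"],
    "garden-home": ["garden-tools", "news"],
    "garden-tools": ["garden-home"],
    "news": ["cars-home", "garden-home"],
}

# Hypothetical seed pages per theme (e.g. taken from a directory category).
seeds = {
    "cars": {"cars-home": 1.0, "cars-parts": 1.0},
    "garden": {"garden-home": 1.0, "garden-tools": 1.0},
}

overall = pagerank(links)
themed = {theme: pagerank(links, teleport=seed) for theme, seed in seeds.items()}
```

The cost is one extra score vector per theme, which is where the "thousands of dimensions" worry comes from: a handful of broad themes is cheap, a theme per keyword is not.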
Google is not a theme search engine.
Google has been using factors that take into account context. Link text compared to the search phrase being the most notable. The essence of a theme search engine is the ability to rank a page based on there being other pages in the same site of the same or related theme of keywords. Google uses many factors in its algo. PR is one very important one. But remember the precise name... PAGErank. This works on a PAGE by PAGE basis, not on a group of pages within a site. To the extent that Google factors in how any pages relate, it is using link text (at least currently as I understand it). Such linked pages can be in the same site or across many sites.
I agree with the other posts about Teoma and AltaVista. AltaVista was one of the first theme engines, as I recall. The now defunct Excite was supposedly a Term Vector Database theme search engine also.
One other point. Themed pagerank would be pointless anyway because it's only half the time an outbound link is related to the source site. I normally link to sites that are not related to me in any way. Why would I send my customers to my competition?
The same applies to non-profit pages. There may be a couple of links on a page to similar pages, but most will be unrelated to the topic of the site. E.g. Embassy site gives links to currency converter, local airline, hospitals etc. Where is the theme?
The truth be told, the only theme that can be isolated is in the anchor text of the link. Google's already got that one sorted. The only thing they can do more is to "interpret" the keywords. This might do more harm than good though, as it would be easy for google to look pretty dumb by putting the wrong interpretation on the keywords.
I agree with the others: Google's only relation to themes is anchor text. It's hard to implement themes into Google and especially into PageRank, albeit the Haveliwala paper on topic-sensitive PageRank is pretty interesting.
The Kleinberg algo which Teoma uses has certain aspects of themes. I've read only a little about it, but I think I remember that there are nonetheless severe problems with staying on topic.
IMO, the best candidate for themes is Fast/Alltheweb. I think a Fast official once said that they do use themes and, personally, I can't find a good explanation for why many sites/pages rank well for some keywords and are nowhere to be found for others.
Search engines aside, I look at theming kinda like I look at xhtml. If used properly, it makes your whole site much more logically structured and easier for visitors to navigate. That's a good thing whether it boosts your ranking or not...
Themed PageRank may not be so impractical and far-fetched at all. In fact, elements of it could already be in play for all we know.
See the following research paper and threads devoted to "Topic Sensitive PageRank":
One way to develop topic connections would be to partner with Alexa, which keeps a database of which websites users move between.
I agree with mivox completely. If nothing else a themed site is easier to take care of, easier to internally link within the content, and really can be much easier to navigate. Also, more attractive to link partnerships.
Also, it's a lot more fun to build a site as a themed site because you get deeper insight into your theme while building the structure. I learn a lot about my audience and the people I target simply by seeking related words, topics, typos and such!
Unfortunately themeing can be misused like any other technique - wait for the big themed spam portals :(
That's actually a misconception. Technically, the "Page" in PageRank is taken from the name of the inventor, Larry Page.
But what you're saying is right on target.
>Google is not a theme search engine
Maybe not in the strict sense of the word, but it's not conclusive that some of the factors that contribute to the concept of theming don't come into play.
I believe it goes further than link text, and I've seen a little evidence this update that supports that. There's no explanation other than that a couple of the factors that go into theming made a generous contribution to the jump in rankings of a couple of "worth nothing" pages I'd basically forgotten all about.
Those insignificant pages, one PR4 and one PR5, are now on the first page out of about 600K pages, while a PR6 site with around 500 backlinks and a PR6 site with about 1500 backlinks dropped way down. Those other sites are actively promoted, but unlike the 2 little dark horses there are none of the factors that go into themes applied on them. And BTW, the PR4 page shows no backlinks at Google and the site itself has only about 3 external sites linking to it altogether - all with a similar, though minor, factor in common. They're roughly related in theme.
>elements of it could already be in play for all we know.
Dante, maybe not in the actual calculation of Page Rank, but supportive of it in determining the relevancy of the page that's the origin of any given link. That's exactly what happened with those two pages, with a site they link to, which is 100% relevant (and themed) sitting in #1 and 2 positions for that same search.
Since I first saw the paper on term vector databases, I've never been able to get it out of my head, and still go back and read it periodically - again just this past week. There was another paper that got little attention, on identifying web communities - definitely worth another read, because in the natural course of things certain identifiable "communities" of like sites do tend to link back and forth with each other quite honestly and naturally.
Linking within logical communities, combined with incorporating the elements that go into theming, might just be the way to do "safe SEO" because that combination is not easily contrived or forced.
Whoa>>Themes equate to what category you are in for DMOZ and/or Yahoo.
The sites linked to from any given DMOZ category represent a collection of sites that are a "community" of like sites. Categories that are horizontally positioned within the same higher category grouping are loosely related. Not identical in the sense of a specific community, but in proximity, roughly in the same broader neighborhood.
Themeing is all about context. I wrote the original article in late 97 after several long discussions with a programmer at Infoseek on the directions they were going to take after the switch to go.com. Those actions never materialized due to go.com's faltering. I updated the article in late 98 and early 99 after G came on the scene and seemed to confirm many of the propositions about themeing.
It's all about context (or Topic Distillation) and how easily a search engine could identify an appropriate set of keywords for your site. The original hypothesis was that it would all be done "on site". They'd take your entire site and index it as one giant page, density analyze it, rank the keywords found, and create a core group of keywords that would be appropriate for your site. You would only be found in the search engine results related to those words. Back then, that would address much of the problem they were having with bait and switch, and the early days of cloaking with inappropriate content.
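To make that original "on site" hypothesis concrete, here is a rough sketch of what indexing a whole site as one giant page and density analyzing it might look like. It is only an illustration of the idea as described, not how any engine actually did it; the stopword list, the sample pages and the `core_keywords` function are assumptions, and there is no stemming, so "van" and "vans" count as separate terms.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real one would be far longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "for", "is", "on", "with", "our", "we"}


def core_keywords(pages, top_n=5):
    """Treat every page of a site as one document and rank terms by density.

    Returns a list of (term, density) pairs, where density is the share of
    all non-stopword words on the site that the term accounts for.
    """
    words = []
    for text in pages:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    counts = Counter(words)
    total = len(words)
    return [(term, count / total) for term, count in counts.most_common(top_n)]


# Hypothetical three-page site.
site = [
    "Used vans for sale. Cargo vans and passenger vans.",
    "Van parts and van accessories for cargo vans.",
    "Contact our van dealership.",
]

for term, density in core_keywords(site, top_n=3):
    print(f"{term}: {density:.2%}")
```

Under the hypothesis, the top few terms would become the site's core keyword group, and the site would only be eligible for queries related to them.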
Then came all the link and off page criteria theories. There was the growing realization that external data could define a site as much as on-the-page words. Directory listings and inbound links are the main off page data that could be used.
There are also the simple semantic relationships between words that can be used to define a site. Google's "sets" in the labs.google.com utilities is a prime example of how keywords can be related to one another. This is a working example of what was a very hot topic three years ago on WebmasterWorld: the infamous Term Vectors [www9.org]. It is the ability to make numeric associations between words.
Remember those fancy iq tests many of us took in highschool? Spot the odd man out:
Car is to truck, is to motor, is to battery, is to trees, is to leaves, is to plants, is to ecosphere, is to pollution, is to green house gases, is to muffler, is to truck.
As we can see in the Google Sets, all those associations can be given a numeric score and either included in a list, or excluded from a list. I can't think of another thing Google has ever done that has tipped its hand as to what it will do in the future more than that utility. The only thing better would be if Google would print the actual numeric score between the words (that, and validate the HTML on the 'sets' results).
So how is Google using all that contextual data to rank your site? Details are unknown at this point of course, but a few techs at recent conferences have indicated that contextual data, such as page titles of linking pages, may be in use.
That use of context in its various forms could be very powerful in finally rooting out the dreaded "off context" results that plague other search engines. We've all seen some wildly inappropriate listings in the middle of a results page on other search engines. By using various forms of "context" to make sure that query terms are an appropriate match for any page, se's can eliminate that occasional bad result.
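For a feel of what a "numeric score between the words" could look like, here is a toy sketch: build a vector for each word from the words it shares a document with, then compare two words by the cosine of the angle between their vectors. This is a generic co-occurrence technique, not the method in the term vector paper and not whatever Google Sets does internally; the tiny corpus and the function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def term_vectors(documents):
    """Map each word to a Counter of the words it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in documents:
        words = set(doc.lower().split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors: 0.0 unrelated, 1.0 identical direction."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical corpus: each string stands in for the words on one page.
documents = [
    "car truck motor battery",
    "truck motor muffler",
    "car battery muffler",
    "trees leaves plants",
    "plants leaves ecosphere",
]

vectors = term_vectors(documents)
car_truck = cosine(vectors["car"], vectors["truck"])
car_trees = cosine(vectors["car"], vectors["trees"])
print(f"car/truck: {car_truck:.3f}  car/trees: {car_trees:.3f}")
```

With a score like that, "odd man out" becomes a threshold question: words scoring above some cutoff against the rest of the set stay in the list, the others are dropped.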
I don't think we can underestimate how much that one bad listing can cost a search engine. If you are searching for "printers" and run into a page in the results from "vacations in California" because it happens to mention "printers" on the page, what do you do? How many of us do something different at that point? We change the search, hit the back button, or just go to another search engine. That one bad listing poisons the whole page. I still think this is primarily why other search engines have not been successful. People's patience and attention span with web work are very short.
I think it is a no brainer that context will play a greater and greater role with all the search engines. Every scrap of data they can get their hands on to help define your site will be used. The core group of contextual items: page title, inbound link text, directory listings, domain names, site directory names, dns information, whois information, toolbar data, voting data, referral strings, click through data, and proxy cache data are the major ones available to se's.
After that, we get into some of the real guru stuff with query relationships, search refinement relationships, predictive search terms, personalized search histories, follow up query prediction, and community identification. Some of that has already come to pass such as the predictive search terms we can see in the auto spell correction and the query relationships in the "sets" again.
The real challenge is going to be synthesizing all that data down into a usable tool. If you've ever worked with huge data sets, they can either be poetry or chaos. It takes serious and slow long term testing to synthesize a googol [webmasterworld.com] of data.
If you look at a few of the smaller moves Google has made over the last year such as the purchase of Outride and the "labs" stuff, I think it points to a major overhaul of Google that is in the works. All these little refinements to Google we've seen over the last year are evolutionary steps to a complete evolutionary overhaul of the ranking systems.
As each of those data sets mentioned above is implemented, adjusted, or injected into the mix, there will be small sets of results that change radically as a result. You'll see things like we saw this month, where a wide swath was cut through a group of like sites, and other sites saw increases.
Watching and trying to come to terms with those changes is near impossible. Just because you can identify something, doesn't mean you will be able to adjust anything on your site to benefit you.
That's where the whole theme concept comes in to play. It's about staying on topic and on mission throughout everything you do for your site. That translates into two parts to themeing a site: it is part a philosophy that Content is the king and part pragmatic in the way you arrange your site. It's the realization that everything you do online with regard to your site can potentially affect its ranking in the future.
Google said it [google.com] best:
|"#2: It's best to do one thing really, really well." |
It's coming to terms with the fact that you can only temporarily drive the search engines and the only successful optimization is to let them come to you. That is done by building an excellent site that serves your visitors long term. Focus on the visitors, and the search engines will eventually follow.
In conclusion, although some of the specifics of the themeing theory such as whole site indexing never came to pass, the contextual heart of the theory is stronger than ever.
must be short hand for -- battery is to acid, is to rain, is to tree...
|Focus on the visitors, and the search engines will eventually follow. |
This is where most get hung up. Too busy in the hunt for the "visitor from some search engine", so that when they actually arrive the visitor's needs are not met.
The SEARCH WAS MORE IMPORTANT THAN THE FIND.
|It's about staying on topic and on mission throughout everything you do for your site. |
So simple yet so true. And what Marcia said about communities. It’s relationships and the relationships between words, groups of words, the ‘idea’ behind the words, all which make up community. Bravo to you both for sharing.
Thank you Brett for continuing to take us to the next level. Your post is thought provoking and worth taking the time to digest. I just want to again thank you for pushing the envelope for us, giving us the opportunity as an industry to continue to grow.
It's easy to stay on topic if you are selling what people are looking for. But what about the guy who sells, say vans. He might know that there are 100 times more searches for cars than vans, and he might have much more success selling to the car audience than the van audience, purely because of the search volume.
So, he can build a great site on vans, but no-one will find it if it is too on-topic. This guy needs to build a great site on cars and steer visitors towards his real topic. It's not great for the search engines, I know, but it's real life. It's basically the same principle as "sex sells". First get their attention and then kill the sale.
In my opinion this is the real domain of the SEO expert. Building a good site which is on topic is a piece of cake. Bringing in visitors from periphery searches is much harder. Not quite so simple as just "building an excellent site".
Thanks Brett. It's easy to get stuck in the 'what is Google doing now' mode. To predict what Google does later, the background is important.
> ...points to a major overhaul of Google that is in the works
I agree. There are many good ideas in your post, but I'm not seeing them in use right now. Some of the elements (using page title and anchortext on the referring page) would surely be quite cheap to implement.
SlyOldDog, cars may indeed have more queries, but the point is -- convincing someone that they don't know what they want is the wrong premise.
You may get a few to scrap their original query but more likely -- most will back on out and look elsewhere, therefore you have wasted much time and effort targeting the wrong market.
It would be far better to capture accessories traffic on Van Parts, and secondary items, developing inside of your market's needs.
If you develop a "car" site you had better have "cars" and not just "vans".
Too bad you aren't my competition. I wish they thought like you ;)
It's just a financial equation. Which is the most profitable avenue? I am pretty sure of my ground, at least in my sector (it's not vans by the way). Once my customers find my site they are more than pleased they found it. They just didn't know my product existed. I need to find them, because they CAN'T find me.
This must be quite common on the internet. If it isn't, there are certainly a lot of niches out there to be filled.
Fathom may be stiff competition for the quality traffic.
If my site sells vans I want the people looking for vans.
My most grateful customer this month gets less than 20 people per day into his site.
Just one thing I worry about with this idea of theming in the ranking of sites. When I do a search, I want the most relevant *pages* that match my search. If I do a search on "purple penguins", I don't care if a page about purple penguins happens to be on a site that is almost entirely about green walruses. While a search engine based on themes will do well at finding the sites that best deal with the content, this could result in missing important pages because they are off theme to the site they are on. I can see theming being particularly bad for "potpourri" type sites, which have a lot of content about a lot of unrelated stuff on the same domain. A lot of personal type sites come to mind here. The person just has a lot of different content that interests them, but it isn't really related.
hmm... regardless of industry or market - vans, home furniture, or hotdogs, I would believe your most profitable online customers are the ones looking for these items and not cars, office supplies or hamburgers. They need less enticement. If you need to target other queries to convert more "less qualified traffic" your marketing costs are increasing whether you consider this or not. You need to accommodate more.
Should vans be your niche -- the more you target "cars" -- the less likely your site will appeal to a "van" market. This is the trade-off of what you are suggesting.
The financial equation - I would rather have 1000 visitors and a 100% conversion rate than 1 million visitors with a 1% conversion rate -- even though the latter (10,000 customers) may seem like more revenue, your cost to produce has increased exponentially.
I, on the other hand, need not add, change or speculate on what my target market wants - and I can still open up a new store (site) for "cars" and reproduce the same thing.
You can convert someone to looking at something else -- sell the sizzle and not the steak... but these are much harder sales.
If "vans" brings your site 100 visitors a day but you only convert 1 to a customer your greatest profits would be found in increasing your conversion rate of your niche -- not looking towards someone else's niche.
Sorry to take this thread off topic. It's just that this particular discussion with Fathom is worth taking further.
Here are some simple, real stats for you.
I make a sale to 1 in 5 visitors who search for "core product". Average revenues $X
However, I only get about 20 hits a month for "core product". That means my focused core sales are around $4X
I sell to 1 in 60 customers who search for "periphery product". Same average revenues.
But I get 12000 hits a month for "periphery product". This means $200X.
My advertising cost (PPC, web site maintenance) is about $8X and might be $4X if I dropped the periphery focus.
There isn't much targeting I can do for other keywords similar to "core product". People just don't look for it.
So that's the equation. There's no business model unless the periphery product is targeted.
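For anyone who wants to plug in their own numbers, the equation above works out like this. Everything is in units of X (the average revenue per sale); the function name is made up, and treating the $8X and $4X advertising figures as monthly costs is an assumption based on the "hits a month" figures.

```python
def monthly_revenue(hits, visitors_per_sale, revenue_per_sale=1.0):
    """Revenue per month, in units of X, for one keyword group."""
    return hits / visitors_per_sale * revenue_per_sale


# Figures from the post: 1 sale in 5 on 20 hits, 1 sale in 60 on 12000 hits.
core = monthly_revenue(hits=20, visitors_per_sale=5)
periphery = monthly_revenue(hits=12000, visitors_per_sale=60)

# Advertising cost: about 8X with the periphery focus, maybe 4X without it.
net_with_periphery = core + periphery - 8
net_core_only = core - 4

print(f"core: {core}X, periphery: {periphery}X")
print(f"net with periphery: {net_with_periphery}X, net core only: {net_core_only}X")
```

That reproduces the $4X and $200X above, and shows why the core-only site breaks even at best: 4X of revenue against roughly 4X of cost.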
We can just go back to the topic of the thread, which is themes, and relate it in concept to the side discussion which has developed, instead of having to split it off. We can relate it to the whole forest rather than having to split to a second discussion about a single tree.
There are some products, even other than automotive, that are impossible to describe; there are no familiar words to describe them for which searchers would type in search queries. The only answer for those is to pull in visitors on peripheral items, or for a broader, more generalized category, who would also be interested in those hard-to-describe items.
There's specific knowledge that's gleaned from familiarity with a particular industry or market, and the fact is that results and comparative ROI that have been tested and proven themselves over time cannot be theoretically argued with, especially since there can be so many variables for different sets of core keywords.
This is where deciding on the overall theme of a site benefits searchers and site owners both, regardless of whether or not any particular search engine is "themed." It's taking a broader look and integrating the strategy into an educated, market-driven choice of core keywords and organization of the site navigation.
Great to see you agree! :)
I think your original example:
|It's easy to stay on topic if you are selling what people are looking for. But what about the guy who sells, say vans. He might know that there are 100 times more searches for cars than vans, and he might have much more success selling to the car audience than the van audience, purely because of the search volume. |
implied that this site had no cars, only vans, but was themed towards cars to get large volumes of traffic so the site could push vans out the door.
In reality your particular site is theming towards the needs of the market. Not just attempting to target traffic with the hope of them settling for something they really were not looking for.
Markets are not always politically correct.
plate tectonics vs tectonic plates
customized software vs custom software
which also extends into spelling errors and language barriers
science project vs science progect
optimization vs optimisation
Theming implies that you actually have this content/information/product/service.
Well, I sort of agree.
I mean, my site looks like this:
"Peripheral keywords" are no good. What you need is my product instead. "Peripheral keywords" are expensive. You need my product.
So I guess from a search engine perspective I fall within the theme of the peripheral product because I am giving my visitors the chance to see the flip side. In fact, why not put me at #1 because it's always good to hear what else is on offer before you take the plunge :)
It feels kind of odd building a big theme site about why the product I am searched on is no good. Still, so long as it puts food on the table....