"Any clue as to the possible role greater reliance on semantics is playing in your never ending quest for more relevant results?"
I'd say that's inevitable over time. The goal of a good search engine should be both to understand what a document is really about, and to understand (from a very short query) what a user really wants. And then match those things as well as possible. :) Better semantic understanding helps with both those prerequisites and makes the matching easier.
So a good example is stemming. Stemming is basically SEO-neutral, because spammers can create doorway pages with word variants almost as easily as they can optimize for a single phrase (maybe it's a bit harder to fake realistic doorways now, come to think of it). But webmasters who never think about search engines don't bother to include word variants: they just write whatever natural text they would normally write. Stemming allows us to pull in more good documents that are near-matches. The example I like is [cert advisory]. We can give more weight to www.cert.org/advisories/ because the page has both "advisory" and "advisories" on it, and "advisories" in the URL. Standard stemming isn't necessarily a win for quality, so we took a while and found a way to do it better.
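A minimal sketch of the basic idea, assuming a deliberately naive suffix-stripping stemmer. The rules and the names naive_stem and stem_matches are illustrative only, not Google's (GoogleGuy says they found something better than standard stemming). It shows how "advisory" and "advisories" collapse to the same stem, so a page using both gets credit for a query on either form.

```python
import re

# Toy plural rules, tried in order. These are an illustrative assumption,
# not Google's stemming rules.
SUFFIX_RULES = [("ies", "y"), ("s", "")]


def naive_stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping one plural suffix."""
    w = word.lower()
    if w.endswith("ss"):  # "class", "access": not plurals, leave alone
        return w
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least 3 letters so short words survive intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)] + replacement
    return w


def stem_matches(query: str, text: str) -> int:
    """Count words in text whose stem equals the stem of some query word."""
    query_stems = {naive_stem(q) for q in re.findall(r"[a-z]+", query.lower())}
    return sum(1 for tok in re.findall(r"[a-z]+", text.lower())
               if naive_stem(tok) in query_stems)


print(naive_stem("advisories"))  # advisory
print(stem_matches("cert advisory", "CERT advisories and advisory archive"))  # 3
```

Real stemmers such as Porter's cover far more cases; this toy only handles the plural forms in the example.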
So yes, I think semantics and document/query understanding will be more important in the future. pavlin, I hope that partly answers the second of the two questions that you posted way up near the start of this thread. If not, please ask it again in case I didn't understand it correctly the first time. :)
Want to bet? I happen to do a lot of stuff in the car hire area and can show you loads of destinations where the top 10 sites have nothing to do with renting or buying!
I don't really think so. That would be DMOZ, not Google. As far as we've seen on Google Romania (which accidentally has some reversed links that sort of ruin Google's credibility in our country), they did not update the SERPs as they previously did every Friday, nor did PR change as it usually does every Wednesday. And as far as we can see, Google only displays results in the first three SERPs; if you quickly browse those, you'll see Google doing another search for your term. Results on Google are very easy to handle when you're backed up by a legion of traffic. And what about the error we're seeing? For example, link a web page to a site that has PR 6 and is hosted on the same web server as you; sooner or later, your page will get its PR. Furthermore, how come Google gives carte blanche to randomly generated forum pages? For instance, we get PR 1 for free on each randomly generated page, while WebmasterWorld gets PR 3. Is that some sort of "trust vote"?
Thanks.
It doesn't explain what happened to my site, which contains the keyword in the URL, in the site's name, and in all of the backlinks.
Well, Pavlin, maybe you've gone too far by trading links with the same anchor text every time?
In my business area, the results on 64 are very, very good, and in every search unrelated to my business that I needed to do on 64, I also found relevant information in the top 10 results.
So for me the 64 index is absolutely great!
I went out for a very nice meal last night, started late this morning, and find that GoogleGuy has confirmed all the stuff we have been bellyaching about for three months.
SteveB has it exactly right, IMHO, regarding "quality signals". But what does that mean? Well, if you boil it down to its pure essence, it means that Google can understand what the page (and, to some extent, the site) is about even if you blank out the term searched for. Just try it on your pages. Print out the source and take a thick felt-tip pen. Score out all of the HTML, then score out all of the words in your top term. Do you still know what the page is about? How does it compare with the top three in the SERPs for that term when you do the same to their pages?
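The felt-tip exercise can also be done in code. This is a rough sketch using only the Python standard library; blank_out is a hypothetical helper name, and it simply drops the tags, any script/style contents, and the words of the search term, leaving the vocabulary that has to carry the page's meaning on its own.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def blank_out(html: str, term: str) -> list:
    """Return the page's words with all HTML and the search term's words removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    term_words = set(term.lower().split())
    return [w for w in re.findall(r"[a-z0-9']+", text) if w not in term_words]


page = ("<html><head><title>Blue Widgets</title><style>p {color: blue}</style></head>"
        "<body><p>Buy blue widgets: gears, sprockets and cogs.</p></body></html>")
print(blank_out(page, "blue widgets"))
# ['buy', 'gears', 'sprockets', 'and', 'cogs']
```

Run it on your own page and on the top three for the term, then compare what is left.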
City searches are particularly difficult here because very often there are no synonyms or stems for the city name. You need to look for what the top sites have as triggers. Build those triggers into your page, and link to pages about those terms using each term in the anchor text. Google doesn't and can't assess quality subjectively, even though quality is a subjective measure. It therefore measures objective things that approximate a subjective assessment.
Everyone here who is interested in this stuff should go and read the thread Marin started about Latent Semantic Indexing. Read the paper he cites and try to find the white paper on CIRCA. The penny will drop.
I'm certain that SteveB has not implemented this in a contrived way; his site is just so full of large pages of rich language around his subject. He has achieved high rankings by doing what comes naturally to him. For those of us who need to break old habits and give the Google algo what it is looking for, there are ways to do so. It's metaphorically like following a diet: you just need to learn the basics of what to do and stick to it.
If you want to find out what Google has in its ontology (if you don't know what one is, search for define:ontology), do a search like ~widgets -widgets and note the words that are bold in the results (if you have your preferences set to 100 results per page, you can scan them quickly). Then feed these words back in to create a map of associated words. Search for the term and look at which associated terms the top three pages use and where they use them. Now use this new vocabulary to broaden the language in your pages and across your site. Pretty soon we'll all be doing what SteveB does naturally.
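The feed-the-words-back-in loop above is essentially a breadth-first expansion. Here is a sketch in Python under loud assumptions: related is a hypothetical callable standing in for the manual step of running ~term -term and noting the bold words (here it just reads canned, made-up data), and the page texts are invented. missing_terms then lists associated words that the top pages use and your page does not.

```python
import re
from collections import deque
from typing import Callable, Dict, List, Set


def build_term_map(seed: str, related: Callable[[str], List[str]],
                   max_depth: int = 2) -> Dict[str, Set[str]]:
    """Breadth-first expansion: feed each related term back in, up to max_depth hops."""
    term_map: Dict[str, Set[str]] = {}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if term in term_map or depth > max_depth:
            continue
        term_map[term] = set(related(term))
        for neighbour in term_map[term]:
            if neighbour not in term_map:
                queue.append((neighbour, depth + 1))
    return term_map


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def missing_terms(term_map: Dict[str, Set[str]], top_pages: List[str],
                  my_page: str) -> Set[str]:
    """Associated terms that at least one top-ranking page uses but my_page lacks."""
    vocab = set(term_map).union(*term_map.values())
    used_by_top = set().union(*(_words(p) & vocab for p in top_pages))
    return used_by_top - _words(my_page)


# Hypothetical stand-in for the bold words you would note by hand.
RELATED = {
    "widgets": ["gears", "sprockets"],
    "gears": ["cogs", "sprockets"],
    "sprockets": ["chains"],
}
term_map = build_term_map("widgets", lambda t: RELATED.get(t, []))
top_three = ["Widgets with gears and cogs", "Cheap widgets, gears, chains", "Widgets"]
print(sorted(missing_terms(term_map, top_three, "Our widgets page")))
# ['chains', 'cogs', 'gears']
```

Limiting max_depth keeps the map from drifting too far from the original topic after a few rounds of feeding words back in.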
Best wishes
Phil
PS The roast partridge was excellent.
I just have one question that nobody has ever adequately answered for me. It seems to be generally accepted practice among SEOs to have a links page and do link swaps. These are necessary to rank well in competitive areas, especially for commercial sites that don't receive many natural links, and where everybody who is anybody has a (user-indifferent) links page. Probably on topic, but about as targeted as a double-barrelled shotgun.
Of course the whole concept is ridiculous, and users would never think of reading most of the links pages on these sites. The Google Guidelines even prohibit artificial linking to deceive the search engine algorithm.
On the other hand, in many competitive areas no links page means no good ranking, and so far Google has tolerated sites that have these pages. So what is your take on this activity?
So if you have a non-English page that uses an English word as a keyword in the URL and the site's name, or a page that is very narrowly focused on one topic, then with LSI you are in trouble.
It seems the new algo really is based on those dictionaries of related words, and G expects that if your site is dedicated to topic "kw1", it has to say something about "kw2", "kw3", "kw4", and so on. If not, the algo assumes it is spam.
So if you have done good optimisation for kw1, but do not use the rest of the kw's, you end up "penalised".
The problem is that G applies this algo everywhere, even when it knows the pages are non-English.
I guess that's the problem with my MIA page: it is in a non-English language but uses an English word in its title and as its main kw. When G sees this kw (kw1), it expects to see the other kw's from its dictionary, and when they don't show up, it thinks the site is spam. The truth is that the rest of the kw's are there (I did some "~kw1" testing and know what my synonyms are), but they are written in another language and even another alphabet (Cyrillic). (It would be a pain for users if I went and used all of the English words.)
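pavlin's hypothesis can be made concrete with a toy coverage score. This Python sketch is not a description of how Google works (nothing in the thread confirms the mechanism); related_term_coverage and the expected term list are hypothetical. It shows why, under his theory, a page whose related vocabulary is written in Cyrillic scores as if those words were simply missing.

```python
import re
from typing import Iterable


def related_term_coverage(page_text: str, related_terms: Iterable[str]) -> float:
    """Fraction of expected related terms that occur as whole words in the page."""
    terms = {t.lower() for t in related_terms}
    if not terms:
        return 0.0
    # \w matches Unicode letters, so Cyrillic words are tokenized too;
    # they just never equal the English terms.
    words = set(re.findall(r"\w+", page_text.lower()))
    return len(terms & words) / len(terms)


expected = ["car", "rental", "airport", "insurance"]  # hypothetical kw2..kw4 list
english = "Car rental at the airport, insurance included"
# English keyword plus Bulgarian text ("cars for rent at the airport")
mixed = "Rent a car: коли под наем на летището"
print(related_term_coverage(english, expected))  # 1.0
print(related_term_coverage(mixed, expected))    # 0.25
```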
I guess it's the same with sites that are so narrowly focused on a subject that they don't include the other words. And that's why there are so many portal sites on top: their directory listings contain links with descriptions that use almost every kw that G expects to see.
So the question I've been asking until now was the wrong one. The real question is what happens to the sites the algo fails to understand. The answer: they get handled the old way (pre-Austin).
So it would be nice if G stopped applying the semantic algo to sites it knows are non-English!
As for English sites, Hissingsid is absolutely right: do some "~KW1" testing and try to use as many of the other kw's that come up as you can.
Also, make those kw's links to some highly relevant sites.
It's not what people think is relevant any more. It's what the machine (in this case G) thinks is relevant. Obey and God help us all!
[edited by: pavlin at 1:55 pm (utc) on Feb. 16, 2004]
Does that mean that it's over? If it is, my site is toast!