Google News Archive Forum

    
Are your KW's in the right place?
"Understanding" your page...
TheWhippinpost




msg:112604
 5:42 pm on Feb 6, 2004 (gmt 0)

Google Sets, Broad Matching, Stemming, the "~" (tilde) operator, blah blah blah...

Are we overlooking this technology?

Are we looking for our SERP's in the right place?

I've noticed something in my SERP's which I'm still struggling to conceptualise so I'm throwing this out for comment to either be kicked down or expanded on.

Here's an observation:

I'm using an example based on 2 keyphrases which are both related in topic but concern different product models.

EG:
1) widget model tutorial
2) widget tutorial

...where widget is the same brand name/product and model is a version number or abbreviation of that product, EG: Dreamweaver MX <-- PURELY AN EXAMPLE!

Pre-Florida, my (steady) rankings:

1) widget model tutorial = #1
2) widget tutorial fluctuated between #1 and #2

These appeared on totally different pages using the above keyphrase queries.

Today:

1) widget model tutorial = #1 (No change).
2) widget tutorial = #427! (ergh!)

BUT... on the SERP for widget model tutorial, my dumped-on widget tutorial page is right underneath at #2!

It's as though widget tutorial has been "re-classified", grouped, deemed as related to, or a variant of, this particular (widget model tutorial) SERP instead of the pre-Florida SERP...

Here's something else that I've noticed within the same SERP...

If you use the "~" operator like so:

KW ~tutorial

You get "variants" of the word 'tutorial', like:

- tutorials
- help
- introduction
- basics
- guide
- manual

... as expected, BUT...

LOOK at results #3 - #10 for widget model tutorial:

3. widget model tutorial : Buy at the best price on [STORE] -
4. KvR : widget model Video Manual and Tutorial CD-ROM Set for the Mac
5. KvR : [DOMAIN-NAME] widget model Video Manual & Tutorial CD-Rom
6. [MANUFACTURER NAME] widget model Bundle w/Tutorial
7. [DOMAIN TRADEMARK]: widget model Video Manual and Tutorial from ...
8. widget model Video Manual and Tutorial CD-ROM Set for the Macintosh
9. Using [TRADEMARK NAME] Mouse Keyboard with widget model - [DOMAIN NAME]
10. widget model tutorial: Hitpoints und Slices Tutorial -

NOTE: Words within "[...]" omitted in compliance with TOS.
NOTE 1: Hold the thought [MANUFACTURER NAME] (above), see below.

Now, notice that the next occurrence (after #3) of the phrase widget model tutorial doesn't appear until #10.

Why? Well I don't profess to know the answer - There may be off-page factors (I haven't looked as of yet) and/or it could be sommat as simple as the colon (:) at the end of the word "tutorial".

But look at the occurrences of the word "manual"!

Call it themes, call it grouping, call it Sets, related, similar, variants, or whatever... is it now that G can better determine subject matter by looking also for "expected" words within the documents... both linkin from and within the target page perhaps and/or contextually also?

Further...
RE: Note 1 (above):

KW1 = widget = Brand/Product name.

If I do:

~KW1

[MANUFACTURER NAME] comes back as one of the variants (See result #6 in the SERP's above)

G knows that [MANUFACTURER NAME] is related to the [BRAND/PRODUCT NAME] even though they are NOT "common language" words used outside of the industry.
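
To make the "~" behaviour above concrete, here is a minimal Python sketch of the idea: each ~term in a query is treated as "this word OR any word grouped with it". The RELATED table and the expand_query function are hypothetical and only mirror the ~tutorial variants listed earlier in this post. How G actually builds or applies its groupings is not known from anything in this thread.

```python
# Hypothetical related-word table, copied from the ~tutorial variants listed above.
# The real groupings are whatever the engine has worked out for itself.
RELATED = {
    "tutorial": ["tutorials", "help", "introduction", "basics", "guide", "manual"],
}


def expand_query(query, related=RELATED):
    """Rewrite each ~term as an OR-group of the term plus its related words."""
    parts = []
    for token in query.split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:].lower()
            group = [word] + related.get(word, [])
            parts.append("(" + " OR ".join(group) + ")")
        else:
            # Plain terms pass through untouched.
            parts.append(token)
    return " ".join(parts)


print(expand_query("widget ~tutorial"))
```

A term with no entry in the table simply expands to itself, so the sketch degrades to an ordinary query when nothing is known about a word.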

What am I saying?

It seems to me that G "expects" to see related "variant" words within documents so as to "better" classify, or understand what it's actually about.

This is, in theory, a smart algo for not only weeding out pages solely targeted at KW's without regard to the expected language, but also - again in theory - to apply more weight to, or "classify" documents it can "understand".

It's like a multi-pincer "attack" from different angles.

So maybe it's not necessarily that your pages have bombed but perhaps been "grouped" more "relevantly" elsewhere... if it's not relevant - in your view - maybe it's time to look at what G is "expecting" to see and help it "understand"

Forgive me here but I'm tryin to make sense of this myself whilst writing so I hope that's clear.

Thoughts, observations, expansions?

 

pavlin




msg:112605
 7:55 pm on Feb 6, 2004 (gmt 0)

It has already been pointed out, that on some "industries" G has a "dictionary" of related words.
There was a close example with a search for "car" and the site of a car company.

Maybe this would be interesting for you:
[webmasterworld.com...]

annej




msg:112606
 1:24 am on Feb 7, 2004 (gmt 0)

I just spent an interesting hour exploring my key words with the ~ before them. Now I have a list of words that Google seems to have grouped together. In my topic they seem to make sense and there were very few results that didn't fit even as low as page 10 - 15.

I am just curious as to how Google is sorting out which pages are on top. Is it how many related words are on the page or in links to the pages? What have other people noticed?

sblake




msg:112607
 3:21 am on Feb 7, 2004 (gmt 0)

annej makes a good point--

I've also spent an hour or so playing with the ~ operator, finding out which words Google groups with the various words in my relevant keyword phrases.

Then went back and ran my relevant keyword phrases straight up, and looked at the source of the pages that returned in the top 5 or so for each search. Keyword density for the exact search phrase was usually pretty low. But-- in each case, the high-ranking pages were loaded with most of the other words that Google has determined are related to the keywords.

If theming is the future on Google, this may be a way of getting with the program.
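
The check described above (low density for the exact phrase, but lots of the related words on the page) can be roughed out in Python like this. It is only a sketch: the function names are made up for illustration, the related-word list is whatever you collected by hand from your own ~ searches, and neither number is claimed to be something Google computes.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_density(text, phrase):
    """Share of the page's words that belong to exact occurrences of the phrase."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


def related_coverage(text, related_words):
    """Fraction of the related words that appear at least once on the page."""
    present = set(tokenize(text))
    related = {w.lower() for w in related_words}
    if not related:
        return 0.0
    return len(related & present) / len(related)
```

Running both functions over the visible text of the top 5 pages for a search gives two numbers per page to compare, which is the same eyeball test done by viewing source, just quicker to repeat.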

drewls




msg:112608
 3:28 am on Feb 7, 2004 (gmt 0)

It definitely helps to add these synonyms to the mix. However, whatever Google is doing to decide its ranking at the moment is very inconsistent and not conducive to testing. I don't know about anyone else, but every time I think I've figured out something like this, I come across a site or two in the top 10 that prove me wrong.

This leads me back to my original theory, which is, whether we call it a filter or not, Google is applying a different set of rules to pages that match a certain profile. Meaning, for example, if your page has a large number of incoming links with the keyphrase in the anchor text, the page is looked at differently for on-page characteristics than pages that don't.

annej




msg:112609
 6:51 am on Feb 7, 2004 (gmt 0)

Are there others in this forum who have looked carefully at themed words and whether Google is more interested in what is found on the page or if incoming link text is the major key?

I wonder also what effect it has. Is Google giving pages with a variety of themed words more weight? If they aren't yet, will they be doing it in the future? It occurs to me it won't hurt to be ready.

valeyard




msg:112610
 9:22 am on Feb 7, 2004 (gmt 0)

This could be right, but I dunno. I'm still feeling that the new algo actually rates on-page factors a lot less. My guess is that Google finds its preferred hubs/authorities for "widget tutorial" then gives huge weight to the pages they link to. Many of these will contain the word "manual". Many will be pages of links ("widgetco has a great list of manuals...")

The (a?) flaw in the implementation appears to be that preferred hubs/authorities are also given huge weight for any search term they happen to mention. Hence a page on WW temporarily gets into the top ten for "Bucharest apartments" despite containing no obvious synonyms for "Bucharest" or "Apartments".

Unfortunately all this analysis is being made more difficult by the fact that Google seems very definitely to be using at least two different algorithms, whether triggered by filter or whatever.

Certainly at the end of the day including synonyms is a good thing if only because many users will search on them directly!

TheWhippinpost




msg:112611
 6:05 pm on Feb 7, 2004 (gmt 0)

It has already been pointed out, that on some "industries" G has a "dictionary" of related words.

Hmm... I'm not so convinced there's a "dictionary" of industries personally as that implies a "hitlist", and that in itself implies intervention. Moreover, G likes to automate and with language being evolutionary and fluid in nature, automation would be the best solution.

As I said above, the manufacturer's name and brand/product name (in this example) are not common language words you'd find in a dictionary outside the industry - well, possibly the brand name in (very) modern dictionaries but certainly not the manufacturer's name which also happens to be German - but G is defo linking the words together as being contextually related and that relation is not "naturally" apparent as say, MUSIC <--> SONG which would be a "natural" relation.

This, it seems to me, supports assertions made here by others that G may be recording search terms made by users and forming a dynamic "dictionary" maybe based on common word frequencies found both in the document and from the search query and/or DMOZ also, and making a contextual judgement.

sblake
But-- in each case, the high-ranking pages were loaded with most of the other words that Google has determined are related to the keywords.

Thanks for your observation, I've seen it again on one of my (other) SERP's today too.

drewls
I don't know about anyone else, but every time I think I've figured out something like this, I come across a site or two in the top 10 that prove me wrong.

Post your observations. I think you're essentially right and the best algo's I suppose will be the ones that're scalable to differing dynamics, ie... the more (or less), competitive, the more, or less relevancy tests and scores applied - Makes sense really as the more pages there are to judge, the more strict the judging has to be.

annej
whether Google is more interested in what is found on the page or if incoming link text is the major key.

It's got to be both but I guess you're askin which one is the more influential, dunno.

Maybe a hypothetical question to ask is what would happen if you had a page with just a title, header (h1), link and a line of text, each containing just the key-phrase and then set up loads'a links to it - from various PR values...

...And then compare the resultant ranking to an on-topic (proper) content page with just enough links to get on G's radar.

valeyard
I'm still feeling that the new algo actually rates on-page factors a lot less.

If that's so, then the hypothetical test (above) would rank the "no-content" page with loads'a links above the content page... and with enough links may even get to #1 position - If that's true, then it's a farce. It's reliant on one factor over all others.

OTOH, if you have an understandable page-title describing succinctly the content within, and then have content that has subject-validating "co-words", ie... the keyword "variants" that're normally "expected" to be seen in such a topic, you instantly make life more difficult for spammers as it dilutes keyword-reliance.

Adding these "expected co-words" into the mix can really help lock-down the subject-relevancy of a doc.

A (highly) speculative example: say that for a 1-keyword search query there are 10 "expected co-words". If one doc has 6 of them and the others only a max of 3, then the doc with 6 "co-words" holds more relevancy to the query...

The more keywords used in the query, the more co-words expected and the more "precise" the understandin of the page... like I say, a "pincer-like" assessment of the page.

It's a test which is one amongst others including incoming links I guess.
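
The speculative 6-versus-3 example above can be written out as a tiny Python sketch. Everything in it is an assumption for illustration: the names coword_score and rank_by_cowords are made up, the co-word list is supplied by hand, and a real engine would combine any such test with links and other factors.

```python
import re


def coword_score(text, expected_cowords):
    """Count how many of the expected co-words occur in the text (each counted once)."""
    present = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(1 for w in set(expected_cowords) if w.lower() in present)


def rank_by_cowords(docs, expected_cowords):
    """Return (name, score) pairs, highest co-word score first."""
    scored = [(name, coword_score(text, expected_cowords)) for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Ten hypothetical co-words for a "tutorial" query: the first six are the ~tutorial
# variants quoted earlier in the thread, the last four are invented for the example.
expected = ["tutorials", "help", "introduction", "basics", "guide", "manual",
            "lesson", "video", "beginner", "reference"]

docs = {
    "page_a": "Widget guide and manual: an introduction to the basics, with video help.",
    "page_b": "Widget tutorials, one lesson and a reference card.",
}

print(rank_by_cowords(docs, expected))
```

Here page_a carries 6 of the 10 co-words and page_b only 3, so page_a comes out on top even though neither page repeats the search keyword itself more than once.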

Hence a page on WW temporarily gets into the top ten for "Bucharest apartments" despite containing no obvious synonyms for "Bucharest" or "Apartments".

Ah, but isn't that just the Freshbot? As you say, "temporarily".

claus




msg:112612
 2:07 am on Feb 8, 2004 (gmt 0)

Nice thoughts whippinpost, it reminded me of something from back in December - i had to dig a bit but i finally found it. Here are my own thoughts at that time (two short posts from one thread on getting "Back to number one"):

[webmasterworld.com...] (post 67)
[webmasterworld.com...] (post 80)

If that post #80 isn't humor... humour, even... a bit dry or weird perhaps, but i guess that was the flavour of the day.

This quote from #67 nicely sums it up:
Two keywords are rarely enough to adequately describe any product or business. Or, to quote Brett in a relevant article: I find it helpful to meditate with "I'm not CNN" as a mantra

A day/page or two later i elaborated a bit. Uhm...in fact i was rather verbose. I still think this post holds some value - the thoughts are similar to those you expressed above, but it's a long post and i don't like double-posting so i'll just post the link here instead:

[webmasterworld.com...] (post 106)

As you mentioned German language you would want to read post #16 in this thread closely - it has a few more subtle points as well:

[webmasterworld.com...]

... some even related to this:
>> but every time I think I've figured out something like this, I come across a site
>> or two in the top 10 that prove me wrong

tenerifejim




msg:112613
 11:59 am on Feb 8, 2004 (gmt 0)

Guys, the best way to work out what G is up to with related 'on-topic' words is check out the adwords, word-suggestion tool. I went through this a couple of weeks back, stemmed a few words, added a few extra on-topics to just a couple of pages and they jumped from 50 to top 10 in a week.

TheWhippinpost




msg:112614
 4:09 pm on Feb 8, 2004 (gmt 0)

Claus, thanks for the links - and guidin me to the relevant ones ;)

I'd read a couple of those threads at the time and planted the theory to the back of my mind for (later) fertilisation... I think we're on the same page with this, along with a few others.

Interesting note about the German thing from GG.

It was mentioned somewhere in those threads; natural language. If any of the above was about anything, it's about that IMO.

tenerifejim
Guys, the best way to work out what G is up to with related 'on-topic' words is check out the adwords, word-suggestion tool

Good to hear it seems to be workin for you.

It was the Adwords Suggestion tool, along with the other help docs there, the (recent) stemming addition, this post [webmasterworld.com] that was brought back from the dead by Brett recently [#16] (bad cough Brett, Bird-Flu?!), and the "lack" of traction when it's been "hinted" at in the past (from posts such as Claus's), that prompted me to look into it further and throw it out wide using an actual example in the hope that others would maybe see a similar behaviour in their own SERP's too... or not!

It's sometimes said - quite rightly - that '... you're granting G more intelligence than is possible...' or words to that effect. I don't think that comment can be levelled here as it's an ability G can use... That's not to say that it is using it to measure the SERP's ATM - That's what this thread is about... to ask the question.

annej




msg:112615
 7:13 am on Feb 9, 2004 (gmt 0)

This has been so interesting. It's great to find a thread that is really discussing something instead of complaining.

I tried integrating some of the words I discovered when searching with ~widget and ~widgeting. It will be interesting to see if it makes any difference in serps on key words and phrases. Even if it doesn't it should help make sure I have all relevant key words that searchers might use.

It was easy to include them in a natural way as they were all a part of my topic. I just hadn't thought of making sure they were all on the page.

steveb




msg:112616
 7:27 am on Feb 9, 2004 (gmt 0)

The ~ thing is an eye opener for me. Never tried it before.

Then, after trying (no quotes) "keyword ~keyword" and "keyword +keyword", I tried "keyword keyword". The serps were remarkably excellent, even better than the normal results. Maybe all these emphasize on-page words and discount anchor text. That would explain why I see anchor text junk fall even more when I try these.

One funny thing... the ~ search highlights a related, technical niche term that I didn't think Google was smart enough to know about. (Very important discovery in itself.) The funny thing though is there are three ways to spell this term. All are "correct" but Google only recognizes TWO of the spellings as synonyms. I have it spelled all three ways on my site, but the majority of times I have it spelled the NON-synonym way. Aaargh. Got a lot of search/replace to do.

claus




msg:112617
 10:55 pm on Feb 11, 2004 (gmt 0)

Just adding a pointer to the "Latent Semantic Indexing" thread, as it's relevant to this one:
[webmasterworld.com...]

annej




msg:112618
 6:24 am on Feb 12, 2004 (gmt 0)

Hmmmm, I made the changes I mentioned in msg 7 and today I've moved up from #8 to #7 in my best single keyword. The page also has new fresh tags. Maybe it was just happenstance but I have to wonder if the changes helped.

I just printed out all the LSI stuff, looks really interesting and sure fits what we've been talking about here.

TheWhippinpost




msg:112619
 1:11 pm on Feb 12, 2004 (gmt 0)

Well done claus, saved me posting the link ;)

It's starting to make a whole lot more sense now.

Kwix




msg:112620
 3:08 pm on Feb 12, 2004 (gmt 0)

I have to agree with Claus. The LSI paper is VERY related. I like the example in the paper of the news article test. Because "Iraq", "Gulf" and "War" appeared very frequently with "Saddam" and "Hussein", articles pertaining to the Gulf War or Iraq in general would be returned in a search for "Saddam Hussein" even when his name does not appear in the article.

I have noticed exactly the same sort of response from Google recently, even when quotes are used around the search phrase. Very annoying if you are looking for something specific, but I can see how it would be helpful to the average John Q Searcher.
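
The effect in that news-article example can be imitated with a very crude Python sketch. To be clear, this is NOT LSI (there is no matrix decomposition here). It is plain co-occurrence counting: words that share enough documents with a query word are added to the query. The four documents, the min_shared threshold and all function names are invented for the illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence(docs):
    """For each word, count how many documents it shares with every other word."""
    table = defaultdict(Counter)
    for text in docs:
        words = set(tokenize(text))
        for w in words:
            for other in words:
                if other != w:
                    table[w][other] += 1
    return table


def expand(query, table, min_shared=2):
    """Query words plus any word sharing at least min_shared documents with one of them."""
    terms = set(tokenize(query))
    extra = {o for t in terms for o, n in table[t].items() if n >= min_shared}
    return terms | extra


def search(query, docs, table, min_shared=2):
    """Indexes of documents containing at least one expanded term."""
    terms = expand(query, table, min_shared)
    return [i for i, text in enumerate(docs) if terms & set(tokenize(text))]


# Made-up mini corpus: document 2 never mentions the name, document 3 is off-topic.
docs = [
    "Saddam Hussein Iraq Gulf War",
    "Saddam Hussein Iraq War",
    "Iraq Gulf War ceasefire",
    "widget tutorial manual",
]

table = cooccurrence(docs)
print(search("Saddam Hussein", docs, table))
```

Document 2 is returned for "Saddam Hussein" purely because "iraq" and "war" keep company with the name elsewhere in the corpus, which is the same flavour of result described above, quotes or no quotes.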


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved