Update Brandy Part 3 - (deprecated) Google News Archive forum at WebmasterWorld - WebmasterWorld

Update Brandy Part 3

GoogleGuy

7:41 pm on Feb 15, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Continued From: [webmasterworld.com...]

"Any clue as to the possible role greater reliance on semantics is playing in your never ending quest for more relevant results?"

I'd say that's inevitable over time. The goal of a good search engine should be both to understand what a document is really about, and to understand (from a very short query) what a user really wants. And then match those things as well as possible. :) Better semantic understanding helps with both those prerequisites and makes the matching easier.

So a good example is stemming. Stemming is basically SEO-neutral, because spammers can create doorway pages with word variants almost as easily as they can optimize for a single phrase (maybe it's a bit harder to fake realistic doorways now, come to think of it). But webmasters who never think about search engines don't bother to include word variants--they just write whatever natural text they would normally write. Stemming allows us to pull in more good documents that are near-matches. The example I like is [cert advisory]. We can give more weight to www.cert.org/advisories/ because the page has both "advisory" and "advisories" on the page, and "advisories" in the URL. Standard stemming isn't necessarily a win for quality, so we took a while and found a way to do it better.
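To make the near-match idea concrete, here is a minimal sketch of how stemming lets the query [cert advisory] also match "advisories". It assumes a toy suffix-stripping rule; `naive_stem`, `match_count`, and the sample page text are hypothetical illustrations, not Google's stemmer (which the post only says differs from standard stemming).

```python
import re

def naive_stem(word):
    """Toy stemmer (an assumption for illustration): folds simple plurals only."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # advisories -> advisory
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # pages -> page
    return word

def tokenize(text):
    """Split text (including URLs) into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_count(query, document, stem=True):
    """Count document tokens that match any query term, with or without stemming."""
    norm = naive_stem if stem else (lambda w: w)
    query_terms = {norm(t) for t in tokenize(query)}
    return sum(1 for t in tokenize(document) if norm(t) in query_terms)

# Hypothetical page text plus its URL, loosely modeled on the example in the post.
page = ("CERT advisories: each advisory describes a vulnerability. "
        "www.cert.org/advisories/")

exact = match_count("cert advisory", page, stem=False)
stemmed = match_count("cert advisory", page, stem=True)
print(exact, stemmed)
```

Without stemming, only the literal tokens "cert" and "advisory" count; with it, both occurrences of "advisories" (in the body and in the URL) count as well.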

So yes, I think semantics and document/query understanding will be more important in the future. pavlin, I hope that partly answers the second of the two questions that you posted way up near the start of this thread. If not, please ask it again in case I didn't understand it correctly the first time. :)

vplaza

12:07 am on Feb 17, 2004 (gmt 0)

10+ Year Member

I am seeing "64" on Yahoo.

farberama

12:40 am on Feb 17, 2004 (gmt 0)

10+ Year Member

Would anyone have an idea why the SERPs from Google Viewer would be different than 64, 216, or www? My site doesn't show up at all for my keywords in 216 or www, shows up #54 in 64, and #29 in Google Viewer results.

<<pulling out what's left of my hair!>>

Chicago

12:48 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Scroll down to the bottom of your results page and see if the results say "provided by Google". If not - this is not 64 - this is the new Y! index.

We are not seeing 64 on Yahoo here. We are seeing Yink.

Kirby

12:51 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

allan, what I'm seeing goes along with what steveb wrote: a mix of city sites, relevant directories/authorities, and newspaper sites that are specific to the industry and city. Some cities are less affected by directory/authority types, and I can't explain the discrepancy.

I still think this is the evolution of Florida and Austin, but I don't think stemming/semantics plays that much of an effective role. I see many results where pages, including my own, are ranking well based on <title> and anchor text, not content. I'll sticky you an example.

quotations

12:52 am on Feb 17, 2004 (gmt 0)

10+ Year Member

>But does the user want the "complete body
>of knowledge"?

They may or may not. It is hard to tell from a single simple search (another hint). Route optimization, like PR, anchor text, proximity, stemming, LocalRank, and about 100 other things, plays only a small role in the algo overall, but it can have a huge effect.

Under this theory, whoever is most efficient in providing direct or nearly direct access to the largest volume of the most important and most relevant information, without negatively impacting the other factors, should and does get a bump in rank.
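As a rough illustration of the theory in this post, the sketch below scores a page by how directly it links to a set of topic documents. Everything here is an assumption made for illustration: the link graph, the `route_score` formula (average of 1/hops over the topic set), and the page names are hypothetical, and nothing in the thread confirms that any search engine computes such a score.

```python
from collections import deque

def link_distances(graph, start):
    """Breadth-first search: fewest link hops from start to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def route_score(graph, start, body_of_knowledge):
    """Hypothetical score: mean of 1/hops over the topic documents.

    Unreachable documents contribute 0, so a page that links directly
    to every topic document scores 1.0.
    """
    dist = link_distances(graph, start)
    total = sum(1.0 / dist[doc] for doc in body_of_knowledge
                if doc in dist and dist[doc] > 0)
    return total / len(body_of_knowledge)

# Hypothetical link graph: "hub" points straight at every topic document,
# "partial" reaches only some of them, and one of those indirectly.
graph = {
    "hub": ["doc_a", "doc_b", "doc_c"],
    "partial": ["doc_a", "other"],
    "other": ["doc_b"],
}
body = {"doc_a", "doc_b", "doc_c"}

hub_score = route_score(graph, "hub", body)
partial_score = route_score(graph, "partial", body)
print(hub_score, partial_score)
```

Under these assumptions the hub page scores higher because every topic document is one hop away, which matches the "direct or nearly direct access" wording above.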

quotations

1:03 am on Feb 17, 2004 (gmt 0)

10+ Year Member

These Inktomi results on Yahoo are almost embarrassing.

On google, we have #4,5,9,10,17
on yahoo, we have #1,2,3,4,5,6,8,10,11,13,15,16,17,18,20,21,22,23,24 ...

Of course, both are excellent results.

;-)

vplaza

1:04 am on Feb 17, 2004 (gmt 0)

10+ Year Member

Yes, I apologize; the results appear to be INK, not "64" Google.
Interesting SERPs though!

Leosghost

1:11 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Can someone say which "yahoo" we're supposed to be looking at ....
I just checked for my KW #1 +KW #2 and I'm still #1..
but on the bottom of the page it says "Google"..
Pass the generic tranquiliser close my eyes and hope it stays that way or what ..=:o)

< sorry I was typing when you said that >

James_Dale

1:57 am on Feb 17, 2004 (gmt 0)

10+ Year Member

>Route optimization is the mechanism by which pages and documents are ranked relative to their ability to provide the most efficient overall path to the definition of a complete body of knowledge.
>For example, the "Systems Engineering Body of Knowledge" document (SEBOK) of the International Council on Systems Engineering (INCOSE) would rank very high on that scale due to the "perceived" fact that it contains pointers directly to the entire SE Body of Knowledge.
>By providing the link to the SEBOK, all other important documents/web pages related to Systems Engineering could be found by the shortest possible route. A link to the Institute of Physics (IOP) page about Distributed Systems Engineering, on the other hand, would be expected to provide some useful information, but much of that would already be reflected in the SEBOK and the IOP page would therefore have a lower route optimization score.
>The ability to recognize the optimal size and contents of a body of knowledge and to optimize the route or paths which must be followed to acquire the entire contents of that body of knowledge is a rudimentary exercise in modeling but not a trivial undertaking with a data set the size of the entire Internet.

Hm, yes, but to add to/clarify some of these points:

Route, optimization, mechanisms, cover, (by default) documentative abilities when ranked according to their relationship; (num root) with the efficient paths. This is the means by which PDF documents and their overall internet presence (sic) are established via semi-autocratic knowledge mechanisms. A prime example of this is that the systems engineering (INCOSE) rankings on the forefront, partially-peaked and indexed according to the traditional dampening factor, as yet perceived, focuses on the 64.x index, which is high enough on the paradigm scale for containment of relevancy pointers.

;)

[edited by: James_Dale at 2:07 am (utc) on Feb. 17, 2004]

steveb

2:05 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

farberama, the Google Viewer normally shows the same junk results as an allinanchor: search.

mbauser2

4:28 am on Feb 17, 2004 (gmt 0)

10+ Year Member

>The whole idea of LSI and applied semantics is in determining the meaning of a "document" (a term we've heard before in this very thread) which suggests an entire site.

I fail to see any logic whatsoever in that assertion. You're just spreading misinformation.

Google has always used the word "document" to refer to an independent file. If you don't believe me, search Google for "The Anatomy of a Large-Scale Hypertextual Web Search Engine".

metrostang

4:34 am on Feb 17, 2004 (gmt 0)

10+ Year Member

I was really starting to enjoy this thread again until we started comparing datacenter results. I think we should take the moderators' advice and wait until we see things settle down.

I can find 7 different IP addresses coming out of 216 with three different search result possibilities, one of which was mentioned above. Three of those are identical to those from 64. I think this just means it's taking some time.

I don't think we will know it's over until all IP addresses from all datacenters show the same results. Until then, let's go on the assumption that GoogleGuy was being straight with us and discuss the effects of the update.

Marcia

4:50 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>I was really starting to enjoy this thread again until we started comparing datacenter results.
>I think we should take the moderators' advice and wait until we see things settle down.

Exactly, and thank you. We'll not be comparing datacenter results.

>All this '216's on www from the uk' and 'it's 64. from here' are meaningless white noise. That is just the normal cycling of the datacenters.

Exactly, and we'll not be doing any more reporting of data center results in this discussion, which is about the update.

Marcia

5:00 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>The whole idea of LSI and applied semantics is in determining the meaning of a "document" (a term we've heard before in this very thread) which suggests an entire site.

One portion of the LSI paper that relates nicely to this concept is IDF (Inverse Document Frequency). While "document" refers to a single document, what about the belief some folks hold that increasing the breadth of a site, as well as the vocabulary used, can help with rankings?

Is there a possibility that the frequency factor could be taken into account across an entire site as well as on individual pages within the site?
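To show what the question would mean in practice, here is a minimal sketch that computes inverse document frequency two ways: once treating each page as a document, and once treating each whole site as a document. The formula is the standard IDF, log(N / df); the corpus, site names, and the idea of pooling a site's pages are hypothetical, and nothing in the thread confirms that Google does this.

```python
import math
from collections import defaultdict

def idf(term, documents):
    """Standard IDF: log(N / df), where each document is a set of terms."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

# Hypothetical corpus of (site, terms on one page) pairs.
pages = [
    ("site-a", {"widget", "blue"}),
    ("site-a", {"widget", "red"}),
    ("site-a", {"widget", "green"}),
    ("site-b", {"gadget", "blue"}),
]

# Page-level view: every page is its own document.
page_docs = [terms for _, terms in pages]

# Site-level view (an assumption): pool all pages of a site into one document.
site_terms = defaultdict(set)
for site, terms in pages:
    site_terms[site] |= terms
site_docs = list(site_terms.values())

page_idf = idf("widget", page_docs)   # 3 of 4 pages contain the term
site_idf = idf("widget", site_docs)   # 1 of 2 sites contains the term
print(round(page_idf, 3), round(site_idf, 3))
```

In this toy corpus "widget" looks common at page level because one site repeats it on every page, but rarer at site level, so the two views would weight the same term differently.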

Chicago

5:03 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>To all: Do you see specific examples of some kw phrases doing well on some pages and some kw phrases *significantly* downgraded on other pages, when both pages are contained within the same site with the same level of optimization, on page and off?

>If so, are the phrases that are downgraded more important from a search volume standpoint?

Yes, yes across many sites for me.

This 327 message thread spans 22 pages.