Update Brandy Part 3 - (deprecated) Google News Archive forum at WebmasterWorld - WebmasterWorld




GoogleGuy

7:41 pm on Feb 15, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Continued From: [webmasterworld.com...]

"Any clue as to the possible role greater reliance on semantics is playing in your never ending quest for more relevant results?"

I'd say that's inevitable over time. The goal of a good search engine should be both to understand what a document is really about, and to understand (from a very short query) what a user really wants. And then match those things as well as possible. :) Better semantic understanding helps with both those prerequisites and makes the matching easier.

So a good example is stemming. Stemming is basically SEO-neutral, because spammers can create doorway pages with word variants almost as easily as they can to optimize for a single phrase (maybe it's a bit harder to fake realistic doorways now, come to think of it). But webmasters who never think about search engines don't bother to include word variants--they just write whatever natural text they would normally write. Stemming allows us to pull in more good documents that are near-matches. The example I like is [cert advisory]. We can give more weight to www.cert.org/advisories/ because the page has both "advisory" and "advisories" on the page, and "advisories" in the url. Standard stemming isn't necessarily a win for quality, so we took a while and found a way to do it better.
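
To make the stemming point concrete, here is a minimal sketch in Python. It assumes a deliberately crude suffix stripper; `naive_stem`, `exact_match_count` and `stem_match_count` are hypothetical helpers for illustration only and say nothing about how Google's stemmer actually works.

```python
import re


def naive_stem(word: str) -> str:
    """Crude plural stripper, for illustration only (not a real stemmer)."""
    word = word.lower()
    for suffix, replacement in (("ies", "y"), ("es", ""), ("s", "")):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def tokenize(text: str) -> list[str]:
    """Split text (or a URL) into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match_count(query: str, text: str) -> int:
    """Count tokens in text that literally appear in the query."""
    query_terms = set(tokenize(query))
    return sum(1 for token in tokenize(text) if token in query_terms)


def stem_match_count(query: str, text: str) -> int:
    """Count tokens in text whose stem matches the stem of a query term."""
    query_stems = {naive_stem(token) for token in tokenize(query)}
    return sum(1 for token in tokenize(text) if naive_stem(token) in query_stems)


query = "cert advisory"
url = "www.cert.org/advisories/"
print(exact_match_count(query, url), stem_match_count(query, url))
```

With exact matching only "cert" in the URL counts toward the query; with stemming, "advisories" counts as well, which is the kind of near-match described above.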

So yes, I think semantics and document/query understanding will be more important in the future. pavlin, I hope that partly answers the second of the two questions that you posted way up near the start of this thread. If not, please ask it again in case I didn't understand it correctly the first time. :)

tigger

8:34 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

try webmaster@google.com and put webmaster world in the title

Marcia

8:36 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Here's the report, Jens

http://www.google.com/contact/spamreport.html

or write webmaster(at)google.com

As long as 64. is staying stable I'm OK to wait, the others are just too surrealistic.

For LSI, this is what I've got bookmarked

[javelina.cet.middlebury.edu...]

[edited by: Marcia at 8:40 am (utc) on Feb. 17, 2004]

BeeDeeDubbleU

8:40 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member


Why are you all still rattling on about 64 and 236 when you were requested to stop on more than one occasion? What good does it do? This could go on for a few more days yet, so there is absolutely no point in this!

Do you realise how annoying it is to get notifications in and check the results just to see more insignificant nonsense from you numbers game punters saying I got this and I got that?

Be considerate to those of us who don't give a toss what you are seeing in beautiful downtown Burbank. I'm getting angry ;-{

steveb

8:44 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"This could go on for a few more days yet so there is absolutely no point in this!"

Um, try to keep up with the program. Nobody is talking about that.

It seems the shakeup has settled down now, temporarily at least. The only lasting effect I'm seeing is that a lot of fresh piddle was introduced, and the results have degraded somewhat.

Maybe it was just introducing fresh pages before moving 64 over, but I sure hope they don't do that again anytime soon. That was genuinely scary.

tumpy

8:47 am on Feb 17, 2004 (gmt 0)

10+ Year Member

Well, at last the 64..serps are being reflected in www3 datacenters! Anyone else seeing it as well?

NeverHome

8:49 am on Feb 17, 2004 (gmt 0)

10+ Year Member

With respect, BeeDeeDubbleU, what just took place was not simply "rattling on about 64 and 236". Something quite strange flashed past a few of us, and it certainly was worthy of comment. Enough said. :)

BeeDeeDubbleU

8:50 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member


Aaaaaarghhhh!

:->

Crush

8:51 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Looks like today is the day or, if you are in the States, tonight is the night.

Finally I can see the 64... changes on -in and -cw. I can also confirm what tumpy is seeing on www3. So it looks like it will finally be reality.

vrtlw

8:55 am on Feb 17, 2004 (gmt 0)

10+ Year Member

That was genuinely scary.

It certainly was, and I would also like to mention that Brett requested quite categorically that we not turn on notifications for the update threads. If not for your own personal sanity, then for the sake of sendmail on the server.

quotations

9:05 am on Feb 17, 2004 (gmt 0)

10+ Year Member

>Something quite strange flashed past a
>few of us, and it certainly was worthy of comment.

Some of us must have commented too early and those comments got deleted. This was not normal fluctuation and was not anything like 64, 216, www. or anything else ever seen. It was as if most of the algo and all of the filters had been turned off.

Single IP addresses were fluctuating wildly, giving different results every time you hit the refresh button.

tigger

9:26 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>>Do you realise how annoying it is to get notifications in and check the results just to see more insignificant nonsense from you numbers game punters saying I got this and I got that?

Beedee

You were told "not" to have email notifications on for this thread as it would be a large one. It's your choice to look at this thread.

Powdork

9:31 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Maybe it was just introducing fresh pages before moving 64 over, but I sure hope they don't do that again anytime soon. That was genuinely scary.

Scary, huh? Now you have a small taste of what it's like when things that used to work don't "get the same credit anymore".;)

steveb

9:43 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

No Powdork, not at all the same.

The next time someone posts about "why doesn't Google do this algo change/update out of the public eye" some of us will now have an idea of what they probably see privately before they show it to us.

Napolean or Everyman would have croaked if they saw that.

Hissingsid

9:43 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Someone asked for a link to the Latent Semantic Indexing paper: LSI Paper [javelina.cet.middlebury.edu]. There you go.

You can also find the CIRCA semantics paper salted away if you know where to look ;)

A brief and very simple summary of what, IMHO, this has to do with the forthcoming (can't come soon enough for me) update. Think of the analogy of fingerprint analysis. The analyser only looks at certain types of feature (whorls, intersections, branches etc.) and marks their location. The analyser ignores all of the straight, uninteresting lines that every fingerprint has on it. Latent semantic indexing does the same with words: it ignores all of the straightforward words and concentrates on the words that have real meaning. The CIRCA Ontology defines the closeness of match of these words and creates a single statistical vector for each page. The Google algo uses this as a contributor to the SERPs.
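
A rough sketch of the fingerprint analogy in Python, under loud assumptions: it uses plain TF-IDF weighting to drop words that appear in every page and keep a weighted vector of the distinctive ones. This is only a stand-in for the idea. LSI as published goes further and applies a singular value decomposition to the term-document matrix, and nothing here reflects CIRCA or whatever Google actually runs. `tfidf_vectors` is a hypothetical helper name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse vector per document, keeping only distinctive terms.

    A term that appears in every document gets weight log(N / N) = 0 and is
    dropped, like the straight lines that every fingerprint has.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        vector = {}
        for term, count in Counter(tokens).items():
            weight = count * math.log(n_docs / doc_freq[term])
            if weight > 0:
                vector[term] = weight
        vectors.append(vector)
    return vectors


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird sat on the wire",
]
for vector in tfidf_vectors(docs):
    print(sorted(vector))
```

In this toy corpus "the", "sat" and "on" occur in every document, so they vanish from all three vectors and only the words that tell the pages apart remain.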

The signs are that this overwhelmed the "old" part of the algo in Florida and to a greater extent in Austin. Now, in my opinion, either they have, through a process of trial and error, removed or added back an extra feature in the semantic analysis, or they have up-weighted part of the old algo designed to bring back the micro-relevant sites. Whichever way they have done it, it has worked pretty well in some areas.

I'm becoming convinced that the same technology is spotting dupes. If two pages have the same vector, they are the same. Since latent semantic indexing aims to throw out things that don't help it to compare a group of documents, I guess that the first thing it would throw out is duplicates. Too bad for folks on servers that serve up the same pages on www and non-www versions of their domains. I think that this explains the otherwise unexplained complete drop from the SERPs of previously high-ranking pages since the Florida update, and possibly before.
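
The "same vector means same page" guess can be sketched as follows. This is a minimal illustration, assuming simple term-count vectors compared by cosine similarity rather than real LSI vectors; the URLs, page texts, the 0.999 threshold and the helper names are all made up for the example.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real semantic vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(pages: dict[str, str], threshold: float = 0.999):
    """Return pairs of URLs whose page vectors are (almost) identical."""
    urls = sorted(pages)
    vectors = {url: term_vector(pages[url]) for url in urls}
    duplicates = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            if cosine_similarity(vectors[first], vectors[second]) >= threshold:
                duplicates.append((first, second))
    return duplicates


# Hypothetical site that serves the same page on www and non-www hosts.
pages = {
    "http://example.com/": "Blue widgets for sale",
    "http://www.example.com/": "Blue widgets for sale",
    "http://www.example.com/about": "About our widget company",
}
print(find_duplicates(pages))
```

The www and non-www copies come out as a duplicate pair because their vectors are identical, which is the situation the paragraph above worries about.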

The Brandy update adds in or takes out a minor ingredient but LSI/CIRCA is a big part of the recipe.

Best wishes

Sid

BeeDeeDubbleU

10:05 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member


Another great contribution Sid! (assuming that you are sure that what you are saying is correct :-)

<Too bad for folks on servers that serve up the same pages on www and non-www versions of their domains.>

Unfortunately mine was one of the sites that lost all ranking, but I have just installed a 301 redirect and hopefully this will get me out of jail. Has anyone who suffered a similar fate as a result of Austin/Brandy recovered yet? If so, was it done through a 301, and how long did it take?
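
For anyone unsure what the 301 fix looks like in practice, here is a minimal sketch using Python's standard `http.server`. It assumes `www.example.com` is the preferred host (a placeholder) and that plain HTTP is in use; `canonical_location` and `CanonicalHostHandler` are hypothetical names, and a production site would normally do this in the web server's own configuration instead.

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "www.example.com"  # assumption: the host you want indexed


def canonical_location(host, path, canonical_host=CANONICAL_HOST):
    """Return the URL to redirect to, or None if the host is already canonical."""
    hostname = (host or "").split(":")[0].lower()  # drop any :port suffix
    if hostname == canonical_host:
        return None
    return f"http://{canonical_host}{path}"


class CanonicalHostHandler(BaseHTTPRequestHandler):
    """Answer non-canonical hosts with a permanent (301) redirect."""

    def do_GET(self):
        target = canonical_location(self.headers.get("Host"), self.path)
        if target is not None:
            self.send_response(301)  # Moved Permanently
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"served from the canonical host\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


print(canonical_location("example.com", "/widgets.html"))
```

To try it locally, the handler can be passed to `http.server.HTTPServer` and requests sent with different `Host` headers. The point is simply that every non-www request is told, permanently, where the single real copy of the page lives.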

This 327 message thread spans 22 pages.
