Update Brandy Part 3

         

GoogleGuy

7:41 pm on Feb 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Continued From: [webmasterworld.com...]

"Any clue as to the possible role greater reliance on semantics is playing in your never ending quest for more relevant results?"

I'd say that's inevitable over time. The goal of a good search engine should be both to understand what a document is really about, and to understand (from a very short query) what a user really wants. And then match those things as well as possible. :) Better semantic understanding helps with both those prerequisites and makes the matching easier.

So a good example is stemming. Stemming is basically SEO-neutral, because spammers can create doorway pages with word variants almost as easily as they can optimize for a single phrase (maybe it's a bit harder to fake realistic doorways now, come to think of it). But webmasters who never think about search engines don't bother to include word variants--they just write whatever natural text they would normally write. Stemming allows us to pull in more good documents that are near-matches. The example I like is [cert advisory]. We can give more weight to www.cert.org/advisories/ because the page has both "advisory" and "advisories" on the page, and "advisories" in the URL. Standard stemming isn't necessarily a win for quality, so we took a while and found a way to do it better.
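To make the [cert advisory] example concrete, here is a minimal Python sketch of stem-based matching. The names `naive_stem` and `stem_match_score` are hypothetical and invented for illustration, and the suffix rules are deliberately crude. This is a toy, not the stemming Google uses, which the post does not describe.

```python
import re


def naive_stem(word: str) -> str:
    """Crude suffix stripper (illustrative assumption, not a real stemmer)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # advisories -> advisory
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                # pages -> page
    return w


def tokens(text: str) -> list[str]:
    """Split text (or a URL) into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem_match_score(query: str, page_text: str, url: str) -> int:
    """Add 1 for each query stem found in the page text, and 1 more if it is in the URL."""
    page_stems = {naive_stem(t) for t in tokens(page_text)}
    url_stems = {naive_stem(t) for t in tokens(url)}
    score = 0
    for term in tokens(query):
        stem = naive_stem(term)
        score += stem in page_stems
        score += stem in url_stems
    return score


if __name__ == "__main__":
    print(stem_match_score(
        "cert advisory",
        "CERT advisories and the latest advisory",
        "www.cert.org/advisories/",
    ))
```

Without the stemming step, the query word "advisory" would not match "advisories" in the URL at all, which is the near-match the post is talking about.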

So yes, I think semantics and document/query understanding will be more important in the future. pavlin, I hope that partly answers the second of the two questions that you posted way up near the start of this thread. If not, please ask it again in case I didn't understand it correctly the first time. :)

WebmasterFisherman

10:09 am on Feb 17, 2004 (gmt 0)



Could anyone please give a full list of the 64 IPs to check?

Or are there just 2 of them: 233.161.104 and 233.161.99?

Hissingsid

10:30 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



64.233.161.98
64.233.161.99
64.233.161.104

I'm about as sure as I am that the World is not flat and NASA filmed the Lunar landings in the Nevada desert ;)

Oh and GoogleGuy confirmed that, to paraphrase, "they have found a better way of doing semantic indexing". If it walks like a duck, quacks like a duck and the best ornithologist you know says it's a duck, I think it's safe to assume that it's a duck. Now we know it's a duck we can assume that it likes splashing about in ponds, the rain, quacking outrageously at duck jokes etc.

If they are not using LSI to spot dupes what technology do you think they are using?

Best wishes

Sid

edit reason: This CGI is screwing up my posts again

adfree

10:36 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Marcia, Sid, thanks for the details and your analysis, this presents food for thought and proves that G is again ahead of the game.
Jens

SyntheticUpper

10:40 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Hi Sid,

To spot a dupe, both pages would have to show up in the exact same vector position. A single additional token word recognised by the semantic indexing would move the dupe site to a different position in the vector space. Also, with pages containing a very small number of token words, it's not inconceivable that two totally different pages might occupy the same position in the vector space. Just my 2 cents worth, but I'm not sure LSI could easily be used for dupe content spotting.
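A rough Python sketch of both points, using plain term-count vectors rather than real LSI (which would first project the vectors into a reduced space via SVD). The sample page texts and the function names are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: token -> count. Word order is discarded."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = term_vector("cheap red wine list")
    near_dupe = term_vector("cheap red wine list online")

    # One extra token: no longer the same position in the space...
    print(original == near_dupe)
    # ...although the two vectors still point in nearly the same direction.
    print(round(cosine(original, near_dupe), 3))

    # Two different short pages with the same tokens land on the same point.
    print(term_vector("dog bites man") == term_vector("man bites dog"))
```

So an exact-position test misses near-dupes, and tiny pages can collide. A similarity threshold instead of exact equality would handle the first problem, but that is a different test from the one described here.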

wine_guru

10:46 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Here in the UK I'm seeing 64.**** data on www2 and www3, but not yet on .co.uk. It wasn't there when I last looked late last night, so maybe it's moving over during today. Hope so :)

andy_boyd

10:57 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Steady 64 on www2 and www3 from Northern Ireland, not yet on www.google.com. Looks like it will be rolling out today. :-)

wine_guru

11:01 am on Feb 17, 2004 (gmt 0)

10+ Year Member



"rolling out today" - hope so :)
Right, back to the salt mine and writing our new wine list, as there's not much point in doing too much else until this settles down.

Just Guessing

11:02 am on Feb 17, 2004 (gmt 0)

10+ Year Member



I would say that changes to duplicate content spotting have been a big part of both Austin and Brandy.

It also seems to me in some cases that the surviving page gets a boost in the rankings from eliminated pages with duplicate content that link to it. I would guess this would only be the case if the pages are not seen as affiliated.

Does this fit with anything anyone else is seeing? - Sid?

George Abitbol

11:03 am on Feb 17, 2004 (gmt 0)

10+ Year Member



64 showing up on www2 and www3 in France.
Also on www-in and www-cw.

Tiebreaker

11:10 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Can someone else compare their results on 64 with their results on google.ca?

My results on 64 are great - but on google.ca they are even better still!

I'm thinking that maybe Canada has the 64 results, but with the benefit of backlinks added or something - it's been like that consistently for the last couple of days.

mbauser2

11:13 am on Feb 17, 2004 (gmt 0)

10+ Year Member



"Google has put too much weight on page linking. Just because so many webmasters have purchased links on high-PR sites to get their page ranks higher does not mean they have a high-quality site. A good-quality site should have nothing to do with who is linked to you."

*Sigh* The Same Old Delusion returns. Sometimes, I hate new users.

So you think it's unfair to use a system that takes into account multiple opinions about your site, and that it would be more fair to switch to a system that only uses one opinion of your site? Because that's what you get if you throw away citation analysis: An engine from the bad old days, when everything depended on The Secret Algorithm, and we had absolutely no chance of recognizing or resisting arbitrary filters. Anonymous programmers decided what was important to everyone.

It's truly frightening how many webmasters cry out for a return to search engine dictatorship whenever democracy fails to give them what they want.

WebmasterFisherman

11:15 am on Feb 17, 2004 (gmt 0)



Results on .ca, www2 and www3 are DIFFERENT from what I see on 64 for some of my terms. Some are the same, others aren't.

It's not rolled in yet, at least not entirely.

Netzen

11:24 am on Feb 17, 2004 (gmt 0)

10+ Year Member



Seeing 64 results on www2 and www3 of google.com in Germany.
What is www2? A backup server?

Hissingsid

11:52 am on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Does this fit with anything anyone else is seeing? - Sid?"

Not sure.

Just to clarify something. Naive Bayes = simple page semantic analysis. Things like spam filters on email progs.

Latent semantic indexing = much more accurate.

CIRCA = several orders more accurate than LSI because of its huge ontology.

CIRCA + Google = killer solution. Add what Google knows about pages to what CIRCA senses about pages and linked pages and you should have a very accurate system for SERPs and spotting dupes.

Re dupes: It's not just one measure that's used. In fact it could be a cascade. If 95% plus certain of dupe then cross reference other algo components.
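A minimal Python sketch of what such a cascade might look like. This is pure speculation about structure: the 0.95 threshold comes from the "95% plus" guess above, and the secondary checks are hypothetical placeholders, not known Google signals.

```python
from typing import Callable, Iterable

DUPE_THRESHOLD = 0.95  # the "95% plus certain" guess, nothing more


def cascade_is_dupe(
    similarity: float,
    secondary_checks: Iterable[Callable[[], bool]],
) -> bool:
    """Stage 1: a cheap similarity gate. Stage 2: cross-reference other signals.

    The secondary checks only run when the first stage is already
    fairly sure, and every one of them has to agree.
    """
    if similarity < DUPE_THRESHOLD:
        return False
    return all(check() for check in secondary_checks)


if __name__ == "__main__":
    # Hypothetical second-stage signals, named for illustration only.
    same_link_neighbourhood = lambda: True
    crawled_later_than_other_copy = lambda: True

    print(cascade_is_dupe(0.97, [same_link_neighbourhood,
                                 crawled_later_than_other_copy]))
    print(cascade_is_dupe(0.60, [same_link_neighbourhood]))
```

The point of the ordering is cost: the expensive cross-referencing is skipped for the vast majority of page pairs that never pass the first gate.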

LSI is like an evolutionary step on the way towards what Google is (in part) implementing now as PART of its algo. If you understand something about LSI then you start to understand what is going on in SERPs.

Many of the papers on LSI and similar analysis methods talk about the use of training sets of data to teach the algorithm right from wrong. I wonder if this is what we are seeing now, i.e. Google/CIRCA gets to fourth grade. If that is the case then this is the first of a much improved implementation of the new technology and it could get better with each update. What a shame "better" is such a subjective word and "one man's meat is another man's poison".

Best wishes

Sid

Just Guessing

11:54 am on Feb 17, 2004 (gmt 0)

10+ Year Member



"It's truly frightening how many webmasters cry out for a return to search engine dictatorship whenever democracy fails to give them what they want."

In this democracy, those that got the vote (PR) in the last election, get to choose who wins in the next election - that's quite often how dictatorship starts.
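For anyone unsure why existing PR feeds into future PR, here is a minimal Python sketch of the simplified PageRank iteration as publicly described: each page splits its current rank among the pages it links to. The three-page graph and the function name are illustrative assumptions, and 0.85 is the commonly quoted damping factor. Whatever Google runs in production is not public.

```python
def pagerank(links: dict[str, list[str]],
             damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Every link target
    must also appear as a key. Ranks always sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # a and b both "vote" for c; c votes only for a; nobody votes for b.
    ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(page, round(value, 3))
```

In this toy graph, a ends up far ahead of b purely because the highest-ranked page links to it: the weight of a vote depends on how many votes the voter already has.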

This 327 message thread spans 22 pages.