Google just the beginning

New projects at Stanford

JamesR

10:24 pm on Jul 17, 2000 (gmt 0)

Just got in from wading through the papers from the big brains at Stanford Digital Library [www-diglib.stanford.edu]. Looks like Google is just the start of the next generation of search engines. Google's method of filtering by page value (link analysis) is just one method and the start of a whole lot more. They refer to Google as just a "spin off" company from this much larger and ongoing research project. Considering the moolah Sergey and Larry are enjoying, I don't doubt current students are seeing a future payoff for their hard work. With the resounding success of Google, I believe this type of indexing and sophistication is becoming or has become the industry standard. SEs will need to pull engineers from this level of noodle power in order to compete.

Brett_Tabke

2:27 am on Jul 19, 2000 (gmt 0)

Yes, there is a small core group of about 12 that are completely into it. There are so many ways they can dice and slice accumulated data that are yet untried. The only problem? None of this research has put out a definitive search engine worth using that is any better than, say, Alta or even old style Fast/Alltheweb. Google results, in their own way, are just as spammy as some of the lesser engines.

tedster

8:39 pm on Apr 1, 2001 (gmt 0)

>> Google results, in their own way, are just as spammy as some of the lesser engines.

On another thread, a paper was referenced on the Voting Model for Ranking Web Pages [citeseer.nj.nec.com]. It's a good read. Google's model -- the random walk -- is a subset of this Voting Model. I understand that the paper's author, Maxim Lifantsev [cs.sunysb.edu], is contributing to the Google brain power. He's also involved with the GRiD Project [webmasterworld.com], mentioned on another thread here.

Here are some intriguing passages:

In this model we consider nodes (collections of Web pages authored by the same entity) and their connectivity, rather than just individual pages and links between them. In practice the borders of such nodes can be estimated using simple heuristics analyzing the structure of the URL's of the pages...
The voting model also allows for very strong search engine persuasion (SEP) protection: the only source of SEP is the votes originating from fake nodes created by the same physical entity. Since the only place where votes are created is when we consider how many nodes we have, it is easy to hunt down SEP attempts because, in order to have noticeable influence, a SEP attempt must have an abnormally high number of fake authors hosted on Web servers with the same or similar IP addresses.
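To make the two ideas in those passages a bit more concrete, here is a rough sketch in Python. It is not the paper's algorithm. The URL heuristic (host name plus a leading "~user" path segment), the "same first three octets" test for similar IP addresses, the threshold of 3, and every function name are assumptions made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def node_key(url):
    """Guess the authoring entity for a URL (hypothetical heuristic).

    Assumption: pages on one host belong to one author, unless the path
    starts with a "~user" segment, in which case host + "~user" is the author.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0].startswith("~"):
        return host + "/" + segments[0]
    return host


def group_pages_into_nodes(urls):
    """Collect page URLs into nodes keyed by the guessed author."""
    nodes = defaultdict(list)
    for url in urls:
        nodes[node_key(url)].append(url)
    return dict(nodes)


def ip_prefix(ip):
    """First three octets of an IPv4 address, a crude stand-in for 'similar IP'."""
    return ".".join(ip.split(".")[:3])


def suspicious_prefixes(node_ips, threshold=3):
    """Return IP prefixes that host at least `threshold` distinct nodes.

    Assumption: many supposedly independent authors packed into one small
    IP range is the 'abnormally high number of fake authors' signal.
    """
    by_prefix = defaultdict(set)
    for node, ip in node_ips.items():
        by_prefix[ip_prefix(ip)].add(node)
    return {p: sorted(n) for p, n in by_prefix.items() if len(n) >= threshold}


# Illustrative data only: reserved example domains and documentation IP ranges.
urls = [
    "http://www.example.edu/~alice/index.html",
    "http://www.example.edu/~alice/papers.html",
    "http://www.example.edu/~bob/",
    "http://shop.example.com/a/b.html",
]
nodes = group_pages_into_nodes(urls)

node_ips = {
    "a.example": "192.0.2.10",
    "b.example": "192.0.2.11",
    "c.example": "192.0.2.12",
    "d.example": "198.51.100.7",
}
flagged = suspicious_prefixes(node_ips)
```

With the sample data, the two ~alice pages collapse into a single node, and only the prefix shared by the first three hosts is flagged. A real engine would presumably need something far less naive, since shared hosting alone puts many unrelated authors on neighbouring addresses.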

pete

9:16 am on Apr 4, 2001 (gmt 0)

Agree with you, Brett, that Google with all their PRO and R&D aren't that much ahead of the game! In fact, take away their over-reliance on ODP data and I'm not sure where they would be in the relevancy count.

From tedster's post:
"...hunt down SEP attempts because, in order to have noticeable influence, a SEP attempt must have an abnormally high number of fake authors hosted on Web servers with the same or similar IP addresses"

That would definitely smoke us if they got it right!

JamesR

4:16 pm on Apr 4, 2001 (gmt 0)

I think the power of Google lies in the potential. They have a good techno infrastructure, smart spidering, and some of the brightest minds in the business. They are also obviously devoted to R&D, which will continue to take them to the next level (which they did with link analysis). I don't see anyone in their playing field besides Alta, which has had trouble converting theory into solid results.