Forum Moderators: bakedjake


Freshbot v Grub

         

MyWifeSays

1:47 pm on May 10, 2003 (gmt 0)

10+ Year Member



If this catches on there's only one winner in my book.

[grub.org...]

heini

1:54 pm on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is an interesting concept, which leaves us with a lot of questions to be solved.
Basically, Grub's decentralized, shared crawling addresses just one of the four basic parts of web search:
1 Crawl
2 Index
3 Rank
4 Present

The real question is what they are going to make of it.
Permanent crawling of the web is a good basis for the next three steps, nothing more, nothing less.
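
To make the four steps concrete, here is a minimal sketch of how they chain together. It is a toy under stated assumptions, not how Grub or any real engine works: the PAGES dictionary stands in for actual fetching, the URLs are made up, and the ranking is a plain term count. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> page text. Stands in for real fetching.
PAGES = {
    "a.example/1": "grub crawler distributed crawl",
    "a.example/2": "search index rank",
    "b.example/1": "distributed crawl of the web",
}

def crawl(urls):
    """Step 1: fetch raw documents (here, just a dictionary lookup)."""
    return {url: PAGES[url] for url in urls if url in PAGES}

def build_index(docs):
    """Step 2: build an inverted index, term -> {url: term count}."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index

def rank(index, query):
    """Step 3: score each URL by summed term counts, highest first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Ties are broken alphabetically by URL so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

def present(results, limit=10):
    """Step 4: format the ranked results for display."""
    return [f"{pos}. {url} (score {score})"
            for pos, (url, score) in enumerate(results[:limit], start=1)]

docs = crawl(list(PAGES))
index = build_index(docs)
lines = present(rank(index, "distributed crawl"))
```

Only crawl() touches the network in a real system; everything after it works on whatever the crawl produced, which is why a crawler alone answers none of the later questions.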

jeremy goodrich

8:31 pm on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My weekend project yesterday / today has been to go over some PageRank research from Google, Stanford, and Microsoft. :) I've got a mathematician coming over to help me with some of the finer points.

They have some amazing things going on; even the Microsoft research is fascinating.

Grub is a neat toy for grabbing web pages, as heini said. The next steps are the crucial ones: what do you do with all that data?

As far as I know (I spent some time reading their forums at grub.org the other day), one Grub client has no way to communicate with the next.

A few years ago, I designed a multi-agent system that accomplished some very similar tasks. Building such a system without adequate agent communication creates a ton of redundancy and rework that does not need to happen.

Andre Stechert admitted as much, in that many clients will be given the same task list. As there is no 'trust' built into the framework, the efficiency gains of their system are NOT maximized.
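
To put a number on that redundancy, here is a rough sketch comparing two ways of handing out URLs. This is not Grub's actual scheduler, which I have not seen; the hash-based assignment, the function names, and the example.com URLs are all assumptions for illustration. With replicas=1 every URL goes to exactly one client; with replicas=3 each URL goes to three clients, which is roughly what you are forced into when no single client's result can be trusted.

```python
import hashlib

def bucket(url, n_clients):
    """Map a URL to a client number with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process and would give different assignments on each run.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_clients

def assign(urls, n_clients, replicas=1):
    """Give each URL to `replicas` distinct clients, starting at its bucket."""
    if not 1 <= replicas <= n_clients:
        raise ValueError("replicas must be between 1 and n_clients")
    tasks = {client: [] for client in range(n_clients)}
    for url in urls:
        first = bucket(url, n_clients)
        for offset in range(replicas):
            tasks[(first + offset) % n_clients].append(url)
    return tasks

def total_fetches(tasks):
    """Count how many page fetches the whole fleet performs."""
    return sum(len(task_list) for task_list in tasks.values())

urls = [f"example.com/page{i}" for i in range(1000)]
partitioned = assign(urls, n_clients=10, replicas=1)  # trusted clients
replicated = assign(urls, n_clients=10, replicas=3)   # cross-checked clients

print(total_fetches(partitioned))  # 1000
print(total_fetches(replicated))   # 3000
```

The same ten clients do three times the work for the same 1000 pages. The extra fetches are the price of being able to compare results across clients, so the redundancy is a trade against trust rather than pure waste.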

However, even setting aside the less-than-maximum efficiency of their crawling algorithms :) there are still those other problems to be solved.