[grub.org...]
The real question is what are they going to make of it.
Permanent crawling of the web is a good basis for the next three steps, nothing more, nothing less.
They have some amazing things going on, even the Microsoft research is fascinating.
Grub is a neat toy - for grabbing web pages, as heini said. The next steps are the crucial ones - what do you do with all that data.
As far as I know (I spent some time reading their forums the other day at grub.org), they don't have the ability for one Grub client to communicate with another.
A few years ago, I designed a multi-agent system that accomplished some very similar tasks, and building a system like that without adequate agent communication creates a ton of redundancy and rework that does not need to happen.
Andre Stechert admitted as much: many clients will be given the same task list. Since there is no 'trust' built into the framework, the efficiency gains for their system are NOT maximized.
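To make the redundancy point concrete, here's a toy sketch in Python. It is purely illustrative and not Grub's actual protocol or code - the URLs, client names, and assignment functions are all hypothetical - but it shows how handing every client the same task list multiplies fetches, while a partitioned frontier avoids it.

    # Illustrative only -- NOT Grub's actual protocol, just a toy model of
    # why giving every client the same task list wastes work.
    from typing import Dict, List

    URLS = [f"http://example.com/page{i}" for i in range(12)]  # hypothetical crawl frontier
    CLIENTS = ["client_a", "client_b", "client_c"]             # hypothetical crawl clients

    def assign_duplicated(urls: List[str], clients: List[str]) -> Dict[str, List[str]]:
        """Every client receives the full task list (no coordination, no trust)."""
        return {c: list(urls) for c in clients}

    def assign_partitioned(urls: List[str], clients: List[str]) -> Dict[str, List[str]]:
        """The server splits the frontier so each URL is fetched only once."""
        return {c: urls[i::len(clients)] for i, c in enumerate(clients)}

    def redundant_fetches(assignment: Dict[str, List[str]]) -> int:
        """Count fetches beyond the one needed per unique URL."""
        total = sum(len(tasks) for tasks in assignment.values())
        unique = len({u for tasks in assignment.values() for u in tasks})
        return total - unique

    print("duplicated :", redundant_fetches(assign_duplicated(URLS, CLIENTS)))   # 24 wasted fetches
    print("partitioned:", redundant_fetches(assign_partitioned(URLS, CLIENTS)))  # 0 wasted fetches

With three clients and twelve URLs, the duplicated scheme does 36 fetches for 12 unique pages; that wasted work is exactly the efficiency loss I mean.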
However, even setting aside the less-than-maximal efficiency of their crawling algorithms :) there are still those other problems to be solved.