The Grub Bot ( Looky )

On a mission

xcandyman

2:37 pm on Apr 16, 2003 (gmt 0)

10+ Year Member



The Grub bot has been on a major mission on my new site, spidering everything in its path, even highly dynamic pages with long strings that even Google wouldn't look at twice.

Looks like the client has been taken up well. I went on grub.org recently and haven't seen it drop below 1000 concurrent connections.

Anyone had this or have you all banned grubby?

Thanks

Steve

korkus2000

2:40 pm on Apr 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know LookSmart just bought them. It could be a LS org spidering you, don't know. What is the IP?

pendanticist

3:53 pm on Apr 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Anyone had this or have you all banned grubby?

I'd suggest doing a Site Search [webmasterworld.com] for either: Grub, Grub.org or grub-client.

There was a thread (couldn't be more than a week ago) in which I posted a compilation of the most significant posts regarding Grub.

You may recall, I 'Titled' them as the threads themselves were originally titled.

If anyone finds it, or knows where it is, could you post it please? Might save xcandyman from doing some digging.

Thanks.

Pendanticist.

Brett_Tabke

3:56 pm on Apr 16, 2003 (gmt 0)

xcandyman

4:12 pm on Apr 16, 2003 (gmt 0)

10+ Year Member



Thanks guys

I was just wondering about the ratio of people blocking grub to those allowing it through. Also, if they are letting it through, how vicious is it acting with them?

Thanks

Steve

jeremy goodrich

4:15 pm on Apr 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pendanticist, here is the post [webmasterworld.com] you were referring to about Grub. (message #19)

pendanticist

4:25 pm on Apr 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you, jeremy goodrich! Now xcandyman can peruse to his/her heart's content. :)

Now, to answer xcandyman: Me no like Grubs. The thread above will go into the details.

I couldn't begin to speculate as to the ratio, but it seems to me that the general consensus (here at WebmasterWorld) is negative, and for a variety of reasons.

Pendanticist.

stechert

5:38 pm on Apr 16, 2003 (gmt 0)

10+ Year Member



Hi,

Given that the number of clients has grown by a factor of 20 in the last 4 weeks, we think the reception's been remarkably good.

Re: changes folks would like made to the infrastructure or algorithms, we have open forums for discussing what's appropriate or not (won't post a link -- you can find the Grub forums easily enough if you want to). In particular, Pendanticist has a few pet features that he likes (e.g., "403 should mean never come back" vs. restricting the path via robots.txt like the rest of the world does) and for which he has openly criticized grub at every opportunity.

So, we'd like to once again extend the invitation to explain why things are that way and probably should continue to be. Consider, e.g., whether you've ever accidentally misset permissions on a web page which then came up 403 -- would you really want spiders to never come back? That being said, I wasn't sure whether the 403 was for a regular page or for a robots.txt file, in which case it should be considered totally off limits.

We're also open to discussing the concerns and whether there are other solutions that address those concerns in addition to the current requirements.

Also, if anyone's having issues with the pace at which grub is crawling them, please let us know. We take the operation of this crawler very seriously and will aggressively tackle any problems that are uncovered.

Cheers,
Andre
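
For anyone following along, the "restrict the path via robots.txt" approach mentioned above can be sketched roughly as below. This is only an illustration using Python's standard urllib.robotparser, not Grub's actual code. The "grub-client" user-agent token, the example.com URLs, and the disallowed paths are all assumptions made up for the example; check your own logs for the token the crawler really sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants to keep one crawler out of
# its dynamic pages. The "grub-client" token is an assumption, not a
# confirmed user-agent string.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /cgi-bin/
Disallow: /search

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is kept out of the dynamic path...
grub_dynamic = parser.can_fetch("grub-client", "http://example.com/cgi-bin/view?id=1")
# ...but may still fetch ordinary pages.
grub_static = parser.can_fetch("grub-client", "http://example.com/index.html")
# Other crawlers fall under the catch-all entry, which allows everything.
other_dynamic = parser.can_fetch("SomeOtherBot", "http://example.com/cgi-bin/view?id=1")

print(grub_dynamic, grub_static, other_dynamic)
```

On the 403 question: when this same standard-library parser fetches a robots.txt over HTTP and gets a 401 or 403 back, it treats the whole site as disallowed, which lines up with the "totally off limits" reading above. A 403 on an ordinary page is a separate matter that robots.txt does not cover.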

jomaxx

7:30 pm on Apr 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What exactly is the benefit of the P2P model in grub's case? Don't all the pages have to get sent back to LookSmart anyway (possibly in compressed or predigested form)? How does this not use up more bandwidth overall than conventional SE spiders?