

LookSmart - get your site listed

Any comments on this idea put forward by LookSmart?

         

SinclairUser

2:30 pm on Apr 14, 2003 (gmt 0)




Does anybody have any thoughts or comments about the new LookSmart distributed spider? Here is a section from an email I got:

It is based on open source software and is made available to all to distribute and even modify! Since installing "Grubby" on my own PC three days ago, I've personally contributed over 100,000 URLs to the LookSmart database! ...

...There are no drawbacks here, just benefits. I help "Grubby" to crawl the web for LookSmart, and Grubby helps me get my own sites listed there by agreeing to crawl them regularly.

Any help would be appreciated.

SEO practioner

3:18 pm on Apr 14, 2003 (gmt 0)




Hi SinclairUser, and welcome to WW.

I'm not sure what to make of this except to ask you: are you 100% certain of the source that provided you with this information? It's the first time I've heard of this from anybody.

martinibuster

3:25 pm on Apr 14, 2003 (gmt 0)




This was in CNET last month. Multi-page thread below:
[webmasterworld.com ]

If you do a site search you'll find a bunch of others.

SinclairUser

4:15 pm on Apr 14, 2003 (gmt 0)




Hmmm,

There seem to be a few unhappy people on the Grub forum saying that the bot does not handle robots.txt files correctly. As this is distributed technology, do I want to be crawling people's sites uninvited?

Perhaps when they iron out the problems it will be worth a look.

stechert

5:50 pm on Apr 16, 2003 (gmt 0)




The robots.txt handling in grub is done on a periodic basis, just like with most other production crawling infrastructures.

We're moving it to be closer to realtime, but in the interim, we have a web page you can go to that will refresh your robots.txt entries on demand on the server side. That will then adjust our URL scheduling (though there may be work units already queued and/or distributed).

Cheers,
Andre
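The periodic approach Andre describes (rules cached on the server, refreshed on a schedule, with an on-demand refresh that takes effect for future scheduling) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Grub's actual code: the `RobotsCache` class, its method names, the one-week interval and the `grub-client` user-agent string are all hypothetical, and fetching is passed in as a function so the sketch runs without network access.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Tuple

# Hypothetical refresh interval (one week); not a documented Grub setting.
REFRESH_SECONDS = 7 * 24 * 3600


class RobotsCache:
    """Hypothetical server-side cache of parsed robots.txt rules, keyed by host."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = REFRESH_SECONDS,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch    # returns the robots.txt text for a host
        self._ttl = ttl        # seconds before a cached entry is considered stale
        self._clock = clock    # injectable so refresh timing can be tested
        self._entries: Dict[str, Tuple[float, urllib.robotparser.RobotFileParser]] = {}

    def _load(self, host: str) -> urllib.robotparser.RobotFileParser:
        # Fetch and parse the host's robots.txt, stamping it with the current time.
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(self._fetch(host).splitlines())
        self._entries[host] = (self._clock(), parser)
        return parser

    def rules(self, host: str) -> urllib.robotparser.RobotFileParser:
        """Return cached rules, re-fetching only when the entry is missing or stale."""
        entry = self._entries.get(host)
        if entry is None or self._clock() - entry[0] >= self._ttl:
            return self._load(host)
        return entry[1]

    def force_refresh(self, host: str) -> None:
        """Analogue of an on-demand refresh: replace the cached rules immediately."""
        self._load(host)

    def allowed(self, host: str, path: str, agent: str = "grub-client") -> bool:
        """Decide whether a URL may be scheduled, using the (possibly stale) cache."""
        return self.rules(host).can_fetch(agent, f"http://{host}{path}")
```

Between scheduled refreshes the server keeps using whatever rules it last fetched, which is why a site owner's robots.txt change can lag unless a refresh is forced. A real crawler would also have to deal with fetch errors and work units already handed out, which this sketch ignores.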

Napoleon

6:09 pm on Apr 16, 2003 (gmt 0)



>> Perhaps when they iron out the problems it will be worth a look. <<

But then again... it's LookSmart... never forget. Keep your wallet TIGHTLY closed and expect the worst.

NFFC

6:09 pm on Apr 16, 2003 (gmt 0)




Thanks for the input Andre.

>robots.txt handling

It is critical that it is handled correctly; it's make or break for most new services. I would bump it up to the top of the priority list if I were you. ;)

jrobbio

8:16 am on Apr 21, 2003 (gmt 0)




Apparently the server was becoming so overrun that it wasn't finding time to crawl the robots.txt files, so by the end of the week they will have something like 25 servers instead of the 8 they currently have. It also says on the Grub site that they are aiming to check robots.txt roughly once a week.

I foresee one problem, though. As far as I can see, it won't be the client that checks robots.txt; it's the LookSmart machines that will do that. Since it's so new and relatively unknown, people won't realise it is adhering to robots.txt and will ban it in an instant.

Currently it is using the WiseNut database to give it a head start, but I don't think this procedure is documented well enough.

Rob
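To make jrobbio's point concrete: if robots.txt is checked on LookSmart's side, the filtering happens before URLs are packed into work units, so the volunteer's client never reads robots.txt itself. Below is a minimal Python sketch under that assumption. The `build_work_unit` function, the `grub-client` user-agent string and the 100-URL batch size are hypothetical, and robots.txt contents are passed in as text rather than fetched.

```python
import urllib.robotparser
from typing import Dict, List
from urllib.parse import urlsplit


def build_work_unit(candidates: List[str], robots_by_host: Dict[str, str],
                    agent: str = "grub-client", max_urls: int = 100) -> List[str]:
    """Hypothetical server-side step: keep only URLs each host's robots.txt permits.

    The client that receives the returned list never sees robots.txt;
    it simply fetches what it is given.
    """
    parsers: Dict[str, urllib.robotparser.RobotFileParser] = {}
    unit: List[str] = []
    for url in candidates:
        host = urlsplit(url).netloc
        if host not in parsers:
            parser = urllib.robotparser.RobotFileParser()
            # No robots.txt on record for this host: an empty file means no restrictions.
            parser.parse(robots_by_host.get(host, "").splitlines())
            parsers[host] = parser
        if parsers[host].can_fetch(agent, url):
            unit.append(url)
            if len(unit) >= max_urls:
                break
    return unit
```

The trade-off jrobbio raises follows directly from this split: a site owner only sees requests arriving from the volunteer's machine, not the server-side check that approved them.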

SinclairUser

10:57 pm on Apr 25, 2003 (gmt 0)




<apologies>I just read the TOS and it says you should not post bits of e-mails here. Sorry!</apologies>

I had no ulterior motive...