
A Sneaky Spider

what the heck is compete.com spidering for?

         

idiotgirl

12:24 pm on Sep 5, 2001 (gmt 0)

10+ Year Member Top Contributors Of The Month



I suddenly got some spider crawls from compete.com:

64.211.63.249 "larbin_2.2.0 (crawl@compete.com)"

I haven't seen these before. Has anyone checked out their web site? Does anyone know if they're just a generic crawler or if they must be requested (presumably by your competition) to crawl your site? I wonder what makes this crawler so super-special unless they're just gathering statistics for people who will pay for them.

Just wondering if anyone else gets hit by this one.

Idiotgirl

Will

1:11 pm on Sep 5, 2001 (gmt 0)



As far as I am aware, "larbin" is a generic crawler. A couple of months back it visited from "para.inria.fr".

A look at their web site reveals this to be the home of "The Moscova Project":

"interested in the development, the compilation and the semantics of concurrent and functional languages for distributed environments, with possible migrations."

Hmm.

Macguru

1:14 pm on Sep 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Idiotgirl,

larbin is a freely available spider that runs on Linux. It is not designed to index pages, but anyone can use it or configure it to do whatever they want. Are your pages XML?

[pauillac.inria.fr...]

Compete.com seems to be running some market study on your site's topic. Nothing to worry about IMHO.

idiotgirl

10:14 pm on Sep 5, 2001 (gmt 0)




Thanks! I did an IP check last night and it came back as idealab.com, which works with goto, petsmart, etc., collecting data and doing other marketing stuff. Went through compete.com's web site and was turned off. While the data collection for comparison is surely an upscale marketing ploy, I don't feel like offering any data to them. The two sites crawled (one is only data storage and scripts and one is an active site) are ranking in the top ten in most search engines, so I'm not concerned about catering to compete.com's crawler.

Verdict: banned it.
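For anyone wanting to do the same, here is a minimal sketch of the kind of check a ban comes down to, assuming your server or script can see the visitor's IP and user-agent string. The function name `is_blocked` and the two rule lists are hypothetical, made up for illustration; only the IP and the agent string come from the log line quoted above. It is not how any particular server implements bans.

```python
# Hypothetical ban check; names and rule lists are illustrative only.
# The IP and contact address are taken from the log line in this thread.
BLOCKED_IPS = {"64.211.63.249"}
BLOCKED_AGENT_SUBSTRINGS = ("crawl@compete.com",)


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request matches the ban list."""
    if remote_ip in BLOCKED_IPS:
        return True
    # Compare case-insensitively; a missing agent string is treated as empty.
    agent = (user_agent or "").lower()
    return any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS)


if __name__ == "__main__":
    print(is_blocked("64.211.63.249", "larbin_2.2.0 (crawl@compete.com)"))
    print(is_blocked("192.0.2.10", "larbin_2.2.0 (someone@example.org)"))
```

The sketch matches on the contact address rather than on "larbin" alone, since larbin is a generic crawler and a match on the bare name would also shut out anyone else running it.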

Thanks again,

Idiotgirl

Brett_Tabke

9:04 am on Sep 6, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Idealab started Goto and Bill Gross is still a major stockholder. Have you bought any Goto keywords?

idiotgirl

9:18 am on Sep 6, 2001 (gmt 0)




Dear Brett:

Nah. Didn't buy keywords and these aren't XML sites. One is almost totally database driven and not even really a public site. The other is pretty much vanilla html, uses frames (do I hear flames?), css, js, cgi, ssi, server-embedded commands, and has a lot of internal and external links and directories. The latter is the one that's generally in the top ten for the most commonly searched keywords for that particular online niche.

After reading about compete.com's lineup of services, I was curious whether they are simply randomly crawling the net to gather data so it's handy if and when a compete.com client requests that particular type of data OR if they only crawl sites that a client has asked to track. My guess is a random crawler.

However, if the crawls are done at the request of a 'competitor', I say ban the bot and let the competitor figure it out for themselves. It's not like the pages haven't already been indexed by other crawlers. The info's out there if they bother to look. It isn't my job to serve it up on a plate to them :)

Idiotgirl