
Why is it important to know which spiders are visiting?

         

dauction

4:23 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've never paid much attention to this. I do understand that you don't want them eating up bandwidth.

I also understand that you need some of them so you can be indexed, etc.

What kinds of damage can spiders inflict?

What are the main concerns?

SmallTime

6:05 pm on Oct 12, 2002 (gmt 0)

10+ Year Member



1. email address harvesting for spam purposes (very common)
2. site downloading for copying purposes (offline browsing is arguably a valid use)
3. a variety of non-browsing uses you may object to for political or practical reasons: anti-plagiarism by student bots, music or image harvesting, commercial intelligence databases, etc.

Dreamquick

8:18 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could also add:

4) spot when search engines start using new systems (e.g. if an existing bot comes around but with PDF appended to its UA, you have to figure they are planning to search PDFs sometime soon)

5) spot when totally new search engines are starting up

Either of these might require some work. With new features, you want to make the most of them; with new search engines, it's always useful to see whether it's worth trying to make use of them and how you rank in them (for all you know, they could be the next Google).
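As a rough illustration of points 4 and 5, here is a minimal Python sketch that lists user agents not seen before. It assumes the server writes Combined Log Format, where the user agent is the last quoted field; the function name and the bot names in the usage note are made up for the example.

```python
import re

# Assumption: Combined Log Format, which ends with "referer" "user-agent".
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def new_user_agents(log_lines, known_agents):
    """Return user agents found in log_lines that are not in known_agents,
    in the order they first appear."""
    seen = set(known_agents)
    fresh = []
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format; skip it
        agent = match.group(1)
        if agent not in seen:
            seen.add(agent)
            fresh.append(agent)
    return fresh
```

Fed yesterday's known agents and today's log, this would flag something like a hypothetical "ExampleBot/1.0-PDF" the first time it shows up, while ignoring the already familiar "ExampleBot/1.0".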

dcheney

8:46 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



6) help spot (and hopefully fix) brain-dead spiders [one current example: 13k requests, of those 193 were valid! (spider can't handle subdirectories)]
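A ratio like "193 valid out of 13k" can be pulled out of the logs with a short script. The sketch below is only one way to do it: it assumes Combined Log Format and treats any status code below 400 as a valid request, which is a simplification. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Combined Log Format:
# ... "request" status bytes "referer" "user-agent"
LINE_PATTERN = re.compile(r'"[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"\s*$')


def valid_request_counts(log_lines):
    """Map each user agent to a (valid_requests, total_requests) tuple.
    A request counts as valid when its status code is below 400."""
    totals = defaultdict(int)
    valid = defaultdict(int)
    for line in log_lines:
        match = LINE_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format; skip it
        status = int(match.group(1))
        agent = match.group(2)
        totals[agent] += 1
        if status < 400:
            valid[agent] += 1
    return {agent: (valid[agent], totals[agent]) for agent in totals}
```

An agent whose valid count is a tiny fraction of its total is a good candidate for a closer look.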

dauction

10:24 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see. Thanks, all.

JustTrying

10:33 pm on Oct 12, 2002 (gmt 0)

10+ Year Member



7) to gather important spider information for those involved in cloaking

carfac

3:42 am on Oct 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



8) Because we can!

dave

Sorry, not to be flip, but some of my sites are pretty high (1gig/day+) bandwidth, and I need to save where I can!
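To see which visitors cost the most bandwidth, the bytes column of the access log can be summed per user agent. This is a minimal sketch, again assuming Combined Log Format; the function name is made up for illustration, and a "-" in the bytes field is counted as zero.

```python
import re
from collections import Counter

# Assumption: Combined Log Format:
# ... "request" status bytes "referer" "user-agent"
BYTES_UA_PATTERN = re.compile(r'"[^"]*" \d{3} (\S+) "[^"]*" "([^"]*)"\s*$')


def bytes_by_agent(log_lines):
    """Return (user_agent, total_bytes) pairs, largest total first."""
    usage = Counter()
    for line in log_lines:
        match = BYTES_UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format; skip it
        size, agent = match.groups()
        if size.isdigit():  # the field is "-" when no body was sent
            usage[agent] += int(size)
    return usage.most_common()
```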