| 1:35 am on Nov 1, 2002 (gmt 0)|
I've read in numerous posts that people know where, when, how, and by whom their sites are being spidered. How does one find and monitor this information?
| 4:07 am on Nov 1, 2002 (gmt 0)|
By reading the server logs for the domain in question. Most decent web hosting packages provide raw logs (a line-by-line list of every document, image, script, etc. requested by a browser or spider) and also some sort of log analysis tool that takes those raw logs and crunches them into more usable forms, such as "List of browsers used by visitors", "List of referers", etc.
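To make the raw log format concrete, here is a minimal Python sketch that splits one log line into its fields. It assumes your host writes Apache's "combined" log format (check yours, since some servers use a different layout). The sample line, including its IP address, is made up for illustration, and the names LOG_PATTERN and parse_line are my own, not part of any log tool.

```python
import re

# Assumed layout: Apache "combined" log format. Adjust if your host differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Made-up sample line, for illustration only.
sample = ('192.0.2.10 - - [01/Nov/2002:04:07:12 +0000] '
          '"GET /robots.txt HTTP/1.0" 200 68 "-" '
          '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')

entry = parse_line(sample)
print(entry["ip"], entry["request"], entry["agent"])
```

The fields that matter most for spotting spiders are the IP address and the user agent (the last quoted field); a request for /robots.txt is also a strong hint that the visitor is a robot rather than a browser.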
Sites hosted on "freebie" accounts like GeoCities usually have no logs available. An alternative is to add a link on your page(s) to a remote "hit counting" and logging service. These hit counters will log basic summary information about visitors to your site. Some are free, and some charge a monthly fee. An example would be webstats (do a search for many more). There may be privacy issues involved for both you and your visitors - I would urge you to read the Terms of Service thoroughly before signing up with one of these services.
| 11:14 pm on Nov 1, 2002 (gmt 0)|
OK, I've got the server logs. I see the IPs but don't know how to tell who they belong to.
| 11:28 pm on Nov 1, 2002 (gmt 0)|
There's a good resource over on Search Engine World, WebmasterWorld's sister site:
Also, see the Spider Knowledge Base:
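As a rough sketch of what those resources help you do by hand: a spider can usually be recognised either from the user-agent string on the log line or from a reverse DNS lookup on its IP. The Python below assumes a small, illustrative table of user-agent substrings (it is not a complete or authoritative list, and user agents can be faked). The names SPIDER_HINTS, guess_spider, and reverse_dns are my own.

```python
import socket

# Illustrative substrings only: not a complete list, and user agents can be spoofed.
SPIDER_HINTS = {
    "googlebot": "Google",
    "slurp": "Inktomi",
    "scooter": "AltaVista",
    "fast-webcrawler": "FAST/AllTheWeb",
    "ia_archiver": "Alexa/Internet Archive",
}

def guess_spider(user_agent):
    """Return an owner label if the user agent contains a known hint, else None."""
    agent = user_agent.lower()
    for hint, owner in SPIDER_HINTS.items():
        if hint in agent:
            return owner
    return None

def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the host name registered for an IP, or None if the lookup fails.

    The resolver argument exists only so the lookup can be swapped out
    (for example when no network is available).
    """
    try:
        return resolver(ip)[0]
    except OSError:
        return None

print(guess_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
```

Calling reverse_dns with an IP taken from your own log needs network access and can be slow, so it is usually worth looking up each distinct IP only once. A host name that belongs to a search engine's domain is better evidence than the user agent alone.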
| 11:49 pm on Nov 1, 2002 (gmt 0)|
Great, I've got it. But is there a way to automate the search through these large daily log files so as to find and report on spider activity? The web stats package that is available to me (DeepMetrix LiveStats) seems to report on every other kind of stat imaginable, but not on crawling.