Well, I'm a newbie on the forum, too, but I guess I'll take a crack at this. Basically, a spider is, in most cases, identified by its User Agent, IP address, and/or host name. For example:
Inktomi sends spiders out under different host names, such as si3000.inktomi.com, si4001.inktomi.com, and many others. Of course, each of these can be associated with its IP address.
The User Agent for Inktomi is usually some form of Slurp, such as (Slurp/si; email@example.com; [inktomi.com...]
or Slurp/2.0-Owl_Weekly_Temp, Slurp/3.0-c, and so on.
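To make that concrete, here's a rough Python sketch that checks a host name and User Agent against the Inktomi patterns above. The function name `looks_like_inktomi` and the matching rules (host name ending in `.inktomi.com`, User Agent containing `Slurp`) are my own assumptions for illustration, not an official list. A User Agent can be faked, so treat a match as a hint rather than proof.

```python
def looks_like_inktomi(host_name, user_agent):
    """Return True if the host name or the User Agent matches the
    Inktomi patterns described in this post (illustrative rules only)."""
    host = (host_name or "").lower().rstrip(".")
    agent = (user_agent or "").lower()
    # Match inktomi.com itself or any host under it, e.g. si3000.inktomi.com
    host_match = host == "inktomi.com" or host.endswith(".inktomi.com")
    # Slurp/si, Slurp/2.0-Owl_Weekly_Temp, Slurp/3.0-c all contain "slurp"
    agent_match = "slurp" in agent
    return host_match or agent_match


if __name__ == "__main__":
    print(looks_like_inktomi("si3000.inktomi.com", "Slurp/3.0-c"))
    print(looks_like_inktomi("dialup-12.example.net", "Mozilla/4.0"))
```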
The trick is to find this stuff in your access logs. If you're running a standard Linux/Apache setup, your logs could be found in /www/logs/your-domain-access-log. The access log can be rather large and hard to deal with. Instead of opening the whole thing, I use a tail command sized to how many hits I want to look at, or grep if I know who I'm looking for, and redirect the output to a text file for later viewing.
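If you'd rather script it than type tail and grep by hand, the same idea can be sketched in Python. This is only a rough equivalent under my own assumptions: the function names are made up for this post, and the match is a plain case-insensitive substring test rather than a real regular expression.

```python
from collections import deque


def last_hits(path, count):
    """Return the last `count` lines of the log, roughly like tail."""
    with open(path, "r", errors="replace") as log:
        # A bounded deque keeps only the newest `count` lines in memory.
        return list(deque(log, maxlen=count))


def matching_hits(path, needle):
    """Return every line containing `needle`, ignoring case,
    roughly like a case-insensitive grep."""
    needle = needle.lower()
    with open(path, "r", errors="replace") as log:
        return [line for line in log if needle in line.lower()]


def save_hits(lines, out_path):
    """Write the selected lines to a text file for later viewing."""
    with open(out_path, "w") as out:
        out.writelines(lines)
```

For example, `save_hits(matching_hits("/www/logs/your-domain-access-log", "Slurp"), "slurp-hits.txt")` would collect every Slurp hit into one file, where `slurp-hits.txt` is just a placeholder name.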
But even easier than that is to use a tracking system that gets the header information from your environment table and writes it to a log. That's the same thing that's happening in your access log, but the result is more manageable. You have much better control over log files created by a tracking system like Axs than you would over your server's log files. It's best not to go pruning the server logs anyway, and to leave them to be tarred up by the system. With a tracking system I can decide which pages I want to track, unlike my server's access log, which tracks every hit.
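Here's a minimal sketch of what such a tracker does, written as a Python CGI-style script. It is not how Axs itself works; the tracked page list, the log file name, and the function `record_hit` are all hypothetical. It assumes the server puts `REMOTE_ADDR`, `REMOTE_HOST`, `HTTP_USER_AGENT`, and `REQUEST_URI` in the environment; `REMOTE_HOST` is often missing unless the server is set to do host name lookups.

```python
import os
import time

# Hypothetical list of pages to track; every other page is ignored.
TRACKED_PAGES = {"/index.html", "/products.html"}


def record_hit(environ, log_path, tracked_pages=TRACKED_PAGES):
    """Append one tab-separated line for a tracked page and return True.
    Return False and write nothing if the page is not tracked."""
    page = environ.get("REQUEST_URI", "")
    if page not in tracked_pages:
        return False
    fields = [
        time.strftime("%Y-%m-%d %H:%M:%S"),
        environ.get("REMOTE_ADDR", "-"),
        environ.get("REMOTE_HOST", "-"),
        environ.get("HTTP_USER_AGENT", "-"),
        page,
    ]
    # Strip tabs and line breaks so each hit stays on a single line.
    clean = [
        f.replace("\t", " ").replace("\r", " ").replace("\n", " ")
        for f in fields
    ]
    with open(log_path, "a") as log:
        log.write("\t".join(clean) + "\n")
    return True


if __name__ == "__main__":
    # Under CGI, the web server supplies these values in os.environ.
    record_hit(os.environ, "tracking.log")
```

Because the script decides what to write, the log only grows for the pages you care about, which is the manageability point above.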
Keep visiting the forum here and you'll learn much. There's a great site search feature at [searchengineworld.com...]
some nice tools at [searchengineworld.com...] and a lot of spider information at [searchengineworld.com...] You'll also want to get something for doing host name lookups, IP blocks, etc.
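For the host name lookups, a small Python sketch like the one below can stand in until you find a tool you like. The function name `host_name_for` and its `resolver` parameter are my own invention; by default it uses the standard library's `socket.gethostbyaddr`, and it simply returns None when the address has no reverse entry.

```python
import socket


def host_name_for(ip_address, resolver=socket.gethostbyaddr):
    """Return the host name registered for an IP address,
    or None if the reverse lookup fails."""
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        host_name, _aliases, _addresses = resolver(ip_address)
    except OSError:
        # Covers socket.herror and socket.gaierror
        return None
    return host_name
```

Run it over the IP addresses from your access log and you can see which hits resolve to a spider's host names.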
Hope this helps.