If you want an easy-to-use program, there is one called Web Site Traffic Analyzer. It will set you back about $100. You just dump in as many log files as you want and it sorts everything for you.
Since most spiders will request a robots.txt file, I'll first click the bar next to the robots.txt listing, which gives me a list of all the machines that requested it. Then I can scroll down and see which pages each of those machines looked at by clicking the bars next to their names.
Then, for any oddballs that didn't request a robots.txt, you can click on any odd-looking user agents and see which machine name/IP they came from, and so on. It makes correlating the basic info very easy.
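You can do the same correlation straight from the raw logs if you'd rather not pay for a program. Here's a rough Python sketch of the idea, assuming a standard Apache/NCSA combined-format log; the access.log file name is just an example:

import re
from collections import defaultdict

# Minimal parser for the combined log format:
# host ident user [date] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "[^"]*"'
)

requests = defaultdict(list)   # host -> every path that host fetched
robots_hosts = set()           # hosts that asked for /robots.txt

with open("access.log") as log:          # file name is just an example
    for line in log:
        m = LINE.match(line)
        if not m:
            continue
        requests[m.group("host")].append(m.group("path"))
        if m.group("path").startswith("/robots.txt"):
            robots_hosts.add(m.group("host"))

# For each suspected spider, list everything else it fetched.
for host in sorted(robots_hosts):
    print(host)
    for path in requests[host]:
        print("   ", path)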
Those of you who haven't yet, also take a look at a post I made in [webmasterworld.com...] about why it's impossible to tell exactly how many hits you have and how many of them are spiders.
Very true... even log analyzers that distinguish between "hits" and "visits" are far from perfectly accurate.
However, since most major SEs fall under your description of well-behaved spiders, a good log analysis program can be quite useful in determining which major SEs are actually visiting your site, what they're fetching, etc...
And beyond the major SEs, I don't pay much mind to other spiders unless they're grossly misbehaving, so log analysis programs are fairly useful for my purposes.
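Part of why "visits" are fuzzy: the analyzer has to reconstruct them with a heuristic, typically grouping hits by host plus user agent and starting a new visit after roughly 30 idle minutes. A quick Python sketch of that convention (the sample data is made up):

from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=30)   # a common analyzer default, nothing more

def count_visits(hits):
    """hits: (host, user_agent, time) tuples, assumed sorted by time.
    A new 'visit' starts whenever a host+UA pair has been idle > 30 min."""
    last_seen = {}
    visits = 0
    for host, agent, when in hits:
        key = (host, agent)
        if key not in last_seen or when - last_seen[key] > VISIT_GAP:
            visits += 1
        last_seen[key] = when
    return visits

# Made-up sample: two hits close together, then one an hour later = 2 visits.
t0 = datetime(2002, 1, 1, 12, 0)
print(count_visits([
    ("1.2.3.4", "Mozilla/4.0", t0),
    ("1.2.3.4", "Mozilla/4.0", t0 + timedelta(minutes=5)),
    ("1.2.3.4", "Mozilla/4.0", t0 + timedelta(hours=1)),
]))   # -> 2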
The 'no referrer' log will log everything that does not have an HTTP_REFERER.
The 'human no referrer' is the same list but screened against my list of known spider IPs.
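If you want to build those two lists yourself, something like this Python sketch would do it against a combined-format log. The file name and the spider IPs shown are placeholders, not a real blocklist:

KNOWN_SPIDER_IPS = {"64.68.82.55", "216.239.46.20"}   # placeholder entries

no_referrer = []
human_no_referrer = []

with open("access.log") as log:          # file name is just an example
    for line in log:
        parts = line.rsplit('"', 4)      # ..." status bytes "referer" "agent"
        if len(parts) < 5:
            continue
        referer = parts[1]
        if referer in ("", "-"):         # no HTTP_REFERER sent
            no_referrer.append(line)
            if line.split(" ", 1)[0] not in KNOWN_SPIDER_IPS:
                human_no_referrer.append(line)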
Often I ftp up to my servers through a browser and view the logs that way. I'll use Netscape's Find feature to search for keywords - such as spider UAs, IP blocks, and REMOTE_HOST strings. I'll also visually go through the logs and look for patterns. This type of attention to detail is necessary if you are going to play the cloaking game.
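When the browser's Find feature gets tedious, the same keyword hunt is easy to script. A small Python sketch; the patterns listed are just examples of the kind of strings you'd watch for, not a definitive list:

import sys

# Example watch strings only - spider UA fragments, IP blocks, hostname bits.
PATTERNS = ["Googlebot", "Slurp", "Scooter", "64.68.82.", ".googlebot.com"]

def scan(path):
    with open(path) as log:
        for lineno, line in enumerate(log, 1):
            for pat in PATTERNS:
                if pat in line:
                    print("%d [%s]: %s" % (lineno, pat, line.rstrip()))
                    break

if __name__ == "__main__":
    scan(sys.argv[1])   # e.g. python scanlog.py access.log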
You can get it at
[awsd.com]
[am-soft.ru...]
I currently just use Notepad/Wordpad daily. At month's end, Analog is used to compile stats. It is quite configurable.
Some of the free Java counters can be used effectively to compile stats.
I have sitemeter on my pages, which offers online views.
This is a good starting point for spider ID.
Apparently bots are supposed to register here as part of some RFC compliance.
This page doesn't get updated very often and there are better resources.
[robotstxt.org...]
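Once you've pulled a list of robot names from there, cross-checking it against your own logs is straightforward. A Python sketch; "known_robots.txt" is hypothetical, meaning a local file with one user-agent substring per line that you've compiled by hand from the robotstxt.org database:

with open("known_robots.txt") as f:
    known = [name.strip() for name in f if name.strip()]

with open("access.log") as log:          # file name is just an example
    for line in log:
        fields = line.rsplit('"', 2)     # last quoted field is the UA
        if len(fields) < 3:
            continue
        agent = fields[1]
        if any(name.lower() in agent.lower() for name in known):
            print(agent)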