| 9:04 pm on Jul 27, 2003 (gmt 0)|
Urchin just counts them as ordinary visitors, including them in your hits and visits; at least the versions ISPs give you do. Some programs, such as WebTrends, have filters you can use to include or exclude traffic, which lets you exclude a host ID like Googlebot. If you want both views, you need a separate profile without the exclude filter.
| 6:12 am on Jul 28, 2003 (gmt 0)|
So how do I create a separate profile? Is it some kind of a configuration file? I'm using the ISP version of Urchin.
| 2:00 pm on Jul 28, 2003 (gmt 0)|
Urchin lets you set exclude filters too, though you need an admin account to do it. Maybe you could ask your host to set one up for you?
| 3:07 am on Aug 2, 2003 (gmt 0)|
I don't have administrative privileges. Can I not download the raw logs and run Analog on my client machine? Does Analog separate the wheat from the chaff (the spidering bloat)?
| 3:21 am on Aug 2, 2003 (gmt 0)|
Sure, as long as you can download your log files, you can run Analog on your local machine. Analog can definitely separate out the spiders, either by user-agent or by IP address. It does take a bit of jigging with the config file to get set up, though, and some regular minor maintenance to keep the spider list up to date.
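The workflow is just two steps: pull the raw log down, then point Analog at it. A minimal sketch, assuming your host exposes the logs over scp and that the path, hostname, and config filename shown here are placeholders for your own:

```
# fetch the raw access log from the host (path and hostname are illustrative)
scp user@example-host:/var/log/apache/access.log ./access.log

# run Analog against it; +g names a local config file
analog +gmysite.cfg ./access.log
```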
| 8:36 am on Aug 3, 2003 (gmt 0)|
Thanks, Peter. I assume I'll need to add the HOSTEXCLUDE command to the Analog config file. Now, where do I get a complete spider list (IP address and/or host name)?
| 5:51 pm on Aug 3, 2003 (gmt 0)|
Yeah, I use both BROWEXCLUDE and HOSTEXCLUDE, but I rely more on BROWEXCLUDE. (It looks to me like spoofed UAs make up only a tiny fraction of total traffic, although there are occasionally obvious spiders with faked or empty UAs that you have to filter out by IP address or domain name).
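In the Analog config file those two commands look like this. BROWEXCLUDE matches against the user-agent string and HOSTEXCLUDE against the hostname or IP; both accept * wildcards. The particular bots and hosts listed are just illustrative examples, not a complete or vetted list:

```
# Drop well-behaved crawlers by user-agent string
BROWEXCLUDE *Googlebot*
BROWEXCLUDE *Slurp*
BROWEXCLUDE *ia_archiver*

# Catch spiders with faked or empty UAs by host or IP instead
HOSTEXCLUDE *.googlebot.com
HOSTEXCLUDE *.inktomisearch.com
HOSTEXCLUDE 64.68.82.*
```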
I don't know where you can download a complete list--I have just built up my own over time. Usually I run Analog, look through the Browser Summary for UAs I don't like, copy them from the report and paste them right into the exclude list, then re-run Analog.
Over the last year, I've added about 200 entries to my exclude list (which, for convenience, is in a separate config file that I call from the main config file).
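Keeping the exclude list in its own file works via Analog's CONFIGFILE command, which pulls one config file into another. A sketch, with the filenames made up for illustration:

```
# analog.cfg -- main config
LOGFILE access.log
OUTFILE report.html

# pull in the spider exclude list maintained separately
CONFIGFILE spider-exclude.cfg
```

Keeping the two hundred-odd exclude lines out of the main config makes the frequent copy-paste updates less error-prone.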
I'm sure I'm missing some low volume bots that grab one or two pages here and there, but the big ones make up most of the spider traffic anyway. The little ones are just statistical noise. Server log file analysis is an inherently imperfect exercise anyway!
Hope this helps,
| 6:01 pm on Aug 3, 2003 (gmt 0)|
Oops, forgot to mention--if you also do FILEEXCLUDE for things like
etc., you'll filter out all that virus and scanning junk, which doesn't give you any useful information.
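The original post's list of file patterns didn't survive, but FILEEXCLUDE entries for the worm probes common at the time would look something like this (the patterns below are my examples, not the poster's list):

```
# Filter worm probes and vulnerability scans from the reports
FILEEXCLUDE /default.ida*
FILEEXCLUDE /scripts/*
FILEEXCLUDE */cmd.exe*
FILEEXCLUDE */root.exe*
FILEEXCLUDE /_vti_bin/*
```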
| 7:41 pm on Aug 5, 2003 (gmt 0)|
Thanks for all the info - Peter