Tracking googlebot, slurp, etc.

What are the best free programs?

         

hdpt00

9:24 pm on Jun 27, 2004 (gmt 0)



I always hear people complaining that Googlebot isn't going deep enough, etc. Are there any free programs that read raw stats and will tell you how deep Googlebot goes? Currently I use AWStats and Analog, and neither of them seems to tell me.

Thanks!

nalin

2:53 pm on Jun 28, 2004 (gmt 0)



I use AWStats (for a monthly cumulative count) as well as a (Linux or Unix based) cron job and custom script to check daily spidering (it reports the URLs of the pages spidered, the total number of pages, and the total number of unique pages).

The script (googlebot_totals.sh user=root, perm=700) is as follows:


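#!/bin/bash
# Arguments: $1 = site hostname, $2 = day, $3 = month (e.g. Jun), $4 = year
LOG_FILE="/var/log/httpd/$1/access_log"
SITE_URL="http://$1"
# Matches the log date (e.g. 27/Jun/2004) followed later on the line by the bot's user agent.
PATTERN="$2/$3/$4.*googlebot"
echo "GOOGLEBOT REPORT FOR $2/$3/$4"
echo "UNIQUE FILES VIEWED"
# -i so the match does not depend on the case of "Googlebot" in the user agent.
# sed: replace everything up to "GET " with the site URL, then drop " HTTP..." to end of line.
grep -i "$PATTERN" "$LOG_FILE" | sed -e "s|^.*GET |$SITE_URL|" -e 's| HTTP.*$||' | sort | uniq
echo -n "TOTAL HITS:"
grep -i "$PATTERN" "$LOG_FILE" | uniq | wc -l
echo -n "TOTAL UNIQUE HITS:"
# Count distinct request paths only, so referrer and user agent do not affect the total.
grep -i "$PATTERN" "$LOG_FILE" | sed -e 's|^.*GET ||' -e 's| HTTP.*$||' | sort | uniq | wc -l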
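
To see what the two sed substitutions do, here is a small sketch that runs them on a single made-up log line. The log line, the IP address and the extract_url helper are my own assumptions for illustration (Apache "combined" format assumed); they are not part of the script above.

```shell
#!/bin/bash
# Hypothetical log line in Apache "combined" format; not taken from a real log.
sample='66.249.64.1 - - [27/Jun/2004:21:24:00 +0000] "GET /widgets/blue.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"'

# extract_url SITE: replace everything up to "GET " with the site prefix,
# then drop everything from " HTTP" to the end of the line.
extract_url() {
    sed -e "s|^.*GET |http://$1|" -e 's| HTTP.*$||'
}

url=$(printf '%s\n' "$sample" | extract_url www.site.tld)
echo "$url"
```

What is left is just the full URL of the page that was requested, which is what the "UNIQUE FILES VIEWED" list is built from.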

The script is called as follows (daily count):
googlebot_cron.sh www.site.tld `date +"%d %b %Y"`

(monthly count):
googlebot_cron.sh www.site.tld ".." `date +"%b %Y"`

You could also extend it to yearly counts, days ending in 7, etc. by making liberal use of regular expressions.
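
As a rough illustration of that idea, the sketch below tries two such patterns against a few made-up log dates. The dates and the count_matches helper are hypothetical; the only thing taken from the script is that day, month and year are joined as DAY/MONTH/YEAR and handed to grep as a regular expression.

```shell
#!/bin/bash
# Hypothetical dates as they would appear in an Apache access log.
dates='[07/Jun/2004:01:00:00 +0000]
[17/Jun/2004:02:00:00 +0000]
[18/Jun/2004:03:00:00 +0000]
[27/May/2003:04:00:00 +0000]'

# count_matches DAY MONTH YEAR: count lines whose date matches the
# pattern DAY/MONTH/YEAR, where each part may be a regular expression.
count_matches() {
    printf '%s\n' "$dates" | grep -c "$1/$2/$3"
}

days_ending_in_7=$(count_matches '.7' 'Jun' '2004')   # matches 07 and 17 June
whole_year=$(count_matches '..' '...' '2004')         # any day, any month of 2004
echo "days ending in 7: $days_ending_in_7, whole year: $whole_year"
```

Quote the patterns when passing them on the command line so the shell does not try to expand them.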

The cron job should be something like the following (in a crontab, % must be escaped as \%, otherwise cron treats it as a newline):
[code]59 23 * * * /path/to/googlebot_cron.sh www.site.tld `date +"\%d \%b \%Y"`[/code]

The use of date (combined with log rotation) gives you a very narrow window, which keeps the results accurate.
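
On the original question of how deep the bot goes: the report above lists URLs but does not measure depth. One possible proxy is directory depth, i.e. how many "/" separators a requested path contains. That is an assumption on my part, since a log cannot show how many clicks a page is from the home page. A minimal sketch, with hypothetical paths and a hypothetical depth_report helper:

```shell
#!/bin/bash
# Hypothetical request paths, as they would look once extracted from the log.
paths='/
/widgets/
/widgets/blue.html
/widgets/blue/large.html
/about.html'

# Depth here is the number of "/" separators in the path, so "/" and
# "/about.html" are depth 1 and "/widgets/blue/large.html" is depth 3.
# Prints one line per depth: "<depth> <number of requests>".
depth_report() {
    awk -F/ '{ count[NF - 1]++ } END { for (d in count) print d, count[d] }' | sort -n
}

report=$(printf '%s\n' "$paths" | depth_report)
echo "$report"
```

Feeding it the request paths from a day's Googlebot hits would give a quick count of requests per directory level.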

Licensing Information:
Provided as is with no warranty. Commercial modification and redistribution are explicitly forbidden; please sticky me if you wish to use or redistribute this code commercially and I will properly release it under the GPL.