Monitoring spiders on my site

         

Johny Favourite

11:06 am on Feb 23, 2004 (gmt 0)

I've just spent the last 2 hours downloading log files and reading here about the above, yet I'm still in the dark about how I can monitor spiders.

I've downloaded WebLog Expert Lite, Analog and AWStats.

All of the above seem to tell me exactly what live stats does and that's all.

I basically want to know whether Google is spidering my site, and whether it's going past the index page, as I previously had a java/form navigation and now have a simple link navigation.

Can anyone please point me in the right direction?

Many Thanks!

bakedjake

11:42 pm on Feb 23, 2004 (gmt 0)

Easiest Way:

Import your log into Excel. Use the Text to Columns feature to split each line into fields. Sort by IP address. Filter by a list of known bot IP addresses (try iplists.com).

You now have a list of pages, complete with the date and time that a particular bot hit each one.
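
For anyone without Excel, the same filter-by-IP idea can be roughed out on the command line. This is only a sketch under a few assumptions: the log is in Apache common/combined format, the bot list is a plain text file with one exact IP address per line (no ranges), and the function name hits_from_ips is made up for illustration.

```shell
# Hypothetical helper: print IP, timestamp and requested path for every
# log line whose client IP appears in a list of known bot addresses.
# Assumes one exact IP per line in the list file (no CIDR ranges).
hits_from_ips() {
    iplist=$1
    logfile=$2
    awk '
        # First file: remember every non-empty address in the list.
        NR == FNR { if ($1 != "") known[$1] = 1; next }
        # Second file: whitespace-split fields are
        # $1 = client IP, $4 = "[date:time", $7 = requested path.
        ($1 in known) { print $1, substr($4, 2), $7 }
    ' "$iplist" "$logfile"
}
```

Calling it as `hits_from_ips bot_ips.txt yoursite-combined_log` (both file names are placeholders) gives one line per bot hit, which can then be sorted or grepped further.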

dcrombie

12:06 pm on Feb 24, 2004 (gmt 0)



If you've got command-line access and a combined log file then you can use:

$ awk -F'"' '($6 ~ /Googlebot/){print $2}' yoursite-combined_log
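
To see whether the spider is getting past the index page, it may help to collapse that output into a list of distinct paths with hit counts. A minimal sketch, assuming the same combined log format; the function name googlebot_paths is just an illustration:

```shell
# Hypothetical helper: count Googlebot requests per path, busiest first.
# Splitting on double quotes makes $2 the request line and $6 the user agent.
googlebot_paths() {
    awk -F'"' '$6 ~ /Googlebot/ {
        split($2, req, " ")   # request line is: METHOD PATH PROTOCOL
        print req[2]          # keep only the path
    }' "$1" | sort | uniq -c | sort -rn
}
```

If `googlebot_paths yoursite-combined_log` only ever shows `/` and `/robots.txt`, the spider is probably not following the navigation links.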

flashfan

2:39 pm on Feb 25, 2004 (gmt 0)

Will the command work for this?

[i]
64.68.87.41 - - [22/Feb/2004:16:19:18 -0600] "GET /robots.txt HTTP/1.0" 200 78 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)" "-"
[/i]

dcrombie

2:58 pm on Feb 25, 2004 (gmt 0)



No, it only matches lines where the string "Googlebot" appears in the user-agent field. The match is case-sensitive, so the lowercase "googlebot.com" in that line doesn't count. If you want to detect both Google crawlers then you can reduce the string to "Google". If you only want to track the AdSense crawler then change it to "Mediapartners" or similar.

The AdSense crawler will go directly to any page where you are serving ads. The Googlebot spider will generally start from the home page and work inwards. It's not clear if/how they share their data but I have my suspicions ;)
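
The string choices above can be combined to tally the two crawlers separately in one pass. This is a rough sketch, assuming a combined log file; the function name google_crawler_summary is invented for the example.

```shell
# Hypothetical helper: count hits from the two Google crawlers separately.
# Splitting on double quotes puts the user agent in $6.
google_crawler_summary() {
    awk -F'"' '
        $6 ~ /Mediapartners-Google/ { adsense++; next }   # AdSense crawler
        $6 ~ /Googlebot/            { search++ }          # search spider
        END {
            printf "Googlebot: %d\n", search
            printf "Mediapartners-Google: %d\n", adsense
        }
    ' "$1"
}
```

Run as `google_crawler_summary yoursite-combined_log`; a zero on the Googlebot line means only the ad crawler has been visiting.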

jchance

4:58 pm on Mar 3, 2004 (gmt 0)

Is there no free software that will automate this?

All I want to know is when the spiders came and which files they spidered.
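
Short of a packaged tool, a small shell function may get close to that "when and which files" report. This is a sketch rather than a finished program: it assumes a combined log file, the function name spider_visits is made up, and the default list of user-agent substrings (Googlebot, Slurp, msnbot) is only a guess at which spiders matter to you.

```shell
# Hypothetical helper: print "timestamp <TAB> spider <TAB> path" for every
# request whose user agent matches one of the given substrings.
spider_visits() {
    logfile=$1
    # Assumed default list of spider names; pass your own as a second
    # argument, separated by "|".
    pattern=${2:-'Googlebot|Slurp|msnbot'}
    awk -F'"' -v pat="$pattern" '
        match($6, pat) {
            bot = substr($6, RSTART, RLENGTH)     # which name matched
            start = index($1, "[")                # timestamp sits in [...]
            stop = index($1, "]")
            when = substr($1, start + 1, stop - start - 1)
            split($2, req, " ")                   # METHOD PATH PROTOCOL
            print when "\t" bot "\t" req[2]
        }
    ' "$logfile"
}
```

Something like `spider_visits yoursite-combined_log` could be run from cron and the output mailed or saved, depending on what the host allows.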

dkin

2:01 am on Mar 4, 2004 (gmt 0)

Try using this script [hotscripts.com...]
Any time Google comes to your site, it will email you telling you when it came and which page it crawled. Hope it helps you.