Forum Moderators: open
I'm still in my rookie year of SEO and this is the first time I've encountered a need to learn more about googlebot and his/her behaviour. In the past, when I created a site, I would just link to it from all the pages I own (~3500 non-dynamic pages ranging in pagerank from 3-7). I don't run a linkfarm, I just build sites all within a niche so the new sites I create are purposefully on topic. Google almost immediately picked up and indexed any new sites so I've never thought much about it, in fact I haven't yet considered using a robots.txt either.
I've started another thread here that is similar to this one which gave me some really good suggestions. However I think the point I'm looking for in this thread is significantly off topic of the last so I wanted to start a new one fresh.
I currently use 3 different types of software to measure my site's statistics: (1) Awstats, (2) Webalizer, and (3) Analog stats. Since my site's release 3 weeks ago, I've used these traffic measures to determine that googlebot (I do not know which IP) has visited my site. Beyond that I know very little. Namely, all awstats tells me is this:
Googlebot (Google) 70 476.01 KB 25 Sep 2003 - 16:01
I also know that various subdomain googlebots have also visited my site over the last few weeks:
crawler8.googlebot.com
crawler9.googlebot.com
crawler14.googlebot.com
etc.
What I do not know, but have been asked about, is: what are these bots looking at? When they reach my page, are they indexing pages or are they leaving? I understand that the deep google bot seems to have been merged with freshbot, so I can't check that way anymore.
How can I learn more about googlebot's behaviour while it is on my site? This will give me a better ability to troubleshoot through some of the suggestions I've read here on WebmasterWorld.
Thanks,
Chris
64.68.82.167 - - [26/Sep/2003:09:49:42 +1000] "GET /robots.txt HTTP/1.0" 404 288 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
This way you can track it with your own eyes.
Ask your provider whether you have access to the original log files.
Scary stuff too, some of the requests actually carry user login IDs.
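To see what Googlebot is doing in those raw logs, a couple of standard Unix commands go a long way. This is just a sketch, assuming Apache combined log format; "access_log" stands in for whatever file name your provider uses, and the sample lines are made up for illustration:

```shell
# Made-up sample lines in Apache combined log format, for illustration only
cat > access_log <<'EOF'
64.68.82.167 - - [26/Sep/2003:09:49:42 +1000] "GET /robots.txt HTTP/1.0" 404 288 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.167 - - [26/Sep/2003:09:50:01 +1000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.1 - - [26/Sep/2003:10:00:00 +1000] "GET /index.html HTTP/1.0" 200 4096 "-" "Mozilla/4.0"
EOF

# Which URLs has Googlebot requested, and how often?
# $7 is the requested path in the combined log format.
grep "Googlebot" access_log | awk '{print $7}' | sort | uniq -c | sort -rn
```

That answers "are they indexing pages or leaving" fairly directly: you see every page the bot fetched, including whether it asked for robots.txt first.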
I can see why a separate program would be needed, and fast; even with hardly any traffic to the site, the logs are already hard to manage. Can you recommend some software tailored specifically to handling log files? Either something that parses the file directly on the server, or alternatively something that opens a raw access file on my home machine. I'll take a quick peek on google to see what's available as well.
Claus, yes I got it open btw in excel but it's a bit of a mess :P
If you have lots of subdomains (virtual hosts) in your logs, it helps to grep them too:
cat access_log | grep "subdomain" | grep "Googlebot"
For the overview I use webalizer.
Though a tool would be nice (free, of course :) that can track individual IPs and their behavior with a mouse click, so you could trace their path through your site :) [based on log files]
This way you wouldn't have to do the grepping.
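For tracing a single visitor's path, you don't strictly need a dedicated tool; the same grep/awk approach works. A rough sketch, again assuming Apache combined log format, where the file name and the IP are just examples:

```shell
# Made-up sample lines, Apache combined log format
cat > access_log <<'EOF'
64.68.82.167 - - [26/Sep/2003:09:49:42 +1000] "GET /robots.txt HTTP/1.0" 404 288 "-" "Googlebot/2.1"
64.68.82.167 - - [26/Sep/2003:09:50:01 +1000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1"
192.0.2.1 - - [26/Sep/2003:10:00:00 +1000] "GET /about.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0"
EOF

# Every request from one IP, in the order it happened: timestamp + URL
grep '^64\.68\.82\.167 ' access_log | awk '{print $4, $7}'
```

Since log files are written in time order, the output is already that visitor's path through the site, one request per line.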
It's easier to do than to describe. I think you'll like it.
Barry Welford
Just remember to define what you mean by "pageview" and filter out the jpgs, gifs, non-content scripts, js, ico, etc. etc., and it becomes much more manageable.
Also, Analog is infinitely flexible if you take the time to read the manual. You can create separate reports, for example, which just count googlebot variant visits.
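The "filter out the jpgs and gifs" step can be sketched in one pipeline. The extension list and file name below are assumptions you'd adjust for your own site:

```shell
# Made-up sample lines, Apache combined log format
cat > access_log <<'EOF'
192.0.2.1 - - [26/Sep/2003:10:00:00 +1000] "GET /index.html HTTP/1.0" 200 4096 "-" "Mozilla/4.0"
192.0.2.1 - - [26/Sep/2003:10:00:01 +1000] "GET /logo.gif HTTP/1.0" 200 1024 "-" "Mozilla/4.0"
192.0.2.1 - - [26/Sep/2003:10:00:01 +1000] "GET /style.css HTTP/1.0" 200 512 "-" "Mozilla/4.0"
EOF

# Drop images, scripts, stylesheets and favicons, then count what's left.
# -v inverts the match, -i ignores case, -E enables the alternation.
grep -viE '\.(jpe?g|gif|png|css|js|ico)[ ?]' access_log | wc -l
```

Only the HTML request survives the filter here, which is much closer to what most people mean by a "pageview".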
Personally, I would recommend that you look into the Excel feature "Pivot Tables" - not just for log files, but for all kinds of Excel-based analysis. It's a very valuable tool, and it can collect and aggregate data from more than one sheet (giving you more than 65,000 lines to work with).
If/when your files get longer, you will want to read this thread:
(a) Importing raw logs into Excel
[webmasterworld.com...]
Once in a while a (usually long) thread is started with suggestions on software, and in between those, a few good ones also show up. Here are 10 relevant ones I could find within the last 20 or so pages (longer threads in bold):
(1) Poll: What web stats service do you use?
[webmasterworld.com...]
(2) Web statistics
[webmasterworld.com...]
(3) Reading Access Logs
[webmasterworld.com...]
(4) How to open a 500MB log file?
[webmasterworld.com...]
(5) Log Analysis vs. Outsourced (3rd Party)
[webmasterworld.com...]
(6) Log Analysis
[webmasterworld.com...]
(7) What is the best choice today regarding log analysis software?
[webmasterworld.com...]
(8) Free Log Analyzer/Stats Program
[webmasterworld.com...]
(9) Best text editor for large log files
[webmasterworld.com...]
(10) A satisfactory stats package
[webmasterworld.com...]
Link #1 gets into a discussion around pages 7-8, and in post #113 on page 8 I've explained my own take on log files. Essentially, they are needed to study spider/bot behavior, but they are not very good for studying human behavior.
/claus