Forum Moderators: DixonJones

tracking spiders or bots

how to analyze bot crawls through a web site

canadaeh

2:52 pm on Feb 24, 2003 (gmt 0)

Hi, does anyone know of a method of tracking Googlebot's crawling of a Web site? I want to do it through log file analysis, and I want to understand why it requests the pages it does. Is Googlebot a focused crawler?

lorax

3:08 pm on Feb 24, 2003 (gmt 0)

canadaeh,

>> method of tracking Googlebot's crawling of a Web site

A variety of log analysis tools are available. I use NetTracker because it lets me follow a particular user agent (a bot sends a user-agent string just like a browser does).
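For anyone without a tool like NetTracker, the idea of following one user agent through the logs can be sketched in a few lines of Python. This is only a minimal sketch, assuming the server writes the Apache combined log format. The function name `bot_requests`, the sample log lines, and the IP addresses are made up for illustration, and matching on the user-agent string alone does not prove that a request really came from Google.

```python
import re
from datetime import datetime

# Assumed layout (Apache combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_requests(lines, agent_substring="Googlebot"):
    """Return (time, host, path, status) tuples for one user agent, oldest first."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        if agent_substring.lower() not in m.group("agent").lower():
            continue  # some other browser or bot
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits.append((when, m.group("host"), m.group("path"), int(m.group("status"))))
    hits.sort(key=lambda hit: hit[0])  # the crawl path in time order
    return hits


# Hypothetical sample lines, deliberately out of order.
SAMPLE_LOG = [
    '192.0.2.10 - - [24/Feb/2003:14:52:07 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [24/Feb/2003:14:52:03 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [24/Feb/2003:14:52:01 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for when, host, path, status in bot_requests(SAMPLE_LOG):
    print(when.isoformat(), host, status, path)
```

Reading the printed lines top to bottom gives the order in which the bot requested pages. Grouping them by host or by day is one way to separate individual visits.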

>> Is Googlebot a focused crawler?

Not sure what you mean by this. Googlebot crawls from a variety of IP addresses, and there are two versions of the bot: Fresh and DeepCrawler. A search here on WebmasterWorld should turn up more than you ever wanted to know about Google - and read this [searchengineworld.com]. ;)

canadaeh

3:53 pm on Feb 24, 2003 (gmt 0)

Thanks, lorax. I use WebTrends, but it doesn't show the crawl path very well. The filters aren't that flexible.

A focused crawl is one where the robot enters a site looking for pages on a particular theme. Let's say the inbound link's anchor text says "log analyzers." A focused crawler would only follow links that are relevant to those keywords, although each search engine may have a different "focus" for a visit. I wonder if Freshbot is a focused crawler and what its mission is each time out.
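To make the definition concrete, here is a toy Python sketch of the link-selection step such a crawler might use. It only illustrates the concept as described above and says nothing about how Googlebot or Freshbot actually choose links. The function `relevant_links`, the word-overlap test, and the sample links are all hypothetical.

```python
def relevant_links(links, topic_keywords):
    """Keep only links whose anchor text shares at least one word with the topic."""
    topic = {word.lower() for word in topic_keywords}
    kept = []
    for anchor_text, url in links:
        # Split the anchor text into lowercase words, dropping simple punctuation.
        words = {word.strip('.,!?"').lower() for word in anchor_text.split()}
        if words & topic:  # any overlap counts as "on theme" in this sketch
            kept.append(url)
    return kept


# Hypothetical links found on a page, as (anchor text, URL) pairs.
links = [
    ("Compare log analyzers", "/tools/log-analyzers.html"),
    ("Contact us", "/contact.html"),
    ("Reading a server log", "/guides/server-log.html"),
]

print(relevant_links(links, ["log", "analyzers"]))
```

With these sample links the crawler would follow the two log-related URLs and skip the contact page. A real focused crawler would need a far better relevance test than simple word overlap.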

According to some postings I remember seeing last year, some Webmasters seemed to know which pages the crawlers were picking up. So I'm wondering if any of them have insight into the robot's purpose for a specific visit and which pages get picked up.

lorax

5:28 pm on Feb 24, 2003 (gmt 0)

Ah, now I see what you mean. While I'm not sure, it would make sense that Google is somewhat focused when it comes to visit. Why else would it follow some links and ignore others? How can you find out what it's after? That's the 64 million dollar question. Wish I had the answer.