How a bot visits

Any clues?

DerekH

7:07 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got access to Webalizer logs but not raw logs.
I can see limited information. For example, I can see the 20 most active IP addresses that visit me. These are always assorted bots rather than human visitors, which is not surprising.

Webalizer gives me summary stats including "visits" (a count of the number of times an IP address carries out one or more page fetches in a time window).

I also get "entry" pages - the first page in a visit. These are quite interesting...
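For anyone who does have raw logs, the "visit" and "entry page" bookkeeping described above can be sketched roughly as follows. This is only an illustration, not Webalizer's actual code: the 30-minute idle timeout, the `(timestamp, ip, url)` tuple layout, and the function name `entry_pages` are all assumptions made for the example.

```python
from collections import Counter

VISIT_TIMEOUT = 30 * 60  # seconds; an assumed idle window, not taken from the thread


def entry_pages(hits, timeout=VISIT_TIMEOUT):
    """Count entry pages from (timestamp_seconds, ip, url) page hits.

    A hit starts a new visit when its IP has not been seen before,
    or was last seen more than `timeout` seconds earlier.
    """
    last_seen = {}       # ip -> timestamp of that IP's most recent hit
    entries = Counter()  # url -> number of visits that started on it
    for ts, ip, url in sorted(hits):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > timeout:
            entries[url] += 1
        last_seen[ip] = ts
    return entries


# Hypothetical sample data: two IPs, one of which returns after a long gap.
hits = [
    (0, "10.0.0.1", "/index.html"),
    (60, "10.0.0.1", "/about.html"),
    (4000, "10.0.0.1", "/products.html"),
    (100, "10.0.0.2", "/products.html"),
]
print(entry_pages(hits))
```

With this sample data, `/products.html` is counted as an entry page twice and `/index.html` once. Note that the referrer plays no part in this bookkeeping: only the IP address and the gap between hits decide where a visit starts.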

I'm trying to reconcile how human visitors come into the site (front door or side door!) and the logs I have are necessarily contaminated with bot droppings.

In an effort to see behind the curtain, can anyone give me a bit more info about how a bot appears in the log...
When the bot is following links, do second and subsequent pages in a visit get requested with a referrer of the URL of the first page taken, or is the referrer always the bot?

It will help me work out whether the "entry" pages in Webalizer are as informative as I hope, or as useless as a chocolate teapot.

Thanks
DerekH

ncw164x

7:31 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a list of a few user agents I have seen today. Most are friendly, but a few just steal bandwidth:

Googlebot/2.1 (+http*://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http*://help.yahoo.com/help/us/ysearch/slurp)
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
msnbot/0.3 (+http*://search.msn.com/msnbot.htm)
appie 1.1 (www.walhello.com)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
Seekbot/1.0 (http*://www.seekbot.net/bot.html) HTTPFetcher/0.3
http*://www.almaden.ibm.com/cs/crawler [bc20]
ia_archiver
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040817
psbot/0.1 (+http*://www.picsearch.com/bot.html)
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker (wn.dlc@looksmart.net; http*://www.WISEnutbot.com)
ConveraCrawler/0.4
TurnitinBot/2.0 http*://www.turnitin.com/robot/crawlerinfo.html
ichiro/1.0 (ichiro@nttr.co.jp)
MJ12bot/v0.8.3 (http*://www.majestic12.co.uk/projects/dsearch/mj12bot.php?V=v0.8.3&NID=B0E44C4EE98B33C4&MID=EE1DD60ABC2AE863&BID=0D2F47BADD52ECA93161EE0C2C18F4B4
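If you can get at the user-agent strings, a crude way to separate entries like the ones above from human traffic is a substring match against known bot names. The sketch below is a minimal illustration: the token list is hand-picked from the list above, and the name `looks_like_bot` is made up for the example.

```python
# Lower-case fragments taken from the user agents listed above (not exhaustive).
BOT_TOKENS = (
    "googlebot", "slurp", "msnbot", "teoma", "ia_archiver",
    "psbot", "zyborg", "turnitinbot", "mj12bot",
)


def looks_like_bot(user_agent):
    """Return True if the user-agent string contains a known bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


print(looks_like_bot("Googlebot/2.1 (+http://www.google.com/bot.html)"))
print(looks_like_bot(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040817"
))
```

The second string is the browser-style user agent from the list above. A token match cannot flag it, so a visitor that identifies itself as a browser has to be spotted by its behaviour instead.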

DerekH

7:48 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, but that wasn't really what I was looking for.
I'll try to say it a different way...

With Webalizer I can't do much tracking of visitors round the site, but I do get a very large list of the pages where a visitor entered the site. Each visit (each collection of pages taken in a contiguous band of time) has an entry page.

My question - when a bot collects a number of pages by following links, does the referrer on the second page it takes look like someone's clicked on the first page it took? Or does it look like a brand new access to the site?

Is that any clearer? I hate email sometimes <grin>
DerekH

ncw164x

8:01 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It looks like a brand new access to the site each time. I take it you are trying to see whether it goes to page 1, then to page 2, and so on.

No, it does not work that way. As an example, download "xenu" and watch how the program traverses your site. It may fetch pages in a certain order until it gets to a page that loads a little more slowly than the rest. The spider carries on fetching other pages even though that result has not come back yet, and the slow-loading page then catches up and shows up later in the results. It's hard to explain in text on a page, but if you download the program (it's free) you will see what I mean.

Lord Majestic

8:03 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



MJ12bot/v0.8.3

I hope you counted us in the "friendly" category ;)

Or does it look like a brand new access to the site?

Most bots don't supply a referer. A bot may have found more than one page linking to a given URL on your site, so a single referer makes little sense in this case (it's a PITA to program as well).

larryn

8:12 pm on Jan 23, 2005 (gmt 0)

10+ Year Member



Derek,

My experience is that bots tend to have no referrer, so that all the hits look direct. Also, they don't usually bother with collateral files (images, stylesheets and the like), if that helps.
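Those two observations can be turned into a rough heuristic, sketched below. It needs raw log data (IP, URL, referrer), so it will not work from Webalizer summaries alone. The suffix list, the `(ip, url, referrer)` tuple layout, and the name `likely_bots` are assumptions for illustration only.

```python
from collections import defaultdict

# File types treated as "collateral" here; an assumed, incomplete list.
COLLATERAL_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def likely_bots(hits):
    """Return IPs that never sent a referrer and never fetched collateral files.

    hits: iterable of (ip, url, referrer); a missing referrer is '' or '-'.
    """
    has_referrer = defaultdict(bool)
    fetched_collateral = defaultdict(bool)
    for ip, url, referrer in hits:
        has_referrer[ip] |= referrer not in ("", "-")
        fetched_collateral[ip] |= url.lower().endswith(COLLATERAL_SUFFIXES)
    return {
        ip for ip in has_referrer
        if not has_referrer[ip] and not fetched_collateral[ip]
    }


# Hypothetical sample: one bot-like IP, one browser-like IP.
hits = [
    ("1.1.1.1", "/a.html", "-"),
    ("1.1.1.1", "/b.html", "-"),
    ("2.2.2.2", "/a.html", "http://example.com/"),
    ("2.2.2.2", "/logo.gif", "http://example.org/a.html"),
]
print(likely_bots(hits))
```

This is deliberately conservative: a human who arrives from a bookmark with images switched off would also be flagged, so treat the result as a hint rather than proof.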

Larry

ncw164x

8:20 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope you counted us in the "friendly" category

yes of course I did ;)

DerekH

9:13 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah - thanks chums!
What a great resource this is!

Your consensus tells me all I needed to know, so that when I read my Webalizer files I can take account of the "visits" and "entry pages" in the correct way.
Your opinions do make a lot of sense in interpreting the stats.

Thanks
DerekH