Webalizer gives me summary stats including "visits" (a count of the number of times an IP address carries out one or more page fetches in a time window).
I also get "entry" pages - the first page in a visit. These are quite interesting...
I'm trying to reconcile how human visitors come into the site (front door or side door!) and the logs I have are necessarily contaminated with bot droppings.
In an effort to see behind the curtain, can anyone give me a bit more info about how a bot appears in the log...
When the bot is following links, do second and subsequent pages in a visit get requested with a referrer of the URL of the first page taken, or is the referrer always the bot?
It will help me work out whether the "entry" pages in Webalizer are as informative as I hope, or as useless as a chocolate teapot.
Thanks
DerekH
Googlebot/2.1 (+http*://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http*://help.yahoo.com/help/us/ysearch/slurp)
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
msnbot/0.3 (+http*://search.msn.com/msnbot.htm)
appie 1.1 (www.walhello.com)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
Seekbot/1.0 (http*://www.seekbot.net/bot.html) HTTPFetcher/0.3
http*://www.almaden.ibm.com/cs/crawler [bc20]
ia_archiver
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a3) Gecko/20040817
psbot/0.1 (+http*://www.picsearch.com/bot.html)
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker (wn.dlc@looksmart.net; http*://www.WISEnutbot.com)
ConveraCrawler/0.4
TurnitinBot/2.0 http*://www.turnitin.com/robot/crawlerinfo.html
ichiro/1.0 (ichiro@nttr.co.jp)
MJ12bot/v0.8.3 (http*://www.majestic12.co.uk/projects/dsearch/mj12bot.php?V=v0.8.3&NID=B0E44C4EE98B33C4&MID=EE1DD60ABC2AE863&BID=0D2F47BADD52ECA93161EE0C2C18F4B4
With Webalizer I can't do much tracking of visitors round the site, but I do get a very large list of the pages where a visitor entered the site. Each visit (each collection of pages taken in a contiguous band of time) has an entry page.
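For anyone wanting to reproduce that "entry page" list outside Webalizer, here is a rough Python sketch of the idea, not Webalizer's actual code. It assumes the server writes Apache "combined" format with lines in time order, uses an assumed 30-minute gap to split visits, and guesses at bots with a handful of substrings taken from the user-agent list earlier in this thread. The names `entry_pages`, `BOT_HINTS` and `VISIT_TIMEOUT` are made up for the example.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Apache "combined" log format (an assumption; adjust for your server).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substrings from the user-agent list in this thread (far from exhaustive).
BOT_HINTS = ("googlebot", "slurp", "msnbot", "teoma", "ia_archiver",
             "psbot", "zyborg", "mj12bot", "turnitinbot")

# Assumed gap after which the same IP starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)


def is_bot(agent):
    """Crude check: does the user-agent contain a known bot substring?"""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def entry_pages(lines):
    """Return (human, bot) Counters of entry pages.

    A request counts as an entry page when its IP has not been seen
    within VISIT_TIMEOUT. Lines are assumed to be in time order.
    """
    last_seen = {}
    human, bot = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        now = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        ip = m.group("ip")
        prev = last_seen.get(ip)
        if prev is None or now - prev > VISIT_TIMEOUT:
            target = bot if is_bot(m.group("agent")) else human
            target[m.group("path")] += 1
        last_seen[ip] = now
    return human, bot
```

Called as `human, bot = entry_pages(open("access.log"))` (the file name is a placeholder), it gives two separate entry-page tallies, so the bot visits can be set aside before looking at front door versus side door.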
My question - when a bot collects a number of pages by following links, does the referrer on the second page it takes look like someone's clicked on the first page it took? Or does it look like a brand new access to the site?
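One way to check this against your own logs rather than in theory: tally the referrer field on requests whose user-agent looks like a bot. The sketch below again assumes Apache "combined" format, where the referrer and user-agent are the last two quoted fields, and reuses a few substrings from the user-agent list above. `bot_referrers` is a made-up name, not part of any tool.

```python
import re
from collections import Counter

# Last two quoted fields of a "combined" log line: "referrer" "user-agent"
TAIL_RE = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

# Substrings from the user-agent list in this thread (not exhaustive).
BOT_HINTS = ("googlebot", "slurp", "msnbot", "teoma", "ia_archiver",
             "psbot", "zyborg", "mj12bot", "turnitinbot")


def bot_referrers(lines):
    """Count bot requests with and without a referrer."""
    tally = Counter()
    for line in lines:
        m = TAIL_RE.search(line)
        if m is None:
            continue  # not in the expected format
        agent = m.group("agent").lower()
        if not any(hint in agent for hint in BOT_HINTS):
            continue  # not a recognised bot; ignore
        referrer = m.group("referrer")
        key = "no referrer" if referrer in ("", "-") else "has referrer"
        tally[key] += 1
    return tally
```

If nearly everything lands under "no referrer", then each bot fetch looks like a fresh arrival rather than a click through from the previous page.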
Is that any clearer? I hate email sometimes <grin>
DerekH
No, it does not work that way. As an example, download "xenu" and watch how the program traverses your site. It may fetch pages in a certain order until it gets to a page that is loading a little slower than the rest. The spider will carry on fetching other pages even though that result has not been returned yet, and then the slow-loading page will catch up and appear in the results. It's hard to explain in text on a page, but if you download the program (it's free) you will see what I mean.