Forum Moderators: DixonJones


What constitutes a visit in a web log file?


guybroster

2:51 pm on Jan 8, 2003 (gmt 0)

10+ Year Member



Hi all,

I'm currently playing with IIS log files to run up a few custom reports. I was thinking about putting together a visits report and was wondering if there is any standard for what constitutes a visit in terms of the data available in a standard log file. When do I class hits from a particular user as a separate visit? Is there some kind of time between hits that marks this separation? Does anyone know of a resource that discusses this?

Thanks for any help.

Guy

ken_b

6:45 pm on Jan 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I understand it, 30 minutes between hits from a given IP number seems to be frequently used for calling it a new visit.

But that's just a general guideline and apparently you can use any number you choose.
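The 30-minute rule can be sketched in a few lines. This is a minimal sketch in Python, assuming the hits have already been parsed out of the log into (IP, timestamp) pairs; the threshold is the arbitrary figure discussed here, not a standard:

```python
from datetime import datetime, timedelta

# Arbitrary 30-minute inactivity threshold, as used by some analysers.
VISIT_TIMEOUT = timedelta(minutes=30)

def count_visits(hits):
    """Count visits from (ip, timestamp) pairs.

    A hit starts a new visit when it is the first hit seen from that
    IP, or when more than VISIT_TIMEOUT has passed since the IP's
    previous hit.
    """
    last_seen = {}  # ip -> timestamp of that IP's most recent hit
    visits = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        if ip not in last_seen or ts - last_seen[ip] > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = ts
    return visits

hits = [
    ("10.0.0.1", datetime(2003, 1, 8, 14, 0)),
    ("10.0.0.1", datetime(2003, 1, 8, 14, 10)),  # 10-min gap: same visit
    ("10.0.0.1", datetime(2003, 1, 8, 15, 0)),   # 50-min gap: new visit
    ("10.0.0.2", datetime(2003, 1, 8, 14, 5)),   # different IP: its own visit
]
print(count_visits(hits))  # 3
```

Note this keys visits purely on the IP, so the AOL case above (a new IP per hit) would inflate the count.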

Also, it seems that counting visits can be problematic for a number of other reasons.

For example, as I understand it, AOL may assign a new IP number for each hit, making it pretty hard to count visits from AOL users.

Others here may be able to give a more definitive answer.

chiyo

6:59 pm on Jan 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No definitive answer, and I'm no expert, but I think it's either FastStats or WebTrends that uses (or used in previous versions) 30 minutes as the cut-off. Whatever it is, it is a totally arbitrary figure. It is extremely difficult to say what a visit is because the raw data that analysers work with is just so poor. As ken_b says, caching at various points reduces the data to nonsense.

The best, but still very imperfect, way is to forget about inferring sessions or visits from logs altogether, and just use page views and unique server visits over a certain period of time. Just remember that a server visit is not necessarily a person visit. For example, a significant minority of our hits comes from SE spiders, email address slurpers, page checkers, link checkers, and heaps of robots on the rampage.
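That fallback, counting page views and unique addresses per period instead of guessing at sessions, might look like this. A rough Python sketch: the user-agent substring test is only illustrative, and a real filter list would be something you maintain yourself:

```python
# Count page views and unique client addresses per day, ignoring
# obvious robots, instead of trying to infer "visits" from the log.
# hits: (ip, date_string, user_agent) tuples parsed from the log.
ROBOT_HINTS = ("bot", "spider", "slurp", "crawler")  # illustrative list only

def daily_summary(hits):
    summary = {}  # date -> [page_views, set_of_ips]
    for ip, day, agent in hits:
        if any(hint in agent.lower() for hint in ROBOT_HINTS):
            continue  # skip hits whose user agent looks like a robot
        entry = summary.setdefault(day, [0, set()])
        entry[0] += 1       # one more page view that day
        entry[1].add(ip)    # remember the client address
    return {day: (views, len(ips)) for day, (views, ips) in summary.items()}

hits = [
    ("10.0.0.1", "2003-01-08", "Mozilla/4.0"),
    ("10.0.0.1", "2003-01-08", "Mozilla/4.0"),
    ("10.0.0.2", "2003-01-08", "Googlebot/2.1"),  # filtered out as a robot
    ("10.0.0.3", "2003-01-09", "Mozilla/4.0"),
]
print(daily_summary(hits))  # {'2003-01-08': (2, 1), '2003-01-09': (1, 1)}
```

As noted above, a unique address still isn't a unique person, so treat these as relative numbers.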

TheDave

3:35 am on Jan 9, 2003 (gmt 0)

10+ Year Member



I'm just writing a little stats program for myself at the moment. First I cull out everything that I know is a bot, or is myself (by my own IP). Then I use the first two parts of the IP and check that the browsers match, and I'm about to start looking at referral data, to try to pick out visitors. So far I haven't seen any oddities in the browsing patterns, like someone who's looking at a product and suddenly they're off reading about something which wasn't linked to from that product, so it seems to be working fairly accurately for me, but my site doesn't have hordes of traffic.
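That matching idea, keying visitors on the first two parts of the IP plus the browser string, can be sketched like this in Python. It assumes bots have already been culled, and the key is only as reliable as the heuristic itself:

```python
# Group hits by a loose "visitor key": the first two octets of the IP
# plus the user-agent string. This survives some dynamic-IP churn
# within one ISP's address block, at the cost of merging distinct
# users who happen to share both the block and the browser.
def visitor_key(ip, user_agent):
    prefix = ".".join(ip.split(".")[:2])  # e.g. "205.188"
    return (prefix, user_agent)

hits = [
    ("205.188.1.10", "Mozilla/4.0 (AOL 8.0)"),
    ("205.188.7.22", "Mozilla/4.0 (AOL 8.0)"),  # same block + browser: merged
    ("192.168.3.4", "Opera/6.0"),
]
visitors = {visitor_key(ip, ua) for ip, ua in hits}
print(len(visitors))  # 2
```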

guybroster

8:42 am on Jan 9, 2003 (gmt 0)

10+ Year Member



I have no problem with traffic from bots and developers, as I can easily filter that kind of stuff out of the log data. I was more interested in the length of time that a user has to be dormant before their next hit is counted as a new visit.

I checked an install of WebTrends and it does indeed say that it uses a break of 30 minutes, though I would have thought that using the default session timeout of 20 minutes would make more sense!
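The choice only matters at the margin: a gap that falls between the two thresholds flips the count. A toy illustration in Python, with the 25-minute gap picked purely as an example:

```python
from datetime import timedelta

# A 25-minute gap between two hits from the same IP is one visit under
# a 30-minute cut-off, but two visits under a 20-minute cut-off.
gap = timedelta(minutes=25)
visits_30 = 1 if gap <= timedelta(minutes=30) else 2
visits_20 = 1 if gap <= timedelta(minutes=20) else 2
print(visits_30, visits_20)  # 1 2
```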

I guess if other well respected and widely used tools go with the 30 minute break then I should follow suit.

Thanks,
Guy

vitaplease

8:46 am on Jan 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



guybroster

maybe this thread will help out as well:

[webmasterworld.com...]

Sinner_G

8:55 am on Jan 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



30 minutes between two hits seems awfully long to me. I mean, are visitors supposed to take a coffee break between two clicks? On a site heavily visited by, e.g., AOL users, wouldn't that lump everything into only one visit, since they come in with only a few different IPs?

cornwall

9:10 am on Jan 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As others have said, the data at best can only give you partial information.

That is, you can compare one month (say) with another and see if visits in total are up or down in percentage terms, if visits from one referrer are up or down in percentage terms, etc. In other words, the log file results are relative, not absolute. They give you a trend.

But the log file analysis will not/cannot give you an absolute figure for the number of visits, or where they came from.

Caching generally appears to reduce the number of recorded visits by around 25%. And on top of that you have to consider the hows and whys of referrals from "no referrer" and "mydomaine.com", and where those people actually came from.

With that amount of uncertainty in the system, I think I would go with the half hour cut off!