Forum Moderators: DixonJones
But now, as I venture further into the bowels of the whole spectrum of the WWW and look at this particular log, I'm trying to figure out what goes on in the compilation of the analysis. Below is what I have been pondering over.
000.000.00.0 - - [10/Dec/2002:00:13:13 -0800] "GET /pagename.html HTTP/1.0" 200 12339 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
From my uneducated guess, I interpret the above as:
000.000.00.0 = computer address
[10/Dec/2002:00:13:13 -0800] = date & time this page was probed or visited.
pagename= url
1.0=dtd
200=not updated
12339=looks like a month or something.
Googlebot=presence
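For reference, lines like this follow Apache's "combined" log format, so the fields can be split mechanically: the 200 is the HTTP status code ("OK"), the 12339 is the size of the response in bytes (not a date), and the quoted strings at the end are the referrer and the user agent. A minimal Python sketch, using the sample line above (the regex assumes well-formed combined-format lines):

```python
import re

# One line in Apache "combined" log format (sample from above).
line = ('000.000.00.0 - - [10/Dec/2002:00:13:13 -0800] '
        '"GET /pagename.html HTTP/1.0" 200 12339 "-" '
        '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')

# Fields in order: client IP, identd, user, [timestamp], "request",
# status code, bytes sent, "referrer", "user agent".
pattern = re.compile(
    r'(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

m = pattern.match(line)
print(m.group('status'))   # HTTP status code: 200 means "OK"
print(m.group('bytes'))    # response size in bytes
print(m.group('agent'))    # user-agent string identifying Googlebot
```

The same pattern applies to the browser entries below; only the user-agent string changes.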
The next nine lines (data changed for obvious reasons):
000.00.00.00 - - [10/Dec/2002:00:09:44 -0800] "GET /pagename.html HTTP/1.0" 200 15737 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
000.00.00.00 - - [10/Dec/2002:00:09:49 -0800] "GET /images/contents.jpg HTTP/1.0" 200 21981 "http://myname.com/pagename.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
00.00.000.00 - - [10/Dec/2002:00:09:50 -0800] "GET /images/image2.jpg HTTP/1.0" 200 12583 "http://myname.com/pagename.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
000.00.000.00 - - [10/Dec/2002:00:09:51 -0800] "GET /images/rule16.gif HTTP/1.0" 200 237 "http://myname.com/pagename.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
000.00.00.00 - - [10/Dec/2002:00:09:44 -0800] "GET /mystyle.css HTTP/1.0" 200 5809
What I'm trying to understand is first, the initial entry.
"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Were they looking at this page in particular because someone did a search on a particular keyword, or was it a more comprehensive look at the page on their behalf, as part of search engine protocol for listing as far as PR goes?
Secondly, from the data provided, I see time-stamped entries wherein Googlebot showed up first in the log, yet may have accessed the page/site after others, according to the timestamps. Later, an image (because of its size?) showed up in the log first, and later still the style sheet.
Is this an errant logging thing, or one of time and date stamps, where one supersedes another because of hierarchy, size of file, or protocol?
BTW
I've heard about Doctors Without Borders, but I have never heard about WWW education without walls. If there ever was one, this has got to be it.
Merry Christmas, or whatever you celebrate. My wish for you, besides being in the top ten of the more popular SEs, is that you continue to help us upcoming wannabes, so we can pass on what you have taught us.
In the learning process.
Thanks everyone
Jaybee
Before I move your post to the Tracking and Logging forum ;) let me suggest this page [mach5.com] for an overview of log files.
From the log extract you included, I suspect that your host is only providing "basic logs." Referrer data is available in "extended logs." My host charges a one-time $10 setup fee for this option; not sure how others do it. It's money well spent!
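On an Apache host, "basic" versus "extended" logging usually comes down to which LogFormat is in use: the "common" format stops at the byte count, while "combined" appends the referrer and user-agent fields. A sketch of the relevant httpd.conf directives (standard Apache syntax; the log path is a placeholder your host would set):

```apache
# "common" format: IP, identd, user, [timestamp], "request", status, bytes
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# "combined" adds the quoted referrer and user-agent fields
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

CustomLog logs/access_log combined
</imports>
```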
I went over the page you referred me to and understand it precisely. I do get a more comprehensive output of the daily activity on the site; everything is covered. But I just can't understand, or perhaps I should just ignore, why, time-wise, some stats appear in an untimely order.
I'd like to believe it's because of bloated files: although they were requested at the same time as smaller ones on the same page, the smaller ones would obviously show up first.
The thing that I'm looking at is the time difference between the files as they appear in the log. The files may be a bit bigger, but my thinking is, it shouldn't jump out at you where seconds count, unless a different packaging route is involved.
I guess my main concern is how long the pages take to load in the browser.
As far as the first question is concerned, I'm just curious as to whether that was a request or a probe.
jaybee
I'm not sure I understand your second question...
Googlebot visits normally show substantial time between requests to avoid overloading a site. I suspect that an "unthrottled" Google spider could overload many servers.
or maybe
A single visit to one page from a surfer can generate multiple log entries, depending on the composition of the page. A page with six images will generate seven requests: one for the HTML and one for each image. In my logs, these are usually in chronological order. If yours aren't and your question is, "Why?", I don't know. Perhaps you have a much higher volume of traffic than me, creating nearly simultaneous requests from multiple users. Perhaps it has to do with server setup.
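One way to see that "seven requests per page view" pattern is to sort the entries by timestamp and look at the referrer column: the image and stylesheet requests carry the page's URL as their referrer, marking them as children of that one visit. A small Python sketch (the entries are abridged from the sample lines earlier in the thread; the tuple layout is mine):

```python
from datetime import datetime

# Abridged entries: (timestamp, requested path, referrer).
# Based on the sample log lines quoted earlier, deliberately out of order.
entries = [
    ('10/Dec/2002:00:09:49 -0800', '/images/contents.jpg', '/pagename.html'),
    ('10/Dec/2002:00:09:44 -0800', '/pagename.html', '-'),
    ('10/Dec/2002:00:09:50 -0800', '/images/image2.jpg', '/pagename.html'),
]

def parse_time(s):
    # Apache log timestamps look like 10/Dec/2002:00:09:44 -0800
    return datetime.strptime(s, '%d/%b/%Y:%H:%M:%S %z')

# After sorting by timestamp, the HTML page comes first, then the
# images it triggered (their referrer points back at the page).
entries.sort(key=lambda e: parse_time(e[0]))
for time_str, path, referrer in entries:
    print(time_str, path, referrer)
```

If entries still look out of order after sorting, the timestamps themselves (one-second resolution) may simply be too coarse to separate near-simultaneous requests.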
Did I get close to answering your question? :)
But since you mentioned it, I remember seeing an announcement about the host having problems with their logging system.
Thanks again
jaybee