I'm looking through my logs and see that the full URL, including the hostname, has not been recorded. For example, if Google requests the www. version of a URL, I 301-redirect it to the non-www. version, but all I see in the log in this instance is:
GET / HTTP/1.1 301
When I'd really like to see:
GET [mysite.tld...] HTTP/1.1 301
Is this possible?
Remember, HTTP clients (browsers, robots) don't request "mysite.tld/page.abc"; they send a request for "/page.abc" to the IP address returned by a DNS lookup of "mysite.tld". In HTTP/1.1 (but not in true HTTP/1.0), the client also sends a separate HTTP request header containing the hostname, in this case "Host: mysite.tld". The server only needs this Host header if it serves name-based virtual hosts; otherwise, the header is redundant, because the request has already arrived at this server at this IP address. On name-based virtual hosts, the Host header is used to 'sort out' which of the virtual hosts at this IP address should service the request.
So what you see logged in your raw server access log is the actual request line sent by the client, including the HTTP method (GET, HEAD, PUT, etc.), the URL-path, and the HTTP protocol version, as in your example log line above. The hostname is not part of that request line, which is why it does not appear there.
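To make the split between the request line and the Host header concrete, here is a minimal Python sketch. The function names `build_request` and `full_url` are hypothetical, the `http` scheme is an assumption, and `full_url` only helps if your server is configured to log the Host header alongside the request line (in Apache's mod_log_config that would be `%{Host}i`, but this thread doesn't say which server is in use).

```python
def build_request(host: str, path: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request.

    The hostname travels in the Host header, not in the request line.
    """
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def full_url(request_line: str, host: str, scheme: str = "http") -> str:
    """Rebuild a full URL from a logged request line plus a logged Host value.

    Assumes the request line has the usual "METHOD path VERSION" shape.
    """
    method, path, version = request_line.split(" ", 2)
    return f"{scheme}://{host}{path}"


request = build_request("www.mysite.tld", "/page.abc")
request_line = request.split("\r\n")[0]

print(request_line)                              # what a raw access log records
print(full_url(request_line, "www.mysite.tld"))  # what the poster wants to see
```

The first line printed is all that a default access log captures; the second can only be reconstructed if the Host value was logged too.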
If you'd like to see HTTP transactions in action, try the "Live HTTP Headers" add-on for Firefox/Mozilla browsers.
Jim
The problem I'm having is that the Google 'crawler' keeps requesting (what I see as) the same page, but each request returns a different response code or file size:
64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 301 235
64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 200 185
64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 200 47325
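For anyone wanting to spot this pattern across a larger log, here is a rough Python sketch that groups entries which look identical (same IP, timestamp and path) and lists the responses each group received. It assumes the custom format shown above, with fields separated by " - " and the last two numbers being the status code and the response size in bytes; `parse_line` and `group_responses` are hypothetical names.

```python
from collections import defaultdict

LOG_LINES = [
    "64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 301 235",
    "64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 200 185",
    "64.22.143.239 - 09/Oct/2009:04:27:42 - /widget/blue/ - 200 47325",
]


def parse_line(line):
    """Split one log line into (ip, timestamp, path, status, size)."""
    ip, timestamp, rest = line.split(" - ", 2)
    # Split from the right so a path containing " - " stays intact.
    path, status_size = rest.rsplit(" - ", 1)
    status, size = status_size.split()
    return ip, timestamp, path, int(status), int(size)


def group_responses(lines):
    """Map (ip, timestamp, path) to the list of (status, size) pairs seen."""
    groups = defaultdict(list)
    for line in lines:
        ip, timestamp, path, status, size = parse_line(line)
        groups[(ip, timestamp, path)].append((status, size))
    return dict(groups)


for key, responses in group_responses(LOG_LINES).items():
    if len(responses) > 1:
        print(key, responses)
```

Groups with more than one distinct response are the ones where the log is probably missing something that told the requests apart, such as the Host header.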