Anyway - looking through my raw log files I see 99% of the requests from Googlebot are GETs:
64.22.143.239 - - [09/Oct/2009:04:27:42 -0400] "GET /my/page/ HTTP/1.1" 200 47328 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
However, I also see there are a few HEAD requests for the same page:
64.22.143.239 - - [09/Oct/2009:04:27:42 -0400] "HEAD /my/page/ HTTP/1.1" 200 186 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Are these HEAD requests simply asking for the HTTP response headers only, without the page body?
They also use what's called a 'conditional GET' for the same purpose: they send a GET, but with an "If-Modified-Since" header. If the page has not changed since the date and time sent in that header, then your server will (should) return a 304 Not Modified response with no body. So, they may be comparing the results of these two bandwidth-saving methods to verify that your server is properly configured and that they can use whichever method they prefer to check for updates on your site.
Jim
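
To make the two methods above concrete, here is a minimal Python sketch of how a server might answer a HEAD request and a conditional GET. It is only an illustration under stated assumptions: the function name `respond` and its parameters are hypothetical, it is not how any particular web server or Googlebot is implemented, and it handles only the "If-Modified-Since" check described above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(method, last_modified, body, if_modified_since=None):
    """Return (status, headers, payload) for a HEAD or GET request.

    Hypothetical helper for illustration only.
    last_modified must be a UTC, timezone-aware datetime in whole seconds.
    if_modified_since is the raw header value sent by the client, or None.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None:
            if last_modified <= since:
                # Page unchanged since the client's copy: no body is sent.
                return 304, headers, b""

    headers["Content-Length"] = str(len(body))
    if method == "HEAD":
        # Same headers a GET would return, but without the body.
        return 200, headers, b""
    return 200, headers, body
```

Calling `respond("HEAD", ...)` returns the headers with an empty payload, while a GET whose "If-Modified-Since" date is at or after the page's last-modified time gets a 304 and no body. Either way the crawler learns whether the page changed without downloading all of it.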