If-modified-since, 304, 200

GoogleBot behaves weird - or is it my server?

         

RonPK

2:50 pm on Oct 26, 2004 (gmt 0)




In my server's access logs I see this pattern a lot:

www.mysite.tld 66.249.64.195 - - [26/Oct/2004:16:11:05 +0200] "GET /widgets.html HTTP/1.0" 304 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
www.mysite.tld 66.249.64.195 - - [26/Oct/2004:16:11:05 +0200] "GET /widgets.html HTTP/1.0" 200 2981 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

First the bot gets a 304 (Not Modified), which is normal behavior (read this thread [webmasterworld.com] for more info). But a split second later the same IP address requests the file again and gets the full 200 response, as if to say 'gimme that file anyway'.

It doesn't happen every time. On other occasions the bot seems happy with the 304 and does not come straight back.

It doesn't make any sense to me, so hopefully someone knows why this happens?
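For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It assumes the log format shown in the sample above (virtual host first, then the client IP); the function name `find_304_then_200` and the two-second window are illustrative choices, not anything Googlebot or the server defines.

```python
import re
from datetime import datetime

# Matches the format of the sample lines: vhost, client IP, two unused
# fields, [timestamp], "request line", status code, response size.
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def find_304_then_200(lines, window_seconds=2):
    """Return (ip, path, gap_in_seconds) for every 304 that is followed by a
    200 for the same IP and path within window_seconds."""
    last_304 = {}  # (ip, path) -> timestamp of the most recent 304
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in another format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        key = (m.group("ip"), m.group("path"))
        if m.group("status") == "304":
            last_304[key] = ts
        elif m.group("status") == "200" and key in last_304:
            gap = (ts - last_304.pop(key)).total_seconds()
            if 0 <= gap <= window_seconds:
                hits.append((key[0], key[1], gap))
    return hits

SAMPLE = [
    'www.mysite.tld 66.249.64.195 - - [26/Oct/2004:16:11:05 +0200] '
    '"GET /widgets.html HTTP/1.0" 304 0 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    'www.mysite.tld 66.249.64.195 - - [26/Oct/2004:16:11:05 +0200] '
    '"GET /widgets.html HTTP/1.0" 200 2981 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for ip, path, gap in find_304_then_200(SAMPLE):
        print(ip, path, gap)
```

Feeding it an open log file instead of `SAMPLE` works the same way, since it only iterates over lines. The month abbreviation in the timestamp is parsed with `%b`, so this assumes an English or C locale.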

g1smd

9:59 pm on Oct 26, 2004 (gmt 0)




PHP makes all sorts of "clever scripting" possible, so maybe they still request a page that YOU say isn't modified, to check that you really have NOT changed it from what they have already indexed. Maybe some clever people write a page full of spam that indexes well, then replace it with something else that they really want visitors to see, all the while telling Google that the page is "Not Modified".

Whenever you see Google doing something odd, think about what sort of spam they might be checking for. I would think that search engine algorithm development is mostly about combating spam these days.

RonPK

2:30 pm on Oct 27, 2004 (gmt 0)




Thanks g1smd, I guess that might be it. Not that I'm trying to fool Google, but they may well be checking.

For the record: the log sample I showed refers to perfectly normal HTML files. On some dynamically generated pages I do indeed compare the If-Modified-Since date with the last-modified field of that page's database record. Works like a charm and saves me some bandwidth until I get around to implementing static publishing.
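The comparison described above can be sketched roughly like this in Python. This is not the actual code from the site; the function name `conditional_response` and its arguments are made up for illustration, and it assumes the record's last-modified value is available as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(if_modified_since, record_modified):
    """Decide between 304 and 200 for a dynamically generated page.

    if_modified_since: raw If-Modified-Since header value, or None.
    record_modified:   aware datetime taken from the page's database record
                       (hypothetical; adapt to however the record stores it).
    Returns (status_code, headers).
    """
    # HTTP dates only have one-second resolution, so drop the microseconds.
    record_modified = record_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(record_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: just send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if record_modified <= since:
                return 304, headers  # nothing changed, send no body
    return 200, headers  # changed, or no usable header: send the page

if __name__ == "__main__":
    record = datetime(2004, 10, 26, 14, 11, 5, tzinfo=timezone.utc)
    print(conditional_response("Tue, 26 Oct 2004 14:11:05 GMT", record))
    print(conditional_response(None, record))
```

A malformed or missing header falls through to a normal 200, which is the safe default: the worst case is some wasted bandwidth, never a stale page.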