Googlebot and HTTP_IF_MODIFIED_SINCE

Googlebot is not telling the truth

         

leifwessman

10:58 am on May 23, 2003 (gmt 0)

10+ Year Member




I have a new website that Googlebot is visiting for the first time. I'm logging everything, so I'm completely sure when I say that Googlebot is sending me wrong HTTP_IF_MODIFIED_SINCE headers.

When requesting my brand-new pages, it sends
HTTP_IF_MODIFIED_SINCE=Mon, 10 Mar 2003 10:00:00 GMT

But Googlebot is not telling the truth! On the 10th of March my new domain wasn't even registered!

Does anyone else have any experience with this?

Leif

HitProf

11:12 am on May 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello leifwessman,

What's the problem with that? If your pages are modified after that date (which they are if the site is brand new) then they will be served to Google and that's exactly what you want.

leifwessman

11:20 am on May 23, 2003 (gmt 0)

10+ Year Member




I have a dynamic website. I don't want to serve Googlebot pages unless it's really necessary (to save bandwidth). I look at the HTTP_IF_MODIFIED_SINCE, if the bot (any bot) is sending me one, then I'm returning

header("Status: 304 Not Modified")

Therefore it's really important that the bot is telling the truth, or else it will not get my pages. If it turns out that the bot is sending the header wrongly, I would have to check for myself whether the bot is telling the truth or not. That would be a very time-consuming task.

I think everyone should be able to expect that if a bot is sending an HTTP_IF_MODIFIED_SINCE date, then it already has a copy of the webpage. That's the whole point!

Leif

ruserious

11:30 am on May 23, 2003 (gmt 0)

10+ Year Member



Sorry if it sounds harsh, but it seems to me it is poorly implemented in your site. You say:
I look at the HTTP_IF_MODIFIED_SINCE, if the bot (any bot) is sending me one, then I'm returning header("Status: 304 Not Modified")

That is wrong. You should not just check whether the header is sent or not; you should compare the date it carries with the date your content/page last changed, and return 304 only if the page has not changed since that date.

Of course the bot may also be making a mistake, or doing this intentionally (maybe they always send the "time" of the last major crawl or something like that, simply for performance reasons), or it may be connected with the grand change that is taking place right now.
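To make the comparison concrete, here is a minimal sketch in Python (the same logic carries over to whatever language the site is written in). The function name `is_not_modified` and the May 20 change date are made up for illustration; only the header value comes from this thread.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(if_modified_since, last_modified):
    """Return True only when answering 304 Not Modified is justified.

    if_modified_since: raw header value (or None if the client sent none).
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return False  # no conditional header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: safest to send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since


# Hypothetical date on which the page content last changed.
page_changed = datetime(2003, 5, 20, 12, 0, 0, tzinfo=timezone.utc)

# The header value reported in this thread is older than the page,
# so the page must be served in full rather than answered with 304.
serve_304 = is_not_modified("Mon, 10 Mar 2003 10:00:00 GMT", page_changed)
```

With this check, a date from before the site even existed is harmless: the page is newer than the date, so the bot simply gets the full page.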

I guess we learn another one of life's standard lessons: If two parties rely on unproved assumptions when they meet, they might both be disappointed... ;)

madweb

2:36 pm on May 23, 2003 (gmt 0)

10+ Year Member



I think you have misunderstood the meaning of this header. Google is not lying; it is not saying anything about your site. It is just asking a question: has this page changed since this date?

GoogleGuy

4:12 pm on May 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right. Googlebot is probably using a safe date from before the previous crawl in order to let your web server say "yes, I have a new page" or "no, I don't." If you're using Apache with static pages, it can just use the file-modified time to answer this correctly. In your case, since you have a dynamic site, the burden is on you to see if the dynamic page has changed since that date.

If-modified-since is a little bit of trouble to set up for dynamic sites (not so hard for static sites), but it's definitely worth it. If Googlebot can see that it doesn't need to crawl an older page, that frees up resources to get a new page.
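For a dynamic site, the setup could look roughly like the sketch below, written as a small Python WSGI application. It assumes the site can look up when each page last changed; the `PAGE_LAST_CHANGED` table, its date, and the page body are hypothetical placeholders, not anything described in this thread.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical stand-in for "when did this dynamic page last change?",
# e.g. a timestamp column in the database behind the page.
PAGE_LAST_CHANGED = {
    "/": datetime(2003, 5, 20, 12, 0, 0, tzinfo=timezone.utc),
}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    last_changed = PAGE_LAST_CHANGED.get(path)
    if last_changed is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    # Always advertise the change date so clients can ask again later.
    headers = [("Last-Modified", format_datetime(last_changed, usegmt=True))]

    since = None
    raw = environ.get("HTTP_IF_MODIFIED_SINCE")
    if raw:
        try:
            since = parsedate_to_datetime(raw)
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since = None  # ignore a malformed header

    # Answer 304 only if the page is no newer than the client's date.
    if since is not None and last_changed.replace(microsecond=0) <= since:
        start_response("304 Not Modified", headers)
        return []

    body = b"<html>...page content...</html>"  # placeholder page
    headers += [("Content-Type", "text/html"),
                ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

The expensive page generation only runs on the 200 path, which is where the bandwidth and CPU savings come from; the 304 path needs nothing more than the timestamp lookup.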