I tested this by sending requests for the pages with an If-Modified-Since (IMS) header and, as expected, 304 was returned whenever the IMS date was later than the last-modified date of the .shtml file.
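For anyone who wants to repeat that kind of test, here is a minimal sketch of a conditional GET in Python. The function names and the example URL are made up for illustration; nothing here is what the original poster actually ran.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.error
import urllib.request


def build_conditional_request(url, since):
    """Build a GET request carrying an If-Modified-Since header.

    `since` must be a timezone-aware datetime; it is converted to GMT
    and formatted as an HTTP date.
    """
    ims = format_datetime(since.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": ims})


def fetch_status(request):
    """Send the request and return the HTTP status code.

    urllib raises HTTPError for a 304 response, so the code is read
    from the exception in that case.
    """
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Hypothetical URL; substitute one of your own .shtml pages.
    req = build_conditional_request(
        "http://example.com/page.shtml",
        datetime(2003, 9, 27, 5, 58, 29, tzinfo=timezone.utc),
    )
    print(fetch_status(req))
```

A 304 here would suggest the server is honouring IMS for that page; a 200 means the full page was sent.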
Googlebot has crawled these pages twice on the 10th, once on the 11th and once on the 12th, and each time the pages have been returned in full with "200 OK".
Is Googlebot not sending IMS headers for .shtml pages? If it is, why aren't my pages being sent as 304 Not Modified?
I could rename all my files to .html and set the user execute bit if I have to but I'd rather clear this up first.
I'm not sure I see what you are explaining or asking there. The fact that the pages are SSI isn't really germane to any discussion about Googlebot (hint: the page you are currently reading is partially SSI).
The bot pulls the same thing as a browser. What you see in the browser for source is what the bot sees.
The XBitHack directive controls the parsing of ordinary html documents. This directive only affects files associated with the MIME type text/html.
I don't think the XBitHack has any effect on .shtml pages, only .html pages. Check the headers your server is sending to see if it actually is sending Last-Modified: headers.
Also, could Google have downloaded the pages before you turned XBitHack on? If so, Google might not have had a date to use for an If-Modified-Since conditional GET.
First, welcome to WebmasterWorld.
Many thanks :)
I don't think the XBitHack has any effect on .shtml pages, only .html pages. Check the headers your server is sending to see if it actually is sending Last-Modified: headers.
The MIME type of SSI is also text/html. XBitHack most certainly does work on .shtml, although the user execute bit does not need setting. The group execute bit turns on Last-modified and IMS.
Also, could Google have downloaded the pages before you turned XBitHack on? If so, Google might not have had a date to use for an If-Modified-Since conditional GET.
As my post stated, googlebot has visited 4 times since I configured this.
I have also sent IMS headers to my shtml pages and received 304 responses.
No, G does not send IMS headers at all.
At all? IMS headers are certainly sent by G to php pages, it saves me a huge amount of bandwidth.
(Apologies for mixing quotes, I wanted to make one reply :) )
Last-Modified: Sat, 27 Sep 2003 05:58:29 GMT
Getting the permission setting right was the challenging part for me when I was trying to figure out how to have IMS on .shtml files.
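Since the permission bits are the fiddly part, here is a small sketch of setting the group execute bit from Python, the equivalent of `chmod g+x`. It assumes Apache is running with `XBitHack full`, which is the mode that checks the group execute bit before sending Last-Modified. The function name is hypothetical.

```python
import os
import stat


def enable_group_execute(path):
    """Add the group execute bit to `path`, leaving other bits untouched.

    Returns the resulting permission bits so the change can be checked.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXGRP)
    return stat.S_IMODE(os.stat(path).st_mode)
```

For a file that starts at 0644 this gives 0654. After changing the bit, it is worth checking the response headers again to confirm that Last-Modified is really being sent.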
[simon.incutio.com...]
I set up a .shtml page which logs the headers which are sent to it. In Googlebot's FIRST visit it sent an IMS header of "Tue, 22 Jun 2004 23:16:24 GMT".
Why on earth would G send IMS headers for a page it's never crawled before? The only reason I can think of is that G only wanted to find out what was in that page if it was created after 22 Jun which seems an odd approach. You'd expect G to be a little more curious ;)
Having said that, I have seen G send IMS, get a 304 reply, then immediately send the same GET with no IMS so it might have done this...
Well if anything it shows that G does send IMS to shtml but I'll have to see what happens on the next few crawls.
Note: When handling an If-Modified-Since header field, some servers will use an exact date comparison function, rather than a less-than function, for deciding whether to send a 304 (Not Modified) response. To get best results when sending an If-Modified-Since header field for cache validation, clients are advised to use the exact date string received in a previous Last-Modified header field whenever possible.
So if Google only wanted to see whether the page was modified after an arbitrary date, that check is unlikely to be reliable. It would be far better to look at the Last-Modified date (either from a GET or HEAD request).
This means that you *must* use a less-than comparison in scripts, as opposed to a straight equality comparison with the page's own Last-Modified value. I think the IMS G sends is roughly the date the page was last crawled.
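To make the difference concrete, here is a rough sketch of the two comparison strategies a script could use, tried against the two dates quoted earlier in this thread. The function names are my own, and this is not how Apache itself implements the check.

```python
from email.utils import parsedate_to_datetime


def not_modified_exact(last_modified, ims):
    """Exact-match strategy: 304 only if the IMS string equals Last-Modified."""
    return ims.strip() == last_modified.strip()


def not_modified_less_than(last_modified, ims):
    """Less-than strategy: 304 if the page has not changed since the IMS date."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims)


def status_for(last_modified, ims, exact=False):
    """Return 304 or 200 for the given header values and strategy."""
    check = not_modified_exact if exact else not_modified_less_than
    return 304 if check(last_modified, ims) else 200


if __name__ == "__main__":
    last_modified = "Sat, 27 Sep 2003 05:58:29 GMT"  # header quoted above
    googlebot_ims = "Tue, 22 Jun 2004 23:16:24 GMT"  # IMS logged from Googlebot
    print(status_for(last_modified, googlebot_ims))              # less-than
    print(status_for(last_modified, googlebot_ims, exact=True))  # exact match
```

With those dates the less-than check answers 304, while the exact-match check answers 200 because the IMS string is not the page's own Last-Modified value.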