If-Modified-Since HTTP Header: Complete Confusion.
Can't find example for beginners.
I have recently submitted my website to Google and whilst I am waiting for it to be indexed (six days so far!) I have been reading through their guidelines in the Webmaster Tools information.
They mention using the If-Modified-Since header. I have read and understood what it does, but I can't seem to find any simple examples of how to use it, including exactly where to put it and how to get it to work. I would really like to use this feature.
Do I need to put something on every web page? Is it that big block of code I have seen in some examples, or the simple line (time and date) that I have seen in other examples?
Bizarrely there doesn't seem to be a simple explanation of how to implement it anywhere on the internet. I can find dozens of references to it, including examples of what the code looks like, but nothing that puts it into context and tells you where to put the code and how to get it to work.
Does anyone know of any links to a good explanation for a beginner who wants to use this on their website?
Thanks very much,
I'm wondering if there is actually some system already working. In my Visitor Statistics for my website's control panel it shows quite a few entries for 'Code 304 - Not Modified'.
Does that mean that something is telling crawlers/robots that pages haven't been modified since their last visit? Is this the If-Modified-Since system in operation? If so, I haven't added anything to my pages to make it work.
Still confused here!
HTTP clients (browsers and robots) send the If-Modified-Since HTTP request header, indicating the date and time that they last cached a copy of the requested resource (page, image, etc.) at the URL they are requesting.
The server looks at this date and time, and compares it to the Last-Modified date and time on the file to which the requested URL resolves. If using static files for each page and object, this is the timestamp on the file itself, available from the filesystem. If using a script-based page-generation scheme then things get more complicated, but in correctly-implemented cases, the last-modified date and time can be retrieved from the database that is used to generate pages.
If the resource's Last-Modified date/time is later than the date and time sent by the client in its If-Modified-Since request header, then the server will send the new resource contents, a 200-OK status, and a new/updated Last-Modified header. If the resource has not been modified since the client last cached it, then the server sends no content, but simply replies with a 304-Not Modified status. This saves transmission time and bandwidth, which is the purpose of the system.
Handling If-Modified-Since and Last-Modified can be done by the server itself for static resources, by the scripts used to generate dynamic pages, or by both as needed.
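For anyone handling this in a script rather than leaving it to the server, the comparison described above might look roughly like the following Python sketch. The function name conditional_response and its arguments are made up for illustration and are not taken from any particular server or framework. It assumes you already have the resource's last-modified time as a UTC datetime (from the file timestamp or from your database).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified):
    """Hypothetical helper: return (status, headers) for a GET request.

    if_modified_since: value of the request header, or None if absent.
    last_modified: timezone-aware UTC datetime of the resource's last change.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # unparseable header: ignore it, send the page
        if client_time is not None and client_time.tzinfo is None:
            # Assumption: treat a date without a zone as GMT.
            client_time = client_time.replace(tzinfo=timezone.utc)
        if client_time is not None and last_modified <= client_time:
            return 304, headers  # not modified: send no body

    return 200, headers  # modified (or first visit): send the full page
```

For a static file, the timestamp could be obtained with something like datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc), although a normally configured server already does all of this for static files without any script.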
[added] If you want the details, read RFC2616 -- Hypertext Transfer Protocol -- HTTP/1.1 [w3.org] -- Every Webmaster should review this document at least once, even if it isn't particularly easy reading, as it specifies 'the rules' we have to work within. [/added]
[edited by: jdMorgan at 7:27 pm (utc) on Dec. 28, 2008]
Thanks for your reply Jim :-)
It's a little bit beyond me at the moment, but I think I understand.
My site uses static files, so am I right in assuming that I don't need to do anything more to prevent robots from constantly crawling my site and wasting bandwidth on pages that haven't changed?
As far as I can see, everything that needs to happen will happen. If I have uploaded a modified page then the request will return a different date and time, so modified pages will be indexed/sent. If I haven't modified the page then the server will send a 304 Not Modified, won't it?
From what I read on Google and what I read via searches it initially appeared that I had to do something to my pages to get it all to work!
Thanks very much,
There's not one always-true answer. Your server has to be properly configured in order for it to work, and it doesn't just happen by magic on all servers.
Use "Live HTTP Headers" or a similar add-on for Firefox to inspect the requests from your browser to your server and your server's responses. Take care to flush your browser's cache when needed in your tests to get valid results. For example:
1) Flush browser cache.
2) Request a valid page URL. Server should always respond with 200-OK.
3) Request same page again. Browser should send If-Modified-Since and server should respond with 304-Not Modified.
4) Change the page on the server.
5) Request the same URL as before. Browser should send If-Modified-Since and server should respond with 200-OK, the new page content, and a new Last-Modified timestamp.
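If you would rather not rely on a browser add-on, steps 2 and 3 above can be approximated with a short Python script. This is only a sketch using the standard urllib module. The function names fetch and check_conditional_get are invented for this example, and it bypasses the browser cache entirely, so it tests the server's behaviour only.

```python
import urllib.error
import urllib.request


def fetch(url, if_modified_since=None):
    """GET url, optionally as a conditional request.

    Returns (status_code, Last-Modified header value or None).
    """
    request = urllib.request.Request(url)
    if if_modified_since:
        request.add_header("If-Modified-Since", if_modified_since)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for any non-2xx status, including 304.
        return err.code, err.headers.get("Last-Modified")


def check_conditional_get(url):
    """Request url twice: once plainly, once with If-Modified-Since."""
    status, last_modified = fetch(url)
    print("first request:", status, "Last-Modified:", last_modified)
    if last_modified is None:
        print("no Last-Modified header, so there is nothing to send back")
        return None
    # Send back the server's own timestamp, as a browser would.
    status, _ = fetch(url, if_modified_since=last_modified)
    print("conditional request:", status)
    return status
```

Calling check_conditional_get("http://www.example.com/") with your own page URL in place of the placeholder should report 200 for the first request, and 304 for the second if the server handles If-Modified-Since for that URL.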
Note that if your server is configured to send HTTP Cache-Control: "no-cache" headers, it may tell the browser never to cache the page, in which case you won't see the If-Modified-Since request headers or the 304-Not Modified responses. If your server is configured to send "Expires" headers, then your browser may not even send a request to your server when you (user) request a page -- It will probably just serve the page from cache until the server-specified Expires time is reached.
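To illustrate the "Expires" case only, here is a deliberately simplified Python sketch of the decision a browser might make for a page it already has in cache. The function browser_action and its return strings are hypothetical, and real browsers also take Cache-Control and other headers into account, which this ignores.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

SERVE_FROM_CACHE = "serve from cache, no request sent"
REVALIDATE = "send request with If-Modified-Since"


def browser_action(expires_header, now):
    """Hypothetical simplification: decide what to do with a cached page.

    expires_header: the Expires value stored with the cached copy, or None.
    now: current time as a timezone-aware UTC datetime.
    """
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            expires = None  # invalid value (e.g. "0"): treat as already expired
        if expires is not None and expires.tzinfo is None:
            # Assumption: treat a date without a zone as GMT.
            expires = expires.replace(tzinfo=timezone.utc)
        if expires is not None and now < expires:
            return SERVE_FROM_CACHE  # the server never sees this page view
    return REVALIDATE  # server can then answer 304 or 200
```

This is why a test can appear to show no If-Modified-Since headers at all: until the Expires time passes, the request never leaves the browser.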
It's all highly complicated, but it does make sense once you realize that in addition to browser caches, there may be additional network caches between the client and the server. All of the complications are intended to allow caches to work together to save network and server bandwidth.
Thanks again Jim.
I'll need to do a lot more reading on this subject by the look of it!
The way this If-Modified-Since system was mentioned casually in the Google information, with no additional guide, gave the impression that it would be a lot easier to sort out than it is!