IIS allows you to set a content expiration on the site; however, it adds two headers:
I want to set the cache limit for the South Africa cache to six hours, but I am worried that Google will apply this to its own cache too, and refresh their cache of our site every six hours. That would be crazy. The question is: which HTTP content expiration headers does Google interpret for their cache?
I deliberately set relatively short cache times (~1hr for most dynamic pages) to get browsers to reload from time to time, and it does not seem to hurt (have been doing it since 1997!).
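A short expiry like the one described can be expressed with the standard Date, Expires, and Cache-Control headers. Here is a minimal sketch of building such a header set; the helper name and the choice of a one-hour default are illustrative, not from the original post:

```python
# Sketch: build a ~1-hour expiry header set for dynamic pages.
# The function name and default TTL are assumptions for illustration.
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def short_cache_headers(hours=1):
    """Return Date/Expires/Cache-Control headers for a short-lived page."""
    now = datetime.now(timezone.utc)
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(now + timedelta(hours=hours), usegmt=True),
        "Cache-Control": f"max-age={hours * 3600}",
    }

print(short_cache_headers())
```

Sending both Expires (an absolute date) and Cache-Control: max-age (a relative lifetime) covers both HTTP/1.0 and HTTP/1.1 caches; max-age wins where both are understood.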
1) IE6 caching is completely broken. Sometimes it respects the Cache-Control: and Expires: headers (BTW, they are more complex than you think; look up their correct use on www.w3.org), and sometimes it resurrects versions of pages months older than the one it just retrieved. Just had a major problem with an IE6 browser on ME last night as it happens. So, for IE users, recite after me: "when something funny happens, clear the cache, close *ALL* IE windows, and restart IE".
2) When the visitor is *clearly* a bot, I set a much longer expiry time (30 days+) with the Cache-Control and Expires: headers, and I add a "Revisit-After" header too (though I doubt many bots take any notice). I *also* add text to the foot of the page in some cases warning the user that they may have a stale page and to hit RELOAD.
3) You may also want to set the Vary: header if the content depends on something like the user's locale or browser, to help keep aggressive caches in check.
4) You can resort to cache-busting techniques such as appending "?rnd=largerandomnumber" to your URLs to force refetches (though turn it off for SE bots).
5) Your client's ISP needs to get its cache fixed or removed; reverse or "transparent" caches almost never are fixed. In the past I've threatened an upstream ISP with legal action if they didn't remove one they installed without notice, since it interfered with security and tracking, amounted to a form of "attack", and was not contractually permitted (they removed the cache forthwith!).
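Points (2)-(4) above can be sketched together: a longer expiry for bots, a Vary header when content depends on the browser, and a cache-busting query string for human visitors only. The function names, the 30-day bot TTL, and the `rnd` parameter are taken from the description above; the rest is an illustrative assumption:

```python
# Sketch of bot-aware caching headers plus cache-busting for humans.
# is_bot detection itself is discussed separately; here it is a flag.
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def caching_headers(is_bot, varies_on_user_agent=False):
    now = datetime.now(timezone.utc)
    # 30 days+ for bots, ~1 hour for everyone else, per points (2) above.
    ttl = timedelta(days=30) if is_bot else timedelta(hours=1)
    headers = {
        "Expires": format_datetime(now + ttl, usegmt=True),
        "Cache-Control": f"max-age={int(ttl.total_seconds())}",
    }
    if varies_on_user_agent:
        # Point (3): tell aggressive caches the response varies by browser.
        headers["Vary"] = "User-Agent"
    return headers

def cache_busted(url, is_bot):
    # Point (4): append ?rnd=... for humans, but never for SE bots.
    if is_bot:
        return url
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}rnd={random.getrandbits(32)}"
```

Note that a random query string defeats every cache in the chain, including the browser's own, so it is a blunt instrument of last resort.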
Here's one set (the set I serve to obvious spiders, such as G):
Expires: Mon, 12 Dec 2005 14:20:20 GMT
Date: Tue, 13 Sep 2005 14:20:20 GMT
I reduce the expiry (i.e. Expires and Cache-Control) to ~1hr for other visitors for most pages.
There are at least three ways of detecting a bot:
1) IP address (or DNS name, by reverse lookup).
2) User-Agent string.
3) Referer string (usually absent for bots, though sometimes absent for human visitors too).
I don't use (2) since it is easily forged one way or another, or may simply change.
I use a combination of (1) and (3), with almost no manual maintenance except to react to warnings in my logs every few months.
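A combination of (1) and (3) might look like the sketch below. The hostname suffixes are illustrative examples of what a reverse lookup returns for well-known crawlers; the exact list the author maintains is not given:

```python
# Sketch: detect a bot by reverse DNS (signal 1) combined with a
# missing Referer (signal 3). Suffix list is an assumed example.
import socket

BOT_HOST_SUFFIXES = (".googlebot.com", ".search.msn.com")  # illustrative

def host_is_known_bot(hostname):
    """Check the reverse-DNS name against known crawler domains."""
    return hostname.lower().endswith(BOT_HOST_SUFFIXES)

def reverse_lookup(ip):
    """Resolve an IP to its DNS name; empty string if no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""

def looks_like_bot(ip, referer):
    # Treat a known crawler hostname OR an absent Referer as bot-like.
    return host_is_known_bot(reverse_lookup(ip)) or not referer
```

A production version would also verify the forward lookup of the returned hostname matches the original IP, since PTR records alone can be spoofed by whoever controls the address block.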