Pragma & Cache-Control

Forum Moderators: goodroi

Are these Safe with Search Engines

stuart

8:28 am on May 26, 2003 (gmt 0)

Is this code safe in the <head> section? I put it up ages ago to try to refresh pages each time customers look at them, and to stop ISPs caching pages that might be updated that day. Could it be doing any harm? Or is there a better method?

TIA for any ideas.
stuart

jdMorgan

5:21 pm on May 26, 2003 (gmt 0)

stuart,

The "best way" to do this is to configure your server to return proper last-modified and cache-control headers. The methods used to do this depend on what server your site is hosted on.

Jim

stuart

2:56 pm on May 27, 2003 (gmt 0)

Thanks Jim, much appreciated. I'll most likely be back once I find out!

stuart

jdMorgan

3:20 pm on May 27, 2003 (gmt 0)

stuart,

Here's a quick way [webmasterworld.com] to find out.

Jim

stuart

3:34 pm on May 27, 2003 (gmt 0)

Am I pushing my luck asking where to go from here?

HTTP/1.1 200 OK
Date: Tue, 27 May 2003 16:41:57 GMT
Server: Apache/1.3.27 (Unix) mod_perl/1.26 mod_throttle/3.1.2 PHP/4.2.2 FrontPage/4.0.4.3 mod_ssl/2.8.11 OpenSSL/0.9.6f
Last-Modified: Sun, 18 May 2003 03:11:24 GMT
Accept-Ranges: bytes
Content-Length: 8573
Connection: close
Content-Type: text/html

jdMorgan

4:01 pm on May 27, 2003 (gmt 0)

stuart,

The following code, placed in your web-root .htaccess file, will set the no-cache and must-revalidate Cache-Control directives for two files called "test.html" and "eval.html", so that browsers and proxies must revalidate them with your server before each reuse. It will also declare a default setting for files for which you do not explicitly declare a caching policy, and set your robots.txt file to expire after two hours.

The syntax used for FilesMatch patterns is regular-expression syntax. Be aware that the construct "(test|eval)" means "test OR eval". Posting in this forum alters the vertical pipe "|" symbol, so if it appears broken or garbled in the code, replace it with the solid vertical pipe character from your keyboard.

You can check that this code is working properly by using the WebmasterWorld header checker or the cacheability checker cited below after you modify/install it.


# Set up Cache-Control headers
ExpiresActive On
# Default: expire everything 1 week (604800 seconds) after last access, and set must-revalidate
ExpiresDefault A604800
Header append Cache-Control "must-revalidate"
# Apply a customized Cache-Control header to frequently-updated files
<FilesMatch "^(test|eval)\.html$">
ExpiresDefault A1
Header unset Cache-Control
Header append Cache-Control "no-cache, must-revalidate"
</FilesMatch>
# Let robots.txt expire 2 hours (7200 seconds) after last access
<FilesMatch "^robots\.txt$">
ExpiresDefault A7200
</FilesMatch>

Ref: mod_headers [httpd.apache.org], mod_expires [httpd.apache.org], FilesMatch [httpd.apache.org], regular expressions tutorial [etext.lib.virginia.edu], caching tutorial [mnot.net], cacheability checker [ircache.net]
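
If you'd rather check a saved response yourself than rely on an online checker, here is a minimal Python sketch of the idea. It only parses a pasted header block and reports the caching-related headers it finds. The function name caching_report and the keys it returns are made up for illustration, and the sample input is the response posted earlier in this thread (with the long Server line shortened).

```python
# Sketch: summarize the caching-related headers in a raw HTTP response.
# caching_report and its result keys are illustrative names, not a library API.

def caching_report(raw_response: str) -> dict:
    """Parse a raw header block and report the caching headers it contains."""
    lines = raw_response.strip().splitlines()
    headers = {}
    for line in lines[1:]:  # skip the status line, e.g. "HTTP/1.1 200 OK"
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")  # split on the first colon only
        key = name.strip().lower()
        value = value.strip()
        # Repeated headers are joined with commas, as HTTP allows for list-valued headers
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    cache_control = headers.get("cache-control", "")
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    return {
        "cache_control": directives,
        "expires": headers.get("expires"),
        "last_modified": headers.get("last-modified"),
    }

# The response posted earlier in this thread, before any .htaccess changes
# (Server line shortened here)
BEFORE = """HTTP/1.1 200 OK
Date: Tue, 27 May 2003 16:41:57 GMT
Server: Apache/1.3.27 (Unix)
Last-Modified: Sun, 18 May 2003 03:11:24 GMT
Accept-Ranges: bytes
Content-Length: 8573
Connection: close
Content-Type: text/html
"""

report = caching_report(BEFORE)
print(report)
```

For the response above, the report shows a Last-Modified date but no Cache-Control directives and no Expires header, which is why the caching policy needs to be set on the server. After installing the .htaccess code, a response for test.html or eval.html should list no-cache and must-revalidate among its Cache-Control directives.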

HTH,
Jim

stuart

6:44 pm on May 27, 2003 (gmt 0)

Thanks Jim. Sorry to ask more, but is this 100% safe with search engines? It's a bit beyond me, but I have help at hand.

stuart

jdMorgan

8:04 pm on May 27, 2003 (gmt 0)

stuart,

Search engines don't care* about no-cache and must-revalidate. They use the word "cache" in a different way - they really mean "temporary archive." This is evidenced by the fact that Google - for example - has a special on-page HTML <meta name="robots" content="noarchive"> tag that you must use if you don't want your page copied into the Google "cache." This is a completely different thing from proxy and browser caches.

I have used a more-complicated version of the code I posted for years to control browser and proxy caching of my sites. It hasn't bumped me out of the top three listings yet!

* One exception: Do not set the expires time of your robots.txt file to a very short time. Anything under 30 minutes might cause a search engine to "give up" on spidering your site. In many cases, search engines will fetch your robots.txt and then come back a while later to actually fetch files from your site. If the timeout is too short, they may not want to risk violating your robots.txt if they can't fetch your pages before your robots.txt expires. This is a worst-case scenario, though.

The code I posted has robots.txt expiring in two hours, but my sites change frequently. You can set it for a couple of days (172800 seconds) if this worries you. However, I once had it set to expire after one second, and it stayed that way for a month. The only deleterious effect I saw was that the robots would fetch robots.txt, then a single file, then fetch robots.txt again, then another file, etc. I noted that they did not give up, even though the robots.txt file expired several seconds before they fetched a content file. So they were "forgiving" of my dumb mistake - thankfully.
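
Since the expiry values are all plain seconds, it is easy to slip by a factor when changing them. As a rough sanity check, here is a small Python sketch that turns the "A<seconds>" values used in this thread into readable durations. The helper name describe_expiry is made up for illustration; the "A" prefix means seconds counted from the time of access, and mod_expires also accepts "M" for seconds counted from the file's modification time.

```python
# Sketch: convert mod_expires-style "A<seconds>" / "M<seconds>" values
# into readable durations. describe_expiry is an illustrative helper name.
from datetime import timedelta

def describe_expiry(value: str) -> str:
    """Describe an expiry value such as 'A7200' in human-readable form."""
    base = {"A": "access", "M": "modification"}[value[0]]
    duration = timedelta(seconds=int(value[1:]))
    return f"{duration} after {base}"

# The values mentioned in this thread
for setting in ("A1", "A7200", "A172800", "A604800"):
    print(setting, "->", describe_expiry(setting))
```

Running it confirms that A7200 is two hours, A172800 is two days, and A604800 is seven days.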

Again, I've had absolutely no trouble and don't expect any, since my usage conforms with the specifications of how it's all supposed to work.

HTH,
Jim

stuart

8:32 pm on May 27, 2003 (gmt 0)

Fantastic information Jim, thank you. Another arrow to the bow.

stuart